Episodes
ArXiv Computer Vision research for Thursday, June 13, 2024.
00:21: LRM-Zero: Training Large Reconstruction Models with Synthesized Data
01:56: Scale-Invariant Monocular Depth Estimation via SSI Depth
03:08: GGHead: Fast and Generalizable 3D Gaussian Heads
04:55: Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
06:34: Towards Vision-Language Geo-Foundation Model: A Survey
08:11: SimGen: Simulator-conditioned Driving Scene Generation
09:44: Exploring the Spectrum of...
Published 06/15/24
ArXiv Computer Vision research for Thursday, June 13, 2024.
00:21: INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
02:11: Large-Scale Evaluation of Open-Set Image Classification Techniques
03:43: PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation
05:00: MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era
06:41: Auto-Vocabulary Segmentation for LiDAR Points
07:30: AdaRevD: Adaptive Patch...
Published 06/15/24
ArXiv Computer Vision research for Thursday, June 13, 2024.
00:21: FouRA: Fourier Low Rank Adaptation
01:41: Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
03:18: Few-Shot Anomaly Detection via Category-Agnostic Registration Learning
04:57: Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting
06:46: ToSA: Token Selective Attention for Efficient Vision Transformers
08:00: Computer vision-based model for detecting...
Published 06/15/24
ArXiv Computer Vision research for Wednesday, June 12, 2024.
00:20: From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition
02:09: APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
03:57: 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction
05:47: DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor
06:58:...
Published 06/13/24
ArXiv Computer Vision research for Wednesday, June 12, 2024.
00:21: From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization
01:44: Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement
03:20: Adversarial Patch for 3D Local Feature Extractor
04:00: Valeo4Cast: A Modular Approach to End-to-End Forecasting
05:38: The impact of deep learning aid on the workload and interpretation accuracy of...
Published 06/13/24
ArXiv Computer Vision research for Wednesday, June 12, 2024.
00:20: FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image
01:21: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
02:49: Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification
04:26: Flexible Music-Conditioned Dance Generation with Style Description Prompts
05:52: Robust 3D Face Alignment with Multi-Path Neural...
Published 06/13/24
ArXiv Computer Vision research for Tuesday, June 11, 2024.
00:21: DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses
01:44: Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration
02:49: Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
04:04: OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
06:01: 4Real: Towards Photorealistic 4D Scene Generation via...
Published 06/13/24
ArXiv Computer Vision research for Tuesday, June 11, 2024.
00:21: NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images
01:27: Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph
03:14: T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text
04:45: Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images
06:23: FaceGPT: Self-supervised Learning to Chat about 3D...
Published 06/13/24
ArXiv Computer Vision research for Tuesday, June 11, 2024.
00:20: Explaining Representation Learning with Perceptual Components
01:28: Optimal Matrix-Mimetic Tensor Algebras via Variable Projection
03:03: Sparse Bayesian Networks: Efficient Uncertainty Quantification in Medical Image Analysis
04:24: Neural Visibility Field for Uncertainty-Driven Active Mapping
05:21: Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target...
Published 06/13/24
ArXiv Computer Vision research for Monday, June 10, 2024.
00:20: ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery
01:59: Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset
03:44: Vript: A Video Is Worth Thousands of Words
05:38: FRAG: Frequency Adapting Group for Diffusion Video Editing
06:50: Synthesizing Efficient Data with Diffusion Models for Person Re-Identification...
Published 06/11/24
ArXiv Computer Vision research for Monday, June 10, 2024.
00:20: DualAD: Disentangling the Dynamic and Static World for End-to-End Driving
01:41: NeuroMoCo: A Neuromorphic Momentum Contrast Learning Method for Spiking Neural Networks
03:22: Vehicle Vectors and Traffic Patterns from Planet Imagery
04:15: A Guide to Stochastic Optimisation for Large-Scale Inverse Problems
05:37: Cascading Unknown Detection with Known Classification for Open Set Recognition
06:42: Latent Directions: A...
Published 06/11/24
ArXiv Computer Vision research for Sunday, June 09, 2024.
00:20: ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving
02:23: Unified Text-to-Image Generation and Retrieval
03:51: F-LMM: Grounding Frozen Large Multimodal Models
05:34: Multi-Stain Multi-Level Convolutional Network for Multi-Tissue Breast Cancer Image Segmentation
07:43: BOSC: A toolbox for aerial imagery mapping
08:27: Mamba YOLO: SSMs-Based YOLO For Object Detection
10:12: Solution...
Published 06/11/24
ArXiv Computer Vision research for Sunday, June 09, 2024.
00:20: PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction
01:47: Anomaly Multi-classification in Industrial Scenarios: Transferring Few-shot Learning to a New Task
02:51: GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
04:51: Visual Prompt Tuning in Null Space for Continual Learning
06:20: SRC-Net: Bi-Temporal Spatial Relationship Concerned Network for Change...
Published 06/11/24
ArXiv Computer Vision research for Saturday, June 08, 2024.
00:20: Blurry-Consistency Segmentation Framework with Selective Stacking on Differential Interference Contrast 3D Breast Cancer Spheroid
01:31: 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation
03:01: Metric Convolutions: A Unifying Theory to Adaptive Convolutions
04:13: Layered Image...
Published 06/11/24
ArXiv Computer Vision research for Friday, June 07, 2024.
00:21: RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
01:52: AGBD: A Global-scale Biomass Dataset
03:30: MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
04:52: Faster Than Lies: Real-time Deepfake Detection using Binary Neural Networks
06:03: Leveraging Activations for Superpixel Explanations
07:02: Joint Spatial-Temporal Modeling and Contrastive Learning for...
Published 06/10/24
ArXiv Computer Vision research for Friday, June 07, 2024.
00:20: Image Processing Based Forest Fire Detection
01:08: STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting
03:05: UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection
04:47: UCDNet: Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping
06:14: SMART: Scene-motion-aware human action recognition framework for mental disorder group
08:12:...
Published 06/10/24
ArXiv Computer Vision research for Thursday, June 06, 2024.
00:20: M3LEO: A Multi-Modal, Multi-Label Earth Observation Dataset Integrating Interferometric SAR and RGB Data
02:34: Understanding Information Storage and Transfer in Multi-modal Large Language Models
04:27: Conv-INR: Convolutional Implicit Neural Representation for Multimodal Visual Signals
06:01: Localized Gaussian Point Management
07:59: A Survey on 3D Human Avatar Modeling -- From Reconstruction to Generation
09:25:...
Published 06/07/24
ArXiv Computer Vision research for Thursday, June 06, 2024.
00:20: ReDistill: Residual Encoded Distillation for Peak Memory Reduction
01:58: Instance Segmentation and Teeth Classification in Panoramic X-rays
03:34: Enhanced Semantic Segmentation Pipeline for WeatherProof Dataset Challenge
04:44: Amortized Equation Discovery in Hybrid Dynamical Systems
05:57: Monocular Localization with Semantics Map for Autonomous Vehicles
07:22: From operculum and body tail movements to different...
Published 06/07/24
ArXiv Computer Vision research for Wednesday, June 05, 2024.
00:20: Image Copy-Move Forgery Detection and Localization Scheme: How to Avoid Missed Detection and False Alarm
01:52: VWise: A novel benchmark for evaluating scene classification for vehicular applications
03:03: Text-to-Image Rectified Flow as Plug-and-Play Priors
04:25: L-PR: Exploiting LiDAR Fiducial Marker for Unordered Low Overlap Multiview Point Cloud Registration
06:17: Learning Visual Prompts for Guiding the...
Published 06/06/24
ArXiv Computer Vision research for Wednesday, June 05, 2024.
00:20: Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision
02:03: A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
03:42: Exploiting LMM-based knowledge for image classification tasks
04:37: EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos
06:09: EpidermaQuant: Unsupervised detection and quantification of epidermal differentiation...
Published 06/06/24
ArXiv Computer Vision research for Wednesday, June 05, 2024.
00:20: A Self-Supervised Denoising Strategy for Underwater Acoustic Camera Imageries
01:26: Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
02:40: U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation
04:14: Exploring Data Efficiency in Zero-Shot Learning with Diffusion Models
06:09: P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction...
Published 06/06/24
ArXiv Computer Vision research for Tuesday, June 04, 2024.
00:20: FedDr+: Stabilizing Dot-regression with Global Feature Distillation for Federated Learning
02:06: EUFCC-340K: A Faceted Hierarchical Dataset for Metadata Annotation in GLAM Collections
03:14: Learning to Edit Visual Programs with Self-Supervision
04:15: Low-Rank Adaption on Transformer-based Oriented Object Detector for Satellite Onboard Processing of Remote Sensing Images
06:12: WE-GS: An In-the-wild Efficient 3D...
Published 06/05/24
ArXiv Computer Vision research for Tuesday, June 04, 2024.
00:20: Plug-and-Play Diffusion Distillation
01:29: Enhance Image-to-Image Generation with LLaVA Prompt and Negative Prompt
02:33: The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
04:03: Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization
05:38: Choroidal Vessel Segmentation on Indocyanine Green Angiography Images via...
Published 06/05/24
ArXiv Computer Vision research for Monday, June 03, 2024.
00:20: Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering
01:35: An expert-driven data generation pipeline for histological images
02:26: Sensitivity-Informed Augmentation for Robust Segmentation
04:10: EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding
05:44: ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation...
Published 06/04/24