Ep. 246 - Part 3 - June 12, 2024 - Listen - TechcraftingAI

Description

ArXiv Computer Vision research for Wednesday, June 12, 2024.

00:20: From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition
02:09: APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
03:57: 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction
05:47: DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor
06:58: Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze
08:02: LaneCPP: Continuous 3D Lane Detection using Physical Priors
09:23: FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
11:10: VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
12:46: MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
14:39: OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
16:49: AWGUNET: Attention-Aided Wavelet Guided U-Net for Nuclei Segmentation in Histopathology Images
18:15: Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
19:58: Coherent Optical Modems for Full-Wavefield Lidar
21:32: Transformation-Dependent Adversarial Attacks
22:45: PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement
24:10: GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
25:57: ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery
27:26: Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement
28:51: Real2Code: Reconstruct Articulated Objects via Code Generation
30:02: Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models
31:42: RMem: Restricted Memory Banks Improve Video Object Segmentation
33:12: What If We Recaption Billions of Web Images with LLaMA-3?
34:42: Real3D: Scaling Up Large Reconstruction Models with Real-World Images
36:07: Enhancing End-to-End Autonomous Driving with Latent World Model
37:12: Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation
38:43: On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models
40:16: Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
42:15: ICE-G: Image Conditional Editing of 3D Gaussian Splats
