Description
The 18 papers in this episode are:
[00:28] 🖥 ShowUI: One Vision-Language-Action Model for GUI Visual Agent
[01:08] 🎥 Pathways on the Image Manifold: Image Editing via Video Generation
[01:45] ⭐ Star Attention: Efficient LLM Inference over Long Sequences
[02:24] ⚡ Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration
[03:01] 📊 MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
[03:44] 🎨 TEXGen: a Generative Diffusion Model for Mesh Textures
[04:27] 🎨 SketchAgent: Language-Driven Sequential Sketch Generation
[05:11] 🔄 Learning 3D Representations from Procedural 3D Programs
[05:55] 🧠 VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
[06:50] 🔄 SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
[07:27] 🖼 FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
[08:09] 🎨 DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting
[08:41] 📹 SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
[09:19] 📉 Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
[10:05] 🧬 MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts
[10:40] 👕 Controllable Human Image Generation with Personalized Multi-Garments
[11:12] 🤖 Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI)
[11:55] 🎥 AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation
[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递