2024.10.28 Daily AI Papers | Visual-Temporal Prompting Improves Interaction, Continuous Diffusion Models Improve Speech Synthesis
Description
The 13 papers in this episode are:
[00:25] 🚀 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
[01:14] 🗣 Continuous Speech Synthesis using per-token Latent Diffusion
[01:55] ⚡ Teach Multimodal LLMs to Comprehend Electrocardiographic Images
[02:39] 🌐 Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
[03:23] ⚡ FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
[03:56] 🎧 MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
[04:34] 🧠 Counting Ability of Large Language Models and Impact of Tokenization
[05:08] 🧠 Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
[05:46] 🤖 Reflection-Bench: probing AI intelligence with reflection
[06:23] 🤖 Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
[06:57] 🔍 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
[07:35] 🔍 Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
[08:15] 🤖 Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling
[Follow us] You can also find us on the following platform for more than what the podcast covers. Xiaohongshu: AI速递