2024.10.28 Daily AI Papers | Visual-temporal prompting improves interaction; continuous diffusion models improve speech synthesis

Description

The 13 papers in this episode:

[00:25] 🚀 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
[01:14] 🗣 Continuous Speech Synthesis using per-token Latent Diffusion
[01:55] ⚡ Teach Multimodal LLMs to Comprehend Electrocardiographic Images
[02:39] 🌐 Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
[03:23] ⚡ FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
[03:56] 🎧 MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
[04:34] 🧠 Counting Ability of Large Language Models and Impact of Tokenization
[05:08] 🧠 Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
[05:46] 🤖 Reflection-Bench: probing AI intelligence with reflection
[06:23] 🤖 Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
[06:57] 🔍 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
[07:35] 🔍 Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
[08:15] 🤖 Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling

Follow us: beyond the podcast, you can find more content from us on Xiaohongshu (小红书): AI速递

More Episodes

2024.11.21 Daily AI Papers | 4-bit attention delivers major speedups; a comprehensive benchmark for video generation

The 8 papers in this episode: [00:28] ⚡ SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration [01:10] 📹 VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models [01:51] 🎮...

Published 11/21/24

2024.11.20 Daily AI Papers | Faster image generation; new datasets for language models

The 7 papers in this episode: [00:33] ⚡ Continuous Speculative Decoding for Autoregressive Image Generation [01:14] 📚 RedPajama: an Open Dataset for Training Large Language Models [01:58] 🤖 Soft Robotic Dynamic In-Hand Pen Spinning [02:39] 🚀 ITACLIP: Boosting...

Published 11/20/24

HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

Published 11/20/24