Description
The 13 papers in this episode are:
[00:25] 🚀 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
[01:14] 🗣 Continuous Speech Synthesis using per-token Latent Diffusion
[01:55] ⚡ Teach Multimodal LLMs to Comprehend Electrocardiographic Images
[02:39] 🌐 Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
[03:23] ⚡ FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
[03:56] 🎧 MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
[04:34] 🧠 Counting Ability of Large Language Models and Impact of Tokenization
[05:08] 🧠 Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
[05:46] 🤖 Reflection-Bench: probing AI intelligence with reflection
[06:23] 🤖 Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
[06:57] 🔍 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
[07:35] 🔍 Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
[08:15] 🤖 Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling
【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (小红书): AI速递