Episodes
This episode covers 7 papers: [00:33] ⚡ Continuous Speculative Decoding for Autoregressive Image Generation [01:14] 📚 RedPajama: an Open Dataset for Training Large Language Models [01:58] 🤖 Soft Robotic Dynamic In-Hand Pen Spinning [02:39] 🚀 ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements [03:13] 🔒 Building Trust: Foundations of Security, Safety and Transparency in...
Published 11/20/24
This episode covers 16 papers: [00:25] 📱 BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices [01:06] 🌍 Generative World Explorer [01:43] 🔍 Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering [02:24] 🎥 AnimateAnything: Consistent and Controllable Animation for Video Generation [03:08] 🧠 Top-$nσ$:...
Published 11/19/24
This episode covers 6 papers: [00:28] 🧠 LLaVA-o1: Let Vision Language Models Reason Step-by-Step [01:14] 🎨 Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement [01:51] 🌐 GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation [02:25] 🌅 The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use [03:00] 📖 Number it: Temporal Grounding Videos like...
Published 11/18/24
This episode covers 5 papers: [00:44] TOP1(🔥54) | 🖼 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models [02:31] TOP2(🔥44) | 🤖 Large Language Models Can Self-Improve in Long-context Reasoning [04:15] TOP3(🔥43) | 🌐 LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models [06:12] TOP4(🔥42) | 🎨 OmniEdit: Building Image Editing Generalist Models Through Specialist...
Published 11/16/24
This episode covers 7 papers: [00:27] ✨ MagicQuill: An Intelligent Interactive Image Editing System [01:15] 🌐 LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models [01:50] 💾 Cut Your Losses in Large-Vocabulary Language Models [02:22] 🏥 ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? [03:02] 🤖 Hermes: A Large Language Model Framework on the Journey to Autonomous...
Published 11/15/24
This episode covers 7 papers: [00:26] 🤖 Large Language Models Can Self-Improve in Long-context Reasoning [01:09] 🎥 EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation [01:58] 🔍 Direct Preference Optimization Using Sparse Feature-Level Constraints [02:37] 🇫 CamemBERT 2.0: A Smarter French Language Model Aged to Perfection [03:18] 🧠 Can sparse autoencoders be used to decompose and...
Published 11/14/24
This episode covers 6 papers: [00:28] 🔍 SAMPart3D: Segment Any Part in 3D Objects [01:06] 🌐 JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation [01:42] 🤔 Stronger Models are NOT Stronger Teachers for Instruction Tuning [02:21] 🌐 Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings [03:02] 📚 BLIP3-KALE:...
Published 11/13/24
This episode covers 14 papers: [00:23] 🖼 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models [01:05] 🎨 OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision [01:49] 📚 Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models [02:27] 📚 M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning...
Published 11/12/24
This episode covers 6 papers: [00:30] ⚖ Balancing Pipeline Parallelism with Vocabulary Parallelism [01:15] 🎮 StdGEN: Semantic-Decomposed 3D Character Generation from Single Images [01:56] 🔄 DELIFT: Data Efficient Language model Instruction Fine Tuning [02:29] 🧪 Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study [03:06] 🧠 LLM2CLIP: Powerful Language Model Unlock Richer...
Published 11/11/24
This episode covers 5 papers: [00:38] TOP1(🔥73) | 🔧 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [02:40] TOP2(🔥53) | 🎥 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning [04:22] TOP3(🔥52) | 📄 HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems [06:44] TOP4(🔥47) | ⚡ BitNet a4.8: 4-bit Activations for 1-bit...
Published 11/09/24
This episode covers 14 papers: [00:25] 🔧 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [01:03] 🎥 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning [01:46] ⚡ BitNet a4.8: 4-bit Activations for 1-bit LLMs [02:25] 🎥 DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion [03:04] 🤖...
Published 11/08/24
This episode covers 4 papers: [00:28] 🔍 Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination [01:07] 🤖 Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level [01:53] 🧠 Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models [02:28] 🔄 Self-Consistency Preference Optimization [Follow Us] You can also find us on the following platform for more than what the podcast covers. Xiaohongshu (小红书): AI速递
Published 11/07/24
This episode covers 11 papers: [00:30] 📄 HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems [01:12] 🧬 LLaMo: Large Language Model-based Molecular Graph Assistant [01:52] 🤖 DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [02:28] 🤖 Sample-Efficient Alignment for LLMs [03:01] 🚦 Controlling Language and Diffusion Models by Transporting...
Published 11/06/24
This episode covers 17 papers: [00:26] 🤖 AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents [01:15] 🌐 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning [01:55] 🌐 Training-free Regional Prompting for Diffusion Transformers [02:36] 🌍 Survey of Cultural Awareness in Language Models: Text and Beyond [03:15] 🤖 Hunyuan-Large: An Open-Source MoE Model...
Published 11/05/24
This episode covers 17 papers: [00:25] 🤖 OS-ATLAS: A Foundation Action Model for Generalist GUI Agents [01:07] ⚙ Constant Acceleration Flow [01:53] 🍅 TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models [02:33] 🎨 Randomized Autoregressive Visual Generation [03:10] 🧠 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation [03:50] 📚...
Published 11/04/24
This episode covers 5 papers: [00:41] TOP1(🔥191) | 🧠 CLEAR: Character Unlearning in Textual and Visual Modalities [02:58] TOP2(🔥70) | 🤖 GPT-4o System Card [04:50] TOP3(🔥50) | 🔍 Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders [06:53] TOP4(🔥49) | 🗣 CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation [08:44] TOP5(🔥48) | 🚀 ROCKET-1: Master Open-World Interaction with...
Published 11/02/24
This episode covers 11 papers: [00:27] 🔍 Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders [01:05] 🧠 What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective [01:43] 🔍 A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents [02:23] 🔄 Constraint Back-translation Improves Complex Instruction Following of Large...
Published 11/01/24
This episode covers 5 papers: [00:29] 🗣 CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation [01:09] 🤖 A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks [01:50] 🔍 Stealing User Prompts from Mixture of Experts [02:26] 🩺 AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels [02:58] 🔄 TokenFormer: Rethinking Transformer Scaling with...
Published 10/31/24
This episode covers 8 papers: [00:33] 🧠 CLEAR: Character Unlearning in Textual and Visual Modalities [01:10] 🤖 AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions [01:46] 🤖 SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization [02:26] 🌐 OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and...
Published 10/30/24
This episode covers 17 papers: [00:24] 🇵 Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation [01:00] 🤖 AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant [01:39] 🤖 GPT-4o System Card [02:21] 📄 Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction [03:08] 🤖 LongReward: Improving...
Published 10/29/24
This episode covers 13 papers: [00:25] 🚀 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting [01:14] 🗣 Continuous Speech Synthesis using per-token Latent Diffusion [01:55] ⚡ Teach Multimodal LLMs to Comprehend Electrocardiographic Images [02:39] 🌐 Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data [03:23] ⚡ FasterCache: Training-Free Video...
Published 10/28/24
This episode covers 5 papers: [00:44] TOP1(🔥79) | ⚡ FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors [02:42] TOP2(🔥60) | 🌳 SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree [04:19] TOP3(🔥58) | 🚀 Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [06:11] TOP4(🔥55) | 🤖 CompassJudger-1: All-in-one Judge Model Helps Model...
Published 10/26/24
This episode covers 21 papers: [00:26] 🚀 Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [01:09] 🔄 LOGO -- Long cOntext aliGnment via efficient preference Optimization [01:45] 🧠 Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch [02:30] 🤔 Can Knowledge Editing Really Correct Hallucinations? [03:17] 🎮 Unbounded: A Generative Infinite Game of Character...
Published 10/25/24
This episode covers 10 papers: [00:25] 🖼 MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models [01:09] 🌍 WorldSimBench: Towards Video Generation Models as World Simulators [01:47] 🌊 Scaling Diffusion Language Models via Adaptation from Autoregressive Models [02:20] 📱 Lightweight Neural App Control [03:01] 🏠 ARKit LabelMaker: A New Scale for Indoor 3D Scene...
Published 10/24/24