All episodes of HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

Episodes

2024.11.20 Daily AI Papers | Faster image generation, innovation in language model datasets

The 7 papers in this episode:
[00:33] ⚡ Continuous Speculative Decoding for Autoregressive Image Generation
[01:14] 📚 RedPajama: an Open Dataset for Training Large Language Models
[01:58] 🤖 Soft Robotic Dynamic In-Hand Pen Spinning
[02:39] 🚀 ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
[03:13] 🔒 Building Trust: Foundations of Security, Safety and Transparency in...

Published 11/20/24

2024.11.19 Daily AI Papers | Efficient deployment on mobile devices, virtual exploration for embodied AI

The 16 papers in this episode:
[00:25] 📱 BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
[01:06] 🌍 Generative World Explorer
[01:43] 🔍 Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
[02:24] 🎥 AnimateAnything: Consistent and Controllable Animation for Video Generation
[03:08] 🧠 Top-nσ:...

Published 11/19/24

2024.11.18 Daily AI Papers | Improved reasoning in vision-language models, finer-grained control in image generation

The 6 papers in this episode:
[00:28] 🧠 LLaVA-o1: Let Vision Language Models Reason Step-by-Step
[01:14] 🎨 Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
[01:51] 🌐 GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation
[02:25] 🌅 The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
[03:00] 📖 Number it: Temporal Grounding Videos like...

Published 11/18/24

[Weekend Special] Hottest AI Papers of Week 3 of November | Add-it improves object insertion in images; LLMs self-improve at long-context reasoning

The 5 papers in this episode:
[00:44] TOP1(🔥54) | 🖼 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
[02:31] TOP2(🔥44) | 🤖 Large Language Models Can Self-Improve in Long-context Reasoning
[04:15] TOP3(🔥43) | 🌐 LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
[06:12] TOP4(🔥42) | 🎨 OmniEdit: Building Image Editing Generalist Models Through Specialist...

Published 11/16/24

2024.11.15 Daily AI Papers | Efficient image editing, 3D mesh generation

The 7 papers in this episode:
[00:27] ✨ MagicQuill: An Intelligent Interactive Image Editing System
[01:15] 🌐 LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
[01:50] 💾 Cut Your Losses in Large-Vocabulary Language Models
[02:22] 🏥 ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?
[03:02] 🤖 Hermes: A Large Language Model Framework on the Journey to Autonomous...

Published 11/15/24

2024.11.14 Daily AI Papers | LLMs self-improve significantly; EgoVid-5M, a new dataset

The 7 papers in this episode:
[00:26] 🤖 Large Language Models Can Self-Improve in Long-context Reasoning
[01:09] 🎥 EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
[01:58] 🔍 Direct Preference Optimization Using Sparse Feature-Level Constraints
[02:37] 🇫🇷 CamemBERT 2.0: A Smarter French Language Model Aged to Perfection
[03:18] 🧠 Can sparse autoencoders be used to decompose and...

Published 11/14/24

2024.11.13 Daily AI Papers | A new framework for 3D object part segmentation; unified multimodal understanding and generation

The 6 papers in this episode:
[00:28] 🔍 SAMPart3D: Segment Any Part in 3D Objects
[01:06] 🌐 JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
[01:42] 🤔 Stronger Models are NOT Stronger Teachers for Instruction Tuning
[02:21] 🌐 Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings
[03:02] 📚 BLIP3-KALE:...

Published 11/13/24

2024.11.12 Daily AI Papers | Seamless object insertion; generalist editing models with higher accuracy

The 14 papers in this episode:
[00:23] 🖼 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
[01:05] 🎨 OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
[01:49] 📚 Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
[02:27] 📚 M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning...

Published 11/12/24

2024.11.11 Daily AI Papers | Higher training throughput, lower memory usage

The 6 papers in this episode:
[00:30] ⚖ Balancing Pipeline Parallelism with Vocabulary Parallelism
[01:15] 🎮 StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
[01:56] 🔄 DELIFT: Data Efficient Language model Instruction Fine Tuning
[02:29] 🧪 Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
[03:06] 🧠 LLM2CLIP: Powerful Language Model Unlock Richer...

Published 11/11/24

[Weekend Special] Hottest AI Papers of Week 2 of November | OpenCoder accelerates code AI research; ReCapture improves video generation quality

The 5 papers in this episode:
[00:38] TOP1(🔥73) | 🔧 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
[02:40] TOP2(🔥53) | 🎥 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
[04:22] TOP3(🔥52) | 📄 HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
[06:44] TOP4(🔥47) | ⚡ BitNet a4.8: 4-bit Activations for 1-bit...

Published 11/09/24

2024.11.08 Daily AI Papers | OpenCoder improves code generation; ReCapture optimizes video camera trajectories

The 14 papers in this episode:
[00:25] 🔧 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
[01:03] 🎥 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
[01:46] ⚡ BitNet a4.8: 4-bit Activations for 1-bit LLMs
[02:25] 🎥 DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
[03:04] 🤖...

Published 11/08/24

2024.11.07 Daily AI Papers | Data contamination affects model evaluation; structured reasoning boosts LLM performance

The 4 papers in this episode:
[00:28] 🔍 Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
[01:07] 🤖 Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
[01:53] 🧠 Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
[02:28] 🔄 Self-Consistency Preference Optimization
[Follow us] Beyond the podcast, you can find us and more content on Xiaohongshu (RED): AI速递

Published 11/07/24

2024.11.06 Daily AI Papers | HTML improves RAG performance; a molecular graph assistant for multimodal tasks

The 11 papers in this episode:
[00:30] 📄 HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
[01:12] 🧬 LLaMo: Large Language Model-based Molecular Graph Assistant
[01:52] 🤖 DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
[02:28] 🤖 Sample-Efficient Alignment for LLMs
[03:01] 🚦 Controlling Language and Diffusion Models by Transporting...

Published 11/06/24

2024.11.05 Daily AI Papers | AndroidLab improves agent performance; WebRL boosts results on web tasks

The 17 papers in this episode:
[00:26] 🤖 AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
[01:15] 🌐 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
[01:55] 🌐 Training-free Regional Prompting for Diffusion Transformers
[02:36] 🌍 Survey of Cultural Awareness in Language Models: Text and Beyond
[03:15] 🤖 Hunyuan-Large: An Open-Source MoE Model...

Published 11/05/24

2024.11.04 Daily AI Papers | OS-ATLAS improves GUI agent performance; CAF makes generative models more efficient

The 17 papers in this episode:
[00:25] 🤖 OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
[01:07] ⚙ Constant Acceleration Flow
[01:53] 🍅 TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
[02:33] 🎨 Randomized Autoregressive Visual Generation
[03:10] 🧠 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
[03:50] 📚...

Published 11/04/24

[Weekend Special] Hottest AI Papers of Week 1 of November | CLEAR, a new multimodal unlearning benchmark; the GPT-4o System Card explained

The 5 papers in this episode:
[00:41] TOP1(🔥191) | 🧠 CLEAR: Character Unlearning in Textual and Visual Modalities
[02:58] TOP2(🔥70) | 🤖 GPT-4o System Card
[04:50] TOP3(🔥50) | 🔍 Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
[06:53] TOP4(🔥49) | 🗣 CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
[08:44] TOP5(🔥48) | 🚀 ROCKET-1: Master Open-World Interaction with...

Published 11/02/24

2024.11.01 Daily AI Papers | Sparse autoencoders make image models more interpretable; a gradient perspective reveals layer-wise differences in LLMs

The 11 papers in this episode:
[00:27] 🔍 Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
[01:05] 🧠 What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
[01:43] 🔍 A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents
[02:23] 🔄 Constraint Back-translation Improves Complex Instruction Following of Large...

Published 11/01/24

2024.10.31 Daily AI Papers | A new benchmark for multi-turn conversation; an efficient-inference model for robotics tasks

The 5 papers in this episode:
[00:29] 🗣 CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
[01:09] 🤖 A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
[01:50] 🔍 Stealing User Prompts from Mixture of Experts
[02:26] 🩺 AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels
[02:58] 🔄 TokenFormer: Rethinking Transformer Scaling with...

Published 10/31/24

2024.10.30 Daily AI Papers | Multimodal unlearning remains hard; AutoKaggle improves efficiency

The 8 papers in this episode:
[00:33] 🧠 CLEAR: Character Unlearning in Textual and Visual Modalities
[01:10] 🤖 AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
[01:46] 🤖 SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization
[02:26] 🌐 OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and...

Published 10/30/24

2024.10.29 Daily AI Papers | Better Polish language model performance; innovation in heterogeneous agent systems

The 17 papers in this episode:
[00:24] 🇵🇱 Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
[01:00] 🤖 AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
[01:39] 🤖 GPT-4o System Card
[02:21] 📄 Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
[03:08] 🤖 LongReward: Improving...

Published 10/29/24

2024.10.28 Daily AI Papers | Visual-temporal prompting improves interaction; continuous diffusion improves speech synthesis

The 13 papers in this episode:
[00:25] 🚀 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
[01:14] 🗣 Continuous Speech Synthesis using per-token Latent Diffusion
[01:55] ⚡ Teach Multimodal LLMs to Comprehend Electrocardiographic Images
[02:39] 🌐 Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
[03:23] ⚡ FasterCache: Training-Free Video...

Published 10/28/24

[Weekend Special] Hottest AI Papers of Week 4 of October | Fast convergence for few-shot NeRF; more accurate long-video segmentation

The 5 papers in this episode:
[00:44] TOP1(🔥79) | ⚡ FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
[02:42] TOP2(🔥60) | 🌳 SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
[04:19] TOP3(🔥58) | 🚀 Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
[06:11] TOP4(🔥55) | 🤖 CompassJudger-1: All-in-one Judge Model Helps Model...

Published 10/26/24

2024.10.25 Daily AI Papers | Major gains in memory efficiency; stronger long-context alignment

The 21 papers in this episode:
[00:26] 🚀 Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
[01:09] 🔄 LOGO -- Long cOntext aliGnment via efficient preference Optimization
[01:45] 🧠 Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch
[02:30] 🤔 Can Knowledge Editing Really Correct Hallucinations?
[03:17] 🎮 Unbounded: A Generative Infinite Game of Character...

Published 10/25/24

2024.10.24 Daily AI Papers | Optimizing multi-image tasks; evaluating video generation models

The 10 papers in this episode:
[00:25] 🖼 MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
[01:09] 🌍 WorldSimBench: Towards Video Generation Models as World Simulators
[01:47] 🌊 Scaling Diffusion Language Models via Adaptation from Autoregressive Models
[02:20] 📱 Lightweight Neural App Control
[03:01] 🏠 ARKit LabelMaker: A New Scale for Indoor 3D Scene...

Published 10/24/24