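[Weekend Special] The Hottest AI Papers of Week 3 of November | Add-it improves object insertion in images, LLMs self-improve in long-context reasoning.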
Description
The 5 papers in this episode:
[00:44] TOP1 (🔥54) | 🖼 Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
[02:31] TOP2 (🔥44) | 🤖 Large Language Models Can Self-Improve in Long-context Reasoning
[04:15] TOP3 (🔥43) | 🌐 LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
[06:12] TOP4 (🔥42) | 🎨 OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
[08:01] TOP5 (🔥42) | 📚 M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework

[Follow Us] You can also find us on the following platform for more than what the podcast covers.
Xiaohongshu: AI速递
More Episodes
The 5 papers in this episode:
[00:41] TOP1 (🔥93) | 🧠 LLaVA-o1: Let Vision Language Models Reason Step-by-Step
[02:41] TOP2 (🔥55) | 🌍 Generative World Explorer
[05:00] TOP3 (🔥44) | 🧠 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference...
Published 11/23/24
The 14 papers in this episode:
[00:26] 🧠 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
[01:12] 🌐 Multimodal Autoregressive Pre-training of Large Vision Encoders
[01:55] 🧠 Marco-o1: Towards Open Reasoning Models for...
Published 11/22/24