2024.11.18 Daily AI Papers | Stronger reasoning for vision-language models, finer control over image generation

Description

This episode covers the following 6 papers:

[00:28] 🧠 LLaVA-o1: Let Vision Language Models Reason Step-by-Step
[01:14] 🎨 Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
[01:51] 🌐 GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation
[02:25] 🌅 The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
[03:00] 📖 Number it: Temporal Grounding Videos like Flipping Manga
[03:45] 🌍 Xmodel-1.5: An 1B-scale Multilingual LLM

Follow us: you can also find us on the following platform for more information beyond the podcast content. Xiaohongshu: AI速递

More Episodes

[Weekend Special] Hottest AI papers of the 4th week of November | LLaVA-o1 improves multimodal reasoning, Genex optimizes embodied AI planning.

This episode covers the following 5 papers:

[00:41] TOP1 (🔥93) | 🧠 LLaVA-o1: Let Vision Language Models Reason Step-by-Step
[02:41] TOP2 (🔥55) | 🌍 Generative World Explorer
[05:00] TOP3 (🔥44) | 🧠 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference...

Published 11/23/24

HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Express)

2024.11.22 Daily AI Papers | Mixed preference optimization improves reasoning, new work on multimodal autoregressive pre-training.

This episode covers the following 14 papers:

[00:26] 🧠 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
[01:12] 🌐 Multimodal Autoregressive Pre-training of Large Vision Encoders
[01:55] 🧠 Marco-o1: Towards Open Reasoning Models for...

Published 11/22/24