2024.11.06 Daily AI Papers | HTML Improves RAG Performance, Molecular Graph Assistant Optimizes Multimodal Tasks
Description
The 11 papers in this episode:
[00:30] 📄 HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
[01:12] 🧬 LLaMo: Large Language Model-based Molecular Graph Assistant
[01:52] 🤖 DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
[02:28] 🤖 Sample-Efficient Alignment for LLMs
[03:01] 🚦 Controlling Language and Diffusion Models by Transporting Activations
[03:49] 🌟 DreamPolish: Domain Score Distillation With Progressive Geometry Generation
[04:32] 🦓 Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
[05:12] 👕 GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
[05:46] 🔍 Correlation of Object Detection Performance with Visual Saliency and Depth Estimation
[06:28] 🔄 Adaptive Length Image Tokenization via Recurrent Allocation
[07:01] 🧠 Inference Optimal VLMs Need Only One Visual Token but Larger Models

[Follow us] You can also find us on the following platform for more beyond the podcast:
Xiaohongshu: AI速递
More Episodes
The 5 papers in this episode: [00:41] TOP1(🔥93) | 🧠 LLaVA-o1: Let Vision Language Models Reason Step-by-Step [02:41] TOP2(🔥55) | 🌍 Generative World Explorer [05:00] TOP3(🔥44) | 🧠 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference...
Published 11/23/24
The 14 papers in this episode: [00:26] 🧠 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization [01:12] 🌐 Multimodal Autoregressive Pre-training of Large Vision Encoders [01:55] 🧠 Marco-o1: Towards Open Reasoning Models for...
Published 11/22/24