2024.11.04 Daily AI Papers | OS-ATLAS improves GUI agent performance, CAF makes generative models more efficient.
Description
The 17 papers in this episode are:
[00:25] 🤖 OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
[01:07] ⚙ Constant Acceleration Flow
[01:53] 🍅 TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
[02:33] 🎨 Randomized Autoregressive Visual Generation
[03:10] 🧠 Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
[03:50] 📚 Personalization of Large Language Models: A Survey
[04:29] 🖼 In-Context LoRA for Diffusion Transformers
[05:09] ⚡ SambaMixer: State of Health Prediction of Li-ion Batteries using Mamba State Space Models
[05:54] 🤖 Survey of User Interface Design and Interaction Techniques in Generative AI Applications
[06:32] 🧶 HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
[07:07] 🌐 M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
[07:44] 🌆 CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
[08:22] 🔄 GPT or BERT: why not both?
[09:02] 🎭 Face Anonymization Made Simple
[09:40] 📊 Zipfian Whitening
[10:19] 📚 WikiNER-fr-gold: A Gold-Standard NER Corpus
[10:53] 🧠 GRS-QA -- Graph Reasoning-Structured Question Answering Dataset

Follow us: you can also find us on the following platform for more than what the podcast covers.
Xiaohongshu: AI速递