2024.11.27 Daily AI Papers | ShowUI boosts GUI efficiency, F2F improves image editing.
Description
The 18 papers in this episode are as follows:
[00:28] 🖥 ShowUI: One Vision-Language-Action Model for GUI Visual Agent
[01:08] 🎥 Pathways on the Image Manifold: Image Editing via Video Generation
[01:45] ⭐ Star Attention: Efficient LLM Inference over Long Sequences
[02:24] ⚡ Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration
[03:01] 📊 MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
[03:44] 🎨 TEXGen: a Generative Diffusion Model for Mesh Textures
[04:27] 🎨 SketchAgent: Language-Driven Sequential Sketch Generation
[05:11] 🔄 Learning 3D Representations from Procedural 3D Programs
[05:55] 🧠 VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
[06:50] 🔄 SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
[07:27] 🖼 FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
[08:09] 🎨 DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting
[08:41] 📹 SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
[09:19] 📉 Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
[10:05] 🧬 MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts
[10:40] 👕 Controllable Human Image Generation with Personalized Multi-Garments
[11:12] 🤖 Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI)
[11:55] 🎥 AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation

[Follow us]
You can also find us on the following platform for more information beyond the podcast content:
Xiaohongshu: AI速递
More Episodes
The 21 papers in this episode are as follows: [00:26] 🌐 Material Anything: Generating Materials for Any 3D Object via Diffusion [01:05] 🎨 Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator [01:48] 🤖 From Generation to Judgment:...
Published 11/26/24
The 14 papers in this episode are as follows: [00:26] 🎨 Style-Friendly SNR Sampler for Style-Driven Generation [01:08] 🚀 TÜLU 3: Pushing Frontiers in Open Language Model Post-Training [01:53] 🌐 OminiControl: Minimal and Universal Control for Diffusion...
Published 11/25/24