Language Agents: From Reasoning to Acting

Latent Space

Description

OpenAI DevDay is almost here! Per tradition, we are hosting a DevDay pregame event for everyone coming to town! Join us with demos and gossip! Also sign up for related events across San Francisco: the AI DevTools Night, the xAI open house, the Replicate art show, the DevDay Watch Party (for non-attendees), and Hack Night with OpenAI at Cloudflare. For everyone else, join the Latent Space Discord for our online watch party and find fellow AI Engineers in your city.

OpenAI’s recent o1 release (and the Reflection 70b debacle) has reignited broad interest in agentic general reasoning and tree search methods. While we have covered some of the self-taught reasoning literature on the Latent Space Paper Club, it is notable that Eric Zelikman ended up at xAI, whereas OpenAI’s hiring of Noam Brown and now Shunyu suggests more interest in tool-using chain of thought/tree of thought/generator-verifier architectures for Level 3 Agents.

We were more than delighted to learn that Shunyu is a fellow Latent Space enjoyer, and invited him back (after his first appearance on our NeurIPS 2023 pod) for a look through his academic career with Harrison Chase (one year after his first LS show).

ReAct: Synergizing Reasoning and Acting in Language Models

Following seminal Chain of Thought papers from Wei et al and Kojima et al, and reflecting on lessons from building the WebShop human ecommerce trajectory benchmark, Shunyu’s first big hit, the ReAct paper, showed that using LLMs to “generate both reasoning traces and task-specific actions in an interleaved manner” achieved remarkably greater performance (less hallucination/error propagation, higher ALFWorld/WebShop benchmark success) than CoT alone.

In even better news, ReAct scales fabulously with finetuning.

As a member of the elite Princeton NLP group, Shunyu was also a coauthor of the Reflexion paper, which we discuss in this pod.
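To make the interleaving of reasoning traces and actions described for ReAct a little more concrete, here is a minimal Python sketch of such a loop. It is an illustration under stated assumptions, not the paper's or LangChain's implementation: `scripted_model` is a hypothetical stand-in for an LLM call, `calculate` is a toy tool, and the `Thought:` / `Action: name[argument]` / `Observation:` text format is assumed for the example.

```python
from typing import Callable, Dict


def scripted_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: returns one Thought/Action step."""
    if "Observation: 4" in prompt:
        return "Thought: I have the result.\nAction: finish[4]"
    return "Thought: I should use the calculator.\nAction: calculate[2 + 2]"


def calculate(expression: str) -> str:
    """Toy tool (assumption): adds integers written as 'a + b'."""
    return str(sum(int(part) for part in expression.split("+")))


def react_loop(
    question: str,
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate model steps (thought + action) with tool observations."""
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(prompt)
        prompt += step + "\n"
        # The last line of each step is assumed to be 'Action: name[argument]'.
        action_line = step.strip().splitlines()[-1]
        name, _, rest = action_line.split("Action: ", 1)[1].partition("[")
        argument = rest.rstrip("]")
        if name == "finish":
            return argument
        # Feed the tool result back so the next thought can condition on it.
        prompt += f"Observation: {tools[name](argument)}\n"
    return "no answer"


answer = react_loop("What is 2 + 2?", scripted_model, {"calculate": calculate})
print(answer)
```

The point of the sketch is the shape of the loop: each observation is appended to the prompt before the next reasoning step, which is what lets reasoning and acting inform each other.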
Tree of Thoughts

Shunyu’s next major improvement on the CoT literature was Tree of Thoughts:

“Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role… ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices.”

The beauty of ToT is that it doesn’t require pretraining with exotic methods like backspace tokens or other MCTS architectures. You can listen to Shunyu explain ToT in his own words on our NeurIPS pod, and also to the ineffable Yannic Kilcher.

Other Work

We don’t have the space to summarize the rest of Shunyu’s work; you can listen to our pod with him now. We also recommend the CoALA paper, his initial hit webinar with Harrison (today’s guest cohost), Shunyu’s PhD Defense Lecture, and his latest lecture covering a Brief History of LLM Agents.

As usual, we are live on YouTube!
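As a rough illustration of the Tree of Thoughts idea described above (propose several candidate reasoning paths, score them, keep the most promising ones), here is a small Python sketch of a breadth-first search with a beam. It is a simplified toy under stated assumptions, not the paper's implementation: the task (pick steps from 1, 2, 3 to reach a target sum), `propose`, and `make_evaluator` are all hypothetical, and the numeric score stands in for a language model's self-evaluation.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # a partial "thought": the steps chosen so far


def propose(state: State) -> List[State]:
    """Hypothetical thought generator: extend the path by 1, 2, or 3."""
    return [state + (step,) for step in (1, 2, 3)]


def make_evaluator(target: int) -> Callable[[State], int]:
    """Hypothetical self-evaluation: closer to the target sum scores higher."""
    def evaluate(state: State) -> int:
        return -abs(target - sum(state))
    return evaluate


def tree_of_thoughts_bfs(
    root: State,
    propose: Callable[[State], List[State]],
    evaluate: Callable[[State], int],
    depth: int,
    beam_width: int,
) -> State:
    """Expand every kept state, score the children, keep the best few."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        # Highest-scoring partial paths survive; the rest are pruned.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0]


best = tree_of_thoughts_bfs((), propose, make_evaluator(7), depth=3, beam_width=2)
print(best, sum(best))
```

Keeping more than one state per level is what distinguishes this from a single left-to-right chain: a path that looks slightly worse early on can still be extended into the best final answer.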
Show Notes

* Harrison Chase
* LangChain, LangSmith, LangGraph
* Shunyu Yao
* Alec Radford
* ReAct Paper
* Hotpot QA
* Tau Bench
* WebShop
* SWE-Agent
* SWE-Bench
* Tree of Thoughts
* CoALA Paper
* Related Episodes
  * Our Thomas Scialom (Meta) episode
  * Shunyu on our NeurIPS 2023 Best Papers episode
  * Harrison on our LangChain episode
* Mentions
  * Sierra
  * Voyager
  * Jason Wei
  * Tavily
  * SERP API
  * Exa

Timestamps

* [00:00:00] Opening Song by Suno
* [00:03:00] Introductions
* [00:06:16] The ReAct paper
* [00:12:09] Early applications of ReAct in LangChain
* [00:17:15] Discussion of the Reflexion paper
* [00:22:35] Tree of Thoughts paper and search algorithms in language models
* [00:27:21] SWE-Agent and SWE-Bench for coding benchmarks
* [00:39:21] CoALA: Cognitive Architect
