Supervise the Process of AI Research — with Jungwon Byun and Andreas Stuhlmüller of Elicit
Listen now
Description
Maggie, Linus, Geoffrey, and the LS crew are reuniting for our second annual AI UX demo day in SF on Apr 28. Sign up to demo here! And don’t forget tickets for the AI Engineer World’s Fair — for early birds who join before keynote announcements! It’s become fashionable for many AI startups to project themselves as “the next Google” - while the search engine is so 2000s, both Perplexity and Exa referred to themselves as a “research engine” or “answer engine” in our NeurIPS pod. However these searches tend to be relatively shallow, and it is challenging to zoom up and down the ladders of abstraction to garner insights. For serious researchers, this level of simple one-off search will not cut it. We’ve commented in our Jan 2024 Recap that Flow Engineering (simply; multi-turn processes over many-shot single prompts) seems to offer far more performance, control and reliability for a given cost budget. Our experiments with Devin and our understanding of what the new Elicit Notebooks offer a glimpse into the potential for very deep, open ended, thoughtful human-AI collaboration at scale. It starts with prompts When ChatGPT exploded in popularity in November 2022 everyone was turned into a prompt engineer. While generative models were good at "vibe based" outcomes (tell me a joke, write a poem, etc) with basic prompts, they struggled with more complex questions, especially in symbolic fields like math, logic, etc. Two of the most important "tricks" that people picked up on were: * Chain of Thought prompting strategy proposed by Wei et al in the “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”. Rather than doing traditional few-shot prompting with just question and answers, adding the thinking process that led to the answer resulted in much better outcomes. * Adding "Let's think step by step" to the prompt as a way to boost zero-shot reasoning, which was popularized by Kojima et al in the Large Language Models are Zero-Shot Reasoners paper from NeurIPS 2022. This bumped accuracy from 17% to 79% compared to zero-shot. Nowadays, prompts include everything from promises of monetary rewards to… whatever the Nous folks are doing to turn a model into a world simulator. At the end of the day, the goal of prompt engineering is increasing accuracy, structure, and repeatability in the generation of a model. From prompts to agents As prompt engineering got more and more popular, agents (see “The Anatomy of Autonomy”) took over Twitter with cool demos and AutoGPT became the fastest growing repo in Github history. The thing about AutoGPT that fascinated people was the ability to simply put in an objective without worrying about explaining HOW to achieve it, or having to write very sophisticated prompts. The system would create an execution plan on its own, and then loop through each task. The problem with open-ended agents like AutoGPT is that 1) it’s hard to replicate the same workflow over and over again 2) there isn’t a way to hard-code specific steps that the agent should take without actually coding them yourself, which isn’t what most people want from a product. From agents to products Prompt engineering and open-ended agents were great in the experimentation phase, but this year more and more of these workflows are starting to become polished products. Today’s guests are Andreas Stuhlmüller and Jungwon Byun of Elicit (previously Ought), an AI research assistant that they think of as “the best place to understand what is known”. Ought was a non-profit, but last September, Elicit spun off into a PBC with a $9m seed round. It is hard to quantify how much a workflow can be improved, but Elicit boasts some impressive numbers for research assistants: Just four months after launch, Elicit crossed $1M ARR, which shows how much interest there is for AI products that just work. One of the main takeaways we had from the episode is how teams should focus on supervising the proce
More Episodes
Alessio will be at AWS re:Invent next week and hosting a casual coffee meetup on Wednesday, RSVP here! And subscribe to our calendar for our Singapore, NeurIPS, and all upcoming meetups! We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe here...
Published 11/15/24
We are recording our next big recap episode and taking questions! Submit questions and messages on Speakpipe here for a chance to appear on the show! Also subscribe to our calendar for our Singapore, NeurIPS, and all upcoming meetups! In our first ever episode with Logan Kilpatrick we called out...
Published 11/11/24