Ryan Greenblatt - Solving ARC with GPT4o
Description
Ryan Greenblatt from Redwood Research recently published "Getting 50% (SoTA) on ARC-AGI with GPT-4o," where he used GPT-4o to reach state-of-the-art accuracy on Francois Chollet's ARC Challenge by generating many Python programs.

Sponsor: Sign up to Kalshi here https://kalshi.onelink.me/1r91/mlst -- the first 500 traders who deposit $100 will get a free $20 credit! Important disclaimer: in case it's not obvious, this is basically gambling and a *high risk* activity. Only trade what you can afford to lose.

We discuss:
- Ryan's unique approach to solving the ARC Challenge and achieving impressive results.
- The strengths and weaknesses of current AI models.
- How AI and humans differ in learning and reasoning.
- Combining various techniques to create smarter AI systems.
- The potential risks and future advancements in AI, including the idea of agentic AI.

https://x.com/RyanPGreenblatt
https://www.redwoodresearch.org/

Refs:
- Getting 50% (SoTA) on ARC-AGI with GPT-4o [Ryan Greenblatt] https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
- On the Measure of Intelligence [Chollet] https://arxiv.org/abs/1911.01547
- Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn] https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf
- Software 2.0 [Andrej Karpathy] https://karpathy.medium.com/software-2-0-a64152b37c35
- Why Greatness Cannot Be Planned: The Myth of the Objective [Kenneth Stanley] https://amzn.to/3Wfy2E0
- Biographical account of Terence Tao’s mathematical development [M.A. (Ken) Clements] https://gwern.net/doc/iq/high/smpy/1984-clements.pdf
- Model Evaluation and Threat Research (METR) https://metr.org/
- Why Tool AIs Want to Be Agent AIs https://gwern.net/tool-ai
- Simulators [Janus] https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
- AI Control: Improving Safety Despite Intentional Subversion https://www.lesswrong.com/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion https://arxiv.org/abs/2312.06942
- What a Compute-Centric Framework Says About Takeoff Speeds https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/
- Global GDP over the long run https://ourworldindata.org/grapher/global-gdp-over-the-long-run?yScale=log
- Safety Cases: How to Justify the Safety of Advanced AI Systems https://arxiv.org/abs/2403.10462
- The Danger of a “Safety Case” http://sunnyday.mit.edu/The-Danger-of-a-Safety-Case.pdf
- The Future Of Work Looks Like A UPS Truck (~02:15:50) https://www.npr.org/sections/money/2014/05/02/308640135/episode-536-the-future-of-work-looks-like-a-ups-truck
- SWE-bench https://www.swebench.com/
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model https://arxiv.org/pdf/2201.11990
- Algorithmic Progress in Language Models https://epochai.org/blog/algorithmic-progress-in-language-models