How do you evaluate an LLM? Try an LLM.
Description
On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the tradeoffs involved in selecting and fine-tuning LLMs.
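The episode's point about LLMs evaluating other LLMs is often called the "LLM-as-judge" pattern: one model produces an answer, a second model scores it against a rubric. Below is a minimal, hypothetical sketch of that loop in Python; `call_llm` is a stand-in for whatever model client you actually use and is stubbed here so the example runs as-is.

```python
# Minimal sketch of the "LLM-as-judge" evaluation pattern.
# `call_llm` is a hypothetical placeholder, not a real client library;
# it is stubbed so this file runs end to end.

def call_llm(prompt: str) -> str:
    # Stub: replace with a real model call (hosted API, local model, etc.).
    return "4"  # pretend the judge model returned a 1-5 score

JUDGE_TEMPLATE = """You are grading an answer to a question.
Question: {question}
Answer: {answer}
Rate the answer's factual accuracy from 1 (wrong) to 5 (fully correct).
Reply with a single digit."""

def judge_answer(question: str, answer: str) -> int:
    """Ask a judge model to score an answer; return 0 if the reply is unparseable."""
    reply = call_llm(JUDGE_TEMPLATE.format(question=question, answer=answer))
    try:
        return int(reply.strip()[0])
    except (ValueError, IndexError):
        return 0

if __name__ == "__main__":
    score = judge_answer("What does SQL stand for?", "Structured Query Language")
    print(f"judge score: {score}")
```

In practice, as the episode notes, such judge scores are usually spot-checked against human raters before being trusted at scale.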
More Episodes
On this episode: The FTC bans most noncompete agreements, the implications of the TikTok “ban,” why a 2017 law is hitting startups with huge tax bills seven years later, and the return of net neutrality. Plus: the wunderkind hacker who ransomed Finland’s anxieties and secrets.
Published 04/30/24
Dr. Richard Hipp, creator of SQLite, shares how he taught himself to program, the challenges he faced in creating SQLite, and the importance of testing and maintaining the software for long-term support.
Published 04/26/24