#10: Agent-as-a-judge - Agents That Evaluate Agents -

Description

We discussed "Agent-as-a-Judge: Evaluate Agents with Agents", a paper that takes its inspiration from LLM-as-a-Judge and proposes using agentic systems to evaluate agentic systems. The podcast transcription service "LISTEN" is here.

Shownotes:
https://arxiv.org/abs/2410.10934v1
https://huggingface.co/DEVAI-benchmark
https://github.com/metauto-ai/agent-as-a-judge/tree/main
https://blog.langchain.dev/scipe-systematic-chain-improvement-and-problem-evaluation/

Hosts: seya (@sekikazu01), kagaya (@ry0_kaga)

More Episodes

AI Engineering Now

Published 11/18/24

#9: Thoughts on Trying Our Hand at the Now-Trendy(!?) In-House v0 Development

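Inspired by Ubie's example, the two of us started building an in-house v0. We talked about what we have learned and struggled with along the way, as well as design AI tools such as Figma AI. The podcast transcription service "LISTEN" is here.

Shownotes:
https://v0.dev/
https://www.figma.com/ja-jp/ai/
https://x.com/sys1yagi/status/1850763720630387170

Hosts: seya (@sekikazu01), kagaya (@ry0_kaga)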

Published 11/14/24

#8: Who Validate the Validator? - A Mechanism for Continuously Updating Evaluations -

EvalGen, a mechanism for continuously updating the evaluation criteria and automated evaluations of LLM applications, as described in the paper "Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human...

Published 11/04/24