All episodes of AI Engineering Now (AIエンジニアリングNow)

Episodes

#10: Agent-as-a-Judge ~Agents That Evaluate Agents~

We discussed Agent-as-a-Judge: Evaluate Agents with Agents, a paper that takes inspiration from LLM-as-a-Judge and proposes using agentic systems to evaluate agentic systems. A transcript is available on the podcast transcription service LISTEN.
Shownotes:
https://arxiv.org/abs/2410.10934v1
https://huggingface.co/DEVAI-benchmark
https://github.com/metauto-ai/agent-as-a-judge/tree/main
https://blog.langchain.dev/scipe-systematic-chain-improvement-and-problem-evaluation/
Hosts: seya (@sekikazu01), kagaya (@ry0_kaga)

Published 11/18/24

#9: Thoughts on Trying to Build an In-House v0, the Latest Trend!?

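Inspired by Ubie's example, the two of us started building in-house versions of v0. We talked about what we have learned, the difficulties we have run into, and design AI tools such as Figma AI. A transcript is available on the podcast transcription service LISTEN.
Shownotes:
https://v0.dev/
https://www.figma.com/ja-jp/ai/
https://x.com/sys1yagi/status/1850763720630387170
Hosts: seya (@sekikazu01), kagaya (@ry0_kaga)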

Published 11/14/24

#8: Who Validates the Validators? - A Mechanism for Continuously Updating Evaluations -

A paper describing EvalGen, a mechanism for continuously updating the evaluation criteria and automated evaluations of LLM applications: Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human...

Published 11/04/24

#7: Prompt Optimization by AI: Automated Prompting ~And On to Evaluation~

Automated prompt tuning, Auto...

Published 10/28/24

#6: RAG and Beyond ~Understanding RAG and What Lies Beyond It in Four Levels~

This episode's topic is RAG and Beyond, research published by Microsoft that classifies RAG tasks into four levels. A transcript is available on the podcast transcription service LISTEN: https://listen.style/p/aiengineeringnow
Shownotes:
https://arxiv.org/abs/2409.14924
https://x.com/K_Ishi_AI/status/1838765135206453254
Hosts: seya (https://x.com/sekikazu01), kagaya (https://x.com/ry0_kaga)

Published 10/23/24

#5: Thoughts on OpenAI DevDay 2024 ~Prompt Caching Edition~

This episode's topic is Prompt Caching, announced at OpenAI DevDay 2024. A transcript is available on the podcast transcription service LISTEN.
Shownotes:
https://platform.openai.com/docs/guides/prompt-caching
https://www.anthropic.com/news/prompt-caching
https://zenn.dev/google_cloud_jp/articles/0c257a98143152
Hosts: seya (@sekikazu01), kagaya (@ry0_kaga)

Published 10/15/24

#4: ~Embedding first, Chunking Later~ Learning About Late Chunking, Proposed by Jina AI

This episode's topic is Late Chunking, proposed by Jina AI. Jina AI publishes APIs for embedding models, rerankers, semantic chunking, and more, which makes it a company worth watching for anyone working on RAG. We talked about Late Chunking, the chunking method it proposed. A transcript is available on the podcast transcription service LISTEN.
Shownotes:
Jina.ai
Late Chunking in Long-Context Embedding Models
Late Chunking: Balancing Precision and Cost in Long Context Retrieval | Weaviate
Training Text Embeddings with Jina AI
What is ColBERT and Late Interaction and Why They Matter in...

Published 10/07/24

#3: Are You Using Voice AI? A Casual Chat About Recent Voice AI Services ~Google's NotebookLM, Retell AI, and Personal Projects~

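A casual chat episode about voice AI services. In particular, we talked about the Audio Overview feature of Google's NotebookLM, Illuminate, Retell AI, and the voice AI service that kagaya is busy building as a personal project. A transcript is available on the podcast transcription service LISTEN.
Shownotes:
NotebookLM now lets you listen to a conversation about your sources
"Google Illuminate," which uses AI to summarize papers and books and automatically convert them into podcast-style conversational audio, is now available
Google adds a new feature to "NotebookLM," your own personal AI, that summarizes content as an audio show
Retell AI - Supercharge your call operation with Voice AI
The World of Voice AI Agents and an Introduction to Retell AI
Hosts: seya (@sekikazu01), kagaya (@ry0_kaga)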

Published 09/30/24

#2: The World of LLM Agents for Software Engineering

We discussed Large Language Model-Based Agents for Software Engineering: A Survey, a survey paper on LLM agent research in the software engineering domain. A transcript is available on the podcast transcription service LISTEN.
Shownotes:
Large Language Model-Based Agents for Software Engineering: A Survey
Understanding Agentic Design Patterns, Design Patterns for LLM Agents
Understanding AgentCoder, a Multi-Agent Code Generation Agent
You Can REST Now: Automated Specification Inference and Black-Box...
CodeAgent: Enhancing Code Generation with Tool-Integrated...

Published 09/16/24

#1: Building a Domain-Specific Evaluation Dataset from Chatbot Arena Data

We discussed Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, a paper about building a domain-specific evaluation dataset from Chatbot Arena data. A transcript is available on the podcast transcription service LISTEN.
Shownotes:
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Chat with Open Large Language Models
From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline | LMSYS Org
Benchmarks 201: Why Leaderboards > Arenas >>...

Published 09/08/24