Unlocking Value from Unstructured Data, Real-World Applications of Generative AI | ep 17
Description
In this episode of "How AI is Built," host Nicolay Gerold interviews Jonathan Yarkoni, founder of Reach Latent. Jonathan shares his expertise in extracting value from unstructured data using AI, discussing challenging projects, the impact of ChatGPT, and the future of generative AI. From weather prediction to legal tech, Jonathan provides valuable insights into the practical applications of AI across various industries.
Key Takeaways
Generative AI projects often require less data cleaning due to the models' tolerance for "dirty" data, allowing for faster implementation in some cases.
The success of AI projects post-delivery is ensured through monitoring, but automatic retraining of generative AI applications is not yet common due to evaluation challenges.
Industries ripe for AI disruption include text-heavy fields like legal, education, software engineering, and marketing, as well as biotech and entertainment.
The adoption of AI is expected to occur in waves, with 2024 likely focusing on internal use cases and 2025 potentially seeing more customer-facing applications as models improve.
Synthetic data generation, using models like GPT-4, can be a valuable approach for training AI systems when real data is scarce or sensitive.
Evaluation frameworks like RAGAS and custom metrics are essential for assessing the quality of synthetic data and AI model outputs.
Jonathan's ideal tech stack for generative AI projects includes tools like Instructor, Guardrails, Semantic Routing, DSPy, LangChain, and LlamaIndex, with a growing emphasis on evaluation stacks.
Key Quotes
"I think we're going to see another wave in 2024 and another one in 2025. And people are familiarized. That's kind of the wave of 2023. 2024 is probably still going to be a lot of internal use cases because it's a low risk environment and there was a lot of opportunity to be had."
"To really get to production reliably, we have to have these tools evolve further and get more standardized so people can still use the old ways of doing production with the new technology."
Jonathan Yarkoni:
LinkedIn, YouTube, X (Twitter), Reach Latent
Nicolay Gerold:
LinkedIn, X (Twitter)
Chapters
00:00 Introduction: Extracting Value from Unstructured Data
03:16 Flexible Tailoring Solutions to Client Needs
05:39 Monitoring and Retraining Models in the Evolving AI Landscape
09:15 Generative AI: Disrupting Industries and Unlocking New Possibilities
17:47 Balancing Immediate Results and Cutting-Edge Solutions in AI Development
28:29 Dream Tech Stack for Generative AI
Keywords: unstructured data, textual data, automation, weather prediction, data cleaning, ChatGPT, AI disruption, legal, education, software engineering, marketing, biotech, immediate results, cutting-edge solutions, tech stack