Episodes
Documentation quality is the silent killer of RAG systems. A single ambiguous sentence might corrupt an entire set of responses. But the hardest part isn't fixing errors - it's finding them.
Today we are talking to Max Buckley about how to find and fix these errors.
Max works at Google and has run a number of interesting experiments on using LLMs to improve knowledge bases for generation.
We talk about identifying ambiguities, fixing errors, creating improvement loops in the documents...
Published 11/21/24
Ever wondered why vector search isn't always the best path for information retrieval?
Join us as we dive deep into BM25 and its unmatched efficiency in our latest podcast episode with David Tippett from GitHub.
Discover how BM25 transforms search efficiency, even at GitHub's immense scale.
BM25, short for Best Match 25, uses term frequency (TF) and inverse document frequency (IDF) to score document-query matches. It addresses limitations of TF-IDF, such as term saturation and document length...
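As a rough illustration, here is a minimal sketch of the BM25 scoring function in Python; k1 and b are the standard free parameters, and the corpus statistics are passed in explicitly:

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_doc_len,
               k1=1.5, b=0.75):
    """Score one document against a query with BM25.

    doc_freq maps a term to the number of documents containing it;
    k1 controls term-frequency saturation, b controls length normalization.
    """
    score = 0.0
    doc_len = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Saturating TF: repeating a term 100x does not score 100x higher.
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score
```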
Published 11/15/24
Ever wondered why your vector search becomes painfully slow after scaling past a million vectors? You're not alone - even tech giants struggle with this.
Charles Xie, founder of Zilliz (company behind Milvus), shares how they solved vector database scaling challenges at 100B+ vector scale:
Key Insights:
Multi-tier storage strategy:
- GPU memory (1% of data, fastest)
- RAM (10% of data)
- Local SSD
- Object storage (slowest but cheapest)

Real-time search solution: New data goes to buffer (searchable...
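To make the tiering idea concrete, here is a toy sketch (not Milvus's actual implementation) that maps a vector's access-frequency percentile to a storage tier:

```python
# Hypothetical sketch of tiered vector storage: hot vectors live in fast,
# expensive tiers; cold vectors fall back to object storage.
TIERS = [
    ("gpu_memory", 0.01),     # ~1% of data, fastest
    ("ram", 0.10),            # ~10% of data
    ("local_ssd", 0.30),
    ("object_storage", 1.0),  # everything else, slowest but cheapest
]

def assign_tier(access_rank: float) -> str:
    """Map a vector's access-frequency percentile (0 = hottest) to a tier."""
    for tier, cutoff in TIERS:
        if access_rank <= cutoff:
            return tier
    return "object_storage"
```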
Published 11/07/24
Modern search systems face a complex balancing act between performance, relevancy, and cost, requiring careful architectural decisions at each layer.
While vector search generates buzz, hybrid approaches combining traditional text search with vector capabilities yield better results.
The architecture typically splits into three core components:
- ingestion/indexing (requiring decisions between batch vs. streaming)
- query processing (balancing understanding vs. performance)
- analytics/feedback loops...
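One common way to combine the text and vector legs of a hybrid system is reciprocal rank fusion; this minimal sketch is our own illustration, not necessarily the approach discussed in the episode:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked result lists (e.g., BM25 and vector search) with RRF.

    Each input is a list of doc ids ordered best-first; k dampens the
    influence of any single list's top ranks.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Usage: fuse keyword hits and vector hits into one ranking.
merged = reciprocal_rank_fusion([["d1", "d2", "d3"], ["d3", "d1", "d4"]])
```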
Published 10/31/24
Today we are talking to Michael Günther, a senior machine learning scientist at Jina about his work on JINA Clip.
Some key points:
- Uni-modal embeddings convert a single type of input (text, images, audio) into vectors
- Multimodal embeddings learn a joint embedding space that can handle multiple types of input, enabling cross-modal search (e.g., searching images with text)
- Multimodal models can potentially learn richer representations of the world, including concepts that are difficult or...
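A minimal sketch of cross-modal search, assuming the sentence-transformers library with a CLIP checkpoint (model name and image paths are placeholders):

```python
# Cross-modal search sketch: embed images and text into one joint space.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

image_embeddings = model.encode([Image.open("dog.jpg"), Image.open("cat.jpg")])
query_embedding = model.encode("a photo of a dog")

# Cosine similarity ranks the images against the text query.
hits = util.cos_sim(query_embedding, image_embeddings)
print(hits)
```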
Published 10/25/24
Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning.
Machine learning and AI success depends on the speed you can iterate. LanceDB is here to enable fast experiments on top of terabytes of unstructured data. It is the database for AI. Dive with us into how LanceDB was built, what went into the decision to use Rust as the main implementation language, the potential of AI on top of LanceDB, and more.
"LanceDB is the database...
Published 10/23/24
Today’s guest is Mór Kapronczay. Mór is the Head of ML at Superlinked. Superlinked is a compute framework for your information retrieval and feature engineering systems that turns anything into embeddings.
When most people think about embeddings, they think of OpenAI's ada.
You just take your text and throw it in there.
But that’s too crude.
OpenAI embeddings are trained on the internet.
But your data set (most likely) is not the internet.
You have different nuances.
And you have more...
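As a toy illustration of "turning anything into embeddings" (not Superlinked's actual API), you can concatenate a text embedding with scaled numeric signals:

```python
import numpy as np

def combined_embedding(text_vec: np.ndarray, recency_days: float,
                       popularity: float, weights=(1.0, 0.5, 0.5)) -> np.ndarray:
    """Concatenate a text embedding with scaled numeric signals.

    A toy illustration of moving beyond text-only embeddings; a real system
    handles the weighting and normalization for you.
    """
    recency = np.array([np.exp(-recency_days / 30.0)])  # decays over ~a month
    pop = np.array([np.log1p(popularity)])
    parts = [text_vec * weights[0], recency * weights[1], pop * weights[2]]
    vec = np.concatenate(parts)
    return vec / np.linalg.norm(vec)
```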
Published 10/10/24
Today we have Jessica Talisman with us, who is working as an Information Architect at Adobe. She is (in my opinion) the expert on taxonomies and ontologies.
That’s what you will learn today in this episode of How AI Is Built. Taxonomies, ontologies, knowledge graphs.
Everyone is talking about them, but no one knows how to build them.
But before we look into that, what are they good for in search?
Imagine a large corpus of academic papers. When a user searches for "machine learning in healthcare",...
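A toy sketch of taxonomy-driven query expansion; the taxonomy entries are invented for illustration:

```python
# Toy taxonomy for query expansion; terms and structure are invented.
TAXONOMY = {
    "machine learning": {
        "narrower": ["deep learning", "supervised learning"],
        "related": ["artificial intelligence"],
    },
    "healthcare": {
        "narrower": ["clinical diagnosis", "medical imaging"],
        "related": ["medicine"],
    },
}

def expand_query(terms):
    """Add narrower and related taxonomy terms, so a search for
    'machine learning in healthcare' also matches papers that only
    say 'deep learning for medical imaging'."""
    expanded = set(terms)
    for term in terms:
        entry = TAXONOMY.get(term, {})
        expanded.update(entry.get("narrower", []))
        expanded.update(entry.get("related", []))
    return expanded

print(expand_query(["machine learning", "healthcare"]))
```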
Published 10/04/24
ColPali makes us rethink how we approach document processing.
ColPali revolutionizes visual document search by combining late interaction scoring with visual language models. This approach eliminates the need for extensive text extraction and preprocessing, handling messy real-world data more effectively than traditional methods.
In this episode, Jo Bergum, chief scientist at Vespa, shares his insights on how ColPali is changing the way we approach complex document formats like PDFs and HTML...
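The core of late interaction is MaxSim scoring; here is a minimal sketch, assuming L2-normalized query-token and document-patch embeddings:

```python
import numpy as np

def maxsim_score(query_tokens: np.ndarray, doc_patches: np.ndarray) -> float:
    """Late-interaction (ColBERT/ColPali-style) scoring.

    query_tokens: (n_query_tokens, dim), doc_patches: (n_patches, dim),
    both L2-normalized. Each query token takes its best-matching document
    patch; the document score is the sum of those maxima.
    """
    sim = query_tokens @ doc_patches.T  # (n_query_tokens, n_patches)
    return float(sim.max(axis=1).sum())
```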
Published 09/27/24
Today, we're talking to Aamir Shakir, the founder and baker at mixedbread.ai, where he's building some of the best embedding and re-ranking models out there. We go into the world of rerankers, looking at how they can classify, deduplicate documents, prioritize LLM outputs, and delve into models like ColBERT.
We discuss:
- The role of rerankers in retrieval pipelines
- Advantages of late interaction models like ColBERT for interpretability
- Training rerankers vs. embedding models and their impact on...
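For contrast with late interaction, here is a minimal reranking sketch using a cross-encoder, assuming the sentence-transformers library and a public MS MARCO checkpoint:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how do rerankers work"
candidates = ["Rerankers score query-document pairs jointly.",
              "Bread is baked at 230 degrees Celsius."]

# The model sees query and document together, so it can judge relevance
# more precisely than a bi-encoder, at a higher per-pair cost.
scores = reranker.predict([(query, doc) for doc in candidates])
ranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
```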
Published 09/26/24
Text embeddings have limitations when it comes to handling long documents and out-of-domain data.
Today, we are talking to Nils Reimers. He is one of the researchers who kickstarted the field of dense embeddings, developed sentence transformers, started HuggingFace’s Neural Search team and now leads the development of search foundational models at Cohere. Tbh, he has too many accolades to count off here.
We talk about the main limitations of embeddings:
- Failing out of domain
- Struggling with...
Published 09/19/24
Hey! Welcome back.
Today we look at how we can get our RAG system ready for scale.
We discuss common problems and their solutions when you introduce more users and more requests to your system.
For this we are joined by Nirant Kasliwal, the author of fastembed.
Nirant shares practical insights on metadata extraction, evaluation strategies, and emerging technologies like ColPali. This episode is a must-listen for anyone looking to level up their RAG implementations.
"Naive RAG has a lot of...
Published 09/12/24
In this episode of How AI is Built, Nicolay Gerold interviews Doug Turnbull, a search engineer at Reddit and author of “Relevant Search”. They discuss how methods and technologies, including large language models (LLMs) and semantic search, contribute to relevant search results.
Key Highlights:
- Defining relevance is challenging and depends heavily on user intent and context
- Combining multiple search techniques (keyword, semantic, etc.) in tiers can improve results
- LLMs are emerging as a...
Published 09/05/24
In this episode, we talk data-driven search optimizations with Charlie Hull.
Charlie is a search expert from Open Source Connections. He has built Flax, one of the leading open source search companies in the UK, has written “Searching the Enterprise”, and is one of the main voices on data-driven search.
We discuss strategies to improve search systems quantitatively and much more.
Key Points:
Relevance in search is subjective and context-dependent, making it challenging to measure...
Published 08/30/24
Welcome back to How AI Is Built.
We have got a very special episode to kick off season two.
Daniel Tunkelang is a search consultant currently working with Algolia. He is a leader in the field of information retrieval, recommender systems, and AI-powered search. He has worked for Canva, Algolia, Cisco, Gartner, and Handshake, to name a few.
His core focus is query understanding.
**Query understanding is about focusing less on the results and more on the query.** The query of the user is the...
Published 08/15/24
Your queries are on a spectrum.
| Head | Tail |
| --- | --- |
| High Volume | Low Volume |
| General | Specific |
| Few Queries | Many Queries |
When we talk about volume, we mean the number of searches that share the same query term.
Tail queries still add up to a large share of total search volume, but it is spread across many distinct queries.
What counts as head, torso, and tail queries will always be application specific, so you have to log the queries and create some analytics for it to...
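A minimal sketch of such query-log analytics; the 20%/50% cumulative-volume cutoffs are placeholders you would tune per application:

```python
from collections import Counter

def bucket_queries(query_log, head_share=0.2, torso_share=0.5):
    """Split logged queries into head/torso/tail by cumulative search volume."""
    counts = Counter(query_log)
    total = sum(counts.values())
    buckets, cumulative = {}, 0
    for query, count in counts.most_common():
        cumulative += count
        if cumulative <= head_share * total:
            buckets[query] = "head"
        elif cumulative <= (head_share + torso_share) * total:
            buckets[query] = "torso"
        else:
            buckets[query] = "tail"
    return buckets
```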
Published 08/15/24
Facets let your users tune their search results.
They are not filters.
Filters eliminate search results.
Facets give users the ability to narrow down the results themselves.
Facets let the user, in the frontend, restrict the search results to a smaller, more specific set. They are extremely valuable in combination with head queries (reference), which return a large number of results.
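A minimal sketch of computing facet counts over a result set; the field names and documents are invented for illustration:

```python
from collections import Counter

def facet_counts(results, fields=("brand", "color")):
    """Count facet values over the current result set, so the UI can show
    e.g. 'color: red (124)' and let the user narrow down from there."""
    return {field: Counter(doc.get(field) for doc in results if field in doc)
            for field in fields}

results = [{"brand": "acme", "color": "red"},
           {"brand": "acme", "color": "blue"},
           {"brand": "globex", "color": "red"}]
print(facet_counts(results))
```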
Read more: https://nicolaygerold.substack.com/p/facets-vs-filters-grokking-search
Further reading:
Daniel Tunkelang....
Published 08/13/24
Today we are launching season 2 of How AI Is Built.
Over the last few weeks, we spoke to a lot of regular listeners and past guests, collected feedback, and analyzed our episode data. We will be applying the learnings to season 2.
This season will be all about search.
We are trying to make it better, more actionable, and more in-depth. The goal is that at the end of this season, you have a full-fledged course on search in podcast form, with mini-courses on specific elements like RAG.
We...
Published 08/08/24
In this episode of "How AI is Built," host Nicolay Gerold interviews Jonathan Yarkoni, founder of Reach Latent. Jonathan shares his expertise in extracting value from unstructured data using AI, discussing challenging projects, the impact of ChatGPT, and the future of generative AI. From weather prediction to legal tech, Jonathan provides valuable insights into the practical applications of AI across various industries.
Key Takeaways
Generative AI projects often require less data cleaning due...
Published 07/16/24
This episode of "How AI Is Built" is all about data processing for AI. Abhishek Choudhary and Nicolay discuss Spark and alternatives to process data so it is AI-ready.
Spark is a distributed system that allows for fast data processing by utilizing memory. Its core abstraction is the RDD (Resilient Distributed Dataset), with DataFrames built on top to simplify data processing.
When should you use Spark to process your data for your AI Systems?
→ Use Spark when:
- Your data exceeds terabytes in volume
- You expect unpredictable data growth
- Your...
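A minimal PySpark sketch of making raw data AI-ready; the paths and column names are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ai-data-prep").getOrCreate()

# DataFrames sit on top of RDDs and are the usual entry point today.
df = spark.read.parquet("s3://bucket/raw_events/")
clean = (df
         .dropDuplicates(["event_id"])
         .filter(F.col("text").isNotNull())
         .withColumn("n_chars", F.length("text")))
clean.write.mode("overwrite").parquet("s3://bucket/ai_ready/")
```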
Published 07/12/24
In this episode, Nicolay talks with Rahul Parundekar, founder of AI Hero, about the current state and future of AI agents. Drawing from over a decade of experience working on agent technology at companies like Toyota, Rahul emphasizes the importance of focusing on realistic, bounded use cases rather than chasing full autonomy.
They dive into the key challenges, like effectively capturing expert workflows and decision processes, delivering seamless user experiences that integrate into existing...
Published 07/04/24
In this conversation, Nicolay and Richmond Alake discuss various topics related to building AI agents and using MongoDB in the AI space. They cover the use of agents and multi-agents, the challenges of controlling agent behavior, and the importance of prompt compression.
When you are building agents, build them iteratively. Start with simple LLM calls before moving to multi-agent systems.
Main Takeaways:
Prompt Compression: Using techniques like prompt compression can significantly reduce...
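As a naive stand-in for real prompt compression (which typically uses a small model to prune low-information tokens), this sketch just deduplicates lines and truncates:

```python
def compress_prompt(context: str, max_chars: int = 2000) -> str:
    """Naive prompt compression: drop duplicate lines, then truncate.

    Real approaches (e.g., LLMLingua-style token pruning) are far more
    selective; this is only a stand-in to show the idea.
    """
    seen, kept = set(), []
    for line in context.splitlines():
        key = line.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(line)
    return "\n".join(kept)[:max_chars]
```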
Published 06/27/24
In this episode, Kirk Marple, CEO and founder of Graphlit, shares his expertise on building efficient data integrations.
Kirk breaks down his approach using relatable concepts:
The "Two-Sided Funnel": This model streamlines data flow by converting various data sources into a standard format before distributing it.
Universal Data Streams: Kirk explains how he transforms diverse data into a single, manageable stream of information.
Parallel Processing: Learn about the "competing consumer...
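A minimal sketch of the competing-consumer pattern using Python's standard library; in production this would be a message broker rather than an in-process queue:

```python
# Several workers pull from one shared queue, so ingestion parallelizes
# without any coordination between the workers themselves.
import queue
import threading

work_queue = queue.Queue()

def worker(worker_id: int) -> None:
    while True:
        item = work_queue.get()
        if item is None:  # poison pill shuts the worker down
            work_queue.task_done()
            break
        print(f"worker {worker_id} processed {item}")
        work_queue.task_done()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for doc in ["a.pdf", "b.html", "c.json"]:
    work_queue.put(doc)
for _ in threads:
    work_queue.put(None)  # one pill per worker
work_queue.join()
```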
Published 06/25/24
In our latest episode, we sit down with Derek Tu, Founder and CEO of Carbon, a cutting-edge ETL tool designed specifically for large language models (LLMs).
Carbon is streamlining AI development by providing a platform for integrating unstructured data from various sources, enabling businesses to build innovative AI applications more efficiently while addressing data privacy and ethical concerns.
"I think people are trying to optimize around the chunking strategy... But for me, that seems a...
Published 06/19/24