Multimodal AI, Storing 1 Billion Vectors, Building Data Infrastructure | ep 1
Description
Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning. Success in machine learning and AI depends on how fast you can iterate. LanceDB is here to enable fast experiments on top of terabytes of unstructured data. It is the database for AI. Dive with us into how LanceDB was built, what went into the decision to use Rust as the main implementation language, the potential of AI on top of LanceDB, and more.

"LanceDB is the database for AI...to manage their data, to do a performant billion scale vector search."
"We're big believers in the composable data systems vision."
"You can insert data into LanceDB using pandas DataFrames...to sort of really large 'embed the internet' kind of workflows."
"We wanted to create a new generation of data infrastructure that makes their [AI engineers] lives a lot easier."
"LanceDB offers up to 1,000 times faster performance than Parquet."

Chang She: LinkedIn, X (Twitter)
LanceDB: X (Twitter), GitHub, Web, Discord, VectorDB Recipes
Nicolay Gerold: LinkedIn, X (Twitter)

Chapters:
00:00 Introduction to LanceDB
02:16 Building LanceDB in Rust
12:10 Optimizing Data Infrastructure
26:20 Surprising Use Cases for LanceDB
32:01 The Future of LanceDB

Keywords: LanceDB, AI, database, Rust, multimodal AI, data infrastructure, embeddings, images, performance, Parquet, machine learning, model database, function registries, agents.
More Episodes
Documentation quality is the silent killer of RAG systems. A single ambiguous sentence might corrupt an entire set of responses. But the hardest part isn't fixing errors - it's finding them. Today we are talking to Max Buckley on how to find and fix these errors. Max works at Google and has built...
Published 11/21/24
Ever wondered why vector search isn't always the best path for information retrieval? Join us as we dive deep into BM25 and its unmatched efficiency in our latest podcast episode with David Tippett from GitHub. Discover how BM25 transforms search efficiency, even at GitHub's immense scale. BM25,...
Published 11/15/24