Building the database for AI, Multi-modal AI, Multi-modal Storage | S2 E10

Description

Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning. Success in machine learning and AI depends on how fast you can iterate. LanceDB is here to enable fast experiments on top of terabytes of unstructured data. It is the database for AI. Dive with us into how LanceDB was built, what went into the decision to use Rust as the main implementation language, the potential of AI on top of LanceDB, and more.

"LanceDB is the database for AI...to manage their data, to do a performant billion scale vector search."

"We're big believers in the composable data systems vision."

"You can insert data into LanceDB using pandas data frames...to sort of really large 'embed the internet' kind of workflows."

"We wanted to create a new generation of data infrastructure that makes their [AI engineers] lives a lot easier."

"LanceDB offers up to 1,000 times faster performance than Parquet."

Chang She: LinkedIn, X (Twitter)
LanceDB: X (Twitter), GitHub, Web, Discord, VectorDB Recipes
Nicolay Gerold: LinkedIn, X (Twitter)

00:00 Introduction to Multimodal Embeddings
00:26 Challenges in Storage and Serving
02:51 LanceDB: The Solution for Multimodal Data
04:25 Interview with Chang She: Origins and Vision
10:37 Technical Deep Dive: LanceDB and Rust
18:11 Innovations in Data Storage Formats
19:00 Optimizing Performance in Lakehouse Ecosystems
21:22 Future Use Cases for LanceDB
26:04 Building Effective Recommendation Systems
32:10 Exciting Applications and Future Directions

More Episodes

From Ambiguous to AI-Ready: Improving Documentation Quality for RAG Systems | S2 E15

Documentation quality is the silent killer of RAG systems. A single ambiguous sentence might corrupt an entire set of responses. But the hardest part isn't fixing errors - it's finding them. Today we are talking to Max Buckley on how to find and fix these errors. Max works at Google and has built...

Published 11/21/24

BM25 is the workhorse of search; vectors are its visionary cousin | S2 E14

Ever wondered why vector search isn't always the best path for information retrieval? Join us as we dive deep into BM25 and its unmatched efficiency in our latest podcast episode with David Tippett from GitHub. Discover how BM25 transforms search efficiency, even at GitHub's immense scale. BM25,...

Published 11/15/24

How AI Is Built

Published 11/15/24