SQL Meets Vector Search with Linpeng Tang of MyScale
Listen now
Description
Welcome back to an episode where we're talking Vectors, Vector Databases, and AI with Linpeng Tang, CTO and co-founder of MyScale. MyScale is a super interesting technology. They're combining the best of OLAP databases with Vector Search. The project started back in 2019 where they forked ClickHouse and then adapted it to support Vector Storage, Indexing, and Search. The really unique and cool thing is you get the familiarity and usability of SQL with the power of being able to compare the similarity between unstructured data. We think this has really fascinating use cases for analytics well beyond what we're seeing with other vector database technology that's mostly restricted to building RAG models for LLMs. Also, because it's built on ClickHouse, MyScale is massively scalable, which is an area that many of the dedicated vector databases actually struggle with. We cover a lot about how vector databases work, why they decided to build off of ClickHouse, and how they plan to open source the database. Timestamps 02:29 Introduction 06:22 Value of a Vector Database 12:40 Forking ClickHouse 18:53 Transforming Clickhouse into a SQL vector database 32:08 Data modeling 32:56 What data can be Vectorized 38:37 Indexing 43:35 Achieving Scale 46:35 Bottlenecks 48:41 MyScale vs other dedicated Vector Databases 51:38 Going Open Source 56:04 Closing thoughts
More Episodes
Today we have Mark Huang on the show. Mark has previously held roles in Data Science and ML at companies like Box and Splunk and is now the co-founder and chief architect of Gradient, an enterprise AI platform to build and deploy autonomous assistants. In our chat, we get into some of the stuff...
Published 06/18/24
Published 06/18/24