Building Taxonomies: Data Models to Remove Ambiguity from AI and Search | S2 E8
Description
Today we have Jessica Talisman with us, who works as an Information Architect at Adobe. She is (in my opinion) the expert on taxonomies and ontologies, and that is what you will learn about in this episode of How AI Is Built.

Taxonomies, ontologies, knowledge graphs: everyone is talking about them, but no one knows how to build them.

But before we look into that, what are they good for in search? Imagine a large corpus of academic papers. When a user searches for "machine learning in healthcare", the system can:
• Recognize "machine learning" as a subcategory of "artificial intelligence"
• Identify "healthcare" as a broad field with subfields like "diagnostics" and "patient care"

We can use these relationships to expand the query or narrow it down:
• We can return results that include papers on "neural networks for medical imaging" or "predictive analytics in patient outcomes", even if these exact phrases weren't in the search query
• We can also filter out papers that are not tagged with AI and only mention it in a side note

So we are building the plumbing: the necessary infrastructure for tagging, categorization, query expansion and relaxation, and filtering.

So how can we build them?
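To make the search example above concrete, here is a minimal Python sketch of taxonomy-driven query expansion and tag filtering. The toy taxonomy, the document records, and the function names (`find_concepts`, `expand_query`, `filter_by_tags`) are assumptions made for illustration; the episode does not prescribe an implementation, and a real system would match concepts with something more robust than substring lookup.

```python
# Hypothetical toy taxonomy: each concept maps to its narrower concepts.
NARROWER = {
    "artificial intelligence": ["machine learning"],
    "machine learning": ["neural networks", "predictive analytics"],
    "healthcare": ["diagnostics", "patient care"],
    "diagnostics": ["medical imaging"],
}


def broader(concept):
    """Return the direct parents of a concept."""
    return [parent for parent, kids in NARROWER.items() if concept in kids]


def narrower_closure(concept):
    """Return all descendants of a concept, depth-first."""
    found = []
    stack = list(NARROWER.get(concept, []))
    while stack:
        current = stack.pop(0)
        if current not in found:
            found.append(current)
            stack = NARROWER.get(current, []) + stack
    return found


def find_concepts(query):
    """Naive concept spotting: taxonomy labels that occur in the query text."""
    q = query.lower()
    labels = set(NARROWER) | {c for kids in NARROWER.values() for c in kids}
    return sorted((label for label in labels if label in q), key=q.index)


def expand_query(query):
    """Map each recognized concept to its broader and narrower concepts."""
    return {
        concept: {
            "broader": broader(concept),
            "narrower": narrower_closure(concept),
        }
        for concept in find_concepts(query)
    }


def filter_by_tags(docs, concept):
    """Keep documents tagged with the concept or any of its descendants."""
    wanted = {concept, *narrower_closure(concept)}
    return [doc for doc in docs if wanted & set(doc["tags"])]


if __name__ == "__main__":
    print(expand_query("machine learning in healthcare"))
    papers = [
        {"id": 1, "tags": ["neural networks", "medical imaging"]},
        {"id": 2, "tags": ["hospital management"]},  # mentions AI only in passing
    ]
    print(filter_by_tags(papers, "artificial intelligence"))
```

Broader terms relax a query, narrower terms expand it to more specific material, and the tag filter drops documents that merely mention a concept without being tagged with it.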
1️⃣ Start with Industry Standards
• Leverage established taxonomies (e.g., Google, GS1, IAB)
• Audit them for relevance to your project
• Use as a foundation, not a final solution

2️⃣ Customize and Fill Gaps
• Adapt industry taxonomies to your specific domain
• Create a "coverage model" for your unique needs
• Mine internal docs to identify domain-specific concepts

3️⃣ Follow Ontology Best Practices
• Use clear, unique primary labels for each concept
• Include definitions to avoid ambiguity
• Provide context for each taxonomy node

Jessica Talisman: LinkedIn
Nicolay Gerold: LinkedIn, X (Twitter)

00:00 Introduction to Taxonomies and Knowledge Graphs
02:03 Building the Foundation: Metadata to Knowledge Graphs
04:35 Industry Taxonomies and Coverage Models
06:32 Clustering and Labeling Techniques
11:00 Evaluating and Maintaining Taxonomies
31:41 Exploring Taxonomy Granularity
32:18 Differentiating Taxonomies for Experts and Users
33:35 Mapping and Equivalency in Taxonomies
34:02 Best Practices and Examples of Taxonomies
40:50 Building Multilingual Taxonomies
44:33 Creative Applications of Taxonomies
48:54 Overrated and Underappreciated Technologies
53:00 The Importance of Human Involvement in AI
53:57 Connecting with the Speaker
55:05 Final Thoughts and Takeaways
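The best practices in step 3 above (unique primary labels, definitions, context per node) can be checked mechanically. The following Python sketch is one possible shape, not something taken from the episode: the `Concept` class, its field names (loosely modelled on SKOS-style preferred label, definition, and scope note), and the `validate` helper are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One taxonomy node. Field names are illustrative assumptions."""
    pref_label: str                  # the single primary label
    definition: str = ""             # removes ambiguity about meaning
    scope_note: str = ""             # context: when to apply this node
    alt_labels: list[str] = field(default_factory=list)  # synonyms


def validate(concepts):
    """Report duplicate primary labels and missing definitions."""
    problems = []
    counts = Counter(c.pref_label.strip().lower() for c in concepts)
    for label, count in counts.items():
        if count > 1:
            problems.append(f"duplicate primary label: {label}")
    for c in concepts:
        if not c.definition.strip():
            problems.append(f"missing definition: {c.pref_label}")
    return problems


if __name__ == "__main__":
    nodes = [
        Concept("Jaguar", definition="A large cat native to the Americas"),
        Concept("jaguar", definition="A car brand"),
        Concept("Diagnostics"),
    ]
    for problem in validate(nodes):
        print(problem)
```

Comparing labels case-insensitively is a design choice made here so that "Jaguar" and "jaguar" are flagged as a clash; a team could equally decide to disambiguate such labels with qualifiers instead.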
More Episodes
Documentation quality is the silent killer of RAG systems. A single ambiguous sentence might corrupt an entire set of responses. But the hardest part isn't fixing errors - it's finding them. Today we are talking to Max Buckley on how to find and fix these errors. Max works at Google and has built...
Published 11/21/24
Ever wondered why vector search isn't always the best path for information retrieval? Join us as we dive deep into BM25 and its unmatched efficiency in our latest podcast episode with David Tippett from GitHub. Discover how BM25 transforms search efficiency, even at GitHub's immense scale. BM25,...
Published 11/15/24