Serverless Data Orchestration, AI in the Data Stack, AI Pipelines | ep 12

Description

In this episode, Nicolay sits down with Hugo Lu, founder and CEO of Orchestra, a modern data orchestration platform. As data pipelines and analytics workflows become increasingly complex, spanning multiple teams, tools and cloud services, the need for unified orchestration and visibility has never been greater.

Orchestra is a serverless data orchestration tool that aims to provide a unified control plane for managing data pipelines, infrastructure, and analytics across an organization's modern data stack. The core architecture involves users building pipelines as code, which then run on Orchestra's serverless infrastructure. It can orchestrate tasks like data ingestion, transformation, and AI calls, as well as monitoring and analytics on data products, all with end-to-end visibility, data lineage and governance, even when organizations have a scattered, modular data architecture across teams and tools.

Key Quotes:

Find the right level of abstraction when building data orchestration tasks/workflows.
"I think the right level of abstraction is always good. I think like Prefect do this really well, right? Their big sell was, just put a decorator on a function and it becomes a task. That is a great idea. You know, just make tasks modular and have them do all the boilerplate stuff like error logging, monitoring of data, all of that stuff."

Modularize data pipeline components.
"It's just around understanding what that dev workflow should look like. I think it should be a bit more modular."
A modular architecture, in which components such as data ingestion, transformation, and model training are decoupled, allows better flexibility and scalability.

Adopt a streaming/event-driven architecture for low-latency AI use cases.
"If you've got an event-driven architecture, then, you know, that's not what you use an orchestration tool for...if you're having a conversation with a chatbot, like, you know, you're sending messages, you're sending events, you're getting a response back. That I would argue should be dealt with by microservices."

Hugo Lu: LinkedIn, Newsletter, Orchestra, Orchestra Docs
Nicolay Gerold: LinkedIn, X (Twitter)

00:00 Introduction to Orchestra and its Focus on Data Products
08:03 Unified Control Plane for Data Stack and End-to-End Control
14:42 Use Cases and Unique Applications of Orchestra
19:31 Retaining Existing Dev Workflows and Best Practices in Orchestra
22:23 Event-Driven Architectures and Monitoring in Orchestra
23:49 Putting Data Products First and Monitoring Health and Usage
25:40 The Future of Data Orchestration: Stream-Based and Cost-Effective

data orchestration, Orchestra, serverless architecture, versatility, use cases, maturity levels, challenges, AI workloads
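
The idea in the first quote above, putting a decorator on a function so that it becomes a task with the boilerplate handled for it, can be illustrated with a minimal sketch. This is plain Python and not Prefect's or Orchestra's actual API: the `task` decorator, the `ingest` and `transform` functions, and the sample rows are all hypothetical names made up for this example.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")


def task(func):
    """Hypothetical decorator: adds start/finish/error logging and timing."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        logger.info("task %s started", func.__name__)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Log the traceback once, then let the caller decide what to do.
            logger.exception("task %s failed", func.__name__)
            raise
        elapsed = time.perf_counter() - start
        logger.info("task %s finished in %.3fs", func.__name__, elapsed)
        return result
    return wrapper


@task
def ingest():
    # Stand-in for a data ingestion step (made-up rows).
    return [{"id": 1, "amount": 10}, {"id": 2, "amount": 32}]


@task
def transform(rows):
    # Stand-in for a transformation step: total the amounts.
    return sum(row["amount"] for row in rows)


total = transform(ingest())
```

Each step stays a small, independent function, which is the modularity the quotes argue for; the logging and timing live in one place instead of being repeated in every step.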

More Episodes

From Ambiguous to AI-Ready: Improving Documentation Quality for RAG Systems | S2 E15

Documentation quality is the silent killer of RAG systems. A single ambiguous sentence might corrupt an entire set of responses. But the hardest part isn't fixing errors - it's finding them. Today we are talking to Max Buckley on how to find and fix these errors. Max works at Google and has built...

Published 11/21/24

BM25 is the workhorse of search; vectors are its visionary cousin | S2 E14

Ever wondered why vector search isn't always the best path for information retrieval? Join us as we dive deep into BM25 and its unmatched efficiency in our latest podcast episode with David Tippett from GitHub. Discover how BM25 transforms search efficiency, even at GitHub's immense scale. BM25,...

Published 11/15/24

How AI Is Built

Published 11/15/24