Next-Gen Data Modeling, Integrity, and Governance with YODA -

Streaming Audio: A Confluent podcast about...

Next-Gen Data Modeling, Integrity, and Governance with YODA

Listen now

Description

In this episode, Kris interviews Doron Porat, Director of Infrastructure at Yotpo, and Liran Yogev, Director of Engineering at ZipRecruiter (formerly at Yotpo), about their experiences and strategies in dealing with data modeling at scale. Yotpo has a vast and active data lake, comprising thousands of datasets that are processed by different engines, primarily Apache Spark™. They wanted to provide users with self-service tools for generating and utilizing data with maximum flexibility, but encountered difficulties, including poor standardization, low data reusability, limited data lineage, and unreliable datasets. The team realized that Yotpo's modeling layer, which defines the structure and relationships of the data, needed to be separated from the execution layer, which defines and processes operations on the data. This separation would give programmers better visibility into data pipelines across all execution engines, storage methods, and formats, as well as more governance control for exploration and automation. To address these issues, they developed YODA, an internal tool that combines excellent developer experience, DBT, Databricks, Airflow, Looker and more, with a strong CI/CD and orchestration layer. Yotpo is a B2B, SaaS e-commerce marketing platform that provides businesses with the necessary tools for accurate customer analytics, remarketing, support messaging, and more. ZipRecruiter is a job site that utilizes AI matching to help businesses find the right candidates for their open roles. EPISODE LINKS Current 2022 Talk: Next Gen Data Modeling in the Open Data PlatformData Mesh 101Data Mesh Architecture: A Modern Distributed Data ModelWatch the video version of this podcastKris Jenkins’ TwitterStreaming Audio Playlist Join the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

More Episodes

See all »

Apache Kafka 3.5 - Kafka Core, Connect, Streams, & Client Updates

Apache Kafka® 3.5 is here with the capability of previewing migrations between ZooKeeper clusters to KRaft mode. Follow along as Danica Fine highlights key release updates. Kafka Core: KIP-833 provides an updated timeline for KRaft.KIP-866 now is preview and allows migration from an existing...

Published 06/15/23

Streaming Audio: Apache Kafka® & Real-Time Data

Published 06/15/23

A Special Announcement from Streaming Audio

After recording 64 episodes and featuring 58 amazing guests, the Streaming Audio podcast series has amassed over 130,000 plays on YouTube in the last year. We're extremely proud of these achievements and feel that it's time to take a well-deserved break. Streaming Audio will be taking a vacation!...

Published 04/13/23