Building the Backend: Data Solutions that Power Leading Organizations

Episodes

The Analytics Engine for All Your Data with Justin Borgman @ Starburst

In this episode we speak with Justin Borgman, Chairman & CEO at Starburst, which is based on open source Trino (formerly PrestoSQL) and was recently valued at $3.35 billion after securing its Series D funding. We discuss the convergence of DWs and DLs, why data lakes fail and much more. Top 3 takeaways: The data mesh architecture is gaining adoption more quickly in Europe due to GDPR. There were two main limitations of data lakes when compared to DWs, performance and...

Published 03/15/22

Transform Your Object Storage Into a Git-like Repository With Paul Singman @ LakeFS

In this episode we speak with Paul Singman, Developer Advocate at Treeverse / LakeFS. LakeFS is an open source project that allows you to transform your object storage into a Git-like repository. Top 3 takeaways: LakeFS enables use cases like debugging, where you can quickly view historical versions of your data at a specific point in time, and running ML experiments over the same set of data with branching. The current data landscape is very fragmented with many tools available. Over the coming years...

Published 03/01/22

Enable Faster Data Processing and Access with Apache Arrow with Matt Topol @ FactSet

In this episode we speak with Matt Topol, Vice President, Principal Software Architect @ FactSet and dive deep into how they are taking advantage of Apache Arrow for faster processing and data access. Below are the top 3 value bombs: Apache Arrow is an open-source in-memory columnar format that creates a standard way to share and process data structures. Apache Arrow Flight eliminates serialization and deserialization which enables faster access to query results compared to traditional JDBC...

Published 02/01/22

Implementing Amundsen @ Convoy with Chad Sanderson

In this episode we speak with Chad Sanderson, head of data @ Convoy and early stage startup advisor focused on data innovation, and uncover their journey to implementing Amundsen, an open source data catalog. Below are the top 3 value bombs: Data scientists should not be spending the majority of their time trying to find the data they are interested in. Amundsen is a powerful open source data catalog that integrates across your data landscape to provide visibility into your data assets and...

Published 01/25/22

The Importance of Treating Your Data Initiatives as Products with Murali Bhogavalli

Your data team should not just be keeping the lights on, but should be building and creating data products to support the business. In this episode we speak with Murali Bhogavalli, a data product manager, and explore what a data product manager is and how the role differs from a traditional product manager. Below are the top 3 value bombs: Data should be looked at as a product and treated as such within the organization (e.g. agile methodologies, continuous improvement…). Organizations need to be...

Published 01/18/22

Open-Source Data Catalog Amundsen with Mark Grover @ Stemma

In this episode of Building The Backend we hear from Mark Grover, founder @ Stemma and co-creator of Amundsen. Stemma is a fully managed data catalog, powered by the leading open-source data catalog, Amundsen. Below are the top 3 value bombs: Automated data catalogs are critical to help wrangle the growing data across organizations (e.g. being able to identify that, out of 150 columns on a table, only 10 are being used downstream). Tribal knowledge and context cannot be automated - data catalogs cannot...

Published 01/11/22

Architecting a Modern Data Lake with Dipti Borkar from Ahana

In this episode of Building The Backend we hear from Dipti Borkar, cofounder @ Ahana, a managed service for Presto on AWS. We talk all about the data lake, how it should be structured and where the industry is going. Below are the top 3 value bombs: Presto is an open source distributed SQL query engine originally created by Facebook, mainly used to run SQL queries on data lakes, but it can be connected to relational data stores as well. Ahana is a managed Presto service on AWS with 3x...

Published 11/09/21

Open Source BI with Apache Superset

What tools are you using for data viz? Are they low cost? One option is Apache Superset. In this episode we speak with Robert Stolz to learn more about Superset and other open source data tools. Top 3 Value Bombs: One popular use case with Apache Superset is embedding it within applications; because it’s open source, there is a wide range of flexibility to integrate it with existing systems. Apache Superset supports any data source supported by SQLAlchemy, the Python SQL toolkit. DBT...

Published 11/02/21

Edge Computing and Continuous Intelligence with Swim

In this episode of Building The Backend we hear from Simon Crosby, CTO @ Swim, an open source edge computing operating system. We talk all about edge computing, event streaming and much more. Below are the top 3 value bombs: Edge means more than just being physically located somewhere; it could also mean in the cloud. It really is the closest point to where your source data is being generated. Continuous intelligence is a design pattern where streaming data is directly tied into business...

Published 10/26/21

12 Modern Data Architecture Principles That Should Be Implemented in 2022

This episode is a little different than the usual format. Instead of interviewing a data leader, I share what I consider to be the 12 most important principles when designing a modern data architecture. Please message me on LinkedIn with your thoughts on this show.

Published 10/19/21

The Keys to Good Data Quality With Prukalpa Sankar from Atlan

In this episode of Building The Backend we hear from Prukalpa Sankar, Co-founder of Atlan. We talk all about data quality and governance, common issues organizations face when implementing data quality and much more. Below are the top 3 value bombs: Data governance has a bad reputation. It should not be a bureaucratic, controlling process that is pushed from the top down. Active metadata is key to modern data architectures; essentially, it’s putting together all the human and machine...

Published 10/12/21

Designing a Modern Data Architecture – Teradata

This is a podcast episode you do not want to miss with Stephen Brobst, CTO @ Teradata. We discuss all things data warehouses, the shift to the distributed cloud and key principles for implementing successful DWs. Top 3 Value Bombs: Large organizations are shifting more to a distributed / inter-cloud architecture for many reasons, including data sovereignty, increasing residency and reducing costs. Just because your DW does not support indexing does not mean you do not need...

Published 10/04/21

Exploring Open-Source Data Integration With Airbyte

“The hardest part of ETL is not building the connectors, it is maintaining them.” Truer words were never spoken. Really enjoyed this episode with Michel Tricot, CEO & Co-Founder of Airbyte, where we discuss all things data integration and connectors. Top 3 value bombs: The future of ETL/ELT integration connectors may lie with open source. Many closed source data integration tools only create connectors if the ROI is there, but this leaves many tools out and speed to market can be slow....

Published 09/28/21

How To Effectively Reduce Data Quality Incidents 10x with Datafold

This episode features Gleb Mezhanskiy, Co-Founder & CEO @ Datafold. During our discussion we talk all about data observability and how to improve your data quality. Before Datafold, Gleb was a founding member of data teams at Lyft and Autodesk, where he built sophisticated data platforms and developed tooling to improve productivity and data quality. Top 3 Value Bombs: The foundation of any data observability platform is the data catalog. Data observability becomes increasingly difficult...

Published 09/21/21

Applying Transformations to Streaming Data with Materialize

This episode features Arjun Narayan, Co-Founder & CEO @ Materialize. During our discussion we talk all about transforming streaming data, the do’s and don’ts, and how Materialize is changing the landscape of streaming. Top 3 Value Bombs: When making schema changes, organizations should always strive to make forward-compatible changes only. This means consumers can keep consuming your data model without being impacted; they may just be missing your newly added...

Published 09/14/21

Optimizing Spark in the Cloud - with Jean-Yves Stephan

This episode features Jean-Yves Stephan, Co-Founder & CEO @ Data Mechanics (recently acquired by Spot by NetApp). During our discussion we talk about optimizing Spark to run in the cloud at a low cost. Top 3 Value Bombs: Running Spark CAN be expensive, but there are ways to reduce your current operating costs by 50-75% through smart automation (e.g. tuning for node type, memory and CPU). Spot instances can lower your costs by utilizing unused instances. Creating serverless architectures and using...

Published 09/07/21

How To Achieve Better Observability and Control Over Your Data Pipelines with Josh Benamram

This episode features Josh Benamram, who is the co-founder of Databand. Databand is a company that helps engineering teams achieve better observability and control over their tech stack. Top 3 Value Bombs: When observing our data, we should be looking at both our data and our pipelines. Don’t wait until an incorrect metric shows up at the board meeting to make DQ a priority. Having clear SLAs on just what data quality means across the organization is essential.

Published 08/31/21

Unify Your Data Operations with Nexla

Travis welcomes to his podcast Saket Saurabh, who provides a window into the world of data management and the self-service options that are democratizing it. Co-founder and CEO of Nexla, Saket has a passion for data and infrastructure and how to improve its flow among partners, customers and vendors. Nexla automates various data engineering tasks, intelligently creates an abstraction of data and enables collaboration among people at different skill levels. Named a 2021 Cool Vendor by Gartner,...

Published 08/24/21

A Powerful Open Source Database That Supports Many Storage Needs (MariaDB)

In this episode, we speak with Rob Hedgpeth, a director of developer relations at MariaDB. We explore all things MariaDB, the capabilities it has and when you should consider it for your next project. Top 3 value bombs: MariaDB follows a shared nothing architecture and supports distributed SQL for unlimited scaling on demand. MariaDB can handle many types of storage (e.g. document store, graph and spatial). When deciding on your next relational database do not just look at...

Published 08/17/21

Increase the Quality and Reliability of Your Data

In this episode, we speak with Lior Gavish, the co-founder of Monte Carlo, to explore all things data quality. Monte Carlo is a data lineage and observability tool that lowers your data downtime. Top 3 Value Bombs: Data products should be thought of in their entirety, from the source to the consumer. No one data stakeholder can solve data quality issues; it’s a collaboration of the data engineers, business, data consumer and even software to help automate certain aspects of cataloging and...

Published 07/27/21

Build Real-Time Data Pipelines in Minutes Not Months with Meroxa

In this episode, we speak with DeVaris Brown, CEO and co-founder of Meroxa, a data platform that enables organizations to build real-time data pipelines in minutes not months. Prior to founding Meroxa, DeVaris was a product leader at Twitter, Heroku, and Zendesk. In this episode we will be talking about all things data ingestion. Top 3 Value Bombs: Data ingestion should be in real time to provide the most flexibility across your use cases. Real time ingestion is not as...

Published 07/20/21

Launch, Monitor, and Share Data Pipelines In a Matter of Minutes

In this episode, we speak with Blake Burch, co-founder of Shipyard, a data orchestrator tool that allows you to create powerful workflows in a matter of minutes. Top 3 Value Bombs: Data tests are often for the assumptions we already know. There are a lot of unknowns that can crop up and cause issues that tests are not catching. Start analyzing job metadata to alert on potential anomalies. Store your raw data to allow the most flexibility when it comes to re-transforming the data. Don’t settle...

Published 07/13/21

The Data Warehouse for Distributed Clouds - Yellowbrick

In this episode, we speak with Mark Cusack, CTO at Yellowbrick. Yellowbrick is a data warehouse platform that was built from the ground up for performance and cost and can be deployed across clouds and on-prem. Top 3 Value Bombs: Yellowbrick DW was recently named a contender in Cloud Data Warehouses by Forrester Research and they are able to achieve 100X performance at 1/5th the price against many competitors. As data production is exponentially increasing at the “edge”, the need to...

Published 06/29/21

What You Should Know Before Getting Started With Data Science with DATA SCIENCE I N F I N I T Y

In this episode, we speak with Andrew Jones, who has spent 13 years in Data Science at companies including Amazon and, more recently, Sony PlayStation, where he developed and prototyped Machine Learning based features for the PlayStation 5, several of which have been patented by Sony. Since then he has created the DATA SCIENCE I N F I N I T Y community to support folks on their data science journey. Top 3 Value Bombs: 85% of AI projects fail; one of the reasons is due to going too complex...

Published 06/22/21