All episodes of Data Archives - Software Engineering Daily

Episodes

Streaming Analytics with Hojjat Jafarpour

Streaming analytics refers to the process of analyzing real-time data that is generated continuously and rapidly from various sources, such as sensors, applications, social media, and other internet-connected devices. Streaming analytics platforms enable organizations to extract business value from data in motion, similar to how traditional analytics tools derive insights from data at rest. DeltaStream

Published 04/06/23

Turso: Globally Replicated SQLite with Glauber Costa

Distributed databases are necessary for storing and managing data across multiple nodes in a network. They provide scalability, fault tolerance, improved performance, and cost savings. By distributing data across nodes, they allow for efficient processing of large amounts of data and redundancy against failures. They can also be used to store data across multiple locations

Published 04/03/23

Observability Trends with John Hart

DataSet is a log analytics platform provided by SentinelOne that helps DevOps, IT engineering, and security teams get answers from their data across all time periods, both live streaming and historical. It’s powered by a unique architecture that uses a massively parallel query engine to provide actionable insights from the data available. John Hart

Published 03/20/23

Data Investing and the MAD with Matt Turck

There are many types of early-stage funding available, from friends and family to seed to Series A. Some firms invest across a wide set of technologies and seek only to provide capital. Others are in it for the long haul: they focus on specific areas of technology and develop both long-term relationships

Published 03/10/23

Accessing Data at Scale with Justin Borgman

The Presto/Trino project makes distributed querying easier across a variety of data sources. As the need for machine learning and other high-volume data applications has increased, the need for support, tooling, and cloud infrastructure for Presto/Trino has increased with it. Starburst helps your teams run fast queries on any data source. With Starburst you

Published 11/11/22

Building on the Data Cloud with Torsten Grabs

Building and managing data-intensive applications has traditionally been costly and complex, and has placed an operational burden on developers to maintain as their organization scales. Today’s developers, data scientists, and data engineers need a streamlined, single cloud data platform for building applications, pipelines, and machine learning models, without having to move or copy their

Published 11/07/22

Serverless ClickHouse for Developers with Jorge Sancha

Data analytics technology and tools have seen significant improvements in the past decade. But it can still take weeks to prototype, build, and deploy new transformations and deployments, usually requiring considerable engineering resources. Plus, most data isn’t real-time. Instead, most of it is still batch-processed. Tinybird Analytics provides an easy way to ingest and query

Published 09/12/22

Lakehouse Data Stack with Raj Bains

Originally published on April 12, 2022. As companies move to Spark and a Lakehouse architecture, they are realizing that the data tools are lagging way behind. You need to be a programmer to effectively use Spark and Airflow. There are some low-code ETL tools, but is that enough? Companies want to treat their data pipelines

Published 08/25/22

Data Infrastructure for Finance

Data is becoming a bank’s biggest asset. These complex enterprises have a huge opportunity ahead – to transform themselves to become a trusted hub of a much broader data ecosystem that goes beyond the financial industry and helps to form a new class of cross-industry experience architectures that are scalable and transparent. The data physics

Published 08/18/22

Faking Data Using Tonic.ai with Ian Coe and Adam Kamor

Ian Coe (CEO) and Adam Kamor (Head of Engineering). Companies that gather data about their users have an ethical obligation and legal responsibility to protect the personally identifiable information in their dataset. Ideally, developers working on a software application wouldn’t need access to production data. Yet without high-quality example data, many technology groups stumble on avoidable

Published 08/05/22

Couchbase with Ravi Mayuram

Published 07/28/22

Decodable Streaming with Eric Sammer

Published 06/01/22

Data Delivery with Naqeeb Memon

Data-as-a-service is a company category that is not as common as API-as-a-service, software-as-a-service, or platform-as-a-service. In order to vend data, a data-as-a-service provider needs to define how that data will be priced, stored, and delivered to users: streaming over an API or served via static files. Naqeeb Memon of SafeGraph joins the show

Published 05/14/22

Data Labeling with Michael Malyuk

Data labeling allows machine learning algorithms to find patterns among the data. There are a variety of data labeling platforms that enable humans to apply labels to this data and ready it for algorithms. Heartex is a data labeling platform with an open source core. Michael Malyuk joins the show to talk through the platform

Published 05/11/22

Pinot and StarTree with Chinmay Soman

Real-time analytics are difficult to achieve because large amounts of data must be integrated into a data set as that data streams in. As the world moved from batch analytics powered by Hadoop into a norm of “real-time” analytics, a variety of open source systems emerged. One of these was Apache Pinot. StarTree is a

Published 05/09/22

Data Loss Prevention with Yasir Ali

Data loss can occur when large data sources such as Slack or Google Drive get leaked. In order to detect and avoid leaks, a data asset graph can be built to understand the risks of a company environment. Polymer is a data loss prevention product that helps companies avoid problematic data leaks. Yasir Ali is

Published 04/29/22

Airbyte Engineering with Michel Tricot

Data integration infrastructure is not easy to build. Moving large amounts of data from one place to another has historically required developers to build ad hoc integration points to move data between SaaS services, data lakes, and data warehouses. Today, there are dedicated systems and services for moving these large batches of data. Airbyte builds

Published 04/27/22

Select Star with Shinji Kim

Modern organizations eventually face data governance challenges. Keeping track of where data came from, what systems update it, and in what ways updates can be made are just some of the issues to be tackled. Large organizations face additional challenges like training, onboarding, and capturing the institutional knowledge that leaves with the departure of key team

Published 04/25/22

Time Series IoT on InfluxDB with Brian Gilmore

The solution many turn to for capturing their streaming data is InfluxDB. In this episode, I interview Brian Gilmore, Director of Product Management at InfluxData, about how real-time applications built on top of InfluxDB achieve success. When most people hear the phrase Internet of Things, it typically evokes an image of connected devices we

Published 04/14/22

Lakehouse Data Stack with Raj Bains

As companies move to Spark and a Lakehouse architecture, they are realizing that the data tools are lagging way behind. You need to be a programmer to effectively use Spark and Airflow. There are some low-code ETL tools, but is that enough? Companies want to treat their data pipelines like mission-critical apps. They want DevOps

Published 04/12/22

Data Engineering Trends with Lior Gavish and James Densmore

Data infrastructure is a fast-moving sector of the software market. As the volume of data has increased, so too has the quality of tooling to support data management and data engineering. In today’s show, we have a guest from a data-intensive company as well as a company that builds a

Published 04/05/22

PlanetScale Management with Sam Lambert

Running a database company requires expertise in both technical and managerial skills. There are deeply technical engineering questions around query paths, scalability, and distributed systems. And there are complex managerial questions around developer productivity and task allocation. Sam Lambert is the CEO of PlanetScale, which is building modern relational database infrastructure. Before PlanetScale he spent

Published 03/31/22

SingleStore with Jordan Tigani

SingleStore is a multi-use, multi-model database designed for transactional and analytic workloads, as well as search and other domain-specific applications. SingleStore is the evolution of the database company MemSQL, which sought to bring fast, in-memory SQL database technology to market. Jordan Tigani is Chief Product Officer of SingleStore and joins the show to talk

Published 03/29/22

DuckDB with Hannes Mühleisen

DuckDB is a relational database management system with no external dependencies, with a simple system for deployment and integration into build processes. It enables complex queries in SQL with a large function library, and provides transactional guarantees through multi-version concurrency control. Hannes Mühleisen works on DuckDB and joins the show to talk about query engines

Published 03/19/22

RudderStack Engineering with Soumyadeb Mitra

Customer data pipelines power the backend of many successful web platforms. In a customer data pipeline, data is collected from sources such as mobile apps and cloud SaaS tools, transformed and munged using data engineering, stored in data warehouses, and piped to analytics, advertising platforms, and data infrastructure. RudderStack is an open source customer data

Published 03/16/22