Dear Analyst #92: Generating insights from vehicle telemetry data and crafting a data strategy with Victor Rodrigues
Description
Data can come from different places, and one area I don't hear about too often is vehicles. Victor Rodrigues is from Brazil and transitioned into a career in data six years ago. Before that, he worked in various IT roles, including network and infrastructure administration. He eventually relocated to Dublin to work as a cloud specialist for Microsoft, helping organizations with digital transformation. We discuss a specific data project involving vehicle telemetry data, what happens when data and the real world collide, and selling data solutions into the enterprise.
Pulling performance data about bus fleets and trucks
Who knew that cars generated so much data? With all the sensors and chips installed in cars these days, data is constantly being generated in real time. According to Statista, a modern connected car generates 25GB of data every hour.
One of Victor's first roles in the data field was as a data engineer setting up data pipelines for collecting data from vehicles. The startup he worked at helped its customers collect telemetry data about their buses, trucks, and other vehicles, and would deliver insights from this data back to the customers. The North Star goal was to reduce the cost per kilometer by any means possible.
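To make that North Star concrete, here's a minimal sketch in Python of how a cost-per-kilometer KPI might be computed. The cost components and figures are made up for illustration and aren't from the episode:

```python
# Hypothetical cost-per-kilometer KPI: total operating cost divided by distance driven.
# Field names and numbers are illustrative only.

def cost_per_km(fuel_cost: float, maintenance_cost: float, driver_cost: float,
                distance_km: float) -> float:
    """Return total operating cost per kilometer driven."""
    return (fuel_cost + maintenance_cost + driver_cost) / distance_km

# A fleet operator might track this per vehicle, per route, or per month.
print(cost_per_km(fuel_cost=1200.0, maintenance_cost=300.0,
                  driver_cost=2500.0, distance_km=8500.0))  # ~0.47 per km
```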
Vehicles have various apps and IoT devices collecting and producing data. The data collected would answer questions like: How long is a vehicle stopped in traffic? How often is the engine running? Victor's job involved building ETL/ELT pipelines to collect this raw data and transform it into a data model that could be used for analytics and reporting.
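As a rough sketch of that kind of transformation (not Victor's actual pipeline), here's how raw telemetry events might be aggregated into a per-vehicle summary using pandas. The event schema, field names, and sample values are assumptions for the example:

```python
import pandas as pd

# Assumed raw telemetry schema: one row per event emitted by a vehicle's IoT device.
raw_events = pd.DataFrame([
    {"vehicle_id": "bus-101", "timestamp": "2024-05-01T08:00:00", "speed_kmh": 0,  "engine_on": True},
    {"vehicle_id": "bus-101", "timestamp": "2024-05-01T08:05:00", "speed_kmh": 32, "engine_on": True},
    {"vehicle_id": "bus-101", "timestamp": "2024-05-01T08:10:00", "speed_kmh": 0,  "engine_on": True},
])
raw_events["timestamp"] = pd.to_datetime(raw_events["timestamp"])

# Transform: flag idle time (engine on but not moving), then roll up per vehicle.
raw_events["is_idle"] = (raw_events["speed_kmh"] == 0) & raw_events["engine_on"]

vehicle_summary = (
    raw_events.groupby("vehicle_id")
    .agg(events=("timestamp", "count"),
         idle_events=("is_idle", "sum"),
         avg_speed_kmh=("speed_kmh", "mean"))
    .reset_index()
)
print(vehicle_summary)
```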
Don't sleep on data strategy and architecture
Before Victor could get into the fun parts of analyzing the data, he had to build out a data strategy and architecture. This is the part where you have to decide which tool is best for a specific part of the data pipeline.
Do you go with Google Cloud or Microsoft Azure? What is the best architecture? What's the most cost-effective solution? I remember when I was studying for the AWS Solutions Architect exam, I came across AWS' Well-Architected Framework. These are playbooks for picking the right tools (within the AWS ecosystem) for various use cases and scenarios.
In setting the data strategy and architecture, the main variable that affected Victor's decision was cost. His team first started with Google Cloud, piping data into BigQuery, Google Cloud's main data warehouse solution. All of the big data warehouse tools let you trial the platform before throwing all your data in. He found that BigQuery was the most cost-effective solution, but Google Cloud's data ingestion wasn't as strong as that of other cloud providers.
The ultimate architecture looked something like this:
* Ingest billions of rows of data in Microsoft Azure
* Pipe the data into Google Cloud BigQuery for data modeling and analytics
* Use Tableau and Power BI for data visualization
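For the BigQuery step of that architecture, here's a minimal sketch using the google-cloud-bigquery client to load staged telemetry files into a warehouse table. The project, dataset, table, and staging URI are placeholders; the episode doesn't describe the exact setup between Azure and Google Cloud:

```python
from google.cloud import bigquery

# Placeholder identifiers: project, dataset, table, and staging bucket are illustrative only.
client = bigquery.Client(project="fleet-analytics-demo")
table_id = "fleet-analytics-demo.telemetry.vehicle_events"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # let BigQuery infer the schema from the JSON records
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Load newline-delimited JSON that the ingestion layer staged in Cloud Storage.
load_job = client.load_table_from_uri(
    "gs://fleet-telemetry-staging/events/2024-05-01/*.json",
    table_id,
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish

print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")
```

Once the modeled tables are in BigQuery, Tableau and Power BI connect to them directly for the visualization layer.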
Finding insights from the data and delivering impact
Victor had all this data streaming into his multi-cloud architecture, so what happens next? He helped figure out what KPIs to tr...