Dear Analyst #86: One Important Excel Feature to Know to Do Your Best Data Analysis
Description
Nothing like a click-baity headline to get your spreadsheet emotion all riled up amirite? Earlier in my data analysis career, I thought knowing advanced Excel formulas and writing macros made you a good analyst. If you've been following this podcast/newsletter, you've probably discovered that there is no one magic Excel feature that automatically makes you a good data analyst. The key to good data analysis is a soft skill: asking good questions (see this episode with the co-founder of Mode, Benn Stancil). Having said that, there is one Excel feature that I learned early in my career that really helped me improve my data analysis: PivotTables. Why PivotTables and what features about PivotTables make them so good for data analysis? Read on for more. You can also download the Excel file used in this episode here.
Michael Jordan: an expert user of the pivot
Video tutorial of this episode:
https://www.youtube.com/watch?v=WXdu4ijNly8
1. Getting data to look right for PivotTables
This is a requirement for PivotTables that also teaches you important lessons about how to structure your data. It's tempting to see a long list of data and just say "throw a PivotTable on it!" and not think through if the PivotTable will actually "work." What I mean by this is that the underlying source data has to be laid out correctly in order for you to do any type of exploratory data analysis on your data set. Take this common layout of data you might see in a PowerPoint presentation or some deck (video game sales):
You have years (or some time period) along the columns and then some measure in the rows. Perfectly fine table of data for seeing sales, in this case, for different video game companies. Now there are a ton of other companies and devices to report on, so maybe for the purposes of data analysis, you want to put this into a PivotTable. Once you do that, this is what the PivotTable settings look like:
Not super useful for quickly seeing sales trends by year, seeing which devices had the most sales, etc. You have to "drag in" each year into the Columns field to see the actual sales.
Transforming and massaging the summary table
To make this data look right, it requires a bit of manual massaging of the data. There are ways you could automate this with a macro, but it's important to understand why the data should look this way before you "throw a PivotTable" on the data:
A few things to note about this dataset:
* Years are no longer across the top in the columns. It's just one column with the value being the year itself.* The sales metric is also its own column and not spread out across columns.
Given that more data you're analyzing is coming from databases, data will most likely be structured like this to begin with. But when you're manually aggregating data from reports or other sources in Excel and need to get it into a format ready for PivotTables, this is the transformation you need to do in order for it to look right for data analysis.
2.
When you think of your data warehouse, the "semantic layer" may not be the first thing that pops in your mind. Prior to reading Frances O'Rafferty's blog post on this topic, I didn't even know this was a concept that mattered in the data stack. To be honest, the concept is still a bit confusing...
Published 09/10/24
If you could only learn one programming language for the rest of your career, what would be it be? You could Google the most popular programming languages and just pick the one of the top 3 and off you go (FYI they are Python, C++, and C). Or, you could pick measly #10 and build a thriving career...
Published 08/05/24