Dear Analyst #88: How to learn data science and machine learning from scratch with Santiago Viquez
Description
Companies are generating more big data these days, so dumping the data into a CSV for analysis just doesn't cut it anymore. Sure, you could use Power Query or Power BI, but more analysts are turning to Python and platforms built for big data processing. The next step is to use machine learning to help predict what the future might look like. Santiago Viquez is currently a data analytics mentor at Springboard, an education platform helping students prepare for new careers. On the side, Santiago has built a ton of cool projects related to data science, natural language processing, and more. In this conversation we dig into how Santiago learned data science from scratch during the pandemic, and how he thinks analysts should learn data science.
Santiago Viquez
Started at the bottom now we're at a multinational corporation
Santiago studied physics in Costa Rica, but realized he didn't want to pursue a career in physics. After doing some research, he decided a career in data analytics and data science would be more suitable. Since he already knew a little bit of Python, he started applying to a few positions and eventually got his data analytics career started as an intern at a small startup. His internship turned into a full-time role as a data analyst, which he kept for two years.
Santiago left the startup and went in the complete opposite direction in terms of company size. He had roles in data analysis and data science at large corporations like Walmart and UPS, working remotely the entire time. During his time at Walmart, he started working part-time at Springboard helping students land careers in data analytics.
The experience of working at a startup versus a large company is night and day. We've seen stories of people like Preksha in episode 85 and Lauren in episode 64 making the transition into a brand-new career in data. But we don't often hear about the data analytics professional who moves from a startup to a large company.
One example Santiago brought up is how corporations frame problems. You typically have clear success metrics, KPIs, stakeholders, and data sources to work with. At a startup, you are defining the problem by yourself. It's just you. You're in charge of collecting the data sources, providing analyses to key stakeholders, and owning the entire model or analysis end-to-end.
Reducing food waste for restaurants in Costa Rica with data science
When Santiago was a consultant, he was helping a big restaurant group in Costa Rica figure out ways to reduce food waste. The restaurant group consisted of 30-40 restaurants (which is big for Costa Rica). Each restaurant had its own manager, and each manager would request food from various suppliers. The problem was that some managers were good at forecasting how much food they would need for the next 10-15 days, while others were not.
Santiago's goal was to create a tool that would help each manager predict how much food to order from the suppliers. The first phase of the project was gathering data. In this case, Santiago had to get the recipes from each restaurant manager. These recipes were then joined with each restaurant's sales data to see the volume of ingredients required.
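To make the recipe-and-sales join a bit more concrete, here is a minimal sketch in Python with pandas. It is not Santiago's actual code: the table layouts, column names (dish, ingredient, grams_per_serving, units_sold), and all of the numbers are made up for illustration. The idea is simply that multiplying units sold by the quantity of each ingredient per serving gives the ingredient volume per restaurant.

```python
import pandas as pd

# Hypothetical recipe table: one row per (dish, ingredient),
# with the quantity of that ingredient in a single serving.
recipes = pd.DataFrame({
    "dish": ["casado", "casado", "gallo pinto", "gallo pinto"],
    "ingredient": ["rice", "beans", "rice", "beans"],
    "grams_per_serving": [150, 100, 120, 120],
})

# Hypothetical sales table: units sold per restaurant and dish.
sales = pd.DataFrame({
    "restaurant": ["A", "A", "B"],
    "dish": ["casado", "gallo pinto", "casado"],
    "units_sold": [10, 5, 20],
})


def ingredient_volume(recipes: pd.DataFrame, sales: pd.DataFrame) -> pd.DataFrame:
    """Join sales to recipes and total the grams of each ingredient per restaurant."""
    # Each sales row fans out into one row per ingredient in that dish.
    merged = sales.merge(recipes, on="dish", how="inner")
    merged["grams_used"] = merged["units_sold"] * merged["grams_per_serving"]
    return merged.groupby(["restaurant", "ingredient"], as_index=False)["grams_used"].sum()


usage = ingredient_volume(recipes, sales)
print(usage)
```

Historical totals like these could then serve as the input to whatever forecasting approach is used to suggest order quantities; the sketch stops at the join and does not attempt the forecast itself.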
The interesting thing is that each recipe had to be...