Dear Analyst #90: Biostatistics, public health, and the #1 strategy to land a job in data with Tyler Vu
Description
You go to a family gathering and everyone is fawning over your cousin who has a cushy stats job at Harvard. Knowing your cousin, you think to yourself: if my cousin can do it, so can I. Next thing you know, you are a research fellow at Harvard University. Tyler Vu was studying applied math at Cal State Fullerton and didn't realize he had a passion for Biostatistics until his fellowship at Harvard. He is currently getting his PhD in Biostatistics at UCSD and is the youngest person to ever pursue a PhD in Biostats at UCSD. In this episode we talk about doing network analysis for the public health sector, facial/voice recognition, and the #1 strategy Tyler thinks everyone should use to land their next job or internship in data.
Predicting HIV rates when you are missing data
I'm a neophyte in the data science and machine learning space, so Tyler definitely veered into concepts that were quite foreign to me as he discussed his current PhD thesis. His thesis involves analyzing social networks in the context of public health, where a lot of the data is missing. We talk about why finding the HIV rate in a sample is different from other metrics you could get from a sample.
For instance, if you want to get the average height of people in the U.S., you pick a random sample of people, find the average height, and extrapolate this to the rest of the population (roughly). This is a straightforward analysis since each person's height is independent of everyone else's.
In the case of public health, people are connected via social networks. With HIV, whether someone tests positive or negative depends on the people they are connected with and whether those people have tested positive or negative. In this type of analysis there's a lot of bias and "non-parametric estimation of network properties," according to Tyler. I'm not even going to pretend I know what these terms mean. There's actually very little published work on this subject, so Tyler's thesis should add a lot to the current research.
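To make the height example concrete, here is a minimal Python sketch of estimating an average from a simple random sample. The population is synthetic (the 170 cm mean and 10 cm spread are made-up numbers for illustration), and the function name `estimate_mean_height` is hypothetical.

```python
import random
import statistics

def estimate_mean_height(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample."""
    rng = random.Random(seed)
    # Every person has the same chance of being picked.
    sample = rng.sample(population, sample_size)
    mean = statistics.mean(sample)
    # Standard error of the mean; this formula treats the draws as independent.
    std_err = statistics.stdev(sample) / sample_size ** 0.5
    return mean, std_err

# Hypothetical population: 100,000 synthetic heights in cm (not real data).
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(100_000)]

estimate, std_err = estimate_mean_height(population, sample_size=1_000)
true_mean = statistics.mean(population)
print(f"sample estimate: {estimate:.1f} cm (+/- {std_err:.1f})")
print(f"population mean: {true_mean:.1f} cm")
```

The standard error calculation is the part that leans on independence: it only holds when one person's height tells you nothing about another's.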
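To get a feel for why connections cause trouble, here is a toy Python sketch. It is not Tyler's method, just an illustration under made-up assumptions: a hypothetical network of 1,000 people split into two groups that never mix, with positive cases concentrated in the smaller group. Recruiting people by following social ties from one starting person (a "snowball" sample) gives a very different rate than the true one.

```python
from collections import deque

def ring_community(members, reach=2):
    """Connect each member to the `reach` nearest members on either side."""
    n = len(members)
    return {
        person: [members[(i + step) % n]
                 for step in range(-reach, reach + 1) if step != 0]
        for i, person in enumerate(members)
    }

def snowball_sample(network, seed_person, size):
    """Collect `size` people by following social ties outward (breadth-first)."""
    seen, queue = {seed_person}, deque([seed_person])
    sample = []
    while queue and len(sample) < size:
        person = queue.popleft()
        sample.append(person)
        for contact in network[person]:
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical population: a small high-prevalence group and a large
# low-prevalence group with no ties between them (all numbers made up).
group_a = list(range(100))
group_b = list(range(100, 1000))
network = {**ring_community(group_a), **ring_community(group_b)}
positive = ({p for p in group_a if p % 10 < 3}
            | {p for p in group_b if p % 50 == 0})

true_rate = len(positive) / len(network)
sample = snowball_sample(network, seed_person=0, size=50)
snowball_rate = sum(p in positive for p in sample) / len(sample)

print(f"true rate: {true_rate:.1%}, snowball sample rate: {snowball_rate:.1%}")
```

Because the sample never leaves the starting person's group, it overstates the overall rate. Real networks are messier than this, and you usually can't see all the ties, which is where the missing data problem comes in.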
Training a voice and face machine learning model
Tyler has a history of working on one-of-a-kind projects. During his undergrad years, he worked on a project that combined face and voice recognition. It's kind of like having a double-authentication system if you wanted to unlock an iPhone, for instance. Since you're combining both image and voice features to train a model, it creates a "highly dimensional problem."
Tyler helped code the project entirely in MATLAB. Given the tools and frameworks available, Tyler was pleasantly surprised by the speed with which they were able to go from hypothesis to working app on this project.
Predicting "fragile" countries
During Tyler's research at Harvard, he worked on a project to help predict which countries will become "fragile." This is the definition of a "fragile state" according to the United States Institute of Peace:
Each fragile state is fragile in its own way, but they all face significant governance and economic challenges. In fragile states, governments lack legitimacy in the eyes of citizens, and institutions struggle or fail to provide basic public goods—security, justice, and rudimentary services—and to manage political conflicts peacefully.
The project's aim was to predict which countries might become fragile in the future so that the governments could better plan for these...