Dear Analyst #119: Developing the holy “grail” model at Lyft, user journeys, and hidden analytics with Sean Taylor
Description
Future Dear Analyst episodes will get more sporadic since, well, life gets in the way. Unfortunately, curiosity (in most cases) doesn't pay the bills. Nevertheless, when I come across an idea or person that I think is worth sharing or learning more about, I'll try my best to post. In this episode, I interview the Chief Scientist of a data startup who did his PhD at Stern NYU and was on track to become a professor. Then he got an internship at Facebook and everything changed. The speed of learning at a tech company outpaced what he was used to in academia. Over the years, Sean Taylor has worked with and spoken to hundreds of data analysts and statisticians. We'll dive into his data science work at Lyft, his notion of "hidden analytics," and why he's obsessed with user journeys in modern applications.

Modeling the Lyft marketplace and creating the GRAIL model

Sean worked at Facebook for 5 years as a research scientist and worked on general data problems. Eventually he joined the revenue operations science team at Lyft. His team's goal was to help grow the marketplace of riders and drivers on the platform. One of the most important aspects of the marketplace is the forecast. As Lyft runs promotions and enters new cities, how do you ensure there are enough drivers for the riders and vice versa?

The team ultimately decided that a simple cohort methodology would be best to help set the forecast for both drivers and riders. Every rider, for instance, would belong to a cohort based on when they first signed up for Lyft, when they booked their first ride, etc. There's a "liquidation curve" for each cohort that eventually hugs the x-axis. There is much more detail about the cohort methodology in this blog post by the Lyft Engineering team from 2019. Despite being so simple, the model worked surprisingly well.
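To make the cohort idea concrete, here is a minimal sketch in Python. It is not Lyft's model: the exponential decay curve, the `half_life_weeks` and `rides_per_active` parameters, and the function names are all assumptions made up for illustration. It only shows the general shape of the approach: each cohort's activity decays with age, and the forecast is the sum across observed cohorts and cohorts that have not signed up yet.

```python
def retention(age_weeks, half_life_weeks=8.0):
    """Hypothetical decay curve: fraction of a cohort still active at a given age.

    An exponential decay is assumed purely for illustration; it approaches
    zero as the cohort ages, like a curve that hugs the x-axis.
    """
    return 0.5 ** (age_weeks / half_life_weeks)


def forecast_rides(cohorts, future_signups, horizon_weeks, rides_per_active=2.0):
    """Project weekly rides by summing the decayed activity of every cohort.

    cohorts: dict mapping cohort start week (0 or negative, relative to now)
             to the cohort's initial size.
    future_signups: assumed size of each not-yet-seen weekly cohort.
    Returns a list with one forecast value per future week (week 1 first).
    """
    forecast = []
    for week in range(1, horizon_weeks + 1):
        total = 0.0
        # Cohorts we have already observed.
        for start_week, size in cohorts.items():
            total += size * retention(week - start_week) * rides_per_active
        # Cohorts that are yet to be seen but will have started by this week.
        for start_week in range(1, week + 1):
            total += future_signups * retention(week - start_week) * rides_per_active
        forecast.append(total)
    return forecast


# Example: two observed cohorts plus 50 assumed new riders every week.
weekly_rides = forecast_rides({-4: 1000, 0: 800}, future_signups=50, horizon_weeks=12)
```

The same structure would apply to driver hours on the supply side by swapping in a different curve and a different per-active-unit rate.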
Goals of this model, taken from the blog post mentioned in the previous paragraph:

* Forecast the behavior of each observed cohort and use it to project how many rides are taken or driver hours are provided within a specific cohort.
* Forecast the behavior of the cohorts that are yet to be seen.
* Aggregate all the projected rides and driver hours to make forecasts for both the demand and supply side of our business.

Sean talked about how there were flaws in the model. One of those flaws is that a marketplace is very fluid and evolves over time. Another is that exposing a rider to high prices may lead to churn, which the model did not account for. Sean's team tried building a better model called GRAIL, but Sean left Lyft before completing the model.

Speaking of Lyft's data team, I had mentioned Amundsen, an open source data discovery platform Lyft released in 2019 (blog post). It's great to see the data team at Lyft giving back to the ecosystem to help data analysts and data scientists do their job better!

Discovering a bug that cost the company $15M per year

One of the best feelings as a data analyst is using data to uncover the root cause or underlying trends in a given business situation.
More Episodes
When you think of your data warehouse, the "semantic layer" may not be the first thing that pops into your mind. Prior to reading Frances O'Rafferty's blog post on this topic, I didn't even know this was a concept that mattered in the data stack. To be honest, the concept is still a bit confusing...
Published 09/10/24
If you could only learn one programming language for the rest of your career, what would it be? You could Google the most popular programming languages and just pick one of the top 3 and off you go (FYI they are Python, C++, and C). Or, you could pick measly #10 and build a thriving career...
Published 08/05/24