Episodes
We talk with Michel Tricot, the Founder and CEO of Airbyte, a Y Combinator-backed open source data integration startup. It has raised over $30M in capital and has been growing quite fast. It was a great conversation and I think you will enjoy it too. 🎉 We cover lots of things in the podcast including:  1. Technical aspects of what Airbyte does, how it sits in the ETL/ELT landscape, how it differs from other tools such as Fivetran, Stitch etc.  2. Data Warehouses being a...
Published 10/11/21
Imagine you are hanging out at a beach, watching the waves come and go and looking at all the shells on the sand. And you get an idea. How about you collect these shells and make necklaces to sell? Well, how would you go about doing this? Maybe you’d collect a few shells, make a small necklace and show it to a friend. This is where we begin our journey of learning about data engineering pipelines.  Using an example of running a necklace business from shells - we learn about...
Published 08/18/21
In this episode, I'm excited to be talking with Jeff Bermant, the founder and CEO of Cocoon Mydata Rewards browser. It is a browser based on Chrome and it pays people to use it! ✨  In this episode we talk about data ethics and privacy, and how Jeff believes that users should be paid for their data. We talk about GDPR and similar laws in the US, the future of data privacy and more!  Go to https://getcocoon.com to download and use Cocoon Rewards Browser.  ~Thanks for listening~ --- Send...
Published 08/04/21
In this episode, we are talking about women in tech with Rupal Gupta. Rupal, a recent graduate of the Online MS in CS program at Georgia Tech, is a data engineer in the industry and is passionate about promoting women in tech. She also has some great tips and resources for anyone trying to break into data science and tech!  In this episode we talk about things that can help promote women in tech, women in tech conferences such as Grace Hopper, looking for jobs, resources to prepare for the...
Published 10/08/20
In this episode, we talk about what makes AWS SageMaker great for ML model development, including model building, training and deployment. We cover 3 advantages in each of these 3 areas.  We cover points such as: 1. Host ML endpoints for deploying models to thousands or millions of users. 2. Save costs on model training using SageMaker. 3. Use CloudWatch logs with SageMaker endpoints to debug ML models. 4. Use preconfigured environments or models provided by AWS. 5. Automatically save...
Published 06/17/20
In this episode, we are talking with Paul Azunre. Paul is one of the world’s experts in the area of Transfer Learning for NLP and is also an author of the upcoming book Transfer Learning for NLP published by Manning Publications. In this episode we talk about things such as:  1) Paul’s background and how his background in maths and optimization as well as fake news detection got him started in transfer learning in NLP. 2) How Paul got started with the book, book writing process as well as...
Published 04/13/20
In this episode, we talk about why the two libraries Scikit-Learn and Keras are great for machine learning. These two libraries combined with Pandas form the 3 core libraries in Python for a data scientist today.  We cover things like: 1)  Data Exploration and data cleaning - how Pandas and Jupyter notebooks provide a good way to get started here. 2) Data Transformation - how Scikit-Learn provides many useful functions like train_test_split, Scalers, PCA etc. 3) Data Fitting - how...
Published 01/26/20
In this episode, we talk with Akshay Kanade. He is a business analyst working in New York City who likes taking a big view of data and has very interesting spiritual views on data analytics and life in general. He is also a handwriting expert: he can read people’s handwriting and recognize a lot about their personalities. In this interview we will cover several things such as:  - How has being an analyst influenced Akshay's life?  - Introspection about data and analytics - Taking high...
Published 12/01/19
In this podcast episode, we do an interview! We talk with Patrick McClory, who is the founder and CEO of IntrospectData. He is an expert working in areas of data science consulting, large machine learning projects, math, statistics and more. In this episode we cover several interesting topics such as: 1) What makes a good data scientist? 2) The different roles in the industry such as data engineer, machine learning engineer, data analyst etc. 3) The first mile problem: Data ownership and...
Published 11/22/19
What should you consider before pursuing an MS in the US? There might be several questions in your mind as you explore this. In this episode we cover some of the main things to consider before you make the decision. I also go into detail about things which I wish I had known before coming to the US for my MS.  The things to consider that I cover in the podcast are:  1)  Location matters more than rankings. 2) Talk to professors before applying. 3) Culture of hard work, and advantage of...
Published 11/15/19
The Data Life Podcast is a podcast where we talk all about real life experiences with data and data science tools, techniques, models and personalities.  In this episode, we will talk about how Pandas is becoming a tool of choice for many data scientists for doing their data analysis work. We will explore how Pandas wins over Excel in several key areas that are important for businesses today: 1) Large dataset sizes 2) Different kinds of input formats such as JSON, CSV, HTML, SQL...
Published 10/25/19
So many tweets, news articles and other unstructured text surround us. How do we make sense of all of these? Natural language processing or NLP can help. NLP refers to algorithms that process, understand and generate aspects of natural language either in text or in spoken voice. In this episode we will cover some of the common techniques in NLP to help get started in this exciting field!  We cover several tasks in an NLP pipeline: 1. Tokenization and punctuation removal 2. Stemming and...
Published 10/05/19
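To make the first pipeline task in the NLP episode concrete, here is a minimal sketch of tokenization with punctuation removal. It is only an illustration using the Python standard library, not necessarily the approach discussed in the episode; the function name tokenize and the sample sentence are made up for this example, and only ASCII punctuation is handled.

```python
import string

def tokenize(text):
    """Lowercase the text, delete ASCII punctuation, and split on whitespace."""
    # Hypothetical helper for illustration: map every ASCII punctuation
    # character to None so that translate() removes it.
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()

tokens = tokenize("NLP can help, even with noisy tweets!")
print(tokens)
```

Real pipelines usually rely on a dedicated tokenizer, since naive punctuation stripping also mangles things like URLs, hashtags and contractions.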
As a data scientist, you will work on machine learning models that are deployed on websites, usually wrapped in a REST API. These days this approach is also called a “micro-service”. For this reason, it is important to know how backends and front ends work and how to build them. In this episode, we talk about building a note app which is a Single Page Application or SPA using Python's Flask library for the backend and Vue.js for the frontend. We use a REST API to communicate between them.  We...
Published 09/16/19
Ever wonder how to automatically detect language from a script? How does Google do it?  Ever wonder how Amazon knows whether you are searching for a product or a SKU in its search bar?  We look into character-based text classifiers in this episode. We cover 2 types of models. First are bag-of-words models such as Naive Bayes, logistic regression and a vanilla neural network. Second, we cover sequence models such as LSTMs and how to prepare your characters for the LSTMs including things like...
Published 08/07/19
You and your team might spend a lot of time building a new feature. But how do you know if this feature will be liked by the users? One way to get statistical evidence for this is A/B testing. Listen to this episode to get tips, tricks and intuition behind hypothesis testing, alpha, beta, p-values, two-sample t-tests and more.  These insights come from experience deploying A/B tests in the field and from talking to experts.  These ideas are typically not covered in...
Published 07/17/19
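As a small illustration of the two-sample t-test mentioned in the A/B testing episode, the sketch below computes a t statistic for two groups. It assumes Welch's variant (which does not require equal variances); the episode does not say which variant it uses, and the function name welch_t and the sample numbers are invented for this example.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for the difference in means (b minus a)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance() is the sample variance (divides by n - 1).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_b - mean_a) / std_err

# Made-up metric values for a control group (A) and a variant group (B).
control = [10, 12, 11, 13, 9]
variant = [14, 15, 13, 16, 12]
print(welch_t(control, variant))
```

The statistic on its own is not a decision: it still has to be compared against a t distribution with the appropriate degrees of freedom to obtain a p-value, which is then judged against the chosen alpha.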
In this episode, we will talk about the importance of business impact in data science.  "Your users don't care how smart you are" was a quote I read that got me started in thinking about this.  The right way to do data science is to think of users, revenue impact and business value, and go for the simplest solution possible.  The wrong way to do data science is to start with a hammer and go looking for a nail to hit, rather than the other way around.  We will cover all this and more!  Amazon link of...
Published 06/25/19
This episode covers the ten essential machine learning questions. Disclaimer: Baseline answers have been provided in the episode for guidance. For complete accuracy, please refer to textbooks or to courses by Andrew Ng on Coursera.  If this content is useful, please consider buying me a coffee via the link https://anchor.fm/the-data-life-podcast/support  Resources: 1. Machine Learning Course by Andrew Ng: https://www.coursera.org/learn/machine-learning 2. Deep Learning Course by Andrew Ng:...
Published 06/21/19
Twitter is a rich source of live information. Is it possible to run sentiment analysis on what the world is thinking as an event unfolds over time? Could we track Twitter data and see if it correlates to news that affects stock market movements? These are some of the questions that we will answer in this podcast episode.  There are 6 steps for mining Twitter data for sentiment analysis of events that we will cover: 1) Get Twitter API Credentials 2) Setup API Credentials in Python 3) Get...
Published 06/01/19
In this episode, we will talk about things like Maslow's Hierarchy of Needs, and focussing on higher level needs such as satisfaction and achieving full potential. In tech, data science and software development, you may feel shy about admitting your interest while everyone else pursues the next shiny cool thing. But if your interest is in a niche, don't let others stop you from putting in the effort to become great at it.  Thanks for listening, and please show your support to keep...
Published 05/19/19
Udacity has become a popular platform for learning about various things in data science, machine learning and programming in general. In this episode, we will discuss the good, bad and ugly of the Udacity nanodegrees. I will also cover my experiences with Deep Learning and NLP Nanodegrees.  We will cover things like how Udacity has great production quality and has nice intro courses, but due to their lack of depth and low community engagement, the high costs might not be justified (most of...
Published 05/03/19
In this episode we will talk all about the various steps to transition to data science from non computer science backgrounds. One of the main difficulties people from non-CS backgrounds face is how overwhelming the transition to the data science field can be. I talk about my own journey, and share the 6 steps which can help you in your own data science career!  00:00 to 02:10: Introduction 02:11 to 06:00: My Background of moving to data science from electrical engineering 06:01 to 10:56:...
Published 04/20/19
Welcome! In this episode, we will cover some of the top data science podcasts, that have helped me a lot in my own journey, and hopefully will be helpful to you as well.  The top 5 podcasts are (linked to my favorite episodes): 1) AI in Industry with Daniel Faggella 2) This week in Machine Learning and AI (TWiML) 3) DataFramed 4) Data Skeptic 5) Talk Python to Me Listen to the episode for the sixth bonus podcast! If you think I should mention another podcast here, let me know and I will...
Published 04/10/19
Have you ever thought about building a video course? Have you wanted to share your expertise with other people via a video course on platforms like Udemy? Have you wondered what the economics and revenue details of building a course look like? This podcast episode is for you!  In this episode, I talk about my experience in building my first data science video course, the lessons learnt and how you can use them in your own video course.   00:00 to 09:30- I talk about my experience with...
Published 03/30/19
In this episode, we cover the two main types of recommendation engines used at companies like Netflix and Spotify. 1) Content based recommendation systems use the genres or tags of each product to find other similar products to recommend to users. 2) Collaborative filtering based recommendation systems use user activity and user ratings on the website to recommend products.  We go through the pros and cons of each, the challenges, how companies like Netflix and Spotify scale their...
Published 03/22/19
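The content based approach from the recommendation engines episode can be sketched in a few lines. This is a toy illustration rather than how Netflix or Spotify actually do it: it assumes Jaccard similarity over tag sets as the similarity measure, and the catalog, titles and function names are all hypothetical.

```python
def jaccard(tags_a, tags_b):
    """Similarity of two tag sets: shared tags divided by all distinct tags."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0

def recommend(liked, catalog, top_n=2):
    """Rank every other title by tag similarity to the liked title."""
    scores = {
        title: jaccard(catalog[liked], tags)
        for title, tags in catalog.items()
        if title != liked
    }
    # Sort by descending score, breaking ties alphabetically by title.
    return sorted(scores, key=lambda title: (-scores[title], title))[:top_n]

# Hypothetical catalog mapping each title to its genre tags.
catalog = {
    "Space Saga": {"sci-fi", "adventure"},
    "Star Voyage": {"sci-fi", "comedy", "adventure"},
    "Love Letters": {"romance", "drama"},
    "Robot Dawn": {"sci-fi", "drama"},
}
print(recommend("Space Saga", catalog))
```

Collaborative filtering differs in that it ignores the tags entirely and instead compares users' activity and ratings, so it needs interaction data that this sketch does not model.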