All episodes of Sample Space

Episodes

Published 12/06/24

You want to be in control of your own Copilot - with Ty Dunn, co-founder at Continue.dev

There are many LLMs that you can use for programming these days. Some of them even live inside your IDE, like Cursor or GitHub Copilot. But what if you want to tweak these LLMs to do what you want? Instead of being stuck with the tools that a vendor gives you, the goal of Continue.dev is to let you customise this yourself. In this podcast we talk to Ty Dunn, co-founder of the project, to learn more about it. If you are curious to learn more about this effort, please check out...

Published 11/06/24

Sample Space

Published 11/06/24

What it is like to maintain the scikit-learn docs

Scikit-learn's documentation pages are celebrated, but not everyone is aware that the project actually has somebody on payroll to take care of them. In this episode we talk to Arturo about stories from the scikit-learn documentation. In particular, the docs have a recommender that few folks are aware of. People assume that it is manually curated, but it is actually built from a few basic scikit-learn tools under the hood. Link to the official scikit-learn MOOC:...

Published 10/31/24

SQLite can totally do embeddings now

Vector databases are kind of everywhere these days, and there is a big pool of VCs pouring money into the ecosystem too. But while all of that is happening, SQLite has also gained support for vector search. In this episode we talk to Alex Garcia, the maintainer of this project, and discuss how the project came about and what the future has in store. sqlite-vec GitHub repo: https://github.com/asg017/sqlite-vec Alex Garcia...

Published 10/23/24

How to rethink the notebook - with Akshay Agrawal, co-creator of Marimo

Jupyter has been a great environment to explore computational ideas, but that doesn't mean it has to be the only environment for interactive coding in Python. It also comes with some downsides, which led Akshay Agrawal to create an alternative called Marimo. We discussed it in a previous livestream and figured that it was time to sit down with the creator to learn what led to the development of this exciting new tool. You can learn more about Marimo by going to their website over at...

Published 10/16/24

You are always dealing with many tables - with Madelon Hulsebos

When you are working on a data pipeline for ML ... you are never dealing with a single table. It always involves different tables, needed for different reasons, that all have to be combined before you have something you can learn from. But if that is the case, why do we spend so much time talking about ML pipelines that only work on a single table? Madelon Hulsebos has a PhD on the topic, so we figured that we might ask her. As mentioned in the podcast, here is the link to Madelon's...

Published 09/10/24

How Narwhals has many end users ... that never use it directly with Marco Gorelli

When you pip install a package, you will surely end up using it later. But often you will also install a bunch of dependencies, and it is very likely that you won't interact with all of them directly. That does not mean such a package isn't useful; it merely means that it is used directly by the maintainers of other libraries instead. This is interesting, because one such tool recently came into existence. It is called Narwhals and it seems to be on track to become critical infrastructure for...

Published 08/21/24

Pragmatic data science checklists with Peter Bull - co-founder at DrivenData

A lot of things can go (and have gone) wrong when folks tried to apply data science in practice. So how might we prevent that? Maybe what we need to do is look at the medical profession and its practice of running checklists before surgery.

Published 07/17/24

Model safety, that's a pickle! with Adrin Jalali - scikit-learn maintainer

Historically, the standard way to store a trained scikit-learn model on disk for deployment has been a pickle file. Pickles make sense because they are so flexible, but they carry a security concern: loading a pickle can execute arbitrary code. Adrin has been working on a remedy called skops, which is the main topic of this podcast. To learn more about skops, make sure to check the documentation: https://skops.readthedocs.io/en/stable/

Published 06/27/24

Moving Towards KDearestNeighbors with Leland McInnes - creator of UMAP

Leland McInnes is known for a lot of packages. There's UMAP, but also PyNNDescent and HDBSCAN. Recently he's also been working on tools to help visualise clusters of data, and he's cooking up something new related to nearest neighbor algorithms. This interview touches on all of these topics. If you're interested in learning more about the MoMA exhibition, it was by Refik Anadol: https://refikanadol.com/ and this was the work at MoMA: https://refikanadol.com/works/unsupervised/. The...

Published 05/30/24

Talk like a DataFrame, run like SQL with Phillip Cloud - core-committer on Ibis

Ibis is a Python library that offers a single dataframe API which can run your queries on many different backends. These include databases like Postgres, but also commercial vendors like BigQuery and Snowflake. This ability to control multiple backends from a single API has a lot of use cases, but it also brings maintainer challenges, all of which are discussed in this episode. To learn more about Ibis, check out the docs here: https://ibis-project.org/ If you're attending PyCon US...

Published 05/02/24

Enhancing Jupyter with Widgets with Trevor Manz - creator of anywidget

In this (first!) episode of Sample Space we talk to Trevor Manz, the creator of anywidget. It's a (neat!) tool that helps you build more interactive notebooks by letting you apply just enough JavaScript to get bidirectional communication working in your favorite notebook environment. That means that Python can talk to widgets, but also that widgets can talk to Python. There's a lot to like about these widgets, and we're doing a proper deep dive in this first episode. To learn more about...

Published 04/11/24

Introducing Sample Space

We're starting a new podcast!

Published 04/03/24