Episodes
There are many LLMs that you can use for programming these days. Some of them are even built into your editor, like Cursor or GitHub Copilot. But what if you want to tweak these LLMs to do what you want? Instead of being stuck with the tools that a vendor gives you, the goal of Continue.dev is to let you customise this yourself. In this podcast we talk to Ty Dunn, co-founder of the project, to learn more about this. If you are curious to learn more about this effort, please check out...
Published 11/06/24
Scikit-learn's documentation pages are celebrated. But not everyone is aware that the project actually has somebody on payroll to take care of them. In this episode we talk to Arturo about stories from the scikit-learn documentation. In particular, the docs have a recommender that few folks are aware of. People often assume that it is manually curated, but it is actually powered by a few basic scikit-learn tools under the hood. Link to the official scikit-learn MOOC:...
Published 10/31/24
Vector databases are kind of everywhere these days, and a big pool of VCs is pouring money into the ecosystem too. But while all of that is happening, SQLite has also gained support for vector search. In this episode we talk to Alex Garcia, the maintainer of this project, and discuss how the project was created and what the future has in store. Sqlite-vec GitHub repo: https://github.com/asg017/sqlite-vec Alex Garcia...
Published 10/23/24
Jupyter has been a great environment to explore computational ideas, but that doesn't mean that it should be the only environment for interactive coding in Python. It also comes with some downsides, which led Akshay Agrawal to create an alternative called Marimo. We discussed it in a previous livestream and figured that it was time to sit down with the creator to learn what led to the development of this exciting new tool. You can learn more about Marimo by going to their website over at...
Published 10/16/24
When you are working on a data pipeline for ML, you are almost never dealing with a single table. It usually takes several tables, gathered for different reasons, that all have to be mashed together before you have something you can learn from. But if that is the case, why do we spend so much time talking about ML pipelines that only work on a single table? Madelon Hulsebos did her PhD on the topic, so we figured that we might ask her. As mentioned in the podcast, here is the link to Madelon's...
Published 09/10/24
When you pip install a package, you will surely end up using it later. But often you will also install a bunch of dependencies, and it is very likely that you won't directly interact with all of them. That does not mean that such a package is not useful; it merely means that the package might be used directly by a maintainer instead. This is interesting, because one such tool recently came into existence. It is called Narwhals, and it seems to be on track to become critical infrastructure for...
Published 08/21/24
A lot of things can (and have) gone wrong when folks tried to run data science projects. So how might we prevent that? Maybe what we need to do is look at the medical profession and its practice of running checklists before surgery.
Published 07/17/24
Historically, you would use a pickle file to store a trained scikit-learn model on disk for deployment. Pickles make sense because they are so flexible, but they do carry a security concern. Adrin has been working on a remedy called skops, which is the main topic of this episode. To learn more about skops, make sure to check the documentation: https://skops.readthedocs.io/en/stable/
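To make the pickle security concern concrete, here is a minimal stdlib-only sketch. It is not skops code, and `SneakyPayload` is a hypothetical class made up for illustration. Pickle lets an object define `__reduce__`, which tells the unpickler which callable to run when the object is rebuilt. Loading untrusted bytes can therefore execute code before you ever touch the "model".

```python
import pickle


# Hypothetical payload class, for illustration only.
class SneakyPayload:
    def __reduce__(self):
        # On load, pickle will call eval("6 * 7") instead of rebuilding
        # a SneakyPayload. A harmless stand-in: a real attacker could
        # name os.system or any other importable callable here.
        return (eval, ("6 * 7",))


data = pickle.dumps(SneakyPayload())

# Merely loading the bytes runs the callable chosen by whoever wrote them.
result = pickle.loads(data)
print(result)  # 42, the return value of eval, not a SneakyPayload
```

The Python docs warn that pickle should only be used to load data you trust. That is the gap a format like skops aims to close, and its documentation (linked above) describes how it handles loading.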
Published 06/27/24
Leland McInnes is known for a lot of packages. There's UMAP, but also PyNNDescent and HDBSCAN. Recently he's been working on tools to help visualise clusters of data, and he's also cooking up something new that's related to nearest neighbor algorithms. This interview touches on all of these topics. If you're interested in learning more about the MoMA exhibition, it was by Refik Anadol: https://refikanadol.com/ and this was the work at MoMA: https://refikanadol.com/works/unsupervised/. The...
Published 05/30/24
Ibis is a Python library that offers a single dataframe API that can run your queries on many different backends. These include databases like Postgres, but also commercial vendors like BigQuery and Snowflake. This ability to control multiple backends from a single API has a lot of use cases, as well as maintainer challenges, all of which are discussed in this episode. To learn more about Ibis, check out the docs here: https://ibis-project.org/ If you're attending PyCon US...
Published 05/02/24
In this (first!) episode of Sample Space we talk to Trevor Manz, the creator of anywidget. It's a (neat!) tool to help you build more interactive notebooks by letting you apply just enough JavaScript to get bidirectional communication working in your favorite notebook environment. That means that Python can talk to widgets, but also that widgets can talk to Python. There's a lot to like about these widgets, and we're doing a proper deep dive in this first episode. To learn more about...
Published 04/11/24
We're starting a new podcast!
Published 04/03/24