Episodes
In this episode, Jacob Schreiber interviews Žiga Avsec about a recently released model, Enformer. Their discussion begins with the differences between life in academia and in industry, specifically how research is conducted in the two settings. Then, they discuss the Enformer model, how it builds on previous work, and the potential that models like it have for genomics research in the future. Finally, they have a high-level discussion on the state of modern deep learning libraries and which ones...
Published 11/09/21
The Bioinformatics Contest is back this year, and we are back to discuss it! This year’s contest winners Maksym Kovalchuk (1st prize) and Matt Holt (2nd prize) talk about how they approach participating in the contest and what strategies have earned them the top scores. Timestamps and links for the individual problems: 00:10:36 Genotype Imputation 00:21:26 Causative Mutation 00:30:27 Superspreaders 00:37:22 Minor Haplotype 00:46:37 Isoform...
Published 09/27/21
In this episode, Apostolos Chalkis presents sampling steady states of metabolic networks as an alternative to the widely used flux balance analysis (FBA). We also discuss dingo, a Python package written by Apostolos that employs geometric random walks to sample steady states. You can see dingo in action here. Links: Dingo on GitHub Searching for COVID-19 treatments using metabolic networks Tweag open source fellowships This episode was originally published...
Published 07/28/21
In this episode, Jacob Schreiber interviews Da-Inn Erika Lee about data and computational methods for making sense of 3D genome structure. They begin their discussion by talking about 3D genome structure at a high level and the challenges in working with such data. Then, they discuss a method recently developed by Erika, named GRiNCH, that mines this data to identify spans of the genome that cluster together in 3D space and potentially help control gene regulation. ...
Published 06/23/21
In this episode, Michael Love joins us to talk about differential gene expression analysis of bulk RNA-Seq data. We talk about the history of Mike’s own differential expression package, DESeq2, as well as other packages in this space, like edgeR and limma, and the theory they are based upon. Mike also shares his experience of being the author and maintainer of a popular bioinformatics package. Links: Moderated estimation of fold change and dispersion for...
Published 05/12/21
In this episode, Lindsay Pino discusses the challenges of making quantitative measurements in the field of proteomics. Specifically, she discusses the difficulties of comparing measurements across different samples, potentially acquired in different labs, as well as a method she has developed recently for calibrating these measurements without the need for expensive reagents. The discussion then turns more broadly to questions in genomics that can potentially be addressed using proteomic...
Published 04/21/21
In this episode, we learn about B cell maturation and class switching from Hamish King. Hamish recently published a paper on this subject in Science Immunology, where he and his coauthors analyzed gene expression and antibody repertoire data from human tonsils. In the episode Hamish talks about some of the interesting B cell states he uncovered and shares his thoughts on questions such as «When does a B cell decide to class-switch?» and «Why is the antibody isotype correlated with...
Published 03/31/21
In this episode, Jacob Schreiber interviews Molly Gasperini about enhancer elements. They begin their discussion by talking about Octant Bio, and then dive into the surprisingly difficult task of defining enhancers and determining the mechanisms that enable them to regulate gene expression. Links: Octant Bio Towards a comprehensive catalogue of validated and target-linked human enhancers (Molly Gasperini, Jacob M. Tome, and Jay Shendure)
Published 03/10/21
Polygenic risk scores (PRS) rely on genome-wide association studies (GWAS) to predict phenotype from genotype. However, prediction accuracy suffers when GWAS from one population are used to calculate PRS in a different population, which is a problem because the majority of GWAS are done on cohorts of European ancestry. In this episode, Bárbara Bitarello helps us understand how PRS work and why they don’t transfer well across populations. ...
Published 02/17/21
In this episode, we chat about phylogenetics with Xiang Ji. We start with a general introduction to the field and then go deeper into the likelihood-based methods (maximum likelihood and Bayesian inference). In particular, we talk about the different ways to calculate the likelihood gradient, including a linear-time exact gradient algorithm recently published by Xiang and his colleagues. Links: Gradients Do Grow on Trees: A Linear-Time O(N)-Dimensional Gradient...
Published 01/13/21
In this episode Markus Schmidt explains how seeding in read alignment works. We define and compare k-mers, minimizers, MEMs, SMEMs, and maximal spanning seeds. Markus also presents his recent work on computing variable-sized seeds (MEMs, SMEMs, and maximal spanning seeds) from fixed-sized seeds (k-mers and minimizers) and his Modular Aligner. Links: A performant bridge between fixed-size and variable-size seeding (Arne Kutzner, Pok-Son Kim, Markus Schmidt) MA...
Published 12/16/20
In this episode, Jacob Schreiber interviews Devin Schweppe about the analysis of mass spectrometry data in the field of proteomics. They begin by delving into the different types of mass spectrometry methods, including MS1, MS2, and MS3, and the reasons for using each. They then discuss a recent paper from Devin, Full-Featured, Real-Time Database Searching Platform Enables Fast and Accurate Multiplexed Quantitative Proteomics, that involved building a real-time system for quantifying...
Published 11/18/20
In this episode Will Freyman talks about identity-by-descent (IBD): how it’s used at 23andMe, and how the templated positional Burrows-Wheeler transform can find IBD segments in the presence of genotyping and phasing errors. Links: Fast and robust identity-by-descent inference with the templated positional Burrows-Wheeler transform (William A. Freyman, Kimberly F. McManus, Suyash S. Shringarpure, Ethan M. Jewett, Katarzyna Bryc, the 23andMe Research Team, Adam...
Published 10/27/20
In this episode, Jacob Schreiber interviews David Kelley about machine learning models that can yield insight into the consequences of mutations on the genome. They begin their discussion by talking about Calico Labs, and then delve into a series of papers that David has written about using models, named Basset and Basenji, that connect genome sequence to functional activity and so can be used to quantify the effect of any mutation. Links: Calico Labs Basset:...
Published 10/07/20
In this episode, Jacob Schreiber interviews Jill Moore about recent research from the ENCODE Project. They begin their discussion with an overview of the ENCODE Project and its goals, and then discuss a bundle of papers that were recently published in various Nature journals, including the flagship paper, Expanded encyclopaedias of DNA elements in the human and mouse genomes. They conclude their discussion by talking about the challenges with managing a large project as a trainee in a consortium...
Published 09/10/20
In systems biology, Boolean networks are a way to model interactions such as gene regulation or cell signaling. The standard interpretations of Boolean networks are the synchronous, asynchronous, and fully asynchronous semantics. In this episode Loïc Paulevé explains how the same Boolean networks can be interpreted in a new, “most permissive” way. Loïc proved mathematically that his semantics can reproduce all behaviors achievable by a compatible quantitative model, whereas the traditional...
Published 08/19/20
In this episode, Jacob Schreiber interviews Marinka Zitnik about applications of machine learning to drug development. They begin their discussion with an overview of open research questions in the field, including limiting the search space of high-throughput testing methods, designing drugs entirely from scratch, predicting ways that existing drugs can be repurposed, and identifying likely side-effects of combining existing drugs in novel ways. Focusing on the last of these areas, they then...
Published 07/29/20
NGLess is a programming language specifically targeted at next generation sequencing (NGS) data processing. In this episode we chat with its main developer, Luis Pedro Coelho, about the benefits of domain-specific languages, pros and cons of Haskell in bioinformatics, reproducibility, and of course NGLess itself. Links: NGLess on GitHub NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language (Luis Pedro Coelho, Renato Alves,...
Published 06/24/20