Episodes
In this episode, Jacob Schreiber interviews Žiga Avsec about
a recently released model, Enformer. Their discussion begins with the
differences between life in academia and in industry, specifically how research
is conducted in the two settings. Then, they discuss the Enformer model,
how it builds on previous work, and the potential that models like it have
for genomics research in the future. Finally, they have a high-level discussion
on the state of modern deep learning libraries and which ones...
Published 11/09/21
The Bioinformatics Contest is back this year, and we are back to discuss
it!
This year’s contest winners, Maksym Kovalchuk (1st prize) and Matt Holt
(2nd prize), talk about how they approach participating in the contest and
what strategies have earned them the top scores.
Timestamps and links for the individual problems:
00:10:36 Genotype Imputation
00:21:26 Causative Mutation
00:30:27 Superspreaders
00:37:22 Minor Haplotype
00:46:37 Isoform...
Published 09/27/21
In this episode, Apostolos Chalkis presents sampling steady
states of metabolic networks as an alternative to the widely used flux balance
analysis (FBA). We also discuss dingo, a
Python package written by Apostolos that employs geometric random walks to
sample steady states. You can see dingo in action
here.
Links:
Dingo on GitHub
Searching for COVID-19 treatments using metabolic networks
Tweag open source fellowships
This episode was originally published...
Published 07/28/21
In this episode, Jacob Schreiber interviews Da-Inn Erika Lee about
data and computational methods for making sense of 3D genome structure. They
begin their discussion by talking about 3D genome structure at a high level
and the challenges in working with such data. Then, they discuss a method
recently developed by Erika, named GRiNCH, that mines this data to
identify spans of the genome that cluster together in 3D space and
potentially help control gene regulation.
...
Published 06/23/21
In this episode, Michael Love joins us to talk about differential gene
expression analysis of bulk RNA-Seq data.
We talk about the history of Mike’s own differential expression package,
DESeq2, as well as other packages in this space, like edgeR and limma, and the
theory they are based upon. Mike also shares his experience of being the
author and maintainer of a popular bioinformatics package.
Links:
Moderated estimation of fold change and dispersion for...
Published 05/12/21
In this episode, Lindsay Pino discusses the
challenges of making quantitative measurements in the field of proteomics.
Specifically, she discusses the difficulties of comparing measurements across
different samples, potentially acquired in different labs, as well as a method
she has developed recently for calibrating these measurements without the need
for expensive reagents. The discussion then turns more broadly to questions in
genomics that can potentially be addressed using proteomic...
Published 04/21/21
In this episode, we learn about B cell maturation and class switching from
Hamish King. Hamish recently published a
paper on this subject in Science Immunology, where he and his coauthors
analyzed gene expression and antibody repertoire data from human tonsils.
In the episode Hamish talks about some of the interesting B cell states he
uncovered and shares his thoughts on questions such as «When does a B cell
decide to class-switch?» and «Why is the antibody isotype correlated with...
Published 03/31/21
In this episode, Jacob Schreiber interviews Molly Gasperini about
enhancer elements. They begin their discussion by talking about Octant Bio,
and then dive into the surprisingly difficult task of defining enhancers and
determining the mechanisms that enable them to regulate gene expression.
Links:
Octant Bio
Towards a comprehensive catalogue of validated and target-linked human enhancers (Molly Gasperini, Jacob M. Tome, and Jay Shendure)
Published 03/10/21
Polygenic risk scores (PRS) rely on genome-wide association studies (GWAS)
to predict phenotype from genotype. However, prediction
accuracy suffers when GWAS from one population are used to calculate PRS within
a different population, which is a problem because the majority of GWAS
are done on cohorts of European ancestry.
In this episode, Bárbara Bitarello helps us
understand how PRS work and why they don’t transfer well across populations.
...
Published 02/17/21
In this episode, we chat about phylogenetics with Xiang Ji. We start with a
general introduction to the field and then go deeper into likelihood-based
methods (maximum likelihood and Bayesian inference). In particular, we talk
about the different ways to calculate the likelihood gradient, including a
linear-time exact gradient algorithm recently published by Xiang and his
colleagues.
Links:
Gradients Do Grow on Trees: A Linear-Time O(N)-Dimensional Gradient...
Published 01/13/21
In this episode Markus Schmidt explains how seeding in read alignment works.
We define and compare k-mers, minimizers, MEMs, SMEMs, and maximal spanning seeds.
Markus also presents his recent work on computing variable-sized seeds (MEMs,
SMEMs, and maximal spanning seeds) from fixed-sized seeds (k-mers and
minimizers) and his Modular Aligner.
Links:
A performant bridge between fixed-size and variable-size seeding
(Arne Kutzner, Pok-Son Kim, Markus Schmidt)
MA...
Published 12/16/20
In this episode, Jacob Schreiber interviews Devin Schweppe about
the analysis of mass spectrometry data in the field of proteomics. They begin
by delving into the different types of mass spectrometry methods, including MS1,
MS2, and MS3, and the reasons for using each. They then discuss a recent paper
from Devin, Full-Featured, Real-Time Database Searching Platform Enables Fast
and Accurate Multiplexed Quantitative Proteomics, which involved building a
real-time system for quantifying...
Published 11/18/20
In this episode Will Freyman talks about identity-by-descent (IBD): how
it’s used at 23andMe, and how the templated
positional Burrows-Wheeler transform can find IBD segments in the presence of
genotyping and phasing errors.
Links:
Fast and robust identity-by-descent inference with the templated positional Burrows-Wheeler transform
(William A. Freyman, Kimberly F. McManus, Suyash S. Shringarpure, Ethan M. Jewett, Katarzyna Bryc, the 23andMe Research Team, Adam...
Published 10/27/20
In this episode, Jacob Schreiber interviews David Kelley about
machine learning models that can yield insight into the consequences of
mutations on the genome. They begin their discussion by talking about
Calico Labs, and then delve into a series of papers that David has
written about using models, named Basset and Basenji, that connect genome
sequence to functional activity and so can be used to quantify the effect of
any mutation.
Links:
Calico Labs
Basset:...
Published 10/07/20
In this episode, Jacob Schreiber interviews Jill Moore about
recent research from the ENCODE Project. They begin their
discussion with an overview of the ENCODE Project and its goals, and then
discuss a bundle of papers that were recently published in various
Nature journals and the flagship paper, Expanded encyclopaedias of DNA elements in the human and mouse genomes.
They conclude their discussion by talking about the challenges with
managing a large project as a trainee in a consortium...
Published 09/10/20
In systems biology, Boolean networks are a way to model interactions such as
gene regulation or cell signaling. The standard
interpretations of Boolean networks are the synchronous, asynchronous, and
fully asynchronous semantics.
In this episode Loïc Paulevé explains how the
same Boolean networks can be interpreted in a new, “most permissive” way.
Loïc proved mathematically that his semantics can reproduce all behaviors
achievable by a compatible quantitative model, whereas the
traditional...
Published 08/19/20
In this episode, Jacob Schreiber interviews Marinka Zitnik about
applications of machine learning to drug development.
They begin their discussion with an overview of open research questions in the
field, including limiting the search space of high-throughput testing methods,
designing drugs entirely from scratch, predicting ways that existing drugs can
be repurposed, and identifying likely side-effects of combining existing drugs
in novel ways. Focusing on the last of these areas, they then...
Published 07/29/20
NGLess is a programming language specifically
targeted at next-generation sequencing (NGS) data processing.
In this episode we chat with its main developer, Luis Pedro
Coelho, about the benefits of domain-specific
languages, pros and cons of Haskell in bioinformatics, reproducibility, and of
course NGLess itself.
Links:
NGLess on GitHub
NG-meta-profiler: fast processing of metagenomes using NGLess, a
domain-specific language
(Luis Pedro Coelho, Renato Alves,...
Published 06/24/20