Elham Azizi is a postdoctoral fellow at the Computational & Systems Biology Program at Memorial Sloan Kettering Cancer Center. She will be joining Columbia University as an Assistant Professor of Biomedical Engineering and Herbert & Florence Irving Professor of Cancer Data Research in the Irving Institute for Cancer Dynamics in January 2020. Her research uses single-cell genomic technologies combined with statistical machine learning techniques to characterize interacting cells in the tumor microenvironment as well as their dysregulated gene circuitry. She received a PhD in Bioinformatics from Boston University (2014), an MS in Electrical Engineering, also from Boston University (2010), and a BS in Electrical Engineering from Sharif University of Technology (2008). She is a recipient of the NIH NCI Pathway to Independence Award, the Tri-Institutional Breakout Prize for Junior Investigators, and an American Cancer Society Postdoctoral Fellowship.
Probabilistic modeling of evolving T cell states during response and resistance to Donor Lymphocyte Infusion
Donor lymphocyte infusion (DLI) is a standard-of-care immunotherapy for relapsed leukemia after allogeneic hematopoietic stem cell transplant. We mapped evolving T cell states in response or resistance to DLI in a cohort of chronic myelogenous leukemia (CML) patients, using single-cell RNA-seq and statistical machine learning tools. This revealed temporal dynamics of exhausted T cells that expand in responders and are absent in non-responders to DLI. Using a Bayesian framework, we integrated ATAC-seq with single-cell RNA-seq data and found that these outcome-specific T cell subsets are driven by distinct sets of regulators.
Alexis Battle is an Associate Professor of Biomedical Engineering at Johns Hopkins University, and a 2016 Searle Scholar. Her research group focuses on understanding the impact of genetic variation on the human body, using machine learning and probabilistic methods to analyze large scale genomic data. She is interested in applications to personal genomics, genetics of gene expression, and gene networks in disease, leveraging diverse data to infer more comprehensive models of genetic effects on the cell. She earned her Ph.D. in Computer Science in 2013 from Stanford University, where she also received her Bachelor’s degree in Symbolic Systems in 2003. Alexis spent several years in industry as a manager and member of the technical staff at Google, Inc. She joined Johns Hopkins University in July 2014.
Dr. Bonneau focuses on two main categories of computational biology: learning networks from functional genomics data, and predicting and designing protein and peptoid structure. In both areas he has played key roles in achieving critical field-wide milestones. In the area of structure prediction he was one of the early authors of the Rosetta code, which was one of the first codes to demonstrate an accurate and comprehensive ability to predict protein structure in the absence of sequence homology. His lab has also made key contributions to genomics data analysis. They focus on two main areas: 1) methods for network inference that learn dynamics and topology from data (the Inferelator), and 2) methods that learn condition-dependent co-regulated groups from integrations of different genomics data types (integrative biclustering). His lab strives to develop new methods that let systems biologists derive functional forms from relevant biology and parameters from data automatically. Dr. Bonneau has also helped to start a new project with political scientists and experimental psychologists to apply methods for learning network structure from time series to social media time series data, using Twitter, online blogs about politics, and Facebook as initial data sources (recently funded by NSF INSPIRE, http://smapp.nyu.edu/).
Structure-Based Function Prediction using Graph Convolutional Networks
Recent massive increases in the number of sequences available in public databases challenge current experimental approaches to determining protein function. These methods are limited by both the large scale of these sequence databases and the diversity of protein functions. We present a deep learning Graph Convolutional Network (GCN) trained on sequence and structural data and evaluate it on ~40k proteins with known structures and functions from the Protein Data Bank (PDB). Our GCN predicts functions more accurately than Convolutional Neural Networks trained on sequence data alone and than competing methods. Feature extraction via a language model removes the need for constructing multiple sequence alignments or feature engineering. Our model learns general structure-function relationships by robustly predicting functions of proteins with ≤ 30% sequence identity to the training set. Using class activation mapping, we can automatically identify structural regions at the residue level that lead to each function prediction for every protein confidently predicted, advancing site-specific function prediction. De-noising inherent in the trained model results in only a minor drop in performance when predicted structures are used, including those from multiple de novo protocols. We use our method to annotate all proteins in the PDB, making several new confident function predictions spanning both fold and function trees.
Elodie Ghedin, PhD is Director of the Center for Genomics and Systems Biology at New York University, and Professor of Biology and Global Public Health. Her laboratory uses comparative genomics, evolutionary biology, and systems biology techniques to generate critical insight about host-pathogen interactions. Prof. Ghedin’s research program meets at the interface of molecular parasitology, microbiology, and genomics and focuses on the molecular basis of macroparasite (nematode) adaptation to niches in their human hosts, and microparasite (virus and bacteria) diversity and interaction in transmission and virulence.
Prof. Ghedin received her BS in Biology and PhD in Molecular Parasitology from McGill University (Montreal, Canada). She was named a MacArthur Foundation Fellow (2011), a Kavli Frontier of Science Fellow (2012), and an American Academy of Microbiology Fellow.
Defective Virus Genomes in Host-Virus Interactions
Defective influenza virus particles generated during viral replication carry incomplete genomes and can interfere with the replication of competent viruses. These defective viruses are found in a substantial proportion of the virus population within infected hosts. They are also thought to modulate disease severity and pathogenicity of the influenza infection. Our studies in natural human infections, longitudinal animal experiments, and single cell analyses address this complex virus-host interplay to better understand how influenza virus evolves within infected hosts.
Anshul Kundaje is an Assistant Professor of Genetics and Computer Science at Stanford University. The Kundaje lab develops interpretable machine learning and deep learning approaches for large-scale integrative analysis of functional genomic data to decode regulatory elements and pathways across diverse cell types and tissues and understand their role in cellular function and disease. Anshul completed his Ph.D. in Computer Science in 2008 at Columbia University. As a postdoc at Stanford and MIT/Broad, he led the integrative analysis efforts for two of the largest functional genomics consortia: The Encyclopedia of DNA Elements (ENCODE) and The Roadmap Epigenomics Project. Dr. Kundaje is a recipient of the 2016 NIH Director’s New Innovator Award and the 2014 Alfred Sloan Foundation Fellowship. Anshul is also a member of the NIH Director's Advisory Committee for Artificial Intelligence in Biomedical Research.
Deep learning at base-resolution reveals motif syntax of the cis-regulatory code
Functional genomics experiments profiling genome-wide regulatory state have revealed millions of putative regulatory elements in diverse cell states. These massive datasets have spurred the development of neural network models that can accurately map DNA sequence to associated cell-type specific molecular phenotypes such as transcription factor (TF) binding, chromatin accessibility and gene expression. Beyond high prediction accuracy, the primary appeal of neural networks is that they are capable of learning predictive sequence features and modeling non-linear feature interactions directly from raw DNA sequence with minimal assumptions. Hence, interpreting these purported black box models could reveal novel insights into the cis-regulatory code. Here, we introduce a convolutional neural network, BPNet, which can model base-resolution TF binding profiles from ChIP-nexus/exo experiments using raw DNA sequence. We apply BPNet to model combinatorial binding profiles of four pluripotency transcription factors Oct4, Sox2, Nanog, and Klf4 in mESCs. We develop a suite of model interpretation methods to learn novel motifs and motif representations, accurately map predictive motif instances in the genome and identify higher-order rules by which combinatorial motif syntax influences binding of these TFs. We find that instances of strict motif spacing are largely due to retrotransposons, but that soft motif syntax influences motif interactions at protein and nucleosome range. Most strikingly, Nanog binding is driven by motifs with a strong preference for ∼10.5 bp spacings corresponding to helical periodicity. BPNet can be easily adapted to other types of profiling experiments (e.g. ChIP-seq, DNase-seq, ATAC-seq, PRO-seq), thus paving the way to decipher the complexity of the cis-regulatory code using deep learning models of functional genomics data.
Professor Joakim Lundeberg heads the Department of Gene Technology, KTH Royal Institute of Technology, and focuses on molecular technology development. His research group has been located since May 2010 at the Science for Life Laboratory (SciLifeLab), a national center for molecular biosciences with a focus on health and environmental research. The center combines frontline technical expertise with advanced knowledge of translational medicine and molecular bioscience. His current research focus is spatially resolved gene expression studies in situ (Spatial Transcriptomics). RNA-sequencing offers the possibility to analyze the expression of all genes in a sample. However, the spatial information of gene expression is lost. In the pioneering work, a method was described that allowed studies of gene expression in tissue sections using RNA-sequencing to uncover transcriptional patterns in situ (Ståhl et al, Science, 2016). The basic concept is remarkably simple: by placing tissue sections on arrayed reverse transcription oligonucleotides with positional barcodes, cDNA for RNA-sequencing can be generated with maintained positional information within the tissue. The quality of the obtained cDNA libraries is as high as with the best protocols for homogenized tissue. This strategy has been demonstrated to work remarkably well and allows visualizing and quantifying the transcriptome in regular histological tissue sections, i.e. tissue domains can be matched to precise gene expression patterns. Furthermore, data-driven methods can be applied to discover transcriptomic patterns in space in an unsupervised manner. Such patterns correspond to cell types, microenvironments, or tissue components, opening novel avenues of research.
Exploring data driven analysis of spatially resolved transcriptomes in situ and in single cells.
In the presentation we will describe our Spatial Transcriptomics technology to generate transcriptome wide data from imaged tissue sections. We will also demonstrate the use of unsupervised principles to view the molecular landscape in the investigated normal and pathological conditions combined with single cell annotations.
We utilize computational and experimental methodologies to identify and characterize the essential genetic elements that guide the function of the human genome, with a particular emphasis on the elements that orchestrate the development of the human brain. Our lab creates detailed cell-specific molecular maps of genetic, epigenetic, transcriptional, and translational activity, creating a draft of the molecular recipe for the creation of the brain. We also develop methods to detect, catalog and functionally annotate variants in the genetic pathways that control developmental processes and how they are perturbed to create disease. We aim to understand the functional elements of the human genome well enough to enable, eventually, the ability to repair, re-engineer, or fortify these genetic networks within human cells.
Planetary-scale genomics and precision astronaut medicine
The avalanche of easy-to-create genomics data has impacted almost all areas of medicine and science, from cancer patients and microbial diagnostics to molecular monitoring for astronauts in space. Recent technologies and algorithms from our laboratory and others demonstrate that an integrative, cross-kingdom view of patients (precision metagenomics) holds unprecedented biomedical potential to discern risk, improve diagnostic accuracy, and map both genetic and epigenetic states around the world and in real time. Finally, these methods and molecular tools work together to guide the most comprehensive, longitudinal, multi-omic view of human astronaut physiology in the NASA Twins Study, which lays the foundation for future long-duration spaceflight, including sequencing, quantifying, and engineering genomes in space.
Sohrab Shah was appointed to MSK in April 2018 as the inaugural Chief of the Computational Oncology Service and is the incumbent of the Nicholls-Biondi Chair. He received a PhD in computer science from the University of British Columbia in 2008 and was appointed as a Principal Investigator to The British Columbia Cancer Agency and the University of British Columbia in 2010 where he developed the roots of his research program. He is a University of British Columbia Killam laureate and a Susan G. Komen Foundation Scholar. His research focuses on cancer evolution, where he uses integrative approaches involving genomics and computational modeling. He has led major projects including the analysis team of the METABRIC consortium, and has published major works in breast and ovarian cancer genomics, including the first description of mutational evolution in a breast cancer patient (Shah et al. Nature 2009), the first mutational landscape of triple negative breast cancers (Shah et al. Nature 2012) and single cell resolution demonstration of clonal evolution in breast cancer xenografts (Eirew et al. Nature 2015). More recently, he has made seminal contributions to understanding clonal evolution in ovarian cancer and discovered that specific mutational patterns related to foldback inversions in the genomes of ovarian cancers are prognostic in terms of treatment outcomes. Dr. Shah’s recent focus is in deciphering clonal evolution and mutational processes at single cell resolution. His work has been published in Nature, Nature Genetics, Nature Methods, Cell, NEJM, Genome Research, Genome Biology, amongst others.
Inferring evolutionary fitness in cancer
In this talk I will discuss new approaches for inferring fitness of clonal populations in cancer using single cell sequencing and time-series observations. Most computational models of fitness in cancer evolution are rooted in estimating how selection operated over the life history of a cancer, inferred from a single time point and bulk sequencing. I will present an experimental design and computational modeling framework for quantifying fitness and growth trajectories of distinct clonal populations. I will discuss how scaled whole genome sequencing, phylogenetics, Wright-Fisher diffusion processes and phenotypic associations can provide rich insights and a quantitative rationale for predicting the growth trajectories of cancer cells from model systems to patients.
Chris Wiggins is an associate professor of applied mathematics at Columbia University and the Chief Data Scientist at The New York Times. At Columbia he is a founding member of the executive committee of the Data Science Institute, and of the Department of Applied Physics and Applied Mathematics as well as the Department of Systems Biology, and is affiliated faculty in Statistics. He is a co-founder and co-organizer of hackNY (http://hackNY.org), a nonprofit which since 2010 has organized once a semester student hackathons and the hackNY Fellows Program, a structured summer internship at NYC startups. Prior to joining the faculty at Columbia he was a Courant Instructor at NYU (1998-2001) and earned his PhD at Princeton University (1993-1998) in theoretical physics. He is a Fellow of the American Physical Society and is a recipient of Columbia's Avanessians Diversity Award. He is currently writing a book on the history and ethics of data with Professor Matt Jones (Columbia) forthcoming from W. W. Norton & Company in 2021.
Data science at The New York Times: lessons from systems biology
In both academia and industry, ways of understanding complex real-world systems are being transformed by the availability of copious data. Such transformation can bring both opportunity and challenge; biology experienced its "data moment" in the 1990s with the sequencing of whole organisms. Much of that experience, such as the demands for novel collaborations across previously separated communities and the opportunity to create new strategies for attacking the core questions of the discipline, has close parallels in both communities. In this talk I'll share some ways in which reframing domain questions as machine learning tasks has opened up new avenues for understanding, both in academic research and in real-world applications. I'll illustrate how descriptive, predictive, and prescriptive analyses have different roles in science and industry, focusing on lessons learned since 2013 in developing a new data science team at The New York Times.
Peng Yin works as an Assistant(2010-2014)/Associate(2014-2016)/Full(2016-) Professor of Systems Biology at Harvard Medical School and a Core Faculty Member at Wyss Institute for Biologically Inspired Engineering at Harvard University (2010-). He is co-founder and director of Ultivue, Inc., an early stage company for digital pathology backed by Arch Ventures. He is also co-founder and director of NuProbe Global, a startup for PCR and NGS based molecular diagnostics backed by prestigious venture funds. His research interests lie at the interface of information science, molecular engineering, and biology. The current focus is to engineer information directed self-assembly of nucleic acid (DNA/RNA) structures and devices, and to exploit such systems to develop applications in nano-fabrication, imaging, sensing, diagnostics, and therapeutics. He is a recipient of a 2010 NIH Director's New Innovator Award, a 2011 NSF CAREER Award, a 2011 DARPA Young Faculty Award, a 2011 ONR Young Investigator Program Award, a 2013 NIH Director's Transformative Research Award, a 2013 NSF Expedition in Computing Award, a 2014 ACS Synthetic Biology Young Investigator Award, 2014/2015 Finalists for Blavatnik National Award for Young Scientists, 2014/2015 World Economic Forum Young Scientist Awards, a 2017 Tulip Award for DNA Computing and Molecular Programming, and a 2018 NIH Director’s PIONEER award. He graduated from Peking University with a B.S. in Biochemistry and Molecular Biology and a Bachelor of Economics in 1998, and from Duke University with an M.S. in Molecular Cancer Biology in 2000 and a Ph.D. in Computer Science in 2005, and did his postdoc training at Caltech (2005-2009).
DNA advancing bioimaging