Please use the links below to view previous webinars:
- March 24, 2020 - Revealing Principles of Subcellular RNA Localization by APEX-Seq, by Furqan Fazal, Stanford University. Hosted by iRNA COSI and the RNA Society
- April 21, 2020 - Dynamic determinants of co-transcriptional gene regulation by Ana Fiszbein, Massachusetts Institute of Technology. Hosted by iRNA COSI and the RNA Society
- April 22, 2020 - DNCON2: improved protein contact prediction using two-level deep convolutional neural networks by Jianlin Cheng, University of Missouri. Hosted by MLCSB COSI
- May 19, 2020 - A SARS-CoV-2 protein interaction map reveals targets for drug repurposing by Nevan Krogan, University of California, San Francisco. Hosted by ISCB
- May 20, 2020 - Deep Neural Networks for Interpreting RNA-binding Protein Target Preferences by Mahsa Ghanbari, Max Delbrück Center for Molecular Medicine. Hosted by iRNA COSI and the RNA Society
- May 26, 2020 - Divergence in DNA Specificity among Paralogous Transcription Factors Contributes to Their Differential In Vivo Binding by Raluca Gordan, Duke University, and Ning Shen, Harvard Medical School. Hosted by RegSys COSI
- June 12, 2020 - Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies by Erfan Sayyari, University of California, San Diego. Hosted by EvolCompGen COSI
- June 23, 2020 - Engineering Alternative Polyadenylation with Deep Generative Neural Networks by Johannes Linder, University of Washington. Hosted by iRNA COSI and the RNA Society
- June 26, 2020 - At Home with Covid-19 by Brian Shoichet, University of California San Francisco. Hosted by ISCB
- June 30, 2020 - Global surveillance of COVID-19 by mining news media using a multi-source dynamic embedded topic model by David Buckeridge, McGill University, and Yue Li, McGill University. Hosted by ISCB
- July 7, 2020 - Genetic Basis Of De Novo Appearance Of Carotenoid Ornamentation In Bare-Parts Of Canaries by Malgorzata Gazda, University of Porto. Hosted by EvolCompGen COSI and SMBE
- July 21, 2020 - Pooled CRISPR screens with imaging on microRaft arrays reveals stress granule-regulatory factors by Emily Wheeler, University of California San Diego. Hosted by iRNA COSI and the RNA Society
- July 30, 2020 - Southern African Human Population Structure - an Opportunity to Expand Genomics Research Worldwide by Caitlin Uren, Stellenbosch University. Hosted by ASBCB
- August 11, 2020 - Protein Function Prediction using Graph Convolutional Networks with Language Model Features by Vladimir Gligorijevic, Flatiron Institute. Hosted by MLCSB COSI
March 24, 2020
The human body is composed of trillions of cells, which are the building blocks of life. Each cell is highly organized and contains RNAs that code for proteins and serve regulatory roles. The location of an RNA species within a cell can dictate its folding [1], editing, splicing, translation, degradation, binding partners, catalytic activity, and even the fate of the protein that it encodes. However, characterizing the RNA contents of cellular compartments that cannot be biochemically isolated is challenging. Here we introduce APEX-seq [2], a method for RNA sequencing based on the direct proximity labeling of RNA using the peroxidase enzyme APEX2. APEX-seq in nine distinct subcellular locales produced a nanometer-resolution spatial map of the human transcriptome, revealing extensive patterns of localization for diverse RNA classes and transcript isoforms. We uncovered a radial organization of the nuclear transcriptome, which is gated at the inner surface of the nuclear pore for cytoplasmic export of processed transcripts. We identified two distinct pathways of messenger RNA localization to mitochondria, each associated with specific sets of transcripts for building complementary macromolecular machines within the organelle. APEX-seq should be widely applicable to many systems and model organisms, enabling comprehensive investigations of the dynamic spatial transcriptome.
- Sun L*, Fazal FM*, Li P*, Broughton JP, Lee B, Tang L, Huang W, Kool ET, Chang HY, Zhang QC. RNA structure maps across mammalian cellular compartments. Nature Structural and Molecular Biology (NSMB), 26, 322–330 (2019)
- Fazal FM*, Han S*, Parker KR, Kaewsapsak P, Xu J, Boettiger AN, Chang HY, Ting AY. Atlas of subcellular RNA localization revealed by APEX-seq. Cell, 178, 473–490 (2019)
April 21, 2020
The architecture of mammalian genes enables the production of multiple transcripts that greatly expand the coding capacity of our genomes. Understanding how these transcripts are regulated is of particular importance in cancer genomics, as their aberrant regulation contributes to the ~10 million cancer-related deaths each year. We recently described a phenomenon called exon-mediated activation of transcription starts (EMATS), in which the splicing of internal exons impacts the spectrum of promoters used and the expression level of the host gene. We showed that targeted inhibition of splicing reduces the usage of promoters and suppresses gene expression, while evolutionary creation of a new splice site can activate cryptic promoters. My findings support a model in which splicing factors recruit transcription machinery to influence promoter choice and regulate the expression of thousands of mammalian genes.
by Jianlin Cheng
April 22, 2020
Significant improvements in the prediction of protein residue-residue contacts have been observed in recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continuing the development of new methods to reliably predict contact maps is essential to further improve ab initio structure prediction.
In this paper we discuss DNCON2, an improved protein contact map predictor based on two-level deep convolutional neural networks. It consists of six convolutional neural networks: the first five predict contacts at 6, 7.5, 8, 8.5 and 10 Å distance thresholds, and the last one uses these five predictions as additional features to predict final contact maps. On the free-modeling datasets in the CASP10, 11 and 12 experiments, DNCON2 achieves mean precisions of 35, 50 and 53.4%, respectively, higher than 30.6% by MetaPSICOV on the CASP10 dataset, 34% by MetaPSICOV on the CASP11 dataset and 46.3% by Raptor-X on the CASP12 dataset, when the top L/5 long-range contacts are evaluated. We attribute the improved performance of DNCON2 to the inclusion of short- and medium-range contacts in training, the two-level approach to prediction, the use of state-of-the-art optimization and activation functions, and a novel deep learning architecture that allows each filter in a convolutional layer to access all the input features of a protein of arbitrary length.
The source code of DNCON2 is available at https://github.com/multicom-toolbox/DNCON2/
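The "top L/5 long-range" precision quoted above can be illustrated with a minimal sketch. It is not taken from the DNCON2 code base: the function name, the dictionary-based inputs, and the sequence-separation cutoff of 24 residues (the usual CASP convention for "long-range") are assumptions made for illustration.

```python
def top_l5_long_range_precision(seq_len, predicted, true_contacts, min_sep=24):
    """Precision of the top L/5 long-range predicted contacts.

    seq_len       -- protein length L
    predicted     -- dict mapping (i, j) residue index pairs (i < j) to a
                     predicted contact probability (hypothetical input format)
    true_contacts -- set of (i, j) pairs that are in contact in the native
                     structure (e.g. below an 8 Angstrom threshold)
    min_sep       -- minimum sequence separation for "long-range" (assumed 24)
    """
    # Keep only long-range pairs, then rank by predicted probability.
    long_range = [(p, i, j) for (i, j), p in predicted.items() if j - i >= min_sep]
    long_range.sort(key=lambda t: t[0], reverse=True)

    # Evaluate the L/5 most confident predictions (at least one).
    k = max(1, seq_len // 5)
    top = long_range[:k]
    if not top:
        return 0.0

    hits = sum(1 for _, i, j in top if (i, j) in true_contacts)
    return hits / len(top)


# Toy example: L = 30, so the top 6 long-range predictions are scored.
example_predicted = {
    (0, 24): 0.9, (1, 25): 0.8, (2, 26): 0.7, (3, 27): 0.6,
    (4, 28): 0.5, (5, 29): 0.4, (0, 29): 0.3,
    (10, 12): 0.99,  # short-range pair, ignored by the metric
}
example_true = {(0, 24), (2, 26), (4, 28), (0, 29), (10, 12)}
print(top_l5_long_range_precision(30, example_predicted, example_true))
```

In the toy example three of the six top-ranked long-range pairs are true contacts; the highly scored short-range pair does not affect the result.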
May 19, 2020
Efforts to develop antiviral drugs against COVID-19 or vaccines for its prevention have been hampered by limited knowledge of the molecular details of SARS-CoV-2 infection. This webinar will describe our efforts to address this challenge by expressing 26 of the 29 SARS-CoV-2 proteins in human cells and identifying the human proteins physically associated with each using affinity-purification mass spectrometry. Among 332 high-confidence SARS-CoV-2-human protein-protein interactions, we identified 66 druggable human proteins or host factors targeted by 69 compounds (29 FDA-approved drugs, 12 drugs in clinical trials, and 28 preclinical compounds). Within a subset of these, multiple viral assays identified two sets of pharmacological agents that displayed antiviral activity.
May 20, 2020 at 11:00 AM EDT
Deep learning has become a powerful paradigm for analyzing the binding sites of regulatory factors, including RNA-binding proteins (RBPs), owing to its ability to learn complex features from possibly multiple sources of raw data. However, the interpretability of these models, which is crucial to improve our understanding of RBP binding preferences and functions, has not yet been investigated in significant detail. We have designed a multitask and multimodal deep neural network for characterizing in vivo RBP targets. The model incorporates not only the sequence but also the region type of the binding sites as input, which boosts its prediction performance. To interpret the model, we quantified the contribution of the input features to the predictive score of each RBP. Learning across multiple RBPs at once, we are able to avoid experimental biases and to identify the RNA sequence motifs and transcript context patterns that are the most important for the predictions of each individual RBP. Our findings are consistent with known motifs and binding behaviors and can provide new insights about the regulatory functions of RBPs.
Divergence in DNA Specificity among Paralogous Transcription Factors Contributes to Their Differential In Vivo Binding
by Raluca Gordan and Ning Shen
May 26, 2020 at 11:00 AM EDT
Paralogous transcription factors (TFs) are oftentimes reported to have identical DNA-binding motifs, despite the fact that they perform distinct regulatory functions. Differential genomic targeting by paralogous TFs is generally assumed to be due to interactions with protein co-factors or the chromatin environment. Using a computational-experimental framework called iMADS (integrative modeling and analysis of differential specificity), we show that, contrary to previous assumptions, paralogous TFs bind differently to genomic target sites even in vitro. We used iMADS to quantify, model, and analyze specificity differences between 11 TFs from 4 protein families. We found that paralogous TFs have diverged mainly at medium- and low-affinity sites, which are poorly captured by current motif models. We identify sequence and shape features differentially preferred by paralogous TFs, and we show that the intrinsic differences in specificity among paralogous TFs contribute to their differential in vivo binding. Thus, our study represents a step forward in deciphering the molecular mechanisms of differential specificity in TF families.
June 12, 2020
Species tree reconstruction is complicated by effects of incomplete lineage sorting, commonly modeled by the multi-species coalescent model (MSC). While there has been substantial progress in developing methods that estimate a species tree given a collection of gene trees, less attention has been paid to fast and accurate methods of quantifying support. In this article, we propose a fast algorithm to compute quartet-based support for each branch of a given species tree with regard to a given set of gene trees. We then show how the quartet support can be used in the context of the MSC to compute (1) the local posterior probability (PP) that the branch is in the species tree and (2) the length of the branch in coalescent units. We evaluate the precision and recall of the local PP on a wide set of simulated and biological datasets, and show that it has very high precision and improved recall compared with multi-locus bootstrapping. The estimated branch lengths are highly accurate when gene tree estimation error is low, but are underestimated when gene tree estimation error increases. Computation of both the branch length and local PP is implemented as new features in ASTRAL.
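The link between quartet frequencies and branch length in coalescent units can be sketched with the standard MSC relation: for an internal branch of length d, a gene-tree quartet matches the species-tree topology with probability 1 - (2/3)·exp(-d). The code below simply inverts that relation. It is an illustrative sketch, not ASTRAL's implementation; the function names are hypothetical, and the local posterior probability computation is not shown.

```python
import math


def quartet_match_probability(d):
    """Probability that a gene-tree quartet matches the species-tree
    topology around an internal branch of length d (coalescent units)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-d)


def branch_length_from_quartets(z_match, n_total):
    """Estimate the branch length from quartet counts.

    z_match -- number of gene-tree quartets supporting the species-tree topology
    n_total -- total number of gene-tree quartets around the branch
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    f = z_match / n_total
    if f <= 1.0 / 3.0:
        # No more support than expected by chance: estimate is zero length.
        return 0.0
    if f >= 1.0:
        # Every gene tree agrees: the estimate is unbounded.
        return math.inf
    return -math.log(1.5 * (1.0 - f))


# Round trip: a branch of length 1.0 should be recovered from its
# expected quartet frequency.
p = quartet_match_probability(1.0)
print(branch_length_from_quartets(p * 1000, 1000))
```

Because gene tree estimation error pulls the observed frequency toward 1/3, this kind of estimate shrinks as error grows, which is consistent with the underestimation noted in the abstract.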
June 23, 2020
Engineering gene and protein sequences with defined functional properties is a major goal of synthetic biology. Rational design of gene enhancers, splice sites, 3’-end regulatory sequences and more has the potential of greatly accelerating the fields of nanotechnology and medical therapeutics. Deep neural network models, together with gradient ascent optimization, show promise for sequence design. The optimized sequences can, however, get stuck in local minima, have low diversity and may be computationally very costly to generate at scale. In the first part of this talk, I will present our work on using gradient-based methods to design regulatory sequences of Alternative Polyadenylation (APA), a post-transcriptional mechanism where multiple polyadenylation signals (PAS) in the mRNA compete for cleavage. Given a deep neural network trained on a massively parallel reporter assay of APA variants, we forward-engineer new functional polyadenylation signals with precisely defined cleavage and isoform distributions. In the second part of this talk, I will discuss how we extend this design framework using a class of generative neural networks called deep exploration networks (DENs). By penalizing any two generated patterns based on similarity, DENs learn to jointly maximize fitness and diversity. DENs can be used to design transcription factor binding sites, splice sequences and functional proteins. In the context of APA, we used DENs to engineer PAS with more than 10-fold higher selection odds than the best gradient ascent-generated patterns.
June 26, 2020
The urgency of the coronavirus pandemic has motivated investigators worldwide to seek approved drugs or investigational new drugs as a way to rapidly advance therapeutics into clinical trials to treat the disease. I will describe a large collaboration, hosted by the UCSF Quantitative Biology Institute, to do that in a mechanistically focused way. Using AP-MS, a host-pathogen network of viral and human proteins was created, and drugs were sought targeting the human partner. From among 322 high-confidence human proteins associated with 26 viral proteins emerged 63 that were druggable. Against those, 69 drugs were tested for efficacy, and from these 10 drugs in two broad classes emerged: those targeting protein biogenesis, and those acting against the Sigma1 and Sigma2 receptors. The activities of these drugs, and the chemoinformatics infrastructure that supported their selection, will be discussed. The mechanism-based repurposing strategy will be compared to a complementary effort that targets viral proteins and seeks novel chemical matter, using structure-based ultra-large library docking.
Global surveillance of COVID-19 by mining news media using a multi-source dynamic embedded topic model
by Yue Li and David Buckeridge
June 30, 2020
As the COVID-19 pandemic continues to unfold, understanding the global impact of non-pharmacological interventions (NPI) is important for formulating effective intervention strategies, particularly as many countries prepare for future waves. We used a machine learning approach to distill latent topics related to NPI from large-scale international news media. We hypothesize that these topics are informative about the timing and nature of implemented NPI, dependent on the source of the information (e.g., local news versus official government announcements) and the target countries. Given a set of latent topics associated with NPI (e.g., self-quarantine, social distancing, online education, etc.), we assume that countries and media sources have different prior distributions over these topics, which are sampled to generate the news articles. To model the source-specific topic priors, we developed a semi-supervised, multi-source, dynamic, embedded topic model. Our model is able to simultaneously infer latent topics and learn a linear classifier to predict NPI labels using the topic mixtures as input for each news article. To learn these models, we developed an efficient end-to-end amortized variational inference algorithm. We applied our models to news data collected and labelled by the World Health Organization (WHO) and the Global Public Health Intelligence Network (GPHIN). Through comprehensive experiments, we observed superior topic quality and intervention prediction accuracy, compared to the baseline embedded topic models, which ignore information on media source and intervention labels. The inferred latent topics reveal distinct policies and media framing in different countries and media sources, and also characterize reactions to COVID-19 and NPI in a semantically meaningful manner.
Genetic Basis Of De Novo Appearance Of Carotenoid Ornamentation In Bare-Parts Of Canaries
by Malgorzata Gazda
July 7, 2020
Unlike wild and domestic canaries (Serinus canaria), or any of the three dozen species of finches in genus Serinus, the domestic urucum breed of canaries exhibits bright red bills and legs. This novel trait offers a unique opportunity to understand the mechanisms of bare-part coloration in birds. To identify the mutation producing the colorful phenotype, we resequenced the genome of urucum canaries and performed a range of analyses to search for genotype-to-phenotype associations across the genome. We identified a nonsynonymous mutation in the gene BCO2 (beta-carotene oxygenase 2, also known as BCDO2), an enzyme involved in the cleavage and breakdown of full-length carotenoids into short apocarotenoids. Protein structural models and in vitro functional assays indicate that the urucum mutation abrogates the carotenoid-cleavage activity of BCO2. Consistent with the predicted loss of carotenoid-cleavage activity, urucum canaries tended to have increased levels of full-length carotenoid pigments in bill tissue and reduced levels of carotenoid-cleavage products (apocarotenoids) in retinal tissue compared with other breeds of canaries. We hypothesize that carotenoid-based bare-part coloration might be readily gained, modified, or lost through simple switches in the enzymatic activity or regulation of BCO2 and this gene may be an important mediator in the evolution of bare-part coloration among bird species.
Pooled CRISPR screens with imaging on microRaft arrays reveals stress granule-regulatory factors
by Emily Wheeler
July 21, 2020
Genetic screens using pooled CRISPR-based approaches are scalable and inexpensive, but are restricted to standard readouts including survival, proliferation and sortable markers. However, many biologically relevant cell states involve cellular and subcellular changes that are only accessible by microscopic visualization, and are currently impossible to screen with pooled methods. Here we combine pooled CRISPR/Cas9 screening with microRaft array technology and high-content imaging to screen image-based phenotypes (CRaft-ID; CRISPR-based microRaft, followed by gRNA Identification). By isolating microRafts that contain genetic clones harboring individual guide RNAs, we identify RNA binding proteins (RBPs) that influence the formation of stress granules, punctate protein-RNA assemblies that form during stress. To automate hit identification, we developed a machine-learning model trained on nuclear morphology to remove unhealthy cells or imaging artifacts. In doing so, we identified and validated previously uncharacterized RBPs that modulate stress granule abundance, highlighting the applicability of our approach to facilitate image-based pooled CRISPR screens.
Southern African Human Population Structure - an Opportunity to Expand Genomics Research Worldwide
by Caitlin Uren
July 30, 2020 at 9:00 AM EDT
Human genetic diversity in southern Africa is vast, complex and unique. Identifying and characterizing population structure in this region is not a trivial task, but when performed correctly, it allows this information to be included in numerous genomic analyses, such as studies investigating a population’s demographic and genetic history and the association between this history and both Mendelian and complex diseases. I will discuss results from our population genetic and demographic studies and how these relate to various phenotypes (with a focus on tuberculosis susceptibility), and discuss various aspects of genomics that in my opinion are greatly lacking in southern Africa. I will conclude by discussing how populations worldwide will benefit from genomics research in this region.
Protein Function Prediction using Graph Convolutional Networks with Language Model Features
by Vladimir Gligorijevic
August 11, 2020 at 11:00 AM EDT
With the maturing of de novo structure prediction methods and the rise of deep learning techniques, it is now possible to generate high-throughput structure and function predictions for many unannotated proteins.
We will first introduce deepFRI (deep functional residue identification), our recently proposed deep learning Graph Convolutional Network (GCN) for predicting protein functions by leveraging protein contact maps representing protein structures and residue-level features from a pre-trained language model. Our model learns general structure-function relationships by robustly predicting Gene Ontology (GO) terms of proteins with < 30% sequence identity to the training set. We show that our GCN architecture predicts functions more accurately than Convolutional Neural Networks trained on sequence data alone and previous competing methods. deepFRI not only improves predictions of GO terms from protein sequences and predicted 3D structures, but also brings residue-level saliency mapping. The mapping provides insight into putative functional sites allowing for biological interpretation, hypothesis generation or the design of targeted validation experiments.
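As a rough illustration of how a graph convolution combines a contact map with residue-level features, the sketch below applies one generic propagation step of the form ReLU(D^-1/2 (A + I) D^-1/2 H W). This is a commonly used GCN formulation assumed here for illustration, not the deepFRI architecture itself; the function name and the toy inputs are hypothetical.

```python
import math


def gcn_layer(adj, feats, weights):
    """One generic graph-convolution step on a residue contact graph.

    adj     -- n x n contact map (1.0 if residues are in contact, else 0.0)
    feats   -- n x f matrix of per-residue features (e.g. language-model embeddings)
    weights -- f x o weight matrix (learned in practice, fixed here)
    """
    n = len(adj)
    n_in = len(weights)
    n_out = len(weights[0])

    # Add self-loops so each residue keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

    # Symmetric degree normalization.
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

    # Aggregate features from neighbouring residues.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(n_in)]
           for i in range(n)]

    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)]
            for i in range(n)]


# Toy example: two residues in contact, one feature each, identity weight.
print(gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[1.0]]))
```

In the toy example each residue's output is the average of its own feature and its contact partner's, which shows how structural neighbours share information in a single layer.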