Please use the links below to view previous webinars:
- March 24, 2020 - Revealing Principles of Subcellular RNA Localization by APEX-Seq, by Furqan Fazal, Stanford University. Hosted by iRNA COSI and the RNA Society
- April 21, 2020 - Dynamic determinants of co-transcriptional gene regulation by Ana Fiszbein, Massachusetts Institute of Technology. Hosted by iRNA COSI and the RNA Society
- April 22, 2020 - DNCON2: improved protein contact prediction using two-level deep convolutional neural networks by Jianlin Cheng, University of Missouri. Hosted by MLCSB COSI
- May 19, 2020 - A SARS-CoV-2 protein interaction map reveals targets for drug repurposing by Nevan Krogan, University of California, San Francisco. Hosted by ISCB
- May 20, 2020 - Deep Neural Networks for Interpreting RNA-binding Protein Target Preferences by Mahsa Ghanbari, Max Delbrück Center for Molecular Medicine. Hosted by iRNA COSI and the RNA Society
- May 26, 2020 - Divergence in DNA Specificity among Paralogous Transcription Factors Contributes to Their Differential In Vivo Binding by Raluca Gordan, Duke University, and Ning Shen, Harvard Medical School. Hosted by RegSys COSI
- June 12, 2020 - Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies by Erfan Sayyari, University of California, San Diego. Hosted by EvolCompGen COSI
- June 23, 2020 - Engineering Alternative Polyadenylation with Deep Generative Neural Networks by Johannes Linder, University of Washington. Hosted by iRNA COSI and the RNA Society
- June 26, 2020 - At Home with Covid-19 by Brian Shoichet, University of California San Francisco. Hosted by ISCB
- June 30, 2020 - Global surveillance of COVID-19 by mining news media using a multi-source dynamic embedded topic model by David Buckeridge, McGill University, and Yue Li, McGill University. Hosted by ISCB
- July 7, 2020 - Genetic Basis Of De Novo Appearance Of Carotenoid Ornamentation In Bare-Parts Of Canaries by Malgorzata Gazda, University of Porto. Hosted by EvolCompGen COSI and SMBE
- July 21, 2020 - Pooled CRISPR screens with imaging on microRaft arrays reveals stress granule-regulatory factors by Emily Wheeler, University of California San Diego. Hosted by iRNA COSI and the RNA Society
- July 30, 2020 - Southern African Human Population Structure - an Opportunity to Expand Genomics Research Worldwide by Caitlin Uren, Stellenbosch University. Hosted by ASBCB
- August 11, 2020 - Protein Function Prediction using Graph Convolutional Networks with Language Model Features by Vladimir Gligorijevic, Flatiron Institute. Hosted by MLCSB COSI
- August 24, 2020 - Unravelling the mystery of orphan genes to understand the origins of genetic novelty by Nikos Vakirlis, Alexander Fleming Biomedical Sciences Research Center. Hosted by EvolCompGen COSI and SMBE
- September 16, 2020 - Encyclopedia of DNA Elements (ENCODE) Phase III by Zhiping Weng, University of Massachusetts Medical School. Hosted by ISCB
- September 30, 2020 - RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference by Alexey Kozlov and Alexandros Stamatakis. Hosted by EvolCompGen COSI and SMBE
- October 2, 2020 - Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic by Philippe Lemey, Katholieke Universiteit, Leuven. Hosted by ISCB
- October 8, 2020 - Indigenous Voices in Computational Biology: An Introduction to Ethical Genomic Research with Indigenous People by Rene Begay, University of Colorado. Hosted by ISCB
- October 15, 2020 - Altered RNA Splicing by Mutant p53 Activates Oncogenic RAS Signaling in Pancreatic Cancer by Luisa Escobar-Hoyos, Yale School of Medicine. Hosted by iRNA COSI and the RNA Society
March 24, 2020
The human body is composed of trillions of cells, which are the building blocks of life. Each cell is highly organized and contains RNAs that code for proteins and serve regulatory roles. The location of an RNA species within a cell can dictate its folding [1], editing, splicing, translation, degradation, binding partners, catalytic activity, and even the fate of the protein that it encodes. However, characterizing the RNA contents of cellular compartments that cannot be biochemically isolated is challenging. Here we introduce APEX-seq [2], a method for RNA sequencing based on the direct proximity labeling of RNA using the peroxidase enzyme APEX2. APEX-seq in nine distinct subcellular locales produced a nanometer-resolution spatial map of the human transcriptome, revealing extensive patterns of localization for diverse RNA classes and transcript isoforms. We uncovered a radial organization of the nuclear transcriptome, which is gated at the inner surface of the nuclear pore for cytoplasmic export of processed transcripts. We identified two distinct pathways of messenger RNA localization to mitochondria, each associated with specific sets of transcripts for building complementary macromolecular machines within the organelle. APEX-seq should be widely applicable to many systems and model organisms, enabling comprehensive investigations of the dynamic spatial transcriptome.
- Sun L*, Fazal FM*, Li P*, Broughton JP, Lee B, Tang L, Huang W, Kool ET, Chang HY, Zhang QC. RNA structure maps across mammalian cellular compartments. Nature Structural and Molecular Biology (NSMB), 26, 322-330 (2019)
- Fazal FM*, Han S*, Parker KR, Kaewsapsak P, Xu J, Boettiger AN, Chang HY, Ting AY. Atlas of subcellular RNA localization revealed by APEX-seq. Cell, 178, 473–490 (2019)
April 21, 2020
The architecture of mammalian genes enables the production of multiple transcripts that greatly expand the coding capacity of our genomes. Understanding how these transcripts are regulated is of particular importance in cancer genomics, as their aberrant regulation contributes to the ~10 million cancer-related deaths each year. We recently described a phenomenon called exon-mediated activation of transcription starts (EMATS) in which the splicing of internal exons impacts the spectrum of promoters used and expression level of the host gene. We showed that targeted inhibition of splicing reduces the usage of promoters and suppresses gene expression, while evolutionary creation of a new splice site can activate cryptic promoters. My findings support a model in which splicing factors recruit transcription machinery to influence promoter choice and regulate the expression of thousands of mammalian genes.
by Jianlin Cheng
April 22, 2020
Significant improvements in the prediction of protein residue-residue contacts have been observed in recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continuing the development of new methods to reliably predict contact maps is essential to further improve ab initio structure prediction.
In this paper we discuss DNCON2, an improved protein contact map predictor based on two-level deep convolutional neural networks. It consists of six convolutional neural networks: the first five predict contacts at 6, 7.5, 8, 8.5 and 10 Å distance thresholds, and the last one uses these five predictions as additional features to predict final contact maps. On the free-modeling datasets in the CASP10, 11 and 12 experiments, DNCON2 achieves mean precisions of 35, 50 and 53.4%, respectively, higher than 30.6% by MetaPSICOV on the CASP10 dataset, 34% by MetaPSICOV on the CASP11 dataset and 46.3% by Raptor-X on the CASP12 dataset, when top L/5 long-range contacts are evaluated. We attribute the improved performance of DNCON2 to the inclusion of short- and medium-range contacts into training, a two-level approach to prediction, use of state-of-the-art optimization and activation functions, and a novel deep learning architecture that allows each filter in a convolutional layer to access all the input features of a protein of arbitrary length.
The source code of DNCON2 is available at https://github.com/multicom-toolbox/DNCON2/
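The "top L/5 long-range" precision quoted above can be illustrated with a minimal sketch. It is not taken from the DNCON2 code base: the function name, the input layout, and the minimum sequence separation of 24 residues used to define "long-range" are assumptions made here for illustration.

```python
def top_l5_long_range_precision(pred, true_contacts, min_sep=24):
    """Precision of the top L/5 predicted long-range contacts.

    pred          -- L x L list of lists with predicted contact scores
    true_contacts -- set of (i, j) index pairs, i < j, that are real contacts
    min_sep       -- assumed sequence separation defining "long-range"
    """
    length = len(pred)
    # Collect every candidate pair that is far enough apart in sequence.
    candidates = [
        (pred[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    # Highest-scoring pairs first.
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[: max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (i, j) in true_contacts)
    return hits / len(top)
```

Short-range pairs are ignored no matter how confident the predictor is about them, which is why the metric is reported separately from overall precision.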
May 19, 2020
Efforts to develop antiviral drugs versus COVID-19 or vaccines for its prevention have been hampered by limited knowledge of the molecular details of SARS-CoV-2 infection. This webinar will describe our efforts to address this challenge by expressing 26 of the 29 SARS-CoV-2 proteins in human cells and identifying the human proteins physically associated with each using affinity-purification mass spectrometry. Among 332 high-confidence SARS-CoV-2-human protein-protein interactions, we identified 66 druggable human proteins or host factors targeted by 69 compounds (29 FDA-approved drugs, 12 drugs in clinical trials, and 28 preclinical compounds). Within a subset of these, multiple viral assays identified two sets of pharmacological agents that displayed antiviral activity.
May 20, 2020 at 11:00 AM EDT
Deep learning has become a powerful paradigm to analyze the binding sites of regulatory factors including RNA-binding proteins (RBPs), owing to its ability to learn complex features from possibly multiple sources of raw data. However, the interpretability of these models, which is crucial to improve our understanding of RBP binding preferences and functions, has not yet been investigated in significant detail. We have designed a multitask and multimodal deep neural network for characterizing in vivo RBP targets. The model incorporates not only the sequence but also the region type of the binding sites as input, which helps the model to boost the prediction performance. To interpret the model, we quantified the contribution of the input features to the predictive score of each RBP. Learning across multiple RBPs at once, we are able to avoid experimental biases and to identify the RNA sequence motifs and transcript context patterns that are the most important for the predictions of each individual RBP. Our findings are consistent with known motifs and binding behaviors and can provide new insights about the regulatory functions of RBPs.
Divergence in DNA Specificity among Paralogous Transcription Factors Contributes to Their Differential In Vivo Binding
by Raluca Gordan and Ning Shen
May 26, 2020 at 11:00 AM EDT
Paralogous transcription factors (TFs) are oftentimes reported to have identical DNA-binding motifs, despite the fact that they perform distinct regulatory functions. Differential genomic targeting by paralogous TFs is generally assumed to be due to interactions with protein co-factors or the chromatin environment. Using a computational-experimental framework called iMADS (integrative modeling and analysis of differential specificity), we show that, contrary to previous assumptions, paralogous TFs bind differently to genomic target sites even in vitro. We used iMADS to quantify, model, and analyze specificity differences between 11 TFs from 4 protein families. We found that paralogous TFs have diverged mainly at medium- and low-affinity sites, which are poorly captured by current motif models. We identify sequence and shape features differentially preferred by paralogous TFs, and we show that the intrinsic differences in specificity among paralogous TFs contribute to their differential in vivo binding. Thus, our study represents a step forward in deciphering the molecular mechanisms of differential specificity in TF families.
June 12, 2020
Species tree reconstruction is complicated by effects of incomplete lineage sorting, commonly modeled by the multi-species coalescent model (MSC). While there has been substantial progress in developing methods that estimate a species tree given a collection of gene trees, less attention has been paid to fast and accurate methods of quantifying support. In this article, we propose a fast algorithm to compute quartet-based support for each branch of a given species tree with regard to a given set of gene trees. We then show how the quartet support can be used in the context of the MSC to compute (1) the local posterior probability (PP) that the branch is in the species tree and (2) the length of the branch in coalescent units. We evaluate the precision and recall of the local PP on a wide set of simulated and biological datasets, and show that it has very high precision and improved recall compared with multi-locus bootstrapping. The estimated branch lengths are highly accurate when gene tree estimation error is low, but are underestimated when gene tree estimation error increases. Computation of both the branch length and local PP is implemented as new features in ASTRAL.
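As a rough illustration of how a quartet frequency maps to a branch length in coalescent units, the sketch below inverts the standard MSC expectation that a gene-tree quartet matches the species-tree topology with probability 1 − (2/3)·exp(−d) for an internal branch of length d. The function name and the handling of the boundary cases are assumptions made here; the estimator implemented in ASTRAL may differ in its details.

```python
import math


def coalescent_branch_length(z1, z2, z3):
    """Estimate a branch length (coalescent units) from quartet counts.

    z1     -- number of gene-tree quartets agreeing with the species-tree branch
    z2, z3 -- numbers of quartets supporting the two alternative topologies
    """
    total = z1 + z2 + z3
    if total <= 0:
        raise ValueError("at least one quartet is required")
    freq = z1 / total
    if freq <= 1 / 3:
        # No signal beyond the 1/3 expected by chance: zero-length branch.
        return 0.0
    if freq >= 1.0:
        # Perfect agreement: the simple estimate is unbounded.
        return math.inf
    # Invert freq = 1 - (2/3) * exp(-d).
    return -math.log(1.5 * (1.0 - freq))
```

The formula makes the abstract's caveat concrete: gene tree estimation error pushes the observed frequency toward 1/3, which shrinks the estimated length.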
June 23, 2020
Engineering gene and protein sequences with defined functional properties is a major goal of synthetic biology. Rational design of gene enhancers, splice sites, 3’-end regulatory sequences and more has the potential of greatly accelerating the fields of nanotechnology and medical therapeutics. Deep neural network models, together with gradient ascent optimization, show promise for sequence design. The optimized sequences can however get stuck in local minima, have low diversity and may be computationally very costly to generate at scale. In the first part of this talk, I will present our work on using gradient-based methods to design regulatory sequences of Alternative Polyadenylation (APA), a post-transcriptional mechanism where multiple polyadenylation signals (PAS) in the mRNA compete for cleavage. Given a deep neural network trained on a massively parallel reporter assay of APA variants, we forward-engineer new functional polyadenylation signals with precisely defined cleavage and isoform distributions. In the second part of this talk, I will discuss how we extend this design framework using a class of generative neural networks called deep exploration networks (DENs). By penalizing any two generated patterns based on similarity, DENs learn to jointly maximize fitness and diversity. DENs can be used to design transcription factor binding sites, splice sequences and functional proteins. In the context of APA, we used DENs to engineer PAS with more than 10-fold higher selection odds than the best gradient ascent-generated patterns.
June 26, 2020
The urgency of the coronavirus pandemic has motivated investigators worldwide to seek approved drugs or investigational new drugs as a way to rapidly advance therapeutics into clinical trials to treat the disease. I will describe a large collaboration, hosted by the UCSF Quantitative Biology Institute, to do that in a mechanistically focused way. Using AP-MS, a host-pathogen network of viral and human proteins was created, and drugs were sought targeting the human partner. From among 322 high confidence human proteins associated with 26 viral proteins emerged 63 that were druggable. Against those, 69 drugs were tested for efficacy, and from these 10 drugs in two broad classes emerged: those targeting protein biogenesis, and those acting against the Sigma1 and Sigma2 receptors. The activities of these drugs, and the chemoinformatics infrastructure that supported their selection, will be discussed. The mechanism-based repurposing strategy will be compared to a complementary effort that targets viral proteins and seeks novel chemical matter, using structure-based ultra-large library docking.
Global surveillance of COVID-19 by mining news media using a multi-source dynamic embedded topic model
by Yue Li and David Buckeridge
June 30, 2020
As the COVID-19 pandemic continues to unfold, understanding the global impact of non-pharmacological interventions (NPI) is important for formulating effective intervention strategies, particularly as many countries prepare for future waves. We used a machine learning approach to distill latent topics related to NPI from large-scale international news media. We hypothesize that these topics are informative about the timing and nature of implemented NPI, dependent on the source of the information (e.g., local news versus official government announcements) and the target countries. Given a set of latent topics associated with NPI (e.g., self-quarantine, social distancing, online education, etc.), we assume that countries and media sources have different prior distributions over these topics, which are sampled to generate the news articles. To model the source-specific topic priors, we developed a semi-supervised, multi-source, dynamic, embedded topic model. Our model is able to simultaneously infer latent topics and learn a linear classifier to predict NPI labels using the topic mixtures as input for each news article. To learn these models, we developed an efficient end-to-end amortized variational inference algorithm. We applied our models to news data collected and labelled by the World Health Organization (WHO) and the Global Public Health Intelligence Network (GPHIN). Through comprehensive experiments, we observed superior topic quality and intervention prediction accuracy, compared to the baseline embedded topic models, which ignore information on media source and intervention labels. The inferred latent topics reveal distinct policies and media framing in different countries and media sources, and also characterize reactions to COVID-19 and NPI in a semantically meaningful manner.
Genetic Basis Of De Novo Appearance Of Carotenoid Ornamentation In Bare-Parts Of Canaries
by Malgorzata Gazda
July 7, 2020
Unlike wild and domestic canaries (Serinus canaria), or any of the three dozen species of finches in genus Serinus, the domestic urucum breed of canaries exhibits bright red bills and legs. This novel trait offers a unique opportunity to understand the mechanisms of bare-part coloration in birds. To identify the mutation producing the colorful phenotype, we resequenced the genome of urucum canaries and performed a range of analyses to search for genotype-to-phenotype associations across the genome. We identified a nonsynonymous mutation in the gene BCO2 (beta-carotene oxygenase 2, also known as BCDO2), an enzyme involved in the cleavage and breakdown of full-length carotenoids into short apocarotenoids. Protein structural models and in vitro functional assays indicate that the urucum mutation abrogates the carotenoid-cleavage activity of BCO2. Consistent with the predicted loss of carotenoid-cleavage activity, urucum canaries tended to have increased levels of full-length carotenoid pigments in bill tissue and reduced levels of carotenoid-cleavage products (apocarotenoids) in retinal tissue compared with other breeds of canaries. We hypothesize that carotenoid-based bare-part coloration might be readily gained, modified, or lost through simple switches in the enzymatic activity or regulation of BCO2 and this gene may be an important mediator in the evolution of bare-part coloration among bird species.
Pooled CRISPR screens with imaging on microRaft arrays reveals stress granule-regulatory factors
by Emily Wheeler
July 21, 2020
Genetic screens using pooled CRISPR-based approaches are scalable and inexpensive, but restricted to standard readouts including survival, proliferation and sortable markers. However, many biologically relevant cell states involve cellular and subcellular changes that are only accessible by microscopic visualization, and are currently impossible to screen with pooled methods. Here we combine pooled CRISPR/Cas9 screening with microRaft array technology and high-content imaging to screen image-based phenotypes (CRaft-ID; CRISPR-based microRaft, followed by gRNA Identification). By isolating microRafts that contain genetic clones harboring individual guide RNAs, we identify RNA binding proteins (RBPs) that influence the formation of stress granules, punctate protein-RNA assemblies that form during stress. To automate hit identification, we developed a machine-learning model trained on nuclear morphology to remove unhealthy cells or imaging artifacts. In doing so, we identified and validated previously uncharacterized RBPs that modulate stress granule abundance, highlighting the applicability of our approach to facilitate image-based pooled CRISPR screens.
Southern African Human Population Structure - an Opportunity to Expand Genomics Research Worldwide
by Caitlin Uren
July 30, 2020
Human genetic diversity in southern Africa is vast, complex and unique. Identifying and characterizing population structure in this region is not a trivial task, but when performed correctly it allows this information to be included in numerous genomic analyses, such as studies investigating a population's demographic and genetic history and the association between this history and both Mendelian and complex diseases. I will discuss results from our population genetic and demographic studies and how they relate to various phenotypes (with a focus on tuberculosis susceptibility), and discuss various aspects of genomics that in my opinion are greatly lacking in southern Africa. I will conclude by discussing how populations worldwide will benefit from genomics research in this region.
Protein Function Prediction using Graph Convolutional Networks with Language Model Features
by Vladimir Gligorijevic
August 11, 2020 at 11:00 AM EDT
With the maturing of de novo structure prediction methods and the rise of deep learning techniques, it now becomes possible to generate high-throughput structure and function predictions for many unannotated proteins.
We will first introduce deepFRI (deep functional residue identification), our recently proposed deep learning Graph Convolutional Network (GCN) for predicting protein functions by leveraging protein contact maps representing protein structures and residue-level features from a pre-trained language model. Our model learns general structure-function relationships by robustly predicting Gene Ontology (GO) terms of proteins with < 30% sequence identity to the training set. We show that our GCN architecture predicts functions more accurately than Convolutional Neural Networks trained on sequence data alone and previous competing methods. deepFRI not only improves predictions of GO terms from protein sequences and predicted 3D structures, but also brings residue-level saliency mapping. The mapping provides insight into putative functional sites allowing for biological interpretation, hypothesis generation or the design of targeted validation experiments.
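To make the idea of a graph convolution over a protein contact map concrete, here is a minimal sketch of a single layer. It uses the widely known symmetric normalization with self-loops followed by a ReLU; this choice, the function names, and the plain-list data layout are assumptions for illustration and are not claimed to match the deepFRI architecture.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(contact_map, features, weights):
    """One graph-convolution step over residues.

    contact_map -- N x N 0/1 matrix, 1 where two residues are in contact
    features    -- N x F per-residue features (e.g. language-model embeddings)
    weights     -- F x H weight matrix (learned in practice, fixed here)
    """
    n = len(contact_map)
    # Add self-loops so each residue keeps its own features.
    a_hat = [
        [contact_map[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization: D^(-1/2) * A_hat * D^(-1/2).
    norm = [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    hidden = matmul(matmul(norm, features), weights)
    # ReLU non-linearity.
    return [[max(0.0, value) for value in row] for row in hidden]
```

Each output row mixes a residue's features with those of its spatial neighbours, which is how structural context enters the prediction even when the neighbours are far apart in sequence.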
Unravelling the mystery of orphan genes to understand the origins of genetic novelty
by Nikos Vakirlis
August 24, 2020
What explains the presence of a gene only in the genome of one species and not in any other?
Species-specific protein-coding genes, also known as orphans, can arise "from scratch" from previously non-genic loci, through a process known as de novo gene emergence. How exactly the evolutionary transition from non-gene to functional gene unfolds is unclear. Can such de novo emerging genes increase an organism's fitness, and if so how? Orphan genes can also result from extensive sequence divergence of ancestral genes, which can eventually erase all similarity of a gene to its homologues in other species, a process even less well understood than de novo emergence. I will present novel findings which advance our understanding of both these evolutionary mechanisms and bring us a small step closer to a complete picture of the origins of genetic novelty.
September 16, 2020
The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org).
RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference
by Alexey Kozlov and Alexandros Stamatakis
September 30, 2020
Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets.
We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQ-Tree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric.
Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic
by Philippe Lemey
October 2, 2020
There are outstanding evolutionary questions on the recent emergence of human coronavirus SARS-CoV-2 including the role of reservoir species, the role of recombination and its time of divergence from animal viruses. We find that the sarbecoviruses—the viral subgenus containing SARS-CoV and SARS-CoV-2—undergo frequent recombination and exhibit spatially structured genetic diversity on a regional scale in China. SARS-CoV-2 itself is not a recombinant of any sarbecoviruses detected to date, and its receptor-binding motif, important for specificity to human ACE2 receptors, appears to be an ancestral trait shared with bat viruses and not one acquired recently via recombination. To employ phylogenetic dating methods, recombinant regions of a 68-genome sarbecovirus alignment were removed with three independent methods. Bayesian evolutionary rate and divergence date estimates were shown to be consistent for these three approaches and for two different prior specifications of evolutionary rates based on HCoV-OC43 and MERS-CoV. Divergence dates between SARS-CoV-2 and the bat sarbecovirus reservoir were estimated as 1948 (95% highest posterior density (HPD): 1879–1999), 1969 (95% HPD: 1930–2000) and 1982 (95% HPD: 1948–2009), indicating that the lineage giving rise to SARS-CoV-2 has been circulating unnoticed in bats for decades.
Indigenous Voices in Computational Biology: An Introduction to Ethical Genomic Research with Indigenous People
by Rene Begay
October 8, 2020
Indigenous communities throughout the world have distinct languages, cultures, political structures, and ways of knowing. For too long, these communities have been exploited for material goods, land, and more recently for biospecimens. It is important to note that Indigenous people are not anti-science but rather support science that includes their intrinsic perspectives and expertise. Indigenous scientists are emerging across the world bridging science, policy, technology, and Indigenous ways of knowing to determine how their communities can benefit from genomic and clinical health research. The Indigenous Voices in Computational Biology series from the ISCB Academy will highlight the work conducted by Indigenous researchers in the United States, New Zealand, and other countries. Topics will include genomic data sharing, ethical engagement with Indigenous peoples in paleogenomics, and how to responsibly conduct research on Indigenous ancestors (ancient DNA). As a result, Indigenous scientists have developed their own Native biobank and hosted an international Indigenous genomics conference to discuss ethical concerns within their communities and present community-based genomic research that integrates Indigenous knowledge. This presentation will introduce the series' overarching themes and provide the framework that encourages ethical engagement with Indigenous communities in genomic research.
Altered RNA Splicing by Mutant p53 Activates Oncogenic RAS Signaling in Pancreatic Cancer
by Luisa Escobar-Hoyos
October 15, 2020
Pancreatic ductal adenocarcinoma (PDAC) is driven by co-existing mutations in KRAS and TP53. However, how these mutations collaborate to promote this cancer is unknown. Here, we uncover sequence-specific changes in RNA splicing enforced by mutant p53 which enhance KRAS activity. Mutant p53 increases expression of splicing regulator hnRNPK to promote inclusion of cytosine-rich exons within GTPase-activating proteins (GAPs), negative regulators of RAS family members. Mutant p53-enforced GAP isoforms lose cell membrane association, leading to heightened KRAS activity. Preventing cytosine-rich exon inclusion in mutant KRAS/p53 PDACs decreases tumor growth. Moreover, mutant p53 PDACs are sensitized to inhibition of splicing via spliceosome inhibitors. These data provide insight into co-enrichment of KRAS and p53 mutations and therapeutics targeting this mechanism in PDAC.