Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

RNA COSI Track Presentations

The rules and impact of nonsense-mediated mRNA decay in human cancers
Date: Saturday, July 22
Time: 10:00am - 10:15am
Room: Meeting Hall IB
  • Rik Lindeboom, Radboud University, Netherlands
  • Fran Supek, Centre for Genomic Regulation, Spain
  • Ben Lehner, Centre for Genomic Regulation, Spain

BACKGROUND
Premature termination codons (PTCs) cause cancer as well as a large proportion of inherited human genetic diseases. PTC-containing transcripts can be degraded by an mRNA surveillance pathway termed nonsense-mediated decay (NMD). However, the efficiency of NMD varies; for example, it is inefficient when a PTC is located downstream of the last splice site in the mRNA (the exon junction complex, or EJC, model). We used matched exome and transcriptome data from 9,769 human tumors to systematically elucidate the rules of NMD targeting in human cells.

RESULTS
An integrated model incorporating multiple rules beyond the canonical EJC model explains 74% of the non-random variance in NMD efficiency across thousands of PTCs. For instance, we find that start codon-proximal PTCs commonly evade NMD via downstream re-initiation of translation. Moreover, NMD is less efficiently triggered by PTCs in very long exons and by PTCs that are far upstream of the wild-type stop codon. Sequence motifs corresponding to known RNA-binding proteins may modulate NMD activity in particular instances. We also show that rapid mRNA turnover and gene dosage compensation mask the effects of NMD in many genes.
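The rules named in this abstract can be read as a simple checklist. The sketch below is only an illustration of that reading: the talk describes a fitted, integrated model, not a rule cascade, and every numeric cutoff and field name here is a hypothetical placeholder that the abstract does not state.

```python
from dataclasses import dataclass
from typing import List

# All cutoffs are hypothetical placeholders for illustration only;
# the abstract gives no numeric thresholds.
START_PROXIMAL_NT = 200     # assumed: "start codon-proximal"
LONG_EXON_NT = 400          # assumed: "very long exon"
FAR_FROM_STOP_NT = 2000     # assumed: "far upstream of the wild-type stop codon"


@dataclass
class Ptc:
    cds_pos: int        # coding nucleotides between the start codon and the PTC
    exon_index: int     # 0-based index of the exon that contains the PTC
    n_exons: int        # number of exons in the transcript
    exon_length: int    # length of the exon that contains the PTC
    dist_to_stop: int   # coding nucleotides between the PTC and the normal stop


def nmd_evasion_reasons(ptc: Ptc) -> List[str]:
    """List the rules from the abstract under which this PTC may escape NMD."""
    reasons = []
    if ptc.exon_index == ptc.n_exons - 1:
        reasons.append("downstream of last splice site (EJC model)")
    if ptc.cds_pos < START_PROXIMAL_NT:
        reasons.append("start-proximal (possible translation re-initiation)")
    if ptc.exon_length > LONG_EXON_NT:
        reasons.append("very long exon")
    if ptc.dist_to_stop > FAR_FROM_STOP_NT:
        reasons.append("far upstream of wild-type stop codon")
    return reasons
```

An empty list means none of the listed rules applies, so the PTC would be expected to trigger NMD under this simplified reading.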

CONCLUSIONS
Applying the NMD model reveals signatures of both positive and negative selection on NMD-triggering somatic mutations in human tumors and provides a novel classification of tumor suppressor genes. Taken together, this study [1] provides important mechanistic insight into NMD and into tumor evolution, as well as a broader framework for predicting the effects of nonsense variants in human disease.

Splicing isoform expression provides insights into neurodevelopmental disorders
Date: Saturday, July 22
Time: 10:15am - 10:30am
Room: Meeting Hall IB
  • Guan Ning Lin, Shanghai Jiao Tong University, China
  • Roser Corominas, University of California San Diego, United States
  • Jorge Urresti, University of California San Diego, United States
  • Megha Amar, University of California San Diego, United States
  • Alysson R. Muotri, University of California San Diego, United States
  • Jonathan Sebat, University of California San Diego, United States
  • Nenad Sestan, Yale School of Medicine, United States
  • Stephan Sanders, University of California San Francisco, United States
  • Lilia M. Iakoucheva, University of California San Diego, United States

BACKGROUND
Alternative splicing plays an important role in brain development; however, its global contribution to human brain diseases has not been fully investigated. Here, we examined the relationships between splicing isoform expression and de novo mutations implicated in neurodevelopmental disorders.

RESULTS
We constructed a comprehensive spatiotemporal isoform transcriptome of the developing human brain by integrating all known alternatively spliced (AS) isoforms of human genes with RNA-seq data from BrainSpan across multiple brain developmental periods and regions. We observed distinct expression patterns of different isoforms encoded by the same gene. About 20% of human genes demonstrate significant isoform expression variation, and a subset of them, defined as “switch genes”, significantly change isoform expression between adjacent brain developmental periods, while “non-switch genes” carry consistent isoform expression variations. The isoforms encoded by the switch genes are enriched in alternatively regulated microexons and in de novo mutations from patients with neurodevelopmental diseases, and are co-expressed with the Rbfox1 and CELF2 splicing factors. We also experimentally tested de novo splice site mutations in the isoforms of four switch genes, SCN2A, DYRK1A and BTRC, and demonstrated that the mutations can differentially disrupt splicing and lead to exon skipping depending on the expressed isoforms.

CONCLUSIONS
One of our most remarkable findings from the isoform transcriptome is the discovery of the distinct properties of switch and non-switch genes. Switch genes were enriched in neuronal-related functions, as opposed to general cellular functions for non-switch genes. This suggests that these two gene types are fundamentally different and perhaps play distinct roles during brain development. In addition, the experimental results on the functional impact of splice-site mutations in switch genes, such as BTRC, suggest that the mutations can selectively disrupt the gene and reduce its translational efficiency by impacting only certain isoforms. In the case of BTRC, the mutation likely impacts Wnt signaling through impaired degradation of β-catenin by disrupting the BTRC isoforms that carry it. Therefore, we propose that the global functional effect of mutations associated with human diseases should be investigated at isoform-level rather than gene-level resolution.

lncRNA dysregulation alters the activity of driver genes in cancer
Date: Saturday, July 22
Time: 10:30am - 10:45am
Room: Meeting Hall IB
  • Hua-Sheng Shiu, Baylor College of Medicine, United States
  • Sonal Somvanshi, Baylor College of Medicine, United States
  • Pavel Sumazin, Baylor College of Medicine, United States

BACKGROUND
The expression of long noncoding RNAs (lncRNAs) is dysregulated in most cancers, often by copy number gains or losses. However, the pathophysiological consequences of lncRNA dysregulation and the mechanisms of action are known for only a handful of these genes.

RESULTS
We present evidence that alterations at hundreds of lncRNA loci dysregulate dozens of oncogenes and tumor suppressors (cancer genes) in each of fourteen tumor contexts to potentially affect tumorigenesis. Integrated computational and statistical analyses of tumor profiles and evidence from biochemical assays in cancer cell lines point to the existence of extensive context-specific multimodal lncRNA regulatory networks that are composed of both transcriptional and post-transcriptional interactions that have pathophysiologically-relevant effects on tumor cells.
Our analysis builds on the inference of lncRNA regulatory networks and we provide statistical evidence to support our predicted interactions. We showed that tumor-specific lncRNA dysregulation that is correlated with somatic in cis copy number changes is predictive of expression variability of dozens of cancer genes in each tumor context analyzed, even after accounting for other known and inferred cis- and trans-acting factors; that the loci of lncRNAs that are enriched for cancer-gene targets are more likely to be altered by somatic copy number alterations; and that these lncRNAs are more likely to be dysregulated. Considering the modality of lncRNA regulation, we showed that lncRNA binding sites in proximal promoters are enriched in core promoters; that lncRNAs with binding sites in core promoters are more likely to be predicted as regulators; and that binding site multiplicity in core promoters, and to a lesser extent in proximal promoters, is predictive of lncRNA-target correlation. Some lncRNAs appear to be transcriptional and others post-transcriptional specialists; interestingly, their regulatory modality is consistent with their known localization in the cell.
Studying inferred interactions in multiple tumor types revealed that the majority of lncRNAs act in a context-specific manner, but suggested that dozens of lncRNAs (including OIP5-AS1, TUG1, NEAT1, MALAT1, XIST, and TSIX) play physiologically relevant roles across multiple tumor types to synergistically regulate key cancer pathways. We biochemically studied select context-specific and pan-cancer lncRNA regulator candidates and showed that OIP5-AS1 regulates tumor suppressors in gynecologic tumors and that its silencing phenocopies the effects of tumor-suppressor silencing in breast cancer, ovarian cancer, and endometrial cancer cell lines; that TUG1 regulates oncogenes in ovarian serous carcinomas and that its silencing reduces proliferation in ovarian cancer cell lines; and that LINC01184 and WT1-AS regulate cancer genes in breast and ovarian cancer cells, respectively.

CONCLUSIONS
Our analyses suggest that extensive lncRNA regulatory networks regulate hundreds of genes, including dozens of cancer genes, in each tumor context, and that somatic copy number alterations at lncRNA loci, and their subsequent dysregulation, can dysregulate cancer genes and affect tumor pathology. Our analysis also identified lncRNAs that are predicted to synergistically regulate key pathways, including pathways that play key roles in cancer. We present statistical and biochemical evidence for the pathophysiological relevance of the dysregulation of hundreds of lncRNAs in each of fourteen tumor types.

Alternative splicing remodels the protein interaction network of cancer gene drivers
Date: Saturday, July 22
Time: 10:45am - 11:00am
Room: Meeting Hall IB
  • Hector Climente-Gonzalez, Pompeu Fabra University, Spain
  • Marina Reixachs, Pompeu Fabra University, Spain
  • Eduardo Eyras, Pompeu Fabra University, Spain

BACKGROUND
Alternative splicing changes are frequently observed during physiological processes like cell differentiation, as well as during disease states, including cancer. However, the functional relevance of these splicing changes remains mostly unknown.

RESULTS
We have carried out a systematic analysis to characterize the potential functional consequences of alternative splicing changes in thousands of tumor samples. This analysis reveals that a subset of alternative splicing changes affect protein domain families that are frequently mutated in tumors, disrupt the protein–protein interaction network of cancer gene drivers, and show mutual exclusion with mutations in cancer drivers. We further show that tumor samples present a negative correlation between the number of these splicing changes and the number of mutated cancer drivers. We extend this analysis to show that splicing changes affect protein production, using sequencing of ribosome-occupied RNAs (Ribo-seq), and that the remodeling of the protein interaction network of cancer drivers is conserved between human and mouse.

CONCLUSIONS
We propose that a subset of the alternative splicing changes observed in tumors represents independent oncogenic processes and could potentially be considered alternative splicing drivers (AS-drivers).

Computational framework to design effective oligonucleotides for exon skipping of Duchenne Muscular Dystrophy
Date: Saturday, July 22
Time: 11:00am - 11:15am
Room: Meeting Hall IB
  • Hosna Jabbari, University of Alberta, Canada
  • Toshifumi Yokota, University of Alberta, Canada
  • Carlo Montemagno, University of Alberta, Canada

BACKGROUND
Duchenne muscular dystrophy (DMD) is a lethal and devastating genetic disorder seen in one out of every 3500-5000 boys. Exon skipping of DMD mRNA aims to restore the disrupted reading frame using antisense oligonucleotides (AOs), allowing the production of truncated but partly functional dystrophin proteins. Exon skipping therapy employs a cocktail of synthetic RNA-like molecules as an “RNA stitch” to jump over (splice out) the mutated parts of the gene that block the effective synthesis of the encoded protein. AOs target one of several hundred potential target sites on an exon. The efficacy of exon skipping at different target positions within an exon can vary by more than 20-fold. However, most studies have involved in vitro testing of several target sequences for a given exon with no guarantee that the selected target sequence represents an optimal choice, which is a significant challenge in designing effective antisense exon-skipping drugs. Although the first exon skipping drug, eteplirsen, was approved by the FDA in September 2016, its efficacy is questioned.

RESULTS
Computational methods are fast and inexpensive, and can play an important role in guiding the pre-screening of suitable target sequences [1]. We have developed an algorithm that takes into account the different chemistries of the AO and the exon. Our algorithm considers the structures of the AO and the exon before interaction to find their minimum free energy interacting structure. Our algorithm is fast and efficient, and follows a biologically sound structure formation hypothesis [2]. Preliminary comparison with in vitro data shows a high correlation between our algorithm's predictions and the skipping efficacy of AOs.
The outcomes of AO-mediated exon skipping are likely to be determined by a group of influential features, with possible interdependencies among themselves, many of which can be modelled computationally. Most features contribute to noise, while only a small fraction of features and their combinations are responsible for the outcome of interest. Identification of these pivotal features using machine-learning techniques plays a crucial role in predicting regions of high efficacy. To improve our framework, we are using Weka to identify these pivotal features from dozens of different parameters that were previously found to be correlated with exon skipping efficacy.

CONCLUSIONS
We believe that our algorithm can dramatically improve the efficiency of exon skipping, even for exons that are already targeted in clinical trials (e.g. exon 51 and exon 53). We will launch our computational framework as a webserver which will be freely available for academic use so that other researchers can design exon skipping AOs for DMD and other genes. Although we initially focus on PMOs, we will expand our software to include other antisense chemistries (e.g. 2’MePS and 2'-O-methoxyethyl-modified antisense oligonucleotides) in the future. The software could serve as a design tool for antisense drugs for DMD and other disorders that can potentially be treated by exon skipping (e.g. dysferlin deficiency). In addition, we believe that by sharing our method we can incorporate community feedback to improve its quality.

Single cell transcriptomics reveals specific RNA editing signatures in the human brain
Date: Saturday, July 22
Time: 11:15am - 11:30am
Room: Meeting Hall IB
  • Ernesto Picardi, University of Bari & IBIOM-CNR, Italy
  • Anna Maria D'Erchia, University of Bari & IBIOM-CNR, Italy
  • Graziano Pesole, University of Bari & IBIOM-CNR, Italy

BACKGROUND
A-to-I RNA editing in humans is carried out by members of the ADAR family of enzymes, which act on double-stranded RNAs and can alter codon identity, splicing sites or base-pairing interactions within higher-order RNA structures. Recoding RNA editing is essential for normal brain development and regulates important functional properties of neurotransmitter receptors [1, 2]. Indeed, its deregulation has been linked to several nervous system diseases such as epilepsy, schizophrenia, Alzheimer's disease, major depression and amyotrophic lateral sclerosis [3, 4]. Recently we have profiled RNA editing in six different human tissues using whole transcriptome sequencing and detected more than three million events [5]. Interestingly, genes undergoing RNA editing were consistently enriched in genes involved in neurological disorders and cancer, confirming the relevant biological role of RNA editing in humans.
Although investigations in bulk tissues are extremely useful, they do not capture the transcriptomic heterogeneity of the multiple cell types that make up a tissue.

RESULTS
To characterize the complexity of RNA editing at single cell resolution, we investigated this phenomenon in single cells from adult human cortex obtained from living subjects in which transcriptome diversity was already surveyed by single cell RNA sequencing (scRNA-seq) [6]. Using a comprehensive collection of known RNA editing events, we explored inosinome profiles in 466 cortex cells. Individual scRNA-seq data were quality checked with FASTQC, and poor-quality regions at 3’ ends were trimmed with the trim_galore tool. Cleaned reads were then mapped onto the human reference genome with the STAR aligner. RNA editing candidates were detected using our REDItools [7] and analyzed with custom scripts. We found that the identification of A-to-I RNA editing in single cells was strongly correlated with the number of RNA reads generated. RNA editing profiles were quite heterogeneous, even within the same cell population. However, the observed RNA editing profile as well as the Alu editing index were sufficient to discriminate major cell types such as neurons, astrocytes and oligodendrocytes, underlining the cell-specific nature of RNA editing. Interestingly, recoding RNA editing events were mainly detectable in neurons, underscoring the primary role of A-to-I editing in modulating brain functions through key modifications in receptors for neurotransmitters.
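The per-site editing level and a pooled index of the kind mentioned above can be sketched as follows. This is a minimal illustration, assuming the common convention that inosine is read as G, so editing at a genomic A is estimated as G / (A + G). The input layout is hypothetical and is not the REDItools output format, and the abstract does not specify how its Alu editing index was computed.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]      # (chromosome, position); hypothetical key format
Counts = Tuple[int, int]    # (reads supporting A, reads supporting G)


def editing_level(a_reads: int, g_reads: int) -> float:
    """Fraction of reads showing G at a genomic A; 0.0 if the site is uncovered."""
    total = a_reads + g_reads
    return g_reads / total if total else 0.0


def pooled_editing_index(sites: Dict[Site, Counts]) -> float:
    """Pool counts over a set of sites (e.g. adenosines in Alu elements)
    and report the overall G / (A + G) ratio."""
    a_total = sum(a for a, _ in sites.values())
    g_total = sum(g for _, g in sites.values())
    return editing_level(a_total, g_total)
```

Pooling counts before dividing, rather than averaging per-site ratios, keeps sparsely covered sites from dominating the index, which matters for single cells where coverage per site is low.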

CONCLUSIONS
Taken together, our results demonstrate that RNA editing is detectable in single cells and that A-to-I patterns reveal specific editing signatures distinguishing major cell types in the human brain. Profiling RNA editing in single cells yields novel and exciting insights into neuronal plasticity and opens up the possibility of deciphering as yet unknown molecular mechanisms in diverse neurological or neurodegenerative disorders. In addition, A-to-I changes in single cells may contribute to the identification of novel therapeutic targets or prognostic markers for innovative approaches to precision medicine.

Dissecting the complexity and cell-to-cell variability of alternative splicing regulation
Date: Saturday, July 22
Time: 11:30am - 11:45am
Room: Meeting Hall IB
  • Martin Mikl, Weizmann Institute of Science, Israel
  • Eran Segal, Weizmann Institute of Science, Israel

BACKGROUND
Alternative splicing is a pervasive mechanism that allows for a great expansion of the proteome and constitutes an additional level of control over gene expression. The complex nature of splicing regulation calls for a systematic and controlled investigation of sequence alterations, changing single features at a time while keeping the general environment constant.

RESULTS
To achieve this goal and to overcome the limitations of our current understanding of splicing regulation, we developed a bifluorescent high-throughput reporter assay based on stable integration of the transgene in a specific genomic location, and designed tens of thousands of sequences containing an alternative splicing event to determine the effect of sequence variations on the splicing decision. We identified intronic and exonic sequences with dominant negative and positive effects on splicing, as well as combinatorial interactions between sequence elements, and observed that the canonical splicing pathway is not always the most efficient option. Among other findings, our analysis revealed novel connections between RNA secondary structure and splicing choice. One key advantage of our approach is that it allows us, for the first time, to assess cell-to-cell variability of splicing decisions at large scale and to understand its regulation. We observed that the level of variability can be encoded in the DNA sequence and found several features of splice site sequences that influence the cell-to-cell variability of splicing, most notably the secondary structure around the splice site.

CONCLUSIONS
High-throughput testing of rationally designed splice sites allows us to address many open questions and gain insights into splicing regulation from a novel perspective. We are expanding our approach to cover different types of alternative splicing to identify general and splice type-specific modes of regulation. Combining all these data into a unified model will constitute a significant step towards a holistic understanding of this fundamental process and can open the door for a novel, single-cell perspective on splicing regulation and its functional implications.

Advancing parasite transcriptomics with spliced-leader sequencing experimental and computational workflows
Date: Saturday, July 22
Time: 11:45am - 11:52am
Room: Meeting Hall IB
  • Bart Cuypers, University of Antwerp, Belgium
  • Malgorzata Domagalska, Institute of Tropical Medicine, Antwerp, Belgium
  • Geraldine de Muylder, Institute of Tropical Medicine, Antwerp, Belgium
  • Pieter Meysman, University of Antwerp, Belgium
  • Manu Vanaerschot, Columbia University in the City of New York, United States
  • Hideo Imamura, Institute of Tropical Medicine, Belgium
  • Franck Dumetz, Institute of Tropical Medicine, Belgium
  • Thomas-Wolf Verdonckt, Institute of Tropical Medicine, Belgium
  • Peter J. Myler, Center for Infectious Disease Research, United States
  • Gowthaman Ramasamy, Center for Infectious Disease Research, United States
  • Kris Laukens, University of Antwerp, Belgium
  • Jean-Claude Dujardin, Institute of Tropical Medicine, Antwerp, Belgium

BACKGROUND
The Trypanosomatida family contains many human pathogenic, parasitic species including Leishmania donovani (visceral leishmaniasis), Trypanosoma gambiense/rhodesiense (sleeping sickness) and Trypanosoma cruzi (Chagas disease). Transcriptome studies of these parasites are essential for fundamental insights into their development, pathogenicity and drug resistance. However, in most patient tissue samples, host RNA is much more abundant than parasite RNA, imposing a complicated and time-consuming parasite isolation step prior to sequencing. Interestingly, mature mRNA of Trypanosomatida differs from the host’s by starting with a fixed 39-nucleotide sequence or spliced-leader (SL), added to pre-mRNA by a process called ‘trans-splicing’ [1].

RESULTS
We exploited the presence of an SL on each parasite mRNA and developed an RNA-seq protocol (SL-seq) to specifically amplify and sequence SL-containing RNA out of a pool of host cell RNA. SL-seq first converts SL-containing mRNA to cDNA using an SL-specific primer. Amplification is carried out with overhang-extension PCR, which adds additional motifs and indexes, allowing multiplexing of hundreds of samples on a single sequencing lane. In addition, we developed a new bio-informatic pipeline that can deal with the intricacies specific to the method. It uses existing RNA-seq tools (including Samtools, TopHat, HTseq and DESeq) and new tools that were developed in Python. One of the main differences from a conventional RNA-seq method that had to be addressed is that most SL-seq sequencing reads map in the 5’-UTRs. However, since 5’-UTRs have a variable length for the same gene in Trypanosomatids, they are left unannotated in all Trypanosomatid reference genomes. The SL-seq pipeline is developed to associate the reads automatically with the closest upstream gene, without the need for UTR annotation. Other modifications include the trimming of the SL sequence artifacts from the reads and differential splice site usage detection. We verified the validity and performance of SL-seq and its bio-informatic pipeline by comparing the results with those obtained with the Illumina Stranded mRNA kit (ILL-seq) on an identical pool of RNA. A strong correlation was observed between the expression values obtained with both methods (p<2e-16 and R²=0.8), and the differentially expressed genes and enriched GO categories were largely identical. We also successfully sequenced Leishmania transcriptomes directly from infected THP-1 cells, without prior isolation of the parasites. With ILL-seq, only 1.6% of the data was Leishmania mRNA, while this was 65.0% using the SL-seq protocol, indicating that SL-seq resulted in a 42-fold enrichment of parasite mRNA.
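The SL-trimming step mentioned above might look roughly like the following. This is a sketch under stated assumptions, not the pipeline's actual code: the function name, the `min_overlap` parameter, and the exact-match strategy are all hypothetical, and the real spliced-leader sequence (39 nt according to the abstract) must be supplied by the user.

```python
def trim_spliced_leader(read: str, sl: str, min_overlap: int = 8) -> str:
    """Remove a leading copy of the spliced leader from a read.

    The read may start partway into the SL, so every SL suffix of at least
    `min_overlap` bases is tried, longest first. Matching is exact here;
    tolerating mismatches is left out for brevity. `min_overlap` should be
    a positive integer.
    """
    for k in range(min(len(sl), len(read)), min_overlap - 1, -1):
        if read.startswith(sl[-k:]):
            return read[k:]
    return read  # no SL detected; leave the read untouched
```

For example, with a made-up 16-nt leader `ACGTACGTTTGGCCAA`, both a read starting with the full leader and a read starting with only its last 10 bases are trimmed down to the transcript-derived portion.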

CONCLUSIONS
We developed an RNA-seq method and corresponding bio-informatic analysis toolkit that can sequence and analyze Trypanosomatid transcriptomes, directly from infected tissues with unprecedented resolution. SL-seq could also be useful for other SL-containing organisms including nematodes, trematodes and primitive chordates.

Transcriptome-wide modelling of RNA life cycle
Date: Saturday, July 22
Time: 11:52am - 11:58am
Room: Meeting Hall IB
  • Alina Selega, University of Edinburgh, United Kingdom
  • David Schnoerr, University of Edinburgh, United Kingdom
  • Sander Granneman, University of Edinburgh, Centre for Synthetic and Systems Biology (SynthSys), United Kingdom
  • Guido Sanguinetti, University of Edinburgh, United Kingdom

BACKGROUND
Gene expression in cells is regulated through a complex coordination of processes. It is thus imperative to understand the relative balance and dynamical control of production and decay of macromolecules to dissect its mechanisms. When explaining the dynamic behaviour of the transcriptome, most model-based efforts focus solely on changes in the transcription rate and resort to a simplifying assumption of constant degradation rate [1, 2, 3]. The latter is often a consequence of the lack of measurements underlying degradation processes.

RESULTS
We used χCRAC [4], a cross-linking method that benefits from a greatly reduced UV irradiation time, to assay the binding of RNA degradation factors Nab3 and Nrd1 and the major exonuclease Xrn1. The data was collected in rapid time series experiments as early as 1 minute after the imposition of a nutrient shift on exponentially growing Saccharomyces cerevisiae. We discovered that a large fraction of the transcriptome binds to these crucial actors of degradation in a highly dynamic manner, often in the very early stages of stress response. Using this finding, we developed a transcriptome-wide state-space model of RNA life cycle with time-variable transcription and degradation rates.

The observation model for transcription used χCRAC time series of Pol II and the co-transcriptionally binding degradation factors Nrd1 and Nab3, and the observation model for degradation used χCRAC time series of Xrn1. RNA abundance was measured using a complementary RNA-seq data set. Our model allows identification of the relative contributions of transcription and degradation to explaining the total amount of available RNA at different time points after stress. This enables us to indicate transcripts that were transcribed more or increasingly targeted for degradation, and recover their regulation patterns during various stages of stress response.
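To make the idea of time-variable transcription and degradation rates concrete, a common textbook formulation is dm/dt = s(t) - d(t)·m(t), where m is RNA abundance, s the synthesis rate and d the degradation rate. The sketch below integrates that equation with a forward Euler step. It is an assumption-laden illustration only: the abstract's state-space model and its χCRAC-based observation models are not reproduced, and all function and parameter names are hypothetical.

```python
from typing import Callable, List


def simulate_rna(
    m0: float,
    synthesis: Callable[[float], float],     # s(t), assumed form
    degradation: Callable[[float], float],   # d(t), assumed form
    t_end: float,
    dt: float = 0.01,
) -> List[float]:
    """Forward-Euler integration of dm/dt = s(t) - d(t) * m(t)."""
    n_steps = int(round(t_end / dt))
    m = m0
    trajectory = [m]
    for i in range(n_steps):
        t = i * dt
        m += dt * (synthesis(t) - degradation(t) * m)
        trajectory.append(m)
    return trajectory


# Hypothetical scenario: synthesis is constant, degradation doubles at t = 1
# (e.g. after a stress is imposed), so abundance falls without any change
# in transcription.
stressed = simulate_rna(
    m0=4.0,
    synthesis=lambda t: 2.0,
    degradation=lambda t: 0.5 if t < 1.0 else 1.0,
    t_end=20.0,
)
```

With constant rates the abundance settles at s/d, which is why a model that fixes d can misattribute a degradation-driven change to transcription.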

CONCLUSIONS
Our data provide the first high-resolution time series measurements of the action of RNA degradation factors and exonuclease during the imposition of stress in any organism. Statistical analysis of the data revealed a highly dynamic behaviour, calling into question earlier modelling assumptions. We extended this by developing the first transcriptome-wide model of RNA life cycle with a more realistic choice of variable degradation rate. Our model provides the means for explaining the relative balance of transcription and degradation and thus, characterising the regulation profile under stress for each transcript.

Capturing target-specific protein-RNA interaction footprints from iCLIP-seq data
Date: Saturday, July 22
Time: 12:00pm - 12:15pm
Room: Meeting Hall IB
  • Sabrina Krakau, Max Planck Institute for Molecular Genetics, Germany
  • Hugues Richard, University Pierre and Marie Curie, France
  • Annalisa Marsico, Max Planck Institute for Molecular Genetics, Germany

BACKGROUND
RNA binding sites for a protein of interest can now be detected genome-wide and at high resolution thanks to the development of CLIP-seq technologies. Among these methods, iCLIP [1] and eCLIP [2] provide single-nucleotide resolution and are particularly powerful in characterizing protein-RNA interaction landscapes. However, current methods do not address the problems of peak calling and crosslink site detection simultaneously, and fail to model the various sources of bias, such as transcript abundances or unspecific crosslink (CL) sequence motifs [1].

RESULTS
We developed an approach based on a non-homogeneous Hidden Markov model, which calls individual crosslink sites taking into account both regions enriched in protein bound fragments and the specifics of iCLIP truncation patterns (Fig.1a). Our modeling framework also incorporates information from various covariates, such as RNA abundances or information from CL-motifs. We extensively validated the superiority of our approach over other common strategies, both within a realistic iCLIP simulation setup (using real RNA-seq data) and on five published iCLIP/eCLIP datasets where the protein's predominant binding regions are known.
Over a large range of simulation parameters, our tool recovers binding sites with a better accuracy than other methods. Further, on all real datasets our approach is more precise in determining the bona fide binding sites (as shown on Fig. 1b. for a PUM2 eCLIP dataset [2] using the known sequence motif).
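What "non-homogeneous" means for a hidden Markov model is that the transition probabilities vary along the sequence, for example as a function of a covariate. The sketch below shows Viterbi decoding for a two-state chain (0 = background, 1 = bound) under that setup. It is not the presented tool: the logistic link, the parameter values, and the use of precomputed emission log-likelihoods are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def transitions_from_covariate(
    covariate: Sequence[float],
    bias: float = -3.0,      # hypothetical parameter
    weight: float = 2.0,     # hypothetical parameter
    p_stay: float = 0.5,     # hypothetical parameter
) -> List[List[List[float]]]:
    """Per-position log transition matrices.

    Entry [t][r][s] is the log probability of moving from state r at
    position t-1 to state s at position t. The chance of entering the
    bound state is a logistic function of the covariate at position t.
    """
    matrices = []
    for value in covariate:
        p_enter = 1.0 / (1.0 + math.exp(-(bias + weight * value)))
        matrices.append([
            [math.log(1.0 - p_enter), math.log(p_enter)],   # from background
            [math.log(1.0 - p_stay), math.log(p_stay)],     # from bound
        ])
    return matrices


def viterbi_nonhomogeneous(
    log_emit: Sequence[Sequence[float]],               # log_emit[t][s]
    log_trans: Sequence[Sequence[Sequence[float]]],    # log_trans[t][r][s]; index 0 unused
    log_init: Sequence[float],
) -> List[int]:
    """Most likely state path when transitions differ at every position."""
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_r = max(range(n_states),
                         key=lambda r: score[r] + log_trans[t][r][s])
            pointers.append(best_r)
            new_score.append(score[best_r] + log_trans[t][best_r][s] + log_emit[t][s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In a real application the emission terms would come from a model of read counts and truncation events, and the parameters would be estimated from data rather than fixed by hand.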

CONCLUSIONS
Our results show the importance of combining peak calling and crosslink site detection when analyzing iCLIP or eCLIP data. We also show that the incorporation of covariates (input signals, as well as CL-motifs) clearly improves the accuracy of the calls (Fig. 1b). The framework is general enough to incorporate other covariates in the model.

SSMART: Sequence-structure motif identification for RNA-binding proteins
Date: Saturday, July 22
Time: 12:15pm - 12:30pm
Room: Meeting Hall IB
  • Alina Munteanu, Berlin Institute for Medical Systems Biology, Max Delbruck Center, Berlin, Germany
  • Neelanjan Mukherjee, MDC Berlin, Germany
  • Uwe Ohler, Max Delbrueck Center & Humboldt University, Germany

BACKGROUND
RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in eukaryotic genomes, and each recognizes its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized.

RESULTS
We developed SSMART, an RNA motif finder that extends cERMIT [1] and simultaneously models the primary sequence and the structural properties of the RNA target sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. The secondary structure is obtained in a prior step, by sampling suboptimal structures around binding sites and identifying local dominant combinations of base pairs [2]. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3’UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/.
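A consensus string over the standard IUPAC nucleotide codes can be matched against RNA sequences as sketched below. This covers only the sequence half of the representation: the structure-aware extension of the alphabet used by SSMART is not specified in the abstract and is not guessed at here. The function names are hypothetical.

```python
import re
from typing import List

# Standard IUPAC ambiguity codes, written for RNA (U instead of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus into a regular-expression pattern."""
    return "".join("[" + IUPAC_RNA[code] + "]" for code in consensus.upper())


def find_motif(consensus: str, sequence: str) -> List[int]:
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead matches without consuming characters, so overlapping
    # occurrences are all reported.
    pattern = "(?=" + consensus_to_regex(consensus) + ")"
    return [m.start() for m in re.finditer(pattern, sequence.upper())]
```

As an illustrative input, `find_motif("UGUANAUA", "GGUGUAAAUACCUGUACAUAGG")` reports the two occurrences of this Pumilio-type consensus in the made-up sequence.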

CONCLUSIONS
We propose an efficient algorithm to identify the most probable sequence-structure motif, or combination of motifs, given a large set of RNA sequences. Our method can contribute to the systematic understanding of RBP-RNA binding specificity as more genome-wide experiments that determine RBP binding are performed.

From the genome-wide in vivo RNA structure probing data to the RNA secondary structure prediction
Date: Saturday, July 22
Time: 2:00pm - 2:35pm
Room: Meeting Hall IB
  • Yiliang Ding, John Innes Centre, United Kingdom

Recurrent neural network models to quantitatively predict RNA-RNA interactions
Date: Saturday, July 22
Time: 2:35pm - 2:50pm
Room: Meeting Hall IB
  • Michelle Wu, Stanford University School of Medicine, United States
  • Johan Andreasson, Stanford University School of Medicine, United States
  • Wipapat Kladwang, Stanford University School of Medicine, United States
  • William Greenleaf, Stanford University School of Medicine, United States
  • Rhiju Das, Stanford University School of Medicine, United States

Presentation Overview:

BACKGROUND
RNA is a functionally versatile biomolecule that plays a key role in myriad novel RNA-based synthetic biology tools, including designer riboswitches controlling living cells, CRISPR/Cas9-based genome editing, RNA-based silencing, and reengineering of mRNA translation. The ability to precisely design nucleic acid interactions is critical to gaining precise control over these systems. However, doing so relies upon a thorough understanding of the energetics of RNA interactions, and existing computational models have proven inadequate to quantitatively account for the biophysical behavior of RNA molecules. Recent technological advances have allowed for the collection of quantitative biophysical information on millions of individual RNA targets [1]. Further, algorithmic advances in the field of deep learning have demonstrated unprecedented power in extracting the relevant features from large datasets [2]. We combine these high-throughput biophysical datasets with recurrent neural networks to build quantitative models of RNA interactions.

RESULTS
Tens of thousands of RNA molecules were designed by citizen scientists through the Eterna open laboratory [3]. These sequences were designed to modulate their affinity to an RNA reporter in response to the concentrations of three RNA molecules whose expression levels are indicative of active tuberculosis [4]. The design-reporter affinities were measured in up to 16 different sets of concentrations of the three input RNAs using a massively parallel RNA array platform [1], resulting in over 7 million individual affinity measurements. These Kd values were used to train a recurrent neural network model to predict reporter affinity to any design sequence in the presence of the three input RNAs at arbitrary concentrations. This model was able to achieve blind predictions with an RMSE of 0.88 kcal/mol on a test set that included conditions not seen in the training data. Further, we show that data augmentation methods specific to sequence data are able to improve the performance to 0.78 kcal/mol.
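The errors above are reported in kcal/mol although the measurements are Kd values, which suggests the affinities were expressed as free energies. The sketch below shows the standard thermodynamic conversion ΔG = RT ln(Kd), with Kd in molar units relative to a 1 M reference state, together with a plain RMSE. The abstract does not state how the conversion was done or at what temperature, so the 298.15 K default and the function names are assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar, temp_kelvin=298.15):
    """Free energy of binding (kcal/mol) from a dissociation constant in M.

    Assumption: 1 M standard state and a default temperature of 298.15 K;
    neither is given in the abstract.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_kelvin * math.log(kd_molar)


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(predicted))
```

Under these assumptions a tenfold error in predicted Kd corresponds to roughly 1.4 kcal/mol, which gives a sense of scale for the reported RMSE values.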

CONCLUSIONS
In this work, we show that recurrent neural networks are able to quantitatively predict the affinities of RNA-RNA interactions. This suggests that these models have the potential to capture complex biophysical information and may enable precise design of RNA interactions for biotechnology applications.

Efficient approximations of RNA kinetics landscape using non-redundant sampling
Date: Saturday, July 22
Time: 2:50pm - 3:10pm
Room: Meeting Hall IB
  • Juraj Michalik, Inria Saclay, France
  • Hélène Touzet, CNRS, University of Lille and INRIA, France
  • Yann Ponty, CNRS/LIX, Polytechnique, France

pulseR: Versatile computational analysis of RNA turnover from metabolic labeling experiments
Date: Saturday, July 22
Time: 3:10pm - 3:16pm
Room: Meeting Hall IB
  • Alexey Uvarovskii, University Hospital Heidelberg, German Center for Cardiovascular Research, Germany
  • Christoph Dieterich, University Hospital Heidelberg, German Center for Cardiovascular Research, Germany

Genome-wide analysis for identification of lncRNAs that sponge RNA-binding proteins
Date: Saturday, July 22
Time: 3:17pm - 3:23pm
Room: Meeting Hall IB
  • Hilal Kazan, Antalya International University, Turkey
  • Saber Hafezqorani, Middle East Technical University, Turkey

Polypyrimidine tract determines splicing efficiency of lincRNAs
Date: Saturday, July 22
Time: 3:25pm - 4:00pm
Room: Meeting Hall IB
  • David Staněk, Czech Academy of Sciences, Czech Republic

DeepBound: Accurate identification of transcript boundaries via deep convolutional neural fields
Date: Saturday, July 22
Time: 4:30pm - 4:50pm
Room: Meeting Hall IB
  • Mingfu Shao, Computational Biology Department, Carnegie Mellon University, United States
  • Jianzhu Ma, University of California, San Diego, United States
  • Sheng Wang, Department of Human Genetics, University of Chicago, United States

Presentation Overview:

Motivation: Reconstructing the full-length expressed transcripts (a.k.a. the transcript assembly problem) from the short sequencing reads produced by RNA-seq protocols plays a central role in identifying novel genes and transcripts as well as in studying gene expression and gene function. A crucial step in transcript assembly is to accurately determine the splicing junctions and boundaries of the expressed transcripts from the read alignments. In contrast to the splicing junctions, which can be efficiently detected from spliced reads, the problem of identifying boundaries remains open and challenging because the signal related to boundaries is noisy and weak.

Results: We present DeepBound, an effective approach to identify boundaries of expressed transcripts from RNA-seq read alignments. At its core, DeepBound employs deep convolutional neural fields to learn the hidden distributions and patterns of boundaries. To accurately model the transition probabilities and to solve the label-imbalance problem, we incorporate the AUC (area under the curve) score into the objective function being optimized. To address the issue that deep probabilistic graphical models require a large number of labeled training samples, we propose to use simulated RNA-seq datasets to train our model. Through extensive experimental studies on both simulated datasets of two species and biological datasets, we show that DeepBound consistently and significantly outperforms the two existing methods.

Availability: DeepBound is freely available at https://github.com/realbigws/DeepBound

Global analysis of pre-mRNA splicing uncovers the slow splicing kinetics of alternative splicing
Date: Saturday, July 22
Time: 4:50pm - 5:05pm
Room: Meeting Hall IB
  • Angela Garibaldi, University of California, Irvine, United States
  • Athit Kao, University of California, Irvine, United States
  • Anke Busch, University of California, Irvine, United States
  • Klemens Hertel, University of California, Irvine, United States

Presentation Overview:

BACKGROUND
The RNA community has established that splicing often occurs cotranscriptionally, indicating a structural and functional connection in which RNA synthesis may influence splicing decisions [1]. Alternative splicing is a eukaryotic mechanism to regulate protein expression and genetic diversity through the generation of different mRNA isoforms. RNA features such as exon/intron length and splice site strength may affect alternative splicing, resulting in exon skipping or intron retention, events that have been shown to play critical roles in normal and cancer cell development [2, 3, 4]. Indeed, recent evidence suggests that intron retention events are much more common than previously thought [5]. While functional roles for intron retention events are currently being elucidated, it is unclear how the kinetics of transcription and pre-mRNA splicing may affect how an intron is alternatively spliced.

RESULTS
To determine the kinetics of intron synthesis and removal within expressed endogenous human genes, we labeled nascent RNA with 4-thiouridine for short time intervals. After isolating 4-thiouridine-labeled nascent RNAs, expression values were determined for each intron and exon. Time course data were fit to kinetic equations that model the production and removal of each expressed intron. This resulted in the analysis of 20,783 introns and 47,243 exons. Interestingly, 5% of the evaluated introns display a retention level of greater than 20%, while about 20% of expressed exons display alternative inclusion behaviors. In general, retained introns are spliced more slowly. We show that intron length correlates poorly with intron retention. In contrast, skipped exons tend to be longer, falling outside of the accepted exon definition range. While splice site strength is known to affect alternative splicing, only retained introns were significantly affected by a weaker 3’ splice site. Surprisingly, the position within a gene promoted intron retention, but not exon skipping. Furthermore, our analysis demonstrates that pre-mRNA synthesis rates strongly influence intron removal kinetics. Unexpectedly, we observe that faster synthesis rates correlate with slower intron removal. Moreover, faster synthesis does not seem to contribute globally to exon skipping. In fact, the only uniting feature between intron retention and exon skipping is slow splicing kinetics.
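The abstract does not give the kinetic equations that were fit. One common minimal form, assumed here purely for illustration, treats an intron as produced at a constant rate k_syn and removed with a first-order rate k_rem, so that its labeled level after time t is (k_syn / k_rem)(1 - exp(-k_rem t)). The Python sketch below evaluates that curve and fits it by a simple grid search; the model, the fitting strategy, and all names are assumptions rather than the authors' method.

```python
import math


def intron_level(t, k_syn, k_rem):
    """Labeled intron level at time t under constant synthesis and
    first-order removal, starting from zero labeled RNA (assumed model)."""
    return (k_syn / k_rem) * (1.0 - math.exp(-k_rem * t))


def fit_rates(times, levels, k_grid):
    """Least-squares fit of (k_syn, k_rem) by grid search over k_rem.

    For a fixed k_rem the model is linear in the amplitude A = k_syn / k_rem,
    so the best A has a closed form; the grid value with the smallest
    squared error is returned.
    """
    best = None
    for k_rem in k_grid:
        basis = [1.0 - math.exp(-k_rem * t) for t in times]
        denom = sum(b * b for b in basis)
        if denom == 0:
            continue
        amp = sum(b * y for b, y in zip(basis, levels)) / denom
        sse = sum((amp * b - y) ** 2 for b, y in zip(basis, levels))
        if best is None or sse < best[0]:
            best = (sse, amp * k_rem, k_rem)
    if best is None:
        raise ValueError("no usable removal rate in k_grid")
    return best[1], best[2]


def half_life(k_rem):
    """Half-life of an intron with first-order removal rate k_rem."""
    return math.log(2) / k_rem
```

In this picture a slowly spliced (retained) intron corresponds to a small k_rem and therefore a long half-life.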

CONCLUSIONS
While exon skipping and intron retention both conceptually entail the alternative inclusion of a sequence, these types of alternative splicing are not affected similarly by RNA synthesis rates, position within the gene, or splice site strength. However, both are driven by slow removal rates, highlighting an underlying hallmark of alternative splicing.

Widespread regulation of transcriptional readthrough is a hallmark of the mammalian proteotoxic stress response
Date: Saturday, July 22
Time: 5:05pm - 5:20pm
Room: Meeting Hall IB
  • Reut Shalgi, Technion, Israel

Presentation Overview:

BACKGROUND
Cells and organisms live in a constantly changing environment. Cells have therefore evolved complex mechanisms to cope with and overcome various physiological and environmental stresses, involving essential transcriptional responses that facilitate survival and adaptation. Recently, numerous genome-wide studies have revealed prevalent transcription beyond known protein-coding gene loci, generating a variety of new classes of RNAs. However, their regulation and role in the cellular stress response are still largely a mystery.

RESULTS
Here we set out to characterize one such newly discovered class, stress-induced readthrough transcripts. We performed nuclear-enriched RNAseq in NIH3T3 cells after exposing them to multiple proteotoxic stresses: heat shock, oxidative stress, and osmotic stress. We wrote a pipeline to identify readthrough transcripts from RNAseq data, and compared the readthrough profiles in response to the different proteotoxic stresses. We observe massive induction of transcriptional readthrough under all stress conditions, with significant, yet incomplete, overlap of readthrough-induced loci between different stresses. We investigate potential regulators of stress-induced readthrough, and find a role for the transcription factor HSF1 in the induction of some heat shock-induced readthrough transcripts. Additionally, we demonstrate its manifestation at the level of Pol-II occupancy. We explore genomic features of readthrough transcription, and find that readthrough loci have distinct sequence characteristics. Furthermore, we analyze chromatin modification data, and observe a unique chromatin signature typical of readthrough transcripts.

CONCLUSIONS
In this study we demonstrate that transcriptional readthrough is a general, intrinsic part of the transcriptional response to proteotoxic stress in mammalian cells. Importantly, our analyses suggest that stress-induced transcriptional readthrough is not a random failure process, but is rather differentially induced across different conditions. We explore the causes and consequences of stress-induced readthrough, and find a unique chromatin signature typical of readthrough transcripts, suggesting that readthrough transcription is associated with the maintenance of an open chromatin state.

Deciphering the regulation of alternative pre-mRNA splicing by coupling RBP binding profiles with long-range RNA structure
Date: Saturday, July 22
Time: 5:20pm - 5:35pm
Room: Meeting Hall IB
  • Dmitry Svetlichnyy, Skolkovo Institute of Science and Technology, Russia
  • Dmitri Pervouchine, Skolkovo Institute of Science and Technology, Russia

Presentation Overview:

BACKGROUND
Alternative splicing is an important mechanism for generating the diversity of protein products by expanding the repertoire of mRNA isoforms. Splicing is controlled by cis-acting RNA elements of two classes: splicing signals (the 5ʹ splice site, the branch point, and the 3ʹ splice site) and splicing enhancers/silencers, on which a number of RNA-binding proteins (RBPs) operate. Splicing outcome is highly cell/tissue specific and regulated by many factors.

RESULTS
We utilized the results of the ENCODE project, in which genome-wide profiling of nearly 80 RBPs in two cancer cell lines (K562 and HepG2) has been performed using enhanced crosslinking and immunoprecipitation (eCLIP[1]), to build a machine learning model that integrates the spatial signal of RBPs around splice sites and RNA secondary structures. Importantly, the secondary structures involve long-range base pairing interactions[2], hence creating a more realistic distance metric compared to that of the unfolded RNA. The model predicts inclusion of an exon in a transcript. Utilizing only the spatial signal of various RBPs, our model achieves high performance (area under the ROC curve up to 0.89). However, the model complemented by RNA secondary structures achieves even higher performance (AUROC=0.92) and allows correct classification of exon inclusion even without RBP signals in the vicinity of splice sites. We found that exon inclusion is strongly associated with the strength of the RBP signal. Additionally, we found that CSTF2T, PRPF8, and HNRNPM are among the master splicing regulators. Moreover, RBP signals in the immediate vicinity of splice sites (up to 200 bp) have the strongest influence.
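For readers unfamiliar with the metric quoted above, the area under the ROC curve can be computed as the probability that a randomly chosen positive example (here, an included exon) receives a higher score than a randomly chosen negative one, with ties counted as one half. The Python sketch below implements that pairwise definition; it is a generic illustration of the metric, not the authors' evaluation code, and the function name is hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: iterable of 1 (positive, e.g. exon included) or 0 (negative).
    scores: model scores, higher meaning more likely positive.
    Quadratic in the number of examples; fine for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported increase from 0.89 to 0.92 means that a larger share of included/excluded exon pairs are ranked correctly.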

CONCLUSIONS
Overall, our model predicts splicing outcome and supports the hypothesis that RNA secondary structures, including long-range interactions, play an important regulatory role by localizing RBPs to the proximity of splice sites. This mode of regulation has recently been shown for structures called RNA bridges, which bring RBFOX2 binding sites to the site of action[3].

RNAcentral: The Unified Entry Point for Non-coding RNA Sequences
Date: Saturday, July 22
Time: 5:35pm - 5:41pm
Room: Meeting Hall IB
  • Anton I. Petrov

Presentation Overview:

BACKGROUND
RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that represents all types of ncRNA from a broad range of organisms [1]. RNAcentral provides a single entry point for anyone interested in ncRNA biology by integrating the data from a consortium of established RNA resources. The RNAcentral website is available at http://rnacentral.org.

RESULTS
Launched in 2014 [2], RNAcentral currently contains over ten million ncRNA sequences from more than twenty RNA databases, such as miRBase, RefSeq, GtRNAdb and others. Recent updates include ncRNA data from HGNC, Ensembl, and FlyBase. We are also integrating RNAcentral with the Rfam database [3] so that the majority of RNAcentral sequences are annotated with Rfam families and new Rfam families are built based on data from RNAcentral.

There are three main ways of browsing the data through the RNAcentral website. The text search makes it easy to explore all ncRNA sequences, compare data across different resources, and discover what is known about each ncRNA. Using the sequence similarity search, one can search data from multiple RNA databases starting from a sequence. Finally, one can explore ncRNAs in select species by genomic location using an integrated genome browser. In addition, the data can be accessed programmatically using an API or downloaded for local analysis from the FTP archive.

CONCLUSIONS
RNAcentral continues to grow, with an additional one million new non-coding RNA sequences added to the database in 2016. The website has been continuously improved, including a redesigned homepage, better search results, and a lightweight genome browser as a new entry point. Our immediate priorities include the incorporation of functional annotations of non-coding RNAs, such as intermolecular interactions, nucleotide modifications, and high-quality secondary structures. The ultimate goal of RNAcentral is to include curated information about all non-coding RNAs as UniProt does for proteins.

Integrative Deep Models for Alternative Splicing
Date: Saturday, July 22
Time: 5:41pm - 5:55pm
Room: Meeting Hall IB
  • Anupama Jha, University of Pennsylvania, United States
  • Matthew Gazzara, University of Pennsylvania Perelman School of Medicine, United States
  • Yoseph Barash, University of Pennsylvania, United States

Presentation Overview:

Motivation: Advancements in sequencing technologies have highlighted the role of alternative splicing (AS) in increasing transcriptome complexity. This role of AS, combined with the relation of aberrant splicing to malignant states, motivated two streams of research, experimental and computational. The first involves a myriad of techniques such as RNA-Seq and CLIP-Seq to identify splicing regulators and their putative targets. The second involves probabilistic models, also known as splicing codes, which infer regulatory mechanisms and predict splicing outcome directly from genomic sequence. To date, these models have utilized only expression data. In this work we address two related challenges: Can we improve on previous models for AS outcome prediction, and can we integrate additional sources of data to improve predictions for AS regulatory factors?

Results: We perform a detailed comparison of two previous modeling approaches, Bayesian and Deep Neural networks, dissecting the confounding effects of datasets and target functions. We then develop a new target function for AS prediction in exon skipping events and show it significantly improves model accuracy. Next, we develop a modeling framework that leverages transfer learning to incorporate CLIP-Seq, knockdown and overexpression experiments, which are inherently noisy and suffer from missing values. Using several datasets involving key splice factors in mouse brain, muscle and heart we demonstrate both the prediction improvements and biological insights offered by our new models. Overall, the framework we propose offers a scalable integrative solution to improve splicing code modeling as vast amounts of relevant genomic data become available.

Availability: code and data available at majiq.biociphers.org/jha_et_al_2017/