Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


Posters

Preparing your Poster - Information and Poster Size
Poster Schedule
Print your poster in Chicago
Poster Categories

View Posters By Category

Session A: (July 7 and July 8)
Session B: (July 9 and July 10)
A-1: HEPATIC miRNA PROFILE IN DENGUE HEMORRHAGIC FEVER PATIENTS AND ASSOCIATION WITH THE PATHOGENESIS
COSI: RNA
  • Layanna Oliveira, Evandro Chagas Institute, Brazil
  • João Vianez, Evandro Chagas Institute, Brazil
  • Amanda Andrade, Evandro Chagas Institute, Brazil
  • Carla Pagliari, University of São Paulo, Brazil
  • Leda Carvalho, University of São Paulo, Brazil
  • Taiana Silveira, University of São Paulo, Brazil
  • André Luiz Teles, University of São Paulo, Brazil
  • Jedson Cardoso, Evandro Chagas Institute, Brazil
  • Janaina Vasconcelos, Evandro Chagas Institute, Brazil
  • Maria Helena Maia, Universidade Federal do Pará, Brazil
  • Caroline Moreira-Nunes, Universidade Federal do Ceará, Brazil
  • Rommel Burbano, Universidade Federal do Pará, Brazil
  • Márcio Nunes, Evandro Chagas Institute, Brazil
  • Eduardo Santos, Universidade Federal do Pará, Brazil

Short Abstract: Dengue virus (DENV) causes dengue hemorrhagic fever (DHF) and affects the liver, one of the most important target tissues in severe cases. We sequenced the miRNome of human formalin-fixed paraffin-embedded (FFPE) liver tissue from ten fatal DHF cases. Using miRDeep2 and edgeR, we found eight differentially expressed miRNAs. Three of these were closely related to dengue immunopathogenesis: miR-126-5p (upregulated) is a regulatory molecule of endothelial cells, while miR-122-5p (liver-specific) and miR-146a-5p (an interferon regulator) were downregulated (Fig1). Enrichment analysis of predicted target genes of overexpressed miRNAs revealed regulatory pathways of apoptosis and immune response (Fig2). We detected 188 differentially expressed isoforms, including isoforms of the differentially expressed miRNAs, and identified divergences between isomiR and canonical miRNA expression levels. Lastly, we detected nine potential novel miRNAs targeting 76 genes, which may be involved in 131 cellular metabolic pathways and biological processes. This is the first description of the human hepatic miRNA profile in DHF cases. The results demonstrate the association of miR-126-5p, miR-122-5p and miR-146a-5p with DHF liver pathogenesis, involving endothelial repair and vascular permeability regulation, control of homeostasis and expression of inflammatory cytokines. These findings can help to elucidate the regulatory mechanisms of DHF and to inform diagnostic and antiviral therapies.

A-2: Ab initio Identification of Novel microRNA in Plants: A Systematic Review
COSI: RNA
  • Buwani Manuweera, Montana State University, United States
  • Indika Kahanda, Montana State University, United States

Short Abstract: MicroRNAs (miRNAs) play a vital role as post-transcriptional regulators of gene expression. As the experimental determination of miRNAs is highly resource-consuming and error-prone, developing computational methods has become an active research area. This study aims to identify the solutions proposed, and the problems left unresolved, by ab initio plant miRNA identification methods over the last decade. We first queried five popular scientific databases to retrieve the relevant set of articles on novel plant miRNA identification. We then carried out a comprehensive comparative analysis of their methodologies and performance. In the last decade, 16 articles were published on novel miRNA identification methods using plant datasets, and 10 of them focused entirely on plants. Thirteen studies used supervised machine learning algorithms, with support vector machines being the most popular; the rest used RNA sequence mapping strategies to identify miRNAs. We observe that, although the reported prediction accuracies of these methods are satisfactory, they still report a considerable number of false negatives. Given the large number of similar tools available for miRNA identification in animals, more studies on plant miRNA identification are needed, especially because miRNA mechanisms vary significantly between animals and plants.

A-3: Discovering microRNA-offset RNA from Next Generation Sequencing data using moRNA Finder Bioinformatics Tool
COSI: RNA
  • Yang Yang, University of Macau, China
  • Changliang Wang, University of Macau, China
  • Liang Chen, University of Macau, Macao
  • Garry Wong, University of Macau, Macao

Short Abstract: Next generation sequencing (NGS) technologies have revealed that more than 90% of eukaryotic genomes are transcribed into protein-coding or non-protein-coding RNAs, and that approximately 98% of these transcripts belong to the latter class. microRNA-offset RNA (moRNA) is a novel type of short non-protein-coding RNA (ncRNA) that was once considered a co-product or degradation product of miRNAs. Previous screening studies indicated that moRNAs may act like miRNAs in some biological processes, but knowledge of their biological function is still fragmentary. Currently, there is no publicly available bioinformatics tool for moRNA detection. We developed an effective and accurate software package named moRNA Finder that detects moRNAs from NGS data sets. Building on Bayesian statistics and a previous algorithm for miRNA detection, we constructed an optimized algorithm for moRNA identification and implemented it in a software tool. The software package provided accurate and sensitive identification of both known and novel moRNAs. This project will improve knowledge of the abundance and distribution of moRNAs in biological systems and will benefit future studies of this new class of small RNA.

A-4: RNA G-quadruplex prediction to investigate a novel RNA regulation model
COSI: RNA
  • Jean-Michel Garant, Université de Sherbrooke, Canada
  • Rachel Jodoin, Université de Sherbrooke, Canada
  • Michelle Scott, Université de Sherbrooke, Canada
  • Jean-Pierre Perreault, Université de Sherbrooke, Canada

Short Abstract: G-quadruplexes (G4) are tetra-helices formed by the stacking of planar guanine tetrads. Their folding in RNA molecules has been shown to affect mRNA post-transcriptional regulation and miRNA biogenesis. However, not enough data are available to draw conclusions on the biological functions associated with RNA G4. The G4RNA tools were developed as a first step to address this issue. The G4RNA database serves as a reference and a source of curated data for comparative analysis, and it was used to train an artificial neural network (G4NN). This approach allows the prediction of observed G4 with unusual features that classical motif searches cannot predict. G4NN provides good classification performance, which was thoroughly characterized during its optimization. It was validated using a set of G4 occurrences detected by high-throughput methods and was also shown to be very efficient at discarding randomly selected sequences from the transcriptome. G4NN is integrated into G4RNA screener, which scans RNA sequences to find favorable G4 folding conditions. G4RNA screener is used to identify and characterize sub-populations of G4 structures that act as shared regulatory features common to groups of RNA molecules. Its predictions have been tested experimentally, producing a G4-based structural sub-categorization that relates to colorectal cancer pathways.
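
The "classical motif searches" that G4NN is contrasted with usually mean a regular-expression rule: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The minimal sketch below implements only that baseline rule. It is not G4NN or G4RNA screener, and the function name `find_g4_motifs` is hypothetical.

```python
import re
from typing import List, Tuple

# Canonical G3+N1-7 rule: four G-runs (>= 3 G each) separated by three loops of 1-7 nt.
# Both DNA (T) and RNA (U) alphabets are accepted.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3}G{3,}")


def find_g4_motifs(seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for non-overlapping canonical G4 motifs (0-based, end-exclusive)."""
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]


if __name__ == "__main__":
    for start, end, motif in find_g4_motifs("AAGGGAGGGAGGGAGGGAA"):
        print(start, end, motif)
```

Structures that depart from this pattern, for example two-quartet G4 or G4 with loops longer than 7 nt, are missed by such a rule, which is the kind of gap a learned classifier is meant to address.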

A-5: Assessing the biological signal of different RNA fractions for computational deconvolution of healthy tissues
COSI: RNA
  • Francisco Avila Cobos, Ghent University, Belgium
  • Lucía Lorenzi, Ghent University, Belgium
  • Jo Vandesompele, Ghent University, Belgium
  • Gary Schroth, Illumina, United States
  • Katleen De Preter, Ghent University, Belgium
  • Pieter Mestdagh, Ghent University, Belgium

Short Abstract: Multiple approaches have been developed to infer the abundance of different cell types in heterogeneous samples (computational deconvolution). Although potentially applicable to different RNA fractions, current methods have been designed and tested on mRNAs only. Using expression data of long non-coding RNAs, circular RNAs, microRNAs and mRNAs from RNA-sequencing data across 160 normal cell types and 45 tissues from the RNA Atlas project, we investigated the performance of additional RNA fractions in computational deconvolution. Tissues and cell types in the RNA Atlas were matched based on the UBERON ontology. For each cell type, we defined cell-type-specific markers based on matching mRNA, lncRNA, miRNA and circRNA expression data. These markers were subsequently used to determine the proportion of each cell type in each of the tissues through computational deconvolution. For any given tissue, we defined the “signal” as the sum of the proportions of all its constituent cell types. This signal was computed separately for mRNA, miRNA, lncRNA and circRNA markers. We found that mRNAs contained the highest amount of biological signal across tissues, closely followed by lncRNAs. Furthermore, despite their lower overall performance, both miRNAs and circRNAs can deconvolve specific tissues with higher accuracy than mRNAs and lncRNAs.
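
The abstract does not state which deconvolution algorithm was used. One common formulation is non-negative least squares (NNLS): find cell-type proportions p >= 0 that minimize ||S p - m||^2, where S holds marker expression per cell type and m is the tissue profile. The "signal" is then the summed proportion of the tissue's constituent cell types. The sketch below solves NNLS by projected gradient descent in plain Python; the solver choice and all function names are assumptions for illustration.

```python
from typing import List, Sequence


def nnls_projected_gradient(S: Sequence[Sequence[float]], m: Sequence[float],
                            iters: int = 5000) -> List[float]:
    """Minimize ||S p - m||^2 subject to p >= 0 (S: genes x cell types, m: tissue profile)."""
    n_genes, n_types = len(S), len(S[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S from above,
    # so 1 / lip is a safe (if conservative) step for the gradient of 0.5 * ||S p - m||^2.
    lip = sum(v * v for row in S for v in row)
    p = [0.0] * n_types
    if lip == 0:
        return p
    step = 1.0 / lip
    for _ in range(iters):
        resid = [sum(S[g][k] * p[k] for k in range(n_types)) - m[g] for g in range(n_genes)]
        grad = [sum(S[g][k] * resid[g] for g in range(n_genes)) for k in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    return p


def tissue_signal(proportions: Sequence[float], constituents: Sequence[int]) -> float:
    """Signal as defined in the abstract: summed proportions of the tissue's constituent cell types."""
    return sum(proportions[k] for k in constituents)
```

Running the same procedure once per marker set (mRNA, lncRNA, miRNA, circRNA) and comparing the resulting signals mirrors the comparison described above.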

A-6: Ushering in a new era of CLIP-guided detection of miRNA targets
COSI: RNA
  • Maria D Paraskevopoulou, University of Thessaly, Greece
  • Dimitra Karagkouni, University of Thessaly, Hellenic Pasteur Institute, Greece
  • Ioannis S Vlachos, University of Thessaly, Greece
  • Spyros Tastsoglou, University of Thessaly, Hellenic Pasteur Institute, Greece
  • Artemis G Hatzigeorgiou, University of Thessaly, Hellenic Pasteur Institute, Greece

Short Abstract: AGO-PAR-CLIP is considered one of the most powerful high-throughput methodologies for miRNA target identification. To date, PAR-CLIP experiments have been performed in numerous tissues and cell types under physiological or pathological conditions. Current AGO-CLIP-guided implementations have limitations that undermine the central position of these experiments in the characterization of the miRNA targetome. They depend strongly on T-to-C conversions to define miRNA binding events, while the functionality of the neglected interactions remains unknown. By analyzing miRNA perturbation experiments and structural sequencing data, we showed that the previously neglected non-T-to-C clusters exhibit functional miRNA binding events and strong accessibility. Our findings are integrated in microCLIP, an innovative in silico framework based on deep structured learning for CLIP-Seq-guided detection of miRNA interactions. microCLIP was trained and evaluated against a compendium of miRNA binding sites deduced by numerous low-yield techniques and the analysis of more than 200 high-throughput experiments. Unlike existing implementations, microCLIP operates on every AGO-enriched cluster. The proper incorporation of non-T-to-C clusters yields an average 14% increase in miRNA-target interactions per PAR-CLIP library, uncovering previously elusive regulatory events. The microCLIP framework robustly identifies 1.6-fold more validated binding sites than state-of-the-art algorithms, ushering in a new era of experimentally supported miRNA target annotation.
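
In PAR-CLIP, 4-thiouridine crosslinking typically shows up as T-to-C mismatches in aligned reads, and the abstract's point is that clusters without such conversions still carry functional binding events. The hypothetical helpers below only illustrate the T-to-C versus non-T-to-C distinction for ungapped read/reference pairs. They are not part of microCLIP, which scores every AGO-enriched cluster with a learned model instead of filtering on conversions.

```python
from typing import Iterable, Tuple


def count_t_to_c(ref: str, read: str) -> int:
    """Count T->C mismatches between a reference segment and an equal-length, gap-free aligned read."""
    if len(ref) != len(read):
        raise ValueError("expects an ungapped alignment of equal length")
    return sum(1 for r, q in zip(ref.upper(), read.upper()) if r == "T" and q == "C")


def classify_cluster(alignments: Iterable[Tuple[str, str]]) -> str:
    """Label a read cluster 'T-to-C' if any (ref, read) pair carries a T->C conversion."""
    has_conversion = any(count_t_to_c(ref, read) > 0 for ref, read in alignments)
    return "T-to-C" if has_conversion else "non-T-to-C"
```

A conversion-dependent pipeline would keep only the first label; the abstract argues that clusters with the second label should be scored too.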

A-7: Integrating different transcription profiling data to determine mRNA stability upon host-pathogen interaction
COSI: RNA
  • Pooja Sethiya, University Of Macau, Macao
  • Maruti Nandan Rai, University Of Macau, Macao
  • Koon Ho Wong, University Of Macau, Macao

Short Abstract: Candida glabrata is an opportunistic pathogen that causes deadly infections in immunocompromised individuals. To understand how this pathogen maintains homeostasis and establishes virulence in hosts, we set out to map gene expression changes during macrophage infection. Many genome-wide studies focus on gene expression at the transcription level, but the expression of genes depends on mRNA stability and translation in addition to the rate of mRNA synthesis. Here, we perform an integrated analysis of RNA Pol II occupancy by ChIP-Seq and mRNA levels by RNA-Seq to infer mRNA stability upon Candida glabrata infection of macrophage cells. We identified many genes whose ratio of transcription (by Pol II occupancy) to mRNA level changes significantly upon infection, suggesting that the stability of those transcripts is altered during infection. Our preliminary results reveal changes in transcript stability for different classes of genes with specific functions; for instance, transcripts of genes involved in ribosome biogenesis and amino acid metabolism become unstable after C. glabrata enters the macrophage host. This suggests that coordinated stability of related transcripts is a mechanism used by cells to adapt to changing environments. Our method provides a convenient means to determine mRNA stability in any organism, for a better understanding of gene expression regulation under a given environmental condition.
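
The ratio works through a steady-state argument: if mRNA level is roughly synthesis rate divided by decay rate, then mRNA level divided by a synthesis proxy (Pol II occupancy) is proportional to 1 / decay rate, i.e. to stability. A minimal sketch of that calculation follows, assuming per-gene normalized Pol II ChIP-Seq and RNA-Seq values are already available. The pseudocount and function names are illustrative choices, not the authors' pipeline.

```python
import math
from typing import Dict, Tuple

Profile = Dict[str, Tuple[float, float]]  # gene -> (mRNA level, Pol II occupancy)


def stability_index(mrna: float, pol2: float, pseudo: float = 1.0) -> float:
    """log2 ratio of steady-state mRNA level to Pol II occupancy (a synthesis proxy)."""
    return math.log2((mrna + pseudo) / (pol2 + pseudo))


def stability_change(control: Profile, infected: Profile, pseudo: float = 1.0) -> Dict[str, float]:
    """Per-gene change (infected minus control); negative values suggest destabilized transcripts."""
    return {
        gene: stability_index(*infected[gene], pseudo) - stability_index(*control[gene], pseudo)
        for gene in control.keys() & infected.keys()
    }
```

A gene whose Pol II occupancy stays constant while its mRNA level halves would get a change of about -1, consistent with reduced transcript stability.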

A-8: Heteroformity: a global descriptor of alternative splicing
COSI: RNA
  • Stephen Mount, University of Maryland, College Park, United States
  • David Crawford, Univ. of Maryland, College Park, MD, United States

Short Abstract: High-throughput sequencing of mRNA, together with software (e.g. Salmon and Kallisto) that allows efficient processing of thousands of samples, makes possible a global description of alternative splicing at the level of a sample, a tissue or a species. We define heteroformity, a measure of transcript diversity, as the fraction of transcript pairs drawn at random from a single gene that differ. The heteroformity of a gene thus varies between 0 (a single isoform) and 1 (the limit, when every transcript is different). The abundance-weighted gene heteroformity for a sample can be visualized as a cumulative distribution. Application to 11,688 human samples from 30 tissues in the Genotype-Tissue Expression (GTEx) project revealed some general patterns. About 25% of transcripts from all samples lie in genes with very low heteroformity, while the top quartile of transcripts lies in genes with heteroformity above 0.5. Tissues differ: reproductive and nervous tissues show more heteroformity. Nevertheless, there is great individual variation, with specific heart samples varying over threefold. We note that overall heteroformity and differential alternative splicing are distinct measures; both reveal patterns of regulation. We are currently applying heteroformity to diverse samples and exploring its properties as a robust and useful metric.
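
If the two transcripts of a pair are drawn independently (with replacement) in proportion to isoform abundance, the probability that they differ is 1 - sum(p_i^2), where p_i is the relative abundance of isoform i. This agrees with the stated bounds: 0 for a single isoform, approaching 1 as abundance spreads over many isoforms. The sketch below assumes that sampling model and also builds an abundance-weighted cumulative distribution for one sample; the function names are hypothetical.

```python
from typing import List, Mapping, Tuple


def heteroformity(isoform_abundance: Mapping[str, float]) -> float:
    """Probability that two transcripts drawn (with replacement) from one gene are different isoforms."""
    total = sum(isoform_abundance.values())
    if total <= 0:
        raise ValueError("gene has no expressed transcripts")
    return 1.0 - sum((a / total) ** 2 for a in isoform_abundance.values())


def weighted_cdf(genes: Mapping[str, Mapping[str, float]]) -> List[Tuple[float, float]]:
    """(heteroformity, cumulative fraction of transcripts) points, weighting genes by total abundance."""
    points = sorted(
        (heteroformity(iso), sum(iso.values()))
        for iso in genes.values()
        if sum(iso.values()) > 0
    )
    total = sum(w for _, w in points)
    cdf, running = [], 0.0
    for h, w in points:
        running += w
        cdf.append((h, running / total))
    return cdf
```

Two equally abundant isoforms give 0.5 and four give 0.75, so a gene above 0.5 in this measure has more isoform diversity than an even two-way split.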

A-9: Aberrant splicing in B-cell acute lymphoblastic leukemia
COSI: RNA
  • Ammar Naqvi, Children's Hospital of Philadelphia, United States
  • Kathryn Black, Children's Hospital of Philadelphia, United States
  • Katharina Hayer, Children's Hospital of Philadelphia, United States
  • Scarlett Y. Yang, Children's Hospital of Philadelphia, United States
  • Elisabeth Gillespie, Children's Hospital of Philadelphia, United States
  • Asen Bagashev, Children's Hospital of Philadelphia, United States
  • Vinodh Pillai, Children's Hospital of Philadelphia, United States
  • Sarah Tasian, Children's Hospital of Philadelphia, United States
  • Matthew R. Gazzara, University of Pennsylvania, United States
  • Martin Carrol, University of Pennsylvania, United States
  • Deanne Taylor, Children's Hospital of Philadelphia, United States
  • Kristen W. Lynch, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States
  • Andrei Thomas-Tikhonenko, Children's Hospital of Philadelphia, United States

Short Abstract: Aberrant splicing is a hallmark of leukemias with mutations in splicing factor (SF)-encoding genes. Here we investigated its prevalence in pediatric B-cell acute lymphoblastic leukemias (B-ALL), where SFs are not mutated. By comparing them to normal pro-B cells, we found thousands of aberrant local splice variations (LSVs) per sample, with 279 LSVs in 241 genes present in every comparison. These genes were enriched in RNA processing pathways and encoded ~100 SFs, e.g. hnRNPA1. The hnRNPA1 3’UTR was pervasively misspliced, yielding a transcript subject to nonsense-mediated decay. We therefore knocked down hnRNPA1 in B-lymphoblastoid cells, identified 213 hnRNPA1-dependent splicing events, and defined the hnRNPA1 splicing signature in pediatric leukemias. One of its elements was DICER1, a known tumor suppressor gene; its LSVs were consistent with reduced translation of DICER1 mRNA. Additionally, we searched for LSVs in other leukemia and lymphoma drivers and discovered 81 LSVs in 41 genes, 77 of which were confirmed using two large independent B-ALL RNA-seq datasets. In fact, the twenty most common B-ALL drivers showed a higher prevalence of aberrant splicing than of somatic mutations. Thus, post-transcriptional deregulation of SFs can drive widespread changes in B-ALL splicing and likely contribute to disease pathogenesis.
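
A local splice variation groups the junctions that start from, or end in, a single reference exon, and its quantification rests on the fraction of junction-spanning reads assigned to each junction (percent spliced in, PSI). The sketch below computes naive read-count PSI and delta-PSI between a case and a control. Dedicated LSV tools use considerably more careful statistical models; the junction names and counts here are hypothetical.

```python
from typing import Dict, Mapping


def junction_psi(junction_reads: Mapping[str, int]) -> Dict[str, float]:
    """Naive PSI: fraction of the LSV's junction-spanning reads that use each junction."""
    total = sum(junction_reads.values())
    if total == 0:
        raise ValueError("no reads support this LSV")
    return {j: r / total for j, r in junction_reads.items()}


def delta_psi(case: Mapping[str, int], control: Mapping[str, int]) -> Dict[str, float]:
    """Per-junction PSI difference (case minus control) for the same LSV."""
    psi_case, psi_ctrl = junction_psi(case), junction_psi(control)
    return {j: psi_case.get(j, 0.0) - psi_ctrl.get(j, 0.0) for j in psi_case.keys() | psi_ctrl.keys()}
```

For example, a junction used by 30 of 40 reads in a leukemia sample but 10 of 40 in pro-B cells has a delta-PSI of 0.5.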

A-10: TurboFold II: RNA Structural Alignment and Secondary Structure Prediction Informed by Multiple Homologs
COSI: RNA
  • Zhen Tan, University of Rochester, United States
  • Gaurav Sharma, University of Rochester, United States
  • David Mathews, University of Rochester, United States

Short Abstract: With an increasing number of non-coding RNA families being identified, there is strong interest in developing computational methods to estimate sequence alignment and secondary structure. I developed TurboFold II, an algorithm that takes multiple unaligned homologous RNA sequences and outputs the predicted secondary structures and the structural alignment of the sequences. Secondary structure conservation information is incorporated into the alignment through a match score, calculated from estimated base pairing probabilities, that represents the secondary structural similarity between nucleotide positions in two sequences. TurboFold II computes a multiple sequence alignment based on a probabilistic consistency transformation and a hierarchically computed guide tree. The TurboFold II algorithm was also modified to predict RNA secondary structures using base pairing probabilities guided by SHAPE experimental data. Results demonstrate that SHAPE mapping data for one sequence improve the structure prediction accuracy of other homologous sequences beyond the accuracy obtained by sequence comparison alone. To assess TurboFold II, its sequence alignments and structure predictions were compared with those of leading tools. TurboFold II has alignment accuracy comparable to MAFFT and higher than other tools. TurboFold II also has structure prediction accuracy comparable to the original TurboFold algorithm, which is one of the most accurate methods.

A-11: RNA atlas: a nucleotide resolution map of the human transcriptome
COSI: RNA
  • Lucia Lorenzi, Ghent University, Belgium
  • Francisco Avila Cobos, Ghent University, Belgium
  • Robrecht Cannoodt, Ghent University, Belgium
  • Hua-Sheng Chiu, Texas Children’s Cancer Center, Baylor College of Medicine, United States
  • Tine Goovaerts, Ghent University, Belgium
  • Stephen Gross, Illumina, San Diego, California, United States
  • Tom Taghon, Ghent University, Belgium
  • Karim Vermaelen, Ghent University, Belgium
  • Ken Bracke, Ghent University, Belgium
  • Yvan Saeys, Ghent University, Belgium
  • Jeroen Galle, Ghent University, Belgium
  • Tim De Meyer, Ghent University, Belgium
  • Pieter-Jan Volders, Ghent University, Belgium
  • Thomas Hansen, Aarhus University, Denmark
  • Jørgen Kjems, Aarhus University, Denmark
  • Pavel Sumazin, Texas Children’s Cancer Center, Baylor College of Medicine, United States
  • Gary Schroth, Illumina, San Diego, California, United States
  • Jo Vandesompele, Ghent University, Belgium
  • Pieter Mestdagh, Ghent University, Belgium

Short Abstract: Technological advances in RNA expression profiling have revealed that our genome is pervasively transcribed, producing an unexpectedly complex transcriptome consisting of various classes of RNA molecules and a huge isoform diversity. Many of these RNAs show high tissue specificity, with some being expressed in only one or a few cell types. While numerous large-scale RNA-sequencing studies have been performed, the samples involved are often complex tissues, masking transcripts expressed in low-frequency cell populations, and sequencing methods typically focus on one class of RNA transcripts. By applying complementary RNA sequencing methods (total RNA, poly-A RNA and small-RNA sequencing) across an extensive cohort of 300 human samples, we captured a wide variety of human transcripts, including protein coding genes, miRNAs, circular RNAs and long non-coding RNAs, a large fraction of which were previously unknown. We found that many non-coding RNAs show variable polyadenylation status across samples. We also compared cell-type specificity between different RNA species. Our results confirm the dynamic nature of the transcriptome, with many RNAs being expressed in only a limited number of cell types. The RNA atlas constitutes a unique resource for further studies on the function, organization and regulation of the different layers of the human transcriptome.

A-12: Using co-expression networks and predictive models to infer circular RNA regulatory function in colitis models
COSI: RNA
  • Bojan Losic, Icahn School of Medicine at Mount Sinai Hospital, United States
  • Nicholas Akers, Icahn Institute for Multiscale Biology, United States
  • Carmen Argmann, Icahn Institute for Multiscale Biology, United States
  • Lauren Peters, Icahn Institute for Multiscale Biology, United States
  • Josh Friedman, Janssen Research, United States
  • Sergio Lira, Icahn Institute for Multiscale Biology, United States
  • X Huabao, Icahn Institute for Multiscale Biology, United States
  • Eric Schadt, Icahn Institute for Multiscale Biology, United States

Short Abstract: Circular RNAs (cRNA) are increasingly recognized as an important class of noncoding RNA that are pervasively expressed in a variety of eukaryotes, display significant conservation across mammals, and are coherently expressed independently of their cognate linear isoforms. Their functional role and biogenesis remain largely unknown. Here we studied the role of cRNA in animal models of the onset of inflammatory bowel disease (IBD). Leveraging ribosomal RNA-depleted RNA sequencing of 403 C57BL/6 mice in longitudinal dextran sulfate sodium (DSS) and adoptive T-cell transfer models, we compared the power of cRNA and mRNA expression signatures to predict disease severity and evolution, jointly modeled cRNA and mRNA via co-expression networks to identify key drivers of colitis development, and finally detected cognate linear and circular RNAs that displayed evidence of regulating different phenotypes in order to infer cRNA function. We found that cRNA signatures derived from blood rival mRNA signatures from tissue in predicting colitis severity. Furthermore, co-expression networks identify cRNA disease drivers, and our results suggest that identifying differential phenotype associations between cognate cRNA/mRNA pairs facilitates scalable functional cRNA screening.

A-13: Rfam: The transition to a genome-centric sequence database
COSI: RNA
  • Ioanna Kalvari, EMBL-EBI, United Kingdom
  • Joanna Argasinska, EMBL-EBI, United Kingdom
  • Eric Nawrocki, NCBI, United States
  • Anton Petrov, EMBL-EBI, United Kingdom
  • Rob Finn, EMBL-EBI, United Kingdom
  • Alex Bateman, EMBL-EBI, United Kingdom

Short Abstract: Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Rfam currently contains 2,772 families and continues to grow. Starting with release 13.0, Rfam switched to a new genome-based sequence database, which currently includes a non-redundant set of over 14,000 reference genomes identified by UniProt. The new database is more scalable and gives a more accurate view of the distribution of Rfam entries. Using complete genomes enables meaningful taxonomic comparisons and the identification of the repertoire of RNA families found in a given species. The text search of the Rfam website has also been significantly improved: users can now search Rfam more easily with a new, more powerful faceted text search. For example, it is possible to explore RNA families or ncRNAs in any annotated genome and compare annotations across genomes. The transition to a genome-centric sequence database and the new website features make Rfam a more valuable resource for the sequence analysis community. Rfam is available at http://rfam.org.

A-14: Accurate assembly of transcripts through phase-preserving graph decomposition
COSI: RNA
  • Mingfu Shao, Carnegie Mellon University, United States
  • Carl Kingsford, Carnegie Mellon University, United States

Short Abstract: We introduce Scallop, an accurate reference-based transcript assembler that improves the reconstruction of multi-exon and lowly expressed transcripts. Scallop preserves long-range phasing paths extracted from reads while producing a parsimonious set of transcripts and minimizing coverage deviation. On 10 human RNA-seq samples, Scallop produces 34.5% and 36.3% more correct multi-exon transcripts than StringTie and TransComb, respectively, and identifies 67.5% and 52.3% more lowly expressed transcripts, respectively. Scallop achieves higher sensitivity and precision than previous approaches over a wide range of coverage thresholds.

A-15: RNAcentral: The unified database of ncRNA sequences with comprehensive genomic mapping and improved quality controls
COSI: RNA
  • Blake Sweeney, EMBL-EBI, United Kingdom
  • Boris Burkov, EMBL-EBI, United Kingdom
  • Anton Petrov, EMBL-EBI, United Kingdom
  • Rob Finn, EMBL-EBI, United Kingdom
  • Alex Bateman, EMBL-EBI, United Kingdom
  • The RNAcentral Consortium, EMBL-EBI, United Kingdom

Short Abstract: Background: The RNAcentral database (http://rnacentral.org) is a continuously growing, comprehensive collection of over 11 million non-coding RNA (ncRNA) sequences of all types across a broad range of organisms. RNAcentral integrates over 25 expert resources, such as miRBase, LNCipedia, HGNC, and Ensembl, and provides integrated faceted text and sequence search. Results: To identify potentially inconsistent annotations, RNAcentral implemented new quality control procedures that annotate all RNAcentral sequences with Rfam families. These procedures warn users about partial sequences and potential contamination, allowing users to identify and exclude problematic sequences from search results. Additionally, Rfam is used to annotate sequences with GO terms. RNAcentral is one of the largest sources of genome-level ncRNA annotations, as it maintains a mapping of all ncRNA sequences from key species to reference genomes, including sequences without annotated genomic locations or coming from non-reference assemblies. The data are available in a genome browser, as a set of track hubs, and in multiple downloadable formats. Conclusions: The RNAcentral website has been continuously improved with an updated text search interface and a feature viewer displaying Rfam annotations and modified nucleotides. We welcome feedback about the resource and invite new member databases to join RNAcentral.

A-16: LinearFold: Linear-Time Prediction of RNA Secondary Structures
COSI: RNA
  • Dezhong Deng, School of EECS, Oregon State University, United States
  • Kai Zhao, Google, United States
  • David Hendrix, Oregon State University, United States
  • David Mathews, University of Rochester, United States
  • Liang Huang, School of EECS, Oregon State University, United States

Short Abstract: Predicting the secondary structure of an RNA sequence quickly and accurately is useful in many applications such as drug design. State-of-the-art predictors have a fundamental limitation: their runtime scales cubically with the length of the input sequence, which is slow for longer RNAs and limits the use of secondary structure prediction in genome-wide applications. To address this bottleneck, we designed the first linear-time algorithm for this problem, which can be used with both thermodynamic and machine-learned scoring functions. Our algorithm, like previous work, is based on dynamic programming (DP), but with two crucial differences: (a) we process the sequence incrementally from left to right rather than in a bottom-up fashion, and (b) because of this incremental processing, we can further employ beam search pruning to ensure linear runtime in practice (at the cost of exact search). Even though our search is approximate, it surprisingly results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S ribosomal RNAs), as well as improved accuracy for long-range base pairs (500+ nucleotides apart).
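
To make "left-to-right with beam pruning" concrete, here is a toy sketch, not LinearFold itself: it maximizes the number of base pairs (a Nussinov-style score, assumed here in place of thermodynamic or learned scoring) and uses a simple stack of unclosed positions as its state. Each nucleotide is either left unpaired, opens a pair, or closes the most recent open position; states with identical stacks are merged, since they share the same future, and only the `beam_size` best states survive each step. For a fixed beam, the number of states per position is bounded, so the work grows roughly linearly with length (ignoring the cost of copying the toy's tuple stacks and strings). All names and the pairing rules are assumptions.

```python
from typing import Dict, Optional, Tuple

# Watson-Crick plus G-U wobble pairs (a toy scoring assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

State = Tuple[int, ...]  # stack of positions still waiting for a partner


def _relax(cands: Dict[State, Tuple[int, str]], stack: State, score: int, db: str) -> None:
    """Keep only the best-scoring history for each stack."""
    best = cands.get(stack)
    if best is None or score > best[0]:
        cands[stack] = (score, db)


def beam_fold(seq: str, beam_size: int = 100, min_loop: int = 3) -> Optional[Tuple[int, str]]:
    """Left-to-right beam search maximizing base pairs; returns (pairs, dot-bracket) or None."""
    seq = seq.upper().replace("T", "U")
    beam: Dict[State, Tuple[int, str]] = {(): (0, "")}
    for j, nt in enumerate(seq):
        cands: Dict[State, Tuple[int, str]] = {}
        for stack, (score, db) in beam.items():
            _relax(cands, stack, score, db + ".")         # j stays unpaired
            _relax(cands, stack + (j,), score, db + "(")  # j opens a pair
            if stack:
                i = stack[-1]
                # j closes the most recent open position i, which keeps structures nested
                if j - i > min_loop and (seq[i], nt) in PAIRS:
                    _relax(cands, stack[:-1], score + 1, db + ")")
        # Beam pruning: keep only the beam_size highest-scoring states.
        ranked = sorted(cands.items(), key=lambda kv: kv[1][0], reverse=True)
        beam = dict(ranked[:beam_size])
    return beam.get(())  # only states with no unclosed "(" are complete structures
```

With a very large beam the search is exact for this toy score; shrinking `beam_size` trades exactness for speed, because ranking uses only the score so far and a state that would pay off later can be pruned.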

A-17: High-throughput Analysis of Complex NanoString Expression Datasets
COSI: RNA
  • Nathan Riccitelli, Navigate BioPharma Services Inc., United States
  • Christian Laing, Navigate BioPharma Services Inc., United States
  • Jesus Zaragoza-Alvarez, Navigate BioPharma Services Inc., United States
  • Reinhold Pollner, Navigate BioPharma Services Inc., United States

Short Abstract: Accurate measurement of RNA expression is crucial in the quest to understand disease and identify drug targets. NanoString RNA assays represent an optimal solution to this challenge by using a direct, amplification-free expression measurement system to simultaneously detect hundreds of targets. Although NanoString assays overcome the multiplexing limitations of traditional qPCR-based approaches and the stringent RNA quality and purity requirements of sequencing, robust data processing is crucial for reliable results. NavSIVRAC, a data analysis pipeline developed at Navigate BioPharma Services, Inc., a Novartis subsidiary, is a modularly designed collection of open source and custom algorithms. It allows rapid implementation of novel analysis scripts and provides the high degree of flexibility needed to suit individual clinical trial needs. The system integrates sample demographic information with the NanoString Digital Analyzer, applies custom normalization strategies, and performs differential gene expression analysis using clustering and statistical inference. It also provides visualization tools for the raw and/or analyzed data for final result formatting and reporting. We have applied NavSIVRAC to multiple placebo/drug sets to aid in RNA profiling studies. Our pipeline identifies clear gene differentiation patterns among the data sets, maximizing the value of clinical information obtained from NanoString gene expression assays.
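
NavSIVRAC's custom normalization strategies are not described in the abstract. For orientation only, one widely used NanoString scheme scales each sample by the geometric mean of its housekeeping genes so that all samples share a common reference level. The sketch below implements that generic scheme; gene names, sample names and function names are hypothetical.

```python
import math
from typing import Dict, Sequence

Counts = Dict[str, Dict[str, float]]  # sample -> gene -> raw count


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def housekeeping_normalize(counts: Counts, housekeeping: Sequence[str]) -> Counts:
    """Scale each sample so its housekeeping geometric mean equals the across-sample average."""
    hk = {s: geometric_mean([genes[h] for h in housekeeping]) for s, genes in counts.items()}
    reference = sum(hk.values()) / len(hk)
    return {
        s: {g: c * reference / hk[s] for g, c in genes.items()}
        for s, genes in counts.items()
    }
```

Because each sample is divided by its own housekeeping level, a sample loaded at twice the input yields the same normalized target counts as its half-loaded counterpart.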

A-18: Identification of RNA-Binding Protein Targets with HyperTRIBE
COSI: RNA
  • Reazur Rahman, Brandeis University, United States
  • Weijin Xu, Brandeis University, United States
  • Michael Rosbash, Brandeis University, United States

Short Abstract: RNA binding proteins (RBPs) accompany RNA from birth to death, affecting RNA biogenesis and function. Identifying RBP-RNA interactions is essential to understand their complex roles in different cellular processes. However, detecting the in vivo RNA targets of RBPs, especially in a small number of discrete cells, has been technically challenging. We previously developed a technique called TRIBE (Targets of RNA-binding proteins Identified By Editing) to overcome this problem. TRIBE expresses a fusion protein consisting of a queried RBP and the catalytic domain of the RNA editing enzyme ADAR (ADARcd), which marks target RNA transcripts by converting adenosine to inosine near the RBP binding sites. These marks can subsequently be identified via high-throughput sequencing. Despite its usefulness, TRIBE is constrained by low editing efficiency and the editing-sequence bias of the ADARcd. We therefore developed HyperTRIBE by incorporating a previously characterized hyperactive mutation, E488Q, into the ADARcd. This strategy increases editing efficiency and reduces sequence bias, dramatically increasing the sensitivity of the technique without sacrificing specificity. HyperTRIBE provides a more powerful strategy to identify RNA targets of RBPs, with an easy experimental and computational protocol, at low cost in both flies and mammals.
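
Because inosine is read as guanosine during reverse transcription and sequencing, TRIBE-style edits appear as A-to-G mismatches relative to the reference. The sketch below computes per-site editing fractions and applies coverage and fraction cut-offs. The thresholds and names are hypothetical, and counts are assumed to be already strand-corrected (on the reverse strand the same edit appears as T-to-C).

```python
from typing import Dict, List, Tuple


def editing_fraction(a_reads: int, g_reads: int) -> float:
    """Fraction of reads at a reference-A site that show G (i.e. were edited to inosine)."""
    total = a_reads + g_reads
    return g_reads / total if total else 0.0


def call_edit_sites(site_counts: Dict[str, Tuple[int, int]],
                    min_coverage: int = 20, min_fraction: float = 0.1) -> List[str]:
    """Return sites (sorted) with enough coverage and a high enough A->G editing fraction."""
    calls = []
    for site, (a, g) in site_counts.items():
        if a + g >= min_coverage and editing_fraction(a, g) >= min_fraction:
            calls.append(site)
    return sorted(calls)
```

Higher editing efficiency, as claimed for the E488Q mutant, would raise these per-site fractions and let more true target sites pass a fixed threshold.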

A-19: Generating full-length, high quality human transcriptomes from PacBio Iso-seq data
COSI: RNA
  • Dana Wyman, University of California, Irvine, Center for Complex Biological Systems, Irvine, CA, United States
  • Gabriela Balderrama-Gutierrez, University of California, Irvine, United States
  • Shan Jiang, University of California, Irvine, Department of Developmental and Cell Biology, Irvine, CA, United States
  • Weihua Zeng, University of California, Irvine, Department of Developmental and Cell Biology, Irvine, CA, United States
  • Brian Williams, California Institute of Technology, United States
  • Barbara Wold, California Institute of Technology, United States
  • Ali Mortazavi, University of California, Irvine, Department of Developmental and Cell Biology, Irvine, CA, United States

Short Abstract: Conventional short-read RNA sequencing has been widely used to quantify gene expression in a variety of applications. However, short reads on their own cannot resolve full-length isoforms, which can be several kilobases long. Furthermore, computational methods for reconstructing isoforms from short-read data face many challenges, and different algorithms tend to give inconsistent results. Although long-read sequencing technologies such as PacBio Iso-seq and Oxford Nanopore have higher error rates than Illumina sequencing, they have great potential for isoform discovery and for characterizing the 90% of multi-exon human genes that are thought to undergo alternative splicing. To take advantage of these properties, we developed a computational pipeline that processes long reads into cleaned isoforms and generates a high-quality, full-length transcriptome. We demonstrate this process on PacBio Iso-seq data from the human cell lines K562, GM12878 and HepG2, and by comparing the results with existing ENCODE data we show that the technology is mature enough to produce full-length transcriptomes.
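The abstract does not describe the pipeline's individual steps. One idea used by many long-read transcriptome tools is to group reads that share the same chain of splice junctions (intron chain), tolerating variation at the transcript ends. The Python sketch below illustrates only that grouping step, under the assumption that reads are already aligned and split into exon blocks; the function names and input layout are hypothetical, and real pipelines also correct junction errors and handle 5' degradation.

```python
from collections import defaultdict


def intron_chain(exons):
    """exons: sorted list of (start, end) aligned blocks for one read."""
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))


def collapse_by_intron_chain(reads):
    """reads: {read_id: (chrom, strand, exons)}.

    Returns {(chrom, strand, intron_chain): [read_ids]}; reads that differ
    only in their start/end coordinates fall into the same group.
    """
    groups = defaultdict(list)
    for rid, (chrom, strand, exons) in reads.items():
        chain = intron_chain(sorted(exons))
        if chain:  # mono-exonic reads are skipped in this toy example
            groups[(chrom, strand, chain)].append(rid)
    return dict(groups)
```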

A-20: High-confidence computational identification of microRNAs that post-transcriptionally regulate sulfotransferase 2A1
COSI: RNA
  • Dongying Li, US Food and Drug Administration, United States
  • Bridgett Knox, US Food and Drug Administration, United States
  • Leihong Wu, NCTR, FDA, United States
  • Gokhan Yavas, US Food and Drug Administration, United States
  • Wenming Xiao, National Center for Toxicological Research, United States
  • Weida Tong, National Center for Toxicological Research, United States
  • Baitang Ning, US Food and Drug Administration, United States

Short Abstract: MicroRNAs (miRNAs) play important roles in interindividual variability in drug safety by modulating the expression of drug-metabolizing enzymes. The Phase II drug-metabolizing enzyme sulfotransferase 2A1 (SULT2A1) catalyzes the sulfonation of many drugs, increasing their solubility and facilitating their elimination. Down-regulation of SULT2A1 may affect drug-induced toxicity and is associated with several liver diseases, including cholestasis and primary sclerosing cholangitis. However, little is known about the role of miRNAs in down-regulating SULT2A1. We used two prediction programs to identify potential miRNA binding positions on SULT2A1 mRNA. To evaluate binding strength, we then calculated the minimum free energy (MFE) of each miRNA-mRNA interaction using RNAhybrid. Furthermore, we extracted RNA-seq and miRNA-seq data from The Cancer Genome Atlas (TCGA) and performed Pearson correlation analyses between the levels of SULT2A1 mRNA and the candidate miRNAs. We found that hsa-mir-495 and hsa-mir-486 may target SULT2A1 at the 5’ UTR and 3’ UTR, respectively, and that their expression levels are inversely correlated with that of SULT2A1 in human liver samples. Our integrative analyses provide a foundation for investigating the repressive regulation of SULT2A1 by miRNAs.
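The abstract does not name the two prediction programs. A core ingredient of many miRNA target predictors is complementarity between the miRNA seed (nucleotides 2 to 8) and the target mRNA. The minimal Python sketch below finds exact 7-mer seed-match sites in a UTR sequence; it uses toy sequences and hypothetical function names, and it does not compute binding energy (the MFE step was done with RNAhybrid).

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Upper-case a sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq):
    return to_rna(seq).translate(COMPLEMENT)[::-1]


def seed_match_sites(mirna, utr, start=1, end=8):
    """0-based positions in `utr` perfectly complementary to miRNA nt 2-8.

    start/end are 0-based slice bounds, so the default slice [1:8]
    covers seed nucleotides 2 through 8.
    """
    site = reverse_complement_rna(to_rna(mirna)[start:end])
    utr = to_rna(utr)
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]
```

For example, for the toy miRNA `UAGCUUAUCAGACUGAUGUUGA` the seed `AGCUUAU` pairs with the target site `AUAAGCU`.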

A-21: Uncovering new non-coding RNA genes in human with TGIRT-Seq.
COSI: RNA
  • Vincent Boivin, Université de Sherbrooke, Canada
  • Olivier Boisvert, Université de Sherbrooke, Canada
  • Sonia Couture, Université de Sherbrooke, Canada
  • Sherif Abou Elela, Université de Sherbrooke, Canada
  • Michelle Scott, Université de Sherbrooke, Canada

Short Abstract: High-throughput sequencing methods such as RNA-Seq have given us a way to reveal the complexity of the transcriptomic landscape. However, it has become increasingly clear that classical RNA-Seq poorly detects highly structured RNAs, which has contributed to their poor characterization. Our previous studies led us to favor the TGIRT-Seq method, which replaces the retroviral reverse transcriptase with a thermostable group II intron reverse transcriptase (TGIRT). TGIRT-Seq detects highly structured RNAs such as snoRNAs and tRNAs at their correct biological abundance. Here we present the discovery of hundreds of non-annotated non-coding RNA genes that are found only in these TGIRT-Seq datasets. We show that many of these novel genes share high sequence and structure similarity with known RNAs such as snoRNAs and tRNAs. Comparisons with RNA polymerase III ChIP datasets, and ddPCR following the depletion of specific RNA-binding proteins, confirm that many of these genes are indeed actively transcribed and give indications of their functions. Understanding the function of these genes is challenging, as more than a third show no similarity to known genes. Nevertheless, it is clear that much remains to be understood about highly structured RNAs and that the endeavor of gene annotation in human is not over yet.

A-22: Global search for small RNAs in Bordetella pertussis using RNA-seq pipelines READemption and ANNOgesic
COSI: RNA
  • Chin-Hsien Emily Tai, National Cancer Institute, NIH, United States
  • Kyung Moon, NIDDK, United States
  • Jeffers Neuyen, NIDDK, United States
  • Charlotte Merzbacher, NIDDK, United States
  • David Kim, NIDDK, United States
  • Qing Chen, FDA, United States
  • Scott Stibitz, FDA, United States
  • Deborah Hinton, NIDDK, United States
  • Konrad Förstner, University of Würzburg, Germany
  • Sung-Huan Yu, Max Planck Institute of Biochemistry, Germany

Short Abstract: Bordetella pertussis (Bp) is the causative agent of highly contagious whooping cough. Expression of virulence genes is controlled by a master two-component regulatory system, BvgAS. BvgS, a sensor kinase, phosphorylates a response regulator, BvgA, which then forms active phosphorylated dimers (BvgA~P). Previous studies have shown that the RNA chaperone Hfq is important for Bp virulence, suggesting that Hfq-dependent small RNAs (sRNAs) may play a crucial role in virulence regulation. We therefore conducted a genome-wide search for sRNAs in Bp under various conditions. The RNA-seq pipeline READemption was used for alignment and differential expression analysis; sRNA and sRNA target prediction were performed with ANNOgesic. Our RNA-seq data revealed about 150 possible sRNAs, 33 of which are potential Hfq binders. Among the 15 predicted sRNAs tested by Northern blot, there were 7 true positives, 3 true negatives, 3 false positives and 2 false negatives. S17 is an example of an Hfq-binding sRNA. The level of S17 increases in the presence of BvgA~P and Hfq in both RNA-seq and Northern blot analyses, suggesting that an unknown repressor(s) may be involved in the expression of this RNA. Target genes of S17 are currently under investigation.
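The Northern blot validation counts can be summarized with standard confusion-matrix metrics. The short Python sketch below derives them from the reported 7 / 3 / 3 / 2 counts; these summary values are simple arithmetic on the abstract's numbers and are not figures reported by the authors.

```python
def validation_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


m = validation_metrics(tp=7, tn=3, fp=3, fn=2)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# precision: 0.700
# sensitivity: 0.778
# specificity: 0.500
# accuracy: 0.667
```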

A-23: Genomic positional conservation identifies topological anchor point RNAs linked to developmental loci
COSI: RNA
  • Namshik Han, University of Cambridge, United Kingdom
  • Tony Kouzarides, University of Cambridge, United Kingdom

Short Abstract: We identify 665 conserved lncRNA promoters in mouse and human that are preserved in genomic position relative to orthologous coding genes. These positionally conserved lncRNA genes are primarily associated with developmental transcription factor loci with which they are coexpressed in a tissue-specific manner. Over half of positionally conserved RNAs in this set are linked to chromatin organization structures, overlapping binding sites for the CTCF chromatin organiser and located at chromatin loop anchor points and borders of topologically associating domains (TADs). We define these RNAs as topological anchor point RNAs (tapRNAs). Characterization of these noncoding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other’s expression and influence the metastatic phenotype of cancer cells in vitro in a similar fashion. Furthermore, we find that tapRNAs contain conserved sequence domains that are enriched in motifs for zinc finger domain-containing RNA-binding proteins and transcription factors, whose binding sites are found mutated in cancers. This work leverages positional conservation to identify lncRNAs with potential importance in genome organization, development and disease. The evidence that many developmental transcription factors are physically and functionally connected to lncRNAs represents an exciting stepping-stone to further our understanding of genome regulation.

A-24: RPIDisorder: A machine learning method for improved prediction of RNA-Protein interaction partners
COSI: RNA
  • Carla Mann, Iowa State University, United States
  • Drena Dobbs, Iowa State University, United States

Short Abstract: RNA-protein interactions are implicated in a wide range of critical regulatory and structural roles whose disruption can lead to numerous diseases. Computational methods for predicting RNA-protein interaction partners (RPIPs) are valuable because experimentally characterizing these interactions is time-consuming and expensive. Published prediction methods utilize various sequence and structural features, but are generally limited by high false positive rates (FPRs) and/or query sequence length. Because intrinsically disordered regions (IDRs) are abundant in RNA-binding sites of proteins, we hypothesized that incorporating IDR information with sequence features could improve prediction of RPIPs. We developed a new random forest machine learning classifier, RPIDisorder, which requires only primary sequences of potential RNA and protein interaction partners as input. RPIDisorder outperformed our published classifier, RPISeq, on an independent test set of 11,281 RPIPs and 971 non-interacting pairs, with MCC 0.68 (vs 0.47) and FPR 21% (vs 55%). In a case study, RPIDisorder was used to identify RNAs bound to the Fragile-X Mental Retardation Protein (FMRP). On a test set of 30 RNAs (14 binding and 16 non-binding ncRNAs), RPIDisorder achieved an MCC of 0.73 and FPR 6.3%. These results indicate that incorporating IDR information can improve the reliability of RNA-protein partner prediction over sequence composition alone.
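The two metrics reported above are the Matthews correlation coefficient (MCC) and the false positive rate (FPR). A minimal Python sketch of both, using the standard definitions (the zero-denominator convention for MCC is an assumption):

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def fpr(fp, tn):
    """False positive rate: fraction of true negatives predicted positive."""
    return fp / (fp + tn)
```

As a consistency check, with 16 non-binding RNAs in the FMRP case study, a single false positive gives `fpr(1, 15) = 0.0625`, which matches the reported 6.3%; the exact counts are not stated in the abstract.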

A-25: Negative selection as an unbiased benchmark for evaluation of mis-splicing predictors
COSI: RNA
  • Philip Fradkin, Deep Genomics, Inc., Canada
  • Daniele Merico, Deep Genomics, Inc., Canada

Short Abstract: The presence or absence of genetic variants in the general population reflects negative selection (particularly for genes predicted to be haploinsufficient) and mutation probability. In this work, we used this property to evaluate changes in splicing regulatory sequence. We considered all genomic positions covered by the 120,000 gnomAD exomes. We then examined variation in negative selection by mutation type, corrected for nucleotide mutation probability. We observed 10% of all possible synonymous variants, 5.5% of all possible missense variants and 1% of all nonsense mutations (stop gains, splice-acceptor and splice-donor variants). Building on selection-constraint methodology, we present an unbiased approach for evaluating mis-splicing predictors. MaxEntScan and DeepScan, a deep learning tool, are the best splicing variant effect predictors when only variants in the splicing consensus region are considered. However, splicing effects in distal intronic and exonic regions appear to be too weak to be detected. In conclusion, we present an investigation of the impact of negative selection by mutation type, as well as an alternative approach to splicing predictor evaluation that trades a reduction in statistical power for an unbiased evaluation set.
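A minimal Python sketch of the kind of comparison described above: the raw fraction of possible variants observed per mutation class, and an observed/expected ratio in which expectation is the sum of per-site mutation probabilities calibrated so that synonymous variants (treated as roughly neutral) have a ratio of 1. This calibration scheme, the data layout and all names are assumptions for illustration, not the authors' exact constraint model.

```python
from collections import defaultdict
from typing import NamedTuple


class PossibleVariant(NamedTuple):
    consequence: str   # e.g. "synonymous", "missense", "nonsense"
    mut_prob: float    # per-site mutation probability from some (hypothetical) model
    observed: bool     # seen in the population sample?


def observed_fraction(variants):
    """Fraction of all possible variants of each class that are observed."""
    seen, total = defaultdict(int), defaultdict(int)
    for v in variants:
        total[v.consequence] += 1
        seen[v.consequence] += v.observed
    return {c: seen[c] / total[c] for c in total}


def obs_exp_ratio(variants, neutral="synonymous"):
    """Observed/expected per class, calibrated so the neutral class equals 1."""
    obs, exp_raw = defaultdict(int), defaultdict(float)
    for v in variants:
        obs[v.consequence] += v.observed
        exp_raw[v.consequence] += v.mut_prob
    scale = obs[neutral] / exp_raw[neutral]
    return {c: obs[c] / (exp_raw[c] * scale) for c in obs}
```

Ratios well below 1 indicate depletion relative to mutation-probability expectations, that is, negative selection.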

A-26: ReQTL – an interpretable allele-level measure of variation-expression genomic relationships
COSI: RNA
  • Anelia Horvath, George Washington University, United States
  • Liam Spurr, George Washington University, United States
  • Li Muzi, Georgetown University, United States

Short Abstract: We introduce ReQTL, a method to assess co-regulated genomic regions via correlation between gene expression and expressed allele frequency at single nucleotide variant (SNV) positions from RNA-sequencing data. We exemplify the application on sets of cancer genomic data from TCGA and demonstrate that ReQTL analyses show consistently high performance and sufficient power to detect both previously known and novel molecular associations. ReQTL analyses are computationally feasible and do not require matched DNA data, and hence hold strong potential to facilitate the discovery of novel molecular interactions through exploration of the increasingly accessible RNA-sequencing datasets. The ReQTL toolkit is available from: https://github.com/HorvathLab/ReQTL
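The core quantity here is the expressed variant allele frequency (VAF) at an SNV, computed from RNA-seq read counts, which is then related to gene expression across samples. The minimal Python sketch below computes per-sample VAFs and a Pearson correlation with expression; it illustrates the idea only, and the published toolkit's statistical model and function names may differ (the names here are hypothetical).

```python
import math


def vaf(ref_reads, alt_reads):
    """Expressed variant allele frequency at one SNV in one sample."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth else float("nan")


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Samples with zero coverage at the SNV yield `nan` and would need to be filtered out before correlating.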

A-27: GENAVi: a Shiny web application for Gene Expression Normalization Analysis and Visualization
COSI: RNA
  • Alberto Reyes, Cedars-Sinai Medical Center, United States
  • Tiago Silva, Cedars-Sinai Medical Center, United States
  • Simon Coetzee, Cedars-Sinai Medical Center, United States
  • Jasmine Plummer, Cedars-Sinai Medical Center, United States
  • Dennis Hazelett, Cedars-Sinai Medical Center, United States
  • Kate Lawrenson, Cedars-Sinai Medical Center, United States
  • Benjamin Berman, Cedars-Sinai Medical Center, United States
  • Simon Gayther, Cedars-Sinai Medical Center, United States
  • Michelle Jones, Cedars-Sinai Medical Center, United States

Short Abstract: The development of next-generation sequencing (NGS) methods has led to a rapid increase in the generation of large genomic datasets. However, tools accessible to people without bioinformatics training have not developed at the same pace, and the lack of user-friendly tools remains a significant challenge. Additionally, the choice of processing pipeline and normalization strategy for NGS data is important for downstream analysis; several pipelines have been published, and expert knowledge of genomics and statistics is required to select the appropriate methods. This presents a two-fold challenge: selecting the most appropriate analysis pipeline, and having sufficient bioinformatics skills to apply it. To address these challenges, we have combined R packages for RNA-Seq analysis and visualization in a Shiny web application to create GENAVi (Gene Expression Normalization Analysis and Visualization). This GUI-based application provides a user-friendly platform to normalize expression data, cluster samples based on expression, perform differential expression analysis and visualize results. We have performed RNA-Seq on a panel of 20 cell lines frequently used to study breast and ovarian cancer and included these data in GENAVi as a resource and as a foundation for users to bring their own data to the application.
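As an illustration of the kind of normalization such a tool exposes, the following Python sketch computes median-of-ratios size factors, the approach popularized by DESeq2. GENAVi itself is built from R packages, and the abstract does not list which normalizations it offers, so this is only a sketch of one common method with hypothetical function names.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of per-sample count lists over the same genes (same order).
    Genes with a zero count in any sample are skipped, since their
    geometric mean is zero.
    """
    n_genes = len(counts[0])
    log_geo_means = []
    for g in range(n_genes):
        vals = [sample[g] for sample in counts]
        if min(vals) == 0:
            log_geo_means.append(None)
        else:
            log_geo_means.append(sum(math.log(v) for v in vals) / len(vals))
    factors = []
    for sample in counts:
        log_ratios = [
            math.log(sample[g]) - lg
            for g, lg in enumerate(log_geo_means)
            if lg is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

Normalized counts are then the raw counts of each sample divided by that sample's size factor.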

A-28: Exon Size and Sequence Conservation Improves Identification of Splice Altering Nucleotides
COSI: RNA
  • Maliheh Movassat, UCI, United States
  • Elmira Forouzmand, UCI, United States
  • Klemens Hertel, UCI, United States

Short Abstract: Pre-mRNA splicing is an essential step of gene expression that is regulated through multiple trans-acting splicing factors interacting at intronic and exonic positions. Since most exons are protein coding, exon evolution must be shaped by a combination of coding and splicing selective pressures. We have previously demonstrated that splicing pressures are easier to deconvolve when phylogenetic comparisons are made within the framework of identically sized exons. We hypothesize that exon size-filtered sequence alignments may improve identification of nucleotides that have evolved to mediate efficient exon ligation. To address this, we generated an exon size database by evaluating sequence alignments of 100 vertebrates for exon size conservation. Including splice site strength, gene position and flanking intron length information in the database permits identification of exons conserved in both sequence and size. While highly size-conserved exons are always sequence-conserved, sequence conservation did not imply exon size conservation. Our analysis identified exons unique to humans/primates, indicating evolutionarily young exons. By further comparing exon-size alignments with a published dataset of disease-associated SNPs, we demonstrated that coding pressures dominate nucleotide composition at invariable codon positions. This exon-size alignment approach permits identification of splice-altering nucleotides specifically at wobble positions.
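A minimal Python sketch of the size-conservation idea: score a human exon by the fraction of species with an aligned ortholog that have exactly the same exon length, and restrict further sequence comparison to those identically sized orthologs. The scoring definition and function names are hypothetical; the abstract does not specify how its database scores size conservation.

```python
def size_conservation(human_len, ortholog_lens):
    """Fraction of aligned orthologs whose exon length equals the human length.

    ortholog_lens: {species: exon length, or None if no aligned ortholog}.
    """
    present = [length for length in ortholog_lens.values() if length is not None]
    if not present:
        return 0.0
    return sum(length == human_len for length in present) / len(present)


def size_filtered_species(human_len, ortholog_lens):
    """Species whose orthologous exon has the same size as the human exon."""
    return sorted(sp for sp, length in ortholog_lens.items() if length == human_len)
```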

A-29: Endogenous TDP-43 mutant mice have novel gain of splicing function and ALS characteristics in vivo
COSI: RNA
  • Prasanth Sivakumar, University College London, United Kingdom
  • Jack Humphrey, University College London, United Kingdom
  • Kitty Lo, University College London, United Kingdom
  • Thomas Ricketts, MRC Harwell, United Kingdom
  • Hugo Oliveira, MRC Harwell, United Kingdom
  • Francisco Baralle, International Center for Genomic Engineering and Biotechnology, Italy
  • Emanuele Buratti, International Center for Genomic Engineering and Biotechnology, Italy
  • Linda Greensmith, University College London, United Kingdom
  • Vincent Plagnol, University College London, United Kingdom
  • Elizabeth Fisher, University College London, United Kingdom
  • Abraham Acevedo-Arozena, MRC Harwell, United Kingdom
  • Pietro Fratta, University College London, United Kingdom

Short Abstract: Background: RNA processing dysfunction has been implicated in the pathology of the neurodegenerative disease amyotrophic lateral sclerosis (ALS), notably through the characteristic mislocalisation of the crucial RNA-binding protein TDP-43. This makes it important to investigate the widespread changes in RNA processing mediated by TDP-43 dysfunction, with the aim of identifying differential gene and transcript expression in the context of neurodegenerative disease. Methods: We investigated two mouse models of TDP-43, each carrying a single substitution within the coding region of the TDP-43 gene: one mutation lies in the RRM2 domain, the other in the C-terminal hotspot for ALS-causative mutations. RNA sequencing was used to examine differential gene expression and alternative splicing events, while iCLIP revealed changes in RNA-binding patterns. Results and discussion: Severe molecular dysregulation was identified in both models. The RRM2 mutation led to dose-dependent preferential exon inclusion, including of cryptic exons, alongside downregulation of genes containing long introns, which are typically neuronal. The C-terminal mutation caused greater levels of exon skipping, including novel gain-of-function splicing that produced mutant-specific ‘skiptic’ transcripts. iCLIP confirmed that both cryptic and ‘skiptic’ events are enriched for TDP-43 binding sites. Collectively, these results highlight the range of disrupted RNA processing features mediated by TDP-43 in neurodegenerative disease models.
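Exon inclusion and skipping changes like those described above are commonly summarized as percent spliced in (PSI) and its difference between conditions (ΔPSI). The Python sketch below uses one common junction-read convention for a cassette exon, in which inclusion reads (spread over two junctions) are halved before comparison with skipping-junction reads; this convention and the function names are assumptions, not the authors' specific splicing tool.

```python
def psi(inclusion_junction_reads, skipping_junction_reads):
    """Cassette-exon PSI from junction read counts.

    Inclusion reads span two junctions (upstream->exon and exon->downstream),
    so they are halved to be comparable with the single skipping junction.
    """
    inc = inclusion_junction_reads / 2
    total = inc + skipping_junction_reads
    return inc / total if total else float("nan")


def delta_psi(wt, mut):
    """wt, mut: (inclusion_reads, skipping_reads); negative means more skipping."""
    return psi(*mut) - psi(*wt)
```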

A-30: Dissecting newly transcribed and old RNA using GRAND-SLAM
COSI: RNA
  • Christopher Jürges, Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, Germany
  • Lars Dölken, Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, Germany
  • Florian Erhard, Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, Germany

Short Abstract: Global quantification of total RNA is used to investigate steady-state levels of gene expression. However, differentiating pre-existing RNA from newly transcribed RNA can provide invaluable additional information, for example for estimating RNA half-lives or identifying fast and complex regulatory processes. Recently, new techniques based on metabolic labeling and RNA-seq have emerged that make it possible to quantify new and old RNA: nucleoside analogs are incorporated into newly transcribed RNA and become detectable as point mutations in mapped reads. However, relatively infrequent incorporation events and significant sequencing error rates make distinguishing old from new RNA a highly challenging task. We developed a statistical approach termed GRAND-SLAM that, for the first time, estimates the proportions of old and new RNA in such an experiment. Uncertainty in the estimates is quantified in a Bayesian framework. Simulation experiments show our approach to be unbiased and highly accurate. Furthermore, we analyze how uncertainty in the proportion translates into uncertainty in estimated RNA half-lives and give guidelines for planning experiments. Finally, we demonstrate that our RNA half-life estimates compare favorably with other experimental approaches and that biological processes affecting RNA half-lives can be investigated with greater power than offered by any other method.
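The link between the estimated proportion of new RNA and RNA half-life can be made concrete with a simple kinetic model. Assuming steady state and first-order decay with rate k, the new-to-total ratio after labeling time t is ntr = 1 - exp(-k t), so k = -ln(1 - ntr) / t and the half-life is ln 2 / k. The Python sketch below implements only this textbook relation; it is not GRAND-SLAM's estimation procedure, and the function name is hypothetical.

```python
import math


def half_life_from_ntr(ntr, labeling_time):
    """RNA half-life from the new-to-total ratio after `labeling_time`.

    Assumes steady state and first-order decay: ntr = 1 - exp(-k * t).
    The half-life is returned in the same time unit as labeling_time.
    """
    if not 0 < ntr < 1:
        raise ValueError("ntr must be strictly between 0 and 1")
    k = -math.log(1 - ntr) / labeling_time
    return math.log(2) / k
```

Because the half-life decreases monotonically as ntr increases, a credible interval for ntr maps directly onto an interval for the half-life by evaluating this function at the interval bounds, which is one way to see how uncertainty in the proportion propagates.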

A-31: An RNA-based Cis-Pi score discovers a new class of transcriptional regulatory lincRNAs
COSI: RNA
  • Zhezhen Wang, The University of Chicago, United States
  • Carlos Perez-Cervantes, The University of Chicago, United States
  • John Cunningham, The University of Chicago, United States
  • Xinan Yang, The University of Chicago, United States

Short Abstract: Background: Long intergenic noncoding RNAs (lincRNAs) have risen to prominence in cancer biology. Association of lincRNAs with cis-regulatory DNA elements (enhancers) provides mechanistic insight into transcriptional regulation; however, in the absence of an enhancer, functional lincRNAs remain difficult to predict computationally. Methods: We designed and evaluated a cis-pi score that predicts regulatory lincRNAs by assessing the mutual biological relevance of lincRNAs and their target genes. To predict transcriptional regulatory lincRNAs in neuroblastoma, an aggressive pediatric cancer, we enhanced this scoring system and developed a novel side-by-side analytics pipeline for RNA-Seq data to measure lincRNAs with relatively low expression levels. Results: lincRNAs transcribed in a risk-dependent manner were over-represented at neuroblastoma susceptibility loci and recaptured novel clinical biomarkers. The lincRNAs prioritized by cis-pi not only distinguished independent high-risk patients but were also significantly prognostic. The predicted target genes also shared the prognostic significance of these lincRNAs. Conclusion: Altered expression of lincRNAs that stratifies tumor risk is an informative readout of oncogenic enhancer activity. Risk-dependent and prognostic lincRNAs provide cis-regulatory insights into cancer biology. Significance: RNA-Seq alone is sufficient to identify regulatory lincRNAs using our methodologies, allowing broader applications. Regulatory lincRNAs that have polyA tails but lack a hallmark of enhancer activity could represent a new class of functional lincRNAs.

A-32: HMMER4: Scaling Sequence Homology Analysis to Very Large Datasets
COSI: RNA
  • Nicholas Carter, Harvard University, United States
  • Sean Eddy, HHMI and Harvard University, United States

Short Abstract: We describe new techniques and technologies under development for HMMER4, the fourth generation of the HMMER software for identifying homologous biological sequences using profile HMMs. These advances allow HMMER4 to efficiently analyze billion-sequence databases and million-sequence family alignments while improving its ability to recognize remote homologs. HMMER4 replaces HMMER3's local-only alignments with a combined global/local alignment probability model that better annotates the bounds of complete sequence domains when they are present. A new probabilistic domain identification algorithm annotates domain coordinates in multidomain proteins using an ensemble calculation that takes alignment uncertainty into account, rather than relying on a single optimal alignment. Memory-efficient sparse dynamic programming and checkpointing techniques allow HMMER4 to support 100,000-element sequences and 100,000-position HMMs while consuming less than 1 GB of RAM per core. Use of the wider AVX and AVX-512 vector instructions increases performance by about 2x over HMMER3's 128-bit SSE implementation. An improved data format allows HMMER4 to scale well on multi-core processors, unlike HMMER3, which saturated at 2-4 cores. Better load balancing and parallelization improve performance on multi-computer systems, delivering sub-second search times on a 16-server cluster. A prototype GPU implementation shows potential to further improve performance.
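The difference between an ensemble calculation and a single optimal alignment mirrors the difference between the Forward and Viterbi algorithms: Forward sums the probability over all state paths, whereas Viterbi keeps only the best one. The Python sketch below shows both on a toy, fully connected HMM in log space. It is not a profile HMM and not HMMER code; the function names and parameter layout are hypothetical.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_and_viterbi(obs, start, trans, emit):
    """Return (log P(obs) over all paths, log P of the single best path).

    start[s], trans[r][s] and emit[s][o] are probabilities; obs is a list
    of integer symbols.
    """
    n = len(start)
    fwd = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    vit = fwd[:]
    for o in obs[1:]:
        # Forward: sum over predecessor states (log-sum-exp).
        fwd = [
            logsumexp([fwd[r] + _log(trans[r][s]) for r in range(n)]) + _log(emit[s][o])
            for s in range(n)
        ]
        # Viterbi: keep only the best predecessor state (max).
        vit = [
            max(vit[r] + _log(trans[r][s]) for r in range(n)) + _log(emit[s][o])
            for s in range(n)
        ]
    return logsumexp(fwd), max(vit)
```

The Forward score is always at least the Viterbi score. The gap between them reflects how much probability mass lies outside the single best alignment.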

A-33: Alternative Splicing of Core Hippo Signaling Components Supports Hepatocyte Proliferation During Liver Regeneration
COSI: RNA
  • Sushant Bangru, University of Illinois at Urbana-Champaign, United States
  • Waqar Arif, University of Illinois at Urbana-Champaign, United States
  • Auinash Kalsotra, University of Illinois at Urbana-Champaign, United States

Short Abstract: During liver regeneration, most new hepatocytes arise from pre-existing ones; yet the underlying mechanisms that drive quiescent hepatocytes to proliferate following injury remain poorly defined. By combining high-resolution transcriptome and polysome profiling of hepatocytes purified from quiescent and toxin-injured adult mouse livers, we uncover pervasive shifts in ribosome occupancy for transcripts encoding metabolic and RNA processing factors. This translational remodeling modulates the protein levels of a set of splicing factors; among these, downregulation of Epithelial Splicing Regulatory Protein 2 (ESRP2) activates a neonatal splicing program that rewires the Hippo signaling pathway in regenerating hepatocytes. We show that neonatal Hippo protein isoforms have lower signaling capacity, which allows higher transcriptional activity of the downstream YAP1 and TEAD1 effectors, thereby sustaining hepatocyte proliferation. We further demonstrate that ESRP2 knockout mice manifest excessive hepatocyte proliferation upon injury, whereas forced expression of ESRP2 inhibits hepatocyte proliferation by impeding the production of neonatal Hippo isoforms. Thus, our findings reveal an ESRP2-Hippo pathway alternative-splicing axis that controls hepatocyte proliferation in response to chronic liver injury.

A-34: Measuring ribosome profiling at isoform level: towards unveiling the functional impact of alternative splicing
COSI: RNA
  • Marina Reixachs, Universitat Pompeu Fabra, Spain
  • Jorge Ruiz, IMIM, Spain
  • Mar Albà, IMIM - ICREA, Spain
  • Eduardo Eyras, Universitat Pompeu Fabra - ICREA, Spain

Short Abstract: The differential production of transcript isoforms through alternative splicing is crucial in multiple biological processes as well as in pathologies, including cancer. This has been extensively shown at the RNA level, but evidence at the protein level remains elusive. Sequencing of ribosome-protected mRNA fragments (ribosome profiling) provides information on which transcripts are being translated. We describe a new pipeline for quantifying individual transcript coding sequences from ribosome profiling using both RNA-seq and Ribo-seq. Using multiple datasets, we find evidence of translation for 50-70% of the isoforms quantified with RNA-seq. Additionally, we performed differential splicing analysis between glia and glioma samples from human and mouse and, for the majority of cases, found consistent changes in both RNA-seq and Ribo-seq, indicating that changes in the relative abundance of transcript isoforms lead to changes in the production of protein isoforms in the same direction. Among the cassette exon events with splicing changes, we identified an enrichment of orthologous exons, the majority of which preserve the direction of the change. Interestingly, there was a significant enrichment of microexons with decreased inclusion in glioma compared to glia in both human and mouse, suggesting a concerted mechanism of dedifferentiation in glioma.
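The consistency check described above can be sketched as comparing, for one isoform, the change in its relative abundance (its share of the gene's total) between conditions in RNA-seq and in Ribo-seq. The minimum-change threshold, input layout and function names below are hypothetical and do not reproduce the authors' pipeline.

```python
def relative_abundance(isoform_values):
    """Convert {isoform: abundance} for one gene into fractions of the gene total."""
    total = sum(isoform_values.values())
    return {iso: v / total for iso, v in isoform_values.items()}


def direction_consistent(rna_a, rna_b, ribo_a, ribo_b, isoform, min_delta=0.05):
    """Compare the change of one isoform's relative abundance from condition a to b.

    Each argument is {isoform: abundance} for one gene in one condition and layer.
    Returns True/False when both layers change by at least min_delta, else None.
    """
    d_rna = relative_abundance(rna_b)[isoform] - relative_abundance(rna_a)[isoform]
    d_ribo = relative_abundance(ribo_b)[isoform] - relative_abundance(ribo_a)[isoform]
    if abs(d_rna) < min_delta or abs(d_ribo) < min_delta:
        return None
    return (d_rna > 0) == (d_ribo > 0)
```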

A-35: Overexpression of a non-muscle Rbfox2 splice isoform drives cardiac dysfunctions in Myotonic Dystrophy type 1
COSI: RNA
  • Auinash Kalsotra, University of Illinois at Urbana-Champaign, United States

Short Abstract: Myotonic dystrophy type 1 (DM1) is a dominantly inherited neuromuscular disease caused by a CTG repeat expansion in the 3’ UTR of the DMPK gene. DM1 affects multiple tissues, and cardiac dysfunction is the second leading cause of death. The best-characterized pathogenic mechanism of DM1 is a toxic gain of function of the expanded CUG repeat RNA, which accumulates to form ribonuclear foci that affect the MBNL and CELF families of splicing factors. However, misregulation of MBNL1 or CELF1 does not explain the cardiac phenotypes observed in DM1. We discovered that steady-state protein levels of RBFOX2, a critical splicing regulator, are drastically increased in DM1 heart tissue, accompanied by simultaneous skipping of a muscle-specific exon in the Rbfox2 transcript. We demonstrate that tet-inducible overexpression of the non-muscle RBFOX2 isoform, or CRISPR/Cas9-mediated deletion of the muscle-specific RBFOX2 exon in the mouse heart, results in prolonged PR and QRS intervals, slower conduction velocity and cardiac arrhythmias that mirror human DM1 pathology. RNA sequencing of isolated cardiomyocytes from these mice identified a core network of mRNA splicing defects in genes involved in cardiac conduction and excitation-contraction coupling. Collectively, our study uncovers a novel role for a non-muscle Rbfox2 splice isoform in DM1 cardiac pathogenesis.

A-36: Identification and Characterization of Untranslated Region (UTR) Extensions under Drought and Heat Stress in Switchgrass (Panicum virgatum L.)
COSI: RNA
  • Sulbha Choudhari, Purdue University, United States
  • Shaojun Xie, Purdue University, United States
  • Ketaki Bhide, Purdue University, United States
  • Malay C Saha, Noble Research Institute, United States
  • Venu Kalavacharla, Delaware State University, United States
  • Jyothi Thimmapuram, Purdue University, United States

Short Abstract: The untranslated regions (UTRs) of mRNAs share characteristics with noncoding RNAs that regulate gene expression. Because they are sessile, plants are exposed to various biotic and abiotic stresses and respond with a range of post-transcriptional modifications, which can lead to 5’/3’ UTR lengthening. Here, we report that prolonged drought and heat stress may result in extension of the 5’/3’ UTRs of stress-related genes in switchgrass, compared with untreated controls. We identified more than 17,000 UTR extensions of varying lengths. Based on the differential expression of these UTR extensions, we selected 330 for further characterization: 148 are 5’ UTR extensions and 182 are 3’ UTR extensions. Characterization of these extensions revealed similarities to long noncoding RNAs in length distribution and coding potential. Because the reference genome is still a draft, some of these extensions may reflect misannotation; 38 of the 330 extensions are predicted to have coding potential. Based on the differential expression of mRNAs with and without UTR-extension reads, we identified putative UTR extensions that may play an important role in stress response genes, expanding our current understanding of the switchgrass transcriptome.
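The abstract does not describe how UTR extensions were called. One simple way to illustrate the idea is to measure how far contiguous RNA-seq coverage continues past an annotated UTR boundary, then compare stressed and control samples. The Python sketch below handles only a plus-strand 3’ end (the 5’ case mirrors it), and the coverage threshold and function name are hypothetical.

```python
def three_prime_extension(coverage, annotated_end, min_cov=5):
    """Length of contiguous read coverage past an annotated 3' UTR end.

    coverage: per-base read depths along the plus strand (0-based list).
    annotated_end: 0-based index of the last annotated 3' UTR base.
    Counts bases after annotated_end until depth first drops below min_cov.
    """
    pos = annotated_end + 1
    while pos < len(coverage) and coverage[pos] >= min_cov:
        pos += 1
    return pos - (annotated_end + 1)
```

A candidate extension would be one that is clearly longer in stressed samples than in controls; real analyses also need to guard against read-through into neighboring genes, which this sketch ignores.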

A-38: Discovery of structured noncoding RNAs in bacteria by examining long, GC-rich intergenic regions.
COSI: RNA
  • Kenneth Brewer, Yale University, United States
  • Ronald Breaker, Yale University, United States

Short Abstract: Since the experimental validation of the first riboswitch classes in 2002, more than 40 additional classes have been reported. With the recent identification of the ligands for some long-standing ‘orphan’ riboswitch candidates such as ykkC (guanidine-I riboswitch), it is increasingly likely that the most widespread riboswitch classes have all been found. However, it has been proposed that many thousands of additional, less widespread riboswitch classes remain to be discovered, and new computational approaches will be required to find them rapidly. We have developed a computational approach optimized for discovering new, less common classes of structured noncoding RNAs (ncRNAs), including new riboswitch candidates. This approach performs in-depth homology searches on the longer, GC-rich intergenic regions of individual bacterial genomes. Potential ncRNA motifs can then be evaluated based on predicted structure, sequence conservation, nucleotide covariation, and genetic context to assign probable biological functions. Preliminary results on a set of five bacterial genomes have revealed a wide variety of probable regulatory RNA motifs, including uORFs and riboswitch candidates.

A-39: Novel Insights into Gene Expression Regulation during Meiosis Revealed by Translation Elongation Dynamics
COSI: RNA
  • Renana Sabi, Tel Aviv University, Israel
  • Tamir Tuller, Tel Aviv University, Israel

Short Abstract: Numerous studies have demonstrated the critical role of translational control in the dynamic regulation of protein synthesis. However, most of these studies suggested that the elongation phase is not regulated in a condition-specific manner but is rather 'static'. Here, we apply novel computational approaches to ribosome profiling data to estimate, for the first time, the distinct changes in translation elongation and initiation at multiple time points during yeast meiosis. We show that codon decoding rates, and thus mRNA elongation rates, change dynamically and substantially during meiosis to facilitate the translation of transcripts whose proteins are required at specific time points. Our approach captured a unique elongation pattern at the onset of anaphase II that was invisible to previous translational analyses. In particular, we identified a large cluster of lowly expressed genes involved in sister chromatid segregation that showed a strong temporal shift toward increased elongation efficiency precisely when these processes occurred. At this same time point, elongation of ribosomal protein mRNAs decreases while their initiation is maintained, promoting the translation of these anaphase II genes. Our analysis provides new insights into gene expression regulation during meiosis and demonstrates a functional role for translation elongation dynamics.

A-40: Transcriptome analysis reveals divergent cardiac signature in patients with stable coronary artery disease and preserved ejection fraction with or without left ventricular dysfunction
COSI: RNA
  • Christoffer Frisk, Uppsala University, Sweden
  • Sarbashis Das, Uppsala University, Sweden
  • Anna Walentinsson, AstraZeneca R&D, Sweden
  • Chanchal Kumar, Karolinska University Hospital, Department of Clinical Physiology, Sweden
  • Maria J Eriksson, Karolinska Institutet, Department of Molecular Medicine and Surgery, Sweden
  • Camilla Hage, Karolinska Institutet, Department of Medicine, Sweden
  • Hans Persson, Karolinska Institutet, Department of Clinical Sciences, Sweden
  • Cecilia Linde, Karolinska Institutet, Department of Medicine, Sweden
  • Bengt Persson, Uppsala University, Sweden

Short Abstract: Background. Heart failure affects 2–3% of the adult Western population, and its prevalence is increasing, in particular the proportion of heart failure with preserved (P) left ventricular (LV) ejection fraction (EF). We hypothesized that patients undergoing elective coronary artery bypass grafting (CABG) with PEF physiology would show distinctive gene expression compared with patients with normal LV physiology. Methods. Cardiac biopsies from the left ventricle were obtained from the CABG patients. The patients were divided into two groups, Normal or PEF physiology, according to echocardiography, NT-proBNP levels and heart failure guideline definitions. Results. Of 16 patients in total, 5 were classified as having PEF physiology and 11 as having Normal physiology. In principal component analysis of batch-corrected, normalized gene expression data, the samples clearly clustered into these two groups. A total of 743 differentially expressed genes were identified and analyzed to characterize functional correlations and regulatory properties. The top biological functions associated with genes down-regulated in PEF were cardiac muscle contraction, oxidative phosphorylation, endocytosis and matrix organization. Conclusions. This exploratory study confirmed our hypothesis that patients undergoing elective CABG with PEF physiology have distinctive gene expression compared with patients with normal physiology.

A-41: Dual transcriptomic profiling of host and microbiota in non-gluten-fed and gluten-fed chickens
COSI: RNA
  • Ki-Duk Song, Chonbuk National University, South Korea
  • Kwan Seob Shim, Chonbuk National University, South Korea
  • Da Rae Kang, Chonbuk National University, South Korea
  • Kyung Hye Won, Chonbuk National University, South Korea
  • Donghyun Shin, Chonbuk National University, South Korea

Short Abstract: The effects of consuming insoluble proteins on physiology, including the host response and the microbial community, have not been studied in chickens. In this study, we adapted a novel approach that combines microbial identification from RNA with host gene expression analysis to characterize and validate metagenomic taxonomic profiling and to elucidate genomic responses in the intestines of chickens fed gluten as an insoluble protein. Using whole metagenomic shotgun RNA sequencing, we identified and compared the microbial communities of gluten-fed and control chickens (1-week-old and 4-week-old, respectively). Microbial reads were used to characterize microbial diversity and potential differences between the groups. Chicken reads were used to estimate the expression of known genes involved in the host response to gluten and to detect potential differences between the groups. We identified 289 host genes differentially expressed between the groups. KEGG pathway analysis of these DEGs identified the PPAR signaling pathway and the ribosome as pathways enriched in response to gluten uptake. Microbial communities in the small intestine also differed significantly between the groups; in particular, Shigella sonnei was significantly overrepresented in gluten-fed chickens. These results show that the dual RNA-sequencing approach can be applied to dissect the interactions between host and microbes.

A-42: Ribothrypsis, a novel process of canonical mRNA decay, and its implications
COSI: RNA
  • Fadia Ibrahim, University of Pennsylvania, United States
  • Manolis Maragkakis, University of Pennsylvania, United States
  • Panagiotis Alexiou, University of Pennsylvania, United States
  • Zissimos Mourelatos, University of Pennsylvania, United States

Short Abstract: What do we truly measure with RNA-Seq? In eukaryotes, aberrant mRNAs are eliminated by mRNA surveillance pathways, whereas canonical mRNAs are degraded by deadenylation, decapping and exonucleolysis. Until recently, decay and translation were considered distinct processes, but new studies are beginning to show otherwise. In this work, we simultaneously capture the 3' ends of capped RNAs and the 5' ends of polyadenylated RNAs in human cells in vivo. We integrated these data with large-scale genomic datasets and unexpectedly found that mRNAs are subject to repeated, cotranslational, ribosome-phased endonucleolytic cuts, in a process that we termed ribothrypsis. We showed that mRNA decay is initiated by a ribosome stall that triggers an endonucleolytic cleavage and propagates through cleavages associated with upstream ribosomes. Ribothrypsis is a conserved process with potential regulatory roles and can be triggered by G-quadruplexes. Our results reveal cotranslational mRNA decay far beyond expectations, with a remarkable ~64% of the 3′ ends of capped RNAs and ~63% of the 5′ ends of polyadenylated RNAs mapping within coding sequences. Moreover, cells are awash with mRNA fragments, remnants of ribothrypsis, which challenges the central assumption behind profiling methods such as RNA-Seq, microarrays and RT-PCR that mRNAs exist in cells as full-length molecules.

A-43: Long noncoding RNA (lncRNA)-Protein coding gene (PCG) regulatory networks responsive to diverse xenobiotics in rat liver
COSI: RNA
  • Kritika Karri, Boston University, United States
  • David J Waxman, Boston University, United States

Short Abstract: The role of lncRNAs in the extensive genomic and epigenetic responses of mammalian liver to xenobiotic exposure remains elusive. Here, we analyzed 115 liver RNA-seq datasets from male rats exposed to 27 chemicals representing diverse mechanisms of action, ranging from activation of nuclear receptors to induction of DNA damage, to assemble the long noncoding transcriptome. We characterized gene structures and response patterns for 5798 rat liver lncRNAs, of which 1447 were differentially expressed upon xenobiotic exposure. Remarkably, 280 of these lncRNAs responded to more than 10 of the 27 xenobiotics. In most cases, chemicals with a common mode of action clustered tightly based on gene expression pattern. Weighted Correlation Network Analysis (WGCNA) identified lncRNA-PCG regulatory modules enriched for specific biological functions and revealed putative regulatory lncRNAs occupying key points (hubs) in co-expression networks with genes involved in liver metabolism and hepatotoxicity. These putative lncRNA regulators showed strong co-expression with local (cis effect) and distal (trans effect) PCGs. Many of these PCGs belong to the Cyp and Sult gene families, which are known to be involved in xenobiotic metabolism. Our findings will guide further mechanistic research on the roles of these lncRNAs in hepatotoxicity or detoxification responses to diverse chemical exposures.
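
The co-expression hubs mentioned above come from WGCNA-style networks, in which absolute gene-gene correlations are raised to a soft-thresholding power and genes are ranked by their summed connection strength (connectivity). The NumPy sketch below illustrates that idea for an unsigned network. It is a hypothetical illustration, not the WGCNA R package or the authors' analysis, and the power `beta=6` is an assumed value (a common choice for unsigned networks) rather than one reported here.

```python
import numpy as np


def soft_threshold_hubs(expr, beta=6, top_n=1):
    """Rank genes by connectivity in an unsigned soft-thresholded network.

    expr: array of shape (n_samples, n_genes).
    Adjacency a_ij = |cor(gene_i, gene_j)| ** beta (unsigned network);
    connectivity k_i = sum over j != i of a_ij.
    Returns (connectivity per gene, indices of the top_n most connected genes).
    """
    # Pearson correlation between genes (columns are variables)
    corr = np.corrcoef(np.asarray(expr, dtype=float), rowvar=False)
    # Soft thresholding shrinks weak correlations toward zero
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # ignore self-connections
    connectivity = adj.sum(axis=1)
    hubs = np.argsort(connectivity)[::-1][:top_n]
    return connectivity, hubs
```

In a real analysis the power would typically be chosen from a scale-free topology fit, and hubs would be ranked within modules rather than across the whole network.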

A-44: Computationally Reconstructing Cotranscriptional RNA Folding Pathways from Experimental Data
COSI: RNA
  • Angela Yu, Weill Cornell Medicine, Northwestern University, United States
  • Paul Gaspar, University at Albany, State University of New York, United States
  • Eric Strobel, Northwestern University, United States
  • Kyle Watters, University of California, Berkeley, United States
  • Alan Chen, University at Albany, State University of New York, United States
  • Julius Lucks, Northwestern University, United States

Short Abstract: Formation of RNA structure begins during transcription, and the final folded structure can depend on the series of folding events that occur during transcription, called a cotranscriptional folding pathway. However, few methods can generate high-resolution models of this ubiquitous folding process, which is important for many biological processes, including gene expression, splicing, and macromolecular assembly. Previous work showed that equilibrium RNA structure predictions improve when experimental RNA structure probing data are incorporated into the algorithm. Few existing computational methods can predict the out-of-equilibrium structures that occur during transcription, and none of them take RNA structure probing data as input to guide the predictions. We present Reconstructing RNA Dynamics from Data (R2D2), a novel method to predict RNA secondary and tertiary structure from cotranscriptional SHAPE-Seq data. We applied R2D2 to the E. coli Signal Recognition Particle (SRP) RNA. Our predictions informed the design of a point mutation that disrupts the wild-type cotranscriptional folding pathway and precludes formation of the wild-type final structure, which is the structure predicted by computational minimum free energy methods. Overall, the R2D2 algorithm provides a powerful starting point for using experimental data to gain deeper insights into cotranscriptional RNA folding and its biological impacts.

A-45: Predicting the diagnosis of bacterial pneumonia in mechanically ventilated patients: A transcriptomic model
COSI: RNA
  • Ziyou Ren, Northwestern University, United States
  • James Walter, Northwestern University, United States
  • Luis Amaral, Northwestern University, United States
  • Scott Budinger, Northwestern University, United States

Short Abstract: Mechanically ventilated patients in the intensive care unit (ICU) are frequently exposed to unnecessary antibiotics. Novel approaches to exclude bacterial pneumonia in critically ill patients are urgently needed to avoid antibiotic-induced complications. We used RNA-Seq to analyze the mRNA transcriptome of flow-sorted alveolar macrophages collected from mechanically ventilated patients with and without bacterial pneumonia, as defined by quantitative culture. A transcriptional signature of bacterial infection was present in both resident and recruited alveolar macrophages, and gene signatures from both cell types identified patients with bacterial pneumonia. Test characteristics were used to construct a positive prediction model. Informative transcriptomic biomarkers can be generated from bronchoalveolar lavage (BAL) fluid obtained during routine clinical care in the ICU, and transcriptomic profiling of BAL fluid offers promise for aiding antibiotic stewardship efforts in the ICU.

A-46: Assembling the building blocks for a unified splicing code
COSI: RNA
  • Anupama Jha, University of Pennsylvania, United States
  • Matthew R. Gazzara, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States

Short Abstract: Background. Alternative splicing plays a key role in increasing transcriptome diversity. Because differential splicing is prevalent across tissues, cell types and developmental stages, its misregulation can lead to disease. This motivates computational efforts to uncover splicing regulatory mechanisms. Splicing codes are probabilistic graphical models that predict splicing outcomes in different conditions. The connections in these models can be queried to understand how different regulatory mechanisms contribute to the splicing outcome. A key limitation of previous splicing codes is that they model only cassette (exon-skipping) events. Results. Here, we propose a computational framework that extends the work of Jha et al. 2017 in three directions. First, we introduce a unified splicing code framework for alternative 3’ and 5’ splice-site events in addition to exon-skipping events. Second, we extend the framework to model the inherent structure in splicing data explicitly. Finally, we develop a convolutional neural network that learns motifs de novo from the RNA sequence while making use of existing and newly added features. We evaluate the new framework on diverse tissue datasets from human and mouse and demonstrate its improvement over previous models.

A-47: Variation in LINE1 transcript levels within and between 17 normal somatic tissues
COSI: RNA
  • Gm Jonaid, University of Nevada, Las Vegas, United States
  • Nicky Chung, University of Nevada, Las Vegas, United States
  • Sophia Quinton, University of Nevada, Las Vegas, United States
  • Austin Ross, University of Nevada, Las Vegas, United States
  • Cody Clymer, University of Nevada, Las Vegas, United States
  • Adrian Alberto, University of Nevada, Las Vegas, United States
  • Mira Han, University of Nevada, Las Vegas, United States

Short Abstract: Despite the long-held assumption that transposons are normally expressed only in the germline, recent work has shown that full-length or partial LINE1 transcripts are frequently found in somatic cells. However, the extent of variation in LINE1 transcript levels across tissues and individuals, and the genes and pathways co-expressed with LINE1, remain unknown. Here we report the extent of tissue-specific and inter-individual variation in LINE1 expression levels observed in the normal tissues collected for The Cancer Genome Atlas (TCGA). Our results confirm earlier reports of higher L1HS expression in esophagus and stomach tissues. We also show that mitochondrial functions are enriched among genes whose transcript levels correlate negatively with L1HS, and that PHD fingers, bromodomains and KRAB zinc fingers (KRAB-ZFPs) are enriched among genes positively co-expressed with L1HS. The stable tissue-specific expression of individual LINE1 integrants and their correlated expression with KRAB-ZFPs support the hypothesis that specific LINE1 integrants have been co-opted as part of the human gene regulatory network, with many KRAB-ZFPs acting as their activators.

A-48: Impact of cell-type specific alterations on the analysis and interpretation of bulk tissue data
COSI: RNA
  • Lilah Toker, The University of British Columbia, Canada
  • Ogan Mancarci, The University of British Columbia, Canada
  • Shreejoy J Tripathy, The University of British Columbia, Canada
  • Paul Pavlidis, The University of British Columbia, Canada

Short Abstract: High-throughput methods are commonly used to study polygenic disorders. Despite the increasing use of single-cell data, bulk tissue remains the main source for these analyses, especially in the field of neuropsychiatric and neurodevelopmental disorders. However, while bulk tissue data are relatively abundant, their analysis and interpretation are not straightforward, because the observed transcriptional alterations can reflect changes in cellular densities as well as functional or regulatory changes. We have previously demonstrated that cell-type marker genes can be used to infer cell-type-specific changes from brain bulk tissue expression data. We have now applied this approach to multiple datasets from bipolar disorder and schizophrenia subjects, demonstrating robust changes in astrocytes and parvalbumin cells in both disorders. Importantly, accounting for alterations in these two cell types had a dramatic effect on the outcome of differential expression and functional enrichment analyses. Specifically, our results indicate that the previously reported downregulation of mitochondria-related genes might simply result from a decrease in parvalbumin interneurons, which highly express these genes, rather than from a global alteration in mitochondrial function. Our results emphasize that the analysis and interpretation of bulk tissue data should always consider possible cell-type-specific alterations.

A-49: New tools for RNA epigenetics: an open-source approach to RNA modification analysis
COSI: RNA
  • Samuel Wein, University of Pennsylvania, United States
  • Byron Andrews, STORM Therapeutics Limited, United Kingdom
  • Timo Sachsenberg, University of Tübingen, Germany
  • Helena Santos-Rosa, University of Cambridge, United Kingdom
  • Tony Kouzarides, University of Cambridge, United Kingdom
  • Benjamin Garcia, University of Pennsylvania, United States
  • Hendrik Weisser, STORM Therapeutics Limited, United Kingdom

Short Abstract: The importance of chemical modifications of RNA in different biological contexts is increasingly appreciated, giving rise to the field of RNA epigenetics. A pivotal challenge in this area is identifying modified RNA residues within their sequence contexts. Mass spectrometry could offer a solution, using approaches analogous to shotgun proteomics. However, software support for the necessary data analyses is currently lacking; in particular, search engines that match tandem mass spectra to theoretical spectra derived from sequence databases are required. We present a database search engine for RNA sequences, developed in C++ within the OpenMS framework for computational mass spectrometry. We implemented classes representing endonucleases, (modified) ribonucleotides and RNA sequences, and a corresponding generator for theoretical spectra. We integrated modification data from the MODOMICS database and developed an output format for RNA identification results based on the proteomics standard mzTab. Finally, we added visualisation capabilities for these results to OpenMS’ viewer application. Our search engine supports the estimation of false discovery rates (FDR) based on target-decoy search strategies. We evaluated the performance of our software on two benchmark samples containing modified and unmodified versions of in vitro transcribed and chemically synthesised RNA, respectively, with promising initial results.
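
The target-decoy FDR estimation mentioned above generally works by searching spectra against real (target) sequences together with artificial (decoy) sequences, such as reversed or shuffled ones, and using the decoy hits above a score cutoff to estimate how many target hits above that cutoff are false. The Python sketch below shows the simple decoys/targets estimator with conversion to q-values. It is a generic illustration, not the OpenMS implementation; the function name is hypothetical, and tied scores are not treated specially.

```python
from typing import List, Tuple


def target_decoy_qvalues(hits: List[Tuple[float, bool]]) -> List[float]:
    """Estimate q-values for (score, is_decoy) hits; higher scores are better.

    The FDR at each hit is estimated as
    (#decoy hits ranked at or above it) / (#target hits ranked at or above it);
    a hit's q-value is the lowest FDR of any cutoff that still accepts it.
    """
    order = sorted(range(len(hits)), key=lambda i: hits[i][0], reverse=True)
    fdr = [0.0] * len(hits)
    targets = decoys = 0
    for i in order:  # walk from best to worst score
        if hits[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    qvals = [0.0] * len(hits)
    running_min = float("inf")
    for i in reversed(order):  # walk from worst to best score
        running_min = min(running_min, fdr[i])
        qvals[i] = running_min
    return qvals
```

For example, `target_decoy_qvalues([(10, False), (9, False), (8, True), (7, False), (6, True)])` assigns a q-value of 0 to the two top-scoring target hits and 1/3 to the target hit scoring 7.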

A-50: Prediction and evolutionary analysis of RNA Binding Proteins across eukaryotic genomes.
COSI: RNA
  • Huzaifa Hassan, Indiana University – Purdue University Indianapolis, United States
  • Sarath Chandra Janga, Indiana University – Purdue University Indianapolis, United States

Short Abstract: RNA-binding proteins (RBPs) are key players in several post-transcriptional regulatory mechanisms. High-throughput technologies have led to the identification of a large number of RBPs and RNA-binding regions. Although experimental methods have expanded the repertoire of RBPs in model systems, knowledge of the RBP repertoire across species is far from complete. In this study, we developed a computational pipeline to predict RNA-binding proteins using RNA-binding domain (RBD) and homology information. Our approach used peptides of RNA-binding regions from 529 RBPs and a dataset of 1344 experimentally identified human RBPs as a reference set. Domain-based prediction using HMMER was integrated with homology information to obtain genome-wide predictions of RBPs. Benchmarking our predictions against mouse genes annotated with the GO term 'RNA binding' yielded a precision of 60% and a recall of 75%. An average of 1750 RBPs were identified across eukaryotes, with a few lower-order species exhibiting fewer RBPs, suggestive of divergence of the RBP repertoire in distant relatives. In contrast to transcription factors and kinases, RBPs increased in number with increasing genome size. A co-occurrence network of RBDs revealed prominent enrichment of classical RBDs with other domains.

A-51: Language for representing gene model alignment of alternative gene transcripts
COSI: RNA
  • Ognyan Kulev, Sofia University "St. Kliment Ohridski", Faculty of Mathematics and Informatics, Bulgaria
  • Paweł P. Łabaj, MCB UJ, Kraków, Poland & Austrian Academy of Sciences, Vienna, Austria
  • Irena Avdjieva, Sofia University "St. Kliment Ohridski", Bulgaria
  • Dimitar Vassilev, Sofia University "St. Kliment Ohridski", Faculty of Mathematics and Informatics, Bulgaria

Short Abstract: Alternative splicing and alternative 5' and 3' splice sites confer an important evolutionary advantage on eukaryotes because they allow a gene to have more than one product. There are seven canonical models describing basic splicing mechanisms, but they are not sufficient to represent complex splicing events. The literature already contains an established diagrammatic visual language for comparing gene models of alternative gene transcripts. This work presents a new formal regular language that allows such alignments to be represented in multiple valid ways with different levels of generalization. More general representations cover a much greater variety of gene models. The multiple valid representations of an alignment are nested, with IS-A relations between them. These relations form a partially ordered set, or directed acyclic graph, that can be used for refined summarization of all alternative gene transcripts in a genome. This is the basis for our work on analysing alternative transcripts in evolution and evaluating genome annotation maturity.

A-52: Differentiating between cell type proportion changes and gene regulation in whole tissue expression profiles
COSI: RNA
  • Ogan Mancarci, The University of British Columbia, Canada
  • Lilah Toker, The University of British Columbia, Canada
  • Shreejoy J Tripathy, The University of British Columbia, Canada
  • Paul Pavlidis, The University of British Columbia, Canada

Short Abstract: Analysis of gene expression in whole tissues remains an important tool for studying neurological disorders. Such analyses are complicated by the heterogeneity of brain tissue, which makes it difficult to distinguish cell-type-specific differences from global changes in gene expression. Recently, we published a method for summarising the expression of cell type markers using principal component analysis (marker gene profiles) and demonstrated that these profiles reflect cell type proportion changes in selected datasets. We now expand the scope of our analysis by introducing quality metrics to ensure that differences in marker gene profiles reflect cell-type-specific changes in arbitrary whole tissue studies. Our quality metrics examine the effect size, the number of correlating cell type markers and the amount of variance explained by the first principal component of marker gene expression. We show that these quality metrics help distinguish true positives (whole tissue studies with known differences in cell type proportions) from false positives (studies where no cell type proportion differences are expected). Finally, we examine marker gene profiles in ~400 previously published whole tissue datasets from mouse and human brains to detect cell-type-specific differences not analysed in the original studies.
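
A marker gene profile, as described above, summarizes a set of cell type markers by their first principal component, and the variance explained by that component is one of the listed quality metrics. The NumPy sketch below computes both for a markers-by-samples matrix. The per-gene standardization, the sign convention and the function name are assumptions for illustration, not details of the authors' published method.

```python
import numpy as np


def marker_gene_profile(expr):
    """Summarize cell type marker expression by its first principal component.

    expr: array of shape (n_markers, n_samples), e.g. log2 expression values.
    Returns (per-sample PC1 scores, fraction of variance explained by PC1).
    """
    x = np.asarray(expr, dtype=float)
    # Standardize each marker gene across samples
    x = x - x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    x = x / np.where(sd > 0, sd, 1.0)
    # PCA via SVD of the samples-by-markers matrix
    u, s, _ = np.linalg.svd(x.T, full_matrices=False)
    scores = u[:, 0] * s[0]
    var_explained = s[0] ** 2 / np.sum(s ** 2)
    # PC signs are arbitrary: orient scores to rise with average marker expression
    if np.corrcoef(scores, x.mean(axis=0))[0, 1] < 0:
        scores = -scores
    return scores, var_explained
```

If all markers rise and fall together across samples, nearly all of the variance lies on the first component; a low value suggests the markers do not share a common cell type signal in that dataset.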

A-53: Probing RNA Structure in the 5’ Untranslated Region of Coxsackievirus B3 Genomic RNA
COSI: RNA
  • Christopher Horn, University of Nebraska at Omaha, United States
  • Quinn Nelson, University of Nebraska at Omaha, United States
  • Bejan Mahmud, University of Nebraska at Omaha, United States
  • Alyssa Averhoff, University of Nebraska at Omaha, United States
  • William Tapprich, University of Nebraska at Omaha, United States

Short Abstract: Coxsackievirus B3 (CVB3) is a cardiovirulent enterovirus from the family Picornaviridae. Its RNA genome houses an internal ribosome entry site (IRES) in the 5’ untranslated region (5’UTR) that enables cap-independent translation. Ample evidence suggests that the structure of the 5’UTR is a critical element for virulence. We probe RNA structure in solution using base-specific modifying agents such as dimethyl sulfate, as well as backbone-targeting agents such as N-methylisatoic anhydride, which is used in Selective 2’-Hydroxyl Acylation Analyzed by Primer Extension (SHAPE). We have developed a pipeline that merges and evaluates base-specific and SHAPE data together with statistical analyses that provide confidence intervals for reactivity values. By combining the “2%-8% rule” for normalization with base-specific mean and standard deviation calculations, ANOVA and multiple comparison procedures, we generate confidence intervals for each position, thereby verifying the resulting secondary and tertiary structure models. Our datasets demonstrate that the reactivity of each nucleotide base largely parallels modification of the backbone, but not at every position. Using reactivity values validated by our statistical analyses, we are now in a position to provide base-by-base analysis of RNA structural transitions. Understanding these transitions extends our previous comparative analysis of genomes from virulent and avirulent serotypes and of sequential structural states during RNA-protein interaction.
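
The “2%-8% rule” mentioned above is a common convention for normalizing chemical probing data: the most reactive 2% of positions are treated as outliers, and the mean of the next 8% sets the scale to which all reactivities are divided. The Python sketch below assumes a 1-D array of raw reactivities with NaN marking positions without data; the function name is hypothetical and this is a generic illustration, not the authors' pipeline. Some implementations instead flag outliers with a box-plot criterion; this sketch uses the fixed 2% cutoff.

```python
import numpy as np


def normalize_2_8(reactivities):
    """Scale raw reactivities using the 2%-8% rule (illustrative sketch).

    The top 2% of reactivities are treated as outliers and skipped; the mean
    of the next 8% becomes the normalization factor. NaN entries (positions
    without data) are ignored when computing the factor but kept in the output.
    """
    r = np.asarray(reactivities, dtype=float)
    valid = np.sort(r[~np.isnan(r)])[::-1]  # valid values, highest first
    n = valid.size
    n_outliers = int(round(0.02 * n))
    n_top = max(1, int(round(0.08 * n)))
    factor = valid[n_outliers:n_outliers + n_top].mean()
    return r / factor
```

For reactivities 1 to 100, the values 100 and 99 are skipped as outliers, the factor is the mean of 98 down to 91 (94.5), and the most reactive position is scaled to about 1.06.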

A-54: Lantern: a semi-automated pipeline and repository for annotating lncRNA with ontologies
COSI: RNA
  • Swapna Vidhur Daulatabad, Indiana University-Purdue University Indianapolis, United States
  • Rajneesh Srivastava, Indiana University-Purdue University Indianapolis, United States
  • Sarath Chandra Janga, Indiana University – Purdue University Indianapolis, United States

Short Abstract: The accelerating discovery of evidence for the roles of long non-coding RNAs (lncRNAs) in various critical biochemical, cellular and physiological processes calls for robust lncRNA annotation resources. Although there is a plethora of resources for annotating protein-coding genes, resources with lncRNA ontology annotations are rare. Here, we present Lantern (http://www.iupui.edu/~sysbio/lantern/), a lncRNA annotation extractor and repository that provides high-quality, quality-controlled ontology annotations extracted by mining recent lncRNA literature with a robust Natural Language Processing (NLP)-based approach. LncRNA-relevant literature was obtained as a corpus of abstracts, and ontology annotations were extracted with NCBO’s ontology recommender system in a semi-automated pipeline. Benchmarking was performed by evaluating the extracted annotations against lncRNAdb’s manually curated free text. Lantern’s extracted annotations have a recall of 0.62 and a precision of 0.8. Lantern’s web interface provides not only Gene, Human Phenotype, SNOMEDCT and Disease Ontology annotations, but also an extensive range of functional omics data computed and extracted via robust NGS pipelines, such as tissue-specific lncRNA expression, lncRNA-RBP interactions, lncRNA-protein co-expression, coding potential, subcellular localization and SNPs on lncRNAs. This makes Lantern a holistic resource for improving understanding of the noncoding transcriptome, with extracted annotations and functional associations for ~11,000 lncRNAs in the human genome.

A-55: Benchmark of lncRNA Quantification for RNA-Seq of Cancer Samples
COSI: RNA
  • Hong Zheng, Stanford University, United States
  • Kevin Brennan, Stanford University, United States
  • Mikel Hernaez, University of Illinois at Urbana-Champaign, United States
  • Olivier Gevaert, Stanford University, United States

Short Abstract: Long non-coding RNAs (lncRNAs) are emerging as important regulators of various biological processes. While many studies have exploited public resources such as The Cancer Genome Atlas to study lncRNAs in cancer, it is crucial to choose the optimal method for lncRNA expression quantification. In this benchmarking study, we compared the performance of the pseudoalignment methods Kallisto and Salmon and the alignment-based methods HTSeq, featureCounts and RSEM in lncRNA quantification by applying them to a simulated RNA-Seq dataset and a pan-cancer dataset. Pseudoalignment-based methods detect more lncRNAs than alignment-based methods and correlate highly with the simulated ground truth, while alignment-based methods underestimate the expression of some lncRNAs, including the cancer-relevant lncRNAs TERC and ZEB2-AS1. Overall, 10-16% of lncRNAs are detected in the samples, with antisense RNAs and lincRNAs the two most abundant categories. A higher proportion of antisense RNAs than of lincRNAs is detected. Moreover, antisense RNAs, and lncRNAs with fewer transcripts, fewer than three exons or lower sequence uniqueness, are more discordant with the ground truth. Full transcriptome annotation, including both protein-coding and noncoding RNAs, greatly improves the specificity of lncRNA quantification. In summary, we recommend pseudoalignment with Kallisto or Salmon, combined with full transcriptome annotation, for RNA-Seq analysis of lncRNAs.

A-56: Visual representation and analysis of complex splicing variations derived from large heterogeneous RNASeq datasets
COSI: RNA
  • Christopher Green, Biociphers, United States
  • Yoseph Barash, Biociphers, United States
  • Jordi Vaquero-Garcia, BioCiphers, United States

Short Abstract: Large heterogeneous datasets such as GTEx and TCGA pose not only computational challenges but also visualization and usability challenges for exploring and interpreting the results. Specifically, differential splicing analysis of these datasets requires software that lets users visualize and interrogate large sets of local splicing variations (LSVs) across multiple comparisons, such as tissues or ethnic groups. LSVs can be complex, involving more than two alternative junctions, and group sizes can range from a few samples to hundreds. Users should be able to navigate easily between results at the gene, LSV or junction level, by group or by individual samples of interest (e.g. outliers). Finally, differential splicing analysis should connect to other resources such as raw data, protein domain annotation and primer design for validation. To address these challenges, we are developing an extension of MAJIQ’s visualization package VOILA (Vaquero et al., eLife 2016). The VOILA enhancements include new visualization modes for large datasets and multiple comparisons; efficient JavaScript libraries that store large data within the browser and provide asynchronous random access; the ability to export data to other tools; and an API for developers to use or extend VOILA independently.

A-57: Gene isoform abundance quantification with third generation transcriptome sequencing
COSI: RNA
  • Andrew Thurman, University of Iowa, United States
  • Yue Zhao, University of Iowa, United States
  • Haomin Li, University of Iowa, United States
  • Kin Fai Au, University of Iowa, United States

Short Abstract: Third-generation sequencing platforms produce reads from DNA molecules with much greater read lengths than second-generation platforms, but at lower throughput. For transcriptome sequencing, long reads have been used to construct full-length gene isoforms, while higher-throughput short reads have remained popular for quantifying isoform abundance. PacBio long-read RNA sequencing also requires a size selection step to alleviate bias arising from the sequencer's preference for shorter molecules. Here, we have developed a method for gene isoform abundance quantification using long reads that allows for ambiguity in the assignment of reads to isoforms and accounts for sampling bias due to isoform length. We conducted numerical studies to understand when bias correction is necessary and analyzed the statistical properties of our method. We also analyzed short reads and long reads simulated from the human transcriptome to understand how read length, number of reads, and repetitive regions of the genome affect abundance quantification. Finally, to evaluate whether our method improves quantification, we compared it with a standard quantification approach on three long-read datasets.
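The abstract does not give the authors' model, but the two ingredients it names (ambiguous read-to-isoform assignment and length-dependent sampling bias) are commonly combined in an expectation-maximization (EM) estimator. The sketch below is a generic EM of that kind, not the authors' method: it assumes each read's set of compatible isoforms is known, and it uses a hypothetical per-isoform `sampling_weight` (the relative chance that a molecule of that isoform is sequenced, e.g., lower for long isoforms) to stand in for the bias model.

```python
def em_isoform_abundance(read_compat, sampling_weight, n_iter=1000, tol=1e-10):
    """Estimate true molecular fractions theta from ambiguously assigned reads.

    read_compat:     list of sets; read_compat[r] holds the isoform indices read r fits.
    sampling_weight: list of positive floats; hypothetical relative sequencing
                     probability for a molecule of each isoform.
    Model: P(read comes from isoform k) is proportional to theta_k * sampling_weight_k.
    """
    n_iso = len(sampling_weight)
    if any(len(c) == 0 for c in read_compat):
        raise ValueError("every read must be compatible with at least one isoform")
    theta = [1.0 / n_iso] * n_iso
    for _ in range(n_iter):
        # E-step: split each read among its compatible isoforms.
        counts = [0.0] * n_iso
        for compat in read_compat:
            weights = {k: theta[k] * sampling_weight[k] for k in compat}
            total = sum(weights.values())
            for k, w in weights.items():
                counts[k] += w / total
        # M-step: expected read shares, divided by the sampling weight to undo the bias.
        corrected = [counts[k] / sampling_weight[k] for k in range(n_iso)]
        z = sum(corrected)
        new_theta = [c / z for c in corrected]
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta
```

Dividing by the sampling weight in the M-step is what corrects the bias: if an isoform is sequenced half as efficiently, an equal number of reads implies twice as many molecules. With all weights equal, the procedure reduces to the usual EM for ambiguous read assignment.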

A-58: Meta-analysis of ALS patients and TDP-43 model systems reveals shared splicing variations and regulatory signatures
COSI: RNA
  • Matthew R. Gazzara, University of Pennsylvania, United States
  • Anupama Jha, University of Pennsylvania, United States
  • Yoseph Barash, Biociphers, United States

Short Abstract: TDP-43 is an RNA-binding protein associated with the neurodegenerative diseases amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). It is ubiquitously expressed and localized primarily in the nucleus, where it regulates pre-mRNA processing such as alternative splicing. Although many datasets have been generated to elucidate the regulatory targets of TDP-43, previous studies focused on a single context or type of splicing event (e.g., cryptic exons). Furthermore, little effort has been made to link TDP-43-regulated splicing to changes seen in ALS patients on a transcriptome-wide scale. To this end, we performed a comprehensive meta-analysis of available RNA-seq data using MAJIQ, a splicing quantification algorithm that captures both classical and complex splicing variations. Our analysis includes TDP-43 depletion, knockout, overexpression, and mutations across a wide array of metazoans, including human, mouse, Drosophila, and C. elegans. We also include samples from control and ALS patient tissues, including cortex, cerebellum, spine, and PBMCs. Strikingly, we find common splicing variations that are regulated by TDP-43 across evolution and are dysregulated in ALS patients. Finally, analysis of sequence features using probabilistic splicing code models for TDP-43-regulated events and for those dysregulated in ALS revealed common regulatory signatures, suggesting novel co-regulators.
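One simple way to frame "splicing variations regulated by TDP-43 and dysregulated in ALS" is to intersect events that change strongly in both comparisons. The abstract does not describe the authors' criteria, so everything here is an assumption: events share a common identifier across datasets (real cross-species comparisons need ortholog mapping, which this skips), a hypothetical |ΔPSI| cutoff of 0.2 defines a strong change, and both comparisons use a sign convention under which concordant changes point the same way (e.g., TDP-43 depletion vs. control and ALS vs. control).

```python
def shared_events(dpsi_tdp43, dpsi_als, threshold=0.2):
    """Return events changing by at least `threshold` in both comparisons, same direction.

    dpsi_tdp43, dpsi_als: dicts mapping a (hypothetical) event ID to its dPSI.
    """
    shared = {}
    for event in dpsi_tdp43.keys() & dpsi_als.keys():
        a, b = dpsi_tdp43[event], dpsi_als[event]
        strong = abs(a) >= threshold and abs(b) >= threshold
        concordant = (a > 0) == (b > 0)
        if strong and concordant:
            shared[event] = (a, b)
    return shared
```

For overexpression datasets the expected direction would be reversed, so the sign of that comparison would need to be flipped before intersecting.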

A-59: Comparative transcriptome analysis of Dengue 2, Brazilian Zika AB and Zika MR-766 reveals differences in the induction of cytoskeletal reorganization in human neurospheres
COSI: RNA
  • Joao Lidio Vianez Junior, Evandro Chagas Institute, Brazil
  • Janaina Vasconcelos, Evandro Chagas Institute, Brazil
  • Tibério Cesar Tortola Burlamaqui, Evandro Chagas Institute, Brazil
  • João Vianez, Evandro Chagas Institute, Brazil

Short Abstract: Zika virus (ZIKV) can cause a congenital syndrome that impairs early brain development by affecting neural progenitor cells. However, the molecular mechanisms of this pathology have not been established. We employed whole-transcriptome sequencing of human neurospheres exposed to a ZIKV strain isolated in Brazil (AB strain). We also investigated changes in gene expression induced by ZIKV MR-766 and Dengue 2. Comparing the Brazilian ZIKV strain with ZIKV MR-766 and with Dengue 2 yielded 455 and 91 differentially expressed genes (DEGs), respectively. Several DEGs involved in the regulation of actin and the cytoskeleton were overexpressed in both ZIKV strains, but significantly more so in the AB strain. The same was true for the gap junction and tight junction pathways. GO analysis revealed significant enrichment of biological process categories such as “cell adhesion”, “cell projection morphogenesis”, “cell morphogenesis involved in neuron differentiation” and “neuron projection morphogenesis” in the Brazilian Zika strain compared with ZIKV MR-766 and Dengue 2. Several studies have identified alterations of the cytoskeleton by some flaviviruses, including Dengue 2 and Zika, but our results suggest that the AB strain can disrupt cytoskeletal reorganization more effectively than ZIKV MR-766 and Dengue 2.
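The abstract does not name the GO enrichment tool used. A standard over-representation test asks whether a GO term is hit by more DEGs than chance would predict, using a one-sided hypergeometric test; the minimal sketch below shows that calculation with the standard library only. The parameter names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric: N background genes, K of them annotated to the GO term,
    n DEGs drawn, k of those DEGs annotated to the term.
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("invalid counts")
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

In practice one p-value is computed per GO term, so the resulting set of p-values is then corrected for multiple testing before calling categories significant.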

A-60: ROGUE: an R Shiny app for advanced RNAseq analysis and biomarker discovery
COSI: RNA
  • Alvin Farrel, Children’s Hospital of Philadelphia, United States
  • Peng Li, National Institutes of Health, United States
  • Sharon Veenbergen, Erasmus Medical Center, Netherlands
  • Warren Leonard, National Institutes of Health, United States

Short Abstract: The growing power and decreasing cost of RNA sequencing (RNAseq) technologies have resulted in an explosion of RNAseq dataset production. Comparing gene expression values within an RNAseq dataset is relatively straightforward for many interdisciplinary biomedical researchers; however, more complex analyses and deep exploration of multiple datasets are bottlenecked by the availability of highly skilled bioinformaticians. ROGUE (RNAseq Ontology Graphic User Environment) is a user-friendly R Shiny application that allows a biologist to perform differential expression analysis, gene ontology and pathway enrichment analysis, potential biomarker identification, and advanced statistical analyses. Here we use ROGUE to identify potential biomarkers and show uniquely enriched pathways among various tumors derived from the neural crest. User-friendly tools for the advanced analysis of NGS data, such as ROGUE, will allow biologists to explore their datasets efficiently, discover expression patterns, and advance their research by developing and testing hypotheses in the absence of a bioinformatician.
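The abstract does not say which statistical methods ROGUE wraps. One step shared by most differential expression and enrichment workflows is correcting thousands of per-gene p-values for multiple testing, usually with the Benjamini-Hochberg (BH) procedure. The sketch below is a generic BH step-up adjustment, not ROGUE's code.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (false discovery rate), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene is then typically called differentially expressed when its adjusted p-value falls below a chosen FDR level (for example 0.05), often together with a fold-change cutoff.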