Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

RNA: Computational RNA Biology

COSI Track Presentations

Schedule subject to change
Saturday, July 7th
10:15 AM-10:20 AM
RNA: Introduction
Room: Grand Ballroom B
10:20 AM-11:00 AM
Invited Talk 1
Room: Grand Ballroom B
11:00 AM-11:20 AM
Aberrant splicing in B-cell acute lymphoblastic leukemia
Room: Grand Ballroom B
  • Ammar Naqvi, Children's Hospital of Philadelphia, United States
  • Kathryn Black, Children's Hospital of Philadelphia, United States
  • Katharina Hayer, Children's Hospital of Philadelphia, United States
  • Scarlett Y. Yang, Children's Hospital of Philadelphia, United States
  • Elisabeth Gillespie, Children's Hospital of Philadelphia, United States
  • Asen Bagashev, Children's Hospital of Philadelphia, United States
  • Vinodh Pillai, Children's Hospital of Philadelphia, United States
  • Sarah Tasian, Children's Hospital of Philadelphia, United States
  • Matthew R. Gazzara, University of Pennsylvania, United States
  • Martin Carrol, University of Pennsylvania, United States
  • Deanne Taylor, Children's Hospital of Philadelphia, United States
  • Kristen W. Lynch, University of Pennsylvania, United States
  • Yoseph Barash, University of Pennsylvania, United States
  • Andrei Thomas-Tikhonenko, Children's Hospital of Philadelphia, United States

Aberrant splicing is a hallmark of leukemias with mutations in splicing factor (SF)-encoding
genes. Here we investigated its prevalence in pediatric B-cell acute lymphoblastic leukemias (B-ALL),
where SFs are not mutated. By comparing them to normal pro-B cells, we found
thousands of aberrant local splice variations (LSVs) per sample, with 279 LSVs in 241 genes
present in every comparison. These genes were enriched in RNA processing pathways and
encoded ~100 SFs, e.g. hnRNPA1. hnRNPA1 3’UTR was pervasively misspliced, yielding the
transcript subject to nonsense-mediated decay. Thus, we knocked it down in B-lymphoblastoid
cells, identified 213 hnRNPA1-dependent splicing events, and defined the hnRNPA1 splicing
signature in pediatric leukemias. One of its elements was DICER1, a known tumor suppressor
gene; its LSVs were consistent with reduced translation of DICER1 mRNA. Additionally, we
searched for LSVs in other leukemia and lymphoma drivers and discovered 81 LSVs in 41 genes.
77 LSVs were confirmed using two large independent B-ALL RNA-seq datasets. In fact, the
twenty most common B-ALL drivers showed higher prevalence of aberrant splicing than of
somatic mutations. Thus, post-transcriptional deregulation of SFs can drive widespread changes
in B-ALL splicing and likely contribute to disease pathogenesis.

11:20 AM-11:40 AM
Measuring ribosome profiling at isoform level: towards unveiling the functional impact of alternative splicing
Room: Grand Ballroom B
  • Marina Reixachs, Universitat Pompeu Fabra, Spain
  • Jorge Ruiz, IMIM, Spain
  • Mar Albà, IMIM - ICREA, Spain
  • Eduardo Eyras, Universitat Pompeu Fabra - ICREA, Spain

The differential production of transcript isoforms through the mechanism of alternative splicing is crucial in multiple biological processes as well as pathologies, including cancer. This has been exhaustively shown at RNA level but it remains elusive at protein level. Sequencing of ribosome-protected mRNA fragments (ribosome profiling) provides information on the transcripts being translated. We describe a new pipeline for the quantification of individual transcript coding sequences from ribosome profiling using both RNA-seq and Ribo-seq. Using multiple datasets, we find evidence of translation for 50-70% of the isoforms quantified with RNA-seq.
Additionally, we performed differential splicing analysis between glia and glioma samples from human and mouse and found consistent changes occurring in both RNA-seq and Ribo-seq for the majority of cases, indicating that changes in the relative abundance of transcript isoforms lead to changes in the production of protein isoforms in the same direction. Among the cassette exon events with changed splicing, we identified an enrichment of orthologous exons, with the majority of them preserving the directionality of the change. Interestingly, there was a significant enrichment of microexons that decrease inclusion in glioma compared to glia in both human and mouse, suggesting a concerted mechanism of dedifferentiation in glioma.

11:40 AM-12:00 PM
Overexpression of a non-muscle Rbfox2 splice isoform drives cardiac dysfunctions in Myotonic Dystrophy type 1
Room: Grand Ballroom B
  • Auinash Kalsotra, University of Illinois at Urbana-Champaign, United States

Myotonic dystrophy type 1 (DM1) is a dominantly inherited neuromuscular disease caused by a CTG repeat expansion in the 3’-UTR of the DMPK gene. DM1 affects multiple tissues, but cardiac dysfunctions are the second leading cause of death. The best characterized pathogenic mechanism of DM1 is toxic gain-of-function of expanded CUG repeat RNA that accumulates to form ribonuclear foci affecting the MBNL and CELF families of splicing factors. However, misregulation of MBNL1 or CELF1 does not explain the cardiac phenotypes observed in DM1.
We have discovered that steady-state protein levels of RBFOX2, a critical splicing regulator, are drastically increased in DM1 heart tissue, which is accompanied by simultaneous skipping of a muscle-specific exon in the Rbfox2 transcript. We demonstrate that tet-inducible overexpression of the non-muscle RBFOX2 isoform, or CRISPR/Cas9-mediated deletion of the muscle-specific RBFOX2 exon in the mouse heart, results in prolonged PR and QRS intervals, slower conduction velocity, and cardiac arrhythmias that mirror human DM1 pathology. RNA-sequencing of the isolated cardiomyocytes from these mice identified a core network of mRNA splicing defects in genes involved in cardiac conduction and excitation-contraction coupling. Collectively, our study has uncovered a novel role for a non-muscle Rbfox2 splice isoform in DM1 cardiac pathogenesis.

12:00 PM-12:10 PM
Integrating different transcription profiling data to determine mRNA stability upon host-pathogen interaction
Room: Grand Ballroom B
  • Pooja Sethiya, University of Macau, Macao
  • Maruti Nandan Rai, University of Macau, Macao
  • Koon Ho Wong, University of Macau, Macao

Candida glabrata is an opportunistic pathogen that causes deadly infections in immunocompromised individuals. To understand how these pathogens maintain homeostasis and establish virulence in hosts, we set out to map gene expression changes during macrophage infection. Many genome-wide studies focus on gene expression at the transcription level, but the expression of genes depends on mRNA stability and translation in addition to the rate of mRNA synthesis. Here, we perform an integrated analysis of RNA Pol II occupancy by ChIP-Seq and mRNA levels by RNA-Seq to infer mRNA stability upon Candida glabrata infection of macrophage cells. We identified many genes whose ratio of transcription (by Pol II occupancy) to mRNA level changes significantly upon infection, suggesting that the stability of those transcripts is altered during infection. Our preliminary results reveal changes in transcript stability for different classes of genes with specific functions; for instance, transcripts of genes involved in ribosome biogenesis and amino acid metabolism become unstable after C. glabrata enters the macrophage host, suggesting that coordinated stability of related transcripts is a mechanism used by cells to adapt to changing environments. Our method provides a convenient means to determine mRNA stability in any organism for a better understanding of gene expression regulation under a given environmental condition.
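
The ratio idea in this abstract can be illustrated with a minimal sketch. It assumes that, at steady state, mRNA level divided by Pol II occupancy is a rough proxy for transcript stability, so a change in that ratio upon infection points to a change in stability. The function name, the pseudocount, and the input values are hypothetical; normalization and significance testing, which a real analysis would need, are omitted.

```python
import math

def stability_shift(pol2_before, mrna_before, pol2_after, mrna_after, pseudo=0.0):
    """Log2 change in the mRNA / Pol II ratio between two conditions.

    Negative values suggest the transcript became less stable (less mRNA per
    unit of transcription); positive values suggest it became more stable.
    `pseudo` is an optional pseudocount (an assumption, not from the abstract).
    """
    ratio_before = (mrna_before + pseudo) / (pol2_before + pseudo)
    ratio_after = (mrna_after + pseudo) / (pol2_after + pseudo)
    return math.log2(ratio_after / ratio_before)

# Hypothetical gene: Pol II occupancy unchanged, mRNA level halved.
shift = stability_shift(pol2_before=10, mrna_before=100,
                        pol2_after=10, mrna_after=50)
print(shift)
```

With transcription held constant and the mRNA level halved, the shift is negative, which under the stated assumption would be read as reduced stability.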

12:20 PM-12:30 PM
High-throughput single-cell transcriptomics profiling interneuron specification during brain development
Room: Grand Ballroom B
  • Vanessa Aguiar-Pulido, Cornell University, United States
  • Anna Katharina Schlusche, Cornell University, United States
  • Wenying Angela Liu, Memorial Sloan Kettering Cancer Center, United States
  • Shawn Singh, Cornell University, United States
  • Songhai Shi, Memorial Sloan Kettering Cancer Center, United States
  • Margaret Elizabeth Ross, Cornell University, United States

Interneurons represent ~20% of the neuron population in the adult brain and are crucial for its correct operation. Defects in this population have been associated with a variety of neurological and psychiatric disorders. In this work, single-cell RNA sequencing was used to study cell diversity in the germinal region of the two largest GABAergic interneuron populations (Parvalbumin, PV+, and Somatostatin, SST+). We investigated two genetically modified mice with altered expression of known regulators of cell cycle dynamics: cyclin D2 knockout (cD2 KO) and partition defective 3 conditional knockout (Pard3 cKO), which alter PV vs. SST interneuron ratios by influencing cell fate determination of mitotic progenitors in the medial ganglionic eminence (MGE). The MGE was dissected from the different genotypes at the peak of neurogenesis (mouse embryonic day 13.5) and the transcriptome of 16,293 cells was sequenced employing the 10X Genomics Chromium platform. The cD2 KO and Pard3 cKO animals showed missing clusters (in comparison with wild type), which may represent a shift in the peak generation of PV+ vs. SST+ interneurons. Moreover, the data are consistent with a previously proposed bias for SST+ cells to arise from the dorsal MGE while PV+ interneurons originate in the ventral MGE.

12:30 PM-12:40 PM
TurboFold II: RNA Structural Alignment and Secondary Structure Prediction Informed by Multiple Homologs
Room: Grand Ballroom B
  • Zhen Tan, University of Rochester, United States
  • Gaurav Sharma, University of Rochester, United States
  • David Mathews, University of Rochester, United States

With an increasing number of non-coding RNA families being identified, there is strong interest in developing computational methods to estimate sequence alignment and secondary structure.
I developed TurboFold II, an algorithm that takes multiple, unaligned homologous RNA sequences, and outputs the predicted secondary structures and the structural alignment of the sequences. Secondary structure conservation information is incorporated into the alignment by using a match score, calculated from estimated base pairing probabilities to represent the secondary structural similarity between nucleotide positions in the two sequences. TurboFold II computes a multiple sequence alignment, based on a probabilistic consistency transformation and a hierarchically computed guide tree. The TurboFold II algorithm is modified for prediction of RNA secondary structures to utilize base pairing probabilities guided by SHAPE experimental data. Results demonstrate that the SHAPE mapping data for a sequence improves structure prediction accuracy of other homologous sequences beyond the accuracy obtained by sequence comparison alone.
To assess TurboFold II, its sequence alignment and structure predictions were compared with leading tools. TurboFold II has alignment accuracy comparable to MAFFT and higher accuracy than other tools. TurboFold II also has structure prediction accuracy comparable to the original TurboFold algorithm, which is one of the most accurate methods.

12:40 PM-2:00 PM
Lunch Break
2:00 PM-2:20 PM
Proceedings Presentation: aliFreeFold: an alignment-free approach to predict secondary structure from homologous RNA sequences
Room: Grand Ballroom B
  • Jean-Pierre Glouzon, Université de Sherbrooke, Canada
  • Aida Ouangraoua, Université de Sherbrooke, Canada

Motivation: Predicting the conserved secondary structure of homologous ribonucleic acid (RNA) sequences is crucial for understanding RNA functions. However, fast and accurate RNA structure prediction is challenging, especially when the number and the divergence of homologous RNA increases. To address this challenge, we propose aliFreeFold, based on a novel alignment-free approach which computes a representative structure from a set of homologous RNA sequences using suboptimal secondary structures generated for each sequence. It is based on a vector representation of suboptimal structures capturing structure conservation signals by weighting structural motifs according to their conservation across the suboptimal structures.

Results: We demonstrate that aliFreeFold provides a good balance between speed and accuracy regarding predictions of representative structures for sets of homologous RNA compared to traditional methods based on sequence and structure alignment. We show that aliFreeFold is capable of uncovering conserved structural features quickly and effectively thanks to its weighting scheme that gives more (resp. less) importance to common (resp. uncommon) structural motifs. The weighting scheme is also shown to be capable of capturing conservation signal as the number of homologous RNA increases. These results demonstrate the ability of aliFreeFold to efficiently and accurately provide interesting structural representatives of RNA families.

Availability: aliFreeFold was implemented in C++. Source code and Linux binary are freely available at https://github.com/UdeS-CoBIUS/aliFreeFold.

2:20 PM-2:40 PM
LinearFold: Linear-Time Prediction of RNA Secondary Structures
Room: Grand Ballroom B
  • Dezhong Deng, School of EECS, Oregon State University, United States
  • Kai Zhao, Google, United States
  • David Hendrix, Oregon State University, United States
  • David Mathews, University of Rochester, United States
  • Liang Huang, School of EECS, Oregon State University, United States

Predicting the secondary structure of an RNA sequence with speed and accuracy is useful in many applications such as drug design. The state-of-the-art predictors have a fundamental limitation: they have a runtime that scales cubically with the length of the input sequence, which is slow for longer RNAs and limits the use of secondary structure prediction in genome-wide applications. To address this bottleneck, we designed the first linear-time algorithm for this problem, which can be used with both thermodynamic and machine-learned scoring functions. Our algorithm, like previous work, is based on dynamic programming (DP), but with two crucial differences: (a) we incrementally process the sequence in a left-to-right rather than in a bottom-up fashion, and (b) because of this incremental processing, we can further employ beam search pruning to ensure linear runtime in practice (at the cost of exact search). Even though our search is approximate, surprisingly, it results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart).
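
The left-to-right scan with beam pruning can be illustrated with a toy sketch. This is not LinearFold itself: it scores a structure simply by counting base pairs, it omits the dynamic-programming merging of equivalent states, and it uses no thermodynamic or learned model. The function name, the pairing table, the minimum loop length, and the default beam width are all assumptions made for illustration.

```python
# Allowed base pairs (Watson-Crick plus G-U wobble) and a minimum hairpin loop.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3

def beam_fold(seq, beam=100):
    """Scan seq left to right, keeping at most `beam` partial structures.

    Each hypothesis is (score, stack of unmatched open positions, structure).
    Returns (number of base pairs, dot-bracket string) for the best survivor.
    """
    hyps = [(0, (), "")]
    for j, base in enumerate(seq):
        nxt = []
        for score, stack, struct in hyps:
            nxt.append((score, stack, struct + "."))          # leave j unpaired
            nxt.append((score, stack + (j,), struct + "("))   # open a pair at j
            if stack:
                i = stack[-1]
                if (seq[i], base) in PAIRS and j - i > MIN_LOOP:
                    # close the most recent open position i with j
                    nxt.append((score + 1, stack[:-1], struct + ")"))
        # Beam pruning: keep only the highest-scoring hypotheses.
        nxt.sort(key=lambda h: h[0], reverse=True)
        hyps = nxt[:beam]
    score, stack, struct = hyps[0]
    chars = list(struct)
    for i in stack:            # opens that never closed are reported as unpaired
        chars[i] = "."
    return score, "".join(chars)

print(beam_fold("GGGAAACCC"))
```

Because every position adds at most three successors per surviving hypothesis, the number of hypotheses examined grows linearly with sequence length for a fixed beam. A small beam can discard the partial structure that would later have scored best, which is the approximation the abstract refers to.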

2:40 PM-3:00 PM
Rfam: The transition to a genome-centric sequence database
Room: Grand Ballroom B
  • Ioanna Kalvari, EMBL-EBI, United Kingdom
  • Joanna Argasinska, EMBL-EBI, United Kingdom
  • Eric Nawrocki, NCBI, United States
  • Anton Petrov, EMBL-EBI, United Kingdom
  • Rob Finn, EMBL-EBI, United Kingdom
  • Alex Bateman, EMBL-EBI, United Kingdom

Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Rfam currently contains 2,772 families and continues to grow.

Starting with release 13.0, Rfam switched to a new genome-based sequence database, which currently includes a non-redundant set of over 14,000 reference genomes identified by UniProt. The new database is more scalable and gives a more accurate view of the distribution of Rfam entries. Using complete genomes enables meaningful taxonomic comparisons and identification of a repertoire of RNA families found in a certain species.

The text search functionality of the Rfam website was significantly improved. Users can now more easily search Rfam with the new and more powerful faceted text search. For example, it is possible to explore RNA families or ncRNAs in any annotated genome and compare annotations across genomes.

The transition of Rfam to a genome-centric sequence database and the new website features make Rfam a more valuable resource for the sequence analysis community. Rfam is available at http://rfam.org.

3:00 PM-3:20 PM
Enhanced prediction of CRISPR-Cas9 off-targets through modeling of nucleic acid duplex interactions
Room: Grand Ballroom B
  • Ferhat Alkan, University of Copenhagen, Denmark
  • Anne Wenzel, University of Copenhagen, Denmark
  • Jakob Hull Havgaard, University of Copenhagen, Denmark
  • Jan Gorodkin, University of Copenhagen, Denmark

CRISPR-Cas9 is a widely used tool for genome editing; however, off-target binding of the guide RNA (gRNA) can lead to unintended editing. Available computational methods score off-targets, defined as the complement of the spacer in the gRNA, allowing for a maximum number of mismatches. Even though these methods include machine learning strategies, none of them make use of the full nucleotide duplex energies including gRNA-DNA interactions. In contrast, we exploit these interactions in an approximate free energy model for the Cas9-gRNA-DNA complex that includes parameters for RNA-DNA interactions, intramolecular interactions of the gRNA, and the opening energy of the DNA-DNA region bound to the spacer. Each potential off-target is scored by combining the different energy contributions. Benchmarking this CRISPRoff score, we obtain significantly higher performance than the existing methods. We also score the individual gRNA’s specificity on a genome-wide scale by considering the Boltzmann-weighted ensemble of all off-targets. Using published data, we find that when this score, CRISPRspec, is high, it agrees with low read coverage in off-target regions, and vice versa. When the CRISPRoff scoring is compared to other available methods’ scoring for off-target activity, CRISPRoff correlates much more strongly with these.
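
A Boltzmann-weighted ensemble can be sketched in a few lines. The code below is not the CRISPRspec definition, which the abstract does not give; it only shows one plausible way to weight binding sites by their free energies and report the share of the ensemble held by the on-target site. The function name, the energies, and the temperature are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def on_target_fraction(e_on, e_offs, temp_k=310.15):
    """Boltzmann weight of the on-target site relative to all listed sites.

    e_on and e_offs are binding free energies in kcal/mol (more negative
    means stronger binding). Energies are shifted by the minimum before
    exponentiation to keep the computation numerically stable.
    """
    rt = R_KCAL * temp_k
    energies = [e_on] + list(e_offs)
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    return weights[0] / sum(weights)

# Hypothetical guide: one strong on-target site and three weaker off-targets.
print(on_target_fraction(-12.0, [-8.0, -7.5, -6.0]))
```

Under this illustrative definition, a guide whose off-targets bind much more weakly than the on-target site has a fraction close to 1, and a guide with an off-target as strong as the on-target site has a fraction of 0.5 or less.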

3:20 PM-3:40 PM
RNA atlas: a nucleotide resolution map of the human transcriptome
Room: Grand Ballroom B
  • Lucia Lorenzi, Ghent University, Belgium
  • Francisco Avila Cobos, Ghent University, Belgium
  • Robrecht Cannoodt, Ghent University, Belgium
  • Hua-Sheng Chiu, Texas Children’s Cancer Center, Baylor College of Medicine, United States
  • Tine Goovaerts, Ghent University, Belgium
  • Stephen Gross, Illumina, San Diego, California, United States
  • Tom Taghon, Ghent University, Belgium
  • Karim Vermaelen, Ghent University, Belgium
  • Ken Bracke, Ghent University, Belgium
  • Yvan Saeys, Ghent University, Belgium
  • Jeroen Galle, Ghent University, Belgium
  • Tim De Meyer, Ghent University, Belgium
  • Pieter-Jan Volders, Ghent University, Belgium
  • Thomas Hansen, Aarhus University, Denmark
  • Jørgen Kjems, Aarhus University, Denmark
  • Pavel Sumazin, Texas Children’s Cancer Center, Baylor College of Medicine, United States
  • Gary Schroth, Illumina, San Diego, California, United States
  • Jo Vandesompele, Ghent University, Belgium
  • Pieter Mestdagh, Ghent University, Belgium

Technological advances in RNA expression profiling methods revealed that our genome is pervasively transcribed, producing an unexpectedly complex transcriptome consisting of various classes of RNA molecules and a huge isoform diversity. Many of these RNAs show high tissue specificity, with some being expressed in only one or a few cell types. While numerous large-scale RNA-sequencing studies have been performed, the samples involved are often complex tissues, masking transcripts expressed in low-frequency cell populations, and sequencing methods typically focus on one class of RNA transcripts. By applying complementary RNA sequencing methods (total RNA, poly-A RNA and small-RNA sequencing) across an extensive cohort of 300 human samples, we captured a wide variety of human transcripts, including protein coding genes, miRNAs, circular RNAs and long non-coding RNAs, a large fraction of which were previously unknown. We found that many non-coding RNAs show variable polyadenylation status across samples. We also compared cell-type specificity between different RNA species. Our results confirm the dynamic nature of the transcriptome, with many RNAs being expressed in only a limited number of cell types. RNA atlas constitutes a unique resource for further studies on the function, organization and regulation of the different layers of the human transcriptome.

3:40 PM-4:00 PM
Uncovering new non-coding RNA genes in human with TGIRT-Seq.
Room: Grand Ballroom B
  • Vincent Boivin, Université de Sherbrooke, Canada
  • Olivier Boisvert, Université de Sherbrooke, Canada
  • Sonia Couture, Université de Sherbrooke, Canada
  • Sherif Abou Elela, Université de Sherbrooke, Canada
  • Michelle Scott, Université de Sherbrooke, Canada

High-throughput sequencing methods such as RNA-Seq have offered us a way to reveal the transcriptomic landscape’s complexity. However, it has become increasingly obvious that classical RNA-Seq poorly detects highly structured RNAs, which has contributed to their poor characterization.

Our previous studies led us to favor the TGIRT-Seq method, which replaces the retroviral reverse transcriptase with a Thermostable Group II Intron Reverse Transcriptase (TGIRT). TGIRT-Seq allows the detection of highly structured RNAs such as snoRNAs and tRNAs at their correct biological abundance. We present here the discovery of hundreds of non-annotated non-coding RNA genes that are only found in these TGIRT-Seq datasets. We show that many of these novel genes share high sequence and structure similarity with known RNAs such as snoRNAs and tRNAs. Comparisons with RNA polymerase III ChIP datasets and ddPCR following the depletion of specific RNA-binding proteins validate that many of these genes are, indeed, actively transcribed, and give indications of their functions.

Understanding the function of these genes is a challenge as more than a third show no similarity with known genes. Nevertheless, it is clear that much remains to be understood about highly-structured RNA and that the endeavor of gene annotation in human is not over yet.

4:00 PM-4:40 PM
Coffee Break
4:40 PM-5:00 PM
Proceedings Presentation: Convolutional neural networks for classification of alignments of non-coding RNA sequences
Room: Grand Ballroom B
  • Genta Aoki, Keio University, Japan
  • Yasubumi Sakakibara, Keio University, Japan

Motivation: The convolutional neural network (CNN) has been applied to the classification problem of DNA sequences, with the additional purpose of motif discovery. The training of CNNs with distributed representations of four nucleotides has successfully derived position weight matrices on the learned kernels that corresponded to sequence motifs such as protein-binding sites.
Results: We propose a novel application of CNNs to classification of pairwise alignments of sequences for accurate clustering of sequences and show the benefits of the CNN method of inputting pairwise alignments for clustering of non-coding RNA (ncRNA) sequences and for motif discovery. Classification of a pairwise alignment of two sequences into positive and negative classes corresponds to the clustering of the input sequences. After we combined the distributed representation of RNA nucleotides with the secondary-structure information specific to ncRNAs and furthermore with mapping profiles of next-generation sequence reads, the training of CNNs for classification of alignments of RNA sequences yielded accurate clustering in terms of ncRNA families and outperformed the existing clustering methods for ncRNA sequences. Several interesting sequence motifs and secondary-structure motifs known for the snoRNA family and specific to microRNA and tRNA families were identified.
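
How a pairwise alignment might be turned into CNN input can be shown with a small sketch. The abstract does not spell out the encoding used, so this example uses plain one-hot vectors as the simplest stand-in for a distributed representation and leaves out the secondary-structure and read-mapping channels. The symbol set and function names are assumptions.

```python
SYMBOLS = "ACGU-"  # four RNA nucleotides plus the gap character

def one_hot(symbol):
    """Five-element indicator vector; unknown symbols map to all zeros."""
    return [1 if symbol == s else 0 for s in SYMBOLS]

def encode_alignment(row_a, row_b):
    """Encode two aligned rows as a list of 10-element column vectors.

    Each alignment column becomes the one-hot vector of the first sequence
    followed by the one-hot vector of the second sequence.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return [one_hot(a) + one_hot(b) for a, b in zip(row_a, row_b)]

# Hypothetical three-column alignment with one gap in each row.
matrix = encode_alignment("AC-", "A-G")
for column in matrix:
    print(column)
```

The resulting matrix has one row per alignment column and can be passed to a one-dimensional convolution along the alignment axis; a label of positive or negative for the pair would then correspond to "same family" or "different family".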

5:00 PM-5:20 PM
RNA G-quadruplex prediction to investigate a novel RNA regulation model.
Room: Grand Ballroom B
  • Jean-Michel Garant, Université de Sherbrooke, Canada
  • Rachel Jodoin, Université de Sherbrooke, Canada
  • Michelle Scott, Université de Sherbrooke, Canada
  • Jean-Pierre Perreault, Université de Sherbrooke, Canada

G-quadruplexes (G4) are tetra-helices formed by the stacking of planar guanine tetrads. Their folding in RNA molecules was shown to affect mRNA post-transcriptional regulation and miRNA biogenesis. However, there are not enough data available to draw conclusions on the biological functions associated with RNA G4. The G4RNA tools were developed as a first step to address the issue. The G4RNA database is a reference support and a source of curated data for comparative analysis, which was used to train an artificial neural network (G4NN). This approach allows the prediction of unusual observed G4 that cannot be predicted by classical motif searches. G4NN provides good classification performance and was thoroughly described during its optimization. It was validated using a set of high-throughput detected G4 occurrences and was also shown to be very efficient at discarding randomly selected sequences from the transcriptome. G4NN is integrated in G4RNA screener, which scans RNA sequences to find favorable G4 folding conditions. G4RNA screener is used to identify and characterize sub-populations of G4 structures which act as shared features of regulation common to groups of RNA molecules. Its predictions have been challenged experimentally, producing a G4-based structural sub-categorization that relates to colorectal cancer pathways.
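
For context, the kind of classical motif search that the abstract contrasts with G4NN can be sketched with a regular expression. The pattern below is a commonly used canonical definition (four runs of at least three guanines separated by loops of one to seven nucleotides); the abstract does not say which pattern it has in mind, so the pattern, the function name, and the example sequences are assumptions.

```python
import re

# Canonical G4 motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ over the RNA alphabet.
CANONICAL_G4 = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)

def find_canonical_g4(rna):
    """Return (start, end) spans of non-overlapping canonical G4 motifs."""
    return [m.span() for m in CANONICAL_G4.finditer(rna.upper())]

# Hypothetical sequences: the first has four GGG runs, the second only GG runs.
print(find_canonical_g4("AAGGGAGGGAGGGAGGGAA"))
print(find_canonical_g4("AAGGAGGAGGAGGAA"))
```

A sequence with only two-guanine runs or with longer loops is missed by this pattern even if it can fold into a G4, which is the kind of case a trained classifier is meant to catch.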

5:20 PM-6:00 PM
Exploring the nascent transcriptome with direct RNA nanopore sequencing
Room: Grand Ballroom B

Alternative splicing of RNA transcripts has a vital role in generating extensive protein diversity across cell types. However, this process frequently goes awry in diseases ranging from cancer to neurological disorders. Most splicing reactions occur co-transcriptionally, while nascent RNA is attached to chromatin. The coupling of transcription and splicing provides an opportunity for regulating alternative splicing decisions. However, our understanding of how these processes are coupled is hindered by the lack of approaches that directly monitor the co-transcriptional nature of splicing in vivo. Here we use long-read sequencing of nascent RNA to directly probe the relationship between transcription and splicing in human cells. This approach is an expansion of our recently developed technology, native elongating transcript sequencing (NET-seq), that exploits the extraordinary stability of the DNA-RNA-RNA polymerase ternary complex to capture nascent transcripts directly from live cells. The identity of the 3’ end of purified nascent RNAs is revealed by sequencing, providing a quantitative measure of RNA polymerase II position within genes. Direct RNA nanopore sequencing with the Oxford Nanopore MinION directly detects long RNA strands threaded through a protein nanopore. By combining the nascent RNA purification of NET-seq with long-read direct RNA nanopore sequencing, we directly measure the coupling between transcription and RNA splicing in vivo. I will discuss our analysis of these data and what they tell us about the dynamics of splicing.

Sunday, July 8th
10:15 AM-10:20 AM
RNA: Introduction
Room: Grand Ballroom B
10:20 AM-11:00 AM
Large-scale studies of RNA binding proteins by eCLIP and proximity labeling
Room: Grand Ballroom B
  • Gene Yeo, UCSD, United States

I will present our lab’s efforts in mapping hundreds of RNA binding proteins using large-scale eCLIP and identification of RNA binding proteins in RNA granules using proximity labeling followed by mass spec.

11:00 AM-11:20 AM
RADAR: Annotation and prioritization of variants in the post-transcriptional regulome for RNA-binding proteins
Room: Grand Ballroom B
  • Jing Zhang, Yale University, United States
  • Jason Liu, Yale University, United States
  • Donghoon Lee, Yale University, United States
  • Lucas Lochovsky, Yale University, United States
  • Jo-Jo Feng, Yale University, United States
  • Shaoke Lou, Yale University, United States
  • Michael Rutenberg-Schoenberg, Yale University, United States
  • Mark Gerstein, Yale University, United States

Numerous RNA binding proteins (RBP) interact with single or double-stranded RNAs to play critical post-transcriptional roles and their dysregulation has been reported to cause numerous diseases. Their binding sites cover more nucleotides than coding exons and contain novel regions on the genome not reported by transcription-based annotations, but most current methods for variant prioritization ignore such RBP-mediated regulatory regions. In this study, we aimed to construct a comprehensive RBP regulome and a scoring framework to annotate and prioritize variants within it. We collected the full catalog of 318 eCLIP (for 112 RBPs), 76 Bind-n-Seq, and 472 RNA-Seq experiments after RBP knockdown from ENCODE to construct a comprehensive post-transcriptional regulome. We propose a variant impact scoring framework RADAR, which uses conservation, structure, network, and motif information to provide a baseline impact score. Then RADAR further incorporates user-specific inputs to highlight disease-specific variants. Results on somatic and germline variants demonstrate that RADAR can successfully pinpoint deleterious variants, such as splicing-disruptive ones that cannot be highlighted by current prioritization methods.

11:20 AM-11:40 AM
Identification of RNA-Binding Protein Targets with HyperTRIBE
Room: Grand Ballroom B
  • Reazur Rahman, Brandeis University, United States
  • Weijin Xu, Brandeis University, United States
  • Michael Rosbash, Brandeis University, United States

RNA binding proteins (RBPs) accompany RNA from birth to death, affecting RNA biogenesis and functions. Identifying RBP-RNA interactions is essential to understand their complex roles in different cellular processes. However, detecting in vivo RNA targets of RBPs, especially in a small number of discrete cells, has been a technically challenging task. We have previously developed a novel technique called TRIBE (Targets of RNA-binding proteins Identified By Editing) to overcome this problem. TRIBE expresses a fusion protein consisting of a queried RBP and the catalytic domain from the RNA editing enzyme ADAR (ADARcd), which marks target RNA transcripts by converting adenosine to inosine near the RBP binding sites. These marks can be subsequently identified via high-throughput sequencing. In spite of its usefulness, TRIBE is constrained by a low editing efficiency and editing-sequence bias from the ADARcd. So, we developed HyperTRIBE by incorporating a previously characterized hyperactive mutation, E488Q, into the ADARcd. This strategy increases the editing efficiency and reduces sequence bias, which dramatically increases the sensitivity of this technique without sacrificing specificity. HyperTRIBE provides a more powerful strategy to identify RNA targets of RBPs with an easy experimental and computational protocol at low cost in both flies and mammals.
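
The sequencing readout of editing can be illustrated with a minimal sketch. Inosine is read as guanosine during sequencing, so an edited adenosine shows up as an A-to-G mismatch against the reference. The code below computes a per-site editing fraction and applies simple cutoffs; the function names, thresholds, site labels, and read counts are hypothetical and do not describe the authors' pipeline.

```python
def editing_fraction(a_reads, g_reads):
    """Fraction of reads showing G at a reference adenosine."""
    total = a_reads + g_reads
    return g_reads / total if total else 0.0

def call_edited_sites(counts, min_depth=20, min_fraction=0.1):
    """Return sorted site ids passing the depth and editing cutoffs.

    counts maps a site id to (A reads, G reads) at a genomic adenosine.
    Both cutoffs are illustrative assumptions.
    """
    return sorted(
        site
        for site, (a, g) in counts.items()
        if a + g >= min_depth and editing_fraction(a, g) >= min_fraction
    )

# Hypothetical read counts at three adenosines.
counts = {
    "geneX:100": (80, 20),   # well covered, 20% G
    "geneX:250": (99, 1),    # well covered, 1% G
    "geneY:40": (3, 2),      # too few reads to call
}
print(call_edited_sites(counts))
```

In practice the same sites would also be checked in a control sample without the fusion protein, so that genomic A/G polymorphisms and endogenous editing are not mistaken for RBP-dependent marks.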

11:40 AM-12:00 PM
RPIDisorder: A machine learning method for improved prediction of RNA-Protein interaction partners
Room: Grand Ballroom B
  • Carla Mann, Iowa State University, United States
  • Drena Dobbs, Iowa State University, United States

Presentation Overview: Show

RNA-protein interactions are implicated in a wide range of critical regulatory and structural roles whose disruption can lead to numerous diseases. Computational methods for predicting RNA-protein interaction partners (RPIPs) are valuable because experimentally characterizing these interactions is time-consuming and expensive. Published prediction methods utilize various sequence and structural features, but are generally limited by high false positive rates (FPRs) and/or query sequence length. Because intrinsically disordered regions (IDRs) are abundant in RNA-binding sites of proteins, we hypothesized that incorporating IDR information with sequence features could improve prediction of RPIPs.

We developed a new random forest machine learning classifier, RPIDisorder, which requires only primary sequences of potential RNA and protein interaction partners as input. RPIDisorder outperformed our published classifier, RPISeq, on an independent test set of 11,281 RPIPs and 971 non-interacting pairs, with MCC 0.68 (vs 0.47) and FPR 21% (vs 55%).

In a case study, RPIDisorder was used to identify RNAs bound to the Fragile-X Mental Retardation Protein (FMRP). On a test set of 30 RNAs (14 binding and 16 non-binding ncRNAs), RPIDisorder achieved an MCC of 0.73 and FPR 6.3%.

These results indicate that incorporating IDR information can improve the reliability of RNA-protein partner prediction over sequence composition alone.
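
For readers unfamiliar with the two metrics quoted in this abstract, the following minimal sketch shows how the Matthews correlation coefficient (MCC) and false positive rate (FPR) are computed from confusion-matrix counts. The counts are hypothetical and are not the study's results; the function names are illustrative only.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom

def fpr(fp, tn):
    """False positive rate: fraction of true negatives predicted as positive."""
    return fp / (fp + tn)

# Hypothetical counts (NOT taken from the abstract):
# 50 interacting pairs (40 found, 10 missed) and 50 non-interacting pairs (5 false alarms).
tp, fn, fp, tn = 40, 10, 5, 45
print(f"MCC = {mcc(tp, fp, tn, fn):.2f}")
print(f"FPR = {fpr(fp, tn):.1%}")
```

Unlike accuracy, MCC accounts for all four cells of the confusion matrix, which matters on a test set as imbalanced as the one described above (11,281 interacting versus 971 non-interacting pairs).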

12:00 PM-12:10 PM
Using co-expression networks and predictive models to infer circular RNA regulatory function in colitis models
Room: Grand Ballroom B
  • Bojan Losic, Icahn School of Medicine at Mount Sinai Hospital, United States
  • Nicholas Akers, Icahn Institute for Multiscale Biology, United States
  • Carmen Argmann, Icahn Institute for Multiscale Biology, United States
  • Lauren Peters, Icahn Institute for Multiscale Biology, United States
  • Josh Friedman, Janssen Research, United States
  • Sergio Lira, Icahn Institute for Multiscale Biology, United States
  • X Huabao, Icahn Institute for Multiscale Biology, United States
  • Eric Schadt, Icahn Institute for Multiscale Biology, United States

Presentation Overview: Show

Circular RNAs (cRNA) are increasingly being recognized as an important class of noncoding RNA that are pervasively expressed in a variety of eukaryotes, display significant conservation across mammals, and are coherently expressed independently of their cognate linear isoforms. Their functional role and biogenesis remain largely unknown.

Here we studied the role of cRNA in animal models of the onset of inflammatory bowel disease (IBD). Leveraging ribosomal-depleted RNA sequencing obtained on 403 C57/B6 mice in longitudinal dextran sulfate sodium (DSS) and adoptive T-cell transfer models, we compared the power of cRNA and mRNA expression signatures to predict disease severity and evolution, jointly modeled cRNA and mRNA via co-expression networks to identify key drivers of colitis development, and finally detected cognate linear and circular RNA that displayed evidence of regulating different phenotypes to infer cRNA function.

We found that cRNA signatures derived from blood rival mRNA signatures from tissue in predicting colitis severity. Furthermore, co-expression networks identify cRNA disease drivers and suggest that scalable functional cRNA screening is facilitated by identifying differential cognate cRNA/mRNA phenotype associations.

12:10 PM-12:20 PM
New tools for RNA epigenetics: an open-source approach to RNA modification analysis
Room: Grand Ballroom B
  • Samuel Wein, University of Pennsylvania, United States
  • Byron Andrews, STORM Therapeutics Limited, United Kingdom
  • Timo Sachsenberg, University of Tübingen, Germany
  • Helena Santos-Rosa, University of Cambridge, United Kingdom
  • Tony Kouzarides, University of Cambridge, United Kingdom
  • Benjamin Garcia, University of Pennsylvania, United States
  • Hendrik Weisser, STORM Therapeutics Limited, United Kingdom

Presentation Overview: Show

The importance of chemical modifications of RNA sequences in different biological contexts is being increasingly appreciated, giving rise to the field of RNA epigenetics. A pivotal challenge in this area is the identification of modified RNA residues within their sequence contexts. Mass spectrometry would offer a solution by using approaches analogous to shotgun proteomics. However, software support for the necessary data analyses is currently lacking. In particular, search engines that match tandem mass spectra to theoretical spectra derived from sequence databases are required. We present a database search engine for RNA sequences, developed in C++ within the OpenMS framework for computational mass spectrometry. We implemented classes representing endonucleases, (modified) ribonucleotides, RNA sequences, and a corresponding generator for theoretical spectra. We integrated modification data from the MODOMICS database and developed an output format for RNA identification results based on the proteomics standard mzTab. Finally, we added visualisation capabilities for these results to OpenMS’ viewer application. Our search engine supports the estimation of false discovery rates (FDR) based on target-decoy search strategies. We evaluated the performance of our software based on two benchmark samples, containing modified and unmodified versions of in vitro transcribed and chemically synthesised RNA, respectively, with promising initial results.
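
To illustrate the target-decoy idea mentioned in this abstract, here is a minimal sketch of an FDR estimate at a score threshold. The abstract does not say which estimator the search engine uses; the simple decoys-over-targets ratio below is an assumed, commonly used convention, and the function name and data layout are hypothetical rather than part of OpenMS.

```python
def fdr_at_threshold(matches, threshold):
    """Estimate FDR among spectrum matches scoring >= threshold.

    matches: list of (score, is_decoy) tuples; higher score = better match.
    Assumed estimator: (# decoy hits) / (# target hits) above the threshold.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, decoys / targets)

# Hypothetical search results: (score, matched a decoy sequence?)
matches = [(9.1, False), (8.7, False), (8.2, True),
           (7.9, False), (7.5, False), (6.0, True)]
print(fdr_at_threshold(matches, 7.0))  # 1 decoy / 4 targets
print(fdr_at_threshold(matches, 8.5))  # 0 decoys / 2 targets
```

The rationale is that decoy sequences cannot be genuinely present in the sample, so the number of decoy hits above a threshold approximates the number of incorrect target hits at the same threshold.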

12:20 PM-12:30 PM
Novel Insights into Gene Expression Regulation during Meiosis Revealed by Translation Elongation Dynamics
Room: Grand Ballroom B
  • Renana Sabi, Tel Aviv University, Israel
  • Tamir Tuller, Tel Aviv University, Israel

Presentation Overview: Show

Numerous studies have demonstrated the critical role of translational control in the dynamic regulation of protein synthesis. However, most of them suggested that the elongation phase is not regulated in a condition-specific manner and is rather 'static'.
Here, we employ novel computational approaches applied to ribosome profiling data to estimate, for the first time, the distinct changes in translation elongation and initiation at multiple time points during yeast meiosis. We show that codon decoding rates, and thus mRNA elongation rates, change dynamically and substantially during meiosis to facilitate the translation of transcripts whose proteins are required at specific time points.
Our approach captured a unique elongation pattern at the onset of anaphase II that was invisible to previous translational analyses. In particular, we identified a large cluster of lowly expressed genes involved in sister chromatid segregation that showed a strong temporal shift toward increased elongation efficiency precisely when these processes occurred. Also at this time point, the elongation of the ribosomal proteins is decreased but their initiation is maintained to promote the translation of these anaphase II genes. Our analysis provides new insights into gene expression regulation during meiosis and demonstrates a functional role of translation elongation dynamics.

12:30 PM-12:40 PM
Ushering in a new era of CLIP-guided detection of miRNA targets
Room: Grand Ballroom B
  • Maria D Paraskevopoulou, University of Thessaly, Greece
  • Dimitra Karagkouni, University of Thessaly, Hellenic Pasteur Institute, Greece
  • Ioannis S Vlachos, University of Thessaly, Greece
  • Spyros Tastsoglou, University of Thessaly, Hellenic Pasteur Institute, Greece
  • Artemis G Hatzigeorgiou, University of Thessaly, Hellenic Pasteur Institute, Greece

Presentation Overview: Show

AGO-PAR-CLIP is considered one of the most powerful high-throughput methodologies for miRNA target identification. To date, PAR-CLIP experiments have been performed in numerous tissues and cell types from physiological or pathological conditions. Current AGO-CLIP-guided implementations present limitations that undermine the central position of these experiments in the characterization of the miRNA targetome. They depend strongly upon T-to-C conversions to define miRNA binding, while the efficacy of neglected interactions remains unknown.
By analyzing miRNA perturbation experiments and structural sequencing data we showed that the previously neglected non-T-to-C clusters exhibit functional miRNA binding events and strong accessibility.
Our findings are integrated in microCLIP, an innovative in silico framework based on deep structured learning for CLIP-Seq-guided detection of miRNA interactions. microCLIP was trained and evaluated against a compendium of miRNA binding sites deduced by numerous low-yield techniques and the analysis of more than 200 high-throughput experiments. Contrary to existing implementations, microCLIP operates on every AGO-enriched cluster. The proper incorporation of non-T-to-C clusters yields an average 14% increase in miRNA-target interactions per PAR-CLIP library, uncovering previously elusive regulatory events.
The microCLIP framework robustly identifies 1.6-fold more validated binding sites compared to state-of-the-art algorithms, ushering in a new era of experimentally supported miRNA target annotation.

12:40 PM-2:00 PM
Lunch Break
2:00 PM-2:10 PM
Generating full-length, high quality human transcriptomes from PacBio Iso-seq data
Room: Grand Ballroom B
  • Dana Wyman, University of California, Irvine, Center for Complex Biological Systems, Irvine, CA, United States
  • Gabriela Balderrama-Gutierrez, University of California, Irvine, United States
  • Shan Jiang, University of California, Irvine, Department of Developmental and Cell Biology, Irvine, CA, United States
  • Weihua Zeng, University of California, Irvine, Department of Developmental and Cell Biology, Irvine, CA, United States
  • Brian Williams, California Institute of Technology, United States
  • Barbara Wold, California Institute of Technology, United States
  • Ali Mortazavi, University of California, Irvine, Department of Developmental and Cell Biology, Irvine, CA, United States

Presentation Overview: Show

Conventional short-read RNA sequencing has been widely used to quantify gene expression in a variety of applications. However, short reads on their own lack the ability to resolve full-length isoforms, which can be several kilobases in length. Furthermore, computational methods developed to reconstruct isoforms from short read data are plagued by challenges, and results from different algorithms tend to be inconsistent. While long read sequencing technologies such as PacBio Iso-seq and Oxford Nanopore have a higher error rate than Illumina sequencing, they have great potential for isoform discovery and characterization of the 90% of multi-exon human genes that are thought to undergo alternative splicing. To take advantage of these properties, we develop a computational pipeline to process long reads into cleaned isoforms and generate a high-quality, full-length transcriptome. We demonstrate this process on PacBio Iso-seq data from human cell lines K562, GM12878, and HepG2 and show that the technology is mature enough to produce full-length transcriptomes by comparing the results to existing ENCODE data.

2:10 PM-2:20 PM
Long noncoding RNA (lncRNA)-Protein coding gene (PCG) regulatory networks responsive to diverse xenobiotics in rat liver
Room: Grand Ballroom B
  • Kritika Karri, Boston University, United States
  • David J Waxman, Boston University, United States

Presentation Overview: Show

The role of lncRNAs in the extensive genomic and epigenetic responses of mammalian liver to xenobiotic exposure remains elusive. Here, we analyzed 115 liver RNA-seq data sets from male rats exposed to 27 chemicals representing diverse mechanisms of action, ranging from activation of nuclear receptors to induction of DNA damage, to assemble the long non-coding transcriptome. We characterized gene structures and response patterns for 5798 rat liver lncRNAs, of which 1447 were differentially expressed in response to xenobiotic exposure. Remarkably, 280 of these lncRNAs responded to >10 of the 27 xenobiotics. In most cases, chemicals with a common mode of action clustered tightly based on gene expression pattern. Weighted Correlation Network Analysis (WGCNA) identified lncRNA-PCG regulatory modules enriched for specific biological functions, and revealed putative regulatory lncRNAs occupying key points (hubs) in co-expression networks with genes involved in liver metabolism and hepatotoxicity. These putative lncRNA regulators showed strong co-expression patterns with local (cis effect) and distal PCGs (trans effect). Many of these PCGs belonged to the Cyp and Sult families of genes with known involvement in xenobiotic metabolism. Our findings will guide further mechanistic research on the roles of these lncRNAs in the hepatotoxicity or detoxification responses to diverse chemical exposures.

2:20 PM-2:40 PM
A deep long-read sequencing technology reveals coordination of distant exons on RNA molecules to be widespread.
Room: Grand Ballroom B
  • Hagen Tilgner, Cornell University, United States

Presentation Overview: Show

Understanding transcriptome complexity is crucial in human biology and disease, but long-read sequencing of 10-100 million isoforms is still infeasible. Our droplet-based method, sparse isoform sequencing (spISO-seq), sequences 100k-200k partitions of 10-200 molecules, enabling analysis of 10-100 million RNAs. SpISO-seq requires ≤1 ng input cDNA, limiting the need for prior amplification. Adjusting the number of reads devoted to each molecule reduces sequencing lanes and cost, with little loss in detection power. In addition to confirming our previously published cases of splicing coordination (e.g. BIN1), the greater depth reveals many new cases such as MAPT – a gene crucial in neurodegenerative disease. Coordination of internal exons is found to be extensive among protein coding genes: 23.5%-59.3% (95% CI) of highly expressed genes with distant alternative exons exhibit coordination, showcasing the need for long-read transcriptomics. However, coordination is less frequent for non-coding sequences, suggesting a larger role of splicing coordination in shaping proteins. We also find new splicing coordination types, involving initial and terminal exons [1].
I will furthermore discuss so-far unpublished data describing (a) isoform expression in specific cell types of the central nervous system and (b) the coordination of non-splicing RNA variables.

1. Tilgner et al. Genome Res. Epub 2017 Dec 1.

2:40 PM-3:00 PM
Accurate assembly of transcripts through phase-preserving graph decomposition
Room: Grand Ballroom B
  • Mingfu Shao, Carnegie Mellon University, United States
  • Carl Kingsford, Carnegie Mellon University, United States

Presentation Overview: Show

We introduce Scallop, an accurate reference-based transcript assembler that improves reconstruction of multi-exon and lowly expressed transcripts. Scallop preserves long-range phasing paths extracted from reads, while producing a parsimonious set of transcripts and minimizing coverage deviation. On 10 human RNA-seq samples, Scallop produces 34.5% and 36.3% more correct multi-exon transcripts than StringTie and TransComb, and respectively identifies 67.5% and 52.3% more lowly expressed transcripts. Scallop achieves higher sensitivity and precision than previous approaches over a wide range of coverage thresholds.

3:00 PM-3:20 PM
Proceedings Presentation: Dissecting newly transcribed and old RNA using GRAND-SLAM
Room: Grand Ballroom B
  • Christopher Jürges, Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, Germany
  • Florian Erhard, Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität Würzburg, Germany

Presentation Overview: Show

Global quantification of total RNA is used to investigate steady-state levels of gene expression. However, being able to differentiate pre-existing RNA (synthesized prior to a defined point in time) from newly transcribed RNA can provide invaluable information, e.g. to estimate RNA half-lives or identify fast and complex regulatory processes. Recently, new techniques based on metabolic labeling and RNA-seq have emerged that allow quantification of new and old RNA: nucleoside analogs are incorporated into newly transcribed RNA and are made detectable as point mutations in mapped reads. However, relatively infrequent incorporation events and significant sequencing error rates make the differentiation between old and new RNA a highly challenging task.
We developed a statistical approach termed GRAND-SLAM that, for the first time, allows estimation of the proportion of old and new RNA in such an experiment. Uncertainty in the estimates is quantified in a Bayesian framework. Simulation experiments show our approach to be unbiased and highly accurate. Furthermore, we analyze how uncertainty in the proportion translates into uncertainty in estimating RNA half-lives and give guidelines for planning experiments. Finally, we demonstrate that our estimates of RNA half-lives compare favorably to other experimental approaches and that biological processes affecting RNA half-lives can be investigated with greater power than offered by other methods.
Availability: GRAND-SLAM is available under the Apache 2.0 license at http://software.erhard-lab.de; R scripts to generate all figures are available at Zenodo (doi:10.5281/zenodo.1162340).
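
The estimation problem described in this abstract can be illustrated with a toy model. The abstract does not spell out the statistical model, so the sketch below assumes a two-component binomial mixture: reads from new RNA show conversions at rate `p_conv`, reads from old RNA show apparent conversions only at the sequencing error rate `p_err`, and the new-RNA proportion gets a uniform prior evaluated on a grid. The function names, default rates, and read counts are hypothetical and are not taken from GRAND-SLAM.

```python
import math

def binom_pmf(k, n, p):
    """Probability of k conversions among n convertible positions."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def posterior_new_fraction(reads, p_err=0.001, p_conv=0.05, grid=201):
    """Grid posterior over the proportion of new RNA (uniform prior).

    reads: list of (n_positions, n_conversions) per mapped read.
    Returns (grid_points, posterior_weights); weights sum to 1.
    """
    pis = [i / (grid - 1) for i in range(grid)]
    log_post = []
    for pi in pis:
        ll = 0.0
        for n, k in reads:
            # Mixture likelihood: the read is new with probability pi, old otherwise.
            ll += math.log(pi * binom_pmf(k, n, p_conv) + (1 - pi) * binom_pmf(k, n, p_err))
        log_post.append(ll)
    top = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return pis, [w / total for w in weights]

def posterior_mean(pis, post):
    return sum(p * w for p, w in zip(pis, post))

# Hypothetical gene: 10 reads with 25 convertible positions each.
reads = [(25, 0)] * 6 + [(25, 1)] * 2 + [(25, 2)] * 2
pis, post = posterior_new_fraction(reads)
print(f"posterior mean new-RNA fraction: {posterior_mean(pis, post):.2f}")
```

Because the result is a full posterior rather than a point estimate, credible intervals for the proportion can be read directly from the grid, which is the kind of uncertainty quantification the abstract refers to.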

3:20 PM-3:40 PM
Gene isoform abundance quantification with third generation transcriptome sequencing
Room: Grand Ballroom B
  • Andrew Thurman, University of Iowa, United States
  • Yue Zhao, University of Iowa, United States
  • Haomin Li, University of Iowa, United States
  • Kin Fai Au, University of Iowa, United States

Presentation Overview: Show

Third generation sequencing platforms produce reads from DNA molecules with much larger read lengths than second generation sequencing platforms but with lower throughput. For transcriptome sequencing, long reads have been used to construct full-length gene isoforms, while higher throughput short reads have remained popular for quantifying isoform abundance. PacBio long read RNA sequencing also requires a size selection step to alleviate bias due to sequencer preference for shorter molecules. Here, we have developed a method for gene isoform abundance quantification using long reads that allows for ambiguity in assignment of reads to isoforms and accounts for sampling bias due to isoform length. We conducted numerical studies to understand situations where bias correction is necessary and analyzed statistical properties of our method. We also analyzed short reads and long reads simulated from the human transcriptome to understand how read length, number of reads, and repetitive regions of the genome impact abundance quantification. Further, to evaluate the adequacy of our method to improve quantification, we compared our method to a standard quantification approach on three long read data sets.

3:40 PM-4:00 PM
Revealing the hidden transcriptome: Analysis of nonsense-mediated mRNA decay targets reveals mechanistic insights
Room: Grand Ballroom B
  • Courtney French, University of Cambridge, United Kingdom
  • James Lloyd, University of Western Australia, Australia
  • Gang Wei, Fudan University, China
  • Thomas Gallagher, The Ohio State University, United States
  • Darwin Dichmann, Intrexon Corporation, United States
  • Maki Inada, Ithaca College, United States
  • Sharon Amacher, The Ohio State University, United States
  • Richard Harland, University of California, Berkeley, United States
  • Steven E. Brenner, University of California, Berkeley, United States

Presentation Overview: Show

The nonsense-mediated mRNA decay (NMD) pathway prevents accumulation of transcripts with premature termination codons and regulates gene expression when coupled with alternative splicing. The “50nt rule” is the prevailing model for how premature termination codons are defined and requires a splice junction at least 50 nt downstream of the stop codon. It is also proposed that a longer 3' UTR triggers NMD in some species.

To explore the features of NMD-targeted transcripts in human cells, we analyzed RNA-Seq data from a polysome fractionation experiment (TrIP-Seq). This method interrogates a physiologically normal transcriptome. We find that transcripts with premature termination codons are enriched in the monosome fraction relative to transcripts with normal stop codons. Transcripts with longer 3’UTRs are not enriched in the monosome fraction.

To explore the features of NMD-targeted transcripts in several eukaryotes (human, mouse, frog, zebrafish, fly, S. pombe, and Arabidopsis), we inhibited NMD and identified transcripts with increased expression. We found that the 50nt rule is a strong predictor of NMD degradation in human cells and plays a role in most of the other species tested. In contrast, we found little to no correlation between the likelihood of degradation by NMD and 3' UTR length in any of the species.
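
The 50nt rule discussed in this abstract is simple enough to express as a predicate. The sketch below is illustrative only: it assumes positions are given in spliced-transcript coordinates and uses a cut-off of at least 50 nt, although the exact threshold is stated slightly differently across the literature. The function and parameter names are hypothetical and do not come from the authors' pipeline.

```python
def predicted_nmd_target(stop_codon_end, junctions, min_distance=50):
    """Apply the 50nt rule to one transcript.

    stop_codon_end: transcript coordinate of the last base of the stop codon.
    junctions: transcript coordinates of exon-exon junctions.
    Returns True if any junction lies >= min_distance nt downstream of the stop.
    """
    return any(j - stop_codon_end >= min_distance for j in junctions)

# Hypothetical transcripts:
print(predicted_nmd_target(900, [200, 450, 1000]))  # junction 100 nt downstream
print(predicted_nmd_target(900, [200, 450, 920]))   # junction only 20 nt downstream
print(predicted_nmd_target(900, [200, 450]))        # stop codon in the last exon
```

Note that this rule ignores 3' UTR length entirely, consistent with the abstract's finding that UTR length correlates poorly with NMD degradation.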

4:00 PM-4:40 PM
Coffee Break
4:40 PM-5:00 PM
Stage-specific mRNA regulatory programs drive mammalian gametogenesis
Room: Grand Ballroom B
  • Leah Zagore, Case Western Reserve University, United States
  • Molly Hannigan, Case Western Reserve University, United States
  • Donny Licatalosi, Case Western Reserve University, United States

Presentation Overview: Show

Male germ cells progress through a highly ordered series of developmental transitions to produce motile cells which transmit DNA from parent to offspring. This developmental program has long been known to be associated with extensive RNA regulation, including high levels of stage-specific alternative mRNA processing, translation, and decay. However, we have a limited understanding of how these RNA regulatory events are controlled and impact germ cell fate at different stages of development. Using transgenic mouse models, gene knockouts of different RNA binding proteins (RBPs), flow cytometry, and multiple deep-sequencing tools (RNA-Seq, PolyA-Seq, and CLIP), we have uncovered two distinct stage-specific RNA regulatory programs that are critical for mammalian germ cell survival. The first, controlled by the RBP Dazl, sustains germ cell proliferation via a network of polyA-proximal interactions that ensure high levels of otherwise unstable mRNAs. The second, an alternative splicing program controlled by the RBP Ptbp2, is required for cell-cell crosstalk and protects germ cells from prematurely detaching from somatic cells after meiosis. Collectively, these studies define two distinct and essential mRNA regulatory programs that control different stages of germ cell development in mammals.

5:00 PM-5:20 PM
Genomic positional conservation identifies topological anchor point RNAs linked to developmental loci
Room: Grand Ballroom B
  • Namshik Han, University of Cambridge, United Kingdom
  • Tony Kouzarides, University of Cambridge, United Kingdom

Presentation Overview: Show

We identify 665 conserved lncRNA promoters in mouse and human that are preserved in genomic position relative to orthologous coding genes. These positionally conserved lncRNA genes are primarily associated with developmental transcription factor loci with which they are coexpressed in a tissue-specific manner. Over half of positionally conserved RNAs in this set are linked to chromatin organization structures, overlapping binding sites for the CTCF chromatin organiser and located at chromatin loop anchor points and borders of topologically associating domains (TADs). We define these RNAs as topological anchor point RNAs (tapRNAs). Characterization of these noncoding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other’s expression and influence the metastatic phenotype of cancer cells in vitro in a similar fashion. Furthermore, we find that tapRNAs contain conserved sequence domains that are enriched in motifs for zinc finger domain-containing RNA-binding proteins and transcription factors, whose binding sites are found mutated in cancers.
This work leverages positional conservation to identify lncRNAs with potential importance in genome organization, development and disease. The evidence that many developmental transcription factors are physically and functionally connected to lncRNAs represents an exciting stepping-stone to further our understanding of genome regulation.

5:20 PM-6:00 PM
Modeling RNA-binding protein specificity using single-nucleotide-resolution binding maps: a case study of LIN28 and two subclasses of let-7 microRNAs
Room: Grand Ballroom B

Presentation Overview: Show

A majority of RNA-binding proteins (RBPs) recognize short and degenerate sequence motifs. While an expanding list of RBPs has been assayed using crosslinking and immunoprecipitation (CLIP) to map their binding sites in the transcriptome, precise modeling of their binding specificity remains a major challenge and a critical bottleneck to understanding the RNA regulatory code in development and disease. We previously developed computational approaches to map protein-RNA interaction sites at single-nucleotide resolution by determining exact protein-RNA crosslink sites. Here, we present our recent efforts to model RBP binding specificity de novo while simultaneously registering protein-RNA crosslink sites precisely in the motif. We demonstrate the effectiveness of this approach using a case study of LIN28, a bipartite RBP that post-transcriptionally inhibits the biogenesis of let-7 microRNAs to regulate development and influence disease states. Suppression of let-7 is known to depend on a GGAG-like motif in the precursors that is recognized by the zinc knuckle domain (ZKD) of LIN28, although the specificity of the cold shock domain (CSD) and its function are not well understood. By leveraging single-nucleotide-resolution mapping of LIN28 binding sites in vivo, we determined that the CSD recognizes a (U)GAU motif. Surprisingly, this motif partitions the let-7 microRNAs into two subclasses: precursors with both CSD and ZKD binding sites (CSD+) and precursors with ZKD but no CSD binding sites (CSD-). LIN28 in vivo recognition, and subsequent degradation, of CSD+ precursors is more efficient, leading to their stronger suppression in LIN28-activated cells and cancers. This case study demonstrates that precise modeling of RBP binding specificity can be instrumental for new biological discoveries.