A novel NLP-based RBP binding motif and context discovery method using multiple-instance learning
Confirmed Presenter: Shaimae Elhajjajy, University of Massachusetts Chan Medical School, United States
Track: iRNA
Room: 519
Format: In Person
Moderator(s): Jérôme Waldispühl
Authors List:
- Shaimae Elhajjajy, University of Massachusetts Chan Medical School
- Zhiping Weng, University of Massachusetts Chan Medical School
Presentation Overview:
RNA-binding proteins (RBPs) are the primary mediators of mRNA regulation, dynamically governing complex processes such as splicing, cleavage, and degradation. Previous studies have shown that structurally diverse RBPs recognize similar motifs yet still bind distinct sites within the transcriptome. While in vitro evidence suggests that motif context plays an important role in RBP binding specificity, the precise underlying mechanisms remain unclear. Despite recent advances in machine learning models for predicting RBP binding, current methods are often difficult to interpret and do not systematically investigate motif contexts. Thus, there remains a need for interpretable predictive models that can disambiguate the contextual determinants of RBP binding specificity. Here, we present, to the best of our knowledge, the first formulation of the RBP binding prediction task as an NLP-based multiple-instance learning problem. We introduce a novel sequence decomposition strategy to generate entities termed “contexts”, which we use to train and test our deep learning models. We also develop a deterministic motif discovery algorithm that is fast, accurate, and specialized to handle our data structure, and we validate it by recapitulating the known motifs of well-characterized RBPs. Importantly, we discover and characterize the in vivo sequence binding contexts for a collection of RBPs. Finally, by integrating motif and context similarity measures with a cross-prediction approach, we propose novel RBP-RBP interaction partners and hypothesize whether these interactions are cooperative or competitive. In summary, we present a comprehensive computational strategy for illuminating the contextual determinants of specific RBP binding and demonstrate the implications of our findings for delineating RBP function.
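To make the multiple-instance learning (MIL) framing concrete, the sketch below shows the generic MIL setup under stated assumptions. It is not the authors' method, since the abstract does not describe how their "contexts" are built. Each sequence is treated as a bag of instances: here, hypothetically, overlapping fixed-width windows produced by `decompose`. Only the bag label (bound or unbound) is known, and under the standard MIL assumption a bag scores as high as its best instance. The names `decompose`, `bag_score`, and `toy_motif_score`, the window parameters, and the keyword-match scorer (standing in for a trained model) are all illustrative. UGCAUG, the well-known RBFOX binding element, serves only as an example motif.

```python
from typing import Callable, List


def decompose(sequence: str, width: int, stride: int = 1) -> List[str]:
    """Split a sequence into overlapping fixed-width windows (a bag's instances).

    Hypothetical decomposition for illustration; not the authors' "context" construction.
    """
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    if len(sequence) < width:
        # Too short to window: the whole sequence is the single instance.
        return [sequence]
    return [sequence[i:i + width] for i in range(0, len(sequence) - width + 1, stride)]


def bag_score(sequence: str, instance_score: Callable[[str], float], width: int) -> float:
    """Standard MIL assumption: the bag score is the maximum instance score."""
    return max(instance_score(inst) for inst in decompose(sequence, width))


def toy_motif_score(instance: str, motif: str = "UGCAUG") -> float:
    """Toy stand-in for a learned instance scorer: 1.0 if the motif occurs, else 0.0."""
    return 1.0 if motif in instance else 0.0


if __name__ == "__main__":
    print(decompose("ACGUAC", 3))                                 # ['ACG', 'CGU', 'GUA', 'UAC']
    print(bag_score("AAGGUGCAUGCCAAA", toy_motif_score, width=8))  # 1.0
    print(bag_score("AAAAAAAAAA", toy_motif_score, width=8))       # 0.0
```

Max-pooling over instances is the simplest MIL aggregator. It lets a model trained only on sequence-level labels highlight which instance drove the prediction. Attention-based or mean pooling are common alternatives when binding plausibly depends on several contexts jointly.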