Attention presenters: please review the Speaker Information Page.
Schedule subject to change
All times listed are in CDT
Tuesday, May 13th
14:30-15:30
Invited Presentation: Application of DNABERT for distinguishing polysemous cis-regulatory elements - predicting target regions of TF splice-variants
Format: In Person

Moderator(s): Ferhat Ay


Authors List:

  • Ramana Davuluri
16:00-16:30
Invited Presentation: Long-read characterization of the changing isoform landscape in an aggressive subtype of pediatric leukemia
Confirmed Presenter: Ferhat Ay

Format: In Person

Moderator(s): Ramana Davuluri


Authors List:

  • Ferhat Ay
16:30-16:45
Soffritto: a deep-learning model for predicting high-resolution replication timing
Confirmed Presenter: Dante Bolzan, La Jolla Institute for Immunology, United States

Format: In Person

Moderator(s): Ramana Davuluri


Authors List:

  • Dante Bolzan, La Jolla Institute for Immunology, United States
  • Ferhat Ay, La Jolla Institute for Immunology, United States

Presentation Overview:

Motivation: Replication timing (RT) refers to the order in which DNA loci are replicated during S phase. RT is cell-type specific and is implicated in cellular processes including transcription, differentiation, and disease. RT is typically quantified genome-wide using two-fraction assays (e.g., Repli-Seq), which sort cells into early and late S-phase fractions and sequence the DNA from each, yielding an early-to-late ratio as the RT signal. While two-fraction RT data are widely available for many cell lines, they are limited in their ability to capture high-resolution RT features. To address this, high-resolution Repli-Seq, which quantifies RT across 16 fractions, was developed; however, it is costly and technically challenging, and very little data has been generated with it to date.
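To make the contrast between the two assay types concrete, here is a minimal sketch of how per-bin signals could be derived from read counts. It assumes the common convention of a log2(early/late) ratio computed after scaling each fraction to counts per million and adding a pseudocount; the function names, the pseudocount, and the normalization are illustrative assumptions, not the authors' processing pipeline. For the 16-fraction case it simply turns one bin's (already depth-normalized) counts into a distribution over fractions.

```python
import math


def two_fraction_rt(early: list[float], late: list[float], pseudocount: float = 1.0) -> list[float]:
    """Per-bin log2(early/late) RT signal (assumed convention, not Soffritto's code)."""
    e_total, l_total = sum(early), sum(late)
    if e_total <= 0 or l_total <= 0:
        raise ValueError("each fraction needs a positive total read count")
    signal = []
    for e, lt in zip(early, late):
        # Scale to counts per million so sequencing depth does not bias the ratio.
        e_cpm = e / e_total * 1e6 + pseudocount
        l_cpm = lt / l_total * 1e6 + pseudocount
        signal.append(math.log2(e_cpm / l_cpm))  # > 0 means an early-replicating bin
    return signal


def sixteen_fraction_profile(counts: list[float]) -> list[float]:
    """Turn one bin's 16 (depth-normalized) fraction counts into probabilities."""
    if len(counts) != 16:
        raise ValueError("expected 16 S-phase fractions")
    total = sum(counts)
    if total <= 0:
        raise ValueError("bin has no reads")
    return [c / total for c in counts]


print(two_fraction_rt([100, 10], [10, 100]))
```

In this example the first bin gets a positive value (early) and the second the mirror-image negative value (late). A two-fraction bin collapses to a single number, whereas the 16-value profile preserves how that bin's reads are distributed across S phase.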
Results: Here we developed Soffritto, a deep learning model that predicts high-resolution RT data from two-fraction RT data, histone ChIP-seq data, GC content, and gene density. Soffritto is composed of a Long Short-Term Memory (LSTM) module and a prediction module. The LSTM module learns long- and short-range interactions between genomic bins, while the prediction module, a fully connected layer, takes the LSTM module’s embeddings as input and outputs a 16-fraction probability vector for each bin. By performing both within-cell-line and cross-cell-line training and testing on five human and mouse cell lines, we show that Soffritto captures experimental 16-fraction RT signals with high accuracy and that the predicted signals allow detection of high-resolution RT patterns.
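The described architecture (an LSTM over a sequence of genomic bins followed by a fully connected layer that emits a 16-fraction probability vector per bin) can be sketched in plain Python. This is a hypothetical, untrained illustration: the class name, hidden size, random initialization, single unidirectional LSTM layer, and softmax output are assumptions made for clarity, not details taken from Soffritto. A real model would be built in a deep learning framework and trained against high-resolution Repli-Seq profiles.

```python
import math
import random

N_FRACTIONS = 16  # number of high-resolution Repli-Seq fractions


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyRTPredictor:
    """Hypothetical, untrained LSTM + fully connected head (not Soffritto itself)."""

    def __init__(self, n_features: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = hidden
        # One weight matrix per gate: input (i), forget (f), candidate (c), output (o),
        # each applied to the concatenation [x_t, h_{t-1}].
        self.w = {g: rand_matrix(hidden, n_features + hidden, rng) for g in "ifco"}
        self.b = {g: [0.0] * hidden for g in "ifco"}
        # Prediction module: one fully connected layer mapping h_t to 16 logits.
        self.w_out = rand_matrix(N_FRACTIONS, hidden, rng)
        self.b_out = [0.0] * N_FRACTIONS

    def forward(self, bins: list[list[float]]) -> list[list[float]]:
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        out = []
        for x in bins:  # bins are visited in genomic order
            xh = x + h
            pre = {g: [a + b for a, b in zip(matvec(self.w[g], xh), self.b[g])] for g in "ifco"}
            i_gate = [sigmoid(v) for v in pre["i"]]
            f_gate = [sigmoid(v) for v in pre["f"]]
            cand = [math.tanh(v) for v in pre["c"]]
            o_gate = [sigmoid(v) for v in pre["o"]]
            c = [f * cp + i * k for f, cp, i, k in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(cv) for o, cv in zip(o_gate, c)]
            logits = [a + b for a, b in zip(matvec(self.w_out, h), self.b_out)]
            out.append(softmax(logits))  # 16-fraction probability vector for this bin
        return out


# Hypothetical input: 5 bins x 4 features per bin.
model = TinyRTPredictor(n_features=4, hidden=8)
probs = model.forward([[0.5, 1.2, 0.41, 0.02] for _ in range(5)])
print(len(probs), len(probs[0]))  # 5 16
```

Each input row stands for one bin's features (for example, a two-fraction RT value, histone ChIP-seq signals, GC content, and gene density), and the output holds one 16-value probability vector per bin. Walking the bins in order lets the hidden state carry information from neighboring bins; a bidirectional variant would also read the sequence in reverse.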

16:45-17:00
GOFinder AI: Rapid and Explainable Gene Ontology Term Assignment Using Large Language Models
Confirmed Presenter: Aws Almir Ahmad, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Canada

Format: Live Stream

Moderator(s): Ramana Davuluri


Authors List:

  • Aws Almir Ahmad, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Canada
  • Arvind Mer, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Canada
  • Ben Patrick, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Canada
  • Stefan Wallin, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Canada

Presentation Overview:

Gene Ontology (GO) provides a robust framework for systematically describing the functional attributes of genes and proteins. However, the rapid expansion of the scientific literature poses a significant challenge for manual GO curation, often resulting in data gaps and inconsistent functional assignments. To address this issue, we present GOFinder AI, a computational platform that automates the assignment of GO terms to genes by leveraging fine-tuned large language models (LLMs). For any given biological entity (such as a gene name), GOFinder AI retrieves relevant information from the latest peer-reviewed research and accurately maps it to GO terms, keeping the linkage between entities and GO terms current and comprehensive without relying solely on manual curation. GOFinder AI operates through a two-stage pipeline. In the first stage, a targeted query mechanism retrieves relevant publications (e.g., from NCBI PubMed) for the gene of interest, and the LLM processes the text to extract functional evidence. In the second stage, the fine-tuned LLM maps these functional excerpts to GO annotations and provides reasoning for each assignment. This mapping step is run three times independently, and a voting-based approach yields consensus annotations. To ensure the reliability of these automated annotations, we fine-tuned multiple LLMs, including LLaMA 3.1 and DeepSeek, on a large dataset of validated, manually curated GO annotations. The goal was to maximize the tool’s domain-specific accuracy and minimize erroneous (“hallucinated”) term annotations. In addition, we integrated retrieval-augmented generation (RAG) into the pipeline, enabling the model to dynamically consult the GO database and incorporate the most contextually relevant terms.
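The consensus step (three independent mapping runs followed by a vote) can be illustrated with a short majority-vote sketch. The abstract does not say how votes are counted, so keeping a term only when a strict majority of runs (two of three) propose it is an assumption, as is representing each run's output as a set of GO IDs; `consensus_terms` is a hypothetical helper, not part of GOFinder AI.

```python
from collections import Counter
from typing import Optional


def consensus_terms(runs: list[set[str]], min_votes: Optional[int] = None) -> set[str]:
    """Keep GO IDs proposed by at least `min_votes` runs (default: strict majority)."""
    if not runs:
        return set()
    if min_votes is None:
        min_votes = len(runs) // 2 + 1  # 2 of 3 runs
    # Count each term once per run, then keep the terms with enough votes.
    votes = Counter(term for run in runs for term in set(run))
    return {term for term, n in votes.items() if n >= min_votes}


# Three hypothetical runs for one gene (IDs used only as examples).
runs = [
    {"GO:0005634", "GO:0003677"},
    {"GO:0005634", "GO:0006915"},
    {"GO:0005634", "GO:0003677"},
]
print(sorted(consensus_terms(runs)))  # ['GO:0003677', 'GO:0005634']
```

Raising `min_votes` to 3 keeps only terms that every run agrees on, trading recall for precision; a term that appears in only one run is dropped under either threshold.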
Our evaluation, based on model cross-comparisons, demonstrated that combining fine-tuning, parallel annotation runs, and RAG substantially improves annotation precision. When tested against a benchmark of over 100 manually curated GO annotations, the system built on the fine-tuned LLaMA 3.1-8B model achieved higher predictive accuracy than both ChatGPT and its own zero-shot counterpart. To enhance interpretability, GOFinder AI incorporates an explainable AI framework that traces the full decision-making process behind each GO assignment, from retrieving relevant literature to selecting the final annotation. This framework highlights the specific text supporting each assignment and enables users to approve, reject, or refine annotations, creating a feedback loop that improves model accuracy over time through reinforcement learning. Beyond gene function annotation, our results suggest that GOFinder AI can also assign GO terms to other biological entities, such as metabolites, broadening its applicability to systems biology and bioinformatics research. Overall, GOFinder AI represents a scalable, accurate, and transparent approach to keeping GO annotations up to date as scientific knowledge rapidly evolves.
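Precision against a curated benchmark can be computed by comparing each gene's predicted GO terms with its curated terms. The sketch below uses micro-averaged exact matching of GO IDs; the abstract does not specify how its metrics were computed, so this scoring scheme, the function name, and the gene names in the example are assumptions. Exact matching ignores the GO hierarchy, so predicting a closely related parent or child term counts as a miss.

```python
def micro_precision_recall(
    predictions: dict[str, set[str]], curated: dict[str, set[str]]
) -> tuple[float, float]:
    """Micro-averaged precision/recall of predicted vs. curated GO IDs per gene."""
    tp = fp = fn = 0
    for gene in curated.keys() | predictions.keys():
        pred = predictions.get(gene, set())
        gold = curated.get(gene, set())
        tp += len(pred & gold)  # correctly assigned terms
        fp += len(pred - gold)  # assigned but not curated
        fn += len(gold - pred)  # curated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical genes and annotations, for illustration only.
predictions = {"GENE_A": {"GO:0005634", "GO:0006915"}, "GENE_B": {"GO:0003677"}}
curated = {"GENE_A": {"GO:0005634"}, "GENE_B": {"GO:0003677", "GO:0005634"}}
p, r = micro_precision_recall(predictions, curated)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.667
```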

17:00-18:00
Invited Presentation: Brain isoform expression at single-cell resolution in (developmental) time and (anatomical) space – and in neurodegeneration
Format: In Person

Moderator(s): Ferhat Ay


Authors List:

  • Hagen Tilgner