ISMB/ECCB 2025



Schedule for Tutorials


Date Start Time End Time Room Track Title Confirmed Presenter Format Authors Abstract
2025-07-20 09:00:00 10:45:00 03A Tutorials Tutorial IP2: Massively parallel reporter assays in functional regulatory genomics and as part of the IGVF data resource This tutorial is designed to empower bioinformatics researchers with the knowledge and skills to use Massively Parallel Reporter Assay (MPRA) data effectively in their work. MPRAs are gaining wider application across the functional genomics community and are used as part of the Impact of Genomic Variation on Function (IGVF) Consortium. IGVF is a collaborative research initiative funded by the NHGRI that aims to systematically study how genomic variation affects genome function and, consequently, phenotypes. By integrating experimental and computational approaches, IGVF seeks to map and predict the functional impacts of genetic variants, providing a comprehensive catalog of these effects. This tutorial provides a thorough introduction to MPRAs and IGVF data resources, practical training on MPRA data, and insights into advanced analysis methods for such data. Participants will gain an understanding of MPRA experiments, including their various experimental designs and the rationale for using them in functional genomics. This will involve learning the process of associating tags/barcodes with the sequences incorporated in the reporter constructs from raw sequencing reads, and counting barcodes from DNA sequencing and RNA expression. The tutorial will guide participants through data processing using MPRAsnakeflow, a streamlined Snakemake workflow developed with IGVF for efficient MPRA data handling and QC reporting. Statistical analysis for sequence-level and variant-level effect testing of MPRA count data will be introduced using BCalm, a barcode-level MPRA analysis package developed as part of our IGVF efforts. Further, the tutorial will provide a starting point for training deep learning sequence models on MPRA data and related functional genomics datasets. 
Participants will learn how to extract meaningful insights from their datasets by investigating the sequence-activity relationship and extracting important sequence motifs. By integrating these topics and methods, participants will leave the tutorial equipped with both the theoretical knowledge and the practical skills necessary for analyzing and using MPRA data effectively.
2025-07-20 09:00:00 10:45:00 04AB Tutorials Tutorial IP3: Genomic Variant Interpretation & prioritisation for clinical research The interpretation of genetic variation is important for understanding human health and disease. Increased knowledge leads to societal benefits including faster disease diagnosis, a better understanding of disease progression, and more efficient identification and prioritisation of drug targets for testing, resulting in better overall health outcomes for a population. Whilst sequencing has become faster and cheaper, the complexity of variant interpretation remains a bottleneck for understanding. This tutorial will explore the variety of annotations and techniques available to assess human variation and the implications of variant effects on human health and disease.
2025-07-20 09:00:00 10:45:00 03B Tutorials Tutorial IP4: Quantum Machine Learning for multi-omics analysis Single-cell and population-level multi-omics analyses have greatly enhanced our understanding of biological complexity. By integrating various types of biological data (such as genomics, proteomics, and transcriptomics, collectively known as multi-omics), these approaches have provided deep insights into the molecular mechanisms underlying complex diseases, both at the cellular level and across patient populations. As the size and complexity of multi-omics data continue to grow, so does the need to leverage emerging technologies such as artificial intelligence (AI) and quantum computing (QC). Recently, advances in QC have shown promise in solving real-world problems in machine learning and optimization in biomedicine, drug discovery, biomarker discovery, and clinical trials, among other healthcare and life sciences objectives [1,2,3,4,5]. In this tutorial, participants will learn the fundamental concepts of QC and engage in hands-on experiments that apply classical machine learning (ML) techniques. They will also learn best practices for pre-processing multi-omics data in preparation for quantum machine learning (QML) tasks. Through a systematic evaluation of various data complexity measures and their impact on the performance of different ML and QML models, participants will gain insights into when QML models can be used effectively. Additionally, they will explore quantum-classical hybrid workflows for ML, with a focus on biomedical data analysis.
2025-07-20 09:00:00 10:45:00 12 Tutorials Tutorial IP5: Introduction to Causal Analysis using Mendelian Randomisation Mendelian randomisation (MR) is a method that uses genetic variation associated with an exposure (e.g., behaviours, biomarkers) to infer its causal effect on an outcome (e.g., health status). In statistical terms, it functions as an "instrumental variable" approach. By mimicking the design of a randomised controlled trial through genetic inheritance, MR provides a framework for addressing confounding and reverse causation, making it a valuable tool in epidemiological and biomedical research. This workshop offers a beginner-friendly introduction to the key concepts and assumptions underlying MR, such as the use of genome-wide association study (GWAS) data and the three key assumptions for valid instrumental variables: relevance, independence, and exclusion restriction. Participants will explore common challenges in MR analysis, including pleiotropy, population stratification, and measurement error, while learning strategies to overcome these using advanced methods. The workshop also includes a two-hour hands-on session in which attendees will work with real-world data to conduct MR analyses using R. By the end of the session, participants will have a clear understanding of MR principles, the ability to critically evaluate MR studies, and practical skills to apply MR methods in their own research.
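As a rough illustration of the instrumental-variable idea described in the IP5 abstract, the sketch below computes a Wald ratio for a single genetic variant and a fixed-effect inverse-variance weighted (IVW) combination across variants. It is not tutorial material (the tutorial itself uses R); the function names and the summary statistics are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the variant-exposure estimate.

```python
from math import sqrt


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal estimate from one variant: variant-outcome effect
    divided by variant-exposure effect (hypothetical helper)."""
    if beta_exposure == 0:
        raise ValueError("variant must be associated with the exposure (relevance)")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error.
    se = abs(se_outcome / beta_exposure)
    return estimate, se


def ivw_estimate(betas_exposure, betas_outcome, ses_outcome):
    """Fixed-effect inverse-variance weighted estimate across variants."""
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(betas_exposure, betas_outcome, ses_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(betas_exposure, ses_outcome))
    return num / den, sqrt(1.0 / den)


if __name__ == "__main__":
    # Made-up GWAS summary statistics for two variants.
    print(wald_ratio(0.5, 0.1, 0.02))
    print(ivw_estimate([0.5, 0.25], [0.1, 0.05], [0.02, 0.02]))
```

With a single variant the IVW estimate reduces to the Wald ratio, which is a quick sanity check on any implementation.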
2025-07-20 09:00:00 10:45:00 11BC Tutorials Tutorial IP6: Hello Nextflow: Getting started with workflows for bioinformatics Nextflow is a powerful and flexible open-source workflow management system that simplifies the development, execution, and scaling of data-driven computational pipelines. It is widely used in bioinformatics and other scientific fields to automate complex analyses, making it easier to manage and reproduce large-scale data analysis workflows. This training workshop is intended as a “getting started” course for students and early-career researchers who are completely new to Nextflow. It aims to equip participants with foundational knowledge and skills in three key areas: (1) understanding the logic of how data analysis workflows are constructed, (2) Nextflow language proficiency, and (3) command-line interface (CLI) execution. Participants will be guided through hands-on, goal-oriented exercises that will allow them to practice the following skills: (a) using core components of the Nextflow language to construct simple multi-step workflows effectively; and (b) launching Nextflow workflows locally, navigating output directories to access results, interpreting log outputs for insights into workflow execution, and troubleshooting basic issues that may arise during workflow execution. By the end of the workshop, participants will be well prepared to tackle the next steps in their journey to develop and apply reproducible workflows for their scientific computing needs. Additional study-at-home materials will be provided so that they can continue learning and developing their skills.
2025-07-20 09:00:00 10:45:00 11A Tutorials Tutorial IP1: Machine Learning for Omics: Best practices and Real-Life Insights with TidyModels Omics data analysis presents unique challenges due to its high dimensionality and complexity. Supervised machine learning (ML) offers powerful tools for gaining insights from these data but currently faces a crisis of reproducibility due to poor adherence to best practices in feature selection and model evaluation, and a need for greater interpretability. This full-day tutorial introduces participants to the common pitfalls and best practices of applying ML to omics research. It illustrates good practice using the Tidymodels framework for ML workflows in R, tailored to omics applications. The course will feature a mixture of lectures, quizzes, real-life coding tutorials and hands-on practicals with one-to-one support. Example applications will illustrate regression analysis with methylation clocks, gene prioritisation, and classification with cancer biomarker discovery. Special attention will be paid to the challenges of working with highly multivariate data and integrating various data types, along with tips for extracting meaningful insights from complex data. Beginner-level R skills are required, and attendees will leave with practical skills to apply Tidymodels to their own datasets.
2025-07-20 11:00:00 13:00:00 12 Tutorials Tutorial IP5: Introduction to Causal Analysis using Mendelian Randomisation
2025-07-20 11:00:00 13:00:00 11BC Tutorials Tutorial IP6: Hello Nextflow: Getting started with workflows for bioinformatics
2025-07-20 11:00:00 13:00:00 03B Tutorials Tutorial IP4: Quantum Machine Learning for multi-omics analysis
2025-07-20 11:00:00 13:00:00 04AB Tutorials Tutorial IP3: Genomic Variant Interpretation & prioritisation for clinical research
2025-07-20 11:00:00 13:00:00 03A Tutorials Tutorial IP2: Massively parallel reporter assays in functional regulatory genomics and as part of the IGVF data resource
2025-07-20 11:00:00 13:00:00 11A Tutorials Tutorial IP1: Machine Learning for Omics: Best practices and Real-Life Insights with TidyModels
2025-07-20 14:00:00 16:00:00 12 Tutorials Tutorial IP8: Representation Learning and Feature Engineering for Genomic Sequences Analysis Machine learning (ML) has been successfully applied to different omics problems, such as sequence classification in the field of genomics. The effectiveness of ML methods relies greatly on the selection of the data representation, or features, that extract meaningful information from sequences. Genomic sequences can be viewed as one-dimensional strings of successive letters representing nucleotides. However, to make these sequences compatible with ML methods, they must first be transformed into structured numerical representations, such as vectors or matrices. Traditional methods for sequence classification often rely on manually crafted or pre-defined features, which require domain expertise and may not fully capture the complexity of the underlying biological information. Recently, representation learning has emerged as a powerful alternative, enabling the automatic extraction of latent patterns directly from raw data and reducing the dependence on manually crafted features. In genomics, representation learning methods have been introduced to characterize DNA and RNA sequences: techniques such as Word2Vec, Convolutional Neural Networks (CNNs) and Large Language Models (LLMs) have demonstrated the ability to learn optimal sequence representations that effectively capture both local and global patterns in these sequences. This tutorial provides a comprehensive introduction to feature engineering and representation learning for genomic sequences (DNA/RNA). Participants will explore traditional techniques for extracting features from genomic sequences, building a foundation in classical approaches. Furthermore, the tutorial will cover representation learning, introducing concepts such as embeddings and their applications. 
Topics include methods such as Word2Vec and LLMs to obtain meaningful representations from genomic sequences. Through hands-on exercises and comparative analyses, attendees will learn to combine traditional feature engineering with representation learning approaches, developing practical skills and insights that are adaptable to diverse genomic research challenges. The goal is to offer participants the knowledge and tools to enhance genomic sequence analysis using different techniques for sequence representation.
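To make concrete what the IP8 abstract means by transforming a nucleotide string into "vectors or matrices", here is a minimal sketch of two classical hand-crafted representations: one-hot encoding (a matrix) and k-mer counts (a vector). It is not taken from the tutorial; the function names, the choice of k, and the handling of ambiguous bases such as N are assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed DNA alphabet; for RNA, replace T with U


def one_hot(seq):
    """Return a len(seq) x 4 matrix; unknown bases (e.g. N) become all zeros."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix


def kmer_counts(seq, k=3):
    """Return a vector of length 4**k counting each k-mer in lexicographic order.
    Windows containing a base outside ALPHABET are skipped."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    return [counts[kmer] for kmer in kmers]


if __name__ == "__main__":
    print(one_hot("ACGT"))
    print(kmer_counts("ACGTACGT", k=2))
```

Representation learning methods such as Word2Vec replace these fixed encodings with vectors learned from data, typically after tokenizing sequences into k-mers much like the counting step above.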
2025-07-20 14:00:00 16:00:00 11BC Tutorials Tutorial IP7: AI large cellular models and in-silico perturbation Transformer-based large language models (LLMs) are changing the world. The capabilities they have demonstrated in sophisticated natural language, vision and multi-modal tasks have inspired the development of large cellular models (LCMs) for single-cell transcriptomic data, such as scBERT, Geneformer, scGPT, scFoundation, GeneCompass and scMulan. After pretraining on massive amounts of single-cell RNA-seq data, agnostic to any downstream task, these transformer-based models have demonstrated exceptional performance in various tasks such as cell type annotation, data integration, gene network inference, and the prediction of drug sensitivity or perturbation responses. Such advances, albeit still at an early stage, suggest promising approaches for leveraging AI to understand the complex system of cells from extensive datasets beyond human analytical capacity. In particular, such models have made it possible to conduct in-silico perturbation on cells of various types, predicting their responses to gene perturbations without doing experiments on the cells. These models provide prototypes of digital virtual cells that can be used to reconstruct and simulate live cells, which could revolutionize many aspects of future biomedical studies. Although the community is highly enthusiastic about this exciting progress, the structures and algorithms of LCMs and other AI models of similar scale remain mysterious to many people who lack the relevant background. This tutorial aims to fill this gap. We will begin with an introduction to the basic principles of deep neural networks, and explain the basic structure and algorithm of the original Transformer for natural language tasks. We’ll show attendees how to build such models on current machine learning platforms. 
Then we’ll introduce several successful ways to build large cellular models based on the basic Transformer model, and give an overview of how such models are pretrained on single-cell RNA-seq data. We’ll demonstrate, and let attendees practice, how to use LCMs for basic tasks such as cell type annotation, and look into the specific application of LCMs to in-silico perturbation tasks. Attendees will engage in hands-on activities such as building basic transformer models and executing downstream single-cell tasks, including cell type annotation and in-silico perturbation. These activities will demystify LCMs for the attendees and help them better understand how LCMs can be built and applied.
2025-07-20 14:00:00 16:00:00 11A Tutorials Tutorial IP1: Machine Learning for Omics: Best practices and Real-Life Insights with TidyModels Omics data analysis presents unique challenges due to its high dimensionality and complexity. Supervised machine learning (ML) offers powerful tools for gaining insights from these data but currently faces a crisis of reproducibility due to poor adherence to best practices when undertaking feature selection, model evaluation, and needs for further interpretability. This full-day tutorial introduces participants to the common pitfalls and best practices of applying ML to omics research. It exemplifies good practice through example using the Tidymodels framework for ML workflows in R, tailored to omics applications. The course will feature a mixture of lectures, quizzes, real-life coding tutorials and hands-on practicals with 1-1 support. Example applications will illustrate regression analysis with methylation clocks, gene prioritisation and classification with cancer biomarker discovery. Special attention will be paid to challenges in working with highly multivariate data and integrating various data types as well as providing tips to extract meaningful insights from complex data. Beginner-level R skills are required, and attendees will leave with practical skills to apply Tidymodels to their own datasets.
2025-07-20 14:00:00 16:00:00 03A Tutorials Tutorial IP2: Massively parallel reporter assays in functional regulatory genomics and as part of the IGVF data resource This tutorial is designed to empower bioinformatics researchers with the knowledge and skills to effectively utilize Massively Parallel Reporter Assays (MPRAs) data in their work. MPRAs are gaining wider applications across the functional genomics community and are used as part of the Impact of Genomic Variation on Function (IGVF) Consortium. IGVF is a collaborative research initiative funded by the NHGRI that aims to systematically study how genomic variations affect genome function and, consequently, phenotypes. By integrating experimental and computational approaches, IGVF seeks to map and predict the functional impacts of genetic variants, providing a comprehensive catalog of these effects. This tutorial provides a thorough introduction in MPRAs and IGVF data resources, practical training on MPRA data, and insights into advanced analysis methods for such data. Participants will gain an understanding of MPRA experiments, including their various experimental designs and the rationale for using them in functional genomics. This will involve learning the process of associating tags/barcodes with sequences incorporated in the reporter constructs from raw sequencing reads and counting barcodes from DNA sequencing and RNA expression. The tutorial will guide participants through data processing using MPRAsnakeflow, a streamlined snakemake workflow developed with IGVF for efficient MPRA data handling and QC reporting. Statistical analysis for sequence-level and variant-level effect testing of MPRA count data will be introduced using BCalm, a barcode-level MPRA analysis package developed as part of our IGVF efforts. Further, the tutorial will provide a starting point for training (deep learning) sequence models on MPRA data and related functional genomics datasets. 
Participants will learn how to extract meaningful insights from their datasets by investigating the sequence activity relationship and extracting important sequence motifs. By integrating these topics and methods, participants will leave the tutorial equipped with both theoretical knowledge and practical skills necessary for analyzing and using MPRA data effectively.
2025-07-20 14:00:00 16:00:00 04AB Tutorials Tutorial IP3: Genomic Variant Interpretation & prioritisation for clinical research The interpretation of genetic variation is important for understanding human health and disease. Increased knowledge leads to societal benefits including faster disease diagnosis, a better understanding of disease progression, more efficient identification and prioritisation of drug targets for testing, resulting in overall better health outcomes for a population. Whilst the speed and cost of sequencing has reduced, the complexity of variant interpretation remains a bottleneck for understanding. This tutorial will explore the variety of annotations and techniques available to assess human variation and the implications of variant effects on human health and disease.
2025-07-20 14:00:00 16:00:00 03B Tutorials Tutorial IP4: Quantum Machine Learning for multi-omics analysis Single-cell and population-level multi-omics analyses have greatly enhanced our understanding of biological complexity. By integrating various types of biological data—such as genomics, proteomics, and transcriptomics, collectively known as multi-omics—these approaches have provided deep insights into the molecular mechanisms underlying complex diseases, both at the cellular level and across patient populations. As the size and complexity of multi-omics data continues to grow, the need to leverage emerging technologies such as artificial intelligence (AI) and quantum computing (QC) also grows. Recently, advances in QC have shown promise in solving real-world problems in machine learning and optimization in biomedicine, drug discovery, biomarker discovery, clinical trials, among other healthcare and life sciences objectives [1,2,3,4,5]. In this tutorial, participants will learn the fundamental concepts of QC, engage in hands-on experiments that apply classical machine learning (ML) techniques. They will also learn best practices for pre-processing multi-omics data in preparation for quantum machine learning (QML) tasks. Through a systematic evaluation of various data complexity measures and their impact on the performance of different ML and QML models, participants will gain insights into when to effectively utilize QML models. Additionally, they will explore quantum-classical hybrid workflows for ML, with a focus in biomedical data analysis.
2025-07-20 16:15:00 18:00:00 11BC Tutorials Tutorial IP7: AI large cellular models and in-silico perturbation Transformer-based large language models (LLMs) are changing the world. The capabilities they illustrated in sophisticated natural language, vision and multi-modal tasks have inspired the development of large cellular models (LCMs) for single-cell transcriptomic data, such as scBERT, Geneformer, scGPT, scFoundation, GeneCompass, scMulan, etc. After pretraining on massive amount of single-cell RNA-seq data agnostic to any downstream task, these transformer-based models have demonstrated exceptional performance in various tasks such as cell type annotation, data integration, gene network inference, and the prediction of drug sensitivity or perturbation responses. Such advancements, albeit still in their early stage, suggested promising revolutionary approaches for leveraging AI to understand the complex system of cells from extensive datasets beyond human analytical capacity. Especially, such models have made it possible to conduct in-silico perturbation on cells of various types to predict their responses to gene perturbations without doing experiments on the cells. These models provided prototypes of digital virtual cells that can be used to reconstruct and simulate live cells, which will revolutionize many aspects of future biomedical studies. Although the community is high enthusiastic to these exciting progresses, the structures and algorithms of LCMs and other similar-scale AI models are mysterious to many people who were not equipped with relevant backgrounds. This tutorial will try to fill this gap. In the tutorial, we will begin from an introduction of basic principles of deep neural networks, and explain the basic structure and algorithm of the original Transformer for natural language tasks. We’ll show to the attendees how to build such models based on current machine learning platforms. 
Then we’ll introduce several successful ways to build large cellular models on top of the basic Transformer model, and give an overview of how such models are pretrained on single-cell RNA-seq data. We’ll demonstrate, and let attendees practice, how to use LCMs for basic tasks such as cell type annotation, and look into the specific application of LCMs to in-silico perturbation tasks. Attendees will engage in hands-on activities such as building basic transformer models and executing downstream single-cell tasks, including cell type annotation and in-silico perturbation. These activities will demystify LCMs for attendees and help them better understand how LCMs can be built and applied.
2025-07-20 16:15:00 18:00:00 12 Tutorials Tutorial IP8: Representation Learning and Feature Engineering for Genomic Sequences Analysis Machine learning (ML) has been successfully applied to a range of omics problems, such as sequence classification in genomics. The effectiveness of ML methods relies greatly on the choice of data representation, or features, used to extract meaningful information from sequences. Genomic sequences can be viewed as one-dimensional strings of successive letters representing nucleotides. However, to make these sequences compatible with ML methods, they must first be transformed into structured numerical representations, such as vectors or matrices. Traditional methods for sequence classification often rely on manually crafted or pre-defined features, which require domain expertise and may not fully capture the complexity of the underlying biological information. Recently, representation learning has emerged as a powerful alternative, enabling the automatic extraction of latent patterns directly from raw data and reducing the dependence on manually crafted features. In genomics, representation learning methods have been introduced to characterize DNA and RNA sequences: techniques like Word2Vec, Convolutional Neural Networks (CNNs), and Large Language Models (LLMs) have demonstrated the ability to learn sequence representations that effectively capture both local and global patterns. This tutorial provides a comprehensive introduction to feature engineering and representation learning for genomic sequences (DNA/RNA). Participants will explore traditional techniques for extracting features from genomic sequences, building a foundation in classical approaches. Furthermore, the tutorial will cover representation learning, introducing concepts such as embeddings and their applications.
Topics include methods such as Word2Vec and LLMs for obtaining meaningful representations from genomic sequences. Through hands-on exercises and comparative analyses, attendees will learn to combine traditional feature engineering with representation learning approaches, developing practical skills and insights that are adaptable to diverse genomic research challenges. The goal is to give participants the knowledge and tools to enhance genomic sequence analysis using different techniques for sequence representation.
2025-07-20 16:15:00 18:00:00 11A Tutorials Tutorial IP1: Machine Learning for Omics: Best practices and Real-Life Insights with TidyModels Omics data analysis presents unique challenges due to its high dimensionality and complexity. Supervised machine learning (ML) offers powerful tools for gaining insights from these data but currently faces a crisis of reproducibility, driven by poor adherence to best practices in feature selection and model evaluation and by the need for greater interpretability. This full-day tutorial introduces participants to the common pitfalls and best practices of applying ML to omics research. It illustrates good practice using the Tidymodels framework for ML workflows in R, tailored to omics applications. The course will feature a mixture of lectures, quizzes, real-life coding tutorials, and hands-on practicals with 1-1 support. Example applications will illustrate regression analysis with methylation clocks, gene prioritisation, and classification with cancer biomarker discovery. Special attention will be paid to the challenges of working with highly multivariate data and integrating various data types, along with tips for extracting meaningful insights from complex data. Beginner-level R skills are required, and attendees will leave with practical skills to apply Tidymodels to their own datasets.
2025-07-20 16:15:00 18:00:00 03A Tutorials Tutorial IP2: Massively parallel reporter assays in functional regulatory genomics and as part of the IGVF data resource This tutorial is designed to empower bioinformatics researchers with the knowledge and skills to effectively utilize Massively Parallel Reporter Assay (MPRA) data in their work. MPRAs are gaining wider application across the functional genomics community and are used as part of the Impact of Genomic Variation on Function (IGVF) Consortium. IGVF is a collaborative research initiative funded by the NHGRI that aims to systematically study how genomic variations affect genome function and, consequently, phenotypes. By integrating experimental and computational approaches, IGVF seeks to map and predict the functional impacts of genetic variants, providing a comprehensive catalog of these effects. This tutorial provides a thorough introduction to MPRAs and IGVF data resources, practical training on MPRA data, and insights into advanced analysis methods for such data. Participants will gain an understanding of MPRA experiments, including their various experimental designs and the rationale for using them in functional genomics. This will involve learning the process of associating tags/barcodes with sequences incorporated in the reporter constructs from raw sequencing reads and counting barcodes from DNA sequencing and RNA expression. The tutorial will guide participants through data processing using MPRAsnakeflow, a streamlined Snakemake workflow developed with IGVF for efficient MPRA data handling and QC reporting. Statistical analysis for sequence-level and variant-level effect testing of MPRA count data will be introduced using BCalm, a barcode-level MPRA analysis package developed as part of our IGVF efforts. Further, the tutorial will provide a starting point for training (deep learning) sequence models on MPRA data and related functional genomics datasets.
Participants will learn how to extract meaningful insights from their datasets by investigating the sequence-activity relationship and extracting important sequence motifs. By integrating these topics and methods, participants will leave the tutorial equipped with both the theoretical knowledge and the practical skills necessary for analyzing and using MPRA data effectively.
2025-07-20 16:15:00 18:00:00 04AB Tutorials Tutorial IP3: Genomic Variant Interpretation & prioritisation for clinical research The interpretation of genetic variation is important for understanding human health and disease. Increased knowledge leads to societal benefits including faster disease diagnosis, a better understanding of disease progression, and more efficient identification and prioritisation of drug targets for testing, resulting in better overall health outcomes for a population. Whilst sequencing has become faster and cheaper, the complexity of variant interpretation remains a bottleneck for understanding. This tutorial will explore the variety of annotations and techniques available to assess human variation and the implications of variant effects on human health and disease.
2025-07-20 16:15:00 18:00:00 03B Tutorials Tutorial IP4: Quantum Machine Learning for multi-omics analysis Single-cell and population-level multi-omics analyses have greatly enhanced our understanding of biological complexity. By integrating various types of biological data (such as genomics, proteomics, and transcriptomics, collectively known as multi-omics), these approaches have provided deep insights into the molecular mechanisms underlying complex diseases, both at the cellular level and across patient populations. As the size and complexity of multi-omics data continue to grow, so does the need to leverage emerging technologies such as artificial intelligence (AI) and quantum computing (QC). Recently, advances in QC have shown promise in solving real-world problems in machine learning and optimization in biomedicine, drug discovery, biomarker discovery, and clinical trials, among other healthcare and life sciences objectives [1,2,3,4,5]. In this tutorial, participants will learn the fundamental concepts of QC and engage in hands-on experiments that apply classical machine learning (ML) techniques. They will also learn best practices for pre-processing multi-omics data in preparation for quantum machine learning (QML) tasks. Through a systematic evaluation of various data complexity measures and their impact on the performance of different ML and QML models, participants will gain insights into when QML models can be used effectively. Additionally, they will explore quantum-classical hybrid workflows for ML, with a focus on biomedical data analysis.
