Tutorials

There will be a series of in-person and virtual tutorials prior to the start of the conference. Tutorial registration fees are shown at: https://www.iscb.org/ismbeccb2025/register#tutorials

In-person Tutorials (All times BST)

Tutorial IP1: Machine Learning for Omics: Best practices and Real-Life Insights with TidyModels SOLD OUT
Tutorial IP2: Massively parallel reporter assays in functional regulatory genomics and as part of the IGVF data resource
Tutorial IP3: Genomic Variant Interpretation & prioritisation for clinical research
Tutorial IP4: Quantum Machine Learning for multi-omics analysis SOLD OUT
Tutorial IP5: Introduction to Causal Analysis using Mendelian Randomisation
Tutorial IP6: Hello Nextflow: Getting started with workflows for bioinformatics SOLD OUT
Tutorial IP7: AI large cellular models and in-silico perturbation SOLD OUT
Tutorial IP8: Representation Learning and Feature Engineering for Genomic Sequences Analysis SOLD OUT

Virtual Tutorials: (All times BST) Presented through the conference virtual platform

Tutorial VT1: Visualising and interpreting your -omics results using ggplot2 and R
Tutorial VT2: OmicsViz: Interactive Visualization and ML for Omics Data
Tutorial VT3: Computational approaches for deciphering cell-cell communication from single-cell transcriptomics and spatial transcriptomics data SOLD OUT
Tutorial VT4: An applied genomics approach to crop breeding: A suite of tools for exploring natural and artificial diversity
Tutorial VT5: Comprehensive Bioinformatics and Statistical Approaches for High-Throughput Sequencing Data Analysis, Including scRNA-seq, in Biomarker Discovery
Tutorial VT6: Beyond Bioinformatics: Snakemake for Versatile Computational Workflows
Tutorial VT7: Assessing and Enhancing Digital Accessibility of Biological Data and Visualizations
Tutorial VT8: Generative AI for Single-Cell Perturbation Modeling: Theoretical and practical considerations SOLD OUT
Tutorial VT9: Biomedical text mining for knowledge extraction

Tutorial IP1: Machine Learning for Omics: Best practices and Real-Life Insights with TidyModels SOLD OUT

Room: 11A
Date: July 20, 2025
Start Time: 09:00
End Time: 18:00

Organizer:
Jamie Soul

Speakers:
Jamie Soul, University of Liverpool
Eva Caamano Gutierrez, University of Liverpool
Anthony Evans, University of Liverpool

Max Participants: 30

Description
Omics data analysis presents unique challenges due to its high dimensionality and complexity. Supervised machine learning (ML) offers powerful tools for gaining insights from these data but currently faces a crisis of reproducibility, driven by poor adherence to best practices in feature selection and model evaluation and by limited model interpretability.

This full-day tutorial introduces participants to the common pitfalls and best practices of applying ML to omics research. It demonstrates good practice by example, using the Tidymodels framework for ML workflows in R, tailored to omics applications. The course will feature a mixture of lectures, quizzes, real-life coding tutorials and hands-on practicals with one-to-one support. Example applications will illustrate regression analysis with methylation clocks, gene prioritisation, and classification with cancer biomarker discovery.

Special attention will be paid to challenges in working with highly multivariate data and integrating various data types as well as providing tips to extract meaningful insights from complex data. Beginner-level R skills are required, and attendees will leave with practical skills to apply Tidymodels to their own datasets.

Learning Objectives

Understand the challenges and pitfalls of using supervised machine learning on omics data, including reproducibility, overfitting, and feature selection.
Critically appraise published examples and gain familiarity with reporting practices such as DOME.
Gain hands-on experience using Tidymodels to implement machine learning workflows in R.
Learn techniques for feature selection, dimensionality reduction, and inclusion of network-based information.
Explore interpretable machine learning approaches to improve biological insights.
Develop skills to apply Tidymodels for methylation clock modeling, gene prioritisation and mechanistic biomarker discovery.

Intended Audience and Level
This course is designed for researchers with at least a basic understanding of R programming and an interest in applying machine learning to omics data analysis. Prior exposure to omics data is beneficial but not mandatory. The course will provide a gentle introduction to relevant concepts while focusing on practical skills development. No prior experience with Tidymodels or advanced machine learning techniques is required. The tutorial is tailored to beginners and those looking to expand their skills in machine learning data analysis using R.

Schedule

09:00-09:45	Lecture: Introduction to Machine Learning in Omics. Introductions and overview of the course structure and learning objectives. Introductory lecture on ML principles, applications and common pitfalls in application to omics data, including the “crisis of reproducibility”, high dimensionality, and interpretability challenges. Introduction to reporting recommendations (e.g. DOME, FAIR adherence) and a quiz to reinforce key concepts. Speaker: Eva Caamano Gutierrez
09:45-10:00	Hands-on: Critical appraisal of published examples. Identifying good practice and areas for development; expanding familiarity with best reporting practices.
10:00-10:45	Tidymodels Framework: A Practical Introduction. A follow-along tutorial covering: (1) Introduction to Tidymodels: overview of the Tidymodels ecosystem and its core packages, and how Tidymodels streamlines the end-to-end machine learning workflow; (2) Understanding and Avoiding Data Leakage: what data leakage is, why it can compromise model validity, and examples of best practices; (3) Specifying Preprocessing Recipes: how to define reusable preprocessing pipelines using the recipes package; (4) Model Fitting with Parsnip: introduction to different model types and how Tidymodels supports multiple ML algorithms; (5) Model Evaluation with Yardstick: best practices for evaluating model performance using appropriate metrics, with a hands-on demonstration. Speaker: Dr Jamie Soul
10:45-11:00	Coffee Break
11:00-12:30	Hands-on: Explore Tidymodels basics with a provided Quarto notebook. Participants will work through a Quarto notebook at their own pace, implementing a simple classification model with the Tidymodels framework, based on the previous demonstration.
12:30-13:00	Lecture: Strategies for Managing Large Omics Datasets. This lecture will provide strategies for handling the challenges posed by large omics datasets. Topics will include class imbalance, techniques for reducing dimensionality, and feature selection methods. Participants will learn how these methods can enhance model performance and reveal key biological insights. The session will conclude with a quiz to test understanding and reinforce key concepts discussed during the lecture. Speaker: Jamie Soul
13:00-14:00	Lunch Break
14:00-15:15	Hands-on: Using Tidymodels to build a methylation clock. In this interactive session, participants will learn how to use the Tidymodels framework to build a cross-validated regression model. The focus will be on predicting the age of an organism using methylation array data, providing a practical introduction to model building, validation, and analysis in the context of biological data.
15:15-15:30	Lecture: A short snapshot of networks and pathways. Introduction to networks and considerations for their inclusion in ML. Speaker: Anthony Evans
15:30-16:00	Hands-on: Enhancing Machine Learning with Biological Context: Integrating Networks and Pathways. This practical session will explore strategies for incorporating additional biological information, such as networks and pathway data, into machine learning models. Participants will examine how these methods can enhance model performance and biological relevance when applied to the task of gene prioritisation.
16:00-16:15	Coffee Break
16:15-16:30	Lecture: A short introduction to gaining model interpretability. Introduction to interpreting models and the trade-offs between performance and interpretability. Speaker: Anthony Evans
16:30-17:30	Hands-on: Identifying biological mechanisms with ML. With examples applied to cancer biomarker discovery, participants will explore how interpretable machine learning (ML) techniques can be used to gain insights into biological mechanisms. The session will provide a practical introduction to key interpretability tools, such as SHAP (SHapley Additive exPlanations) values, which help explain the relationship between input features and model predictions.
17:30-18:00	Wrap-Up and Discussion. Review of key concepts, Q&A, and discussion of applications to participants' own research.
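
As an illustration of the data-leakage topic in the 10:00 session, the sketch below shows one common way leakage is avoided: preprocessing parameters are estimated from the training split only and then reused on the test split. It is written in plain Python rather than the Tidymodels recipes used in the tutorial, and the function names (`train_test_split`, `fit_scaler`, `apply_scaler`) and the toy data are assumptions made for this example, not tutorial material.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Estimate centring and scaling parameters from the given values only."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0  # avoid division by zero
    return mean, sd


def apply_scaler(values, params):
    """Standardise values with previously estimated parameters."""
    mean, sd = params
    return [(v - mean) / sd for v in values]


# Hypothetical single-feature dataset (e.g. one expression value per sample).
values = [float(v) for v in range(20)]
train, test = train_test_split(values)

# The scaler is fitted on the training split only, so no information
# from the held-out samples influences the preprocessing step.
params = fit_scaler(train)
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)
```

Fitting the scaler on all samples before splitting would let the test set influence the training features, which is the kind of leakage the session describes.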

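Tutorial IP2: Massively parallel reporter assays in functional regulatory genomics and as part of the IGVF data resource

Schedule

09:00-09:15	Welcome and introduction to the schedule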
09:15-09:30	Tutorial materials access and setup of the Google Colab environment. Participants who have trouble setting up their access and Colab environment will be helped outside the seminar room by other speakers while the theoretical introductions begin.
09:30-09:50	Lecture: IGVF and its data access Introduction to the IGVF mission and how to access its data
09:50-10:15	Lecture: Fundamentals of MPRA Experiments Explanation of experimental design, including the association of barcode/tag sequences to tested regulatory sequences and variants.
10:15-10:45	Coffee break
10:45-11:05	Hands-on: Association of Barcode/Tag Sequences Participants will learn how to associate barcode/tag sequences with designed oligos using the association step of the MPRAsnakeflow pipeline on a subsetted/small dataset.
11:05-11:30	Hands-on: Count Sequencing Analysis We will demonstrate how to count barcodes in both DNA and RNA sequencing on a subsetted/small dataset.
11:30-11:50	Discussion of QC metrics We will discuss the QC parameters and plots returned by MPRAsnakeflow for the full data set, emphasizing the importance of accurate quantification in MPRA experiments.
11:50-12:30	Hands-on: Data Analysis Steps (Regions and Variants) We will walk through potential analytical steps for utilizing count tables and perform statistical analysis using BCalm to identify regions with high activity compared to other sequences in the assay. Additionally, we will highlight methods for identifying variants with significant differences between reference and alternative sequences within designed oligos.
12:30-13:30	Lunch break
13:30-13:50	Lecture: Modeling of regulatory activity with sequence models We will provide a brief introduction and review of modeling efforts for sequence activity using gapped kmers, convolutional neural networks and language model approaches.
13:50-14:30	Hands-on: Training a sequence based model We will walk through the steps of training a model based on the DeepSTARR CNN architecture.
14:30-14:50	Discussion of pre-training and fine-tuning approaches We discuss the opportunity of pre-training convolutional layers on open chromatin data from multiple cell-types and fine-tuning these convolutions on MPRA data vs. using a language model like NT to train MPRA activity models.
14:50-15:30	Hands-on: Interpreting models with in-silico mutagenesis Supported by tools like TF-MoDISco for motif discovery, participants will investigate important transcription factor binding motifs using models trained on sequence data that predict activity. We will assess whether these motifs exert activating or repressing effects by comparing the activity of sequences with and without these identified motifs in the cell-type of interest.
15:30-16:00	Coffee break
16:00-16:40	Hands-on: Linking motifs to variant effects We will identify transcription factor binding sites (TFBS) overlapping significant variants using motifs derived from model interpretation or available position weight matrices (PWMs) to support the interpretation of variant effects.
16:40-16:50	Discussion of data limitations. We discuss the limitations of variant effects in various sequence contexts. We consider the effects of studying only a few or related cell-types and the sensitivity of the available modeling approaches for cell-type effects.
16:50-17:20	Participant questions and feedback
17:20-17:30	Concluding remarks by the speakers

Tutorial IP3: Genomic Variant Interpretation & prioritisation for clinical research

Schedule

09:15-09:30	Lecture 1: Introduction - The challenge of variant interpretation. Variation in the context of human health
09:30-10:00	Lecture 2: Genomic Annotation for variation datasets. Public annotation datasets, variation sources, transcript-based annotations, non-coding variation, structural variation
10:00-10:45	Hands-on 1: Annotating and predicting molecular variant effect Variant prioritisation and scoring methods
10:45-11:00	Break
11:00-11:15	Lecture 3: Understanding Variant Effects Using Protein Structure. Protein position, interaction and complexes for variant interpretation, AlphaFold 3 and predicted structures
11:15-11:45	Lecture 4: Understanding Variant Effects Using Protein Function Combining functional, structural and population annotations to contextualise variant effects in proteins
11:45-12:15	Hands-on 2: Using protein databases to investigate variant impact Using structural information to interpret variant effect
12:15-12:45	Lecture 5: Deep Mutational Scanning Genome Editing for Variant Analysis
12:45-14:00	Lunch Break
14:00-14:30	Lecture 6: Utilising clinical data in variant prioritisation and classification. Applications for disease research and genomic diagnosis; DECIPHER, G2P, and GWAS Catalog
14:30-15:00	Lecture 7: Target tractability and drug associations Target prioritisation for drug discovery, Case studies
15:00-16:00	Group Projects: Hands-on activity using bioinformatics resources for variant interpretation
16:00-16:15	Break
16:15-17:00	Group Projects: Hands-on activity using bioinformatics resources for variant interpretation
17:00-17:45	Presentations from groups Present to peers to discuss ideas and future work
17:45-18:00	Closing Remarks

Tutorial IP4: Quantum Machine Learning for multi-omics analysis

Schedule

09:00-09:20	Welcome Remarks and Introduction
09:20-10:00	Hands-on: Quantum computing fundamentals with Qiskit
10:00-10:45	Current state of Quantum Machine Learning
10:45-11:00	Coffee Break
11:00-11:30	Data and Complexity measures
11:30-12:00	Quantum Kernel methods
12:00-13:00	Hands-on: Applying a Quantum-Classical machine learning benchmarking tool on omics data
13:00-14:00	Lunch
14:00-15:00	Hands-on: Implementing Quantum Kernel methods
15:00-16:00	Hands-on: Execute the benchmarking tool on omics data
16:00-16:15	Coffee Break
16:15-17:15	Hands-on: Review results from the benchmarking tool
17:15-17:45	Result Read-outs
17:45-18:00	Future Directions & Concluding Remarks

Tutorial IP5: Introduction to Causal Analysis using Mendelian Randomisation

Schedule

09:00-09:45	Introduction to Mendelian Randomization: overview of MR and its applications; key assumptions (relevance, independence, and exclusion restriction); types of MR studies (one-sample, two-sample, bi-directional)
09:45-10:45	Challenges in Mendelian Randomization: horizontal pleiotropy and methods to address it; sample overlap and measurement error; population stratification and genetic heterogeneity
10:45-11:00	Coffee Break
11:00-13:00	Hands-On Tutorial in R: preparing exposure and outcome GWAS data; conducting basic MR analysis (TwoSampleMR R package); robust methods (MR-Egger, weighted median, and leave-one-out analysis); interpreting and visualizing results
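
To make the "basic MR analysis" and "leave-one-out analysis" items above more concrete, here is a minimal sketch of the fixed-effect inverse-variance weighted (IVW) estimate often used in two-sample MR. It is plain Python for illustration only and is not the TwoSampleMR API; the function names and the toy summary statistics are assumptions introduced here.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Each variant contributes the ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


def leave_one_out(beta_exposure, beta_outcome, se_outcome):
    """Re-estimate the IVW effect with each variant removed in turn."""
    estimates = []
    for i in range(len(beta_exposure)):
        keep = [j for j in range(len(beta_exposure)) if j != i]
        estimate, _ = ivw_estimate(
            [beta_exposure[j] for j in keep],
            [beta_outcome[j] for j in keep],
            [se_outcome[j] for j in keep],
        )
        estimates.append(estimate)
    return estimates


# Hypothetical summary statistics for three variants (made up for this sketch).
bx = [0.10, 0.20, 0.15]    # variant-exposure effects
by = [0.05, 0.10, 0.075]   # variant-outcome effects
se = [0.01, 0.02, 0.015]   # standard errors of the variant-outcome effects

estimate, se_ivw = ivw_estimate(bx, by, se)
loo_estimates = leave_one_out(bx, by, se)
```

A leave-one-out estimate that moves substantially when one variant is dropped suggests that variant is driving the result, for example through horizontal pleiotropy.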

Tutorial IP6: Hello Nextflow: Getting started with workflows for bioinformatics

Schedule

09:00-10:00	Hello World: Basic components and principles involved in assembling and running a Nextflow workflow. Learn the basics of the Nextflow syntax, how a workflow script is structured and how it can be modified. Run a workflow for the first time, parse messages and logs produced during a run, and find outputs. Use variables and key command-line parameters to control inputs, outputs and execution behavior. Chain multiple steps together and handle the transfer of data between steps. Handle input and output files.
10:00-10:20	Hello Containers: Using containers as a mechanism for managing software dependencies in the context of reproducible bioinformatics workflows.
10:20-10:50	Hello Config: Setting up and managing a pipeline’s configuration to customize its behavior, adapt it to different environments, and optimize resource usage.
10:50-11:05	Hello Modules: Using code modules to make pipeline development and maintenance more efficient and sustainable.
11:05-11:20	Next steps Overview of educational resources that participants can use to continue developing their Nextflow skills. Includes several domain-specific tutorials (currently Genomics, RNAseq in development) that provide a practical application of the concepts learned in this workshop to relevant use cases, and an orientation to the nf-core ecosystem of tools and pipelines that can be used out-of-the-box or as building blocks for customized solutions.

Tutorial IP7: AI large cellular models and in-silico perturbation

Schedule

14:00-14:30	Introduction to artificial neural networks and deep learning
14:35-15:10	Introduction to language models and Transformers
15:10-15:25	Coffee break
15:25-15:55	Basic structure and components of transformer-based LCMs
15:55-16:15	Hands-on: Build transformer model with Python. Building a toy transformer model with PyTorch; using pretrained LCMs for cell type annotation
16:15-16:45	LLMs for in-silico perturbation
16:45-17:00	Coffee break
17:00-17:20	Hands-on: In-silico gene perturbation. Model construction and data preparation; in-silico gene perturbation with LCMs; evaluating the results
17:20-17:40	Discussions
17:40-18:00	Summary

Tutorial IP8: Representation Learning and Feature Engineering for Genomic Sequences Analysis

Schedule

14:00-14:45	Lecture: Exploring Traditional Feature Engineering in Genomics. Overview: introduction to machine learning for sequence classification and methods to extract numerical information from biological sequences. Introduction to traditional feature extraction: sequence composition (GC content, nucleotide composition, basic k-mer and accumulated nucleotide frequency) and numerical mapping (Z-curve and one-hot encoding). Examples of public software packages: Seq2Feature, iFeature and iLearn.
14:45-15:45	Hands-on: Feature Extraction in Practice - Crafting Descriptors for Genomic Analysis. Practical exercises for extracting traditional genomic features. Application of a basic ML pipeline for sequence classification. Comparative analysis of the effectiveness of different feature extraction approaches.
15:45-16:00	Coffee Break
16:00-16:45	Lecture: Decoding the Genomic Language - Embeddings and Representations for Genomic Sequences. Introduction to representation learning for genomic sequences, highlighting its role in advancing genomic analysis. Introduction to embeddings: overview of word embeddings, with a focus on Word2Vec and its application to genomic sequences. Foundation models in genomics: overview of LLMs, focusing on DNABERT2 and Nucleotide Transformer for sequence representation learning.
16:45-18:00	Hands-on: Embedding Genomic Data - From Word Embeddings to Large Language Models. Practical exercises applying Word2Vec and LLMs (DNABERT2 and Nucleotide Transformer) to extract features from genomic sequences. Evaluation and comparison of the features obtained from traditional methods and embedding techniques.
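
The traditional descriptors named in the first lecture (GC content, k-mer frequencies, one-hot encoding) can be sketched in a few lines of plain Python. This is an illustrative sketch only and does not reproduce Seq2Feature, iFeature or iLearn; the function names and the example sequence are assumptions made here.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k=2):
    """Relative frequency of every possible k-mer over A, C, G, T.

    Windows containing other characters (such as N) are skipped.
    """
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set(BASES))
    total = sum(counts.values())
    kmers = ("".join(p) for p in product(BASES, repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


def one_hot(seq):
    """One row per base, columns ordered A, C, G, T; unknown bases are all zeros."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


example = "ACGTGC"  # hypothetical toy sequence
features = {
    "gc": gc_content(example),
    "dinucleotides": kmer_frequencies(example, k=2),
    "one_hot": one_hot(example),
}
```

The k-mer dictionary always has 4**k entries, so sequences of different lengths map to fixed-length feature vectors, whereas the one-hot matrix grows with sequence length.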

Tutorial VT1: Visualising and interpreting your -omics results using ggplot2 and R

Schedule

09:00-09:45	Lecture: Introduction to Data Visualization. Welcome and introductions; overview of the most common types of data visualisation in omics papers; what makes for a good or bad data visualisation; interactive quiz, in which participants will be given the opportunity to apply the knowledge gained and suggest how they’d improve various data visualisations. Speaker: Emily Johnson
09:45-10:30	Lecture: Introduction to ggplot2. Introduction to the tidyverse and why ggplot2 is ideal for reproducible visualisation; overview of the “grammar of graphics” and ggplot2 syntax; examples of advanced visualisations produced with ggplot2; live coding demo to introduce core concepts. Speaker: Lauren Mee
10:30-10:45	Coffee Break
10:45-12:00	Hands-on: plotting omics data with ggplot2 A follow along tutorial covering volcano plots and box plots to explore a differential expression analysis. For each type of visualisation, we will start from the most basic way to produce the plot and iteratively build up the plot to create an intuition for ggplot2 syntax. Themes and colour palettes will also be introduced during this session. The differential expression analysis results will be pre-processed.
12:00-13:00	Lunch Break
13:00-13:45	Hands-on: Creating heatmaps with ggplot2 and ComplexHeatmap. A follow-along tutorial that builds on the morning practical. Participants will first create heatmaps using ggplot2, then they will be introduced to a more advanced heatmap-specific package: ComplexHeatmap. The merits of both approaches will be compared. The participants will also be introduced to the idea of clustering their data for visualisation purposes.
13:45-14:15	Lecture: Networks and how to interpret them. Introduction to networks in biological data visualisation and where they’re appropriate; understanding network-specific terminology; examples of networks (e.g., protein-protein interactions, co-expression networks); brief overview of inputs for and challenges of network visualisation. Speaker: Euan McDonnell
14:15-15:30	Hands-on: creating network visualisations with igraph and ggraph Hands-on coding session where attendees will use igraph and ggraph to create network diagrams. Attendees will learn about different network layouts, how to identify and emphasise certain network structures and how to use igraph outputs as an input for ggraph (a package which is built on ggplot2 and compatible with tidyverse workflows) – for high quality and high-resolution network visualisations.
15:30-15:45	Coffee break
15:45-16:15	Lecture: Contextualising Results with Functional Enrichment. What is functional enrichment and how does it help us make sense of our results? Common databases for enrichment analysis (e.g., GO, KEGG). Differences between Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA), including best practices for both (use of an appropriate background for ORA and choice of ranking metrics for GSEA). Visualising enrichment results: dot plots, ridge plots, and network diagrams. Speaker: Emily Johnson
16:15-17:30	Hands-on: ORA and GSEA practical. Hands-on coding session using the clusterProfiler package to perform ORA and GSEA on the results of a differential expression analysis. Participants will produce dot plots and ridge plots and compare the outputs of the two approaches. We will cover best practices for ORA and GSEA. There will be additional supplementary content, including how to carry out functional enrichment for non-model organisms and advanced analyses such as gene set variation analysis (GSVA).
17:30-18:00	Wrap-Up and Discussion Review of key concepts covered in the tutorial. Q&A session and discussion on applying concepts to attendees’ own research projects.
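
ORA, as covered in the 15:45 lecture, is commonly framed as a one-sided hypergeometric test, which also shows why the choice of background matters: the background size enters the calculation directly. The sketch below is plain Python for illustration, not the clusterProfiler interface used in the practical; the function name and the example numbers are assumptions made here.

```python
from math import comb


def ora_p_value(background_size, set_size, hits_total, overlap):
    """One-sided hypergeometric p-value for over-representation.

    background_size: number of genes in the background (universe)
    set_size:        number of background genes annotated to the gene set
    hits_total:      number of genes in the list of interest
    overlap:         number of genes of interest that are in the gene set
    Returns P(X >= overlap) when hits_total genes are drawn without replacement.
    """
    upper = min(set_size, hits_total)
    favourable = sum(
        comb(set_size, k) * comb(background_size - set_size, hits_total - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(background_size, hits_total)


# Hypothetical numbers: the same overlap tested against two different backgrounds.
p_all_genes = ora_p_value(background_size=20000, set_size=100, hits_total=200, overlap=5)
p_expressed = ora_p_value(background_size=8000, set_size=100, hits_total=200, overlap=5)
```

Comparing the two p-values illustrates how restricting the background to, for example, the genes actually measured in the experiment changes the apparent strength of enrichment.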

Tutorial VT3: Computational approaches for deciphering cell-cell communication from single-cell transcriptomics and spatial transcriptomics data

Schedule

09:00-09:30	Introduction to computational inference of intercellular and intracellular cell-cell communication. Overview of cell-cell communication inference, assumptions and challenges, a priori biological knowledge, and current approaches for intercellular and intracellular signaling. Speakers: Di Camillo, Baruzzo
09:30-10:30	Hands-on analysis of Myocardial Infarction scRNA-seq. Presentation of the case study (human myocardial infarction) and tutorial on cell-cell communication analysis from single-cell transcriptomics data. Speaker: Cesaro (Helpers: Baldan Matteo and Tussardi Gaia)
10:30-10:45	Coffee Break
10:45-11:15	Introduction to comparative analysis of cell-cell communication and spatial analysis. Overview of cell-cell communication comparative analysis and computational approaches; spatial transcriptomics technologies and platforms: pros and cons. Speaker: Costa
11:15-11:45	Hands-on analysis of Myocardial Infarction ST and scRNA-seq (part 1). Tutorial on comparative analysis from single-cell transcriptomics and spatial transcriptomics data using the myocardial infarction case study. Speakers: Nagai, Ruiz (Helpers: Thiago Maié and Kai Peng)
11:45-12:00	Coffee Break
12:00-12:45	Hands-on analysis of Myocardial Infarction ST and scRNA-seq (part 2). Tutorial on comparative analysis from single-cell transcriptomics and spatial transcriptomics data using the myocardial infarction case study. Speakers: Nagai, Ruiz (Helpers: Thiago Maié and Kai Peng)
12:45-13:00	Q&A and Conclusion

Tutorial VT4: An applied genomics approach to crop breeding: A suite of tools for exploring natural and artificial diversity

Schedule

14:00-14:45	Translational Bioinformatics Frameworks and AI Solutions for Multiomics Research. Next-generation sequencing and multiomics data (bulk and single-cell), capturing molecular changes from genomics all the way to phenomics, have become an integral part of research in all domains, including biomedical sciences, plant sciences, and others. This rapid revolution in multiomics has created a growing need for translational tools that can handle large amounts of data, are easily expandable, provide interpretable results and can be readily applied to any species. To address such translational needs, we have developed the Soybean Knowledge Base (SoyKB) and Knowledge Base Commons (KBCommons) web-based frameworks, both fully equipped to handle the entire multiomics landscape for all organisms. SoyHUB within SoyKB provides access to our suite of tools, such as AccuTool, Allele Catalog, GenVarX, MADis and others, specifically designed to provide the plant community with efficient data-driven solutions for better breeding strategies. Additionally, G2PDeep, our deep learning method, provides a comprehensive web-based resource for phenotype predictions using multiomics data for all organisms. Speaker: Trupti Joshi
14:45-15:40	Diversity Panel Creation and Resequencing Data Curation Using SnakyVC Pipeline, Allele Catalog Pipeline, and Allele Catalog Tool (hands-on). With the growing availability of large-scale genomic datasets, in silico identification of causal genes and crop improvement have become more achievable. However, extracting meaningful insights from these datasets often requires extensive data processing using various computational tools, which can be time-intensive due to sequential tool transitions. To address this challenge, we developed SnakyVC, a scalable variant calling pipeline to process large-scale whole-genome resequencing (WGRS) data, and an Allele Catalog Pipeline to annotate SNPs and Indels and generate Allele Catalog datasets. These high-efficiency and parallelizable pipelines can be deployed on standalone servers or high-performance computing clusters to significantly reduce computational time. To expand the benefits of the Allele Catalog datasets, the web-based Allele Catalog Tool has been developed and integrated into the SoyKB and KBCommons web platforms. This tool enables researchers to query and visualize alleles within genes, functional annotations, accession metadata, and phenotype distributions. Currently, this tool supports a wide range of organisms including soybean, Arabidopsis, poplar, rice, sorghum, and maize. With its extensive functionalities and new organism integration capabilities, the Allele Catalog Tool facilitates the discovery of novel alleles and the selection of plant accessions for improved breeding strategies and agricultural traits. Together, these tools empower researchers to efficiently process genomic data and enhance crop improvement efforts. Speaker: Yen On Chan
15:40-16:00	Coffee break
16:00-17:45	Utilization of the SoyHUB tools in Post-GWAS analyses (hands-on). This hands-on tutorial will cover approaches and strategies in applied genomics that have been developed for soybean and are available for adoption for other crops. Using SnakyVC, we created a panel of diverse resequenced soybean accessions. To explore the genetic diversity of Soy2939, we demonstrate the use of a gene-centric toolbox for post-GWAS analysis, SoyHUB. Specifically, we will present allele mining in protein-coding regions (Soybean Allele Catalog), gene regulatory regions (GenVarX), and intergenic regions (SNPViz v2.0). Additionally, we will introduce a post-GWAS strategy, the Synthetic Phenotype to Causative Mutation (SP2CM) approach, which increases the probability of identifying causative mutations. We will also demonstrate how to use AccuTool to calculate accuracy as a measure of the direct correspondence between phenotype and variant position. In this tutorial, we will perform demonstration analyses using each tool within the SoyHUB platform and discuss the results in the context of relevant biological insights. Together, the strategy and tools provide a powerful, bioinformatics-driven layer in the pre-breeding phase that ultimately improves and accelerates crop breeding. Speaker: Maria Skrabisova
17:45-18:30	MADis: Genomic Analysis Tool for the Revelation of Multiple Alleles within a Single Gene (hands-on). Understanding crop diversification through evolution and domestication is crucial for crop breeding. Genome-wide association studies (GWAS) have emerged as a powerful tool for mapping genomic loci linked to important traits. However, GWAS often struggles to resolve complex genetic architectures; here we focus on the frequent situation in which multiple independent causative alleles exist within a single gene. To address this GWAS limitation, we developed MADis (Multiple Allele Discovery), a tool that computes a score for a combination of variant positions in a single candidate gene and, based on the highest score, identifies the best number and combination of causative mutations (CMs). In this hands-on tutorial, we will introduce the MADis tool and cover its functionalities and utilization. Participants will explore how MADis overcomes the limitations of traditional GWAS to accelerate precision breeding and enhance the understanding of complex genetic traits. Participants will also learn how to integrate MADis into their research workflow. The tool is available as a Python package on GitHub and as a web-based Soybean MADis tool specifically designed for a curated panel of 1066 soybean resequenced accessions. Speaker: Jana Biova

Tutorial VT2: OmicsViz: Interactive Visualization and ML for Omics Data

Schedule

09:00-10:00	Lecture: Introduction to Data Visualization: importance and basic principles of data visualization in scientific research. Speaker: Jean-Christophe Nebel
10:00-10:45	Hands-on: Python Libraries for Visualization: Matplotlib, Seaborn, Plotly and others Farzana Rahman, Ragothaman Yennamalli, Shashank Ravichandran, and Megha Hegde
10:45-11:00	Coffee/Tea Break
11:00-12:00	Lecture: Colour theory in Visualization: Colour palettes, Accessible and Inclusive visualizations Ragothaman Yennamalli
12:00-13:00	Hands-on: Creating various types of charts, plots for clarity and aesthetics Case studies with real world datasets Farzana Rahman, Ragothaman Yennamalli, Shashank Ravichandran, and Megha Hegde
13:00-14:00	Lunch Break
14:00-15:00	Lecture: Fundamentals of Machine Learning: Types of ML, Data preprocessing and feature selection, model selection and training Ragothaman Yennamalli and Farzana Rahman
15:00-16:00	Hands-on: Python libraries for Machine Learning: Scikit-learn, Pandas, NumPy, TensorFlow/Keras. Building models using real-world biological data. Shashank Ravichandran and Megha Hegde
16:00-16:15	Coffee/Tea Break
16:15-17:15	Hands-on: Integrating Data Viz and ML: Yellowbrick, Bokeh, TensorBoard, Scikit-plot, etc. Farzana Rahman and Megha Hegde
17:15-18:00	Question and Answer session

Tutorial VT7: Assessing and Enhancing Digital Accessibility of Biological Data and Visualizations

Schedule

14:00-15:00	Introduction & Background ● What is digital accessibility? ● What are accessibility techniques? ● Why is digital accessibility important in computational biology? ● What is the state of digital accessibility in computational biology? ● What can we do to improve it?
15:00-15:30	Hands-on Session 1: Writing alt-text for biological data visualization ● Learn guidelines for writing alt text for biological data visualizations ● Practice writing alt text for biological data visualizations
15:30-15:45	Coffee Break
15:45-16:15	Hands-on Session 2: Keyboard accessibility of biological resources ● Learn the importance of keyboard accessibility in biological websites ● Use the keyboard input only to navigate biological data resources
16:15-16:45	Hands-on Session 3: Designing accessible biological visualization ● Learn accessibility guidelines for biological data visualizations ● Practice evaluating the accessibility of biological data visualizations
16:45-17:00	Coffee Break
17:00-17:30	Hands-on Session 4: Making computational notebooks accessible ● Learn the importance of the accessibility of computational notebooks ● Learn accessibility guidelines for computational notebooks ● Practice implementing the accessibility guidelines in computational notebooks
17:30-18:00	Open problems and innovations ● Making interactive biomedical data visualization accessible ● Complex data discovery tasks in data portals ● The INSCIDAR project (https://inscidar.org/) ● Useful resources

14:00-14:10	Title: Welcome and Introduction to perturbation modelling for single-cell technologies Short presentation (importance of perturbation modelling in single-cell biology, brief introduction to available tools, perturbation single-cell data, overview of workshop’s agenda) Speaker: George Gavriilidis
14:10-15:30	Title: scGEN: a landmark generative model for unseen perturbations Short presentation (Using autoencoders for manifold learning, extrapolation to unseen events in single-cell perturbation data, scGEN architecture, possible real-world scenarios for deployment) Hands-on practical (an enhanced version of https://pertpy.readthedocs.io/en/latest/tutorials/notebooks/scgen_perturbation_prediction.html will be designed focusing on single-cell data pre-processing, designating training and testing sub-datasets, model hyper-parameter tuning (epochs, batch size), running scGEN, interpreting model outputs, using dimensionality reduction to evaluate perturbational extrapolations, and implementing the basic metric R2 to monitor the accuracy of perturbation prediction for differentially expressed genes and highly variable genes) Speaker: Konstantinos I. Giatras Trainers: Gobikrishnan Subramaniam, Konstantinos I. Giatras
15:30-15:45	Coffee Break
15:45-16:45	Title: scPRAM: a newer perturbation generative model based on attention mechanism and causal counterfactuals Short presentation (Explore optimal transport principles in single-cell perturbation modelling, how scPRAM maps perturbations probabilistically between control and stimulated conditions, and its key components, including transport cost functions, regularisation, and interpretable metrics) Hands-on practical (an enhanced version of https://github.com/jiang-q19/scPRAM/blob/main/Tutorial/PBMC_cross_celltype_predict.ipynb will be designed to focus on preprocessing single-cell datasets, configuring scPRAM with key hyperparameters (noise robustness, learning rate, batch size, and optimal transport settings), evaluating model performance using R2, Wasserstein distance and dimensionality reduction, and showcasing how attention prioritises the most relevant cells or features for predicting perturbation responses) Speaker: Sabrina Jagot Trainers: Sabrina Jagot, George Gavriilidis
16:45-17:00	Coffee Break
17:00-17:50	Title: Decentralised benchmarking of generative perturbation models Short presentation (why benchmarking is important, selecting metrics beyond typical linear correlations, workflows and decentralised deployment) Hands-on practical (scGEN vs scPRAM; based on code from the ongoing Perturb-Bench effort from EU BH 2024 to systematically benchmark scGEN vs scPRAM against more metrics, such as E-distance and maximum mean discrepancy, in challenging single-cell scenarios like extrapolation to unseen patient perturbation responses and perturbation predictions across species) Speaker: Alejandro Madrid Trainers: Alejandro Madrid, Konstantinos I. Giatras
17:50-18:00	Title: Wrap up and discussion Short presentation (Short recap of key take-away messages) Open discussion Speaker: All
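
The R2 check named in the scGEN hands-on practical above can be sketched roughly as follows. This is a minimal illustration and not the tutorial's own notebook code: it assumes R2 is taken as the squared Pearson correlation between per-gene mean expression in real and predicted cells, which is one common convention. The function name `mean_expression_r2`, the `gene_mask` argument, and the simulated matrices are all hypothetical.

```python
import numpy as np


def mean_expression_r2(real, predicted, gene_mask=None):
    """Squared Pearson correlation between per-gene mean expression.

    real, predicted: arrays of shape (cells, genes).
    gene_mask: optional boolean array selecting a gene subset
    (for example differentially expressed or highly variable genes).
    """
    real = np.asarray(real, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if gene_mask is not None:
        real = real[:, gene_mask]
        predicted = predicted[:, gene_mask]
    # Average over cells so each gene contributes one point.
    real_mean = real.mean(axis=0)
    pred_mean = predicted.mean(axis=0)
    r = np.corrcoef(real_mean, pred_mean)[0, 1]
    return float(r ** 2)


# Simulated stand-ins for real and predicted perturbed cells (assumption).
rng = np.random.default_rng(0)
real = rng.poisson(lam=np.linspace(1, 10, 50), size=(200, 50)).astype(float)
predicted = real + rng.normal(scale=0.5, size=real.shape)

# Hypothetical mask marking the first 10 genes as "differentially expressed".
deg_mask = np.zeros(50, dtype=bool)
deg_mask[:10] = True

r2_all = mean_expression_r2(real, predicted)
r2_deg = mean_expression_r2(real, predicted, deg_mask)
print(f"R2 (all genes): {r2_all:.3f}, R2 (DEG subset): {r2_deg:.3f}")
```

Reporting the score separately for a gene subset, as in the last lines, mirrors the agenda's distinction between differentially expressed and highly variable genes.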

14:00-14:45	Introduction to biomarkers: Definition of biomarkers, types of biomarkers, the central dogma, molecular biomarkers, technologies for molecular biomarker detection, and the key steps in molecular biomarker discovery with the analytic methodologies involved.
14:45-15:30	Identification of driver mutations: Publicly available data sources and data types, using lung cancer TCGA DNA-seq and clinical data as examples to illustrate how to identify driver gene mutations in tumor samples. Hands-on examples with R code for analysis will be demonstrated.
15:30-15:45	Break
15:45-16:45	Detection of differential expression: Statistical methods for analyzing differential expression in bulk RNA-seq and scRNA-seq data, complemented by visualization techniques such as t-SNE, UMAP, and heatmaps. Hands-on examples will be provided, including R code demonstrations for data analysis.
16:45-17:00	Break
17:00-18:00	Machine learning methods for phenotype prediction using biomarkers: Further visualization of gene expression data with volcano plots, heatmaps, PCA plots, t-SNE and UMAP; oncoplot and mutplot for gene mutations; logistic regression and survival analysis to predict biomarker association with patient progress and survival outcomes; and classification methods such as KNN, SVM and random forest for patient subtype analysis based on genomics data. Hands-on examples with R code for predictive modeling will be demonstrated.

09:00-09:30	Introduction to Snakemake and Workflow Principles
09:30-10:45	Domain-specific workflow examples: Bioinformatics workflow (Hands-on); Data Science/Machine Learning workflow (Hands-on)
10:45-11:00	Coffee break
11:00-12:15	Advanced Snakemake Features: Wildcards and config files; HPC Environments; Parameterised simulation workflow (Hands-on)
12:15-12:45	Practical considerations, Best Practices and Q&A
Hands-on blocks:
● Bioinformatics workflow (30-40 mins): Live coding of an RNA-seq data processing pipeline; participants create and modify Snakemake rules; explore rule dependencies and workflow structure
● Data Science/Machine Learning workflow (30-40 mins): Build an end-to-end ML pipeline; demonstrate data preprocessing, model training and evaluation; hands-on exercise in creating dynamic workflow rules
● Parameterised simulation workflow (30-40 mins): Create a parameterised simulation workflow; implement parameter sweeps; explore advanced features like wildcards and config files
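
The idea behind the parameter sweep in the parameterised simulation block can be illustrated with a short plain-Python sketch. It is not Snakemake code and not the tutorial's material: it only shows how a config-style parameter grid maps to one output file per combination, similar in spirit to what Snakemake's `expand` helper produces from wildcards. The function `sweep_targets`, the file pattern, and the `alpha`/`seed` parameters are hypothetical.

```python
from itertools import product


def sweep_targets(pattern, **params):
    """Return one formatted target path per combination of parameter values."""
    keys = list(params)
    combos = product(*(params[k] for k in keys))
    return [pattern.format(**dict(zip(keys, combo))) for combo in combos]


# Hypothetical sweep configuration, as it might appear in a config file.
config = {"alpha": [0.1, 0.5], "seed": [1, 2, 3]}

targets = sweep_targets("results/sim_alpha{alpha}_seed{seed}.csv", **config)
for path in targets:
    print(path)
```

In a workflow manager, a list like `targets` would typically serve as the set of requested outputs, and each file name encodes the parameter values its rule should run with.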

09:00-09:15	Introduction to Biomedical Natural Language Processing. This introduction will cover the applications of biomedical text mining and outline the plan for the day.
09:15-09:30	Lecture: Getting started in NLP. This talk will introduce the core concepts of getting computers to work with text, getting access to appropriate biomedical text and annotating it for building and evaluating text mining systems.
09:30-10:00	Hands-on: Getting started in NLP. This session will have attendees try out some introductory NLP tasks in Python and get access to biomedical text from PubMed through its API.
10:00-10:15	Lecture: Identifying mentions of biomedical concepts using named entity recognition (NER). This talk will go over the NER task and some of the different approaches. It will focus on transformer-based methods.
10:15-10:45	Hands-on: NER with spaCy and HuggingFace. This session will have attendees work through a Jupyter notebook to train an NER system given a provided set of annotations. They will get to work with transformer-based models using the HuggingFace library.
10:45-11:00	Coffee Break
11:00-11:15	Lecture: Extracting relations from biomedical text. This talk will outline the importance of extracting meaningful relations between entities and going beyond co-occurrences.
11:15-12:00	Hands-on: Relation Extraction with Co-occurrences and HuggingFace. This session will have attendees work through a notebook using data that has entities already extracted to extract associations.
12:00-12:15	Lecture: Using Large Language Models for Biomedical Text Mining. This talk will focus on the strengths and weaknesses of large language models for the tasks discussed in this tutorial. It will give a brief background on how they work and the common pitfalls when used for information extraction.
12:15-12:45	Hands-on: LLMs for Entity Extraction and Relation Extraction. This session will have attendees work with a small LLM and apply it to pre-prepared data for several extraction tasks.
12:45-13:00	Lecture: LLMs and the future of information extraction. The final talk will lead a discussion on the benefits of LLMs for information extraction, but also what challenges remain.
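
The co-occurrence baseline mentioned in the relation extraction hands-on session can be sketched in a few lines. This is an illustrative example under stated assumptions, not the tutorial's notebook: it assumes entities have already been extracted per sentence, and it counts how often each pair of entities appears in the same sentence. The function `cooccurrence_counts` and the toy sentences are made up for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count sentence-level co-occurrences of entity pairs.

    sentences: iterable of lists, each holding the entity mentions
    already extracted from one sentence.
    """
    counts = Counter()
    for entities in sentences:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique = sorted(set(entities))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Toy, invented entity lists standing in for pre-annotated sentences.
sentences = [
    ["EGFR", "gefitinib", "lung cancer"],
    ["EGFR", "lung cancer"],
    ["TP53", "lung cancer"],
    ["EGFR", "gefitinib"],
]

counts = cooccurrence_counts(sentences)
for pair, n in counts.most_common():
    print(pair, n)
```

Raw counts like these say only that two entities are mentioned together, not how they are related, which is the gap the lecture on going beyond co-occurrences addresses.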
