Tutorials - Africa 2025

In-person Tutorials (All times SAST)

Tutorial IP1: Genotype Imputation and Data Analysis for African Populations: A Practical Tutorial Using AfriGen-D Resources
Tutorial IP2: Simulation-Based Inference for Computational Biology: Integrating AI, Bayesian Modeling, and HPC
Tutorial IP4: Introduction to constraint-based modeling using cobrapy
Tutorial IP5: Building agentic workflows for bioinformatics

Virtual Tutorials (All times SAST)

Tutorial VT1: Multiomics Data Integration using Graph Based Machine Learning
Tutorial VT2: Machine Learning Models for Drug Response Prediction

Tutorial IP1: Genotype Imputation and Data Analysis for African Populations: A Practical Tutorial Using AfriGen-D Resources

Room: Atlantic 1
Date: April 17, 2025
Time: 13:00-17:00

Organizers
Mamana Mbiyavanga, University of Cape Town
Lyndon Zass, University of Cape Town
Sumir Panji, University of Cape Town
Nicola Mulder, University of Cape Town

Max Participants: 30

Description
The African Genomics Data Hub (AfriGen-D) provides essential resources for analyzing African genetic data, addressing unique challenges posed by the continent's exceptional genetic diversity. This hands-on tutorial focuses on genotype imputation and downstream analysis using AfriGen-D resources.

Through practical exercises, participants will master data quality control specific to African genetic data, execute imputation and basic GWAS workflows, and learn to interpret results using the AfriGen-D Imputation Service, African Genomics Medicine Portal (AGMP), and African Genomics Variation Database (AGVD).

Enhance your African genomics research capabilities with this practical tutorial. Using AfriGen-D resources, learn to prepare data, perform genotype imputation, conduct basic GWAS analysis, and interpret results with tools optimized for African genetic diversity.

Learning Objectives

Navigate AfriGen-D catalogues for data discovery
Master data preparation and quality control using the AfriGen-D Imputation Service
Execute and monitor imputation workflows using the AfriGen-D Imputation Service
Perform post-imputation quality assessment
Perform basic GWAS analysis using the AfriGen-D Imputation Service
Annotate and interpret genetic variants using African-specific resources (AGMP and AGVD)
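
As a rough illustration of the kind of per-variant quality control referred to in the objectives above, the following sketch computes call rate and minor allele frequency (MAF) in plain Python. It is not taken from the tutorial materials or from the AfriGen-D Imputation Service; the function names, the toy genotypes, and the thresholds (95% call rate, 1% MAF) are assumptions chosen only for the example.

```python
from typing import Dict, List, Optional, Sequence

# Genotype coding (assumed for this sketch): 0, 1 or 2 copies of the
# alternate allele; None marks a missing call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among the called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(
    genotypes: Sequence[Genotype],
    min_call_rate: float = 0.95,  # illustrative threshold, not a recommendation
    min_maf: float = 0.01,        # illustrative threshold, not a recommendation
) -> bool:
    """Keep a variant only if it is well genotyped and polymorphic enough."""
    return (
        call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    )


# Hypothetical toy data: three variants typed in ten samples.
variants: Dict[str, List[Genotype]] = {
    "var_common": [0, 1, 2, 1, 0, 1, 1, 0, 2, 1],
    "var_monomorphic": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "var_low_call_rate": [0, 1, None, None, 1, 0, None, 2, 1, 0],
}

kept = [name for name, genotypes in variants.items() if passes_qc(genotypes)]
print(kept)
```

Appropriate thresholds depend on the dataset and the reference panel, so the defaults here should be read as placeholders.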

Materials

Personal laptop with internet connection
Basic command-line knowledge
Familiarity with genetic data formats

Schedule

13:00-13:30	Introduction to AfriGen-D resources and data discovery
13:30-14:30	Data preparation and quality control (hands-on)
14:30-14:45	Break
14:45-15:45	Imputation workflow and monitoring (hands-on)
15:45-16:30	Post-imputation quality assessment and basic GWAS
16:30-17:00	Variant annotation and interpretation using AGMP/AGVD

Tutorial IP2: Simulation-Based Inference for Computational Biology: Integrating AI, Bayesian Modeling, and HPC

Room: Pacific 2
Date: April 17, 2025
Time: 09:00-13:00

Organizers
Alina Bazarova, Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany
Jose Ignacio Robledo, Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany
Stefan Kesselheim, Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany

Max Participants: 25

Description
This tutorial introduces Simulation-Based Inference (SBI), a framework that combines Bayesian modeling, AI techniques, and high-performance computing (HPC) to address key challenges in computational biology. One such challenge is reliable inference from limited data, which SBI approaches with AI-based approximate Bayesian computation. SBI also handles intractable likelihood functions, making Bayesian inference possible for biological systems with multiple sources of stochasticity. The tutorial additionally demonstrates how HPC environments can drastically reduce inference runtimes, which makes the approach relevant for large-scale biological problems.

The tutorial bridges theoretical foundations with hands-on applications in computational biology. Participants will learn to implement SBI frameworks using diverse biological models, such as molecular dynamics simulations, agent-based tumor growth models, count data modeling, and Lotka-Volterra systems. Practical exercises in Jupyter notebooks guide attendees through SBI workflows, from simple coin-flipping examples to more complex biological simulations, ensuring accessibility for participants with varied backgrounds. The tutorial covers current methods such as Sequential Neural Posterior Estimation and emphasizes parallelization and HPC scalability.

A previous iteration of the tutorial at the Helmholtz AI Conference 2024 received excellent reviews and led to interdisciplinary discussions. For this conference, the content has been further refined with additional examples relevant to bioinformatics researchers.
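
To give a feel for the coin-flipping starting point mentioned in the description, here is a minimal sketch of likelihood-free inference using plain rejection approximate Bayesian computation. This is deliberately not one of the neural methods the tutorial teaches (such as Sequential Neural Posterior Estimation), and it is not taken from the tutorial notebooks; the function names, the uniform prior, and the observed data (7 heads in 10 flips) are assumptions made for the example.

```python
import random
from typing import List


def simulate_heads(bias: float, n_flips: int, rng: random.Random) -> int:
    """Simulator: number of heads in n_flips tosses of a coin with the given bias."""
    return sum(rng.random() < bias for _ in range(n_flips))


def rejection_abc(
    observed_heads: int,
    n_flips: int,
    n_draws: int = 20_000,
    tolerance: int = 0,
    seed: int = 0,
) -> List[float]:
    """Draw biases from a uniform prior and keep those whose simulated
    data fall within `tolerance` heads of the observation."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        bias = rng.random()  # uniform prior on [0, 1]
        simulated = simulate_heads(bias, n_flips, rng)
        if abs(simulated - observed_heads) <= tolerance:
            accepted.append(bias)
    return accepted


# Hypothetical observation: 7 heads out of 10 flips.
posterior_samples = rejection_abc(observed_heads=7, n_flips=10)
posterior_mean = sum(posterior_samples) / len(posterior_samples)

# With a uniform prior the exact posterior is Beta(8, 4), whose mean is 8/12,
# so the sample mean should land near 0.667.
print(f"accepted {len(posterior_samples)} samples, mean bias {posterior_mean:.3f}")
```

Only the simulator is ever called; the likelihood is never evaluated. Rejection sampling becomes wasteful as data and parameters grow in dimension, which is the gap that neural approaches aim to close.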

Learning Objectives

Understand the Principles of Simulation-Based Inference (SBI): learn the theoretical foundations of SBI, including its relationship with Bayesian inference and its advantages in handling complex biological systems.
Explore SBI Methods (SNPE, SNLE, and SNRE): gain an understanding of Sequential Neural Posterior Estimation (SNPE), Sequential Neural Likelihood Estimation (SNLE), and Sequential Neural Ratio Estimation (SNRE) and their applications in computational biology.
Learn how to design and implement SBI frameworks for representative biological scenarios, such as molecular dynamics, cell growth, count data modeling, and Lotka-Volterra systems.
Leverage HPC for SBI Workflows: understand how to use high-performance computing (HPC) environments to scale SBI workflows and efficiently distribute computational workloads.

Intended Audience and Level
This tutorial is designed for researchers working in computational biology and bioinformatics, modeling natural processes, and applying AI or Bayesian inference techniques. It is well-suited for:

Researchers seeking to infer model parameters from sparse or simulated data.
Scientists interested in uncertainty quantification and critical assessment of model fits using Bayesian techniques.
Researchers experienced in Bayesian statistics looking to address intractable likelihoods or optimize inference workflows using AI-driven methods.
AI researchers interested in advanced applications of Deep Learning architectures, such as normalizing flows and likelihood ratio estimation.
Users of HPC systems or those interested in leveraging HPC for scaling simulations and training distributed AI models.

The tutorial is intermediate in content level. While no in-depth knowledge of statistical or Deep Learning methods is required, participants should have basic familiarity with these concepts and have experience in using Python. Experience with HPC systems is beneficial but not mandatory. Attendees are required to have a laptop for accessing the HPC system. Individual access accounts will be provided prior to or at the tutorial.

Tutorial IP4: Introduction to constraint-based modeling using cobrapy

Room: Pacific 2
Date: April 17, 2025
Time: 13:00-17:00

Organizers
Alia Benkahla, Institut Pasteur de Tunis
Feryel Guennich, Institut Pasteur de Tunis
Oussema Souiai, Institut Pasteur de Tunis
Emna Harigua-Souiai, Institut Pasteur de Tunis

Max Participants: 25

Description
COBRApy is a user-friendly, open-source Python package that makes constraint-based modeling accessible and convenient to learn. This hands-on workshop, which includes exercises and problem-solving, introduces participants to the technique. By bringing together researchers interested in this type of modeling, the workshop not only teaches a valuable skill but also encourages the development of new collaborations. Participants can apply the practical skills they acquire to their research immediately, deepening knowledge and accelerating discoveries.

Intended Audience and Level
Researchers and students with basic Python knowledge interested in applying constraint-based modeling to biological systems.

Materials

Slides with key concepts and code examples.
Google Colab notebooks with hands-on exercises.
Pre-prepared environment with COBRApy and necessary data.

Schedule

13:00-13:15	Introduction to Constraint-Based Modeling
- Core concepts of CBM: stoichiometry, constraints, objective functions, and flux balance analysis (FBA).
- COBRApy as a powerful and user-friendly Python package for CBM.
13:15-14:00	Introduction to CBM using COBRApy and Working with Models
- Introduction to Google Colab.
- Installing COBRApy and its dependencies.
- Creating a model and understanding basic cobra objects (reactions, metabolites, genes).
- Importing and exploring existing metabolic models (e.g., E. coli core model).
- Understanding the structure of a COBRApy model object: reactions, metabolites, genes.
- Basic model manipulation: adding/removing reactions and metabolites.
14:00-15:00	Performing Flux Balance Analysis (FBA)
- Setting up an FBA problem: defining the objective function (e.g., biomass production).
- Genome-scale modelling.
- Studying the model: inspecting the model's numbers and the system's boundaries.
- Running a Flux Balance Analysis (FBA).
- Hands-on exercises.
15:00-15:30	In-silico gene knockouts
- Single knockout study.
- Systems-wide knockout study.
15:30-16:00	Working with Experimental Data and Model Integration
- How to integrate experimental data (e.g., transcriptomics, metabolomics) with CBM models.
- Example of integrating gene expression data to constrain model fluxes.
16:00-16:15	Wrap-up
- Open discussion for questions and troubleshooting.
- Summary of key concepts and resources for further learning.
- Potential future directions and advanced topics in CBM.
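
The steady-state idea behind FBA can be previewed without any packages. FBA searches for the flux vector that maximizes an objective among those satisfying mass balance (no internal metabolite accumulates or drains) and the flux bounds; the sketch below only checks those two feasibility conditions for a candidate flux vector and does not perform the optimization. It uses plain Python rather than COBRApy, and the three-reaction network, its bounds, and all names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical toy network. Each reaction maps metabolites to stoichiometric
# coefficients (negative = consumed, positive = produced).
stoichiometry: Dict[str, Dict[str, float]] = {
    "uptake": {"A": 1.0},                 # -> A
    "conversion": {"A": -1.0, "B": 2.0},  # A -> 2 B
    "biomass": {"B": -1.0},               # B ->  (objective reaction in an FBA)
}

# Illustrative (lower, upper) flux bounds for each reaction.
bounds: Dict[str, Tuple[float, float]] = {
    "uptake": (0.0, 10.0),
    "conversion": (0.0, 1000.0),
    "biomass": (0.0, 1000.0),
}


def net_production(
    stoich: Dict[str, Dict[str, float]], fluxes: Dict[str, float]
) -> Dict[str, float]:
    """Net production rate of each metabolite, i.e. the product S * v."""
    balance: Dict[str, float] = {}
    for reaction, coefficients in stoich.items():
        for metabolite, coefficient in coefficients.items():
            balance[metabolite] = (
                balance.get(metabolite, 0.0) + coefficient * fluxes[reaction]
            )
    return balance


def is_feasible(
    stoich: Dict[str, Dict[str, float]],
    flux_bounds: Dict[str, Tuple[float, float]],
    fluxes: Dict[str, float],
    tol: float = 1e-9,
) -> bool:
    """True if the fluxes respect the bounds and leave every metabolite balanced."""
    in_bounds = all(
        low - tol <= fluxes[r] <= high + tol for r, (low, high) in flux_bounds.items()
    )
    balanced = all(abs(rate) <= tol for rate in net_production(stoich, fluxes).values())
    return in_bounds and balanced


balanced_fluxes = {"uptake": 10.0, "conversion": 10.0, "biomass": 20.0}
leaky_fluxes = {"uptake": 10.0, "conversion": 10.0, "biomass": 10.0}  # B accumulates

print(is_feasible(stoichiometry, bounds, balanced_fluxes))
print(is_feasible(stoichiometry, bounds, leaky_fluxes))
```

In practice a linear programming solver, which COBRApy wraps, picks the feasible flux vector with the highest objective value.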

Tutorial IP5: Building agentic workflows for bioinformatics

Room: Pacific 1
Date: April 17, 2025
Time: 09:00-17:00

Organizers
Dionizije Fa, Entropic j.d.o.o.
Mateo Čupić, Entropic j.d.o.o.
Bruno Pandža, Entropic j.d.o.o.

Max Participants: 25

Description
An agentic workflow is a process of interacting with Large Language Models (LLMs) to complete complex tasks, allowing practitioners to build pipelines that integrate data retrieval, reasoning, and execution steps. This tutorial will guide participants through the conceptual and practical foundations of setting up their own agentic workflows. By combining prompt engineering techniques, retrieval-augmented generation, tool use, and deployment strategies that safeguard data privacy, participants will learn how to build, deploy, and tune their own personal copilots for use in bioinformatics workflows.

The capabilities of agentic workflows, driven by improving LLMs, are rapidly expanding, while cloud offerings are making these advanced computational tools more accessible than ever before. By integrating agentic workflows into bioinformatics pipelines, practitioners can significantly reduce their time-to-analysis. These workflows democratize cutting-edge computational methods by lowering the barrier to entry for novices and allowing expert practitioners to scale their work with greater efficiency, so that tutorial participants can apply the latest advances in their work and careers. The tutorial will cover state-of-the-art prompting techniques and retrieval augmentation strategies, add context to model selection, and explore the trade-offs between the different techniques alongside current trends.
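
The loop at the heart of an agentic workflow (the model decides on a step, a tool executes it, and the result is fed back) can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the tutorial's code: `stub_model` is a hard-coded stand-in for a real LLM call, the "tool_name argument" text protocol is invented for the example, and the two sequence tools are hypothetical.

```python
from typing import Callable, Dict, List


def gc_content(sequence: str) -> str:
    """Tool: fraction of G and C bases in a DNA sequence, as text."""
    seq = sequence.upper()
    if not seq:
        return "0.00"
    return f"{sum(base in 'GC' for base in seq) / len(seq):.2f}"


def reverse_complement(sequence: str) -> str:
    """Tool: reverse complement of a DNA sequence."""
    return sequence.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Registry of tools the agent is allowed to call.
TOOLS: Dict[str, Callable[[str], str]] = {
    "gc_content": gc_content,
    "reverse_complement": reverse_complement,
}


def stub_model(task: str, history: List[str]) -> str:
    """Stand-in for an LLM. Replies either 'tool_name argument' or 'FINAL answer'.
    A real agent would send the task and history to a model instead."""
    if not history:
        return "gc_content ATGCGC"
    return f"FINAL GC content is {history[-1]}"


def run_agent(
    task: str, model: Callable[[str, List[str]], str], max_steps: int = 5
) -> str:
    """Ask the model for the next action, run the chosen tool, and repeat
    until the model returns a final answer or the step limit is reached."""
    history: List[str] = []
    for _ in range(max_steps):
        action, _, argument = model(task, history).partition(" ")
        if action == "FINAL":
            return argument
        if action not in TOOLS:
            history.append(f"unknown tool: {action}")
            continue
        history.append(TOOLS[action](argument))
    return "no answer within step limit"


answer = run_agent("What is the GC content of ATGCGC?", stub_model)
print(answer)
```

Restricting the agent to a fixed tool registry and a step limit keeps the loop predictable, which matters once a real model replaces the stub.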

Learning Objectives

Develop a theoretical and practical understanding of how to integrate and automate bioinformatics analyses using LLM agents
Gain hands-on experience building agentic workflows for bioinformatics
Learn current trends in agentic workflows, the fundamentals of and differences between LLMs, and best practices for writing prompts and deploying local LLM agents
Understand how to extend and customize existing tools to fit specific research domains or specialized datasets

Intended Audience and Level
The tutorial is aimed at bioinformaticians who are beginners in AI and LLMs. Programming experience in Python and experience with command-line tools and bioinformatics software are strongly recommended.

Attendees should have:

Working knowledge of Python, package management and the command line (Unix)
Familiarity with standard bioinformatics data formats and tools (e.g., FASTA, FASTQ)
PC with a Python environment set up prior to the tutorial (exact requirements to be defined later)
Solid understanding of basic bioinformatics concepts and common analysis pipelines

Participants will be provided with installation instructions in advance to ensure a smoother experience.

Materials

Slides and Documentation: Detailed slides summarizing key concepts will be shared.
Code Examples and Repositories: A public code repository containing example workflows, scripts, and configuration files will enable participants to continue experimenting independently.

Download tutorial materials

Schedule

	Introduction & Overview
- Overview of LLMs and prompt engineering concepts
- Introduction to agentic workflows
- Use-cases and examples in bioinformatics to motivate learning
	Foundational Concepts and Setup
- Environment setup
- Setting up the software packages and outline of the pipeline
- Q&A for clarification of key concepts
	Building a Simple Agentic Workflow
- Step-by-step construction of a basic workflow: prompting and refining responses
- Demonstration of an example pipeline using a provided dataset (e.g., sequence)
- Discussion on integrating workflows into existing bioinformatics infrastructures
	Advanced Techniques and Troubleshooting
- Refinement: improving prompt quality, adding more complex tools
- Handling complex multi-step analyses
- Troubleshooting common errors and optimizing workflows for performance
	Wrap-up and Future Directions
- Recap of key takeaways and practical resources
- Future trends in agentic workflows and integrating emerging tools
- Open discussion, participant feedback, and next steps for continued learning

Tutorial VT1: Multiomics Data Integration using Graph Based Machine Learning

Date: April 10, 2025
Time: 12:00-16:00

Organizers
Loni Taylor, PMP, CETL, PhD Candidate, Meharry Medical College, Nashville, TN, USA.
Bishnu Sarker, PhD, Assistant Professor of Computer Science and Data Science, Meharry Medical College, Nashville, TN, USA.
Animesh Acharjee, PhD, Assistant Professor, University of Birmingham, UK.

Max Participants: 40

Description
This tutorial introduces participants to the integration of multiomics data from genomics, proteomics, transcriptomics, and metabolomics, focusing on computational approaches to uncover hidden relationships between biological entities. The session will cover techniques such as Non-negative Matrix Factorization (NMF), machine learning, and Graph Neural Networks (GNNs) to model multi-layered biological interactions and to support tasks such as disease classification, drug response prediction, and biomarker discovery. Attendees will gain hands-on experience in processing and analyzing real-world multiomics datasets using open-source tools such as Python, pandas, and scikit-learn.

Learning Objectives
By the end of this tutorial, attendees will be able to:

Understand key computational approaches for integrating and analyzing multiomics data, including NMF, machine learning, and GNNs.
Apply open-source tools to implement predictive models for disease classification, biomarker identification, and drug response analysis.
Evaluate model performance and interpret results to derive meaningful biological insights.

Intended Audience and Level
This tutorial is designed for intermediate to advanced learners with an interest in multiomics data integration and its applications in machine learning. It is ideal for bioinformaticians, computational biologists, data scientists, and machine learning practitioners who want to expand their knowledge of graph-based methods in multiomics analysis.
The tutorial is targeted at professionals, researchers, and graduate students with at least a basic understanding of:

Omics Data (Genomics, Proteomics, Metabolomics)
Machine Learning, especially Neural Networks
Programming (Python, familiarity with ML libraries like PyTorch or TensorFlow)

Tutorial VT2: Machine Learning Models for Drug Response Prediction

Room: TBD
Date: April 11, 2025
Time: 14:00-16:00

Organizers
Dennis Wang, A*STAR Bioinformatics Institute, A*STAR Institute for Human Development and Potential, National Hear
Yurui Chen, Institute for Human Development and Potential (IHDP), Agency for Science, Technology and Research (A*STAR), Department of Mathematics, National University of Singapore, Singapore, Republic of Singapore
Dr. Evelyn Lau, Institute for Human Development and Potential (IHDP), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
Dr. Juan Jose Giraldo Gutierrez, National Heart and Lung Institute, Imperial College London, London, Department of Computer Science, The University of Sheffield

Max Participants: 50

Description
This tutorial provides a comprehensive overview of machine learning techniques applied to drug response prediction on cancer cell lines, with a focus on Graph Neural Networks (GNNs) and Gaussian processes (GPs). Participants will gain both theoretical knowledge and practical experience through interactive lectures and hands-on demonstrations.

Learning Objectives

Understand Machine Learning Applications in Drug Development: Learn how machine learning models predict drug responses and facilitate drug development.
Explain Graph Neural Networks (GNNs): Grasp the fundamentals of GNNs and their specific applications in biomedical data analysis.
Develop and Evaluate GNN Models for Drug Prediction: Acquire skills in building and assessing GNN models for drug response prediction using tools like PyTorch and torch_geometric.
Explain Probabilistic Models for Drug Prediction: Grasp the importance of probabilistic models to quantify uncertainty when predicting drug response curves.
Build a Probabilistic Model Based on Gaussian Processes (GPs) for Drug Prediction: Learn to apply Gaussian process models to predict dose responses.
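
For readers new to GNNs, the core operation is message passing: each node updates its feature vector from its neighbours' features. The sketch below performs one round of mean aggregation in plain Python. It is a simplified illustration, not the tutorial's model: a real GNN layer (for example in torch_geometric) would also apply learned weights and a nonlinearity, and the three-node graph, its node names, and its features are hypothetical.

```python
from typing import Dict, List, Tuple


def message_passing_step(
    features: Dict[str, List[float]], edges: List[Tuple[str, str]]
) -> Dict[str, List[float]]:
    """One round of mean aggregation over each node and its neighbours.
    Edges are treated as undirected, and every node includes itself."""
    neighbourhood: Dict[str, List[str]] = {node: [node] for node in features}
    for source, target in edges:
        neighbourhood[source].append(target)
        neighbourhood[target].append(source)

    updated: Dict[str, List[float]] = {}
    for node, members in neighbourhood.items():
        dimension = len(features[node])
        updated[node] = [
            sum(features[member][d] for member in members) / len(members)
            for d in range(dimension)
        ]
    return updated


# Hypothetical toy graph: three atoms in a chain, with one-hot features
# [is_carbon, is_oxygen].
features = {"C1": [1.0, 0.0], "C2": [1.0, 0.0], "O": [0.0, 1.0]}
edges = [("C1", "C2"), ("C2", "O")]

updated = message_passing_step(features, edges)
for node, vector in updated.items():
    print(node, [round(value, 3) for value in vector])
```

After one step the middle atom's vector already mixes carbon and oxygen information; stacking several such steps lets information travel further across the graph.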

Intended Audience and Level
This tutorial is designed for bioinformatics researchers, data scientists, and professionals in computational biology with a basic understanding of machine learning concepts. Prior experience with Python programming will be beneficial but not mandatory.

Materials
Participants will receive access to:

Presentation slides.
A take-home Jupyter Notebook (Google Colab) with:
- A step-by-step tutorial on building a basic GNN model for drug response prediction.
- Guidance on understanding and preparing biomedical data for GNNs.
- A Colab tutorial on a GP model that predicts dose-response curves.
- Insights into model training, evaluation, and interpretation.
Additional resources for further exploration of the subject.

Schedule

14:00-14:15	Welcome and Overview
- Introduction to tutorial objectives and schedule.
14:15-14:45	Session 1: Introduction to Machine Learning for Drug Response and Development
- Overview of machine learning applications in drug development.
- Key concepts and terminology.
14:45-16:00	Session 2: Graph Neural Networks (GNNs) and Deep Learning for Drug Response Prediction
- Introduction to GNNs and their relevance in biomedical research.
- Case studies of GNN applications in drug response prediction.
- Walkthrough of a pre-prepared GNN model for drug response prediction using PyTorch and torch_geometric.
- Discussion on model evaluation techniques.
- Materials at GitHub
