There will be a series of in-person and virtual tutorials prior to the start of the conference. Tutorial registration fees are shown at: https://www.iscb.org/ismb2024/register#tutorials
In-person Tutorials (All times EDT)
- Tutorial IP1: Advanced machine learning methods for modeling, analyzing, and interpreting single-cell omics and spatial transcriptomics data SOLD OUT
- Tutorial IP2: Just-in-time compiled Python for bioinformatics research SOLD OUT
- Tutorial IP3: Multi-omic data integration for microbiome research using scikit-bio
- Tutorial IP4: Quantum-enabled multi-omics analysis
- Tutorial IP5: Modelling Multi-Modal Biomedical Data Using Networks SOLD OUT
- Tutorial IP6: Creating and running cloud-native pipelines with WDL, Dockstore, and Terra
- Tutorial IP7: Federated Ensemble Learning for Biomedical Data
Virtual Tutorials: (All times EDT) Presented through the conference platform
- Tutorial VT1: A Practical Introduction to Large Language Models in Biomedical Data Science Research SOLD OUT
- Tutorial VT2: BioViz: Interactive data visualisation and ML for omics data SOLD OUT
- Tutorial VT3: Using LinkML (Linked data Modeling Language) to model your data
- Tutorial VT4: Computational Approaches for Identifying Context-Specific Transcription Factors using Single-Cell Multi-Omics Datasets SOLD OUT
- Tutorial VT5: Explainability in Graph Deep Learning for Biomedicine SOLD OUT
Tutorial IP1: Advanced machine learning methods for modeling, analyzing, and interpreting single-cell omics and spatial transcriptomics data
SOLD OUT
Room: 518
Date: Friday, July 12, 2024 9:00 – 18:00 EDT
Organizer:
Juexin Wang
Speakers:
Mauminah Raina, (Ph.D. student) Indiana University Indianapolis, United States
Yi Jiang, (Ph.D. student) Ohio State University, United States
Lei Jiang, (Ph.D. student) University of Missouri, United States
Michael Eadon, Indiana University Indianapolis, United States
Juexin Wang, Indiana University Indianapolis, United States
Qin Ma, Ohio State University, United States
Dong Xu, University of Missouri, United States
Max Participants: 50
Website
https://github.com/juexinwang/Tutorial_ISMB2024
Description
Emerging single-cell omics and spatial transcriptomics technologies provide unprecedented opportunities and challenges for molecular biology studies. Central questions in this area include how to model these vast sequencing data across different modalities, perform computational analyses, and interpret mechanisms by identifying biologically and pathologically meaningful cell types, regulatory relations, and key markers.
Advanced machine learning methods and tools provide a promising approach to address these challenges. scGNN (https://github.com/juexinwang/scGNN) is a graph neural network-based framework for clustering and imputing scRNA-seq data by modeling the single cells as a cell graph. Targeting single-cell multi-omics data, DeepMAPS (https://bmblx.bmi.osumc.edu/) introduces a heterogeneous graph transformer to infer single-cell biological networks. BSP (https://github.com/juexinwang/BSP) proposes a granularity-based statistical approach to identify spatially variable genes in 2D and 3D spatial transcriptomics data.
Our tutorial will cover key advancements in machine learning methods developed for single-cell multi-omics and spatial transcriptomics research over the past few years, emphasizing new opportunities in bioinformatics enabled by such advancements. We will start with a technical talk about the machine learning algorithms of the covered approaches, including scGNN, DeepMAPS, and BSP, from model training to model interpretation (discovery of cell types, regulatory relations, and key markers). We will then demonstrate the impact of machine learning on discovering cell types, regulatory relations, and key markers in hands-on application sessions.
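The cell-graph idea behind scGNN can be illustrated with a minimal NumPy sketch (toy random data, not the scGNN implementation): each cell becomes a node, linked to its k nearest neighbours in expression space.

```python
import numpy as np

def knn_cell_graph(expr, k=2):
    """Build a k-nearest-neighbour adjacency matrix from a
    cells x genes expression matrix (toy illustration only)."""
    n = expr.shape[0]
    # pairwise Euclidean distances between cells
    dists = np.linalg.norm(expr[:, None, :] - expr[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)       # no self-edges
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        neighbours = np.argsort(dists[i])[:k]
        adj[i, neighbours] = 1            # directed kNN edges
    return adj

rng = np.random.default_rng(0)
expr = rng.random((5, 10))                # 5 toy cells, 10 toy genes
adj = knn_cell_graph(expr, k=2)
print(adj.sum(axis=1))                    # each cell has exactly k outgoing edges
```

scGNN then learns on such a graph with a graph neural network; this sketch only shows the graph-construction step that turns an expression matrix into nodes and edges.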
Learning Objectives
- To understand the basic principles of deep learning, graph representation learning, and model interpretation.
- To understand the specifics of computational tools such as scGNN, DeepMAPS, and BSP, and become aware of the appropriate tools to use in different applications in single-cell multi-omics and spatial transcriptomics studies.
- To gain hands-on experience in applying tools and interpreting results using the standalone Python-based software scGNN, R-based BSP, the webserver-based DeepMAPS, and an integrated AI-ready platform.
Intended Audience and Level
The target audiences are graduate students, researchers, scientists, and practitioners in both academia and industry who are interested in applications of deep learning in bioinformatics (Broad Interest). The tutorial is aimed towards entry-level participants with knowledge of the fundamentals of biology and machine learning (beginner). Basic experience with Python and R programming languages is recommended for the participants.
The tutorial slides and materials for hands-on exercises (e.g., links to demo, code implementation, and datasets) will be posted online prior to the tutorial and made available to all participants.
Schedule
9:00 | Part 1: Overview: Introduction to single-cell multi-omics and spatial transcriptomics and corresponding challenges. |
9:45 | Part 2: Introduction to biological data analysis methods. |
10:45 | Coffee Break |
11:00 | Part 3: Clustering-based single-cell analysis and scGNN on an AI-ready platform. |
12:00 | Part 4: Applications #1: Single-cell RNA-seq dataset acquisition, model training, and analysis. |
13:00 | Lunch |
14:00 | Part 5: Network analysis on single-cell multi-omics and DeepMAPS. |
14:30 | Part 6: Applications #2: Single-cell multi-omics dataset acquisition, model training, and analysis. |
16:00 | Coffee Break |
16:15 | Part 7: Marker analysis on spatial transcriptomics and BSP. |
16:45 | Part 8: Applications #3: Spatial transcriptomics dataset acquisition, model fitting, and analysis. |
Tutorial IP2: Just-in-time compiled Python for bioinformatics research
SOLD OUT
Room: 524c
Date: Friday, July 12, 2024 9:00 – 18:00 EDT
Organizer:
Sven Rahmann
Speakers:
Johanna Schmitz, Center for Bioinformatics Saar and Saarland University, Saarland Informatics Campus, Saarbrücken, Germany; Saarbrücken Graduate School of Computer Science
Jens Zentgraf, Center for Bioinformatics Saar and Saarland University, Saarland Informatics Campus, Saarbrücken, Germany; Saarbrücken Graduate School of Computer Science
Sven Rahmann, Center for Bioinformatics Saar and Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Max Participants: 20
Description
Python has a reputation for being a clean and easy-to-learn language, but slow when it comes to execution, and difficult concerning multi-threaded execution. Nonetheless, it is one of the most popular languages in science, including bioinformatics, because for many tasks, efficient libraries exist, and Python acts as a glue language. In this tutorial, we explore how to write efficient multi-threaded applications in Python using the numba just-in-time compiler. In this way, we can use Python’s flexibility and the existing packages to handle high-level functionality (e.g., design the user interface, run machine learning models), and then use compiled Python for additional custom compute-heavy tasks; these parts can even run in parallel.
Over a full tutorial day, we introduce a small (but still interesting and relevant) problem as an example: efficient search for bipartite DNA motifs. We develop an efficient tool that outputs every match in a reference genome in a matter of seconds. Starting with an introduction to the problem and a (slow) pure Python implementation, we learn how to write more jit-compiler-friendly code, transition towards a compiled version and observe speed increases until we obtain C-like speed. We parallelize the tool to make it even faster, and add more options for more flexible searching. Finally, we add a simple but effective GUI, which can increase the potential user-base of such a tool by an order of magnitude.
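The transition from pure Python to a numba-compiled function can be sketched in a few lines (a hypothetical GC-counting toy, not the tutorial's motif-search code; the fallback decorator lets the snippet run even where numba is not installed):

```python
import numpy as np

try:
    from numba import njit                # the just-in-time compiler
except ImportError:                       # fallback: run as plain Python
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

@njit(cache=True)
def gc_count(seq_codes):
    """Count G/C bases in a uint8 array of ASCII codes.
    Explicit loops like this are what numba compiles to C-like speed."""
    n = 0
    for c in seq_codes:
        if c == 71 or c == 67:            # ord('G'), ord('C')
            n += 1
    return n

seq = np.frombuffer(b"GATTACAGC", dtype=np.uint8)
print(gc_count(seq))                      # 4
```

On the first call numba compiles `gc_count` for the uint8 signature; later calls run the compiled machine code, which is where the C-like speed described above comes from.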
Learning Objectives
- Understand the difference between interpretation, lazy and eager/early compilation
- Understand the possibilities and limitations of the numba just-in-time compiler
- Explore several examples about when numba can accelerate your code (and when it cannot)
- Understand prerequisites for compiling a function
- Learn the differences between compilable and non-compilable Python code
- Learn about parallelizing Python in spite of the Global Interpreter Lock (GIL) with compiled functions
- Learn how to scale up a prototype to handle large data
- Get an understanding of DNA motif search
Intended Audience and Level
The tutorial addresses active bioinformatics researchers, from graduate students to principal investigators, who write software tools as part of their research. In particular, we address researchers who are looking for an easier transition from research prototype software to software that scales to large datasets and is usable by a large non-technical user-base. Therefore, our participants should have at least some experience developing bioinformatics research software.
Prior experience with the Python programming language is required, as well as some experience with managing environments with installed software, ideally using (bio)conda / mamba.
Schedule
9:00 | Introduction to the numba just-in-time compiler for Python; small examples, possibilities, limitations, how the compilation works. Last 30 minutes are short hands-on exercises (timing iterated execution of a small function in pure vs. compiled Python). |
10:45 | Coffee break |
11:00 | Introduction to DNA motif search and a “motif description” mini-language, with examples from the literature. Automaton-based pattern search and a bit-parallel algorithm. Hands-on: Implementation in pure Python (45 min, 15-20 lines). |
13:00 | Lunch break |
14:00 | Transforming a Python implementation to a numba-compiled implementation; separation of high-level and low-level code parts; managing memory allocations; introduction of type annotations (1 hour principles, 1 hour supervised coding). |
16:00 | Coffee break |
16:15 | Parallelization: Using threads to parallelize the application (e.g. parallel search across chromosomes); Replacing the command-line interface by a simple but effective GUI using streamlit. Hands-on coding: Splitting the task, collecting and visualizing the results. |
Tutorial IP3: Multi-omic data integration for microbiome research using scikit-bio
Room: 524a
Date: Friday, July 12, 2024 9:00 – 18:00 EDT
Organizer:
Qiyun Zhu
Speakers:
Qiyun Zhu
James Morton
Daniel McDonald
Matthew Aton
Lars Hunger
Max Participants: 40
Description
Modern microbiome research is marked by the extensive use of high-throughput, multi-omic data derived from complex biological systems, such as amplicons, metagenomes, metatranscriptomes, metaproteomes, and metabolomes, as well as data and metadata of the host or environment. The complexity and richness of data demand robust, scalable, and reproducible integration and analysis methods. Our full-day tutorial offers an essential guide to leveraging the expanded capabilities of scikit-bio, alongside the broader Python data science ecosystem. Scikit-bio is a core library behind the widely used QIIME 2 project, and provides various data structures, metrics and algorithms commonly used in bioinformatics. This tutorial is designed to provide researchers, educators, and developers with an overview of current trends, foundational principles, and analytical strategies in microbiome research. Participants will engage in hands-on exercises on handling data and metadata, analyzing communities and features, as well as correlating and predicting biological traits. This tutorial aims to equip attendees with knowledge and practical skills that are adaptable to various applications in microbiome research and beyond.
Exercises will be delivered through Jupyter Notebooks with clear code and documentation. Tutorial materials, including data, slides, and notebooks, will be hosted in a public GitHub repository under a BSD open-source license.
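As a flavour of the community analyses covered, the Shannon diversity index can be computed in a few lines of NumPy (scikit-bio offers this as `skbio.diversity.alpha.shannon`; the counts below are made up):

```python
import numpy as np

def shannon(counts, base=2):
    """Shannon diversity H = -sum(p * log p) over observed taxa."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return -(p * np.log(p)).sum() / np.log(base)

# toy community: 4 equally abundant taxa -> H = log2(4) = 2 bits
print(shannon([10, 10, 10, 10]))   # ~2.0
# a skewed community is less diverse
print(shannon([97, 1, 1, 1]))
```

In practice one would compute such metrics across all samples of a feature table and compare groups statistically, which is exactly the workflow the tutorial walks through.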
Learning Objectives
Participants will learn how to use scikit-bio and other common Python libraries to analyze and integrate multiple types of omic data that are usually involved in studies of microbiomes and their roles in the host or natural environment. Specifically, participants will:
- Understand and work with various summarized omic data types.
- Handle sparse, high-dimensional data tables and associated metadata.
- Analyze community composition using ecological, phylogenetic and statistical approaches.
- Identify important microbial or functional features associated with sample properties.
- Construct supervised learning models to predict traits of hosts or environments.
- Develop reusable workflows for microbiome research.
By the end of the full-day tutorial, each participant will have completed an analytical workflow based on a demo dataset, which can be customized and extended to other datasets.
Intended Audience and Level
This tutorial is for researchers, educators and developers interested in analyzing various types of biological “omic” data, such as metagenomics, metabolomics, and host transcriptomics. Attendees should have basic skills in Python (preferred), or any other programming language (such as R or C/C++). Experience with the Linux command line is not required. Optionally, attendees may benefit from basic knowledge in bioinformatics, biostatistics, and any specific biological research fields, such as microbiology, ecology, molecular biology, and epidemiology.
Each participant should bring their own laptop or tablet (with keyboard). The practical sessions will be conducted using Google Colab or a local Jupyter environment, depending on the participant’s preference.
Schedule
9:00 | Introduction and software setup. Exercise: Setting up the software environment |
10:00 | Working with various omic data types. Exercise: A real-world multi-omic dataset |
10:45 | Coffee break |
11:00 | Working with sparse, high-dimensional data tables. Exercise: Working with omic data tables |
12:00 | Analyzing microbial community structures. Exercise: Community diversity analyses |
13:00 | Lunch break |
14:00 | Inferring and associating critical features. Exercise: Statistical modeling and tests |
15:00 | Predicting host and environmental traits. Exercise: Constructing predictive models |
16:00 | Coffee break |
16:15 | Developing an analytical protocol for publication. Exercise: Assembling an analytical protocol |
17:15 | Debugging, wrap-up and open questions. Lecture: Looking beyond |
Tutorial IP4: Quantum-enabled multi-omics analysis
Room: 522
Date: Friday, July 12, 2024 9:00 – 18:00 EDT
Organizer:
Aritra Bose
Laxmi Parida
Speakers:
Aritra Bose, PhD, Research Scientist, IBM Research, Yorktown, NY
Hakan Doga, PhD, Postdoctoral Researcher, IBM Research, Cleveland, OH
Filippo Utro, PhD, Senior Research Scientist, IBM Research, Yorktown, NY
Laxmi Parida, PhD, ISCB Fellow, IBM Fellow
Max Participants: 50
Description
Single-cell and multi-omic analyses have provided profound insights into the heterogeneity of complex tissues by measuring many cells together across a wide array of multi-omics data types such as genomics, proteomics, and transcriptomics. Single-cell analysis is plagued by many challenges: missingness, the difficulty of developing robust machine learning algorithms for discovering complex features, finding patterns in the spatial structure of single-cell transcriptomics or proteomics, and, most importantly, integrating multi-omics data to create meaningful embeddings for the cells. Machine learning (ML) techniques have been extensively used in analyzing, predicting, and understanding multi-omics data; for the purposes of this tutorial, we will use the term classical ML to refer to these techniques. Quantum machine learning (QML) has the potential to overcome many of the above limitations of classical ML in single-cell analysis. This tutorial will be structured into five sessions as follows:
- In the first session we will introduce quantum computing fundamentals such as notations, operations, quantum states, entanglement, quantum gates, and circuits.
- In the second session, we will set up Qiskit, an open-source quantum computing toolkit based on Python and run a demo algorithm.
- In the third session, we will process and analyze single-cell multi-omics data from resources such as the Single Cell Atlas or TCGA using classical ML algorithms to create a baseline.
- In the fourth session, we will set up the data in Qiskit and run a QML algorithm to classify disease subtypes.
- In the fifth and concluding session, we will summarize the tutorial and do an interactive Q&A session with the attendees.
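The circuit-level fundamentals of the first session can be previewed in plain NumPy: a Hadamard gate followed by a CNOT prepares a two-qubit Bell state. This is only an illustrative sketch; the tutorial itself works in Qiskit's circuit API.

```python
import numpy as np

# single-qubit Hadamard and two-qubit CNOT gates as matrices
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# start in |00>, apply H to the first qubit, then CNOT
state = np.array([1, 0, 0, 0], dtype=float)
state = CNOT @ (np.kron(H, I) @ state)
# amplitudes of the entangled Bell state (|00> + |11>) / sqrt(2)
print(state)
```

Measuring this state yields 00 or 11 with equal probability and never 01 or 10, which is the entanglement the first session introduces formally.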
Learning Objectives
Participants in this tutorial will learn a new paradigm for analyzing multi-omics data, with hands-on experience on a quantum computer. Concretely, the major takeaways of this tutorial are:
- Understand the basics of quantum computing including hands-on experience on quantum gates and circuits using Qiskit.
- Identify the class of problem: analyzing machine learning methods on multi-omics data for biomarker discovery, disease subtyping, etc.
- How to process single-cell data for quantum-enabled algorithms.
- How to apply QML algorithms to single-cell data.
- How to design experiments for healthcare and life sciences data using quantum computers.
Intended Audience and Level
This tutorial is aimed at computational biologists, bioinformaticians, clinicians, practitioners, and data analysts, from early-career to senior researchers in healthcare and the life sciences, who are enthusiastic about the new frontiers of computational biology. There are very few prerequisites for the tutorial, listed as follows:
- Create an IBM Quantum account: on the IBM Quantum Learning website, click on “Create an IBMid” and follow the instructions.
- Watch the Qiskit Global Summer School videos – QML 2021 (optional).
- Entry-level knowledge of single-cell data and multi-omics analyses.
Schedule
9:00 | Session I: Quantum Information and Fundamentals |
10:45 | Coffee Break |
11:00 | Session II: Hello Qiskit!: Writing your first program in Qiskit |
12:30 | Session III: Processing multi-omics data with classical ML algorithms |
13:00 | Lunch |
14:00 | Session IV, Part I: Design and implement QML algorithm for single-cell data in Qiskit. |
16:00 | Coffee Break |
16:15 | Session IV, Part II: Analyze QML algorithm and compare with classical ML |
17:00 | Session V: Interactive Q&A session with the participants. |
Tutorial IP5: Modelling Multi-Modal Biomedical Data Using Networks
SOLD OUT
Room: 521
Date: Friday, July 12, 2024 9:00 – 18:00 EDT
Organizer:
Ian Simpson
Speakers:
Ian Simpson, Professor of Biomedical Informatics, School of Informatics, University of Edinburgh
Barry Ryan, PhD Student, UKRI Centre for Doctoral Training in Biomedical Artificial Intelligence, School of Informatics, University of Edinburgh
Sebestyén Kamp, PhD Student, UKRI Centre for Doctoral Training in Biomedical Artificial Intelligence, School of Informatics, University of Edinburgh
Max Participants: 30
Description
Network structures allow us to model complex data in an extremely flexible way, enabling a wide range of downstream analytic approaches to help us gain insight into the biological processes and systems we model. The ability of networks to capture myriad features of the primary data and explore higher-order relationships between them makes them highly suitable for questions that are not easily answered by classical statistical approaches, which typically only look at first-order interactions. Networks have been widely used in the biomedical sciences to study gene and protein expression profiles, protein-protein interactions, metabolic processes, dynamic pathway models, and diseases, amongst others. The emergence of multi-modal data in the biomedical setting has gathered pace significantly in recent years, with several different types of data measured from the same sample source. Integration of these data is proving incredibly valuable at increasing the breadth and depth of our understanding of the underlying systems by reducing noise, increasing information content, facilitating our handling of missing and/or incomplete data, and, crucially, increasing our predictive power beyond that of uni-modal data analysis.
In this comprehensive tutorial we will introduce participants to network analysis from first principles using real-world multi-modal data derived from the Generation Scotland study, a world-leading longitudinal research programme and an excellent use case for biomedical network analysis. Participants will perform hands-on, end-to-end network construction and computational analysis using a ground-up approach that will give them the skills, experience, and confidence to develop their own network analytic pipelines in the future. We will work in the context of human disease using both molecular and clinical data, and introduce analysis approaches for network-based tasks including clustering, functional annotation analysis, and classification using graph neural networks.
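As a minimal taste of the ground-up approach, a co-expression network can be built by thresholding a gene-gene correlation matrix (simulated toy data below, not the Generation Scotland data used in the tutorial):

```python
import numpy as np

rng = np.random.default_rng(42)
# toy expression matrix: 20 samples x 6 genes, genes 0 and 1 correlated
expr = rng.normal(size=(20, 6))
expr[:, 1] = expr[:, 0] + 0.1 * rng.normal(size=20)

corr = np.corrcoef(expr.T)                 # gene-gene correlation matrix
adj = (np.abs(corr) > 0.8).astype(int)     # threshold correlations to edges
np.fill_diagonal(adj, 0)                   # drop self-loops

edges = [(i, j) for i in range(6) for j in range(i + 1, 6) if adj[i, j]]
print(edges)   # includes the (0, 1) co-expression edge
```

Real pipelines choose the threshold (or a soft weighting) much more carefully, but the adjacency matrix produced here is exactly the object the clustering and GNN sessions operate on.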
Learning Objectives
Participants will learn how to analyse biological datasets using networks. They will gain hands-on experience with a real-world dataset as an exemplar that can be directly transferred to their own work in the future. Following the course they will be able to:
- Understand core network concepts and fundamentals
- Construct networks using Python and R
- Develop network models for uni-modal and multi-modal data (e.g. gene expression + DNA methylation)
- Perform functional annotation analysis and community clustering
- Implement simple Graph Neural Network-based approaches for classification tasks
Intended Audience and Level
Introductory Level.
This tutorial is aimed at an audience who have little prior experience working with and analysing data using networks. They will need at least a basic level of knowledge of Python and R programming. Specifically, participants are expected to be familiar with the Python packages Pandas, NumPy, and Matplotlib and the R packages ggplot2 and dplyr.
The workshop will be conducted in both R and Python. We will communicate with participants in advance so that they have installed Visual Studio Code (Python) and RStudio (R) prior to the tutorial; we can troubleshoot minor installation issues on the day and provide cloud compute instances if needed. All materials and data will be made available open-source through a dedicated GitHub repository. All analyses will be streamlined so that there are no challenging compute requirements for participants; a standard modern laptop will be sufficient to take part.
Schedule
9:00 | Welcome & Introduction |
9:10 | “An Introduction to Networks” |
9:40 | Practical Session 1 |
10:45 | Coffee Break |
11:00 | “The Do’s and Don’ts of Biomedical Network Construction” |
11:30 | Practical Session 2 |
13:00 | Lunch |
14:00 | “Common Approaches to the Analysis of Biomedical Networks” |
14:30 | Practical Session 3 |
16:00 | Coffee Break |
16:15 | “An Introduction to Network Inference Using Graph Neural Networks” |
16:45 | Practical Session 4 |
17:50 | Closing Remarks |
Tutorial IP6: Creating and running cloud-native pipelines with WDL, Dockstore, and Terra
Room: 519
Date: Friday, July 12, 2024 9:00 – 13:00 EDT
Organizer:
David Steinberg
Speakers:
Denis Yuen, Team Lead, Dockstore, Ontario Institute for Cancer Research
David Charles Steinberg, University of California, Santa Cruz
Leyla Tarhan, PhD, Senior Science Writer, Data Sciences Platform, Broad Institute of MIT and Harvard
Aseel Awdeh, PhD, Computational Biologist, Data Sciences Platform, Broad Institute of MIT and Harvard
Max Participants: 40
Description
With the advent of efficient sequencing technology, the scientific community produces petabytes of data daily. These data are prepared to answer diverse biological questions, each requiring unique sequencing approaches. To combine these disparate datasets and transform them into meaningful insights, researchers are turning to cloud-based approaches that adhere to Findable, Accessible, Interoperable, and Reusable (FAIR) practices. These include cloud-computing environments that allow for efficient resource-sharing and scalability. While the potential of these new resources is thrilling, the migration to cloud computing might feel daunting, as it requires new pipelines that harness the expanse of cloud tools. In this half-day tutorial, we introduce participants to key components that help them create cloud-native pipelines, including portable workflows written in the Workflow Description Language (WDL; pronounced “widdle”), portable packages of software and dependencies known as Docker containers, and Dockstore, a public platform for sharing Docker-based workflows. Participants will get hands-on experience with these resources by developing their own simple WDL workflow and Docker image for genomic analysis. They will push their workflows to Dockstore and export them to the cloud-based Terra platform so that they can run their workflow on real data.
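As a preview of the hello-world exercise, a minimal WDL workflow looks roughly like this (an illustrative sketch, not the tutorial's exact script):

```wdl
version 1.0

workflow HelloWorld {
  call SayHello
  output { File greeting = SayHello.out }
}

task SayHello {
  command <<<
    echo "Hello, ISMB!" > greeting.txt
  >>>
  output { File out = "greeting.txt" }
  runtime { docker: "ubuntu:22.04" }
}
```

The `runtime` block names the Docker image the task runs in, which is what makes the workflow portable across Dockstore, Terra, and other WDL engines.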
Learning Objectives
In this tutorial, participants will learn how to:
- Write basic WDL syntax with inputs and outputs
- Make a Docker image from a Dockerfile
- Navigate Dockstore, a platform for Docker-based workflows
- Find, evaluate, and share workflows in Dockstore
- Automatically integrate GitHub WDL with Dockstore
- Export a WDL workflow from Dockstore to Terra
- Set up and run a workflow in Terra
- Find resources for writing advanced WDL workflows
Intended Audience and Level
Researchers and tool developers interested in bringing their analyses to the cloud. A basic understanding of the command line and a GitHub account are required, and participants are encouraged to have basic familiarity with genomics terminology and standard high-throughput sequencing data formats. The introduction to basic WDL syntax is designed for novice WDL writers and starts with a basic hello-world script.
Schedule
9:00 | Welcome/opening remarks/review agenda and learning goals |
9:05 | Introduction to Docker ● How Docker containers improve software and scientific reproducibility ● Docker and Dockerfile basics ● Finding and using Docker images |
9:15 | Building and Using Docker Images ● Pull and use an existing Docker image ● Create a Dockerfile to build a Docker image |
9:45 | Introduction to WDL ● Anatomy of a WDL ● Where to find and run existing WDLs |
10:00 | Basic WDL scripting ● Writing your first WDL Hello-world script for Terra ● Running WDLs in Terra |
10:45 | Coffee Break |
11:00 | Introduction to Dockstore ● Finding and assessing the quality of workflows on Dockstore ● Launching workflows from Dockstore |
11:30 | Integrate your GitHub with Dockstore ● Use GitHub apps to streamline the development cycle |
12:00 | Real genomics example: Modify, export and run a WDL |
12:30 | Wrap-up and Q&A |
Tutorial IP7: Federated Ensemble Learning for Biomedical Data
Room: 519
Date: Friday, July 12, 2024 14:00 – 18:00 EDT
Organizer:
Hryhorii Chereda
Speakers:
Prof. Dr. Anne-Christin Hauschild, Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
Hryhorii Chereda, Ph.D., Medical Bioinformatics, University Medical Center Göttingen, Göttingen, Germany
Dr. Youngjun Park, Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
Maryam Moradpour (MSc), Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
Max Participants: 15
Description
The digital revolution in healthcare, fostered by novel high-throughput sequencing technologies and electronic health records (EHRs), is transitioning the field of medical bioinformatics towards an era of big data. While machine learning (ML) methods have proven advantageous in such settings for a multitude of medical applications, they generally depend on the centralization of datasets. Unfortunately, this is ill-suited for sensitive medical data, which is often distributed across different institutions, comprises intrinsic distribution shifts, and cannot be easily shared due to privacy and security concerns.
Federated learning, initially proposed by Google in 2017, allows the training of machine learning models on geographically or legally separated datasets without sharing sensitive data. When combined with additional privacy-enhancing techniques, such as differential privacy or homomorphic encryption, it is a privacy-aware alternative to central data collections while still enabling the training of machine learning models on the whole dataset. However, in such federated settings, both infrastructure and algorithms become much more complex compared to centralized machine learning approaches. Some of the most intuitive implementations rely on ensemble learning approaches, where only the model parameters are transferred. For example, we can exchange the split values of tree nodes, as in federated random forest, or combine local subgraph-based graph neural network (GNN) models into a global federated Ensemble-GNN.
This tutorial covers the general theory of federated learning and the practice of federated ensemble learning. We will explain the concepts and benefits of federated ensemble learning, and demonstrate how to use Python to implement two state-of-the-art methods: federated random forest and Ensemble-GNN. The participants will learn how to apply these methods to breast cancer data, including clinical and gene expression features, and how to deploy the models in a federated setup. By the end of this tutorial, the participants will have both theoretical and practical skills in federated ensemble learning and privacy-preserving techniques for biomedical data analysis.
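The core idea, in which models travel while raw data stays on site, can be caricatured in a few lines of plain Python (toy one-feature "sites" and threshold classifiers, not the federated random forest or Ensemble-GNN implementations used in the tutorial):

```python
import statistics

# each site holds private (value, label) data that is never shared
site_data = [
    [(0.1, 0), (0.2, 0), (0.9, 1), (0.8, 1)],
    [(0.3, 0), (0.7, 1), (0.6, 1), (0.2, 0)],
    [(0.4, 0), (0.5, 1), (0.9, 1), (0.1, 0)],
]

def train_local(data):
    """Local 'model': midpoint between class means, fitted on-site."""
    m0 = statistics.mean(x for x, y in data if y == 0)
    m1 = statistics.mean(x for x, y in data if y == 1)
    return (m0 + m1) / 2                # only this threshold leaves the site

thresholds = [train_local(d) for d in site_data]   # the federated step

def ensemble_predict(x):
    votes = [int(x > t) for t in thresholds]       # one vote per site model
    return int(sum(votes) > len(votes) / 2)        # majority vote

print([ensemble_predict(x) for x in (0.15, 0.85)])  # [0, 1]
```

The real methods replace the threshold with random forests or GNNs, but the pattern is the same: train locally, share model parameters only, and combine the local models into a global ensemble.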
Availability of the tutorial’s material: https://gitlab.gwdg.de/cdss/tutorial-federated-ensemble-learning-for-biomedical-data
Learning Objectives
- Participants will learn the basics of federated machine learning theory and will be introduced to federated ensemble learning.
- Participants will learn about federated random forest.
- Participants will be introduced to GNNs, which utilize a molecular subnetwork to structure input genomic data, and will learn how GNNs can be combined into an ensemble (Ensemble-GNN).
- Participants will learn how to practically implement and apply a federated random forest.
- Participants will learn how to use GPUs to train Ensemble-GNN and how to apply it in both centralized and federated scenarios.
- Optionally, participants can learn how to implement their own GNN as a new base learner for Ensemble-GNN.
Intended Audience and Level
The intended audience comprises bioinformaticians, data scientists, and medical informaticians who already have beginner-level experience in machine learning. Participants should have a laptop with Linux, macOS, or Windows and an internet connection. Access to a computational environment will be provided by the organisers.
Level requirements are the following:
- Basic knowledge of machine learning.
- Basic knowledge of Python.
Schedule
14:00 | Lecture: Federated ensemble learning in biomedical health data. Anne-Christin Hauschild |
14:30 | Hands-on tutorial: how to develop and implement a federated random forest. Hryhorii Chereda, Maryam Moradpour, Youngjun Park |
15:45 | Coffee Break |
16:00 | Continuation of hands-on tutorial: how to develop and implement a federated random forest. Maryam Moradpour, Youngjun Park |
16:15 | Lecture: Federated ensemble learning with graph neural networks. GNNs are designed to perform various tasks on graphs. For instance, a patient can be represented by a biological network whose nodes contain patient-specific omics features; GNNs then perform graph classification to predict the patient's clinical endpoint. The Ensemble-GNN approach builds predictive models from PPI networks with various node features, such as gene expression and/or DNA methylation. To do this, Ensemble-GNN derives relevant PPI network communities and trains an ensemble of GNN models based on the inferred communities. Sharing the local GNN models allows the deployment of a federated ensemble of GNNs. Hryhorii Chereda |
16:30 | Hands-on tutorial: how to train and apply a federated Ensemble-GNN. Hryhorii Chereda, Maryam Moradpour, Youngjun Park |
Tutorial VT1: A Practical Introduction to Large Language Models in Biomedical Data Science Research SOLD OUT
Part 1: Monday, July 8, 2024 14:00 – 18:00 EDT
Part 2: Tuesday, July 9, 2024 14:00 – 18:00 EDT
Organizer:
Robert Xiangru Tang
Speakers:
Robert Xiangru Tang, Yale University, USA.
Qiao Jin, National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), USA.
Hufeng Zhou, Biostatistics Department, Harvard T. H. Chan School of Public Health, Harvard University, USA.
Shubo Tian, National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), USA.
Zhiyong Lu, National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), USA.
Mark Gerstein, Yale University, USA.
Max Participants: 50
Website: https://llm4biomed.github.io/
Description
Large Language Models (LLMs) like ChatGPT have exhibited remarkable capabilities in understanding and generating language across diverse disciplines. In the realm of biomedical data science and computational biology, LLMs can significantly aid information accessibility, data analysis, and knowledge discovery. In this tutorial, we offer an introductory, hands-on guide to understanding and utilizing these LLMs in the field of biomedical data science. Our tutorial begins by leveling the learning ground with introductions to LLMs and biomedical data science. Subsequently, we delve into the core applications of LLMs in biomedical data science and computational biology via retrieval-augmented generation, database functionalities, and code generation. To facilitate thought-provoking discussions, pertinent case studies will be discussed, emphasizing how to harness the power of LLMs to bridge the gap between technical feasibility and practical utility in biomedical data science. Furthermore, hands-on exercises are included to enable participants to apply their learning in real time. Participants will also get acquainted with OpenAI's ChatGPT and open-source LLMs, as well as their design, use cases, limitations, and prospects.
Our topics include:
- Introduction
- Large Language Models (LLMs) and their evolution from RNNs and LSTMs to Transformers and the GPT family.
- In-depth interaction with OpenAI’s ChatGPT, learning about its overview, capabilities, and implementation, focusing on Chain-of-Thought Prompting.
- Open-source LLMs
- Novel applications of LLMs in computational biology and biomedical data sciences
- Database query generation with LLMs.
- Retrieval-augmented generation.
- Language agents and code generation.
- Advanced topics of LLMs for bioinformatics
- Biomedical text retrieval and literature mining
- Gene set analysis
- Developing Representations of Disease-Relevant Molecules
- Guided hands-on exercises using provided datasets and problem statements for practical understanding and implementation.
- Limitations and challenges (e.g. hallucination, fairness, and safety) of using LLMs for science.
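As a preview of the retrieval-augmented generation topic above, the core loop can be sketched without any model at all: retrieve the most relevant passages, then assemble them into a prompt that would be sent to an LLM. The corpus, scoring rule, and prompt template below are toy examples, not the tutorial's actual materials.

```python
def score(query, doc):
    """Crude relevance: count shared lowercase word tokens."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query, corpus, k=2):
    """Return the k passages with the highest overlap score."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query, passages):
    """Stuff retrieved passages into a grounding prompt for an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

corpus = [
    "TP53 is a tumour suppressor gene frequently mutated in cancer.",
    "BRCA1 variants are associated with hereditary breast cancer.",
    "The ribosome translates mRNA into protein.",
]

query = "Which gene is a tumour suppressor?"
passages = retrieve(query, corpus)
prompt = build_prompt(query, passages)
# In the tutorial this prompt would be sent to GPT-4 or an open-source
# LLM; here we only show the assembled, grounded prompt.
print(prompt)
```

Production RAG systems replace the keyword overlap with dense-vector retrieval, but the retrieve-then-prompt structure is the same.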
Learning Objectives
- Familiarizing with the key aspects of large-scale biomedical data.
- Leveraging LLMs to handle and interpret vast amounts of biomedical data.
- Learning cutting-edge research topics from two invited talks.
- Utilizing OpenAI APIs for GPTs and open-source LLMs in Python.
- Integrating LLMs to enhance their coding efficiency in bioinformatics.
- Deploying LLMs for biomedical question-answering and academic literature exploration.
Intended Audience and Level
This tutorial is designed for graduate students, researchers, data analysts, and practitioners in the domains of bioinformatics, computational biology, and biomedical informatics who are seeking to harness the potential of Large Language Models (LLMs) in their work. The didactic content would be chiefly beneficial for individuals who are keen on enhancing the breadth and depth of their analytical skills.
While the focus of the workshop lies in catering to beginners or users with little experience in LLMs, intermediates will find the advanced topics and in-depth case studies enriching as well. Participants should ideally possess a basic understanding of Python programming and machine learning concepts. Preliminary experience with Linux-based operating systems or interacting with APIs would provide an added advantage but is not a prerequisite.
Our discussion on using OpenAI's ChatGPT and other open-source LLMs, such as LLaMA, along with hands-on exercises and case studies, will offer an immersive learning experience that spans theory and practice. Researchers looking to streamline their data analysis processes and improve the efficiency and accuracy of their results will find this tutorial particularly useful.
Relevant resources and tutorial materials for hands-on activities will be shared online before the commencement of the tutorial, ensuring an unhampered learning experience for all attendees.
Schedule
Part 1 | |
14:00 | Overview and Welcome |
14:10 | Introduction to LLMs with a focus on Biomedical Data Science |
14:40 | How to use GPT-3.5 and GPT-4 with Python |
15:10 | How to use Open-source LLMs with Python |
15:30 | Break |
15:45 | Database Query Generation with LLMs |
16:10 | Retrieval-augmented Generation with Large Language Models |
16:35 | Code generation in Bioinformatics |
Part 2 | |
14:00 | Large Language Models for Biomedicine: from PubMed Search to Gene Set Analysis |
14:45 | AI in Biomedicine: Developing Representations of Disease-Relevant Molecules |
15:30 | Break |
15:45 | Integrating Biomedical Data Database Development with LLMs |
16:10 | Querying PubMed with RAG to answer biomedical questions with GPT-4 |
16:35 | Code generation in Bioinformatics with Open-source LLMs |
16:55 | Closing Remarks |
Tutorial VT2: BioViz: Interactive data visualisation and ML for omics data
SOLD OUT
Part 1: Monday, July 8, 2024 14:00 – 18:00 EDT
Part 2: Tuesday, July 9, 2024 14:00 – 18:00 EDT
Organizer:
Ragothaman M Yennamalli
Speakers:
Ragothaman M. Yennamalli - Assistant Professor, SASTRA Deemed to be University, Thanjavur, India
Dr Farzana Rahman – Assistant Professor, Kingston University London, UK.
Shashank Ravichandran - Senior Software Engineer, Incedo Inc, India
Megha Hegde, PhD Researcher, Kingston University London, UK.
Jean-Christophe Nebel, Professor of Computer Science, Kingston University London, UK.
Max Participants: 30
Description
Data science and machine learning are intricately connected, particularly in computational biology. At a time when biological data is being produced on an unprecedented scale, encompassing genomic sequences, protein interactions, and metabolic pathways, the ability to analyse and interpret it has never been more crucial.
Data visualisation plays a crucial role in biological data sciences since it allows the transformation of complex, often incomprehensible raw data into visual formats that are easier to understand and interpret. This allows biologists to recognise patterns, anomalies, and correlations that would otherwise be lost in the sheer volume of data. In addition, machine learning (ML) has brought about a revolution in the analysis of biological data. Exploiting extensive datasets, ML provides tools to model complex systems and generate predictions. Indeed, ML algorithms excel at uncovering subtle patterns in data, contributing to tasks like predicting protein structures, comprehending genetic variations and their implications for diseases, and even facilitating drug discovery by predicting molecular interactions.
The integration of data visualisation and machine learning is particularly powerful. In particular, visualisation may aid in interpreting machine learning models, allowing biologists to understand and trust their predictions. It could also help fine-tune these models by identifying outliers or anomalies in the data.
Due to its remarkable capability, there has been a surge in the development and application of tools that combine data visualisation and machine learning in biology. Platforms that integrate these technologies enable biologists to conduct comprehensive analyses without needing deep expertise in computer science. Assuredly, this democratisation of data science and ML has empowered more and more biologists to engage in sophisticated, data-driven research.
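As a taste of the Part 1 hands-on material, a minimal Matplotlib bar plot of hypothetical expression values might look like the sketch below. The sample names and values are invented, and the Agg backend is used so the script runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render to file, no display needed
import matplotlib.pyplot as plt

# Hypothetical gene-expression values for a handful of samples.
samples = ["S1", "S2", "S3", "S4"]
expression = [2.1, 5.4, 3.3, 7.8]

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(samples, expression, color="#4C72B0")
ax.set_xlabel("Sample")
ax.set_ylabel("Expression (log2 CPM)")
ax.set_title("Toy expression barplot")
fig.tight_layout()
fig.savefig("toy_expression.png", dpi=150)
```

The same data could be handed to Seaborn or Plotly for styled or interactive versions, which the tutorial covers.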
Learning Objectives
This tutorial is divided into two parts. In the first part of the tutorial, the participants will learn how to install and use tools for data visualisation using Python. The second part will focus on installing and using ML tools for feature selection, model training, and model optimisation using Python. By the end of this tutorial, the participants will be able to:
- Explain the role and significance of data visualisation in the context of scientific research.
- Apply fundamental principles of data visualisation to create clear and informative visual representations of data.
- Create a variety of data visualisations using Python libraries, i.e., Matplotlib, Seaborn, and Plotly.
- Understand the basics of colour theory and its implications for creating accessible and aesthetically pleasing visualisations.
- Design data visualisations that are accessible to a diverse audience, including those with colour vision deficiencies.
- Gain practical skills in preprocessing data and selecting appropriate features for machine learning models.
- Build, train, and evaluate machine learning models using Python libraries like Scikit-learn and TensorFlow/Keras.
- Implement machine learning algorithms on real-world biological datasets, demonstrating an understanding of the application of these techniques in biology.
- Create integrated visualisations of machine learning results using tools like Yellowbrick, Bokeh, and TensorBoard.
- Critically evaluate and discuss the applications, challenges, and implications of data visualisation and machine learning in scientific research, particularly in biology.
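To illustrate the model-building objectives above, here is a minimal scikit-learn sketch: a preprocessing-plus-classifier pipeline trained and evaluated on the bundled iris dataset as a stand-in for a real biological dataset (the tutorial itself will use other datasets and libraries).

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for a biological dataset: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Bundling scaling and the classifier in one pipeline keeps
# preprocessing leak-free: the scaler is fit on training data only.
model = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42))
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {acc:.2f}")
```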
Intended Audience and Level
The tutorial is aimed towards entry-level participants (Graduate students, researchers, and scientists) in both academia and industry who are interested in Data Visualisation and ML. Prerequisites: Basic knowledge of computer programming (preferably Python) and machine learning (Beginner). There is no prerequisite to have any knowledge about Art and Aesthetics.
Schedule
Part 1 | |
14:00 | Lecture Introduction to Data Visualisation: Importance and Basic principles of data visualization in scientific research Jean-Christophe Nebel |
15:00 | Hands-on Python Libraries for Visualization: Matplotlib, Seaborn, Plotly and others Farzana Rahman, Ragothaman Yennamalli, Shashank Ravichandran, and Megha Hegde |
15:45 | Coffee/Tea Break |
16:00 | Lecture Colour theory in Visualization: Colour palettes, Accessible and Inclusive Visualisations Ragothaman Yennamalli |
17:00 | Hands-on Creating various types of charts, plots for clarity and aesthetics. Case studies with real world datasets Farzana Rahman, Ragothaman Yennamalli, Shashank Ravichandran, and Megha Hegde |
Part 2 | |
14:00 | Lecture Fundamentals of Machine Learning: Types of ML, Data preprocessing and feature selection, model selection and training Ragothaman Yennamalli and Farzana Rahman |
15:00 | Hands on Python libraries for Machine Learning: Scikit-learn, Pandas, NumPy, TensorFlow/Keras. Building models using real-world biological data Shashank Ravichandran, and Megha Hegde |
16:00 | Coffee/Tea Break |
16:15 | Hands on Integrating Data Viz and ML: Yellowbrick, Bokeh, Tensorboard, Scikit-plot, etc. Farzana Rahman and Megha Hegde |
17:15 | Question and Answer session |
Tutorial VT3: Using LinkML (Linked data Modeling Language) to model your data
Date: Monday, July 8, 2024 14:00 – 18:00 EDT
Organizer:
Sierra A.T. Moxon
Speakers:
Sierra Moxon, software developer, Lawrence Berkeley National Laboratory
Kevin Schaper, software developer, University of Colorado
Patrick Kalita, software developer, Lawrence Berkeley National Laboratory
Max Participants: 30
Description
LinkML (Linked data Modeling Language; linkml.io) is an open, extensible modeling framework that allows computers and people to work cooperatively to model, validate, and distribute data that is reusable and interoperable. It is designed to create interoperable data from the start without the overhead normally required for doing this. LinkML can help even non-techies create better, FAIRer, more reusable data models backed by ontologies.
Collecting and organizing biomedical data for an individual project presents a huge challenge; doing so in a way that allows for later reanalysis and reuse across projects is even harder. Many data standards are not machine-actionable, or are defined in isolation, leading to siloization. The quantity and variety of data being generated in biomedical fields is increasing rapidly, but is still often captured in unstructured formats like publications, posters, lab notebooks, or spreadsheets. Researchers at all levels struggle with collecting, managing, and analyzing data and complex knowledge, due to a confusing landscape of schemas, standards, and tools. These challenges impede scientific progress and limit our ability to tailor treatments based on data (precision medicine). AI and ML increasingly enable large-scale data analysis, but lack of data harmonization limits cross-disciplinary applications.
LinkML addresses these issues, weaving together elements of the Semantic Web with aspects of conventional modeling languages to provide a pragmatic way to work with a broad range of data types, maximizing interoperability and computability across sources and domains. LinkML meets data producers where they are technically, and speaks many different modeling languages. Data models can be authored in a variety of languages including YAML, JSON Schema, or even spreadsheets. LinkML supports all steps of the data analysis workflow: data generation, submission, cleaning, annotation, integration, and dissemination. LinkML enables even non-developers to create data models that are understandable and usable across the layers from data stores to user interfaces, reducing translation issues and increasing efficiency.
LinkML is an easy-to-use framework that both emerging and established data-generating communities can use to generate interoperable, reusable datasets and workflows. It has already seen wide uptake by projects across the biomedical spectrum and beyond, including the German Human Genome-Phenome archive, Critical Path Institute, iSample project, National Microbiome Data Collaborative, Center for Cancer Data Harmonization, INCLUDE project, NCATS Biomedical Data Translator, Reactome, Alliance of Genome Resources, Open Microscopy Environment (Next Generation File Format), and Genomics Standards Consortium.
In this tutorial, we will discuss best practices for data modeling; introduce LinkML as a modeling framework and tool suite; work together to set up a LinkML project from scratch; develop a model and validate it with test data; and auto-generate model documentation. If time permits, we will discuss the LinkML tool, Schema Automator, and use of LLMs with LinkML models.
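For a flavour of what authoring a model looks like, a minimal LinkML schema in YAML might resemble the following. The Person class and its attributes are invented for illustration; real schemas are developed in the hands-on sessions.

```yaml
id: https://example.org/person-schema
name: person_schema
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string

classes:
  Person:
    description: A minimal example entity.
    attributes:
      id:
        identifier: true
      full_name:
        required: true
      age:
        range: integer
```

From a schema like this, LinkML's generators can emit JSON Schema, Pydantic classes, documentation, and other artifacts (for example, via the `gen-json-schema` command).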
Learning Objectives
- Learn how to author a new data model that exercises some of the main LinkML modeling components.
- Understand common LinkML schema best practices.
- Generate documentation for the new model, and get familiar with generating the model in different formats.
- Time permitting, get familiar with LinkML’s bootstrapping tools that help migrate existing models to LinkML.
Intended Audience and Level
This tutorial is aimed at anyone who generates or works with data: biologists, biocurators, data scientists, and data modelers. No programming or data modeling expertise is required, and following along without participating directly in the hands-on portions is welcome. To participate in the hands-on training, we assume that participants have basic familiarity with running commands from the command line (in a terminal)--for example, calling Python scripts or running simple commands like “cat” and “grep”--and they should have a GitHub account and basic familiarity with using GitHub.
Schedule
Time (EDT) | Topic | Presenter | Hands-on? |
---|---|---|---|
14:00 | Introduction | Sierra Moxon | No |
14:20 | Section 1: Set up a LinkML repository | Patrick Kalita | Yes |
14:50 | Section 2: Authoring a LinkML Model A. Model components B. Classes and slots |
Sierra Moxon | Yes |
15:10 | BREAK | ||
15:25 | Section 2: Authoring a LinkML Model (cont.) C. Mappings, definitions, enumerations |
Sierra Moxon | Yes |
15:40 | Section 3: Schema best practices, including linting | Patrick Kalita | Yes |
15:55 | Section 4: Generating code from your model A. Pydantic, JSONSchema B. Generating documentation |
Kevin Schaper | Yes |
16:35 | BREAK | ||
16:45 | Section 5: LinkML Validate | Patrick Kalita | Yes |
17:05 | Section 6 (Time permitting): Schema Automator (LLM + LinkML) | Sierra Moxon | No |
17:35 | Wrap up/Questions | Sierra Moxon | No |
Tutorial VT4: Computational Approaches for Identifying Context-Specific Transcription Factors using Single-Cell Multi-Omics Datasets
SOLD OUT
Date: Tuesday, July 9, 2024 14:00 – 18:00 EDT
Organizer:
Hatice Ulku Osmanbeyoglu
Speakers:
Hatice Ulku Osmanbeyoglu, Assistant Professor, University of Pittsburgh, USA
Merve Sahin, Computational Biologist, Memorial Sloan Kettering Cancer Center, USA
Parham Hadikhani, Postdoctoral fellow, University of Pittsburgh, USA
Linan Zhang, Assistant Professor, Ningbo University, China
Max Participants: 30
Description
Development of specialized cell types and their functions is controlled by external signals that initiate and propagate cell-type-specific transcriptional programs. Activation or repression of genes by key combinations of transcription factors (TFs) drives these transcriptional programs and controls cellular identity and functional state. For example, ectopic expression of the TFs Oct4, Sox2, Klf4, and c-Myc is sufficient to reprogram fibroblasts into induced pluripotent stem cells. Conversely, disruption of TF activity can cause a broad range of diseases, including cancer. Hence, identifying context-specific TFs is particularly relevant to human health and disease.
Systematically identifying key TFs for each cell type represents a formidable challenge. Determination of TF activity in bulk tissue is confounded by cell-type heterogeneity. Single-cell technologies now measure different modalities from individual cells, such as RNA, protein, and chromatin states. For example, recent technological breakthroughs have coupled the relatively sparse single-cell RNA sequencing (scRNA-seq) signal with robust detection of highly abundant and well-characterized surface proteins using index sorting and barcoded antibodies, as in cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq). But these approaches are limited to surface proteins, whereas TFs are intracellular. Single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) measures genome-wide chromatin accessibility and reveals cellular memory and response to stimuli or developmental decisions. Recently, several computational methods have leveraged these omics datasets to systematically estimate TF activity influencing cell states. We will cover these TF activity inference methods using scRNA-seq, scATAC-seq, Multiome, and CITE-seq data through hybrid lectures and hands-on training sessions. We will cover the principles underlying these methods, their assumptions, and their trade-offs. We will apply multiple methods, interpret results, and discuss strategies for further in silico validation. The audience will leave equipped with practical knowledge and the essential skills to conduct TF activity inference independently on their own datasets and to interpret the results.
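As context for the methods covered, the simplest family of TF activity scores summarises the expression of a TF's known targets (its regulon) in each cell. The regulons, gene names, and scoring rule below are purely illustrative; real methods use curated regulon databases and more robust statistics.

```python
from statistics import mean

# Hypothetical regulons: TF -> target genes. Real methods draw these
# from curated databases or infer them from scATAC-seq accessibility.
regulons = {
    "STAT1": ["IRF1", "GBP1", "CXCL10"],
    "HNF4A": ["APOA1", "TTR", "ALB"],
}

# One cell's expression vector (gene -> normalised expression).
cell = {"IRF1": 2.0, "GBP1": 1.5, "CXCL10": 2.5,
        "APOA1": 0.1, "TTR": 0.0, "ALB": 0.2}

def tf_activity(cell_expr, regulon):
    """Score a TF as the mean expression of its targets; missing genes count as 0."""
    return mean(cell_expr.get(g, 0.0) for g in regulon)

scores = {tf: tf_activity(cell, targets) for tf, targets in regulons.items()}
print(scores)  # STAT1 should score far higher than HNF4A in this cell
```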
Learning Objectives for Tutorial
At the completion of the tutorial, participants will gain understanding into the basic concepts and recent advances in transcription factor inference methods for single-cell omics datasets including scRNA-seq, scATAC-seq, CITE-seq and Multiome. Four learning objectives are proposed:
- Understand the basic principles underlying TF activity inference from single-cell omics
- Understand the specific methodologies, assumptions, and trade-offs between computational inference methods
- Gain hands-on experience in applying tools and interpreting results using multiple TF activity inference methods on public scRNA-seq, scATAC-seq, multiome and CITE-seq datasets
- Discuss current bottlenecks, gaps in the field, and opportunities for future work.
Intended Audience and Level
This tutorial is designed for individuals at the beginner to intermediate level, specifically targeting bioinformaticians or computational biologists with some prior experience in analyzing single-cell RNA sequencing (scRNA-seq), single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq), Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq), and Multiome data, or those familiar with next-generation sequencing (NGS) methods. A foundational understanding of basic statistics is assumed.
While participants are expected to be beginners, a minimum level of experience in handling NGS datasets is required. The workshop will be conducted using Python and JupyterLab, necessitating prior proficiency in Python programming and familiarity with command-line tools.
To facilitate the learning process, participants will be provided with pre-processed count matrices derived from real datasets. All analyses, including JupyterLab notebooks and tutorial steps, will be available on GitHub for reference.
The tutorial will employ publicly accessible data, with examples showcased using datasets that will be made available through repositories such as the Gene Expression Omnibus or similar public platforms. This hands-on workshop aims to equip participants with practical skills and knowledge, enabling them to navigate and analyze complex datasets in the field of single-cell omics.
Schedule
14:00 | Welcome remarks and tutorial overview Hatice |
14:05 |
Basic principles behind TF activity inference methods
Hatice |
14:45 | Overview of computational TF inference methods based on single cell omics Hatice, Merve |
15:45 | Break |
16:00 | Hands-on experience applying tools and interpreting results from multiple TF activity inference methods on public scRNA-seq data Linan and Merve |
16:45 | Hands-on experience applying tools and interpreting results from multiple TF activity inference methods on public scATAC-seq and multiome data Parham and Merve |
17:30 | Hands-on experience applying tools and interpreting results from TF activity inference methods on public CITE-seq data Parham and Hatice |
17:55 | Discuss current bottlenecks, gaps in the field, and opportunities for future work Hatice |
Tutorial VT5: Explainability in Graph Deep Learning for Biomedicine
SOLD OUT
Date: Monday, July 8 14:00 – 18:00 EDT
Organizer:
Guadalupe Gonzalez
Speakers:
Guadalupe Gonzalez, Prescient, Genentech Computational Sciences, Genentech.
Chirag Agarwal, Harvard University.
Max Participants: 50
Description
In the rapidly evolving field of biomedical research, graph deep learning (DL) has emerged as a powerful tool for analyzing complex biological data like molecular graphs, protein-protein interaction networks, and patient similarity networks. However, modern graph DL models are complex black-box neural networks comprising millions of parameters, and it is crucial to understand their model predictions before employing them in life-critical applications. Our proposed tutorial is designed to address the above challenge by providing a brief overview of explainability research in the context of graph neural networks (GNNs) and their applications to biomedical problems.
The tutorial will start with an introduction to graph DL, focusing on its relevance and potential in biomedicine. We will discuss why explainability is not just a desirable trait but a necessity in this domain, where model decisions can have significant implications for both model developers and relevant stakeholders.
The second part of the tutorial delves into the core of explainability research in GNNs. We will define what constitutes an explanation in GNN models, introduce post-hoc explainers, explore metrics for evaluating explanations, and criteria to assess the quality of explanations. We will also introduce explanation-directed message passing – a novel approach that integrates post-hoc explanations directly into the training pipeline of GNNs. Finally, we will introduce existing interpretable graph models in biomedicine.
In the third part, we will apply these concepts to high-stakes biomedical applications like predicting molecular properties, discovering new drug targets, and analyzing patient data. We will be discussing each application in depth, demonstrating how explainability enhances our understanding of modern GNNs and drives decision-making in biomedicine.
Finally, the tutorial will feature interactive demonstrations and a hands-on practical session. Participants will engage with real-world biomedical datasets, applying explainability techniques to GNN models. This session aims to provide attendees with practical experience and insights into developing and utilizing explainability techniques and interpretable GNN models effectively in their research.
By the end of this tutorial, participants will have a solid understanding of the importance, methods, and applications of explainability in GNNs within the biomedical sphere, equipped with the knowledge and skills to implement these techniques in their work.
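To preview the idea behind perturbation-based post-hoc explainers, the sketch below occludes one node at a time and records the change in a black-box model's score. The model here is a trivial stub standing in for a trained GNN, and the node names are invented.

```python
def model_score(node_features):
    """Stub black-box model: sum of node features (a real one is a GNN)."""
    return sum(node_features.values())

def node_importance(node_features):
    """Occlusion: importance of a node = drop in score when it is removed."""
    full = model_score(node_features)
    importance = {}
    for node in node_features:
        occluded = {n: f for n, f in node_features.items() if n != node}
        importance[node] = full - model_score(occluded)
    return importance

# A tiny 'patient graph' with per-node omics features.
graph = {"geneA": 3.0, "geneB": 0.5, "geneC": 1.5}
print(node_importance(graph))
```

Dedicated GNN explainers such as GNNExplainer refine this idea by optimising soft masks over edges and features instead of deleting nodes one by one.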
Learning objectives
- Understand the fundamentals of graph deep learning:
- Gain a solid understanding of graph DL and GNNs.
- Recognize the significance and applications of graph DL in biomedicine.
- Learn the importance of the explainability and interpretability of machine learning models in biomedical applications:
- Learn why the explainability and interpretability of machine learning models is crucial in biomedical research.
- Appreciate the implications of model predictions in healthcare and research settings.
- Learn methods and metrics for explainability:
- Understand different approaches to generating explanations for GNN model predictions.
- Get acquainted with various metrics and desiderata used to assess the quality and effectiveness of explanations.
- Explore post-hoc explanation techniques and explanation-directed message passing:
- Discover methods for post-hoc analysis of GNN predictions.
- Delve into explanation-directed message passing and its role in enhancing model interpretability.
- Gain hands-on experience with explainability in GNN models:
- Participate in interactive demonstrations and hands-on exercises to learn how to generate explanations of GNN model predictions for the tasks of molecular property prediction, drug target discovery, and patient data analysis.
- Understand how explainability aids in the decision-making process in these applications.
Intended Audience and Level
This tutorial is primarily intended for:
- Researchers and academics: Individuals working in the fields of bioinformatics, computational biology, biomedical research, and related areas. This includes both experienced researchers and graduate students who are exploring interpretability in the context of graph machine learning techniques in biomedicine
- Data scientists and machine learning practitioners: Professionals in data science and machine learning working on graph DL and seeking to expand their knowledge into the interpretability domain.
- Industry professionals: Individuals from biotech, pharmaceutical, and healthcare technology companies who are involved in research and development, particularly in areas intersecting with AI and machine learning.
The tutorial is designed to be intermediate. Participants are expected to have:
- A basic understanding of machine learning concepts.
- Familiarity with the fundamentals of DL.
- Some knowledge of Python programming, as practical exercises will involve coding. No prior expertise in graph DL or specific biomedical applications is required. The tutorial will provide an introduction to these areas, but will also delve into more advanced topics suitable for attendees with existing knowledge in graph DL or bioinformatics.
Schedule
14:00 |
Part 1: Introduction to graph deep learning in biomedicine
|
14:30 |
Part 2: Understanding and measuring explainability in GNNs
|
15:45 | Coffee break |
16:00 |
Part 3: Applying explainability techniques to GNN model predictions in biomedical contexts
|
16:45 | Coffee break |
17:00 |
Part 4: Hands-on demonstrations and practical session
|