Tutorials

ISMB/ECCB 2021 Virtual Tutorial Program

All times are UTC

ISMB/ECCB 2021 will hold a series of online virtual tutorials prior to the start of the virtual conference scientific program.

Half Day Tutorials:

Tutorial 1: tidytranscriptomics : introduction to tidy analysis of single-cell and bulk RNA sequencing data
Tutorial 2: Comprehensive analysis of immunogenomics sequencing data in the cloud to facilitate reproducibility and rigor of immunogenomics research
Tutorial 3: Meta-learning for Bridging Labeled and Unlabeled Data in Biomedicine

Full Day Tutorials (presented over two half-days):

Tutorial 4: A practical introduction to multi-omics integration and network analysis (SOLD OUT)
Tutorial 5: Inside the ‘Black Box’: Explainable Deep Learning Models For Image and Sequence Classification (SOLD OUT)
Tutorial 6: Nextflow and nf-core: Scalable and FAIR Biomedical Analysis Workflows
Tutorial 7: The state-of-the-art in microbial community bioinformatics (SOLD OUT)
Tutorial 8: Reproducible omics data analysis workflows with the COVID-19 Disease Map, WikiPathways and Cytoscape

Tutorial 1: tidytranscriptomics : introduction to tidy analysis of single-cell and bulk RNA sequencing data

Thursday, July 22, 11:00 - 15:00 UTC

Presenters:

Maria Doyle, Peter MacCallum Cancer Centre, Australia
Stefano Mangiola, The Walter and Eliza Hall Institute of Medical Research, Australia

This tutorial will present how to perform analysis of single-cell and bulk RNA sequencing data following the tidy data paradigm (Wickham and others 2014). The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each observation is a row, and data is manipulated using an easy-to-understand vocabulary. Most importantly, the data structure remains consistent across manipulation and analysis functions.

This can be achieved by integrating packages from the R CRAN and Bioconductor ecosystems, including tidyseurat, tidySingleCellExperiment, tidybulk, tidyHeatmap (Mangiola and Papenfuss 2020) and tidyverse (Wickham et al. 2019). These packages are part of the tidytranscriptomics suite that introduces a tidy approach to RNA sequencing data representation and analysis.
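
The tidy layout described above (each variable a column, each observation a row) can be illustrated with a small, language-agnostic sketch. The tutorial itself uses R and the tidytranscriptomics packages; the Python snippet below is only an illustration under assumed toy data, and the gene names, sample names, counts and the helper `to_tidy` are all hypothetical.

```python
# Hypothetical toy counts in a "wide" layout: one entry per gene,
# with one value per sample nested inside it.
wide_counts = {
    "GeneA": {"sample1": 10, "sample2": 0},
    "GeneB": {"sample1": 3, "sample2": 7},
}


def to_tidy(wide):
    """Return one record per (gene, sample) observation.

    Every record has the same three variables: gene, sample, count.
    """
    return [
        {"gene": gene, "sample": sample, "count": count}
        for gene, per_sample in wide.items()
        for sample, count in per_sample.items()
    ]


tidy = to_tidy(wide_counts)

# Because every record has the same shape, filtering and summarising
# operate on the same structure and return the same kind of structure.
expressed = [row for row in tidy if row["count"] > 0]

total_per_sample = {}
for row in tidy:
    total_per_sample[row["sample"]] = (
        total_per_sample.get(row["sample"], 0) + row["count"]
    )
```

The point of the sketch is the consistency of the structure: the filtered result has the same record shape as its input, which is the property the tidy paradigm relies on.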

Pre-requisites:
• Basic knowledge of RStudio
• Some familiarity with tidyverse syntax
• Some familiarity with bulk RNA-seq and single cell RNA-seq
Recommended background reading: Introduction to R for Biologists

Learning goals:
• To understand the key concepts and steps of RNA sequencing data analysis.
• To approach data representation and analysis through a tidy data paradigm, integrating tidyverse with tidybulk, tidyseurat, tidySingleCellExperiment and tidyHeatmap.

Learning objectives:
• Recall the key concepts of RNA sequencing data analysis.
• Apply the concepts to publicly available data.
• Create plots that summarise the information content of the data and analysis results.

Maximum Participants: 100


Tutorial 2: Comprehensive analysis of immunogenomics sequencing data in the cloud to facilitate reproducibility and rigor of immunogenomics research

Thursday, July 22, 15:00 - 19:00 UTC

Presenters:
Victor Greiff, University of Oslo, Norway
Kenneth B. Hoehn, Yale University, United States
Steven H. Kleinstein, Yale University, United States
Serghei Mangul, University of Southern California, United States
Milena Pavlovic, University of Oslo, Norway
Kerui Peng, University of Southern California, United States

Immunogenomics is a field in which genetic information at different levels of biological organization (epigenetics, transcriptomics, metabolomics, cells, tissues, and clinical data) has been characterized and utilized to understand the immune system and immune responses. Immunogenomics studies have offered new opportunities for deepening our understanding of adaptive immune receptors (B-cell receptors, antibodies, T-cell receptors) in the context of a variety of human diseases, such as infectious diseases, cancer, autoimmune conditions, and neurodegenerative disease. Given the importance of adaptive immune receptor research for drug and vaccine discovery, the field is growing at an exponential pace, as exemplified by the user statistics of several immune receptor sequence analysis software suites and databases (Immcantation: >52,000 downloads and >14,000 unique visitors in 2019; VDJTools: >10,000 visitors per year; VDJdb: >19,000 visitors and >40,000 views in 2019). With the number of users growing rapidly, there is a need for software tutorials that focus on rigorous analysis methods as well as on reproducibility and interoperability.

We will cover the current state of immunogenomics methods by providing hybrid lectures and hands-on training sessions. The audience will be equipped with knowledge of this field and the essential skills to conduct adaptive immune receptor analysis independently.

Learning Objectives:

(1) Understand the basics of immunogenomics and its implications for disease diagnosis, drug discovery, and vaccine development.
(2) Understand the basics of computational analysis that facilitates immunogenomics big data research.
(3) Understand the commonly used computational tools and available datasets for promoting reproducibility and transparency in immunogenomics research.
(4) Understand the cutting-edge machine learning approaches for immunogenomics research.

Audience and level:
Beginner or intermediate level. This tutorial is designed for a broad audience interested in immunogenomics research and provides an introduction to the field and a demonstration of existing tools and publicly available datasets.

Maximum Participants: 60


Tutorial 3: Meta-learning for Bridging Labeled and Unlabeled Data in Biomedicine

Friday, July 23, 15:00 - 19:00 UTC

Presenters:

Maria Brbic, Stanford University, United States
Chelsea Finn, Stanford University, United States
Jure Leskovec, Stanford University, United States

Additional Tutorial details are available at: http://snap.stanford.edu/metalearning-ismb/

In biomedical domains labeled datasets are often very difficult and time-consuming to obtain, requiring a lot of costly manual effort and expert knowledge to hand-label classes before machine learning methods can even be used. This results in many scarcely labeled or completely unlabeled datasets. For instance, in protein function prediction a large number of functional labels have only a few labeled genes, or in single-cell transcriptomics novel and rare cell types appear across large, heterogeneous single-cell datasets. While machine learning methods excel on tasks with a large number of labeled datasets that can support learning of highly parameterized models, to solve central problems in biomedicine we need methods that can generalize to unseen domains and datasets given only a few labeled training examples, or in the extreme case to completely unlabeled datasets. Meta-learning methods solve this challenge by acquiring prior knowledge over previously labeled tasks in order to learn to generalize to a new task with insufficient labeled data.

This tutorial will cover principles and recent advancements of meta-learning, with case studies chosen for their high relevance to advancing new biomedical discoveries. We will present representation learning methods that bridge labeled and unlabeled data by learning to generalize across datasets given only a few labeled examples or, in the extreme case, without any labeled data, with an emphasis on interpretability. The tutorial will equip participants with the ability to understand the fundamentals of meta-learning and state-of-the-art meta-learning methods, and to utilize the learned concepts and methods in their own research.

Learning objectives:

At the completion of the tutorial, the participants will gain understanding and broad knowledge about the basic concepts and recent advances in the meta-learning techniques:
(1) How can we effectively learn from scarcely labeled datasets, e.g., protein functions or structures with a few labeled examples? How can we use prior knowledge to learn to generalize, i.e., meta-learn?
(2) How can we utilize knowledge from existing knowledge bases, such as Gene Ontology and Cell Ontology, to provide interpretations behind decisions based on only a few labeled examples?
(3) How can we learn without any labeled examples? How can we discover new, never-before-seen categories/classes, such as rare and unseen cell types across single-cell experiments?
(4) How can we transfer knowledge across different species, tissues, or sequencing technologies?
(5) What fundamental open problems in biology can benefit from meta-learning techniques? How can meta-learning be applied to these problems?
(6) What frameworks, tools and libraries are available to use meta-learning methods on new datasets and applications?

Maximum Participants: 100


Tutorial 4: A practical introduction to multi-omics integration and network analysis (SOLD OUT)

Thursday, July 22, 11:00 - 15:00 UTC
Friday, July 23, 11:00 - 15:00 UTC

Presenters:

Ashfaq Ali, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Lund University
Rui Benfeitas, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University
Nikolay Oskolkov, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Lund University

Advances in next generation sequencing (NGS) and mass spectrometry have recently allowed us to probe deeper and more systematically into different layers of biological information flow. We can now capture snapshots of cellular states at single-cell or tissue levels on genomic, transcriptomic, metabolomic, and proteomic levels, to examine relationships between thousands of features in each of these omics and a given phenotype or disease. However, characterization beyond individual omic levels to understand how multi-omic relationships jointly relate to a given phenotype remains a challenge. How may we identify the features with the largest phenotypic impact, and how can we identify patterns among the different layers?

In this tutorial we will introduce several different approaches for integration of multi-omics data including supervised and unsupervised learning and network analyses. We will highlight some of the key issues in dealing with the high multidimensionality that characterizes multi-omic data and techniques to address them. We will also discuss some of the most successful methods for multi-omic data abstraction, and how machine learning approaches can be used in unraveling biological relationships. We will show how biological network analyses can be used to identify patterns within and between omics, and how communities of features may be related with phenotypic data and biologic functions. Finally, we will discuss how meta-analyses and network meta-analyses can be used in analyzing studies from independent experiments.

Learning Objectives:
(1) Identify common issues in integration of highly multidimensional omics data.
(2) Identify key methods for data integration through supervised and unsupervised machine learning approaches.
(3) Understand how biological network analysis may assist in identifying coordinated patterns between features and associating feature communities with phenomic and biological functions.
(4) Hands-on experience in supervised/unsupervised integration and biological network analysis.

Audience and level:
Aimed at bioinformaticians and computational biologists with experience in analysis of high-throughput data and basic statistics knowledge, with R or Python coding experience. Knowledge of machine learning techniques is advantageous. Hands-on sessions will comprise both R and Python coding.

Maximum Participants: 30


Tutorial 5: Inside the ‘Black Box’: Explainable Deep Learning Models For Image and Sequence Classification (SOLD OUT)

Thursday, July 22, 11:00 - 15:00 UTC
Friday, July 23, 11:00 - 15:00 UTC

Presenters:

Panagiotis Alexiou, Central European Institute of Technology, Masaryk University, Czech Republic
David Cechak, Central European Institute of Technology, Masaryk University, Czech Republic
Filip Jozefov, Faculty of Informatics, Masaryk University, Czech Republic
Vlastimil Martinek, Central European Institute of Technology, Masaryk University, Czech Republic
Petr Simecek, Central European Institute of Technology, Masaryk University, Czech Republic

Computational Biologists have been using Machine Learning techniques based on Artificial Neural Networks for decades. New developments in the Machine Learning field over the past years have revolutionized the efficiency of Neural Networks and brought us into the era of Deep Learning. In the news, you can read about Deep Learning beating experts in Go, Chess and StarCraft, translating texts and speech between languages, turning the steering wheels of self-driving cars, and even tagging kittens, Not-Hotdogs, and tumours in images. In our field, we have witnessed such systems reaching accuracy competitive with experienced radiologists, predicting the folding of proteins, and calling single nucleotide polymorphisms in genomic data better than any other method.

In this tutorial we utilize three powerful components that are freely available for use. The first, TensorFlow, is an open source library for deep learning and machine learning in general. Thanks to the second, Google Colaboratory, the computational resources needed to train TensorFlow models are available without cost. Finally, TensorFlow.js will enable us to deploy the trained model as a static web page that can be easily hosted, e.g. on GitHub Pages. We will demonstrate Google Colaboratory + TensorFlow + TensorFlow.js on two examples: classification of images (cells & tissues) and classification of genomic sequences.

The key part of the tutorial will be evaluation and interpretation of the trained model. What could go wrong, and how can it be diagnosed? We will start with simple techniques, like measuring the impact of simple perturbations, and end with the Integrated Gradients method, introduced in the paper “Axiomatic Attribution for Deep Networks”, which identifies the parts of the input that contribute most to the decision.
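
To give a feel for the attribution method named above, here is a minimal sketch of Integrated Gradients in plain Python. It is not the tutorial's TensorFlow code: the toy `model` (a logistic unit with made-up weights), the numerical `gradient` helper, the all-zeros baseline and the number of integration steps are all assumptions chosen for illustration. The method averages gradients along the straight path from a baseline to the input and scales them by the input difference; the final check uses the completeness property from the paper, namely that the attributions sum to the difference between the model output at the input and at the baseline.

```python
import math


def model(x):
    """Toy differentiable 'model': a logistic unit with made-up weights."""
    weights = [2.0, -1.0, 0.5]  # hypothetical values, for illustration only
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def gradient(f, x, eps=1e-6):
    """Central-difference numerical gradient of f at x.

    A deep learning framework would use automatic differentiation instead.
    """
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def integrated_gradients(f, x, baseline, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum
    along the straight-line path from baseline to x."""
    summed = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grads = gradient(f, point)
        summed = [s + g for s, g in zip(summed, grads)]
    # Average gradient along the path, scaled by the input difference.
    return [(xi - b) * s / steps for xi, b, s in zip(x, baseline, summed)]


x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(model, x, baseline)

# Completeness check: the attributions should sum (approximately)
# to model(x) - model(baseline).
gap = abs(sum(attributions) - (model(x) - model(baseline)))
```

In this toy case the first feature receives a positive attribution and the second a negative one, matching the signs of their weighted contributions. For images or sequences the choice of baseline (for example, a blank image) is itself a modelling decision.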

Audience and level: This tutorial is intended for students and practitioners interested in getting their hands dirty with neural networks. It is designed to be an introduction and a starting point for further work and study. Beginners are welcome. Familiarity with Python is necessary; experience with Jupyter Notebooks, pandas & numpy will be useful.

Maximum Participants: 70


Tutorial 6: Nextflow and nf-core: Scalable and FAIR Biomedical Analysis Workflows

Thursday, July 22, 15:00 - 19:00 UTC
Friday, July 23, 15:00 - 19:00 UTC

Presenters:

Phil Ewels, nf-core creator; Bioinformatics Team Leader, SciLifeLab, Sweden
Evan Floden, Nextflow co-creator; Seqera Labs, Spain
Paolo Di Tommaso, Nextflow co-creator; Seqera Labs, Spain

Nextflow is an open-source workflow management system that prioritizes portability and reproducibility. It enables users to develop and seamlessly scale genomics workflows locally, on HPC clusters, or in major cloud providers’ infrastructures. Developed since 2014 and backed by a fast-growing community, the Nextflow ecosystem is made up of users and developers across academia, government and industry. It counts over 1M downloads and over 10K users worldwide.

nf-core is a framework for the development of collaborative, peer-reviewed, best-practice analysis pipelines. All nf-core pipelines are written in Nextflow and benefit from the ability to be executed on most computational infrastructures, as well as having native support for container technologies such as Docker and Singularity. The nf-core community has developed a suite of tools that automate pipeline creation, testing, deployment and synchronization. The goal is to provide a framework for high-quality bioinformatics pipelines that can be used across all institutions and research facilities.

This intensive tutorial is targeted at bioinformaticians and will cover everything to get users started with Nextflow and nf-core.

Audience: This tutorial is targeted at bioinformaticians and developers interested in writing and deploying biomedical analysis pipelines with Nextflow.

Requirements: Participants should have a basic knowledge of Linux shell programming. Virtual environments will be provided for registered participants.

Maximum Participants: 100


Tutorial 7: The state-of-the-art in microbial community bioinformatics (SOLD OUT)

Thursday, July 22, 15:00 - 19:00 UTC
Friday, July 23, 15:00 - 19:00 UTC

Organizers & Presenters:

Curtis Huttenhower, Harvard T.H. Chan School of Public Health, United States
Melanie Schirmer, Technical University of Munich, Germany
Nicola Segata, University of Trento, Italy

Presenters:

Eric Franzosa, Harvard T.H. Chan School of Public Health, United States
Philipp Muench, Helmholtz Centre for Infection Research, Germany
Kelsey Thompson, Harvard T.H. Chan School of Public Health, United States
Aaron Walsh, Broad Institute of MIT and Harvard, United States

This tutorial will introduce attendees to the current state-of-the-art in computational and quantitative methods for microbial community analyses. These will focus on integrating modern culture-independent sequencing (shotgun metagenomics and metatranscriptomics) with other molecular data (metabolomics, metaproteomics) and applying appropriate, accurate upstream bioinformatics and downstream biostatistics. This will include both human microbiome epidemiology and environmental microbial ecological, phylogenetic, and toxicology applications.

Attendees are assumed to be familiar with basic microbial community concepts and with command line environments, ideally with some facility in Python and/or R, but are not required to have extensive prior experience with metagenomics. The tutorial will mix lectures introducing important current analysis concepts with hands-on labs using pre-built cloud instances including demonstration data and bioBakery software tools. It will conclude with a discussion of gaps, needs, challenges, and potential next steps for bioinformaticians interested in the field of microbial community research.

Learning Objectives:
(1) Understand the breadth of available microbial community molecular profiling data and computational analysis approaches to it.
(2) Apply bioBakery tools for basic microbiome analysis tasks (e.g. taxonomic and functional profiling).
(3) Integrate them with external tools and advanced statistical and visualization techniques for multi-omic integration and downstream analysis.
(4) Recognize and avoid common pitfalls in microbial community bioinformatics, particularly with respect to statistical gotchas, false positives, and noise characteristics of microbiome data.
(5) Discuss gaps in the field and opportunities for future work in microbiome bioinformatics.

Audience: Should be familiar with basic microbial community concepts and, importantly, have facility with command line computing environments. We will provide prebuilt cloud instances for each participant, and participants should be able to manipulate command line tools and data within these instances with little to no introduction. However, extensive familiarity with current microbial community bioinformatics is not required, as these will be introduced (briefly) at the beginning of the tutorial.

Maximum Participants: 60


Tutorial 8: Reproducible omics data analysis workflows with the COVID-19 Disease Map, WikiPathways and Cytoscape

Thursday, July 22, 15:00 - 19:00 UTC
Friday, July 23, 15:00 - 19:00 UTC

Presenters:

Lauren Dupuis, Maastricht University, Netherlands
John “Scooter” Morris, UCSF, United States
Martina Summer Kutmon, Maastricht University, Netherlands

During the COVID-19 pandemic, an international group of over 200 researchers started a collaboration to build a comprehensive map of SARS-CoV-2 related processes, from virus uptake and virus replication to host immune response. In this tutorial session, we will highlight some of the use cases of this collection of highly curated pathway models for omics data analysis using pathway and network approaches.

Given the constant influx of new knowledge and data, the development of automated and reproducible data analysis workflows is essential. After a short introduction of the COVID-19 Disease Map project and the WikiPathways community curated pathway database, the tutorial will start with a session focused on Cytoscape, one of the most popular tools for network analysis and visualization, and its automation features. During the hands-on session in the afternoon, we will instruct participants on how to make use of three automated R-based transcriptomics data analysis workflows focused on pathway enrichment, tissue-specific pathway activity, network visualization, and network extension.

Importantly, while we will focus on the COVID-19 Disease Map and COVID-19 related transcriptomics datasets, the majority of the workflows can be easily utilized for other applications.

Audience: This tutorial session is aimed at bioinformaticians and life scientists interested in learning to use automation in R to perform pathway and network analysis of transcriptomics data. Participants should have some prior experience with R data analysis. Participants are required to install Cytoscape 3.8, R and RStudio, and optionally Jupyter notebooks. Detailed instructions will be provided in the weeks prior to the tutorial.

Learning Objectives:
(1) Participants will understand the basics of pathway and network analysis.
(2) Participants will be able to perform reproducible workflows in R for transcriptomics data analysis using pathway information from the COVID-19 Disease Map and WikiPathways.
(3) Participants will be able to set up and perform an automated network analysis in Cytoscape.

Maximum Participants: 30
