

ISMB/ECCB 2023 Tutorial Program

 

A series of in-person and virtual tutorials will be held prior to the start of ISMB/ECCB 2023. Tutorial registration fees are listed at: https://www.iscb.org/ismbeccb2023-register#tutorials

In-person Tutorials (All times CEST)

Virtual Tutorials (All times CEST): Presented through the ISMB/ECCB conference platform


Tutorial IP1: Accessing structural biology data from experimental and predicted models

Room: Salle Rhône 1 (Level 1)     Sunday, July 23    9:00 – 18:00 CEST

Organizer:
David Armstrong

Speakers:
Gerardo Tauriello, SWISS-MODEL/ModelArchive, Switzerland
Jim Procter, Jalview, University of Dundee, United Kingdom
Preeti Choudhary, Protein Data Bank Europe (PDBe), United Kingdom
Nicola Bordin, CATH, United Kingdom
R Gonzalo Parra, FrustratometeR, Spain
Paulyna Magana, PDBe/AlphaFold, United Kingdom
Maxim Tsenkov, PDBe/AlphaFold, United Kingdom

Max Participants: 40
Introductory level

The understanding that three-dimensional shape dictates function in biology was first revealed by the molecular structure of DNA. Macromolecular structure can tell us a lot about how these molecules function and the roles they play within a cell. FAIR data derived from structure determination experiments has fuelled accessibility and has been essential to recent advances in Artificial Intelligence (AI)-assisted structure prediction. The abundance of structure data available from experiments and predictions enables life-science researchers to address a wide variety of fundamental questions that were previously out of reach.

This tutorial will introduce resources that support effective access of experimentally determined and predicted macromolecular structure models through the 3D-Beacons network, which provides a general way to obtain macromolecular structure data. It will also cover some more specific resources that host this data, including PDBe-KB, SWISS-MODEL Repository, ModelArchive and AlphaFold DB.

While predicted models are increasingly accurate, researchers should never blindly assume that they are. We will therefore place a special focus on accessing and interpreting the global and local model confidence estimates that accompany predicted structures and help determine whether a predicted structure can be used for downstream analysis.
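
For illustration, AlphaFold DB stores per-residue pLDDT confidence scores in the B-factor column of its PDB files, so they can be read with standard structure parsers. The sketch below (file name and threshold are example values, not tutorial materials) flags low-confidence regions:

    from Bio.PDB import PDBParser

    # AlphaFold DB PDB files store per-residue pLDDT in the B-factor column
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("model", "AF-P00520-F1-model_v4.pdb")  # example file name

    plddt = [res["CA"].get_bfactor() for res in structure.get_residues() if "CA" in res]

    # pLDDT below ~50 is generally treated as very low confidence
    low = sum(1 for score in plddt if score < 50)
    print(f"{low} of {len(plddt)} residues have pLDDT < 50")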

Each session in the tutorial will include hands-on elements to allow attendees to put these resources into practice and gain direct experience of structural biology data analysis. Furthermore, the tutorial will introduce resources that support functional interpretation of this data and enrich the available information for structure analysis.

All tools and resources presented in this tutorial are open and freely accessible, with comprehensive online documentation and training materials. This tutorial provides an opportunity for participants to not only become more familiar with each resource’s capabilities, but also how they can continue to learn and further develop their knowledge and apply it in teaching and research.

Learning Objectives:

After this tutorial, participants should be able to:

  • Access structure models from different structural biology resources, including PDBe-KB, AlphaFold DB, SWISS-MODEL Repository and ModelArchive.
  • Understand how structure model data can be enriched to provide additional functional context.
  • Interpret data quality metrics for both predicted models and experimental structures.
  • Take advantage of the capabilities provided by tools that allow these data to be explored, enriched, and analysed to gain and communicate biological insights.

Intended audience:

This tutorial is aimed at bioinformaticians or participants interested in accessing and understanding structural biology data. Attendees should have a background in biology or computational biology and some familiarity with structural biology data. Some hands-on elements will include scripting around REST APIs, so participants will need proficiency in Python or another scripting language. Participants must bring a laptop to take part in the practical sessions. We will use Microsoft Azure Jupyter Notebooks, so a free Microsoft account is required.
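
As a flavour of the REST scripting involved, the sketch below queries the public 3D-Beacons summary endpoint for a UniProt accession and lists the available experimental and predicted models. The endpoint path and response fields follow the public EBI documentation as we understand it and should be treated as assumptions, not tutorial code:

    import requests

    uniprot_id = "P00520"  # example UniProt accession
    url = f"https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/{uniprot_id}.json"

    response = requests.get(url, timeout=30)
    response.raise_for_status()

    # each entry describes one experimental or predicted model from a 3D-Beacons provider
    for entry in response.json().get("structures", []):
        summary = entry["summary"]
        print(summary["provider"], summary["model_identifier"], summary.get("model_url"))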


Tutorial IP2: Spatial transcriptomics data analysis: theory and practice

Room: Salle Rhône 2 (Level 1)    Sunday, July 23    9:00 – 18:00 CEST

Speakers:
Simon Cockell, Newcastle University, United Kingdom
Alexis Comber, University of Leeds, United Kingdom
Rachel Queen, Newcastle University, United Kingdom
Eleftherios Zormpas, Newcastle University, United Kingdom

Max Participants: 30
Intermediate level

Recent technological advances have led to the application of RNA sequencing in situ. This allows whole-transcriptome characterisation, at close to single-cell resolution, while retaining the spatial information inherent in the intact tissue. Since tissues are congregations of intercommunicating cells, identifying local and global patterns of spatial association is imperative to elucidate the processes that underlie tissue function. Spatial data analysis requires attention to the distinct properties of data with a spatial dimension, which bring with them a different set of statistical and inferential considerations.
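
To make the notion of spatial association concrete, the short sketch below computes Moran's I, a classic global spatial autocorrelation statistic, for toy spot-level expression values. It is a generic NumPy illustration, not part of the tutorial materials (the tutorial itself is taught in R):

    import numpy as np

    def morans_i(values, weights):
        """Global Moran's I for values at spatial units with a (binary) weights matrix."""
        x = np.asarray(values, dtype=float)
        w = np.asarray(weights, dtype=float)
        z = x - x.mean()
        return (len(x) / w.sum()) * (w * np.outer(z, z)).sum() / (z ** 2).sum()

    # four spots along a line; neighbouring spots get weight 1
    expression = [1.0, 2.0, 8.0, 9.0]
    neighbours = np.array([[0, 1, 0, 0],
                           [1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0]])

    print(morans_i(expression, neighbours))  # ~0.4: similar values cluster in space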

In this comprehensive tutorial, we will introduce users to spatial transcriptomics (STx) technologies and current pipelines of STx data analysis. Furthermore, we will introduce attendees to the underlying features of spatial data analysis and how they can effectively utilise space to extract in-depth information from STx datasets.

Learning objectives:

Participants in this tutorial will gain an understanding of the core technologies for undertaking a spatial transcriptomics experiment, and of the common tools used for the analysis of the resulting data. In particular, participants will appreciate the strengths of geospatial data analysis methods in relation to this type of data. Specific learning objectives will include:

  1. Describe and discuss core technologies for spatial transcriptomics
  2. Make use of key computational technologies to process and analyse STx data
  3. Apply an analysis strategy to obtain derived results and data visualisations
  4. Appreciate the principles underlying spatial data analysis
  5. Understand some of the methods available for spatial data analysis
  6. Apply said methods to an example STx data set

Intended audience:

This tutorial is aimed at data analysts with some experience in single-cell RNA-Seq data analysis or similar analytical techniques. A base level of competency in analysing high-throughput data using R is expected; participants should have hands-on experience of this type of analysis.

The workshop will be conducted in R/RStudio, and prior experience with these tools is required. All tutorials will be run on Posit Cloud (formerly RStudio Cloud), so participants do not need to meet any minimum computational specification to attend.

Materials availability: All tutorial materials will be distributed via GitHub under a permissive open license (e.g. CC-BY). Practical sessions will be documented in Bookdown, with accessible and coherent code examples included. Recorded versions of the taught material will be made available via YouTube after the conference.

Data used for tutorial examples will be publicly available spatial transcriptomics datasets made accessible via the Gene Expression Omnibus or similar public repository.


Tutorial IP3: Using Virtual Reality technology for exploring biological data

Room: Salle Rhône 3A (Level 1)    Sunday, July 23    9:00 – 18:00 CEST

Speakers:
Jörg Menche, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences; University of Vienna, Austria
Sebastian Pirch, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences; University of Vienna, Austria
Martin Chiettini, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences; University of Vienna, Austria
Felix Müller, University of Vienna, Austria
Celine Sin, University of Vienna, Austria
Julia Guthrie, University of Vienna, Austria
Joel Hancock, University of Vienna, Austria

Max Participants: 30
Introductory level

Virtual Reality (VR) technology introduces new and exciting opportunities for visualizing and analyzing diverse biological data, ranging from 3D protein structures to genome-scale networks. This tutorial will provide an overview of the emerging field of VR-based data visualization, along with hands-on sessions on how to visualize genomic data in VR. We start with an introduction to VR hardware, basic hardware and software requirements, and how to set up a VR station. We will then present an overview of existing VR applications, including tools from structural biology [1], single-cell transcriptomics [2], molecular dynamics [3], microscopy [4], high-dimensional point clouds [5], biological networks [6], and other fields.

Attendees will be able to try out the different VR apps. Finally, a hands-on session will introduce the workflow for visualizing and exploring gene-based datasets (e.g., from transcriptomics) in a gene interaction network. Attendees are encouraged to bring their own datasets, which can then be loaded into the VR platform with guidance from the tutors. No prior experience with VR is required. The hands-on session requires basic skills in Python programming and data preparation/processing.
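
As an indication of the kind of data preparation involved (file names, the JSON layout and the expected input format are assumptions, not the actual interface of the presented VR platform):

    import json
    import networkx as nx

    # toy gene interaction network and a set of "hit" genes from an experiment
    g = nx.Graph([("TP53", "MDM2"), ("TP53", "EP300"), ("MDM2", "UBE2D1")])
    hits = {"TP53", "MDM2"}

    # serialize nodes and edges into a simple JSON structure a VR viewer could ingest
    payload = {
        "nodes": [{"id": n, "highlight": n in hits} for n in g.nodes],
        "links": [{"source": u, "target": v} for u, v in g.edges],
    }
    with open("network_for_vr.json", "w") as fh:
        json.dump(payload, fh, indent=2)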

Learning Objectives:

  • Understand basic hardware and software requirements and how to set up a VR station
  • Get an overview of the broad range of existing VR apps for biological data visualization and analysis
  • Learn how to bring biological data into a VR Analytics platform

Tutorial materials

  • All hands-on tutorials will be based exclusively on open source software.
  • We will prepare Google Colab Python notebooks.
  • All code and required data will be available in a dedicated GitHub repository.
  • The attendees are invited (but not required) to bring their own datasets.


Tutorial IP4: Orchestrating Large-Scale Single-Cell Analysis with Bioconductor

Room: Salle Rhône 3B (Level 1)    Sunday, July 23    9:00 – 18:00 CEST

Speakers:
Ilaria Billato, University of Padova, Italy
Ludwig Geistlinger, Harvard Medical School, United States
Stefania Pirrotta, University of Padova, Italy
Marcel Ramos, CUNY Graduate School of Public Health and Health Policy; and Roswell Park Comprehensive Cancer Center, United States
Dario Righelli, University of Padova, Italy
Davide Risso, University of Padova, Italy

Max Participants: 30
Introductory level

In the last few years, the profiling of a large number of genome-wide features in individual cells has become routine. Consequently, a plethora of tools for the analysis of single-cell data has been developed, making it hard to understand the critical steps in the analysis workflow and the best methods for each objective of one’s study.

This tutorial aims to provide a solid foundation in using Bioconductor tools for single-cell RNA-seq analysis by walking through various steps of typical workflows using example datasets.

This tutorial uses as a “text-book” the online book “Orchestrating Single-Cell Analysis with Bioconductor” (OSCA), started in 2018 and continuously updated by many contributors from the Bioconductor community (see https://bioconductor.org/books/release/OSCA/). Like the book, this tutorial strives to be of interest both to experimental biologists wanting to analyze their data and to bioinformaticians approaching single-cell data.

Learning objectives:

Attendees will learn how to analyze multi-condition single-cell RNA-seq data, from raw data to statistical analyses and result interpretation. Students will learn where the critical steps and method choices lie, and will be able to leverage large-data resources to analyze datasets comprising millions of cells.

In particular, participants will learn:

  • How to access publicly available data, such as those from the Human Cell Atlas.
  • How to perform data exploration, normalization, and dimensionality reduction.
  • How to identify cell types/states and marker genes.
  • How to correct for batch effects and integrate multiple samples.
  • How to perform differential expression and differential abundance analysis between conditions.
  • How to work with large out-of-memory datasets.


Tutorial IP5: How to make reproducible, portable and reusable bioinformatics software using software containerization

Room: Salle Saint Claire 3 (Level 2)    Sunday, July 23    9:00 – 13:00 CEST

Speakers:
Giacomo Baruzzo, University of Padova, Italy
Barbara Di Camillo, University of Padova, Italy
Giulia Cesaro (PhD student), University of Padova, Italy
Mikele Milia (PhD student), University of Padova, Italy

Max Participants: 50
Intermediate level

The lack of reproducibility of scientific results has a negative impact on several research fields, including bioinformatics. Specifically, bioinformatics analyses are the result of sets/pipelines of software packages, each with multiple running options, making it difficult to fully reproduce a specific analysis workflow. Moreover, bioinformatics software evolves rapidly and relies extensively on external libraries/packages, two factors that limit reproducibility across different software versions and operating systems. The extensive use of external libraries/packages also limits the portability and reusability of the software, requiring many dependencies to be properly installed and configured and the application to be built/compiled on each target system.

Containers are a solution to most of the above issues. Containers use lightweight virtualization to encapsulate an entire execution environment that can be easily shared, deployed and executed in an efficient and fully reproducible way on a variety of computing systems. Compared to other virtualization strategies, software containerization has very low computational overhead, supports modern parallel computing (multithreading, MPI, GPU computing, etc.) and is widely used even in High-Performance Computing scenarios.

This tutorial will introduce attendees to software containerization and its impact on science and research in terms of reproducibility, portability and reusability of software and related results, with practical application to the bioinformatics field. First, the tutorial will provide an overview of the different virtualization strategies (e.g., containers vs. virtual machines) from both the user's and the developer's point of view, and the differences between the available container engines. Second, it will explain how to develop, build, run and share containers using two of the most widely used container engines, Docker and Singularity. Since many fields of bioinformatics involve computationally intensive tasks and the analysis of large datasets, working with containers on both desktop/workstation and High-Performance Computing (HPC) infrastructures will be presented, including practical guides for parallel frameworks and GPU applications. The tutorial will provide a good balance between the theoretical aspects of software containerization (e.g., how containers work, pros and cons, different container engines) and hands-on exercises to practise basic concepts and experience the design of containers for routinely used bioinformatics software.
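
As a rough flavour of the build-and-run cycle (the hands-on exercises themselves use the Docker and Singularity command-line tools; the Python docker SDK calls below are just one scripted equivalent, with image and command names as placeholders):

    import docker  # the docker-py SDK, a Python client for the Docker engine

    client = docker.from_env()

    # build an image from a Dockerfile in the current directory and tag it
    image, build_log = client.images.build(path=".", tag="my-bioinfo-tool:1.0")

    # run the containerized tool and capture its output; the command is a placeholder
    output = client.containers.run("my-bioinfo-tool:1.0", "samtools --version", remove=True)
    print(output.decode())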

All the course material (slides, container definition files and container images) will be made freely available on a dedicated web page/git repository. Virtual machines containing a working environment to reproduce the tutorial examples and perform the hands-on exercises will be made freely available to course attendees.

The learning objectives of this tutorial are:

  • understand the underlying concepts of software containerization and its impact on science and research in terms of reproducibility, portability, and reusability of software and results, with practical applications to the field of bioinformatics.
  • understand the differences between current virtualization strategies (e.g., containers vs. virtual machines) from both the user's and the developer's point of view, as well as the purposes for which they are designed.
  • highlight the differences between Docker and Singularity, two of the most widely used container engines, and understand their respective strengths and distinctive features.
  • be able to develop, build, run and share simple containers based on the Singularity and/or Docker container engines on desktop, workstation, and High-Performance Computing (HPC) infrastructures.

Intended audience:

Master's or PhD students and researchers in the fields of bioinformatics, computational biology and medical informatics who are users and/or developers of bioinformatics software and pipelines, and who are interested in enhancing the reproducibility, reusability and portability of bioinformatics software and analyses. Basic knowledge of Linux-based operating systems and the Linux terminal is strongly suggested. The target level is beginner users or users with little experience of software containerization.


Tutorial IP6: Interactive microbiome analysis using DIAMOND+MEGAN

Room: Salle Roseraie 1/2 (Level 3)    Sunday, July 23    9:00 – 13:00 CEST

Speakers:
Daniel Huson, University of Tübingen, Germany
Anupam Gautam, Max Planck Institute for Biology Tübingen; University of Tübingen, Germany
Wenhuan Zeng, University of Tübingen, Germany

Max Participants: 30
Introductory level

Computational analysis of microbiome samples is usually a two-step process. First, metagenomic sequencing datasets are subjected to computational analysis on a server. This usually involves alignment of the metagenomic data against a protein reference database such as NCBI-nr, which is computationally intensive and requires specialized software such as DIAMOND. The resulting alignments are then processed to perform taxonomic and functional binning, using additional command-line tools such as MEGANIZER. Once server-based analysis has been completed, the second step is for investigators to explore and compare the taxonomic and functional content of the samples on a personal computer, using an interactive tool such as MEGAN.
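
For orientation, the server-side steps typically look like the commands below (wrapped here in Python purely for illustration; exact option names and database/mapping-file paths are assumptions to be checked against the DIAMOND and MEGAN documentation):

    import subprocess

    # step 1: align reads against a DIAMOND-formatted NCBI-nr database, writing a DAA file
    subprocess.run(["diamond", "blastx",
                    "--db", "nr.dmnd",
                    "--query", "reads.fastq.gz",
                    "--daa", "reads.daa"], check=True)

    # step 2: "meganize" the DAA file, adding taxonomic and functional classifications
    subprocess.run(["daa-meganizer",
                    "--in", "reads.daa",
                    "--mapDB", "megan-map.db"], check=True)

    # the meganized reads.daa file can then be opened interactively in MEGAN on a personal computer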

In the first half of the tutorial, participants will learn how to set up and run the main computational steps of metagenomic analysis on a server. In the second half, the focus will be on how to interactively explore and compare the taxonomic and functional content of metagenomic datasets.

Learning Objectives:

The overall aim of this tutorial is to enable participants to set up and perform metagenomic analysis using the DIAMOND+MEGAN pipeline. At the end of the tutorial, participants will be familiar with the main steps of the pipeline and will be able to:

  • set up and run DIAMOND alignment of short-read datasets against a protein reference database such as NCBI-nr, on a server,
  • adjust the parameters of DIAMOND to align assembled contigs and long reads,
  • use alternative databases such as AnnoTree or UniRef,
  • perform basic taxonomic and functional computational analysis using MEGAN tools on a server,
  • open and interactively explore the taxonomic and functional content of metagenomic datasets using MEGAN, on a personal computer,
  • interactively compare multiple datasets,
  • import and work with sample metadata,
  • export assignments, reads and alignments, and
  • run gene-centric assembly for genes of interest.


Tutorial IP7: nf-core: a best-practice framework for creating Nextflow pipelines and sharing them with the community

Room: Salle Roseraie 1/2 (Level 3)    Sunday, July 23    14:00 – 18:00 CEST

Speakers:
Maxime Garcia, Seqera Labs, Spain
Friederike Hanssen, Quantitative Biology Center, University of Tuebingen; and, nf-core, Germany

Max Participants: 30
Intermediate level

Long-term use and uptake by the life-science community requires that workflows (as well as tools) are findable, accessible, interoperable and reusable (FAIR) to achieve reproducibility in the analyses. To address these challenges, scientific workflow managers such as Nextflow have become an indispensable part of a computational biologist’s toolbox. The nf-core project is a community effort to facilitate the creation and sharing of high-quality workflows and workflow modules written in Nextflow. Workflows hosted on the nf-core framework must adhere to a set of strict best-practice guidelines that ensure reproducibility, portability and scalability. Currently, there are more than 41 released pipelines and 26 pipelines under development as part of the nf-core community, with more than 400 code contributors.

In this tutorial, we will briefly introduce the nf-core project and how to become part of the nf-core community. We will showcase how the nf-core tools help create Nextflow pipelines starting from a template. We will highlight the best-practice components in Nextflow pipelines, such as CI testing, modularization, code linting, and containerization, to ensure a reproducible and portable workflow. We will introduce the helper functions for interacting with over 700 ready-to-use modules and subworkflows. In the final practical session, we will build a pipeline using the introduced components.
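
A minimal sketch of that workflow with the nf-core/tools command-line interface (command names as of 2023; flags and the generated directory name are assumptions worth checking against nf-core --help):

    import subprocess

    # scaffold a new pipeline from the nf-core template
    subprocess.run(["nf-core", "create",
                    "--name", "demo",
                    "--description", "toy pipeline",
                    "--author", "Jane Doe"], check=True)

    # run the nf-core best-practice linting checks on the generated pipeline
    subprocess.run(["nf-core", "lint"], cwd="nf-core-demo", check=True)

    # browse the shared modules available from nf-core/modules
    subprocess.run(["nf-core", "modules", "list", "remote"], check=True)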

This workshop will be taught by core administrators of the nf-core project, with more than 3 years of experience in writing Nextflow workflows and contributing to the nf-core community.

Target Audience:

  • People wanting to develop bioinformatics best-practice Nextflow workflows. Basic experience using Nextflow is helpful, but not required. To get started with Nextflow, see here.
  • People wanting to contribute to the nf-core community with shared modules or pipelines.

Prerequisites for this tutorial:

  • Have a GitHub account and join the nf-core GitHub organisation beforehand
  • Check that your Gitpod account linked to GitHub is active
  • Join the nf-core slack
  • Basic command line usage

Learning Objectives:

  • What is the nf-core community for shared Nextflow workflows.
  • What are the best-practices components in a Nextflow pipeline: CI testing, modules, code linting, containerization, and portability.
  • How to create a Nextflow pipeline with nf-core/tools.
  • What are nf-core shared modules and how to use them within a pipeline.   


Tutorial IP8: Explainable AI and Omics Data: Interactive Data Visualization and Machine Learning in Python

CANCELLED


Tutorial VT1: Make your research FAIRer with Quarto, GitHub and Zenodo

Part 1: Monday, July 17 (14:00 – 18:00 CEST)
Part 2: Tuesday, July 18 (14:00 – 18:00 CEST)

Speakers:
Geert van Geest, Swiss Institute of Bioinformatics; and University of Bern, Switzerland
Wandrille Duchemin, Swiss Institute of Bioinformatics; and sciCORE,  University of Basel, Switzerland
Lea Taylor, University of Bern, Switzerland

Max Participants: 30
Introductory level

The FAIR (Findable, Accessible, Interoperable and Reusable) principles provide guidelines for making research data and other resources more easily discoverable and reusable, which can help increase your research's impact and exposure. Adhering to these principles also ensures that your research is more reliable and reproducible, as others can more easily access your work and provide feedback. In addition, making your research FAIR can promote the principles of open science and make it easier for others to contribute to and build upon your work. Finally, many funding agencies and journals now expect that your research outputs be made FAIR as a condition of funding or publication, so adhering to these principles can help to ensure that your research meets these requirements.

Sharing and reusing are at the heart of the FAIR principles and should be a routine task of any (life) scientist today. To enable others to use or reproduce our findings and work, we need to provide all the relevant information: the data, software and parameters used, scripts for the analysis, databases and their versions, and any required documentation and contextualisation. To keep all this information and code together in a single file, many researchers use Markdown. Such a Markdown file is simple to create and can then be rendered into a single web page that is easy to share and read.

The FAIR principles are a continuum of steps, and FAIRer research is better than research that is not FAIR at all. In this tutorial, the participants will be introduced to some of these FAIR steps. They will learn how to write documentation containing Markdown and code (e.g. R or Python) and render it into nicely formatted pages with Quarto. Afterwards, the participants will learn the basics of git and GitHub and how to host the version-controlled source files in a GitHub repository. Third, the participants will learn how to automatically render the source files into a nicely formatted web page hosted with GitHub. Lastly, the participants will learn how to store the source files for the longer term by linking a GitHub repository to Zenodo and giving it a unique identifier (DOI). By using the taught tools and concepts, the participants will take significant steps towards adhering to the FAIR principles and thereby boost the benefits of sharing.
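
A minimal example of a Quarto source file (.qmd) of the kind described above: a YAML header, Markdown text and an executable Python chunk. File names and content are placeholders, not tutorial materials:

    ---
    title: "Example analysis"
    author: "Jane Doe"
    format: html
    jupyter: python3
    ---

    ## Results

    ```{python}
    import pandas as pd

    counts = pd.read_csv("counts.csv")  # placeholder data file
    counts.describe()
    ```

Rendering the file (quarto render example.qmd) produces a self-contained HTML page, and committing the .qmd source to a GitHub repository keeps it version controlled.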

Learning objectives:

After this tutorial, the participants will be able to:

  • Create notebooks and websites based on Markdown, and Python or R with Quarto
  • Use Git and GitHub to version control the generated content
  • Host a website by making use of GitHub actions and GitHub pages
  • Link the GitHub repository to Zenodo and give it a unique identifier (DOI)

Intended audience:

This tutorial is aimed at computational biologists, bioinformaticians, researchers, scientists and trainers working in the life sciences who want to learn how to make their research and training FAIRer with reproducible notebooks and websites. Participants are expected to have an introductory level in programming with R or Python. Participants should have a GitHub account (https://github.com/join) and bring their laptops with either the latest versions of RStudio or VSCode pre-installed.


Tutorial VT2: Protein Sequence Analysis using Transformer-based Large Language Model

Part 1: Monday, July 17 (14:00 – 18:00 CEST)
Part 2: Tuesday, July 18 (14:00 – 18:00 CEST)

Speakers:
Bishnu Sarker, Meharry Medical College, United States
Sayane Shome, Stanford University, United States
Farzana Rahman, Kingston University London, United Kingdom
Nima Aghaeepour, Stanford University, United States

Max Participants: 30
Introductory or Intermediate level

In the current decade, AI/ML has tremendously facilitated scientific discoveries in biomedicine. Moreover, recent advances in the development of large language models (a type of deep learning model that can read, summarize, translate, and generate text as humans do) have inspired many researchers to find applications in biological sequence analysis, partly because of the similarities in the data. Attention-based deep transformer models [1,2], pre-trained in a self-supervised fashion on large corpora, have dramatically transformed research in natural language processing. The attention mechanism in transformer models captures the long-distance relationships among words in textual data [2]. Following a similar principle in the biological domain, researchers have trained transformer-based protein language models for biological sequence analysis. For example, ProtTrans [3] was trained on UniProtKB [4] sequences for protein sequence analysis. The authors showed that transformer-based self-supervised protein language models effectively capture the spatial relationships among residues, which are critical for understanding the functional and structural aspects of proteins.

In this tutorial, we aim to provide experiential training on how to build basic ML pipelines using deep learning and pre-trained transformer protein language models for biological sequence analysis. We will start with a quick introduction to the Python packages (Keras, TensorFlow/PyTorch, SciPy, scikit-bio, bio-transformers) that are heavily used in machine learning projects. In addition, we will cover the biological concepts behind protein sequence and function. Then, we will introduce classical natural language processing and review its recent advancements. Finally, self-supervised deep learning-based large language models (such as transformers) will be reviewed, with a particular focus on protein sequence analysis.
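
As a taste of the hands-on material, the sketch below loads a publicly available ProtTrans checkpoint through the Hugging Face transformers library and extracts per-residue embeddings for a toy sequence (the checkpoint name is one example; the tutorial may use other models or the bio-transformers wrapper):

    import re
    import torch
    from transformers import BertModel, BertTokenizer

    # ProtBert (a ProtTrans model) expects residues separated by spaces, with rare amino acids as X
    tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
    model = BertModel.from_pretrained("Rostlab/prot_bert")
    model.eval()

    sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy protein sequence
    spaced = " ".join(re.sub(r"[UZOB]", "X", sequence))

    inputs = tokenizer(spaced, return_tensors="pt")
    with torch.no_grad():
        embeddings = model(**inputs).last_hidden_state  # one vector per residue (plus special tokens)

    print(embeddings.shape)  # (1, sequence length + 2, 1024)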

Learning Objectives:
At the end of the tutorial, participants will have an understanding of, and practical experience with:

  1. Fundamentals of transformer-based large language models.
  2. How to collect, preprocess and vectorize sequence data.
  3. How to build basic machine learning models for sequence analysis.
  4. How to implement deep learning models such as Convolution and Recurrent Neural Networks (CNN and RNN) in the context of biological sequence modelling.
  5. How to apply a pre-trained transformer language model for biological sequence analysis
  6. How to fine-tune a transformer-based large language model on new data.
  7. How to formulate and address biomedical problems using transformer-based large language models.
  8. What tools, frameworks, datasets, and programming libraries are available to work with transformer-based large language models for sequence analysis.

Intended Audience:

Graduate students, researchers, scientists, and practitioners in both academia and industry who are interested in applications of deep learning, natural language processing, and transformer-based language models in biomedicine and biomedical knowledge discovery. The tutorial is aimed towards entry-level participants with basic knowledge of computer programming (preferably Python) and machine learning (Beginner or Intermediate).


Tutorial VT3: Functional metagenomics made easy

Part 1: Monday, July 17 (14:00 – 18:00 CEST)
Part 2: Tuesday, July 18 (14:00 – 18:00 CEST)

Speakers:
Silas Kieser, University of Geneva, Switzerland
Matija Trickovic, University of Geneva, Switzerland

Max Participants: 30
Introductory level

Metagenomics is transforming how we study microbiomes by enabling the analysis of entire microbial communities from diverse environments, without the need for culturing. Recent improvements in computational algorithms enable the assembly of genomes directly from metagenomic data. In this way, assembly-based metagenomics has allowed the recovery of an unprecedented number of uncultured microbes from different environments such as the gut and the ocean. However, the availability of genomes is only the start of the analysis.

In this tutorial, we will familiarize the participants with the steps required in assembly-based metagenomics (assembly, binning, genome-completeness estimation, taxonomic and functional annotation, and pathway inference). In the hands-on session, we will use metagenome-atlas, a commonly used metagenomics pipeline that allows users to get started with their analysis in three commands. Based on a case study, we will show how the functional annotation of genomes can be leveraged to make sense of the data.

Learning Objectives:

The participants will

  1. Learn to assemble genomes from metagenomic reads and estimate their quality
  2. Understand the steps used for gene annotation and pathway inference
  3. Use the functional and taxonomic annotation of a metagenome dataset to answer scientific questions.

Intended Audience:

The workshop is intended for beginners. Participants should know what a FASTQ file is and how to run commands in bash. They should also know how to read tables in their programming language of choice, either Python or R. For the hands-on session, participants should bring their laptops, with the ability to run bash (Linux, macOS, Windows Subsystem for Linux, a Docker container, or a remote connection to a server).


Tutorial VT4: Deep learning modeling, training, prediction, and marker analysis of Multiomics data using G2PDeep-v2 web server

Speakers:
Trupti Joshi, University of Missouri-Columbia, United States
Shuai Zeng (Ph.D. student), University of Missouri-Columbia, United States
Ajay Kumar (Ph.D. student), University of Missouri-Columbia, United States

Date: Monday, July 17 (14:00 – 18:00 CEST)

Max Participants: 50
Introductory level

With the advances in next-generation sequencing (NGS) technologies, large amounts of multiomics data for many organisms have been generated and are publicly available. Recently, many deep learning applications have been developed and widely used in bioinformatics studies. Despite the availability of these applications and databases, there is still no web-based service that provides end-to-end phenotype prediction, marker discovery, and Gene Set Enrichment Analysis (GSEA) starting from input multiomics datasets. The offline applications have steep learning curves and require complicated installations.

The G2PDeep-v2 server (https://g2pdeep.org) is a comprehensive web-based platform providing phenotype prediction and marker discovery, powered by deep learning. The server provides multiple services for researchers, including the creation of deep-learning models through an interactive interface and the training of these models with an automated hyperparameter tuning algorithm on high-performance computing resources. It visualizes the phenotypes and markers predicted by well-trained models. It also provides GSEA for the significant markers, giving insights into the mechanisms underlying complex diseases and other biological phenomena.

In our tutorial, we will cover key advancements in deep learning over the past few years, with an emphasis on new opportunities in bioinformatics enabled by these advancements. We will start with a technical talk about the deep learning algorithm behind G2PDeep, from model training to model interpretation (marker discovery). We will then demonstrate the impact of G2PDeep on discovering underlying mechanisms in complex diseases and other biological phenomena.

Learning Objectives

  • To understand the basic principles of deep learning and model interpretation (marker discovery).
  • To understand the specifics of G2PDeep and become aware of the appropriate tools to use in different applications.
  • To gain hands-on experience in applying tools and interpreting results using G2PDeep web servers.

Intended audience:

Graduate students, researchers, scientists, and practitioners in both academia and industry who are interested in applications of deep learning in bioinformatics (Broad Interest). The tutorial is aimed towards entry-level participants with knowledge of the fundamentals of biology and machine learning (beginner). No prior experience with Python programming language is assumed, but familiarity with working on Unix-based systems is strongly recommended for the participants.

The tutorial slides and materials for hands-on exercises (e.g., links to demo, code implementation, datasets) will be posted online prior to the tutorial and made available to all participants.


Tutorial VT5: Biomedical knowledge exploration using graph databases: Neo4j, Cypher, Biolink, and emerging standards

Date: Monday, July 17 (14:00 – 18:00 CEST)

Speakers:
Stephen Ramsey, Oregon State University, United States
David Koslicki, Penn State University, United States
Christopher Plaisier, Arizona State University, United States

Max Participants: 30
Introductory level

This tutorial aims to provide participants with a self-contained and practical introduction to the use of knowledge graphs in biomedical and translational research, and with a high-level understanding of the conceptual foundations of graphical knowledge representation. The tutorial will feature hands-on computer exercises using the free, open-source, and widely used graph database Neo4j and its underlying graph query language, Cypher. Participants will be provided with example knowledge data files suitable for downloading and using in the tutorial, which are semantically described in accordance with Biolink, an emerging standard semantic layer for biomedical knowledge graphs. The tutorial will not require participants to have previous experience with Neo4j, Cypher, or Biolink. Participants need only bring a computer capable of installing and running Neo4j Community Edition (which is free and multiplatform); participants will be guided through installing Neo4j during the tutorial. Free access to a Neo4j database server will be provided as an alternative to installing Neo4j on the participants' own computers; in that case, a participant only needs a web browser.
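
For a sense of what such queries look like, here is a small sketch using the official neo4j Python driver with a Cypher query embedded as a string; the connection details, node labels and the Biolink-style predicate are placeholders, not the tutorial's actual dataset:

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    # find drugs connected to a disease of interest via a Biolink-style "treats" edge
    query = """
    MATCH (d:Drug)-[:`biolink:treats`]->(c:Disease {name: $disease})
    RETURN d.name AS drug
    """

    with driver.session() as session:
        for record in session.run(query, disease="asthma"):
            print(record["drug"])

    driver.close()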

Learning objectives:

After completing this tutorial, participants will be able to:

  1. Describe how knowledge is represented as a labeled property graph in Neo4j
  2. Query a biomedical knowledge graph to find concepts and relationships of interest.
  3. Analyze a biomedical knowledge graph to find paths connecting concepts of interest.
  4. Display a knowledge graph of concepts and relationships relevant to a specific application.
  5. Create an indexed knowledge graph from standardized input data.
  6. Describe how the Biolink model is used as a semantic layer in knowledge graphs.

Intended Audience:

This tutorial is aimed at participants who are beginner-level with Neo4j and Cypher, but it will also have material that would be of interest to those who have used Neo4j/Cypher previously.


Tutorial VT6: Introduction to AlphaFold 2 and Practical PyMOL Visualization

Date: Tuesday, July 18  (14:00 – 18:00 CEST)

Speakers:
Adelaide Rhodes, United States
Geraldine Van der Auwera, United States

Max Participants: 30
Introductory level

At present, there are a few different ways to determine a protein structure experimentally: X-ray crystallography, Nuclear Magnetic Resonance (NMR) spectroscopy and 3D electron microscopy. There are 200,000,000 known distinct proteins, each with a unique structure that determines its function, yet only a small fraction of exact 3D structures are known, owing to the cost and time involved in determining structures with laboratory methods. A reliable method that can predict a protein's structure from its amino acid sequence would greatly speed up protein research despite these constraints. AlphaFold, a predictive system based on Artificial Intelligence developed by DeepMind, a subsidiary of Alphabet (now on version 2), holds the promise of delivering a protein structure prediction in minutes at modest computational cost. The availability of AlphaFold 2 has generated a great deal of interest and excitement, so this tutorial is geared towards demonstrating how to set up, run and analyze results from AlphaFold 2 using a simple test dataset. The results of the analysis will be visualized using PyMOL, a molecular visualization system originally created by Warren Lyford DeLano and now maintained and commercialized by Schrödinger, Inc., with an open-source build also available. The workshop will be a combination of informative lectures and practical hands-on exercises.
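
As a taste of the visualization step, the sketch below drives PyMOL from Python (for example inside a Jupyter notebook with the open-source pymol package installed); the model file name is a placeholder, and colouring by the B-factor column displays per-residue pLDDT for AlphaFold output:

    from pymol import cmd

    cmd.load("predicted_model.pdb", "af_model")        # AlphaFold 2 output structure (placeholder name)
    cmd.show_as("cartoon", "af_model")
    cmd.spectrum("b", "red_yellow_green", "af_model")  # colour by pLDDT stored in the B-factor column
    cmd.png("af_model.png", width=1200, dpi=150, ray=1)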

Learning Objectives

  • Understand the background of Protein Structure Prediction
  • Gain insight into how an AI environment works
  • Understand how AlphaFold 2 takes a protein sequence and creates a predicted protein structure
  • Practice analyzing AlphaFold 2 output and determine how well it compares to available structures
  • Practice visualizing protein structures in PyMOL
  • Demo and practice on how to run AlphaFold 2 and PyMOL visualization from a Jupyter Notebook

Intended Audience:

Beginner to intermediate bioinformatics scientists or clinical scientists who are curious about how AlphaFold 2 works. No previous experience with AlphaFold 2 is required, but some basic command-line experience is helpful.
