Tutorials

ISMB 2022 Tutorial Program

 

ISMB 2022 will hold a series of in-person and virtual tutorials before the main conference begins.

In-person Tutorials (All times CDT)

Virtual Tutorials: (All times CDT)

 

Tutorial IP1: Gene regulatory network inference from single-cell transcriptomics data

Room: TBD
Sunday, July 10, 11:00 am – 6:00 pm CDT

Organizer(s):
Kedar N Natarajan, University of Southern Denmark

Single-cell transcriptomics has become the state of the art for profiling cell types and states in heterogeneous, complex tissues, and can facilitate investigation into the underlying regulatory mechanisms. The recent wealth of large single-cell RNA-sequencing (scRNA-seq) studies, including atlas-scale datasets, has enabled an improved appreciation of cell type complexity and the underlying gene regulatory networks. A growing suite of computational methods, termed gene regulatory network (GRN) inference approaches, take scRNA-seq data (alongside annotations) as input and infer the activity of master regulator transcription factors (TFs) and their downstream targets within single cells. These GRN (TF-target) inference approaches integrate the upstream epigenetic and signalling cascades to provide an improved understanding of both cell type-specific and consensus (multi-cell-type) TF regulation.

We will cover two major GRN inference methods using simulated and real scRNA-seq data through hybrid lectures and hands-on training sessions. We will cover the principles underlying these methods, their assumptions and trade-offs, and the benchmarking of the methods, alongside interpretation of results, and we will discuss strategies for further in silico validation. The audience will gain the practical knowledge and essential skills to conduct GRN inference independently on their own datasets and to interpret the results.

Learning Objectives:
On completion of the tutorial, participants will have gained an understanding of the basic concepts and recent advances in GRN inference methods for single-cell transcriptomic datasets. Four learning objectives are proposed:

  1. Understand the basic principles underlying GRN inference from scRNA-seq data;
  2. Understand the specific methodologies, assumptions, and trade-offs between two commonly used computational inference methods (TENET, PySCENIC);
  3. Gain hands-on experience in applying tools and interpreting results using two GRN inference methods on simulated and public scRNA-seq datasets;
  4. Discuss current bottlenecks, gaps in the field, and opportunities for future work.

Intended audience and level:
Beginner or intermediate level. This tutorial is aimed at bioinformaticians or computational biologists with some experience in scRNA-seq data analysis (or NGS methods) and basic statistics knowledge. We expect participants to be beginners with at least minimal experience analysing next-generation sequencing datasets. The workshop will be conducted in Python/JupyterLab, and prior experience with Python syntax is needed. Participants will be provided with pre-processed count matrices (both simulated and real datasets). All the analysis (JupyterLab notebooks) and steps undertaken in the tutorial will be provided via GitHub.

Maximum Participants: 30

Tutorial IP2: A practical introduction to the design, quantification, and analysis of CRISPR genome editing data

Room: TBD
Sunday, July 10, 11:00 am – 6:00 pm CDT

Organizer(s):
Luca Pinello, Harvard Medical School/Broad Institute, United States
Basheer Becerra, Bioinformatics & Integrative Genomics Ph.D. Candidate, Harvard Medical School
Maya Talukdar, Health Sciences & Technology M.D./Ph.D. Student, Harvard Medical School/Massachusetts Institute of Technology, United States
Jayoung Ryu, Bioinformatics & Integrative Genomics Ph.D. Candidate, Harvard Medical School, United States
Lucas Ferreira, Postdoctoral Fellow, MGH/Boston Children’s Hospital/Harvard Medical School, United States

The easy programmability of CRISPR-associated nucleases and other recent editors, such as base and prime editors, has revolutionized our ability to interrogate genome function and pinpoint causal variants. In particular, CRISPR genome engineering tools can be deployed to uncover functionally important genes, pathways, and non-coding elements, and to examine the effects of regulatory variants on gene expression and other phenotypes.

In this tutorial, we will cover the computational workflow involved in performing CRISPR screens through both lectures on theoretical concepts and hands-on workshops with computational tools. We will begin by introducing CRISPR genome editing and common CRISPR screening strategies. Next, we will discuss computational methods to design CRISPR perturbations by generating guide RNA protospacer sequences. We will then cover experimental and computational methods to quantify genome editing activity. Lastly, we will cover computational methods to analyze and interpret the results of CRISPR screens. Ultimately, this tutorial will provide a comprehensive overview of the design and analysis of CRISPR screens without requiring any prior background in genome editing.

Learning Objectives:

  • Understanding of common CRISPR editing technologies and how they are used in forward-genetics screens to interrogate genes, variants, and other regions potentially associated with a phenotype-of-interest.
  • Understanding the experimental workflow of arrayed and pooled CRISPR screens with a focus on cell viability and cell sorting.
  • Strategies and practical considerations when defining screen targets and generating a CRISPR guide library design (such as off-target prediction, PAM constraints, predicted cleavage site, etc.).
  • Ability to use computational tools for generating CRISPR guide sequences for library design such as CRISPOR and CRISPRme.
  • Ability to use computational tools for quantifying genome editing activity such as Synthego ICE and CRISPResso.
  • Understand strategies in screen design for maximizing statistical power to detect a signal-of-interest.
  • Ability to use computational methods for the analysis of common CRISPR-based bulk or single-cell screens with MAGeCK and SCEPTRE.

Intended audience and level:
Our target audience is researchers with an interest in analyzing CRISPR screens who have little to no prior knowledge of CRISPR/Cas9, genome editing screens, or the corresponding computational tools. We do require some knowledge of command line tools and programming languages for the workshops, specifically the basics of Python and R. We recommend knowledge of biology and statistics at an undergraduate level to fully understand the concepts discussed in the lectures.

Maximum Participants: 40

Tutorial IP3: Guidelines for the assessment and analysis of lrRNA-seq data for transcript identification and quantification (LRGASP challenge)

Room: TBD
Sunday, July 10, 11:00 am – 6:00 pm CDT

Organizer(s):
Ana V Conesa, Spanish National Research Council, Spain
Fairlie Reese, University of California at Irvine, United States
Dennis Mulligan, University of California at Santa Cruz, United States
Hagen Tilgner, Weill Cornell Medicine, United States
Andre Sim, Genome Institute of Singapore
Toby Hunt, European Bioinformatics Institute, GENCODE, United Kingdom
Ralf Herwig, Max Planck Institute for Molecular Genetics, Germany
Roger Volden, Pacific Biosciences, Menlo Park, California, United States

Long-read, single-molecule sequencing platforms such as Nanopore and PacBio are increasingly being used for transcriptomics analysis, producing long-read RNA-seq (lrRNA-seq) datasets. Over the last few years, these two sequencing platforms have improved in throughput and accuracy, and novel algorithms have been developed to analyze the data. Still, there are not yet clear guidelines on which method is best for accurate lrRNA-seq analysis or on how different methods compare to each other. LRGASP is a community-wide initiative to benchmark long-read sequencing platforms, library preparation methods, and analysis pipelines using lrRNA-seq (https://www.gencodegenes.org/pages/LRGASP/). More than 50 different datasets were created and analyzed by a dozen lrRNA-seq analysis tools. Evaluation metrics were created to assess the accuracy of predicted transcript models and of gene and transcript expression quantification. In this tutorial we will present the LRGASP analysis framework, discuss relevant results and lessons learned from the LRGASP project, and train participants in the use of a variety of pipelines for the analysis of both Nanopore and PacBio lrRNA-seq data, as well as the LRGASP evaluation tools to assess the quality of the data. The aim of the tutorial is to provide intensive training in tools for lrRNA-seq analysis and to discuss best practices for the analysis of these data. We will also discuss how LRGASP evaluation tools can be used to benchmark new or updated lrRNA-seq analysis tools developed by the community beyond the LRGASP contest.

The tutorial will introduce the LRGASP contest, datasets, and benchmarking tools, including the OpenEBench implementation of LRGASP. Hands-on sessions will be provided by developers of different lrRNA-seq analysis tools on how to use their methods for transcript identification, quality control, quantification, visualization, and differential expression analysis. These include FLAIR (Brooks Lab), TALON (Mortazavi Lab), SWAN (Mortazavi Lab), IsoTools (Herwig Lab), IsoQuant (Tilgner Lab), Bambu (Goke Lab), SQANTI (Conesa Lab), and tappAS (Conesa Lab).

This tutorial will cover a wide variety of tools for both Nanopore and PacBio RNA-seq data, as well as an extensive benchmarking framework for this type of application. It will help disseminate good analysis practice among members of the transcriptomics community who are starting to use long-read sequencing in their analyses.

Intended audience and level:
Beginner or intermediate. This tutorial will be of broad interest to researchers from academia or industry who have started to analyze Nanopore and PacBio long-read transcriptomics datasets and need guidance on alternative analysis methods. The tutorial is also useful for developers of lrRNA-seq analysis tools, as it presents an extensive benchmarking platform and offers instruction in competing tools from their own developers.
Attendees are expected to have basic Unix command line skills and some familiarity with R/RStudio. Programming knowledge is not required, though most of the tools are written in Python.

Software/hardware requirements:
Attendees are expected to supply their own laptops and to have installed R/RStudio and the tools included in the tutorial. We will be using a shared instance in AWS for those programs that require heavy computation. Instructions on how to install all software tools will be provided to participants ahead of the tutorial, and a Zoom session will be organized before the meeting to troubleshoot any installation problems.

Maximum Participants: 40

Tutorial IP4: GA4GH: An introduction to federated genomics using the GA4GH Starter Kit and real-world data platforms

Room: TBD
Sunday, July 10, 11:00 am – 6:00 pm CDT

Organizer(s):
Yasasvini (Yash) Puligundla, Broad Institute, United States
Ian Fore, National Cancer Institute, United States

The Global Alliance for Genomics and Health (GA4GH) is an international technical standards-setting organization, enabling genomic data sharing. Through its standards, GA4GH aims to promote a federated model of data sharing, in which researchers can seamlessly access data from multiple sources in an international network using common tools and protocols, and data providers can securely share data with trusted researchers while still maintaining ownership and control over their data. In this tutorial, we explore four GA4GH standardized API interfaces that enable federated data access and analysis. The first API standard, Data Repository Service (DRS), provides minimal metadata and access information about files that can be used as input to analytical workflows. The second standard, Workflow Execution Service (WES), enables researchers to remotely run workflows defined in Common Workflow Language (CWL), Workflow Description Language (WDL) or Nextflow on an input dataset of interest. The third standard, Data Connect, allows researchers to search and filter biomedical datasets based on criteria of interest. Lastly, GA4GH Passports grants researchers fine-grained permission sets for data and compute resources they are allowed to access, and is used to control researcher access to resources behind the other three APIs.

In the first section of the tutorial, we use GA4GH-aware client tools to reach out to real-world data platforms that have adopted the four API standards, including systems maintained by NIH, Seven Bridges, ELIXIR, and DNAstack. On these platforms, we search and obtain access to controlled datasets, ultimately using them as inputs to an analysis workflow of interest.

In the second section of the tutorial, we deploy a network of server implementations of the four API standards using a reference implementation suite known as the GA4GH Starter Kit. Using the same set of client tools, we run the same search, access, and analysis protocol on our local GA4GH network.

Overall, tutorial participants will gain hands-on experience with the core GA4GH standards that enable federated analysis, from the perspective of both the researcher making use of these web services and the data provider setting up secure, GA4GH-compliant services.

Learning Objectives for Tutorial:

  • Gain an understanding of the key GA4GH standards in controlled data access, data discovery, and workflow execution, and how they enable federated genomic analysis
  • Learn about and establish accounts with some of the major data platforms implementing GA4GH standards
  • Run simulated federated analyses on these platforms via standardized patterns and protocols outlined in GA4GH specifications.
  • Set up a local network of GA4GH web services using out-of-the-box implementations, the GA4GH Starter Kit.

Intended audience and level:
Intermediate to Advanced

  • Attendees should be familiar with a programming language (code/script examples will mainly be done in Python), and have some understanding of web programming principles, such as REST APIs. Students should ideally have knowledge of containerization technologies such as Docker, as this will be used to set up the Starter Kit.
  • A full breakdown of the technologies that should be set up on attendees’ laptops will be distributed prior to the tutorial.
  • Attendees are not required to have any prior knowledge of GA4GH or GA4GH standards.
  • If participants wish to use controlled-access data in dbGaP to which they have access, they should ensure their accounts are in good standing and that they have the requisite user credentials. Note that participants must know and abide by the terms of the Data Use Agreements or other terms under which they have been granted access to data.

Maximum Participants: 40

Tutorial IP5: Julia for Data Science

Room: TBD
Sunday, July 10, 11:00 am – 6:00 pm CDT

Organizer(s):
Claudia Solis-Lemus, University of Wisconsin-Madison, United States
Douglas Bates, University of Wisconsin-Madison, United States
Sam Ozminkoski, University of Wisconsin-Madison, United States
Bella Wu, University of Wisconsin-Madison, United States

Julia has been called the programming language of the 21st century for scientific computing, data science, and machine learning. As a high-level, high-performance, dynamic language, Julia is faster than other scripting languages because of smart design decisions like type-stability through specialization via multiple-dispatch. Julia's code can be efficient and concise, which leads to clear performance gains. In addition, Julia's environments are fully reproducible and it is easy to express object-oriented and functional programming patterns.

This tutorial will provide an introduction to key Data Science tools in Julia, such as reproducible project management with DrWatson.jl, data management with Arrow.jl and Tables.jl, and (generalized) linear mixed models with MixedModels.jl. Unlike widely used R packages, all the packages that we will describe are written 100% in Julia, illustrating the language’s potential to overcome the two-language problem.

This tutorial will appeal to anyone interested in learning more about Julia and some of the Julia packages that are already available for Statistics and Data Science. In addition to lectures, participants will engage in hands-on exercises. For example, participants will bring a dataset of their choice along with an existing script written in another language (R or Python) that performs certain data analyses. During the tutorial, participants will translate their work to Julia in order to compare running times and ease of programming.

Learning Objectives for Tutorial
At the end of the tutorial, participants will be able to:

  1. Identify the main features that make Julia an attractive language for Data Science
  2. Set up a Julia environment to run their data analysis
  3. Organize reproducible projects with the project management of DrWatson.jl
  4. Efficiently handle datasets (even across different languages) through Tables.jl and Arrow.jl
  5. Fit (generalized) linear mixed models with MixedModels.jl
  6. Communicate across languages (Julia, R, Python)

Intended audience and level:
The tutorial is intended for any data scientist with experience in R and/or Python who is interested in learning the attractive features of Julia for Data Science. No knowledge of Julia is required.

Maximum Participants: 30

Tutorial IP6: Data processing and visualization in the cloud with Terra, Dockstore, and Galaxy

Room: TBD
Sunday, July 10, 11:00 am – 6:00 pm CDT

Organizer(s):
Farzaneh Khajouei, Broad Institute of MIT and Harvard, United States
Elizabeth Kiernan, Broad Institute of MIT and Harvard, United States
Anton Kovalsky, Broad Institute of MIT and Harvard, United States
Tiffany Miller, Broad Institute of MIT and Harvard, United States

Biomedical data is rapidly expanding, accelerating breakthrough discoveries that can improve human health. But this poses challenges with respect to data accessibility, cost, and security. To overcome these challenges, researchers are turning to cloud computing, where a new research landscape contains interoperable, community-driven components that enable robust analyses for a variety of research needs. To harness these resources, researchers must understand not only how cloud products and platforms work but also how they work together. In this tutorial, we will guide you through a research journey that highlights the capabilities of cloud components like Terra, Dockstore, Galaxy, and Single-Cell Portal that allow you to find data that meets your research interests, process and interrogate that data with community-developed tools, and share your reproducible analysis results. Via hands-on exercises, you will integrate tools from these interoperable platforms to complete an example end-to-end analysis with single-cell RNA sequencing data.

Learning Objectives
After the workshop, trainees will be able to:

  1. Understand how Terra, Dockstore, and Galaxy interact within the cloud ecosystem.
  2. Find, organize, and manipulate cloud data in Terra.
  3. Find, configure, and run WDL workflows that meet data processing needs.
  4. Visualize and share workflow outputs using Galaxy and Single-Cell Portal tools.
  5. Apply cloud concepts by analyzing and visualizing an example single-cell RNA-seq dataset.

Intended audience and level:
Researchers and tool developers interested in ways to maximize data and analysis resources in the cloud. Coding experience is helpful, but not required, and participants should have basic familiarity with genomics terminology and standard high-throughput sequencing data formats.

Maximum Participants: 40

Tutorial VT1: Introduction to Python programming for bioscientists

Zoom Presentation
Wednesday, July 6, 9:00 am - 1:00 pm CDT (part 1)
Thursday, July 7, 9:00 am - 1:00 pm CDT (part 2)

Organizer(s):
Pedro de Carvalho Braga Ilídio Silva, University of São Paulo, Brazil
Renato Augusto Corrêa dos Santos, University of São Paulo, Brazil
Hemanoel Passarelli Araujo, Federal University of Minas Gerais, Brazil
Vinícius Henrique Franceschini dos Santos, University of São Paulo, Brazil

Programming skills are increasingly necessary for scientists working with biological data analysis and bioinformatics. Python has been widely used in biology and it is a high-level programming language, which makes it relatively easy to learn compared to others. In this tutorial, we introduce the first steps in analyzing biological data using Python in digital notebooks, which facilitates code documentation, real-time visualization of results, and sharing. The practice will be carried out on Google Meet and Google Colab.
Basic programming tasks will be presented, including variable assignment employing the main data structures (e.g., strings, lists, and dictionaries), data types (e.g., numbers and sequences of characters), and operations (e.g., loops, comparisons, and decision structures). To provide a real example of a Python application in bioinformatics, we will use SARS-CoV-2 amino acid alignment data in a case study, applying data structures and methods from the Biopython toolkit to obtain information about different SARS-CoV-2 variants.
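As an illustrative sketch only (it is not taken from the tutorial materials), the constructs listed above could be combined roughly as follows. The sequence fragments, the variant labels, and the helper name `find_substitutions` are invented for illustration, and the example uses plain Python rather than Biopython.

```python
# Toy aligned amino acid fragments; labels and sequences are made up.
alignment = {
    "reference": "NLVKNKCVNF",
    "variant_a": "NLVKNKCVNF",
    "variant_b": "NLVRNKCVNY",
}


def find_substitutions(reference, other):
    """Return (position, reference_residue, new_residue) for each mismatch."""
    substitutions = []
    # zip pairs up residues column by column; positions are reported 1-based.
    for index, (ref_aa, new_aa) in enumerate(zip(reference, other)):
        if ref_aa != new_aa:
            substitutions.append((index + 1, ref_aa, new_aa))
    return substitutions


reference = alignment["reference"]
report = {}
for name, sequence in alignment.items():
    if name == "reference":
        continue  # skip comparing the reference with itself
    report[name] = find_substitutions(reference, sequence)

for name, substitutions in report.items():
    labels = [f"{ref}{pos}{new}" for pos, ref, new in substitutions]
    print(name, labels if labels else "no differences")
```

The dictionary maps each label to a string, the loop and comparison find mismatched columns, and the list collects them, which covers each of the data structures and operations named above in a few lines.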

Learning Objectives for Tutorial:

  • Introduce Google Colab digital notebooks;
  • Present the basic logic and data structures in Python;
  • Provide hands-on experience in analyzing biological sequences using Biopython.

Intended audience and level:
Researchers at different education levels with interest in learning programming skills for bioinformatics; level of programming skills: beginners.

Maximum Participants: 30

Tutorial VT2: Building Interactive Visualizations of Genomics Data with Gosling

Zoom Presentation
Wednesday, July 6, 9:00 am - 1:00 pm CDT

Organizer(s):

Sehi L'Yi, Harvard Medical School, United States
Trevor Manz, PhD Student, Harvard Medical School, United States
Qianwen Wang, Postdoctoral Research Fellow, Harvard Medical School, United States
Nils Gehlenborg, Associate Professor, Harvard Medical School, United States

Most existing genomic visualization tools are tailored toward specific use cases, lacking the generalizability for reuse and the expressivity to build interactive visualizations that scale to the diverse data types and analysis tasks in genomics. The Gosling visualization grammar for genome-mapped data (http://gosling-lang.org, https://pubmed.ncbi.nlm.nih.gov/34596551) defines primitives that specify how genomics datasets can be transformed and mapped to visual properties, providing building blocks to compose unique, scalable, and interactive genomics data visualizations on the web.

The Gos Python library (https://gosling-lang.github.io/gos/) is designed to enable computational biologists to quickly author Gosling-based visualizations with their own data. In our tutorial, we introduce core concepts of genomic data visualization and illustrate how they can be applied through hands-on training with Gos in Jupyter Notebooks.

Objectives:

  • To introduce concepts of genomic data visualization
  • To introduce Gosling grammar for defining interactive genomic data visualizations
  • To provide hands-on experience with Gos to author interactive Gosling visualizations in Jupyter Notebooks
  • To highlight how unique Gosling primitives like “semantic zooming” can be leveraged to reveal patterns across scales

Maximum Participants: 45

Tutorial VT3: Federated Learning in Biomedicine

Zoom Presentation
Wednesday, July 6, 9:00 am - 1:00 pm CDT

Organizers:
Julian Matschinske, University of Hamburg, Germany
Julian Späth, University of Hamburg, Germany
Niklas Probul, University of Hamburg, Germany
Mohammad Bakhtiari, University of Hamburg, Germany

The vast amount of biomedical data produced by recent sequencing technologies has proven to be a valuable resource for machine learning models that aim to better understand biological mechanisms and pathways. Machine learning models generally depend on centralized datasets. Unfortunately, centralization is not suited to sensitive medical data, which is often distributed across different institutions and cannot be easily shared because of privacy or security concerns.

Federated learning, a method proposed by Google in 2017, allows the training of machine learning models on geographically or legally divided datasets without sharing sensitive data. When combined with additional privacy-enhancing techniques, such as differential privacy or secure multi-party computation, it can serve as a privacy-aware alternative to central data collections while still enabling the training of machine learning models on the whole dataset.

This is achieved by exchanging (possibly obfuscated) model parameters only. However, in such federated settings, both algorithms as well as the required infrastructure are much more complex than for centralized machine learning approaches. To address this, various federated learning tools have been developed and published recently that try to fill this gap and make the usage and development of federated algorithms easier, more intuitive, and applicable for data scientists without requiring profound software engineering capabilities.
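A minimal sketch of this parameter exchange, assuming a one-parameter linear model and made-up data for three hypothetical sites. It illustrates plain federated averaging only, without any obfuscation or other privacy-enhancing technique, and the function names are illustrative rather than part of any federated learning tool.

```python
def local_update(weight, data, learning_rate=0.01, epochs=5):
    """Run a few gradient descent steps on one site's private (x, y) pairs."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = weight * x.
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * gradient
    return weight


def federated_average(local_weights, sample_counts):
    """Combine site parameters, weighting each site by its sample count."""
    total = sum(sample_counts)
    return sum(w * n for w, n in zip(local_weights, sample_counts)) / total


# Hypothetical private datasets held by three institutions (slope close to 2).
sites = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(3.0, 5.9), (4.0, 8.0), (5.0, 10.2)],
    [(1.5, 3.0), (2.5, 5.1)],
]

global_weight = 0.0
for _ in range(50):
    # Only the updated weight leaves each site; the raw data never does.
    local_weights = [local_update(global_weight, data) for data in sites]
    global_weight = federated_average(local_weights, [len(d) for d in sites])

print(round(global_weight, 2))
```

Each round, the coordinator sees only one number per site. In a real deployment that number would be a full parameter vector, and it could still leak information about the underlying data, which is why the additional privacy-enhancing techniques mentioned above matter.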

In this tutorial, the theory of federated learning will first be introduced using Python examples. The risk of privacy leaks will be demonstrated to show the necessity of additional privacy-enhancing techniques, which are introduced afterward. The acquired knowledge will then be put to use with the help of two tools, namely PySyft [1] and FeatureCloud [2]. These tools allow for implementing and executing federated algorithms in a truly federated production setting and will be used to provide attendees with practical hands-on experience involving a real-world biomedical dataset and prediction task.
In the end, this tutorial will provide attendees with both theoretical and practical knowledge about federated learning and privacy-enhancing techniques in the context of biomedicine and demonstrate the whole development process, from the conception of the algorithm to its deployment to a production system.
[1] https://github.com/OpenMined/PySyft
[2] https://featurecloud.ai

Learning objectives

  • Federated learning theory and hands-on experience
  • Privacy-enhancing techniques (differential privacy, secure multi-party computation)
  • Tools to implement federated algorithms/methods (sklearn, FeatureCloud, PySyft)
  • Deployment of federated algorithms/methods

After attending the tutorial, attendees should have a solid understanding of what federated learning is, how it can be used to perform privacy-aware machine learning on distributed datasets using the techniques mentioned above, and how to bring such implementations to users in practice.

Intended audience and level:
Programming skills (ideally Python) and past experience with machine learning are advised.

Maximum Participants: 30

Tutorial VT4: Towards Precision Medicine with Graph Representation Learning

Zoom Presentation
Thursday, July 7, 9:00 am - 1:00 pm CDT

Organizers:
Michelle M. Li, Harvard University, United States
Marinka Zitnik, Harvard University, United States

Learn more here: https://zitniklab.hms.harvard.edu/biomedgraphml/ 

Biomedical networks are universal descriptors for systems of interacting elements, from molecular interactions and disease co-morbidity to healthcare systems and scientific knowledge. With the remarkable success of representation learning in providing powerful predictions and insights, we have witnessed a rapid expansion of representation learning techniques into modeling, analyzing, and learning with such networks. Concretely, given a biomedical network, a representation learning method can transform the graph to extract patterns and produce compact vector representations that could be optimized for downstream tasks. Areas of profound impact include identifying variants underlying complex traits, disentangling behaviors of single cells and their effects on health, fusing electronic health records with biomedical knowledgebases to diagnose patients, and developing safe and effective treatment regimens.

In our tutorial, we will cover key advancements in graph representation learning over the last few years, with an emphasis on new opportunities in biomedicine enabled by such advancements. We will start with a technical exposition of prevailing graph learning paradigms, from classic network propagation methods to state-of-the-art graph neural networks. We will then demonstrate the impact of such techniques on accelerating research in computational biology and precision medicine. In doing so, we will present a toolbox of modern graph representation learning algorithms for biomedicine.

Intended audience and level:
The target audience is graduate students, researchers, scientists, and practitioners in both academia and industry who are interested in applications of graph machine learning in biomedicine (Broad Interest). The tutorial is aimed at entry-level participants with knowledge of the fundamentals of network biology and machine learning, and ideally some basic experience in graph representation learning (Beginner or Intermediate). Although the first half of the tutorial will focus on introducing networks and predominant graph machine learning paradigms, it will be helpful to have a preliminary understanding of basic graph representation learning methods.

Materials availability. The tutorial slides and materials for hands-on exercises (e.g., code implementation, datasets) will be posted online prior to the tutorial and made available to all participants.

Maximum Participants: 100

Tutorial VT5: Computational analysis of antibody repertoires, with applications for therapeutic discovery

Zoom Presentation
Thursday, July 7, 9:00 am – 1:00 pm CDT

Organizer(s):
Fergus Boyles, University of Oxford, United Kingdom
Matthew Raybould, University of Oxford, United Kingdom

High-throughput computational methods for B-cell receptor/antibody analysis are enabling the study of deep-sequencing samples of the human repertoire. Whether applied to improve our understanding of the adaptive immune system, perform immunodiagnosis, assess vaccine performance, or generate lead molecules for therapeutic discovery, these approaches are set to become a key component of the 21st century immunologist’s toolkit.

Such in silico analysis algorithms must strike a careful balance between accuracy and throughput to cope with the complexity of immune repertoire samples. They tend to take the form of clustering algorithms that convert antibody repertoire sequences into clonal lineages or more general groups of molecules likely to have sufficient chemical similarity to engage the same antigens. Classically these algorithms have assessed antibody sequence properties, but structure prediction tools are now reaching the required throughput and accuracy to yield meaningful three-dimensional representations of repertoires of antibodies.

These methods promise to offer unprecedented resolution and understanding of the sampled binding sites (‘paratopes’). We will explore these technologies through a mock in silico drug discovery case study, starting from convalescent patient repertoires that represent a trove of potential human-expressible therapeutic antibodies. You will gain hands-on experience with the latest approaches (both sequence-based and structure-based) that have the speed required to enable computational early-stage discovery directly from these repertoire sequences. We’ll first identify the convergent (seen across multiple individuals) disease-responding antibodies, analyse them for likelihood of pathogen complementarity, and finally assess them for developability concerns such as stability and humanness.

Learning Objectives:

  • Use the latest techniques to cluster patient B-cell receptor repertoire sequencing data and identify convergence in pathogen response across individuals
  • Use sequence-based approaches to identify convergent antibodies with a high probability of binding to/neutralising pathogens of interest
  • Build 3D structural models of antibody sequences of interest and use them to develop a structural hypothesis for their mode of action
  • Use a sequence-based machine learning approach to confirm that the antibody sequences are indeed fully ‘human’ (and so are unlikely to cause immunogenicity as a therapeutic)
  • Use a recent structure-based method to compare candidate antibodies to late-stage clinical trial therapeutics, identifying any with extraneous biophysical properties associated with poor developability (solubility, viscosity, expression, etc.)

Intended audience and level:
This tutorial is intended for those with some prior knowledge of immunology/biology who would like to learn about functional profiling of antibody repertoires, and for those interested in how to computationally analyse antibody repertoire datasets to discover novel therapeutics. No prior experience with sequence analysis or structural modelling tools is assumed, but familiarity with working on Unix-based systems is strongly recommended for the practicals. For part of the tutorial we will use the Oxford Protein Informatics Group's suite of antibody modelling tools, which are available both as an online web server and as a Singularity container.
Prior experience of working with Singularity containers is not required, but attendees will need a working installation of Singularity if they wish to run the tools which depend on SAbDab-SAbPred on their machine.

Maximum Participants: 50

Tutorial VT6: Online tools for visualizing RNA structure

Zoom Presentation
Thursday, July 7, 9:00 am - 1:00 pm CDT

Organizer(s):
Afaf Saaidi, Georgia Institute of Technology, United States

This tutorial will cover the available tools and web servers for visualizing RNA structure, with an emphasis on secondary structure. Because biological function can be understood through structure, a suitable visualization is an important step toward better identification of functional regions and better comparison between structures. Drawing RNA structure remains difficult from an algorithmic point of view, especially when the structure contains pseudoknots. Today there is a wide range of tools that aim to optimize the representation of the structure using either template-based or template-free approaches. In this tutorial, we will cover the most widely used tools and current web servers offering structure visualization utilities through a lecture, a demonstration of existing tools and web servers, and hands-on training sessions. The audience will be equipped with the essential knowledge to select the most appropriate tool when needed.

Learning objectives:

  • Recall the available tools for RNA structure visualization.
  • Equip the audience with essential knowledge about the visualization of RNA structure.

Learning goals:

  • To understand the utility of each tool and know which tool is appropriate for a given need.
  • To become familiar with the latest visualization web servers.
  • To be aware of the limitations and challenges of drawing an RNA structure.

Intended audience and level:
Computational Biologists who are dealing with RNA structure, beginner or intermediate level.

Maximum Participants: 30
