Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


ISMB 2020 - Tutorials

ISMB 2020 features pre-conference tutorial sessions on Sunday, July 12, 2020, one day prior to the start of the conference scientific program.

Register early for tutorials as seating is limited.

Tutorial attendees should register using the online registration system; pricing is available at https://www.iscb.org/ismb2020-registration. Tutorial participants must be registered for the ISMB conference to attend a tutorial. Attendees will receive a Tutorial Entry Pass (ticket) when they register on site.

Lunch is not included as part of the tutorial program.

 

Tutorial FD1: Mutational signature analysis: pipelines, machine learning, and benchmarking on synthetic data

Sunday, July 12, 9:00 am - 6:00 pm

Room: TBD

Presenters

Steven G. Rozen, Duke-NUS Centre for Computational Biology, Duke-NUS Medical School, Singapore
Arnoud Boot, PhD, Postdoctoral Fellow, Duke-NUS Medical School, Singapore

Overview

Mutational signature analysis examines patterns of mutations across the genome to infer their causes, and is now an essential component of cancer-genomics studies. Over the last decade, mutational signatures have revealed previously unknown endogenous mutational processes that are widespread in many cancer types. Signatures have also shown that exposure to naturally occurring mutagens that cause liver cancer is much more widespread than previously suspected. Mutational signature analysis can also provide insight into the causes of specific oncogenic mutations and can reveal gaps in our understanding of the mechanisms of DNA damage and repair. Mutational signatures can either be delineated in experimental systems (e.g. cell culture or rodents) or be discovered by machine learning across sets of hundreds to tens of thousands of tumors. More than 100 mutational signatures have been described, many of which have unknown causes. Reflecting the importance of mutational signature analysis, there are now ~20 software packages that use machine learning to discover mutational signatures and assess their activity in tumors. Unfortunately, the cancer genomics literature contains numerous erroneous mutational signature results stemming from uncritical application of these packages.

We will cover the basic concepts of mutational signature analysis and show how this analysis is important for understanding cancer development, for detecting mutational exposures that cause cancer, and for understanding DNA damage as processed by normal and defective DNA repair. We will introduce the computational analysis needed to delineate mutational signatures in experimental systems (e.g. cultured cells or rodents), including the computational subtraction of the signatures of background mutagenesis and of experimental artifacts. We will cover in detail machine learning approaches to discovering mutational signatures in large sets of tumors and the strengths and weaknesses of these approaches. We will also discuss in depth the importance of benchmarking the machine-learning approaches on synthetic data. Finally, the tutorial will show examples of the importance of interpreting machine-learning results in the light of all available evidence to obtain biologically relevant results.
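As a minimal sketch of the background-subtraction idea, assume spectra are stored as per-channel mutation counts and that background mutagenesis is estimated as the mean spectrum of untreated control clones. This is an illustrative choice, not necessarily the procedure used in the tutorial, and the function names are hypothetical.

```python
import numpy as np


def subtract_background(treated, controls):
    """Remove background mutagenesis from one treated clone's spectrum.

    treated  : 1-D array of mutation counts per channel (e.g. 96 SBS channels)
    controls : 2-D array, one row per untreated control clone, same channels
    The background is the mean control spectrum; channels that become
    negative are clipped to zero, since negative counts are not meaningful.
    """
    treated = np.asarray(treated, dtype=float)
    background = np.asarray(controls, dtype=float).mean(axis=0)
    return np.clip(treated - background, 0.0, None)


def to_signature(counts):
    """Normalize a count spectrum to proportions that sum to 1."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        raise ValueError("no mutations remain after background subtraction")
    return counts / total


# Toy 4-channel example with made-up counts, for illustration only.
excess = subtract_background([30, 10, 5, 55], [[10, 10, 5, 5], [10, 10, 5, 5]])
print(excess)
print(to_signature(excess))
```

Subtracting a mean and clipping is the simplest possible model; with few mutations per clone, Poisson noise in both treated and control spectra can matter.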

This tutorial will equip participants with the ability to run machine-learning software to discover mutational signatures and to assess their activity in tumors and with strategies to evaluate the biological relevance of the results.

Audience

Computing experience: there will be command-line exercises in Python and R. Participants must have a laptop with Python and R installed, along with the following packages (draft list):
  • ICAMS (R): https://cran.rstudio.com/web/packages/ICAMS/index.html
  • sigproextractor (Python): https://pypi.org/project/sigproextractor/
  • hdp (R): https://github.com/nicolaroberts/hdp
Relatively small data sets will also need to be downloaded before the tutorial.

Participants will need a basic understanding of genome organization, mutations, and modern high-throughput Illumina-type sequencing (BAM files, variant call files, etc.).

Maximum Participants: 60

Schedule Overview
9:00 - 10:00 am Overview of mutational signatures
  • Introduction of instructors and goals of tutorial
  • What are mutational spectra and mutational signatures and what are they good for?
  • "Simple" mutational signatures based on single-nucleotide substitutions in trinucleotide context
  • Application of mutational signatures
    • in cancer epidemiology
    • to study DNA damage and repair
    • to study oncogenesis (the origins of tumors)
  • Computational analysis of experimentally delineated mutational signatures and spectra, including subtraction of signatures of background mutagenesis and of experimental artifacts
  • Mutational signatures of small indels and doublet base substitutions and of single-base substitutions in extended contexts
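The 96 "simple" SBS channels come from the six substitution classes, written with a pyrimidine (C or T) reference base, combined with the 4 x 4 possible flanking bases. A minimal sketch of mapping a substitution to its channel, assuming the trinucleotide context has already been extracted from the reference genome; the function names are hypothetical, and the channel order shown is one common convention.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Six substitution classes, expressed with a pyrimidine reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# 96 channels = 6 substitution classes x 4 possible 5' bases x 4 possible 3' bases.
CHANNELS = [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product(BASES, BASES)]


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context, alt):
    """Map one substitution to its 96-channel label, e.g. "T[C>T]A".

    context : reference trinucleotide with the mutated base in the middle
    alt     : alternate (mutant) base at the middle position
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or alt == context[1]:
        raise ValueError("need a trinucleotide and an alternate base that differs")
    if context[1] in "AG":
        # Purine reference: describe the same event on the opposite strand.
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def spectrum(mutations):
    """Count (context, alt) mutations per channel, returned in CHANNELS order."""
    counts = dict.fromkeys(CHANNELS, 0)
    for context, alt in mutations:
        counts[sbs96_channel(context, alt)] += 1
    return [counts[c] for c in CHANNELS]


# A G>A in TGA is the same event as a C>T in TCA on the other strand.
print(sbs96_channel("TCA", "T"), sbs96_channel("TGA", "A"))
```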
10:00 - 10:30 am Case study: Kucab et al., 2019
10:30-11:00 am (Hands on) Computational analysis of experimentally delineated mutational signatures
11:00-11:15 am Coffee break
11:15-12:00 pm (Hands on) Computational analysis of experimentally delineated mutational signatures; subtracting the signatures of background mutagenesis and of experimental artifacts
12:00 - 1:00 pm Machine learning for discovering mutational signatures
  • Non-negative matrix factorization based approaches
  • Hierarchical Dirichlet processes approaches
  • The twin problems of signature discovery and determining how much of each signature is present in a tumor (“signature attribution”)
  • Challenges in signature discovery and attribution
  • Signature attribution as a separate problem from signature discovery (using COSMIC and/or experimental signatures)
  • Strategies for evaluating biological relevance of results
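The NMF-based discovery idea can be sketched with Lee and Seung's multiplicative updates for the Frobenius-norm objective, a textbook variant; the packages listed above use their own, more elaborate procedures (for example multiple restarts and selection of the number of signatures). Attribution against fixed, known signatures can reuse the exposure update with the signature matrix held constant. All names here are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=2000, seed=0, eps=1e-10):
    """Factor a non-negative catalog V (channels x tumors) as V ~ W @ H.

    W (channels x k) holds candidate signatures; H (k x tumors) their exposures.
    Uses multiplicative updates, which keep both factors non-negative.
    """
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    n_channels, n_tumors = V.shape
    W = rng.random((n_channels, k)) + eps
    H = rng.random((k, n_tumors)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each signature sums to 1 and move the scale into the exposures.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]


def attribute(V, W, n_iter=2000, eps=1e-10):
    """Signature attribution: non-negative exposures for fixed signatures W."""
    V = np.asarray(V, dtype=float)
    H = np.full((W.shape[1], V.shape[1]), V.mean() + eps)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


# Toy catalog: 4 channels x 5 tumors built from two made-up signatures.
W_true = np.array([[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]])
H_true = np.array([[100.0, 20.0, 50.0, 0.0, 80.0],
                   [10.0, 90.0, 40.0, 60.0, 5.0]])
V = W_true @ H_true
W, H = nmf(V, k=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
print(attribute(V, W_true))                            # close to H_true here
```

On real catalogs the discovered columns come out in arbitrary order and depend on the random start, which is one reason discovery and attribution results need the critical evaluation discussed above.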
1:00 - 2:00 pm Lunch Break (lunch is not included as part of tutorial)
2:00 - 3:00 pm (Hands on) Non-negative matrix factorization for learning mutational signatures
3:00 - 4:00 pm (Hands on) Hierarchical Dirichlet processes for learning mutational signatures
4:00 - 4:15 pm Coffee Break
4:15 - 5:00 pm The importance of testing signature discovery and signature assignment on synthetic data
  • Considerations in generating synthetic data and assessing accuracy of discovered signatures
  • Considerations in testing signature assignment on synthetic data
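A minimal sketch of benchmarking on synthetic data, assuming synthetic catalogs are drawn as Poisson counts around known signatures times known exposures, and that discovered signatures are scored by a one-to-one matching to the ground truth that maximizes total cosine similarity. The brute-force matching is only practical for a handful of signatures; names are hypothetical.

```python
import itertools

import numpy as np


def synthetic_catalog(signatures, exposures, seed=0):
    """Simulate mutation counts as Poisson noise around signatures @ exposures."""
    rng = np.random.default_rng(seed)
    expected = np.asarray(signatures, float) @ np.asarray(exposures, float)
    return rng.poisson(expected)


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def match_signatures(true_sigs, found_sigs):
    """Pair ground-truth and discovered signatures (columns) one-to-one,
    maximizing total cosine similarity; returns (true_idx, found_idx, cosine)."""
    k_true, k_found = true_sigs.shape[1], found_sigs.shape[1]
    sim = np.array([[cosine(true_sigs[:, i], found_sigs[:, j])
                     for j in range(k_found)] for i in range(k_true)])
    if k_true <= k_found:
        candidates = (list(zip(range(k_true), p))
                      for p in itertools.permutations(range(k_found), k_true))
    else:
        candidates = (list(zip(p, range(k_found)))
                      for p in itertools.permutations(range(k_true), k_found))
    best = max(candidates, key=lambda pairs: sum(sim[i, j] for i, j in pairs))
    return [(i, j, sim[i, j]) for i, j in best]


true_sigs = np.array([[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]])
catalog = synthetic_catalog(true_sigs, [[100, 0, 50], [0, 100, 50]])
found = true_sigs[:, ::-1]  # stand-in for the output of a discovery tool
print(match_signatures(true_sigs, found))
```

Unmatched ground-truth signatures count as missed, and unmatched discovered ones as false positives; what cosine similarity counts as "recovered" is a threshold the benchmark designer has to choose.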
5:30 - 6:00 pm Future perspectives and summary
  • A broader perspective on mutational signatures
    • How to estimate which signatures (and by extension, which mutational processes) might have generated a specific mutation
    • Mutational signatures and the genome landscape, replication and transcriptional strand bias
    • Mutational signatures of structural (large-scale) genomic variation
  • Pointers to resources
  • Promise of mutational signature analysis and open research questions
  • Summary
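Estimating which signature likely generated a specific mutation is often framed as a simple Bayesian calculation: the probability that signature s produced a mutation in channel c is proportional to the tumor's exposure to s times the probability of channel c under s. A minimal sketch under that assumption, with a hypothetical function name:

```python
import numpy as np


def signature_probabilities(signatures, exposures, channel_index):
    """Posterior probability that each signature generated a mutation in one channel.

    signatures    : (channels x k) array, each column sums to 1
    exposures     : (k,) mutation counts attributed to each signature in the tumor
    channel_index : row of the mutation's channel (e.g. its 96-channel index)
    Raises ZeroDivisionError-like NaNs if no active signature can emit this channel.
    """
    weights = np.asarray(signatures, float)[channel_index] * np.asarray(exposures, float)
    return weights / weights.sum()


# Two made-up signatures over two channels, equal exposures in this tumor.
sigs = np.array([[0.5, 0.1], [0.5, 0.9]])
print(signature_probabilities(sigs, [100, 100], channel_index=0))
```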

 

Tutorial FD2: Finding and analyzing data in the cloud with Gen3, Dockstore, Terra, and Galaxy

Sunday, July 12, 9:00 am - 6:00 pm

Room: TBD

Presenters

Geraldine Van der Auwera, Broad Institute of MIT and Harvard, United States
Robert Majovski, Broad Institute of MIT and Harvard, United States

Overview

The era of big data for biomedical research is here. Massive data sets and cloud-based platforms will enable breakthrough discoveries while overcoming challenges of cost, accessibility, and security. A key strength of this new research landscape is the availability of interoperable, community-driven components that enable robust analyses for a variety of research needs.

One challenge to fully realizing this vision for your research is learning not only how several new products and platforms work, but also how they work together. In this full-day tutorial, we will guide you through a research journey that highlights the capabilities and components of the NHGRI Genomic Data Science Analysis, Visualization and Informatics Lab-space (AnVIL) resource. You will integrate a suite of interoperable platforms to complete a sample project, gaining working knowledge of how the components work together to perform an end-to-end genetic analysis.

Specifically, you will learn how to:

  • Find and access data in Gen3
  • Locate analysis tools in the Dockstore repository
  • Bring these data and tools together into a computational workspace in Terra
  • Process data with automated, reproducible analysis pipelines
  • Leverage Hail and Bioconductor in Jupyter Notebooks to do interactive analysis
  • Perform genome-wide association studies with Galaxy workflows
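At its simplest, the statistical core of a GWAS is one regression per variant. As an illustrative NumPy sketch (not the Hail or Galaxy API), assuming a quantitative phenotype, additive allele-dosage coding and no covariates beyond an intercept, the per-variant effect size and t-statistic are:

```python
import numpy as np


def gwas_linear(genotypes, phenotype):
    """Per-variant simple linear regression of a phenotype on allele dosage.

    genotypes : (variants x samples) array of alternate-allele counts (0, 1, 2)
    phenotype : (samples,) quantitative trait values
    Returns (beta, t_statistic), one value per variant. Monomorphic variants
    (no genotype variance) would divide by zero and should be filtered first.
    """
    G = np.asarray(genotypes, float)
    y = np.asarray(phenotype, float)
    n = y.size
    yc = y - y.mean()
    Gc = G - G.mean(axis=1, keepdims=True)
    sxx = (Gc ** 2).sum(axis=1)
    beta = Gc @ yc / sxx
    residuals = yc[None, :] - beta[:, None] * Gc
    se = np.sqrt((residuals ** 2).sum(axis=1) / (n - 2) / sxx)
    return beta, beta / se


# Two made-up variants in six samples; only the first tracks the phenotype.
beta, t = gwas_linear([[0, 1, 2, 0, 1, 2], [1, 1, 0, 0, 2, 1]],
                      [0.1, 1.0, 2.1, -0.1, 1.1, 1.9])
print(beta, t)
```

Production pipelines add covariates (e.g. principal components for ancestry), convert t-statistics to p-values, and handle missing genotypes; those steps are omitted here.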
Audience

While we will work in the context of AnVIL, you will be able to apply your new skills to many other genomics data sets and tools. Attendees must bring a WiFi-enabled laptop with the Chrome browser installed. Prior coding experience (R and/or Python) is required.

Maximum Audience: 40

Schedule Overview
  1. Section I: Introduction
  2. Section II: Finding and analyzing data in the cloud with Gen3, Dockstore and Terra
    • Find and access data in Gen3
    • Locate analysis tools in the Dockstore repository
    • Export both data and tools to Terra and run an analysis
  3. Section III: Interactive analysis
    • Find data
    • Hail with Jupyter Notebooks in Terra
    • Bioconductor with Jupyter Notebooks in Terra
  4. Section IV: Genome-wide association study workflows
    • Galaxy workflows and complementary components

 

Tutorial AM1: Full-Length RNA-Seq Analysis using PacBio long reads: from reads to functional interpretation

Sunday, July 12, 9:00 am - 1:00 pm

Room: TBD

Presenters

Ana Conesa, University of Florida, United States
Elizabeth Tseng, Pacific Biosciences, United States
Angeles Arzalluz, Polytechnical University Valencia, Spain
Francisco Pardo, Polytechnical University Valencia, Spain

Overview

The PacBio Single-Molecule Real-Time (SMRT) sequencing technology produces highly accurate long reads that are suitable for full-length RNA sequencing. The Iso-Seq method generates full-length transcript sequences, including transcripts of 10 kb or longer, without the need for transcript assembly or error correction. The high accuracy (>99%) of Iso-Seq transcripts allows unambiguous characterization of alternative splicing events, direct ORF prediction without a reference genome, and identification of single-cell barcodes.

The unique features of Iso-Seq data require a special set of bioinformatics tools that typical short-read RNA-seq tools fail to provide. The PacBio SMRT Analysis software processes raw sequencing data into full-length transcript sequences, which can then be analyzed with community tools developed specifically for long-read data: SQANTI compares Iso-Seq transcripts against known annotations (e.g., GENCODE) to classify novel vs. known genes and transcripts and to remove artifacts; IsoAnnot functionally annotates Iso-Seq transcripts; and tappAS compares multiple Iso-Seq samples to identify differential features. Existing short-read RNA-Seq data are often paired with Iso-Seq data to strengthen the analysis.
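To illustrate how SQANTI-style structural categories arise from comparing splice junctions, here is a simplified sketch for multi-exon transcripts on a single strand and gene. SQANTI itself handles many more cases (mono-exonic, antisense, intergenic and fusion transcripts, among others) and distinguishes donor from acceptor sites; this function and its inputs are hypothetical.

```python
def classify_transcript(query_introns, reference_transcripts):
    """Assign a simplified SQANTI-style structural category to a multi-exon transcript.

    query_introns         : ordered list of (start, end) intron coordinates
    reference_transcripts : dict transcript name -> ordered list of (start, end) introns
    Returns (category, matching reference transcript name or None).
    """
    q = list(query_introns)
    if not q:
        raise ValueError("this sketch handles multi-exon transcripts only")
    # Full splice match: every splice junction matches one reference transcript.
    for name, ref in reference_transcripts.items():
        if q == list(ref):
            return "full-splice_match", name
    # Incomplete splice match: a consecutive run of one reference's junctions.
    for name, ref in reference_transcripts.items():
        ref = list(ref)
        for i in range(len(ref) - len(q) + 1):
            if ref[i:i + len(q)] == q:
                return "incomplete-splice_match", name
    # Novel in catalog: a new combination, but every splice site is already annotated.
    known_sites = {site for ref in reference_transcripts.values()
                   for intron in ref for site in intron}
    if all(start in known_sites and end in known_sites for start, end in q):
        return "novel_in_catalog", None
    # Novel not in catalog: at least one unannotated splice site.
    return "novel_not_in_catalog", None


reference = {"tx1": [(100, 200), (300, 400), (500, 600)]}
print(classify_transcript([(100, 200), (300, 600)], reference))  # skips an exon
```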

Further, the Iso-Seq method can also be applied to single-cell analysis. Matching single-cell libraries of long- and short-read data can be generated and combined: the deeper coverage of short reads is used to identify cell types, while matching cell barcodes link the full-length isoforms from the long-read data back to individual cell types.
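Linking long-read isoforms to cell types through shared barcodes amounts to a join on the cell barcode. A minimal sketch, assuming the long-read data have already been reduced to (barcode, isoform) pairs and that cell types come from clustering the short-read data; the names and barcodes are hypothetical.

```python
from collections import Counter, defaultdict


def isoforms_per_cell_type(long_reads, barcode_to_cell_type):
    """Count full-length isoforms per cell type.

    long_reads           : iterable of (cell_barcode, isoform_id) from the long-read data
    barcode_to_cell_type : dict cell_barcode -> cell type, from short-read clustering
    Reads whose barcode has no assigned cell type are skipped.
    """
    counts = defaultdict(Counter)
    for barcode, isoform in long_reads:
        cell_type = barcode_to_cell_type.get(barcode)
        if cell_type is not None:
            counts[cell_type][isoform] += 1
    return counts


reads = [("AAAC", "iso1"), ("AAAC", "iso2"), ("GGGT", "iso1"), ("TTTT", "iso3")]
cell_types = {"AAAC": "T cell", "GGGT": "B cell"}
print(dict(isoforms_per_cell_type(reads, cell_types)))
```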

In this tutorial, we provide an overview of the Iso-Seq tools for both bulk and single-cell RNA-seq analysis and guide the audience through hands-on analyses.

Audience

Beginner or intermediate. This tutorial will be of broad interest to researchers from academia or industry who want to understand the unique features and tool sets of long-read RNA sequencing (Iso-Seq) data generated with PacBio’s SMRT Technology.

Attendees are expected to have basic Unix command line skills and some familiarity with R/RStudio. Programming knowledge is not required, although most of the tools are written in Python.

Maximum Audience: 40

Requirements

Attendees are expected to bring their own laptops with R/RStudio and the tappAS software installed. We will use a shared AWS instance for the first part of the analysis (Iso-Seq and SQANTI) and then run tappAS on the attendees' laptops.

Schedule Overview
9:00 - 9:30 am Introduction
  • Introduction to PacBio SMRT Technology and the Iso-Seq method (full-length RNA-Seq)
  • Review the official PacBio software (SMRT Analysis/BioConda)
  • Review the downstream community Iso-Seq tools: SQANTI, IsoAnnot, tappAS
9:30 - 10:15 am Demo & Hands-On Session: Iso-Seq using BioConda
  • Using a small human whole-transcriptome Iso-Seq dataset, run through the Iso-Seq pipeline using BioConda
  • Learn to visualize transcript GFF files in IGV and the UCSC Genome Browser
10:15 - 11:00 am Demo & Hands-On Session: Functional analysis of Iso-Seq data
  • Use SQANTI to annotate novel/known genes/transcripts and remove artifacts
  • Use IsoAnnot to functionally annotate transcripts
  • Use tappAS to identify differentially expressed features across samples
11:00 - 11:15 am Coffee Break
11:15 - 11:45 am Single Cell Iso-Seq
  • Overview of applying long read sequencing for single cell transcript analysis
12:15 - 12:45 pm Hands-On Session: Single Cell Iso-Seq + RNA-Seq
  • Using a small single cell Iso-Seq and RNA-seq dataset, run through the Iso-Seq single cell pipeline
  • Learn to combine long & short read single cell data using RStudio
12:50 - 1:00 pm Wrap Up

 

Tutorial AM2: A practical introduction to biomedical text mining in the era of deep learning

Sunday, July 12, 9:00 am - 1:00 pm

Room: TBD

Presenters

Qingyu Chen, National Library of Medicine, National Institutes of Health
Robert Leaman, National Library of Medicine, National Institutes of Health
Cecilia Arighi, Delaware Biotechnology Institute, University of Delaware
Zhiyong Lu, National Library of Medicine, National Institutes of Health

Overview

The volume of biomedical literature is growing at an exponential rate. PubMed, the biomedical literature search engine managed by the National Library of Medicine, indexes ~2 new articles per minute. Such rapid growth makes manual information extraction, curation and annotation increasingly difficult. Biomedical text mining applies natural language processing techniques to the biomedical literature to automatically assist biocurators, biologists and health professionals in reducing this burden. Biomedical text mining has matured significantly in recent years; in particular, deep learning (end-to-end neural networks inspired by biological systems) has achieved state-of-the-art performance in a range of biomedical text mining applications. In the bioinformatics community, text mining via deep learning is increasingly used to support other research in the biological and medical sciences. Beyond standalone tools, deep learning models have also been deployed to public web servers, improving the quality of biomedical text mining tools and lowering the barriers for non-specialists.

This tutorial aims to introduce the audience to text mining of the biomedical literature using deep learning methods and to provide hands-on training. The tutorial will address questions such as “What is biomedical text mining?”, “What is deep learning?”, “How can deep learning be applied to biomedical text mining problems?”, and “What biomedical text mining tools are currently available?”. It will cover the basics of biomedical text mining and deep learning with concrete examples, and will explain and discuss the latest deep learning methods in biomedical text mining. The audience will also have the opportunity to gain hands-on experience developing their own deep learning models for biomedical literature analysis. Topics include:

  • Fundamentals of biomedical text mining and literature mining
  • Overview of deep learning in biomedical text mining
  • Word, sentence, concept embeddings for biomedical textual analysis
  • Public biomedical text mining tools for biomedical information retrieval and extraction
  • Case studies: biomedical literature analysis

This tutorial is an activity of the ISCB COSI on Text Mining.

Audience

We intend the tutorial for participants who are not text mining specialists but who use, or are interested in using, text mining. This tutorial will provide a brief introduction, including a description of existing tools and datasets. In addition, the session will give participants an opportunity to describe their needs to text mining specialists.

Maximum Audience: 60

Requirements

None, if participants just wish to listen. Those who would like to also participate in the hands-on exercises are required to provide their own laptop and should have a basic knowledge of programming in Python.

Schedule Overview
9:00 - 9:30 am Introduction to biomedical text mining
  • Biomedical text processing pipeline
  • Biomedical text mining use cases
9:30 - 10:00 am Introduction to deep learning
  • Basics of deep learning
  • An overview of different deep learning models (more detail in the later session)
10:00 - 11:00 am Biomedical language models
  • Word embedding
  • Concept embedding
  • Sentence embedding
  • Contextual embedding (ELMo and BERT)
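A classic baseline for sentence embeddings averages the word vectors of a sentence and compares sentences by cosine similarity. The sketch below uses tiny made-up 3-dimensional vectors purely for illustration; real biomedical word embeddings have hundreds of dimensions. Contextual embeddings such as ELMo and BERT differ in that a word's vector depends on the sentence around it. The function names are hypothetical.

```python
import math


def sentence_embedding(tokens, word_vectors):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Made-up toy vectors, for illustration only.
toy_vectors = {
    "brca1": [0.9, 0.1, 0.0], "mutation": [0.7, 0.3, 0.1],
    "breast": [0.8, 0.2, 0.1], "cancer": [0.8, 0.1, 0.2],
    "weather": [0.0, 0.1, 0.9],
}
s1 = sentence_embedding(["brca1", "mutation"], toy_vectors)
s2 = sentence_embedding(["breast", "cancer"], toy_vectors)
s3 = sentence_embedding(["weather"], toy_vectors)
print(cosine_similarity(s1, s2), cosine_similarity(s1, s3))
```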
11:00 - 11:15 am Coffee Break
11:15 - 12:00 pm Demonstration: deep learning tools and datasets for biomedical text mining tasks
  • Named entity recognition
  • Relation extraction
  • Document classification
  • Sentence retrieval
  • Literature-based discovery
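Named entity recognition models commonly emit one BIO tag per token (B- begins an entity, I- continues it, O is outside any entity). A small sketch of decoding such tags into entity spans; the function and the tag names are hypothetical examples.

```python
def bio_to_spans(tokens, tags):
    """Convert per-token BIO tags (e.g. B-Gene, I-Gene, O) into entity spans.

    Returns a list of (entity_type, start_token, end_token_exclusive, text).
    An I- tag that does not continue an open entity of the same type starts a new one.
    """
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # the current entity extends over this token
        if start is not None:
            spans.append((etype, start, i, " ".join(tokens[start:i])))
            start, etype = None, None
        if tag.startswith(("B-", "I-")):
            start, etype = i, tag[2:]
    return spans


tokens = ["BRCA1", "mutations", "cause", "breast", "cancer"]
tags = ["B-Gene", "O", "O", "B-Disease", "I-Disease"]
print(bio_to_spans(tokens, tags))
```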
12:00 - 12:50 pm Hands-on Session: biomedical literature analysis
12:50 - 1:00 pm Q&A and feedback

 

Tutorial AM3: BioC++ - solving daily bioinformatic tasks with C++ efficiently

Sunday, July 12, 9:00 am - 1:00 pm

Room: TBD

Presenters

René Rahn, Max Planck Institute for Molecular Genetics, Algorithmic Bioinformatics, Germany
Svenja Mehringer, Free University Berlin, Algorithmic Bioinformatics, Germany
Marcel Ehrhardt, Free University Berlin, Algorithmic Bioinformatics, Germany

Overview

In this half-day tutorial we will teach how to use modern C++ and modern C++ libraries to rapidly develop tools and scripts for operating on and manipulating large-scale sequencing data.

Motivation

The high variability and heterogeneity often observed within genomic data is challenging for many standard tools, for example for read alignment and variant calling. These tools are therefore often wrapped in complicated pre- and postprocessing data curation steps in order to obtain higher-quality results. However, these additional steps add a considerable maintenance and performance burden to the established workflow and often do not scale to larger data sets. C++ is seldom considered the language of choice for these small processes, although it is the main language used in high-performance computing. We will show that writing modern C++ can be as easy as using other modern high-level languages.

Course outline:

This tutorial is organised as a half-day tutorial. We will begin by introducing fundamental concepts and principles of the C++ programming language. We will then teach how modern C++ features such as ranges and concepts can be used to rapidly develop high-quality C++ applications. This introduction to C++ is followed by a practical session where participants will read in typical files from sequencing experiments using the C++ library SeqAn and operate on the data with the principles taught to solve diverse problems, e.g. filtering out reads with low sequencing quality. In the last 30 minutes we will summarise the concepts learned and compare the developed methods to current approaches.
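The logic of the read-filtering exercise can be sketched independently of the language; the tutorial itself implements it in C++ with SeqAn 3, whose API is not reproduced here. This Python version assumes 4-line FASTQ records with Phred+33 quality encoding and uses the mean per-base Phred score as the filtering criterion, which is one of several reasonable choices. The function names are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) records from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a Phred+33 quality string (0.0 for an empty string)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(handle, min_mean_quality=20):
    """Keep reads whose mean base quality reaches the threshold."""
    return [rec for rec in read_fastq(handle) if mean_phred(rec[2]) >= min_mean_quality]


# 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n")
print([name for name, _, _ in filter_reads(example)])  # ['r1']
```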

Audience

This tutorial is best suited for computational biologists and bioinformaticians whose research focuses on sequence analysis (e.g., genomics, metagenomics, proteomics, read alignment, variant detection, etc.). Fundamental knowledge of sequencing experiments and the data involved is required. We expect attendees to have intermediate knowledge of programming in a high-level programming language, e.g. Python, Java or C++. Some basic C++ knowledge is helpful but not mandatory to successfully complete the course.

This tutorial targets beginner and intermediate C++ developers who want to learn more about modern C++ features such as ranges and concepts.

Requirements:

Attendees should bring their own laptop.
Software for the tutorial can be installed beforehand, but we will also dedicate some extra time for installing required software during the tutorial.

  • Git
  • g++ >= 7
  • SeqAn 3 - (https://github.com/seqan/seqan3)
  • CMake >= 3.12

Alternatively, attendees can install VirtualBox and use the provided virtual machine image running Ubuntu.

Maximum Attendees: 25

Schedule Overview
9:00 - 10:30 am
  • Introduction to modern C++ [talk: 30 min]
  • Initial app and parsing sequencing data [hands-on: 60 min]
10:30 - 11:00 am Coffee Break
11:00 - 12:30 pm Filtering and data manipulation [hands-on: 90 min]
12:30 - 1:00 pm Wrap-up [talk: 30 min]

 

Tutorial PM1: Translational use of multifaceted RNA-Seq bioinformatics analysis in genetic disease investigation

Sunday, July 12, 2:00 pm - 6:00 pm

Room: TBD

Presenters

Gavin R. Oliver, Center for Individualized Medicine, Mayo Clinic, United States
Garrett Jenkinson, Mayo Clinic, United States
Eric W. Klee, PhD, Center for Individualized Medicine, Mayo Clinic, United States

Overview

RNA-Seq is increasingly recognized as a testing modality with significant untapped potential in genetic disease studies. These data present a unique opportunity for diverse, multifaceted analysis. Data profiling methods including expression outlier analysis, aberrant splicing detection, fusion transcript identification and allele-specific expression have been shown to achieve genetic diagnoses of diseases that escaped resolution through traditional clinical and research-based DNA testing. Recently published work has highlighted that RNA-Seq analysis can increase diagnostic rates by as much as 35%, but analytical workflows are diverse and non-trivial to implement or interpret.

This tutorial focuses on the use of RNA-based analysis for improved diagnosis of rare genetic disease. An introduction will be given to the current state of genetic disease diagnostics and the benefits revealed to date by RNA-Seq. RNA-based testing paradigms will be introduced individually and discussed in terms of translational utility, with a focus on data analysis methodologies and considerations. Each computational approach will be reviewed, followed by a hands-on session highlighting the analytical capabilities of a specific informatics solution for that testing paradigm. Means of prioritizing results based on biological and phenotypic relevance will be addressed and cutting-edge computational solutions demonstrated. Finally, we will consider the principles underlying final data integration, review and analysis to maximize the likelihood of patient diagnosis amid a growing data deluge.

Audience

Researchers or scientists with computational or genomics training and an interest in analytical techniques aimed at improving the diagnosis of rare genetic disease. Individuals with prior or current experience in the field of rare genetic disease will be able to apply the knowledge gained immediately in their own work. Programming knowledge (e.g., R, Python, bash, or similar) is required only for participants who wish to perform the practical components of the hands-on sessions. Instructions will be provided for downloading the relevant data and setting up the user’s compute environment. Attendees wishing to perform the practical components of the hands-on sessions must bring their own laptop.

Maximum Audience: 40

Schedule Overview
2:00 - 2:30 pm Introduction
  • An introduction to rare genetic disease
  • A common problem - when rare isn’t rare
  • Rare genetic disease diagnosis in the era of next-generation sequencing
  • The promise of RNA-Seq in improving rare genetic disease diagnosis
2:30 - 3:00 pm Confounding variable correction and outlier expression analysis
  • Introduction to confounding variables in RNA data
  • Concepts of differential expression vs outlier detection
  • Software solutions overview and challenges
  • Machine learning-based correction of confounding variables
  • Outlier gene expression detection and utility in rare genetic disease studies
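The core idea of expression outlier detection can be sketched without OUTRIDER's autoencoder-based confounder correction and negative-binomial model: normalize for library size, log-transform, and flag genes whose expression in one sample is extreme relative to the cohort. This simplified stand-in uses median-of-ratios size factors and per-gene z-scores; the threshold and pseudocount are illustrative assumptions. With the outlier itself included in the cohort statistics, a z-score cannot exceed (n - 1)/sqrt(n) for n samples, so small cohorts need care.

```python
import numpy as np


def expression_outliers(counts, z_threshold=3.0, pseudocount=1.0):
    """Flag (gene, sample) pairs with extreme expression relative to the cohort.

    counts : (genes x samples) raw read counts
    Returns the z-score matrix and a boolean mask of |z| >= z_threshold.
    Genes with (near-)zero variance get NaN z-scores and are never flagged.
    """
    counts = np.asarray(counts, float)
    log_counts = np.log(counts + pseudocount)
    # Median-of-ratios size factors, computed on pseudocounted values.
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    size_factors = np.exp(np.median(log_counts - log_geo_means, axis=0))
    norm = np.log2(counts / size_factors + pseudocount)
    mu = norm.mean(axis=1, keepdims=True)
    sd = norm.std(axis=1, ddof=1, keepdims=True)
    z = (norm - mu) / np.where(sd > 1e-12, sd, np.nan)
    return z, np.abs(z) >= z_threshold


# Made-up cohort: 5 genes x 20 samples, one gene strongly up in sample 0.
counts = np.full((5, 20), 100.0)
counts[0, 0] = 10000.0
z, mask = expression_outliers(counts)
print(np.argwhere(mask))
```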
3:00 - 3:35 pm Hands-On Practicum: OUTRIDER expression analyses
  • Preprocessing and filtering counts
  • Autoencoder-based confounder corrections and library size correction
  • Model building and outlier detection
3:35 - 4:05 pm Fusion transcript detection in rare genetic disease
  • Fusion transcripts: an oncogenic phenomenon?
  • Technical challenges of detecting fusion transcripts in rare genetic disease
  • Software solutions overview and challenges
  • Software customizations to facilitate rare genetic disease diagnosis
  • Filtering strategies and result prioritization
4:05 - 4:20 pm Coffee Break
4:20 - 4:55 pm Hands-On Practicum: Fusion filtering and prioritization
  • Normal tissue based filtering strategies and challenges
  • Phenotypic prioritization using gene lists and PCAN
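The normal-tissue filtering and phenotype-driven prioritization steps can be sketched as simple set operations. This is a hypothetical illustration, assuming fusions are reported as gene pairs and that a panel of fusions seen in normal tissue and a phenotype-derived gene list are available; PCAN uses its own network-based scoring, which is not reproduced here.

```python
def prioritize_fusions(candidate_fusions, normal_fusions, phenotype_genes):
    """Drop fusions also seen in normal tissue, then rank by phenotype relevance.

    candidate_fusions : list of (gene_a, gene_b) pairs called in the patient
    normal_fusions    : set of frozenset({gene_a, gene_b}) observed in normal samples
    phenotype_genes   : set of genes associated with the patient's phenotype
    Returns the remaining fusions, those involving more phenotype genes first.
    """
    kept = [f for f in candidate_fusions if frozenset(f) not in normal_fusions]
    # sorted() is stable, so ties keep the caller's original order.
    return sorted(kept, key=lambda f: -len(set(f) & phenotype_genes))


candidates = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_E", "GENE_F")]
normal_panel = {frozenset({"GENE_B", "GENE_A"})}
print(prioritize_fusions(candidates, normal_panel, {"GENE_F"}))
```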
4:55 - 5:25 pm Identification of aberrant splicing events in rare genetic disease patients
  • Introduction to splicing aberrations in rare genetic disease
  • Software solutions overview and challenges
  • Inverting the problem: intron vs exon usage
  • Intron cluster-based normalization
  • Statistical considerations and refinements for rare genetic disease diagnosis
5:25 - 6:00 pm Hands-On Practicum: Leafcutter for detecting splicing outliers
  • Junction counting and clustering
  • Dirichlet-Multinomial modeling and outlier detection
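LeafCutter groups introns that share splice sites into clusters and models intron usage within each cluster. As a simplified stand-in (per-intron z-scores of usage proportions against controls, not LeafCutter's Dirichlet-multinomial model), the intron-versus-exon inversion can be sketched as follows. The 1e-3 floor on the standard deviation is an arbitrary illustrative choice, and the function names are hypothetical.

```python
import numpy as np


def intron_usage(junction_counts):
    """Convert junction read counts for one intron cluster into usage proportions.

    junction_counts : (samples x introns) counts for introns sharing splice sites
    Samples with no reads in the cluster get all-zero usage.
    """
    counts = np.asarray(junction_counts, float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def usage_outlier_scores(patient_counts, control_counts):
    """Z-score each intron's usage in the patient against the control cohort."""
    patient = intron_usage(np.atleast_2d(patient_counts))[0]
    controls = intron_usage(control_counts)
    mu = controls.mean(axis=0)
    sd = controls.std(axis=0, ddof=1)
    return (patient - mu) / np.maximum(sd, 1e-3)


# Made-up two-intron cluster: controls mostly use intron 0, the patient intron 1.
controls = [[90, 10], [85, 15], [95, 5], [88, 12]]
print(usage_outlier_scores([20, 80], controls))
```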

 

Tutorial PM2: Enhancing Molecular Dynamics Simulations Using Deep Learning

Sunday, July 12, 2:00 pm - 6:00 pm

Room: TBD

Presenters

Emmanuel Salawu, Machine Learning Laboratory, Amazon Web Services, United States
Lee-Wei Yang, Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Taiwan

Overview

Computational studies of molecules (such as Molecular Dynamics (MD) simulations) offer vital approaches for elucidating molecular mechanisms at resolutions that are currently difficult or impossible to obtain from wet-lab experiments. Similarly, Artificial Intelligence techniques (especially Deep Learning, DL) make it possible to study, and even design and create, new molecules with specific desired properties at high success rates. Computational studies of molecules that combine MD simulations and DL techniques are therefore an active area of research and offer unprecedented tools for advancing our understanding of biology at the molecular level.

In this tutorial, we will introduce and teach MD simulations using Classical Mechanics and Force Fields/Empirical Potentials. Temperature Control and Pressure Control, which form the basis for the Canonical Ensemble (NVT) and the Isothermal–isobaric Ensemble (NPT), will also be introduced. Thereafter, we will introduce the Building Blocks and Architectures of Deep Neural Networks (namely, Layers, Activation Functions, Loss Functions, Optimization, etc.), Convolutional Neural Networks (CNN), AutoEncoders, Variational AutoEncoders, and Adversarial Autoencoders. There will be three hands-on subsections: (1) Implementation and Execution of Unbiased/Vanilla MD Simulations of a Medium-Size Protein System; (2) Implementation and Execution of DL-Enhanced MD Simulations for the Protein System; and (3) Comparison of the Vanilla and DL-Enhanced MD Simulation Results.

Participants in this tutorial will benefit from learning MD Simulations, Deep Learning, and their combination (MD+DL). They will have the opportunity to learn and practice writing programs that use MD, DL, and MD+DL techniques to study molecules. This will allow participants to add new cutting-edge tools for Molecular Biology (as well as Computational Chemistry) research to their arsenal.

Audience

The tutorial is suitable for beginners and intermediates, including people with little or no experience in MD Simulations and/or Deep Learning. Basic knowledge of and experience in programming in Python will be a great asset. Estimated Level of Difficulty: Intermediate.

Maximum Audience: 80

Requirements

None, if participants just wish to listen and watch. However, to actively participate in the hands-on sessions, each participant should bring a laptop with the following pre-installed:

1. Anaconda for Python 3.7 (see https://www.anaconda.com/distribution/ )
2. OpenMM for Python 3.7 (see http://docs.openmm.org/latest/userguide/application.html#installing-openmm )
3. PyTorch for Python 3.7 (see https://pytorch.org/ )

Schedule Overview
2:00 - 2:30 pm Introduction to (and Mathematical Formalization of) Molecular Dynamics (MD) Simulations
  • Classical Mechanics
  • Force Fields
  • Canonical Ensemble (NVT)
    • Introduction to Temperature Control
  • Isothermal–isobaric Ensemble (NPT)
    • Introduction to Pressure Control
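To make the integration and temperature-control ideas concrete, here is a minimal NumPy sketch of the velocity Verlet integrator with an optional Berendsen weak-coupling thermostat, in reduced units (k_B = 1) on toy systems. It is not the OpenMM API; OpenMM provides production-quality integrators, thermostats and barostats. The function names are hypothetical.

```python
import numpy as np


def velocity_verlet(x, v, force, mass, dt, n_steps, thermostat=None):
    """Integrate Newton's equations with the velocity Verlet scheme.

    x, v       : initial positions and velocities (NumPy arrays)
    force(x)   : returns the force on each coordinate, -dU/dx
    thermostat : optional callable v -> v applied after each step (NVT sketch)
    Returns the trajectory of positions and the final velocities.
    """
    f = force(x)
    traj = [x.copy()]
    for _ in range(n_steps):
        v = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v                # full-step position update
        f = force(x)
        v = v + 0.5 * dt * f / mass   # second half-step with the new force
        if thermostat is not None:
            v = thermostat(v)
        traj.append(x.copy())
    return np.array(traj), v


def berendsen(target_kT, mass, dt, tau):
    """Berendsen thermostat: rescale velocities toward the target temperature."""
    def rescale(v):
        kT = mass * np.mean(v ** 2)  # equipartition: m<v^2> = k_B T per degree of freedom
        return v * np.sqrt(1.0 + dt / tau * (target_kT / kT - 1.0))
    return rescale


# Harmonic oscillator (k = m = 1): total energy should stay close to 0.5.
traj, v = velocity_verlet(np.array([1.0]), np.array([0.0]), lambda x: -x, 1.0, 0.01, 1000)
print(0.5 * v[0] ** 2 + 0.5 * traj[-1][0] ** 2)
```

The Berendsen scheme drives the kinetic temperature toward the target but does not sample the canonical ensemble exactly; stochastic thermostats such as Langevin dynamics are usually preferred for production NVT runs.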
2:30 - 2:45 pm Examples and Limitations of Vanilla/Unbiased MD Simulations and the Need for Enhanced Sampling
2:45 - 3:15 pm Introduction to Deep Learning
  • Neural Network Building Blocks and Architectures
      • Activation functions
      • Loss functions
    • Optimization
  • Convolutional Neural Networks (CNN)
  • AutoEncoders and Variational AutoEncoders
  • Adversarial Autoencoders
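As a minimal NumPy sketch of the building blocks listed above (layers, an activation function, a mean-squared-error loss and gradient-descent optimization), the function below trains an autoencoder that compresses input vectors, for example flattened coordinates of MD frames, into a low-dimensional latent space. One common idea in DL-enhanced sampling is to use such latent coordinates as collective variables. The tutorial uses PyTorch, which computes gradients automatically; the hand-written backpropagation here is only for illustration, and all names are hypothetical.

```python
import numpy as np


def train_autoencoder(X, latent_dim=2, lr=0.1, epochs=2000, seed=0):
    """Train a one-hidden-layer autoencoder (tanh encoder, linear decoder).

    X : (samples x features) array
    Returns the parameters (W1, b1, W2, b2) and the per-epoch mean-squared loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0, 0.1, (d, latent_dim)), np.zeros(latent_dim)
    W2, b2 = rng.normal(0, 0.1, (latent_dim, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W1 + b1)   # encoder: latent coordinates
        X_hat = Z @ W2 + b2        # decoder: reconstruction
        err = X_hat - X
        losses.append(np.mean(err ** 2))
        # Backpropagate the mean-squared-error loss through both layers.
        g_out = 2 * err / err.size
        gW2, gb2 = Z.T @ g_out, g_out.sum(axis=0)
        g_z = (g_out @ W2.T) * (1 - Z ** 2)  # derivative of tanh
        gW1, gb1 = X.T @ g_z, g_z.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


# Toy data: 3-D points lying on a line, so one latent dimension suffices.
t = np.linspace(-1, 1, 50)
X = np.stack([t, 2 * t, -t], axis=1)
params, losses = train_autoencoder(X, latent_dim=1)
print(losses[0], losses[-1])
```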
3:30 - 4:00 pm Techniques for Achieving Enhanced Sampling in MD Simulations Using Deep Learning (DL)
  • Topics in BioVis (including examples)
  • Visualization of sequences, macromolecules, omics data, biological networks
4:00 - 4:15 pm Coffee Break
4:15 - 4:45 pm Hands-on 1
  • Implementation and Execution of a Vanilla MD Simulation of a Medium-Size Protein System
  • Visualization of the MD Simulation Trajectories
  • Source code will be provided and explained
4:45 - 5:15 pm Hands-on 2
  • Implementation and Execution of a DL-Enhanced MD Simulation for the Protein System
  • Visualization of the DL-Enhanced MD Simulation Trajectories
  • Source code will be provided and explained
5:15 - 5:45 pm Hands-on 3
  • Comparison of the Vanilla and DL-Enhanced MD Simulation Results
  • Source code will be provided and explained
5:45 - 5:55 pm Reflections and Conclusions
5:55 - 6:00 pm Questions and Answers; Closing Remarks


 

Tutorial PM3: Automation of Network Analysis in the Cytoscape Ecosystem

Sunday, July 12, 2:00 pm - 6:00 pm

Room: TBD

Presenters

Dexter Pratt, UC San Diego School of Medicine, United States
Alexander Pico, Gladstone Institutes, United States
John “Scooter” Morris, UCSF, United States

Overview

Cytoscape, one of the most popular tools for network analysis and visualization, is evolving into an ecosystem of web applications and cloud services integrated with the original desktop application. In this workshop, we will demonstrate new workflows involving core components of the ecosystem and the methods by which they can be automated for integration with your scripts, web applications, and Cytoscape desktop apps. The workflows will use ecosystem components including the Cytoscape desktop application, the NDEx public database, the new Cytoscape Integrated Query application (IQuery), and libraries in R, Python, and Javascript. We will begin with an overview of the ecosystem, and then discuss how its components can be applied to two common tasks: the analysis of molecular interaction data and the interpretation of gene sets. The bulk of the workshop will be a hands-on demonstration of how to use standard components in each programming environment in a workflow involving protein interaction data.

Audience

This tutorial is intended for an audience that has prior experience with:

  • R or Python
  • Basic Javascript
  • The Cytoscape desktop application
  • Bioinformatics analysis using R or Python

Participants are required to bring a laptop with Cytoscape 3.8, either R and RStudio or Python 3.5+ and Jupyter Notebook, and a web/Javascript development environment installed. Chrome or Edge is the preferred browser. Detailed instructions will be provided in the weeks prior to the tutorial.

Maximum Audience: 60

Schedule Overview
2:00 - 2:40 pm Introduction
  • Quick introductions: presenters & audience.
  • A Cytoscape ecosystem overview.
  • Review of the four parallel methods for workflows in this tutorial. Unless otherwise stated, each segment will cover the same operations using:
    • Manual operations in Cytoscape, NDEx, IQuery.
    • Cytoscape Automation.
    • Direct access to NDEx and IQuery from Python or R.
    • Direct access to NDEx and IQuery from Javascript.
2:40 - 3:20 pm Setting up the Workspace
  • Clone the tutorial GitHub repository.
  • Launch Cytoscape.
  • Set up an NDEx account and configure Cytoscape to use it.
  • Open Jupyter Notebooks and run the “hello-ecosystem” notebook.
  • Open RStudio and run the “hello-ecosystem” script.
  • Open the “hello-ecosystem” web app.
3:20 - 4:00 pm Network I/O to NDEx and Basic Visualization
  • Download a public network from NDEx.
  • Apply a simple style to the network.
  • Apply a layout to the network.
  • Save the network to NDEx.
4:00 - 4:15 pm Coffee break
4:15 - 5:00 pm Data to Networks
  • Good practices for data formatting, CX name and identifier conventions.
  • Load protein interaction data as a network.
  • Annotate a network with gene expression data.
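The "Data to Networks" step can be sketched in plain Python: turn a tab-separated interaction list into node and edge tables and annotate the nodes with expression values. The column names follow a common source/target/interaction convention. As far as we know, py4cytoscape's `create_network_from_data_frames` accepts pandas DataFrames with an `id` node column and `source`, `target`, `interaction` edge columns, but check its documentation before relying on this. The function and the example genes and values are hypothetical.

```python
import csv
import io


def interactions_to_tables(tsv_text, expression):
    """Build node and edge tables from tab-separated protein interactions.

    tsv_text   : lines of "source<TAB>target<TAB>interaction"
    expression : dict gene -> expression value used to annotate nodes
    Duplicate undirected interactions (A-B and B-A of the same type) are merged.
    """
    edges, seen = [], set()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        source, target, interaction = row
        key = (frozenset((source, target)), interaction)
        if key in seen:
            continue
        seen.add(key)
        edges.append({"source": source, "target": target, "interaction": interaction})
    node_ids = sorted({e[k] for e in edges for k in ("source", "target")})
    nodes = [{"id": n, "expression": expression.get(n)} for n in node_ids]
    return nodes, edges


tsv = "TP53\tMDM2\tpp\nMDM2\tTP53\tpp\nTP53\tEP300\tpp\n"
nodes, edges = interactions_to_tables(tsv, {"TP53": 2.5})  # made-up value
print(nodes)
print(edges)
```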
5:00 - 6:00 pm Additional Topics and Q&A
  • Cluster a network in Cytoscape and save one cluster to NDEx.
  • Launch IQuery to analyze the gene set from the cluster.
  • Save an analysis result network to NDEx.
  • From the example web app, open an analysis result network in Cytoscape.