ISMB 2020 Online Workshop and Tutorial Program

ISMB 2020 will hold a series of online workshops and tutorials prior to the start of the ISMB 2020 virtual conference scientific program.

Tutorial Registration is Closed.

Tutorial 1: Mutational signature analysis: pipelines, machine learning, and benchmarking on synthetic data

Saturday, July 11, 9:00 am - 1:00 pm (Eastern Daylight Time)
Sunday, July 12, 9:00 am - 1:00 pm (Eastern Daylight Time)


Presenters

Steven G. Rozen, Duke-NUS Centre for Computational Biology, Duke-NUS Medical School, Singapore
Arnoud Boot, PhD, Postdoctoral Fellow, Duke-NUS Medical School, Singapore
Ferran Muiños, Institute for Research in Biomedicine Barcelona, The Barcelona Institute of Science and Technology, Barcelona, Spain

Overview

Mutational signature analysis examines patterns of mutations across the genome to infer their causes, and is now an essential component of cancer-genomics studies. Over the last decade, mutational signatures have revealed endogenous mutational processes that are widespread in many cancer types but were not previously known. Signatures also showed that exposure to naturally occurring mutagens that cause liver cancer is much more widespread than suspected. Mutational signature analysis can also provide insight into the causes of specific oncogenic mutations and can reveal gaps in our understanding of the mechanisms of DNA damage and repair. Mutational signatures can either be delineated in experimental systems (e.g. cell culture or rodents) or be discovered by machine learning across sets of hundreds to tens of thousands of tumors. More than 100 mutational signatures have been described, many of which have unknown causes. Reflecting the importance of mutational signature analysis, there are now ~20 software packages that use machine learning to discover mutational signatures and assess their activity in tumors. Unfortunately, the cancer genomics literature contains numerous erroneous mutational signature results stemming from uncritical application of these packages.

We will cover the basic concepts of mutational signature analysis and show how this analysis is important for understanding cancer development, for detecting mutational exposures that cause cancer, and for understanding DNA damage as processed by normal and defective DNA repair. We will introduce the computational analysis needed to delineate mutational signatures in experimental systems (e.g. cultured cells or rodents), including the computational subtraction of the signatures of background mutagenesis and of experimental artifacts. We will cover in detail machine learning approaches to discovering mutational signatures in large sets of tumors and the strengths and weaknesses of these approaches. We will also discuss in depth the importance of benchmarking the machine-learning approaches on synthetic data. Finally, the tutorial will show examples of the importance of interpreting machine-learning results in the light of all available evidence to obtain biologically relevant results.

This tutorial will equip participants with the ability to run machine-learning software to discover mutational signatures and to assess their activity in tumors and with strategies to evaluate the soundness and biological relevance of the results.

Learning Objectives

(1) Understand basic concepts of mutational signature analysis. Understand the importance of mutational signature analysis for research into cancer development, for detecting mutational exposures that cause cancer, and for studying how endogenous and exogenous DNA damage as processed by normal and defective DNA repair leads to particular mutational signatures.

(2) Understand computational analysis for delineating mutational signatures in experimental systems, such as cultured cells or rodents, including subtracting signatures of background mutagenesis and of experimental artifacts.

(3) Understand machine learning approaches to discovering mutational signatures in large sets of mutational spectra, and the opportunities and challenges in using these approaches. Understand available software implementing these approaches. Understand strategies for interpreting the results in the light of all available evidence to resolve unavoidable ambiguities and assess biological relevance.

(4) Understand how testing machine learning methods on synthetic data has revealed the strengths and weaknesses of different approaches.

(5) Understand how processes of DNA damage, repair, and replication interact with the genomic landscape.

Audience

Computing experience: there may be exercises using R from the command line; we will also share code snippets written in Python. We currently hope that most computation can be handled using web servers. Participants will need a basic understanding of genome organization, mutations, and modern high-throughput Illumina-type sequencing (BAM files, variant call files, etc.).

A list of small data sets and possibly software to be pre-downloaded onto students' computers will be available one week before the tutorial.

Maximum Participants: 60

Schedule Overview - Saturday, July 11, 9:00 am - 1:00 pm Eastern Daylight Time
9:00 - 9:05 am Overview of mutational signatures
9:05 - 9:25 am Overview of mutational signatures
9:25 - 10:30 am Lecture 1, Arnoud Boot, Mutational signatures and experimental elucidation of mutational signatures

What are mutational spectra and what are they good for?
"Simple" mutational spectra based on single nucleotide substitutions in trinucleotide context
Applications of mutational signatures
- in cancer epidemiology
- to study DNA damage and repair
- to study oncogenesis (origins of tumors)
- to study mutational processes in non-cancer tissue
Mutational signatures of small indels and doublet base substitutions and of single-base substitutions in extended contexts
Experimentally delineated mutational signatures and spectra; experimental procedures; possible pitfalls, e.g. contamination, differences in DNA repair, conceptual issues. If a compound just increases the background signature, is that a signature? Can you estimate mutagenicity from cell line or mouse experiments, and why is that difficult?
Computational analysis of experimentally delineated signatures: processing and filtering VCFs, possible issues; separating signatures of background mutagenesis and of experimental artifacts from the "target" signature
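The "simple" trinucleotide-context spectrum listed above assigns each single-base substitution to one of 96 channels, reporting it from the pyrimidine (C or T) strand. The Python sketch below is illustrative only: it assumes the reference trinucleotide around each variant has already been extracted from the genome (for example while processing VCFs), and the helper names `sbs96_channel` and `sbs96_spectrum` are hypothetical, not functions from any package used in the tutorial.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Return the SBS96 label (e.g. 'A[C>T]G') for one substitution.

    `context` is the reference trinucleotide centred on the mutated base and
    `alt` is the alternative base. Substitutions of purine (A/G) reference
    bases are reported on the opposite strand, so every label has C or T
    in the middle.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or context[1] == alt:
        raise ValueError("need a trinucleotide context and a different alt base")
    if context[1] in "AG":
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


# 6 pyrimidine substitution classes x 4 possible 5' bases x 4 possible 3' bases.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
CHANNELS = [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product("ACGT", repeat=2)]


def sbs96_spectrum(mutations):
    """Count (context, alt) pairs into a 96-element list ordered like CHANNELS."""
    counts = Counter(sbs96_channel(ctx, alt) for ctx, alt in mutations)
    return [counts[channel] for channel in CHANNELS]
```

Channel order here is substitution class, then 5' base, then 3' base; check it against the catalogue layout your downstream software expects before combining spectra.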

10:30 am (?) Hands-on: Computational analysis of experimentally delineated mutational signatures; subtracting the signatures of background mutagenesis and of experimental artifacts
Noon - 1:00 pm Lecture 2, Steve Rozen, Machine learning for discovering mutational signatures
• The twin problems of signature discovery and determining how much of each signature is present in a tumor (“signature attribution”)
• Signature attribution as a separate problem from signature discovery (using COSMIC and/or experimental signatures)
• Non-negative matrix factorization based approaches
• Challenges in signature discovery and attribution: number of signatures, biological relevance, sparsity versus over-fitting
- Discovery and attribution are not purely algorithmic processes; they require human judgement
1:00 pm End for Saturday
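Many signature-discovery tools build on non-negative matrix factorization (NMF): the mutational catalogue V (channels by tumors) is approximated as W H, where the columns of W are signatures and H holds each tumor's exposures. The numpy sketch below uses the classic Lee-Seung multiplicative updates for squared error. It is a toy illustration with a hypothetical function name `nmf`, not the algorithm of any particular package; real tools add resampling, model selection for the number of signatures k, and other refinements.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Approximate a non-negative catalogue V (channels x samples) as W @ H.

    W (channels x k) holds candidate signatures; H (k x samples) holds exposures.
    Lee-Seung multiplicative updates are designed not to increase the squared
    Frobenius error ||V - WH||^2 and keep every entry non-negative.
    """
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    n_channels, n_samples = V.shape
    W = rng.random((n_channels, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each signature sums to 1 (a distribution over channels);
    # the exposures absorb the scale, so W @ H is unchanged.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]
```

Note that k is an input here: choosing it is exactly the judgement call the lecture highlights, and different random seeds can yield different factorizations that fit the data comparably well.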
Schedule Overview - Sunday, July 12, 9:00 am - 1:00 pm Eastern Daylight Time
9:00 - 9:30 am Lecture 2 (continued), Steve Rozen, Machine learning for discovering mutational signatures

• Hierarchical Dirichlet process approaches
- Assessment with synthetic data
- Strategies for evaluating results from signature discovery and signature attribution
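Benchmarking on synthetic data means building catalogues whose true signatures and exposures are known, running a discovery method on them, and checking how well the known signatures are recovered. Below is a minimal sketch. It assumes Poisson sampling noise around the expected counts (published benchmarks may use other noise models), uses cosine similarity as the recovery score, and relies on hypothetical helper names.

```python
import numpy as np


def make_synthetic_catalog(signatures, exposures, seed=0):
    """Simulate a mutation catalogue with known ground truth.

    signatures: channels x k matrix, each column a profile summing to 1.
    exposures:  k x samples matrix of expected mutations per signature.
    Returns integer counts drawn as Poisson(signatures @ exposures).
    """
    rng = np.random.default_rng(seed)
    expected = np.asarray(signatures, float) @ np.asarray(exposures, float)
    return rng.poisson(expected)


def best_cosine_matches(truth, found):
    """For each ground-truth signature (column of `truth`), return the highest
    cosine similarity to any discovered signature (column of `found`)."""
    t = np.asarray(truth, float)
    f = np.asarray(found, float)
    t = t / np.linalg.norm(t, axis=0)
    f = f / np.linalg.norm(f, axis=0)
    sims = t.T @ f  # k_truth x k_found matrix of cosine similarities
    return sims.max(axis=1)
```

If you feed the simulated catalogue to a discovery method and score its output with `best_cosine_matches`, you can see which true signatures were recovered. A low score flags a signature that was missed or merged with another.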

9:30 - 10:30 am Exercises, machine learning / data mining for discovery, assessment with synthetic data
10:30 - 11:00 am Lecture 3 - Ferran Muiños - Signatures and Genomic Landscapes: Common Themes and Tactics in Genomic Landscape Analyses

Mutational profiles: from relative frequencies to conditional probabilities. A case for normalization: exome vs whole-genome data. Context of inference vs context of application.
Site-specific mutation rates. Assessing mutation rate heterogeneity. Expected number of mutations in genomic segments. The case of exon vs intron mutation rate.
Assessing mutation rate anomalies. The case of clustered mutations.
Shuffling mutations in genome segments according to prescribed mutational profiles.
Signature attribution revisited: using signature attributions to estimate mutational risks.
Synthetic mutational profiles revisited: injection of mutational signatures.
Assessing similarity: mutational profiles and exposure profiles. Cosine similarity. Relative entropy. Statistical similarity. Clustering samples by exposure profiles.
Odds and ends. Summary and take-home messages.
Description of code snippets and utilities supplied in the repository.
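Several of the lecture's themes reduce to short calculations. Counts become per-site rates when divided by how often each context occurs in the sequenced territory. An exome profile can be re-expressed on the whole-genome scale. Summing per-site rates gives the expected number of mutations in a segment, and profiles can be compared by relative entropy. The sketch below shows one common way to do each. It assumes rates indexed by reference context and measures Kullback-Leibler divergence in nats. The function names are hypothetical, and this is not the lecture's own code.

```python
import numpy as np


def per_context_rates(counts, opportunities):
    """Mutations per available site for each context: counts / opportunities."""
    return np.asarray(counts, float) / np.asarray(opportunities, float)


def rescale_profile(profile, source_opp, target_opp):
    """Re-express a relative-frequency profile measured on one territory
    (e.g. exome) on another (e.g. whole genome): weight each channel by the
    ratio of its context's abundance in the two territories, then renormalise."""
    w = (np.asarray(profile, float) * np.asarray(target_opp, float)
         / np.asarray(source_opp, float))
    return w / w.sum()


def expected_mutations(site_contexts, context_rates):
    """Expected number of mutations in a segment.

    site_contexts: context index of every site in the segment.
    context_rates: per-site mutation probability for each context
                   (summed over alternative bases).
    """
    rates = np.asarray(context_rates, float)
    return float(np.sum(rates[np.asarray(site_contexts)]))


def relative_entropy(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q), in nats, between two profiles."""
    p = np.asarray(p, float) / np.sum(p)
    q = np.asarray(q, float) / np.sum(q)
    mask = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))
```

Relative entropy is asymmetric: D(p || q) generally differs from D(q || p). Cosine similarity is often reported alongside it for that reason.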

11:00 - 11:15 am Break
11:15 am - 1:00 pm Wrap-up discussion, future prospects and challenges, pointers to resources
1:00 pm End of the course

Tutorial 2: Finding and analyzing data in the cloud with Gen3, Dockstore, Terra, and Galaxy

Thursday, July 9, 9:00 am - 1:00 pm (Eastern Daylight Time)

Agenda with Tutorial Materials
Presenters

Geraldine Van der Auwera, Broad Institute of MIT and Harvard, United States
Robert Majovski, Broad Institute of MIT and Harvard, United States

Overview

The era of big data for biomedical research is here. Massive data sets and cloud-based platforms will enable breakthrough discoveries while overcoming challenges of cost, accessibility, and security. A key strength of this new research landscape is interoperable, community-driven components that enable robust analyses for a variety of research needs.

Audience

Researchers and bioinformaticians interested in ways to maximize data and analysis resources in the cloud. The ideal tutorial participant will have coding experience and basic familiarity with genomics terminology and standard high-throughput sequencing data formats.

Goals

We will guide you through the capabilities and components of the NHGRI Genomic Data Science Analysis, Visualization and Informatics Lab-space (AnVIL) resource. You will gain working knowledge of how the components work together to perform an end-to-end genetic analysis.

Slack: Please join us in the #ismb-2020 channel at https://anvilproject.org/contact

Virtual Event Agenda, all times ET
  • 9:00   AnVIL: A new vision for Analysis in the Cloud - Presenter: Mike Schatz [PDF]
  • 9:15   Data that’s better, bigger, faster in the AnVIL - Presenter: Liz Kiernan [PDF]
  • 9:25   Intro to Terra Overview - Presenter: Tiffany Miller [PDF]
  • 9:40   Get set up in Terra (hands-on) - Presenter: Allie Hajian [PDF]
  • 9:55   Break
  • 10:10  Data and documentation in a Workspace - Presenter: Tiffany Miller [PDF]
  • 10:30  Find and import workflows in Dockstore (hands-on) - Presenter: Tiffany Miller [PDF]
  • 10:45  Set up and run your workflow (hands-on) - Presenter: Tiffany Miller [PDF]
  • 11:00  Break
  • 11:20  Workflows outputs and troubleshooting - Presenter: Jason Cerrato [PDF]
  • 11:30  Interactive analysis (hands-on) plus Hail intro - Presenter: Allie Hajian [PDF]
  • 11:50  Break
  • 12:05  Bioconductor for RNA-seq analysis (hands-on) - Presenter: Liz Kiernan [PDF]
  • 12:50  Wrap-up / Q&A - Presenter: Mike Schatz [PDF]

Tutorial 3: Full-Length RNA-Seq Analysis using PacBio long reads: from reads to functional interpretation

Sunday, July 12, 9:00 am - 1:00 pm (Eastern Daylight Time)

Presenters

Ana Conesa, University of Florida, United States
Elizabeth Tseng, Pacific Biosciences, United States
Angeles Arzalluz, Polytechnic University of Valencia, Spain
Francisco Pardo, Polytechnic University of Valencia, Spain
Carmen Guarco, Pacific Biosciences, United States

Overview

The PacBio Single-Molecule Real-Time (SMRT) sequencing technology produces highly accurate long reads that are suitable for full-length RNA sequencing. The Iso-Seq method generates full-length transcript sequences, including transcripts of 10 kb or longer, that do not require transcript assembly or error correction. The high accuracy (>99%) of Iso-Seq transcripts allows unambiguous characterization of alternative splicing events, direct ORF prediction without a reference genome, and identification of single-cell barcodes.

The unique features of Iso-Seq data require a special set of bioinformatics tools that typical short-read RNA-seq tools do not provide. The PacBio SMRT Analysis software processes raw sequencing data into full-length transcript sequences, which can then be analyzed with community tools developed specifically for long-read data. SQANTI compares Iso-Seq transcripts against known annotations (e.g. GENCODE) to classify novel vs. known genes and transcripts and to remove artifacts. IsoAnnot functionally annotates Iso-Seq transcripts, and tappAS compares multiple Iso-Seq samples to identify differential features. Existing short-read RNA-seq data are often paired with Iso-Seq data to strengthen the analysis.

The Iso-Seq method can also be applied to single-cell analysis. Matching single-cell libraries of both long- and short-read data can be generated and combined. The deeper coverage of short reads is used to identify cell types, while matching cell barcodes link the full-length isoforms from the long-read data back to individual cell types.

In this tutorial, we provide an overview of the Iso-Seq tools for both bulk and single-cell RNA-seq analysis and guide the audience through hands-on analyses.

Audience

Beginner or intermediate. This tutorial will be of broad interest to researchers from academia or industry who want to understand the unique features and tool sets of long-read RNA sequencing (Iso-Seq) data generated with PacBio’s SMRT Technology.

Attendees are expected to have basic Unix command line skills and some familiarity with R/RStudio. Programming knowledge is not required, though most of the tools are written in Python.

Maximum Audience: 30

Requirements

Attendees are expected to bring their own laptops and have installed R/RStudio and the tappAS software. We will be using a shared instance in AWS for the first part of the analysis (Iso-Seq and SQANTI), then running tappAS on the local laptops.

Schedule Overview
9:00 - 9:30 am Introduction
  • Introduction to PacBio SMRT Technology and the Iso-Seq method (full-length RNA-Seq)
  • Review the official PacBio software (SMRT Analysis/BioConda)
  • Review the downstream community Iso-Seq tools: SQANTI, IsoAnnot, tappAS
9:30 - 10:15 am Demo & Hands-On Session: Iso-Seq using BioConda
  • Using a small human whole transcriptome Iso-Seq dataset, run through the Iso-Seq pipeline using BioConda
  • Learn to visualize transcript GFF files in IGV and UCSC genome browser
10:15 - 11:00 am Demo & Hands-On Session: Functional analysis of Iso-Seq data
  • Use SQANTI to annotate novel/known genes/transcripts and remove artifacts
  • Use IsoAnnot to functionally annotate transcripts
  • Use tappAS to identify differentially expressed features across samples
11:00 - 11:15 am Break
11:15 - 11:45 am Single Cell Iso-Seq
  • Overview of applying long read sequencing for single cell transcript analysis
12:15 - 12:45 pm Hands-On Session: Single Cell Iso-Seq + RNA-Seq
  • Using a small single cell Iso-Seq and RNA-seq dataset, run through the Iso-Seq single cell pipeline
  • Learn to combine long & short read single cell data using RStudio
12:50 - 1:00 pm Wrap Up

Tutorial 4: A practical introduction to biomedical text mining in the era of deep learning

Sunday, July 12, 9:00 am - 1:00 pm (Eastern Daylight Time)
Tutorial 4 Materials
Presenters

Qingyu Chen, National Library of Medicine, National Institutes of Health
Robert Leaman, National Library of Medicine, National Institutes of Health
Cecilia Arighi, Delaware Biotechnology Institute, University of Delaware
Zhiyong Lu, National Library of Medicine, National Institutes of Health

Overview

The volume of biomedical literature is growing at an exponential rate. PubMed, a biomedical literature search engine managed by the National Library of Medicine, indexes ~2 new articles per minute. Such rapid growth challenges manual information extraction, curation and annotation. Biomedical text mining applies natural language processing techniques to the biomedical literature to automatically assist biocurators, biologists and health professionals in overcoming this burden. Biomedical text mining has matured significantly in recent years. In particular, deep learning (end-to-end neural networks inspired by biological systems) has achieved state-of-the-art performance in a range of biomedical text mining applications. In the bioinformatics community, text mining via deep learning is increasingly used to support other research in the biological and medical sciences. Beyond standalone tools, deep learning models have also been deployed on public web servers, improving the quality of biomedical text mining tools and lowering the barriers for non-specialists.

This tutorial aims to introduce the audience to text mining of the biomedical literature using deep learning methods and to provide hands-on training. The tutorial will address questions such as “What is biomedical text mining?”, “What is deep learning?”, “How can deep learning be applied to address biomedical text mining problems?”, and “What biomedical text mining tools are currently available?”. It will cover the basics of biomedical text mining and deep learning with concrete examples, and will explain and discuss the latest deep learning methods in biomedical text mining. The audience will also have the opportunity to gain first hands-on experience in developing their own deep learning models for biomedical literature analysis. Topics include:

  • Fundamentals of biomedical text mining and literature mining
  • Overview of deep learning in biomedical text mining
  • Word, sentence, concept embeddings for biomedical textual analysis
  • Public biomedical text mining tools for biomedical information retrieval and extraction
  • Case studies: biomedical literature analysis

This tutorial is an activity of the ISCB COSI on Text Mining.

Audience

We intend the tutorial for participants who are not text mining specialists but who use text mining or are interested in using it. The tutorial will provide a brief introduction, including a description of existing tools and datasets. In addition, the session will give participants an opportunity to describe their needs to text mining specialists.

Maximum Audience: 60

Requirements

None, if participants just wish to listen. Those who would like to also participate in the hands-on exercises are required to provide their own laptop and should have a basic knowledge of programming in Python.

Schedule Overview
9:00 - 9:25 am Introduction to biomedical text mining
  • Biomedical text processing pipeline
  • Biomedical text mining use cases
  • Q&A
9:25 - 9:30 am Short break
9:30 - 9:55 am Introduction to deep learning
  • Basics of deep learning
  • An overview of different deep learning models (more detail in the later session)
  • Q & A
9:55 - 10:00 am Short break
10:00 - 11:00 am Biomedical language models
  • Word embedding
  • Concept embedding
  • Sentence embedding
  • Contextual embedding (ELMO and BERT)
  • Q & A
11:00 - 11:15 am Long Break
11:15 - 12:00 pm S4. Demonstration: deep learning tools and datasets for biomedical text mining tasks
  • Named entity recognition
  • Relation extraction
  • Document classification
  • Sentence retrieval
  • Literature-based discovery
  • Q & A

Tutorial 5: BioC++ - solving daily bioinformatic tasks with C++ efficiently

Sunday, July 12, 9:00 am - 1:00 pm (Eastern Daylight Time)

Presenters

René Rahn, Max Planck Institute for Molecular Genetics, Algorithmic Bioinformatics, Germany
Svenja Mehringer, Free University Berlin, Algorithmic Bioinformatics, Germany
Marcel Ehrhardt, Free University Berlin, Algorithmic Bioinformatics, Germany

Overview

In this half-day tutorial, we will teach how to use modern C++ and modern C++ libraries to rapidly develop tools and scripts for operating on and manipulating large-scale sequencing data.

Motivation

The high variability and heterogeneity often observed in genomic data challenge many standard tools, for example tools for read alignment and variant calling. These tools are often wrapped in complicated pre- and post-processing data curation steps in order to obtain higher-quality results. However, these additional steps add a considerable maintenance and performance burden to the established workflow and often do not scale to larger data sets. C++ is seldom considered the language of choice for these small processes, even though it is the main language used in high-performance computing. We will show that writing modern C++ can be as easy as using other modern high-level languages.

Course outline:

This tutorial is organised as a half-day tutorial. We will begin by introducing fundamental concepts and principles of the C++ programming language. We will then teach how modern C++ features such as ranges and concepts can be used to rapidly develop high-quality C++ applications. This introduction is followed by a practical session in which participants will read typical files from sequencing experiments using the C++ library SeqAn. They will then apply the principles taught to solve diverse problems, e.g. filtering out reads with low sequencing quality. In the last 30 minutes, we will summarise the concepts learned and compare the developed methods with current approaches.
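The quality-filtering exercise is done in C++ with SeqAn in the tutorial. The Python sketch below illustrates only the logic and does not use SeqAn's API. It parses four-line FASTQ records and keeps reads whose mean Phred score meets a threshold. It assumes Phred+33 quality encoding and uses an arbitrary example threshold of 20. The function names are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield records from four-line FASTQ text (e.g. an open file or a list)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        seq, qual = seq.rstrip("\n"), qual.rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield Read(header[1:].rstrip("\n"), seq, qual)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_reads(reads: Iterable[Read], min_mean_quality: float = 20.0) -> Iterator[Read]:
    """Keep reads whose mean base quality is at least the threshold."""
    for read in reads:
        if read.quality and mean_phred(read.quality) >= min_mean_quality:
            yield read
```

For example, `with open("reads.fastq") as fh: kept = list(filter_reads(parse_fastq(fh)))` (the file name is a placeholder). Because both functions are generators, large files are processed one record at a time.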

Audience

This tutorial is best suited to computational biologists and bioinformaticians with a research focus on sequence analysis (e.g., genomics, metagenomics, proteomics, read alignment, variant detection, etc.). Fundamental knowledge of sequencing experiments and the data they produce is required. We expect attendees to have intermediate programming knowledge in a high-level programming language, e.g. Python, Java or C++. Basic C++ knowledge is helpful but not required to successfully complete the course.

This tutorial targets beginner and intermediate C++ developers who want to learn more about modern C++ features such as ranges and concepts.

Requirements:

Attendees should bring their own laptop.
Software for the tutorial can be installed beforehand, but we will also dedicate some extra time for installing required software during the tutorial.

  • Git
  • g++ >= 7
  • SeqAn 3 - (https://github.com/seqan/seqan3)
  • CMake >= 3.12

or, VirtualBox if the attendee wishes to use the provided virtual image running Ubuntu.

Maximum Attendees: 30

Schedule Overview
9:00 - 10:30 am Introduction to modern C++ [talk: 30 min]
Initial app and parsing sequencing data [hands-on: 60 min]
10:30 - 11:00 am Break
11:00 - 12:30 pm Filtering and data manipulation (hands-on)
12:30 - 1:00 pm Wrap-up [talk: 30 min]

Tutorial 6: Translational use of multifaceted RNA-Seq bioinformatics analysis in genetic disease investigation

Sunday, July 12, 9:00 am - 1:00 pm (Eastern Daylight Time)

Presenters

Gavin R. Oliver, Center for Individualized Medicine, Mayo Clinic, United States
Garrett Jenkinson, Mayo Clinic, United States
Eric W. Klee, PhD, Center for Individualized Medicine, Mayo Clinic, United States

Overview

RNA-Seq is increasingly recognized as a testing modality with significant untapped potential in the field of genetic disease studies. These data present a unique opportunity for diverse, multifaceted analysis. Data profiling methods have been shown to achieve genetic diagnoses of diseases that remained unresolved after traditional clinical and research-based DNA testing. These methods include expression outlier analysis, aberrant splicing detection, fusion transcript identification and allele-specific expression. Recently published work has highlighted the ability of RNA-Seq analysis to increase diagnostic rates by as much as 35%, but analytical workflows are diverse and non-trivial to implement or interpret. This tutorial focuses on the use of RNA-based analysis for the improved diagnosis of rare genetic disease. An introduction will be given to the current state of genetic disease diagnostics and the benefits revealed to date by RNA-Seq. RNA-based testing paradigms will be introduced individually and discussed in terms of translational utility, with a focus on data analysis methodologies and considerations. Each computational analysis approach will be reviewed, with hands-on sessions highlighting the analytical capabilities of a specific informatics solution for each testing paradigm. Means of prioritizing results based on biological and phenotypic relevance will be addressed and cutting-edge computational solutions demonstrated. Finally, we will consider the principles underlying final data integration, review and analysis to maximize the likelihood of patient diagnosis amid a growing data deluge.
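Expression outlier analysis asks whether a gene's expression in one patient is extreme relative to a cohort. The sketch below shows the bare idea with a plain z-score on log counts per million. It is deliberately simpler than OUTRIDER (used in the hands-on session), which models counts with a negative binomial distribution after autoencoder-based confounder correction. The function name and the threshold of 3 are illustrative assumptions.

```python
import numpy as np


def expression_outliers(counts, z_threshold=3.0, pseudocount=1.0):
    """Flag gene x sample entries whose expression is extreme within the cohort.

    counts: genes x samples matrix of raw read counts.
    Steps: library-size normalisation (counts per million), log2 transform,
    then a z-score for each gene across samples.
    Returns (z_scores, boolean mask where |z| >= z_threshold).
    """
    counts = np.asarray(counts, float)
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    logexpr = np.log2(cpm + pseudocount)
    mu = logexpr.mean(axis=1, keepdims=True)
    sd = logexpr.std(axis=1, ddof=1, keepdims=True)
    # Genes with zero variance get z = 0 instead of a division error.
    z = (logexpr - mu) / np.where(sd > 0, sd, np.inf)
    return z, np.abs(z) >= z_threshold
```

With n samples, a single value's |z| cannot exceed (n - 1)/sqrt(n), so small cohorts cap what this score can flag. Unmodeled confounders (batch, tissue, sex) also inflate the spread, which is why dedicated methods correct for them first.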

Audience

Researchers or scientists with computational or genomics training and an interest in analytical techniques aimed at the improved diagnosis of rare genetic disease. Individuals with past or current experience in the field of rare genetic disease will be able to apply the knowledge gained immediately in their own work. Experience programming in R would be useful, but practical sessions will be conducted within a Jupyter environment, enabling code to be followed and executed without programming expertise. Attendees wishing to perform the practical components of the hands-on sessions are required to provide their own laptop.

Maximum Audience: 40

Schedule Overview
9:00 - 9:30 am Introduction
  • An introduction to rare genetic disease
  • A common problem - when rare isn’t rare
  • Rare genetic disease diagnosis in the era of next-generation sequencing
  • The promise of RNA-Seq in improving rare genetic disease diagnosis
9:30 - 10:00 am Confounding variable correction and outlier expression analysis
  • Introduction to confounding variables in RNA data
  • Concepts of differential expression vs outlier detection
  • Software solutions overview and challenges
  • Machine learning-based correction of confounding variables
  • Outlier gene expression detection and utility in rare genetic disease studies
10:00 - 10:35 am Hands-On Practicum: OUTRIDER expression analyses
  • Preprocessing and filtering counts
  • Autoencoder-based confounder corrections and library size correction
  • Model building and outlier detection
10:35 - 11:05 am Fusion transcript detection in rare genetic disease
  • Fusion transcripts: an oncogenic phenomenon?
  • Technical challenges of detecting fusion transcripts in rare genetic disease
  • Software solutions overview and challenges
  • Software customizations to facilitate rare genetic disease diagnosis
  • Filtering strategies and result prioritization
11:05 - 11:20 am Break
11:20 - 11:55 am Hands-On Practicum: Fusion filtering and prioritization
  • Normal tissue based filtering strategies and challenges
  • Phenotypic prioritization using gene lists and PCAN
11:55 - 12:25 pm Identification of aberrant splicing events in rare genetic disease patients
  • Introduction to splicing aberrations in rare genetic disease
  • Software solutions overview and challenges
  • Inverting the problem: intron vs exon usage
  • Intron cluster-based normalization
  • Statistical considerations and refinements for rare genetic disease diagnosis
12:25 - 1:00 pm Hands-On Practicum: Leafcutter for detecting splicing outliers
  • Junction counting and clustering
  • Dirichlet-Multinomial modeling and outlier detection
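LeafCutter-style analysis, as outlined above, counts split reads per junction, groups junctions that share splice sites into intron clusters, and models each junction's share of its cluster. The sketch below covers only the first steps in simplified form. It links introns that share a start or end coordinate (LeafCutter's actual clustering applies additional rules, e.g. read-count filters) and computes per-sample usage fractions. It is not LeafCutter's code and omits the Dirichlet-multinomial model. The names are hypothetical.

```python
from collections import defaultdict


def cluster_junctions(junctions):
    """Group introns (chrom, start, end) that share a start or end coordinate.

    Union-find over shared coordinates, so introns linked through a chain of
    shared splice sites end up in the same cluster. Strand is ignored here.
    """
    parent = {j: j for j in junctions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    by_site = defaultdict(list)
    for j in junctions:
        chrom, start, end = j
        by_site[(chrom, "start", start)].append(j)
        by_site[(chrom, "end", end)].append(j)
    for members in by_site.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    clusters = defaultdict(list)
    for j in junctions:
        clusters[find(j)].append(j)
    return list(clusters.values())


def intron_usage(counts, clusters):
    """Fraction of each cluster's reads supporting each intron in one sample."""
    usage = {}
    for cluster in clusters:
        total = sum(counts.get(j, 0) for j in cluster)
        for j in cluster:
            usage[j] = counts.get(j, 0) / total if total else 0.0
    return usage
```

Working with within-cluster fractions rather than raw junction counts removes much of the dependence on overall gene expression. This is the point of the intron cluster-based normalization listed above.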

Tutorial 7: Automation of Network Analysis in the Cytoscape Ecosystem

Sunday, July 12, 9:00 am - 1:00 pm (Eastern Daylight Time)

Tutorial 7 Materials
Presenters

Dexter Pratt, UC San Diego School of Medicine, United States
Alexander Pico, Gladstone Institutes, United States
John “Scooter” Morris, UCSF, United States

Overview

Cytoscape, one of the most popular tools for network analysis and visualization, is evolving into an ecosystem of web applications and cloud services integrated with the original desktop application. In this workshop, we will demonstrate new workflows involving core components of the ecosystem. We will also show how these workflows can be automated for integration with your scripts, web applications, and Cytoscape desktop apps. The workflows will use ecosystem components including the Cytoscape desktop, the NDEx public database, the new Cytoscape Integrated Query application (IQuery), and libraries in R, Python, and JavaScript. We will begin with an overview of the ecosystem, and then discuss how its components can be applied to two common tasks: the analysis of molecular interaction data and the interpretation of gene sets. The bulk of the workshop will be a hands-on demonstration of how to use standard components in each programming environment in a workflow involving protein interaction data.

Audience

This tutorial is intended for an audience that has prior experience with:

  • R or Python
  • Basic JavaScript
  • The Cytoscape desktop application
  • Bioinformatics analysis using R or Python

Participants are required to bring a laptop with Cytoscape 3.8, either R and RStudio or Python 3.5+ and Jupyter notebooks, and an environment for web/JavaScript development installed. Chrome or Edge is the preferred browser. Detailed instructions will be provided in the weeks prior to the tutorial.

Maximum Audience: 60

Schedule Overview
9:00 - 9:40 am Introduction
  • Quick introductions: presenters & audience.
  • A Cytoscape ecosystem overview.
  • Review of the four parallel methods for workflows in this tutorial. Unless otherwise stated, each segment will cover the same operations using:
    • Manual operations in Cytoscape, NDEx, IQuery.
    • Cytoscape Automation.
    • Direct access to NDEx and IQuery from Python or R.
    • Direct access to NDEx and IQuery from JavaScript.
9:40 - 10:20 am Setting up the Workspace
  • Clone the tutorial GitHub repository.
  • Launch Cytoscape.
  • Set up an NDEx account and configure Cytoscape to use it.
  • Open Jupyter Notebooks and run the “hello-ecosystem” notebook.
  • Open RStudio and run the “hello-ecosystem” script.
  • Open the “hello-ecosystem” web app.
10:20 - 11:00 am Network I/O to NDEx and Basic Visualization
  • Download a public network from NDEx.
  • Apply a simple style to the network.
  • Apply a layout to the network.
  • Save the network to NDEx.
11:00 - 11:15 am Break
11:15 - 12:00 pm Data to Networks
  • Good practices for data formatting, CX name and identifier conventions.
  • Load protein interaction data as a network.
  • Annotate a network with gene expression data.
12:00 - 1:00 pm Additional Topics and Q&A
  • Cluster a network in Cytoscape and save one cluster to NDEx.
  • Launch IQuery to analyze the gene set from the cluster.
  • Save an analysis result network to NDEx.
  • From the example web app, open an analysis result network in Cytoscape.

Registration Fees

ISCB MEMBER FEES - Virtual Tutorials
Tutorial 1 will be held on two mornings
Tutorials 2 - 7 will be held on one morning
All times Eastern Daylight Time

Registration category | High Income Countries | Middle-Low Income Countries | Low Income Countries
Student (Tutorial 1) | $100.00 | $50.00 | $20.00
Post Doc (Tutorial 1) | $100.00 | $50.00 | $20.00
Professional: Academic, Non-profit, Government, or Corporate (Tutorial 1) | $100.00 | $50.00 | $20.00
Student (Tutorials 2 - 7) | $50.00 | $25.00 | $10.00
Post Doc (Tutorials 2 - 7) | $50.00 | $25.00 | $10.00
Professional: Academic, Non-profit, Government, or Corporate (Tutorials 2 - 7) | $50.00 | $25.00 | $10.00
NON-MEMBER FEES - Virtual Tutorials
(fee includes 1 year ISCB membership)

Registration category | High Income Countries | Middle-Low Income Countries | Low Income Countries
Student (Tutorial 1) | $165.00 | $85.00 | $35.00
Post Doc (Tutorial 1) | $195.00 | $85.00 | $35.00
Professional: Academic, Non-profit, Government, or Corporate (Tutorial 1) | $240.00 | $105.00 | $55.00
Student (Tutorials 2 - 7) | $110.00 | $55.00 | $25.00
Post Doc (Tutorials 2 - 7) | $140.00 | $55.00 | $25.00
Professional: Academic, Non-profit, Government, or Corporate (Tutorials 2 - 7) | $185.00 | $75.00 | $40.00