NGS Post-Conference Workshops
Accessing structural variation and population genomics data in Ensembl

Dr. Benjamin Moore, Ensembl Outreach Officer, EMBL-EBI
Laurent Gil, Ensembl Variation Developer, EMBL-EBI

The Ensembl genome browser (www.ensembl.org) provides visualisation and analysis of integrated genomic data, including genes, variants, comparative genomics and gene regulation, for over 70 species [1].

In this workshop, participants will learn about the structural variation and population genomics data in Ensembl and how to access these data using the Ensembl genome browser. A series of short demonstrations and exercises will allow participants to retrieve these data from Ensembl and consider how they might inform their own research.

Participants will also learn how to upload their own data to view in the Ensembl genome browser alongside the publicly available data, and how to analyse the consequences of sets of variants using the Variant Effect Predictor (VEP).

Learning objectives:
• Gain confidence to explore structural variation and population genomics data in Ensembl, and to export these data.
• Explore transcript haplotype and linkage disequilibrium data in Ensembl.
• Use the Variant Effect Predictor (VEP) to analyse the consequences of variants.
• Learn how to upload custom sequencing and variation data to the Ensembl browser.

Audience Level:
This is an introductory workshop and is aimed at new and intermediate Ensembl users from the wet-lab, bioinformatics and medical communities.

Large-scale reproducible NGS analyses with Nextflow

• Paolo di Tommaso, Research Software Engineer, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology
• Emilio Palumbo, Bioinformatics Technician, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology
• Anna Vlasova, Bioinformatics Technician, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology

The exponential growth of biological datasets over the last decade, together with the increasing complexity and variability of analysis methods, has made standard, reproducible data analysis difficult. The increasing heterogeneity of computational environments, such as local workstations, large compute clusters and cloud services, has created a need for easily portable computational workflows. This workshop will introduce Nextflow, a pipeline management tool designed to tackle these problems, and its application to NGS data analysis. We will show how to make NGS pipelines flexible, reproducible and portable across different execution environments.

Reproducibility and computational capacity have become two of biology’s most pressing issues, owing to the exponential growth of biological datasets and the reliance on increasingly complex methods and data analysis workflows.

This workshop will introduce the state-of-the-art technologies and best practices proposed to tackle the problem of irreproducible NGS data analyses, and will show how to make large omics workflows portable across HPC clusters and cloud environments. The most common NGS use cases and applications will be discussed.

Audience Level:
The session is aimed at people interested in the computational processing and analysis of big datasets in an efficient and reproducible manner. It will be of particular interest to computational biologists and bioinformaticians. Participants should ideally have working experience with the command line and Linux. Basic knowledge of computational workflows is a plus.

Finding and Accessing Human Genome Data
Organizers: Manuel Corpas, Scientific Lead, Repositive

Repositive is a Cambridge-based (UK) startup building a platform for facilitating access to and reuse of genomic data. The platform includes features for discovery of available data, making your data visible to the research community, managing your data sources and initiating data collaborations. The platform also integrates methods for secure data sharing and a patented mechanism for secure privacy-preserving data access.

Repositive is a spin-out of the charity DNAdigest, which was started by Fiona Nielsen in 2013 to educate, facilitate, and engage on issues regarding access to genomic data for research. DNAdigest is a non-profit organisation founded in Cambridge, UK. It is a community of individuals from diverse backgrounds who all want to see data used to its full potential for the benefit of patients. DNAdigest organises events, issues a newsletter, conducts interviews and runs a blog.

Both organisations are part of the Global Alliance for Genomics and Health.

1) Discuss current challenges in data sharing, focusing especially on researchers studying human genomic datasets.
2) Present a number of tools and resources for finding, accessing and sharing genomic data:
- Repositive
- EGA (European Genome-phenome Archive)
- dbGaP
- GigaScience
- Nature Scientific Data
- figshare
3) Discuss best practices for using data to power hypothesis testing and maximise research impact.

Audience Level:
• Genomic data providers
• Users of massive genomic datasets
• Bioinformaticians
• Data scientists
• Publishers
• Data curators

