PRE-CONFERENCE WORKSHOPS, TUTORIALS, AND MINI-COURSES

Monday, November 5 - Tuesday, November 6, 2018
Location: Universidad Andrés Bello - Viña del Mar

Many of the tutorial offerings below require the use of a computer. Participants are strongly encouraged to bring their own laptop! A limited number of laptops can be provided to those who are unable to bring their own.

Identifying potentially deleterious genetic variants in families with rare Mendelian diseases: from whole-exome sequencing to variant annotation and selection

Goals and Objectives:
Millions of families worldwide are affected by rare Mendelian disorders, inherited disease traits that are controlled by a single, extremely rare genetic variant. By applying next-generation sequencing to a select group of individuals in an affected family, it is possible to identify and extract potentially disease-causing variants from the background of natural human variation for further laboratory validation. The goal of this tutorial session is to provide participants with a good understanding of the computer-based aspect of this line of research. Attendees will be provided with a step-by-step introduction to the sequencing and variant analysis of families with Mendelian diseases, from the quality control of raw sequencing data to the annotation and final selection of candidate variants for the disease. By the end of the session, participants will have the basic knowledge to carry out their own sequencing analysis given the appropriate sequencing data and accompanying clinical diagnosis. The session will provide a timely bioinformatics primer in an area of genomic medicine that remains largely unexplored in Latin America, and yet is a crucial component of the future of medicine both at home and worldwide.

Intended Audience:
The session will be open to any postgraduate-level researcher (or above) in genetics and/or genomics with a basic understanding of next-generation sequencing analysis (or with a strong interest in these topics). Familiarity with the Linux command line is desirable but not required.

Description:
Millions of families worldwide are affected by rare Mendelian disorders, inherited disease traits that are controlled by a single, extremely rare genetic variant. Thanks to important breakthroughs in the field of genomics over the past two decades, most notably the advent of next-generation sequencing, it is now possible to sequence several members of an affected family with the goal of identifying these disease-causing variants and thus providing much needed genetic counselling. In this session, we will provide the basic training necessary to carry out the computer-based aspect of this exciting and necessary area of genomic medicine, guiding current and aspiring bioinformaticians through the required steps of a typical analysis. Participants will be introduced to the tools, formats and thinking that accompany each of the critical steps of an analysis, namely quality control, sequence alignment, variant calling, and variant annotation and selection. Attendees will get the opportunity to simulate their own analysis, from the raw sequencing data up to the selection of candidate variants for the disease.


Analysing large datasets with Apache Spark

Trainer:
Apurva Nandan (CSC, Finland)

Goals and Objectives:
The goal of the session is to learn about Spark, a popular big data framework used to tackle huge volumes of data across a number of domains. Spark allows researchers and professionals to work with large amounts of data using several programming APIs and even SQL. We will learn how to write simple to intermediate applications in Spark to perform data analysis. We will also look at methods for running a Spark cluster on cloud-based infrastructure, along with ways to manage and fine-tune the cluster. The course will also demonstrate the use of Spark's native machine learning library to perform ML-based tasks, and show how to work with real-time data streams.

Intended Audience:
Researchers, students and professionals with programming skills, preferably in Python, as the exercises are in Python. Some knowledge of SQL is also recommended.

Description:
When working with any kind of data, we may have run into several problems: low memory or CPU capacity while working with huge datasets, waiting hours for a job or analysis to complete, or starting all over again if the job fails. With the rapid growth in the volume of data used in analysis tasks, it becomes more and more challenging to process it using standard methods. Enter Spark, a high-performance distributed computing framework that allows us to tackle big-data problems by distributing the workload across a cluster of machines. The two-day course addresses the technical architecture and use cases of Spark, setting it up for your work, best practices and programming aspects. The first day includes the overview, architectural concepts, programming with Spark's fundamental data structure (RDD) and Spark's machine learning library. The second day focuses on the analysis of data by running SQL queries in Spark, working with real-time data streams, and how to set up and manage a Spark cluster.


Designing, delivering and evaluating bioinformatics training

Goals and Objectives:
Data-driven biology needs both talented scientists and inspiring trainers, and owing to the complex, multidisciplinary nature of bioinformatics, those best placed to train others need to have an in-depth understanding of their topic. However, subject-matter expertise alone is not enough to deliver effective bioinformatics training. The goal of this highly interactive knowledge-exchange workshop is to bring to light all the other things that contribute to the design and delivery of effective bioinformatics training. We are motivated by the need to strengthen bioinformatics capacity in Latin America. By bringing together a cohort of scientists with a passion for excellence in learning, we hope to contribute to a network of effective and confident trainers in the region. We will draw on our experience as bioinformatics trainers, both in Europe and in Latin America, incorporating use cases from Latin America to ensure relevance to our target audience. The organisers are involved in several initiatives that will also provide further opportunities for workshop participants to reinforce their learning. These include the CABANA consortium, which will provide opportunities for participants to host bioinformatics workshops in their own institutes, Software Carpentry, which provides more advanced training for instructors, and the ISCB Education Committee’s task force on curriculum and competencies, which has developed a framework for capturing training needs and matching these to learning opportunities. The full programme is available here.

Click here to visit their website.


Open Data and Tools for Bioinformatics Research

Goals and Objectives:
The goal of this workshop is to encourage the open sharing of biological data by matching participants to the open data resources, tools and analysis techniques most suitable to their research, and by collaboratively building a catalogue of resources that helps to meet these needs. Data of Latin American origin are underrepresented in public data resources, which can make some kinds of research challenging.

We will showcase some projects that are actively filling this void and provide pointers to enable our participants to do the same for their research. Among the current projects worldwide that are gathering large amounts of sequence data are the UK 100,000 Genomes Project (whole-genome sequencing of 100,000 individuals), the Earth Microbiome Project (a metagenome-oriented effort), and several other plant, fungal and animal genome projects. There is also a proposal to sequence every animal and plant species on Earth in the next 10 years (the Earth BioGenome Project). In this workshop we will focus on local Latin American efforts to generate molecular data at a large scale. Together, these records will give us a current picture of the molecular data being produced in the world, and specifically in Latin American countries.

Description:
This workshop will focus on supporting researchers to make full use of tools available for the dissemination of molecular data of Latin American origin. The workshop will be highly interactive. Participants will learn about the importance of open data to research, find out about global initiatives involving Latin American scientists, and work together to make Latin American biodata more discoverable. An important part of the workshop will be a hackathon in which we catalogue open data initiatives, data resources and tools made in Latin America, using a purpose-built wiki module. By the end of the workshop we hope to have nucleated a committed group of scientists with established procedures for the open publication of biomolecular data in Latin America. The full programme is available here.

Click here to visit their website.


Introduction to Python with application to genomic data

Trainers:
Renato Augusto Corrêa dos Santos, MSc., Brazil
Diego Mauricio Riaño Pachón, Dr. rer. nat., Brazil
Sheila Tiemi Nagamatsu, B.Sc., Brazil

Goals and Objectives:
To introduce programming skills in Python, with applications to biological data.

Intended Audience:
We expect the audience to comprise mainly biology students with little background in computer science and no prior experience of programming in the Python language. We will introduce basic aspects of programming with Python (syntax), showing applications to the analysis of biological data. We consider it to be introductory material. We expect around 20 participants.

Description:
The session/course will be prepared based on previous experiences of Python workshops organized by Renato Augusto Corrêa dos Santos and Diego M. Riaño Pachón, combined with a new dataset brought by Sheila Tiemi Nagamatsu to this session.


Human genomic analysis for precision medicine with GATK4 Best Practices and FireCloud

Goals and Objectives:
Rare disorders afflict millions of families worldwide. The extremely rare genetic variants that potentially underlie disease traits can be discovered by sequencing large cohorts of patients and analyzing the genomic data with sophisticated bioinformatic tools. In this workshop, a collaborative team from Universidad Andrés Bello (Chile) and the Data Sciences Platform at the Broad Institute of MIT and Harvard (United States) will introduce you to variant discovery step-by-step, from preliminary clinical steps and hypothesis generation to the fundamentals of genomic analysis with the Genome Analysis Toolkit (GATK). Developed at the Broad Institute, GATK is the most widely used open-source software package for variant discovery on whole genome and exome data.

Through a mix of lectures and practical exercises, you will learn the key scientific concepts and approaches for (1) formulating a hypothesis that a disease affecting a patient is controlled by a single deleterious variant, and (2) applying the official GATK Best Practices pipelines for variant discovery published by the Broad Institute. This includes learning how to write GATK pipelines using the Workflow Description Language (WDL) that is used in the GATK Best Practices, and how to execute them using the Cromwell execution engine on any platform, including FireCloud, the Broad's open-source cloud-based analysis platform. FireCloud integrates computational resources, a methods repository and data management in a secure environment. We will then guide you through the reproduction of an exome-sequencing study involving a large cohort of Tetralogy of Fallot patients (soon to be published; https://www.biorxiv.org/content/early/2018/04/13/300905) using FireCloud. Along the way you will learn how to assemble a workspace containing workflows and data that recapitulates all stages of the approach, which could then be published as part of a manuscript to build a fully reproducible analysis. The session will close with a discussion of the next steps following variant discovery, from the variant validation process to in vitro and in vivo experiments to confirm the association between a candidate variant and the disease.

Intended Audience:
This session is primarily intended for researchers, analysts, and bioinformaticians who need to perform genomic analysis/variant discovery on high‐throughput sequencing data. Prior experience with GATK is not required, but basic familiarity with biological and genomics terminology is expected. Statistical concepts involved in variant calling will be discussed at a high level but not in detail. No algorithmic experience is required. Basic familiarity with the UNIX-style command‐line environment is required. Prior scripting experience (e.g. in bash or Python) is a plus but not required.

Overall the material in this session is considered Introductory to Intermediate and is designed to be taught to 30-40 participants maximum.

Description:
FireCloud is the Broad Institute’s open platform for secure and scalable analysis on the cloud where users can run the Genome Analysis Toolkit (GATK) Best Practices workflows and various additional utilities. Join this interactive workshop to learn how to leverage FireCloud and GATK for the discovery of rare genetic variants that underlie disease traits. We will take you step-by-step, from formulating a hypothesis to variant discovery and analysis, while simultaneously building a reproducible workspace in FireCloud.

Preparing for the workshop:
To prepare your laptop for the hands-on exercises please follow the instructions here and register for a FireCloud account. Please tell us your FireCloud account name in this survey by November 4th so that we can continue to set up your account for the workshop. If you follow the instructions and have difficulty running Docker on Windows, we recommend that you borrow a Mac or Linux system. Please let us know by email if you are unable to resolve this issue.

Presenters:
Yossi Farjoun, Associate Director, Data Sciences Platform, Broad Institute, USA
Robert Majovski, Lead Educator, Data Sciences Platform, Broad Institute, USA
Tiffany Miller, Community Development Manager, Data Sciences Platform, Broad Institute, USA
Matthieu Joseph Miossec, Principal Investigator, Center for Bioinformatics and Integrative Biology, Universidad Andrés Bello, Chile


Open data and tools for bioinformatics research

We will work together to build an openly accessible catalogue of resources, highlighting those that are ‘made in Latin America’. A major event to accomplish this purpose will be a hackathon where participants will have the chance to collaboratively create a catalogue of data resources and tools, incorporating as many as possible that are of Latin American origin. As part of the hackathon, participants will be encouraged to share a description of their country’s molecular data on a wiki page designed for that purpose.

One of our proposals is the development of a standard wiki module for Latin American scientists to disseminate their projects focused on the generation and publication of molecular data on species from the region's biodiversity. We will develop a central wiki page for Latin America and link the individual country pages to it. Our proposal will focus on the challenges we face in increasing molecular data publication in Latin America and how to deal with them. Opportunities to improve publication requirements in the region include the design of specific workshops on public data dissemination; the development of training material for leaders and research assistants on public databases and data submission; and the discussion of strategies to improve the dialogue between science funding agencies and scientists on the benefits of open data publication.

Intended Audience:
This workshop is for bioinformatics researchers whose research depends on public data, and/or who are producing lots of data and need support on how to share their data appropriately.


ISCB-LA SOIBIO EMBnet 2018 | Nov 5 – 9, 2018 | Viña del Mar, Chile | PRE-CONFERENCE WORKSHOPS, TUTORIALS, AND MINI-COURSES


