Workshops

19th Annual International Conference on
Intelligent Systems for Molecular Biology and
10th European Conference on Computational Biology


Workshop 1: Bioinformatics Core Facilities
Workshop 2: Navigating the Granting Jungle
Workshop 3: Data Visualization and User Interfaces
Workshop 4: Unifying Bio-Resources Descriptors
Workshop 5: Workshop on Education in Bioinformatics (WEB 2011)
Workshop 6: Genomics for Non-Model Organisms

Workshop 1: Bioinformatics Core Facilities

Organizer(s):
Simon Andrews, Babraham Institute, United Kingdom
Fran Lewitter, Whitehead Institute, United States
Brent Richter, Partners Healthcare, United States
David Sexton, Vanderbilt University, United States

Date: Sunday, July 17
Time: 10:45 a.m. – 12:40 p.m.
Room: Hall N/O

Workshop Overview

This workshop is organized by an international organisation (http://www.bioinfo-core.org) that brings together computational biologists working in and running bioinformatics core facilities. The group was formed in 2002 and draws its membership from over 200 facilities in 23 countries.

Bioinformatics core facilities undertake work encompassing the full breadth of modern computational biology. Exposure to varied datasets from many different sources allows us to critically evaluate software and methods from a viewpoint not normally available to research groups.

Part A: Analysis pipelines for high-throughput sequencing – what can we learn, and what do we miss? (25 minutes of talks, 25 minutes of open discussion moderated by Brent Richter, Partners Healthcare)

A major goal of a bioinformatics core facility is to deliver high-quality data and tools to the researcher. Standardised pipelines, coming either from instrument manufacturers or from dedicated bioinformatics groups, offer a simple way to deliver analysed data to researchers. However, the unsupervised nature of pipelines means that useful and relevant information may be missed. We will explore the potential benefits of automated analysis, but will also investigate what insights might be missed by relying only on standard pipelines. We will address questions such as:

· Under what circumstances can delivering only the results of a standard pipeline be misleading?

· What is the potential value of the intermediate data calculated in larger pipelines?

· How can we best balance the cost of in-depth investigation against the value derived from this extra effort?

· Can pipelines be given to non-bioinformaticians to use in their own data analysis without instruction on the caveats of their use?

· Can pipelines be shared within the community without the need to adapt them to each individual environment?

Speakers

Simon Andrews (Head of Bioinformatics, Babraham Institute) will present a series of case studies derived from two tools developed in his group: FastQC, a tool that analyses raw high-throughput sequence data to identify trends and biases that might not appear in later analysis of mapped data; and a repeat mapping pipeline, which provides a mechanism by which aggregate data on multiply mapping reads can be collected and analysed.
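To make "trends and biases in raw sequence data" concrete, the sketch below computes one classic raw-data check: the mean Phred quality at each read position. It is not FastQC's implementation; it simply assumes standard 4-line FASTQ records with Phred+33 quality encoding, and the function names are hypothetical.

```python
def quality_lines(fastq_text: str) -> list[str]:
    """Return the quality string (4th line) of each 4-line FASTQ record."""
    lines = [line.strip() for line in fastq_text.strip().splitlines()]
    return lines[3::4]


def per_base_mean_quality(qualities: list[str], offset: int = 33) -> list[float]:
    """Mean Phred score at each read position, averaged over the reads that reach it.

    Assumes Phred+33 encoding (offset=33); some older Illumina data used offset 64.
    """
    totals: list[int] = []
    counts: list[int] = []
    for qual in qualities:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [total / n for total, n in zip(totals, counts)]


fastq = """@read1
ACGT
+
IIII
@read2
ACG
+
!!#
"""
print(per_base_mean_quality(quality_lines(fastq)))  # [20.0, 20.0, 21.0, 40.0]
```

A drop in these per-position means towards the ends of reads is the kind of pattern that is visible in raw data but easily lost once reads have been trimmed or mapped.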

Jim Cavalcoli (Bioinformatics Core Director, University of Michigan) will present the results of two RNA-Seq studies in which data that were originally rejected as unmappable were reanalysed to reveal insights into transposable elements and non-standard splicing mechanisms.

Part B: Practical aspects of running a bioinformatics core facility (25 minutes of talks, 25 minutes of open discussion moderated by Fran Lewitter, Whitehead Institute)

This session will explore the practicalities of providing a bioinformatics service to a large group of researchers, looking at how such groups are structured, funded, staffed and monitored. The aim is to share best practices and to stimulate discussion of how these facilities can achieve the best scientific results in the most efficient way.

Speakers

Matthew Eldridge (Head, Bioinformatics Core, Cancer Research UK, Cambridge Research Institute). The CRI Bioinformatics Core provides computational and statistical support to 21 research groups, with approximately 270 scientists, whose interests range from pure basic research questions to clinical applications with practical benefits for cancer patients.

Simon Lin (Bioinformatics Consulting Core Director, Northwestern University). The Northwestern Bioinformatics Core Facility provides analysis, support and design for next-generation sequencing, microarrays, proteomics and clinical trial informatics, as well as custom web-based database development for basic science and clinical projects. The goal is to provide researchers with in-house bioinformatics expertise in order to produce studies that ultimately result in publications and grants.


Workshop 2: Navigating the Granting Jungle

Organizer(s):
Yana Bromberg, Rutgers University, United States
Magali Michaut, University of Toronto, Canada
Venkata P. Satagopam, EMBL, Germany
Andrea Schafferhans, TU München, Germany

Date: Sunday, July 17
Time: 2:30 p.m. – 4:25 p.m.
Room: Hall N/O

Workshop Overview

Many different agencies provide funding for research activities. They exist at the national level, and some also fund bi-national or international cooperation. This workshop will help current and future principal investigators in bioinformatics and computational biology get an overview of funding options and of the information resources available for finding such funding sources. In addition, the workshop will help participants select the funding schemes that are most promising for their individual situation (career stage, collaborators, etc.).

This workshop is meant to complement the Tutorial on Grant Writing to be given on July 15, 2011. While the tutorial focuses on how to write a grant application, the primary aim of the workshop is to point researchers to the available funding options. The workshop will consist of 6 invited presentations (about 12 minutes each, with time for questions) and a panel discussion on the following topics:

A) Overview of funding options (individual grants, group funding, integrated projects, etc.)

B) Experiences and recommendations from the reviewers’ point of view

C) How to find applicable grants and how to choose which ones to apply for

Speakers

Gary Bader is an Assistant Professor at the University of Toronto
Phil Irving is the Head of Grants Services, European Molecular Biology Laboratory (EMBL)
Andrew L. Hufton works at EMBO as an editor for the journal Molecular Systems Biology
Lawrence Hunter is the Director of the Center for Computational Pharmacology and the Computational Bioscience Program, University of Colorado
Michal Linial is the Director of the Sudarsky Center for Computational Biology, the Hebrew University of Jerusalem, Israel
Alfonso Valencia is the Director of the Spanish National Bioinformatics Institute (INB)

Programme Time Table

Time	Topic	Speaker(s)
2:30 p.m. - 2:55 p.m.	Funding availability	Andrew L. Hufton, Phil Irving
5 min break
3:00 p.m. - 3:25 p.m.	Reviewers’ recommendations	Michal Linial, Alfonso Valencia
5 min break
3:30 p.m. - 3:55 p.m.	Speaking from experience	Lawrence Hunter, Gary Bader
5 min break
4:00 p.m. - 4:25 p.m.	Finding/selecting funding schemes	Panel discussion with the previous speakers

Organizer qualifications

Yana Bromberg is an assistant professor at Rutgers University, School of Environmental and Biological Sciences. She is the chair of the ISMB poster session, co-organizer of the SNP-SIG, and an active participant on a number of scientific conference and journal review boards. In her 10 years in the field, Yana has been funded through traditional mechanisms (T15 and R01 of the NIH/NLM) and through more exotic sources (SBIR R43, private funding). In her new position she is looking forward to learning more about the available resources that will support her ongoing research.

Magali Michaut is a postdoctoral researcher at the University of Toronto. She has been highly involved with the ISCB Student Council for 4 years, in particular in organizing the Student Council Symposium (reviewing committee 2008, speaker chair 2009, symposium co-chair 2010, ESCS1 chair 2010). She was an elected member (secretary) of the Student Council in 2009, initiated RSG Europe (as secretary), and created and chaired RSG France.

Venkata P. Satagopam is a Senior Bioinformatics Scientist at EMBL. He has been involved with the ISCB Student Council since its formation in 2004, particularly in organizing several Student Council symposia and initiating internships for students from developing nations. He is also involved in organizing the bioinformatics workshops at EMBL and was an organizing committee member of the ‘Bioinformatics to Systems Biology 2010’ conference.

Andrea Schafferhans is a postdoctoral researcher at the TU Munich. In her academic and industrial research positions, she has been involved in applications for national and European funding. As a researcher in the transition to independence she is gathering experience in navigating the funding agency jungle – with a special focus on non-standard careers including industry and family phases.


Workshop 3: Data Visualization and User Interfaces

Organizer(s):
Nils Gehlenborg, Harvard Medical School, United States
Lawrence Hunter, University of Colorado, United States
Seán O'Donoghue, European Molecular Biology Laboratory, Germany
James Procter, University of Dundee, United Kingdom

Date: Monday, July 18
Time: 10:45 a.m. – 12:40 p.m.
Room: Hall N/O

Workshop Overview

This workshop is aimed at ISMB attendees who develop interactive tools or methods to help biologists gain insight from data. Three expert speakers will cover principles, illustrative examples, and resources that can help to improve the effectiveness of data visualization, to improve the usability of user interfaces, and finally to help ensure that users find your tool truly helpful in their research. As biology becomes increasingly data-driven, and as data volume and complexity increase, these topics are becoming more relevant and urgent for bioinformatics tool developers. The workshop will consist of three 25-minute talks followed by a 25-minute discussion session, as outlined below.

Part A: 10:45 - 11:10 User Interface Design (Speaker: Scooter Morris). For many computational systems, including bioinformatics tools, ease-of-use can greatly influence the extent to which the end-user community either adopts the tool, or ignores it. This talk will outline key principles that can improve user interfaces, and present several case studies highlighting these principles.

Part B: 11:15 - 11:40 Data Visualization (Speaker: Nils Gehlenborg). Data visualization can be a powerful tool to explore large and complex data sets for hypothesis generation. This talk will outline strategies for the design and implementation of interactive visualizations and highlight successful applications of visualization in bioinformatics. The talk will also address the basic principles of data visualization and discuss some of the challenges that developers face in practice.

Part C: 11:45 - 12:10 Visual Analytics (Speaker: Carsten Görg). The interpretation of biological data is a complex process, involving many different data analysis techniques, often applied in many different combinations. Visual analytics tools combine analysis and visualization, and this talk will highlight key tools that enable researchers to visually analyse their results and identify significant aspects of them.

Part D: 12:15 - 12:40 Discussion Session (Moderator: Seán O’Donoghue). Workshop participants will have the opportunity to discuss practical aspects of data visualization, interface design and visual analytics tools with the speakers. In addition to the above speakers, several developers of widely-used software tools have been invited to be part of this session to answer questions from the audience and to share their insights.


Workshop 4: Unifying Bio-Resources Descriptors

Organizer(s):
Dawn Field, NERC, NEBC, United Kingdom
Pascale Gaudet, Swiss Institute of Bioinformatics, Switzerland
Susanna-Assunta Sansone, University of Oxford, United Kingdom

Date: Monday, July 18

Time: 2:30 p.m. – 4:25 p.m.

Room: Hall N/O

Workshop Overview:

This workshop will bring together developers, curators, journal editors and researchers to discuss the growing number of closely related efforts to develop catalogues of tools, databases, related data and publications. The focus of the workshop is a strawman uniform system for describing these bio-resources and, in particular, for indicating in a consistent manner which community-defined standards (minimum information checklists, terminologies and exchange formats) they implement.

Such a uniform system would i) help the research and bioinformatics communities locate and access the information distributed across these bio-resources and ii) inform journal editors and funders who implement data preservation, management and sharing policies when they recommend or require that certain standards be met and that data be deposited in a public database.

Background

High-throughput approaches in the genomics and functional genomics domains have become ubiquitous in the past decade. Several policies have emerged in response to the increased quantity of data available and the correspondingly large variability in how these data are stored and analysed¹. The need to store and share these data helps explain the explosion in the number and variety of tools and databases that cater to the needs of biological research. Many of those resources have contributed to the development of community-level standards that support the harmonization of the annotation and storage process, so that different data can be comprehensible (in principle) and reproducible, and can also be easily shared, analyzed, compared and integrated. The next challenge is to better integrate those policies and standards and link them to the relevant tools and databases that implement them. The goal is to present researchers with a portfolio of standards and resources to enable their use and promote adoption. The grand vision is to ease data management and enable data exchange, shielding researchers from unnecessary complexity, and ultimately to facilitate data analysis.

The International Society for Biocuration (ISB) and the BioSharing initiative have already taken steps in this direction by i) producing BioDBcore, a community-defined, uniform, generic description of the core attributes of biological databases, and ii) releasing a first prototype of the BioSharing catalogue, a “one-stop shop” for those seeking data sharing policy documents and information about standards. BioDBcore was developed with a wider list of contributors, including the editors of NAR and DATABASE, and published in both journals² (Gaudet et al. 2011). The catalogue of policies and standards, an outcome of a Special Interest Group (SIG)³ at ISMB 2010, links to existing complementary portals, such as MIBBI⁴, and to open-access resources, such as BMC Research Notes and Nature Precedings, that host documents or publications on standards, as well as to standards-compliant systems and research data.

Combined, the BioDBcore and BioSharing efforts will give existing and emerging resources that catalogue tools and databases a uniform system for describing biological databases and indicating which standards they implement.

Workshop Objectives

At the workshop we will report on progress in using the BioDBcore core attributes and the BioSharing catalogue in combination to unify the description of bio-resources, and will seek input on the future of these resources. In addition, we will discuss the mechanism for exchanging the descriptors, giving the many interested participants an opportunity to engage in community assessment and planning for these new efforts. To engage key stakeholders, we will open the discussion to emerging efforts that are working to incorporate BioDBcore descriptors and link to relevant standards catalogued by BioSharing.

References

1. Field, Sansone et al. ‘Omics Data Sharing. Science 326, 234 (2009).

2. Gaudet, Bairoch, Field, Sansone et al. Towards BioDBcore: a community-defined information specification for biological databases. Database 4 2011:baq027; and Nucleic Acids Res, 39(Database issue):D7-10 (2011).

3. Field, Sansone et al. Meeting Report: BioSharing at ISMB 2010. Standards in Genomic Sciences 3 (2010).

4. Taylor, Field, Sansone et al. MIBBI: A Minimum Information Checklist Resource, Nat. Biotechnol. 26, 889 (2008).

Agenda

Time	Presentation Title	Presenting Author
2:30 p.m. - 3:00 p.m.	Unifying Bio-Resources Descriptors - Introduction & BioSharing	Susanna-Assunta Sansone, University of Oxford; Dawn Field, UK NERC Bioinformatics Data Center
3:00 p.m. - 3:15 p.m.	BioDBcore: a community-defined information specification for biological databases	Pascale Gaudet, Swiss Institute of Bioinformatics
3:15 p.m. - 3:30 p.m.	BioDBcore: moving toward implementation	Philippe Rocca-Serra, University of Oxford
3:30 p.m. - 4:25 p.m.	Unifying Bio-Resources Descriptors - Panel Discussion	Nicolas Le Novère, MIRIAM resource; Naomi Attar, BioMed Central; Myles Axton, Nature Publishing Group; Adriaan Klinkenberg, Elsevier; Rebecca Lawrence, F1000; Francis Ouellette, DATABASE and Bioinformatics Link Directory; Jeffrey Grethe, Neuroscience Information Network


BioSharing: http://biosharing.org
BioDBcore: http://biocurator.org/biodbcore.shtml


Workshop 5: Workshop on Education in Bioinformatics (WEB 2011)

Organizer(s):

Michelle Brazas, Ontario Institute for Cancer Research, Canada

Fran Lewitter, Whitehead Institute for Biomedical Research, United States

Andrea Schafferhans, TU München, Germany

Vicky Schneider, EMBL-EBI, United Kingdom

Date: Tuesday, July 19

Time: 10:45 a.m. – 12:40 p.m.

Room: Hall N/O

Workshop Overview:

The Workshop on Education in Bioinformatics (WEB 2011) is for ISMB attendees who are involved in training, education and teaching. The workshop will provide a platform to hear from experienced experts and to discuss the following topics relevant to the continued advancement of this field:

1) The key elements required of a good bioinformatics tutorial;

2) How bioinformatics tutorials and training materials can be used in remote training;

3) Sharing of training related materials.

Time Table

Time	Speakers/Panellists
Part A: 10:45 a.m. - 11:10 a.m.	Dr. Anna Tramontano, Dr. Gary Bader
Part B: 11:15 a.m. - 11:40 a.m.	Panel discussion led by Dr. Fran Lewitter
Part C: 11:45 a.m. - 12:10 p.m.	Dr. Aidan Budd, Dr. Alex Bateman, Dr. Vicky Schneider
Part D: 12:15 p.m. - 12:40 p.m.	Panel discussion led by Dr. Michelle Brazas (panel composed of all WEB11 contributing speakers)

Part A: Effective Bioinformatics Training: what makes a good tutorial

Bioinformatics training is challenging not only because of the interdisciplinary nature of the field and its audience, but also because rapid technological advances demand new and increasingly complex bioinformatics tools. Educational materials must be constantly updated and produced to meet training needs. Solutions that increase the development and dissemination of useful and effective learning tools, and that increase access to and uptake of bioinformatics training, would benefit the bioinformatics community as a whole.

Increased exposure of bioinformatics training content in peer-reviewed journals would benefit bioinformatics training efforts as well as the scientific community at large. Experts in bioinformatics training will present their views. This will be followed by a discussion of what constitutes a ‘good tutorial’, which will help define the landscape for the development and dissemination of effective bioinformatics training material in the future.

Part B: Panel Discussion, Gary Bader, Anna Tramontano, Andrea Schafferhans, Moderator: Fran Lewitter

Part C: Effective sharing of training efforts: from the materials to the live session

As effective training materials are developed and disseminated, they may be used to extend bioinformatics training to researchers in remote locations where face-to-face training is costly or prohibitive. Speakers will discuss their own experiences with remote training, as well as with using Wikipedia and other databases to share training materials. A discussion of how technology enables such training will follow. The aim is to define the landscape for expanding bioinformatics education and awareness to the wider scientific community.

Part D: Panel Discussion, Gary Bader, Alex Bateman, Aidan Budd, Vicky Schneider, Anna Tramontano, Moderator: Michelle Brazas

Panel Chairs:

Dr. Fran Lewitter

Dr. Michelle Brazas

Confirmed speakers include:

Dr. Gary Bader (University of Toronto)

Dr. Alex Bateman (Wellcome Trust Sanger Institute)

Dr. Aidan Budd (EMBL-Heidelberg)

Dr. Anna Tramontano (Università La Sapienza)

Dr. Vicky Schneider (EMBL-EBI)


Workshop 6: Genomics for Non-Model Organisms

Organizer(s):

Dave Clements, Emory University, United States

Date: Tuesday, July 19

Time: 2:30 p.m. – 4:25 p.m.

Room: Hall N/O

Workshop Overview:

This workshop will demonstrate implemented, successful approaches to several common challenges in working with genomic data from non-model organisms. Early genomics work focused on well-established model organisms and built on the availability of robust resources such as high-quality annotated reference genomes and manually curated online model organism databases. This workshop will highlight recent advances in working with genomic data when you don't have access to the rich resources that come with well-established model organisms.

2:30 p.m. - 2:55 p.m. Using RAD-seq for genomic and transcriptomic studies of non-model organisms, Julian Catchen
3:00 p.m. - 3:25 p.m. Using RNA-seq for gene annotation, quantitation, and function in non-model organisms, Jeremy Goecks
3:30 p.m. - 3:55 p.m. Development of a workflow for SNP detection in grapevine species: MAPHiTS, Marc Bras
4:00 p.m. - 4:25 p.m. Repeatable plant pathology bioinformatic analysis: not everything is NGS data, Peter Cock

Part A: Using RAD-seq for genomic and transcriptomic studies of non-model organisms
Julian Catchen (presenting) and William Cresko, University of Oregon, United States

We have developed novel next-generation sequencing protocols for high-throughput genotyping and allele-specific profiling in organisms with or without prior genomic resources. Our technique, Restriction-site Associated DNA sequencing (RAD-seq), focuses high-throughput sequencing effort on regions of the genome adjacent to restriction enzyme recognition sites. RAD-seq enables the construction of genetic maps, genome contig tiling, QTL and association mapping, and studies of evolutionary processes in natural populations. We will discuss 1) the suite of RAD molecular biology techniques, with examples from non-model organisms; 2) the analytical and computational tools we have developed for the analysis of RAD-seq data, encapsulated in the software package Stacks; and 3) the major remaining challenges in applying techniques like RAD to studies of non-model organisms.
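The core idea of RAD-seq described above, concentrating sequencing on the stretches of genome next to restriction sites, can be illustrated in silico. The sketch below is not part of Stacks; it assumes SbfI's recognition sequence (CCTGCAGG) as an example enzyme and a toy genome string, and simply reports the flanking sequence on each side of every site.

```python
def find_sites(genome: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` (overlaps allowed)."""
    positions = []
    start = genome.find(site)
    while start != -1:
        positions.append(start)
        start = genome.find(site, start + 1)
    return positions


def rad_flanks(genome: str, site: str, flank: int) -> list[tuple[int, str, str]]:
    """For each site, return (position, left flank, right flank).

    Flanks are clipped at the ends of the sequence. If the site is palindromic
    (as SbfI's is), scanning one strand finds every site on both strands.
    """
    loci = []
    for pos in find_sites(genome, site):
        left = genome[max(0, pos - flank):pos]
        right = genome[pos + len(site):pos + len(site) + flank]
        loci.append((pos, left, right))
    return loci


SBFI_SITE = "CCTGCAGG"  # example enzyme only; RAD-seq can be run with other enzymes
genome = "AAAACCTGCAGGTTTTGGGGCCTGCAGGCC"  # toy sequence, not real data
for locus in rad_flanks(genome, SBFI_SITE, flank=4):
    print(locus)
# (4, 'AAAA', 'TTTT')
# (20, 'GGGG', 'CC')
```

Because only these flanking regions are sequenced, coverage is concentrated on a reproducible subset of the genome, which is what allows the same loci to be genotyped across many individuals without a reference assembly.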

Part B: Using RNA-seq for gene annotation, quantitation, and function in non-model organisms
Jeremy Goecks, Emory University, United States

A primary use of RNA-seq data from a non-model organism is to gain insight into the organism's genes. Specifically, it is often useful to annotate a genome assembly with gene locations; also, regardless of whether a genome assembly is available, it is useful to quantitate the expression of genes and predict their likely function. Using multiple tools, including Cufflinks, Trans-ABySS, and Galaxy, we have developed methods and pipelines for performing gene annotation, quantitation, and predicting function. We will discuss a case study where we have analyzed RNA-seq data from four related organisms to characterize the similarities and differences in gene expression amongst the organisms.
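One common way to quantitate expression, and the unit Cufflinks reports, is FPKM (fragments per kilobase of transcript per million mapped fragments). Cufflinks estimates it with a statistical model that apportions fragments among isoforms; the sketch below uses only the naive count-based formula, FPKM = count × 10⁹ / (length in bp × total mapped fragments), with hypothetical gene names and counts.

```python
def fpkm(counts: dict[str, int], lengths_bp: dict[str, int],
         total_fragments: int) -> dict[str, float]:
    """Naive FPKM: fragments per kilobase of transcript per million mapped fragments.

    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.
    Unlike Cufflinks, this ignores isoforms and multi-mapping fragments.
    """
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_fragments)
        for gene, count in counts.items()
    }


counts = {"geneA": 500, "geneB": 1000}   # hypothetical fragment counts
lengths = {"geneA": 2000, "geneB": 1000}  # hypothetical transcript lengths (bp)
print(fpkm(counts, lengths, total_fragments=1_000_000))
# {'geneA': 250.0, 'geneB': 1000.0}
```

Normalising by both transcript length and library size is what makes values comparable between genes within a sample and between samples, such as the four related organisms in the case study.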

Part C: Development of a workflow for SNP detection in grapevine species: MAPHiTS
Marc Bras, INRA-URGI, Centre de Recherche de Versailles-Grignon, France

A single-nucleotide polymorphism (SNP) is a variation at a single position in a DNA sequence. SNPs can be used as markers to characterize genetic variation and to detect complex traits, such as those involved in disease resistance or agronomic performance. The URGI platform developed a workflow for SNP detection from short reads (MAPHiTS: Mapping Analysis Pipeline for High-Throughput Sequences), integrated into the Galaxy workflow manager. Galaxy lets users chain different tools together graphically through a web page, and many workflows can be built and shared. The MAPHiTS workflow reports all SNPs and small indels found in the data set and filters them according to various parameters, such as genome coverage, allele frequency and p-value.
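The final filtering step described above (coverage, allele frequency and p-value) can be sketched as follows. This is not MAPHiTS code: the `VariantCall` record, its field names and the default thresholds are hypothetical, chosen only to show how the three criteria combine and how SNPs are told apart from small indels.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int       # reads covering the site
    alt_freq: float  # fraction of those reads supporting `alt`
    p_value: float   # significance reported by the (hypothetical) caller

    def is_snp(self) -> bool:
        """Single-base substitution; anything else is treated as an indel here."""
        return len(self.ref) == 1 and len(self.alt) == 1


def filter_calls(calls, min_depth=10, min_alt_freq=0.2, max_p_value=0.01):
    """Keep calls that pass all three thresholds (illustrative defaults)."""
    return [
        c for c in calls
        if c.depth >= min_depth
        and c.alt_freq >= min_alt_freq
        and c.p_value <= max_p_value
    ]


calls = [
    VariantCall("chr1", 100, "A", "G", depth=25, alt_freq=0.48, p_value=1e-6),
    VariantCall("chr1", 200, "C", "T", depth=4, alt_freq=0.50, p_value=1e-3),
    VariantCall("chr2", 50, "AT", "A", depth=30, alt_freq=0.35, p_value=0.005),
    VariantCall("chr2", 80, "G", "A", depth=40, alt_freq=0.05, p_value=0.2),
]
for c in filter_calls(calls):
    print(c.chrom, c.pos, "SNP" if c.is_snp() else "indel")
# chr1 100 SNP
# chr2 50 indel
```

Exposing the thresholds as parameters mirrors the idea of a shareable workflow: the same filter can be rerun with stricter or looser settings for a different data set.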

Part D: Repeatable plant pathology bioinformatic analysis: not everything is NGS data
Peter Cock (presenting), Leighton Pritchard, James Hutton Institute (formerly SCRI, Scottish Crop Research Institute), United Kingdom

The James Hutton Institute's Plant Pathology programme studies diverse pathogens of great social impact from a wide range of biological kingdoms. We are interested in high-throughput sequencing data, but our principal use for Galaxy is as a framework for delivering standardised, reusable components for repeatable bioinformatic analyses of assembled sequences and gene sets. We aim to empower bench scientists to carry out their own complex bioinformatic analyses and construct their own workflows. We present examples of integrating bioinformatic tools using the Galaxy framework, and common workflows for plant pathology, such as identifying candidate proteins that pathogens deliver into their hosts, an important mechanism for pathogens and parasites.
