Industry Posters - ISMB 2014

IP01 - Accurate Structural Variant Detection and Utilization in Comprehensive Clinical Interpretation
Scientific Area: Genetic Variation Analysis

Presenting author: Ming Li, Personalis, United States

Additional authors:
Stephen Chervitz, Personalis, United States
Daniel Newburger, Personalis, United States
Sarah Garcia, Personalis, United States
Gemma Chandratillake, Personalis, United States
Michael Clark, Personalis, United States
Nan Leng, Personalis, United States
Jason Harris, Personalis, United States
Mark Pratt, Personalis, United States
Michael Snyder, Personalis, United States
John West, Personalis, United States
Richard Chen, Personalis, United States

Presentation Overview:


Genomic structural variants (SVs), including inversions, translocations, deletions, and duplications, play an important role in understanding genetic disease. Accurately detecting and characterizing SVs in genomic sequence data is challenging. Here we present an approach that integrates orthogonal algorithms with targeted local reassembly to improve SV detection performance. Where possible, we also determine the genomic context, zygosity, and exact breakpoints of the SVs. Identified SVs are annotated and ranked by biomedical relevance and predicted likelihood of causing disease, using public and proprietary databases.

The performance of our SV detection approach was assessed by analyzing deletions from both simulated and experimental genome sequencing data. With simulated data at approximately 46X coverage, the sensitivity and false discovery rate (FDR) were 96.3% and 1.4%, respectively, compared with averages of 55.6% and 27.6% for the SV detection methods used independently. With experimental sequencing data for a trio, a gold-standard SV set was constructed and vetted by pedigree consistency. The average sensitivity for SV detection on this data was 96.8% and the FDR was 1.4%, consistent with the results from simulation.
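As an illustration of how sensitivity and FDR can be computed for deletion calls, the sketch below compares a call set against a truth set. It is not the authors' evaluation code: the 50% reciprocal-overlap matching rule, the function names, and the toy coordinates are all assumptions made for the example.

```python
from typing import List, Tuple

# A deletion as (chromosome, start, end); coordinates are hypothetical.
Deletion = Tuple[str, int, int]


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def sensitivity_and_fdr(calls: List[Deletion], truth: List[Deletion],
                        min_overlap: float = 0.5) -> Tuple[float, float]:
    """Sensitivity = truth events recovered / all truth events.
    FDR = calls matching no truth event / all calls.
    The matching threshold `min_overlap` is an assumed convention."""
    found = sum(
        1 for t in truth
        if any(reciprocal_overlap(c, t) >= min_overlap for c in calls)
    )
    true_calls = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, t) >= min_overlap for t in truth)
    )
    sensitivity = found / len(truth) if truth else 0.0
    fdr = (len(calls) - true_calls) / len(calls) if calls else 0.0
    return sensitivity, fdr


# Toy data: three of four truth deletions are recovered, one call is false.
truth = [("chr1", 1000, 2000), ("chr1", 5000, 5600),
         ("chr2", 300, 900), ("chr3", 100, 400)]
calls = [("chr1", 1050, 2010), ("chr1", 5000, 5600),
         ("chr2", 2000, 2500), ("chr3", 120, 390)]

sens, fdr = sensitivity_and_fdr(calls, truth)
print(f"sensitivity={sens:.2f} FDR={fdr:.2f}")  # sensitivity=0.75 FDR=0.25
```
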

We demonstrate the utility of our SV calls for medical interpretation by using our method to identify, annotate, and prioritize SVs in samples known to harbor pathogenic SVs. Using our knowledge-based ranking system for disease variant discovery, we integrate SVs with SNVs and indels to correctly detect a known, causative compound heterozygous mutation in the ATM gene.



IP02 - Variant detection in tumor samples through PCR-based enrichment and Next-Generation Sequencing
Scientific Area: Bioinformatics of Disease and Treatment

Presenting author: Sivakumar Gowrisankar, Novartis Institutes for BioMedical Research, United States

Additional authors:
Zachary Zwirko, Novartis Institutes for BioMedical Research, United States
Vera Ruda, Novartis Institutes for BioMedical Research, United States
Yanqun Wang, Novartis Institutes for BioMedical Research, United States
Oleg Iartchouk, Novartis Institutes for BioMedical Research, United States

Presentation Overview:


High-throughput genetic profiling of tumor tissues, especially those that are formalin-fixed and paraffin-embedded (FFPE), is severely limited by the sensitivity and specificity of assays. This is due to a wide array of issues, including low DNA starting material, DNA degradation, and tumor heterogeneity. Several methods have been proposed to profile mutations within tumor samples, such as targeted hybrid capture and PCR-based amplicon enrichment. Hybridization-based approaches have the drawback of requiring more input material and complicated workflows. Most PCR-based approaches are known to suffer from high false-positive rates because PCR duplicates cannot be removed. Whole-genome and exome sequencing, on the other hand, are still prohibitively expensive for large-scale studies characterizing tumor samples.

Here we present a tumor profiling approach based on PCR amplification of selected genes followed by next-generation sequencing. We first randomly barcode PCR products by adaptor ligation, followed by PCR amplification and subsequent sequencing. This approach has the distinct advantages of requiring less DNA starting material, a simple workflow, and the ability to distinguish PCR duplicates. In addition, the uniquely barcoded reads can be used to reduce false positives. The high correlation of read distribution between tumor and normal samples, or between tumor and resistant-tumor samples, lends itself to reliable copy number variant (CNV) detection. In this poster we provide results on sensitivity, specificity, and CNV detection on 24 paired and pooled control samples to demonstrate the utility of this approach.
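The idea of using random barcodes to distinguish PCR duplicates can be sketched as follows. This is only an illustrative example under stated assumptions, not the authors' pipeline: it assumes that reads sharing a barcode at a given position derive from one original molecule, collapses each barcode family to a majority-vote base, and counts molecules rather than raw reads. The function names and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# One observation at a single genomic position: (barcode, observed base).
Read = Tuple[str, str]


def collapse_by_barcode(reads: Iterable[Read]) -> Dict[str, str]:
    """Collapse reads into one majority-vote base per barcode family.
    Assumes reads sharing a barcode are PCR duplicates of one molecule."""
    families: Dict[str, Counter] = defaultdict(Counter)
    for barcode, base in reads:
        families[barcode][base] += 1
    return {bc: counts.most_common(1)[0][0] for bc, counts in families.items()}


def allele_fraction(reads: Iterable[Read], alt: str) -> float:
    """Fraction of barcode families (molecules) whose consensus is `alt`."""
    consensus = collapse_by_barcode(reads)
    if not consensus:
        return 0.0
    return sum(1 for base in consensus.values() if base == alt) / len(consensus)


# Toy data: one molecule carrying "T" was amplified six times, and one
# family contains a single erroneous "T" read among "C" reads.
reads = (
    [("AAC", "T")] * 6
    + [("GGT", "C")] * 2
    + [("CTA", "C"), ("CTA", "C"), ("CTA", "T")]
    + [("TTG", "C")]
)

raw_fraction = sum(1 for _, base in reads if base == "T") / len(reads)
collapsed_fraction = allele_fraction(reads, "T")
print(f"raw={raw_fraction:.2f} collapsed={collapsed_fraction:.2f}")
# raw=0.58 collapsed=0.25
```

In this toy case, counting raw reads overstates the "T" allele because one molecule was over-amplified, while counting barcode families gives one vote per molecule and also outvotes the isolated error in the "CTA" family.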


IP03 - A publication model that aligns with the key Open Source Software principles

Presenting author: Michael Markie, F1000Research, United Kingdom
Presentation Overview:


In recent years, software development has had a significant impact on scientific research and continues to play a major role in facilitating advances, in the life sciences in particular. Building code in open repositories such as GitHub allows it to be continually improved, both during the development phase and after the software has been more widely disseminated. However, the long-term availability of code is important for reproducibility, and to enable future scientific research that may require further modification of existing code [1]. Documentation of code for scholarly purposes usually takes the form of a publication in a peer-reviewed journal. This allows the developer to provide context around their code for both fellow programmers and non-computational users. A published paper not only contributes to the developer’s formal academic output but also helps foster vibrant collaborative communities that nurture and spread new ideas and reinforce the quality of the code that is produced.

Releasing information in incremental steps is nothing new to software developers, who regularly release updates and patches that add new functionality to existing programmes. The launch of a new bioinformatics tool is often accompanied by a paper describing the software for new users. However, that paper will be out of date as soon as a new software update is released. The changes are often not significant enough to warrant a whole new paper, and thus the most recent developments go undocumented for a sustained period of time. Trying to publish such dynamic information in traditional ‘static’ journals is much like fitting a square peg in a round hole.

The F1000Research publishing model is much more in sync with the way software is developed. Each software tool published can be updated at any time as a new version (clearly linked to the original and previous versions of the article), allowing any new code, tweaks, and features to be documented with relative ease. Furthermore, F1000Research ensures that all the code and related data are freely available from the paper. A usable copy of the code as it was at the time of publication remains available, with the code being forked into an archival F1000Research space within the same repository used by the authors. This copy is also assigned a persistent identifier to eliminate any ambiguity about the code that is described in the article. Additionally, F1000Research ensures the paper includes a link to the author’s own working repository, so that readers can easily navigate to the latest version of the source code. With these measures, users are able to establish the provenance of the code and reuse it easily, which supports the reproducibility of the software and ultimately contributes to making it more robust. F1000Research also uses open peer review, providing an additional layer of validation for published software articles. Experts from the scientific community are invited to constructively critique the software and lay the foundations for any improvements. Having these reviews, together with any user comments, open to everyone helps to mirror the collaborative approach encouraged by open source initiatives and embraces the open source community's ethos.

By aligning with the requirements of publishing software, F1000Research has started to encourage computational science software developers to create an F1000Research Article Collection to augment their open source software projects. In February 2014, we launched the BioJS Collection, which comprises individual software components, each of which is like a standard Lego piece for building web applications that visualise biological data [4].

With this poster, we will discuss the novel requirements of articles associated with open source software development, and present new publishing opportunities that better reflect and support those needs for the benefit of both software developers and scientific researchers as a whole.

IP04 - Analysis of 8,000 cancer exomes from the Oncomine® Knowledge Base to identify NFE2L2 pathway as a novel therapeutic opportunity in multiple cancer types.
Scientific Area: Bioinformatics of Disease and Treatment

Presenting author: Nickolay Khazanov, Thermo Fisher Scientific, United States

Additional authors:
Sean Eddy, Thermo Fisher Scientific, United States
Marry Ellen, Thermo Fisher Scientific, United States
Jia Li, Thermo Fisher Scientific, United States
Mark Tomilo, Thermo Fisher Scientific, United States
Dinesh Cyanam, Thermo Fisher Scientific, United States
Armand Bankhead, Thermo Fisher Scientific, United States
Sarah Anstead, Thermo Fisher Scientific, United States
Nikki Bonnevich, Thermo Fisher Scientific, United States
Becky Steck, Thermo Fisher Scientific, United States
Peter Wyngaard, Thermo Fisher Scientific, United States
Seth Sadis, Thermo Fisher Scientific, United States
Emma Bowden, Thermo Fisher Scientific, United States
Bryan Johnson, Thermo Fisher Scientific, United States
Dan Rhodes, Thermo Fisher Scientific, United States

Presentation Overview:


To reduce late-stage drug attrition in oncology, it is critical to identify appropriate drug targets and pre-clinical models. NGS analysis of cancer exomes provides a comprehensive assessment of alterations; however, discerning rare driver events from abundant passenger aberrations remains a challenge. To maximize the value of NGS, it is imperative to delineate the driver alterations and annotate them for clinical relevance.

Here we present our framework for mining the multi-dimensional NGS data in the Oncomine® Knowledge Base for candidate driver lesions and candidate drug targets across dozens of cancer types. An integrative framework was designed to compute associations among driver mutations, fusions, and copy number alterations to define the driver aberration landscape of common cancers, and then to correlate the drivers with clinical metadata. Genes were ranked by their associations with patient survival and by potential clinical actionability.

We verified the majority of known driver genes across samples from major cancer types, and nominated novel infrequently altered potential drivers. We found strong evidence implicating NFE2L2 as an oncogene. Recurrent NFE2L2 mutations were found in samples from multiple cancer types and associated with poor outcome in head and neck squamous cell carcinoma. We also investigated KEAP1, a repressor of NFE2L2 activity. Mutations in KEAP1 tended to localize within the NFE2L2 binding domains and did not co-occur with NFE2L2 recurrent mutations. Genes up-regulated in NFE2L2 or KEAP1 mutant samples significantly associated with genes up-regulated in chemotherapy-resistant cell lines. Using cell line exome data we were also able to identify cell lines representative of samples from clinical populations containing the significant mutations.


