Late Breaking Research Presentations

Highlights, Late Breaking Research and Proceedings Track presentations are grouped by theme.

Data:

Includes data and text-mining, ontologies, databases and machine learning approaches that do not fit in other categories.

Disease:

Includes analysis of mutations, phenotypes, drugs, epidemiology and other clinically relevant areas.

Genes:

Includes work on genes (including non-coding RNA), transcriptomes, genomes and variation.

Proteins:

Includes analysis of proteins and their structures and proteomics.

Systems:

This theme includes higher level systems such as cells, tissues, whole organisms and ecosystems. Includes systems biology, molecular interactions and genetic regulation.

Other:

Research areas that do not fall within the five (5) main thematic areas. The organizers may, at their discretion, move submissions to other thematic areas.

Data
Presenting author: Tudor Groza, Garvan Institute of Medical Research, Australia
Date: Tuesday, July 14, 10:10 am - 10:30 am
Room: Liffey Hall 2

Additional authors:
Tudor Groza, Garvan Institute of Medical Research, Australia
Sebastian Köhler, Charité-Universitätsmedizin Berlin, Germany
Dawid Moldenhauer, Charité-Universitätsmedizin Berlin, Germany
Nicole Vasilevsky, Oregon Health & Science University, United States
Gareth Baynam, King Edward Memorial Hospital, Australia
Lynn Schriml, University of Maryland School of Medicine, United States
Warren Kibbe, National Cancer Institute, United States
Tim Beck, University of Leicester, United Kingdom
Anthony Brookes, University of Leicester, United Kingdom
Andreas Zankl, The Children's Hospital at Westmead, Australia
Nicole Washington, Lawrence Berkeley National Laboratory, United States
Christopher Mungall, Lawrence Berkeley National Laboratory, United States
Suzanna Lewis, Lawrence Berkeley National Laboratory, United States
Melissa Haendel, Oregon Health & Science University, United States
Peter Robinson, Charité-Universitätsmedizin Berlin, Germany

Area Session Chair: Ioannis Xenarios

Presentation Overview:
Deep phenotyping, the precise and comprehensive analysis of individual phenotypic abnormalities for the purpose of translational research, diagnostics, or personalized care, depends on computational resources to capture the phenotype of patients or diseases and integrate it with other relevant information such as genomic variation. The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence variation data, and translational research, but a comparable resource has not been available for common disease. This presentation introduces disease models for 3,145 common human diseases comprising a total of 132,006 annotations to terms of the HPO, which enabled us to build a common disease phenotypic network, as well as to study the phenotypic and genetic overlap across common diseases.
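To make the idea of a phenotypic disease network concrete, each disease model can be treated as a set of HPO term identifiers, with diseases linked when their sets overlap. The following is a minimal sketch under stated assumptions. It uses plain Jaccard similarity and an arbitrary cutoff; the measure used in the presentation, which may exploit the ontology's hierarchy, is not described. The disease names and term labels are hypothetical placeholders rather than real HPO annotations.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Fraction of shared terms among all terms annotated to either disease."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def phenotype_network(annotations: dict, cutoff: float = 0.3) -> list:
    """Return (disease1, disease2, similarity) edges whose similarity meets the cutoff."""
    edges = []
    for d1, d2 in combinations(sorted(annotations), 2):
        sim = jaccard(annotations[d1], annotations[d2])
        if sim >= cutoff:
            edges.append((d1, d2, round(sim, 3)))
    return edges


# Hypothetical toy disease models; real models would use HPO term IDs.
toy = {
    "disease_A": {"term1", "term2", "term3"},
    "disease_B": {"term2", "term3", "term4"},
    "disease_C": {"term5"},
}
print(phenotype_network(toy))
```

Genetic overlap could be explored the same way by swapping the phenotype sets for sets of associated genes.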

Presenting author: Noah Daniels, Massachusetts Institute of Technology, United States
Date: Tuesday, July 14, 11:40 am - 12:00 pm
Room: Liffey Hall 2

Additional authors:
Y. William Yu, MIT, United States
Bonnie Berger, MIT, United States
David Danko, MIT, United States
Noah Daniels, MIT, United States

Area Session Chair: Ioannis Xenarios

Presentation Overview:
The continual onslaught of new omics data has forced upon scientists the fortunate problem of having too much data to analyze. Luckily, many datasets exhibit well-defined structure that can be exploited to design smarter analysis tools. We introduce an entropy-scaling data structure for similarity search, a fundamental operation in data science: given a database of low fractal dimension, it scales in both time and space with the entropy of that underlying database. Using these ideas, we present accelerated versions of standard tools for practitioners in three domains (high-throughput drug screening, metagenomics, and protein structure search), none of which lose any specificity or suffer a significant loss in sensitivity: a 12x speedup of small molecule similarity search (SMSD) with less than 4% loss in sensitivity; a 673x speedup of BLASTX with less than 5% loss in sensitivity; and a 10x speedup of protein structure search (FragBag) with less than 0.2% loss in sensitivity.
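The core trick behind this kind of entropy-scaling search can be sketched as a two-stage, cluster-then-refine lookup. Points are grouped around centers no farther than r_c away, and the triangle inequality guarantees that any database point within r of a query lies in a cluster whose center is within r + r_c of that query. This minimal sketch assumes Euclidean distance on toy 2D points and a simple greedy clustering. The actual tools use domain-specific distances (e.g. for sequences or molecules), and the function names here are hypothetical.

```python
import math


def build_clusters(points, r_c):
    """Greedy cover: each point joins the first center within r_c, else becomes a new center."""
    clusters = []  # list of (center, members)
    for p in points:
        for center, members in clusters:
            if math.dist(p, center) <= r_c:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def search(clusters, query, r, r_c):
    """Two-stage search: coarse pass over centers, fine pass inside surviving clusters."""
    hits = []
    for center, members in clusters:
        # Triangle inequality: any member within r of the query has its center within r + r_c.
        if math.dist(query, center) <= r + r_c:
            hits.extend(p for p in members if math.dist(query, p) <= r)
    return hits


points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (10.0, 0.0)]
clusters = build_clusters(points, r_c=1.0)
print(search(clusters, query=(0.2, 0.0), r=0.5, r_c=1.0))
```

Only clusters that pass the coarse filter are scanned, so the cost depends mostly on the number of clusters plus the size of the few clusters near the query. With a true metric this sketch returns exactly the brute-force result. Practical tools can lose some sensitivity when their similarity measure only approximately obeys the triangle inequality.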

Disease
Presenting author: Malachi Griffith, Washington University, United States
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: The Auditorium

Additional authors:
Malachi Griffith, Washington University, United States
Christopher Miller, Washington University, United States
Obi Griffith, Washington University, United States
Kilannin Krysiak, Washington University, United States
Zachary Skidmore, Washington University, United States
Avinash Ramu, Washington University, United States
Jason Walker, Washington University, United States
Ha Dang, Washington University, United States
Lee Trani, Washington University, United States
David Larson, Washington University, United States
Ryan Demeter, Washington University, United States
Michael Wendl, Washington University, United States
Rachel Austin, Washington University, United States
Vincent Magrini, Washington University, United States
Sean McGrath, Washington University, United States
Amy Ly, Washington University, United States
Shashikant Kulkarni, Washington University, United States
Joshua McMichael, Washington University, United States
Matt Cordes, Washington University, United States
Catrina Fronick, Washington University, United States
Robert Fulton, Washington University, United States
Christopher Maher, Washington University, United States
Li Ding, Washington University, United States
Jeffery Klco, Washington University, United States
Elaine Mardis, Washington University, United States
Timothy Ley, Washington University, United States
Richard Wilson, Washington University, United States

Area Session Chair: Yana Bromberg

Presentation Overview:
Tumors are typically sequenced to depths of 75-100x (exome) or 30-50x (whole genome). We demonstrate that current sequencing paradigms based on this coverage are inadequate for tumors that are impure, aneuploid, and/or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep whole genome sequencing (up to ~312x) and exome capture (up to ~433x) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested 7 alignment algorithms and 7 single-nucleotide variant (SNV) callers, and validated ~200,000 putative SNVs by sequencing them to mean depths of ~1,000x. Additional targeted sequencing provided over 10,000x coverage, and ddPCR assays provided up to ~250,000x sampling of selected sites (using up to 2 µg of input DNA per assay). Using these data, we evaluated the effects of different library generation approaches, sequencing depth, and analysis strategies on the ability to characterize a complex tumor effectively. This dataset, the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource.
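A simple binomial model illustrates why conventional depths can miss variants in impure or heterogeneous tumors. If a variant is present in a fraction f of sequenced reads (its variant allele fraction), the number of supporting reads at depth d is approximately Binomial(d, f). The sketch below assumes a caller that needs a fixed minimum number of supporting reads and ignores sequencing error, mapping bias and copy-number effects. The threshold of 4 reads and the 5% allele fraction (e.g. a heterozygous mutation in 10% of cells of a diploid region) are illustrative choices, not values from the study.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Probability of observing at least `min_alt_reads` variant-supporting reads
    when each of `depth` reads independently carries the variant with probability `vaf`."""
    p_fewer = sum(comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
                  for k in range(min_alt_reads))
    return 1.0 - p_fewer


# Hypothetical subclonal variant at 5% allele fraction, caller needing >= 4 supporting reads.
for depth in (30, 50, 100, 300):
    print(depth, round(detection_probability(depth, vaf=0.05, min_alt_reads=4), 3))
```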

Presenting author: Thomas LaFramboise, Case Western Reserve University, United States
Date: Monday, July 13, 2:20 pm - 2:40 pm
Room: The Auditorium

Area Session Chair: Paul Horton

Presentation Overview:
Somatic mitochondrial DNA (mtDNA) mutations accumulate in human cancers, although the mutations’ roles in tumorigenesis are unclear and subject to some debate. In contrast to the nuclear genome’s two copies per cell, the mitochondrial genome, although very small at 16,568 bp, is typically present at hundreds to thousands of copies per cell. This complicates the analysis of mtDNA-level variants, since they may be present at any abundance in a continuous range between 0% and 100%, as opposed to the discrete 0%, 50% or 100% levels of nuclear genome variants. Furthermore, the per-cell copy number of the mitochondrial genome often shifts dramatically between the tumor and surrounding normal tissue, although the reasons for this phenomenon and its role, if any, in tumor development are unclear. To address these issues rigorously, we analyze cancer-specific mutational patterns and copy number changes in the whole mitochondrial genomes of 7,817 patient samples across 14 tumor types. We develop and apply statistical tests to detect selection on somatic and inherited variants in mitochondrial DNA. We find specific tumor types and specific genes that show particularly prominent signals of positive selection. Since selection implies function, our results support the role of mtDNA mutations as causative factors in the initiation and development of human cancer.
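Because mtDNA variants can sit anywhere between 0% and 100%, each site's heteroplasmy is naturally reported as a fraction with an uncertainty that shrinks with read depth. This sketch assumes the heteroplasmic fraction is estimated as variant reads over total reads, with a Wilson score interval. The presentation does not state which estimator or test it uses, and the read counts are hypothetical.

```python
import math


def heteroplasmy(alt_reads: int, total_reads: int, z: float = 1.96):
    """Estimated fraction of mtDNA copies carrying a variant, with a Wilson score interval
    (z = 1.96 gives roughly 95% coverage)."""
    if total_reads <= 0:
        raise ValueError("no coverage at this position")
    n = total_reads
    p = alt_reads / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical site: 230 of 1,000 reads support the variant (about 23% heteroplasmy).
print(heteroplasmy(alt_reads=230, total_reads=1000))
```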

Presenting author: Xinghua Lu, University of Pittsburgh, United States
Date: Monday, July 13, 2:40 pm - 3:00 pm
Room: The Auditorium

Additional authors:
Xinghua Lu, University of Pittsburgh, United States

Area Session Chair: Paul Horton

Presentation Overview:
An important goal of cancer genomic research is to identify the driving pathways underlying disease mechanisms. It is well known that somatic genome alterations (SGAs) affecting genes that encode proteins within a common signaling pathway exhibit mutual exclusivity: these SGAs usually do not co-occur in the same tumor. With some success, this property has been used as an objective function to guide the search for driver mutations. However, mutual exclusivity alone is not sufficient to indicate that the genes affected by such SGAs are in a common pathway. Here, we propose a novel, signal-oriented framework for identifying driver SGAs, in which mutual exclusivity is constrained only to tumors that have SGAs perturbing a common signal, rather than applied to all tumors as in previous methods. We apply this framework to the ovarian cancer (OV) and glioblastoma (GBM) data from TCGA and perform systematic evaluations. Our results indicate that the signal-oriented approach enhances the ability to find informative sets of driver SGAs that likely constitute signaling pathways.
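The difference between conventional and signal-oriented mutual exclusivity can be shown with a small scoring function. This sketch is hypothetical. It represents each tumor's SGAs as a set of gene names and scores a gene set by the fraction of covered tumors carrying exactly one alteration in the set. The "common signal" is modeled simply as a given set of tumor IDs; how the signal is inferred in the presented framework is not described here.

```python
def exclusivity(sga: dict, genes: list, tumors=None) -> float:
    """Fraction of covered tumors in which exactly one gene of the set is altered.

    sga maps tumor ID -> set of altered genes. If `tumors` is given, only those
    tumors are scored (the signal-constrained variant); otherwise all tumors are.
    """
    gene_set = set(genes)
    pool = sga.keys() if tumors is None else tumors
    counts = [len(sga[t] & gene_set) for t in pool]
    covered = [c for c in counts if c > 0]
    if not covered:
        return 0.0
    return sum(1 for c in covered if c == 1) / len(covered)


# Hypothetical SGA calls; t1, t2 and t4 are the tumors in which the shared signal is detected.
sga = {"t1": {"A"}, "t2": {"B"}, "t3": {"A", "B"}, "t4": {"C"}, "t5": {"A", "B"}}
signal_tumors = {"t1", "t2", "t4"}
print(exclusivity(sga, ["A", "B"]), exclusivity(sga, ["A", "B"], signal_tumors))
```

In this toy data, tumors t3 and t5 carry both alterations but do not show the signal, so they lower the all-tumor score (0.5) without affecting the signal-restricted one (1.0).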

Presenting author: Chad Myers, University of Minnesota, United States
Date: Monday, July 13, 3:30 pm - 3:50 pm
Room: The Auditorium

Additional authors:
Scott Simpkins, University of Minnesota, United States
Justin Nelson, University of Minnesota, United States
Jeff Piotrowski, University of Wisconsin-Madison, United States
Raamesh Deshpande, University of Minnesota, United States
Sheena Li, RIKEN, Japan
Jacqueline Barber, RIKEN, Japan
Hamid Safizadeh, University of Minnesota, United States
Reika Okamoto, RIKEN, Japan
Mami Yoshimura, RIKEN, Japan
Tamio Saito, RIKEN, Japan
Hiroyuki Osada, RIKEN, Japan
Minoru Yoshida, RIKEN, Japan
Charles Boone, University of Toronto, Canada
Chad Myers, University of Minnesota, United States

Area Session Chair: Paul Horton

Presentation Overview:
As an alternative to “target-centric” approaches to drug discovery, we have developed an ultra-high-throughput yeast chemical genomics assay that allows the prediction of a compound’s gene- and process-level targets across the entire genome. This methodology provides a novel and informative way to screen compounds for specific bioactivities. We applied it to screen more than 13,000 compounds of diverse origins (synthetic compounds, natural products and their derivatives, and clinically relevant compounds), and obtained high-confidence process-level predictions for over 10% of the screened compounds. At the current level of throughput, we can screen more than 10,000 compounds and generate genome-wide target predictions within a few months, demonstrating an efficient, high-throughput method to assess genome-wide bioactivities.
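One established way to turn a yeast chemical-genetic screen into target predictions is to compare a compound's profile of sensitivities across a panel of deletion mutants with the genetic interaction profiles of individual genes. The rationale is that a compound inhibiting a gene product tends to resemble the loss of that gene. The sketch below assumes that approach, Pearson correlation as the similarity, and hypothetical gene names and scores; the exact scoring used by the presented assay is not specified. Process-level predictions would then aggregate gene-level scores over annotated biological processes, which is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_targets(compound_profile, gene_profiles):
    """Rank genes by similarity of their genetic interaction profile to the compound's
    chemical-genetic profile (both measured across the same panel of deletion mutants)."""
    scores = {g: pearson(compound_profile, prof) for g, prof in gene_profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical profiles over five deletion mutants.
compound = [-2.0, -1.5, 0.1, 0.3, -0.2]
gene_profiles = {
    "geneX": [-1.8, -1.2, 0.0, 0.2, -0.1],
    "geneY": [0.5, 0.1, -1.0, -0.9, 0.4],
}
print(rank_targets(compound, gene_profiles))
```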

Presenting author: Gaurav Chopra, University of California, San Francisco, United States
Date: Monday, July 13, 4:10 pm - 4:30 pm
Room: The Auditorium

Additional authors:
Gaurav Chopra, UCSF & SUNY-Buffalo, United States
Ram Samudrala, SUNY-Buffalo, United States

Area Session Chair: Paul Horton

Presentation Overview:
We have developed the Computational Analysis of Novel Drug Opportunities (CANDO) platform (http://protinfo.org/cando/), funded by a 2010 NIH Director's Pioneer Award, which analyzes compound-proteome interaction signatures to determine drug behavior, in contrast to traditional single- (or few-) target approaches. Our platform implements a modeling pipeline that generates an interaction matrix between 3,733 human approved drugs and 48,278 proteins using a hierarchical chem- and bio-informatic fragment-based docking with dynamics protocol (~1 billion predicted interactions evaluated, considering multiple binding sites per protein). The platform then uses the similarity of interaction signatures across all proteins as an indicator of similar functional behavior, and dissimilar signatures to capture off- and anti-target (side) effects, in effect inferring homology of compound/drug behavior at a proteomic level. The benchmarking accuracy of this approach in ranking compounds for over 650 indications/diseases is ~36%, in contrast to ~0.2% obtained with scrambled control matrices. We prospectively validated “high value” predictions in in vitro and in vivo preclinical studies for more than a dozen indications, including type 1 diabetes, herpes, dental caries, dengue, tuberculosis, malaria, hepatitis B, and different cancers. Our drug prediction accuracy is ~35% across nine indications, where 57/162 compounds validated thus far show comparable or better activity than an existing drug, or micromolar inhibition at the cellular level, and serve as novel repurposable therapies. Our approach is broadly applicable beyond repurposing, enables personalized and precision medicine, and foreshadows a new era of faster, safer, and cheaper drug discovery.
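The signature-comparison and benchmarking idea can be sketched in a few lines. Each drug is a vector of predicted interaction scores against the same ordered list of proteins, and drugs are compared by how similar their vectors are. Accuracy is then the fraction of drugs for which another drug approved for the same indication appears among the k most similar signatures. The root-mean-square difference used as the distance, the top-k criterion and the toy data are assumptions for illustration, not a description of CANDO's implementation.

```python
import math


def rmsd(a, b):
    """Root-mean-square difference between two interaction signatures (lower = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def top_k_accuracy(signatures, indications, k=1):
    """Leave-one-out benchmark: for each drug with a given indication, check whether any
    other drug sharing that indication is among its k most similar signatures."""
    hits = total = 0
    for drugs in indications.values():
        for d in drugs:
            ranked = sorted((rmsd(signatures[d], signatures[o]), o)
                            for o in signatures if o != d)
            nearest = {o for _, o in ranked[:k]}
            total += 1
            hits += bool(nearest & (set(drugs) - {d}))
    return hits / total if total else 0.0


# Hypothetical interaction scores of four drugs against three proteins.
signatures = {
    "d1": [0.9, 0.1, 0.8], "d2": [0.85, 0.2, 0.75],
    "d3": [0.1, 0.9, 0.2], "d4": [0.2, 0.8, 0.1],
}
indications = {"ind1": ["d1", "d2"], "ind2": ["d3", "d4"]}
print(top_k_accuracy(signatures, indications, k=1))
```

Repeating the benchmark with shuffled signatures gives the kind of scrambled-matrix baseline the abstract compares against.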

Presenting author: Andreas Beyer, University of Cologne, Germany
Date: Tuesday, July 14, 2:00 pm - 2:20 pm
Room: The Auditorium

Additional authors:
Andreas Beyer, University of Cologne, Germany
Betty Friedrich, ETH Zurich, Switzerland
Michael Seifert, TU Dresden, Germany

Area Session Chair: Natasa Przulj

Presentation Overview:
Copy number alterations (CNAs) of large genomic regions are frequent in many tumor types, but only a few of them are assumed to be relevant for the cancerous phenotype. It has proven exceedingly difficult to ascertain rare mutations that might have strong effects in individual patients. Here, we show that a genome-wide transcriptional regulatory network inferred from gene expression and gene copy number data of 768 human cancer cell lines can be used to quantify the impact of individual patient-specific gene CNAs on cancer-specific survival signatures. The model was highly predictive for gene expression in 4,548 clinical samples originating from 13 different tissues. Focused analysis of tumors from six tissues revealed that, in an individual patient, a combination of up to 100 gene CNAs directly or indirectly affected the expression of clinically relevant survival signature genes. Importantly, rare patient-specific mutations (< 1% in a given cohort) often had stronger effects on signature genes than frequent mutations. Subsequent integration with genomic data suggests that the frequency variation among high-impact genes is driven mainly by gene location rather than gene function. Our framework contributes to the individualized quantification of cancer risk and to determining individual key risk factors and their downstream targets.
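A heavily simplified version of the idea is a linear model in which the expression of each signature gene is predicted from the copy numbers of its regulators; a patient's individual CNAs then contribute in proportion to their fitted weights. This sketch assumes a single-layer linear model fitted by least squares on simulated data. It therefore captures only direct effects, whereas the presented network model also accounts for indirect effects. All names and data are hypothetical.

```python
import numpy as np


def fit_weights(cna, expr):
    """Least-squares fit of expr ≈ cna @ W.

    cna:  cell lines x regulators (copy number of candidate regulator genes)
    expr: cell lines x target genes (expression)
    Returns W with shape regulators x target genes.
    """
    W, *_ = np.linalg.lstsq(cna, expr, rcond=None)
    return W


def cna_impacts(W, patient_cna):
    """Direct contribution of each of one patient's CNAs to each target gene:
    impact[r, g] = patient_cna[r] * W[r, g]. Indirect (multi-step) effects are ignored."""
    return np.asarray(patient_cna, dtype=float)[:, None] * W


# Hypothetical toy data: 3 regulators, 2 signature genes, 40 simulated cell lines.
rng = np.random.default_rng(0)
true_W = np.array([[1.0, 0.0], [0.0, -2.0], [0.5, 0.5]])
cna = rng.normal(size=(40, 3))
expr = cna @ true_W + rng.normal(scale=0.01, size=(40, 2))
W_hat = fit_weights(cna, expr)
impacts = cna_impacts(W_hat, [0.0, 1.0, -1.0])  # gain of regulator 1, loss of regulator 2
print(np.round(impacts, 2))
```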

Presenting author: Roland Schwarz, European Molecular Biology Laboratory - European Bioinformatics Institute, United Kingdom
Date: Tuesday, July 14, 2:20 pm - 2:40 pm
Room: The Auditorium

Additional authors:
Roland Schwarz, European Molecular Biology Laboratory, United Kingdom

Area Session Chair: Natasa Przulj

Presentation Overview:
Accurate reconstruction of the evolutionary history of cancer in the patient and quantification of intra-tumour heterogeneity are current challenges in cancer genomics. The accuracy of tree inference from genomic rearrangements depends on the quality of the phasing of copy-numbers: the assignment of major and minor copy-numbers to the two physical parental alleles. So far phasing has been done using evolutionary criteria alone, a heuristic and computationally expensive procedure which impedes probe-level resolution tree reconstruction.

Here we present a novel phasing algorithm that extends our previous work on allele-specific segmentation of copy-numbers. Using the shared genetic background of multiple samples from the same patient, we assign copy-numbers to physical alleles based on the bi-allelic frequency distribution of heterozygous SNPs. Combined with our previously established evolutionary phasing algorithm, this provides a new, accurate and fast phasing method that leverages the available SNP data effectively. This is a crucial step towards probe-level resolution tree inference on genomic rearrangement events in cancer and exact quantification of genetic heterogeneity for routine applications in translational cancer research.
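The principle behind allele-specific phasing across samples can be illustrated with B-allele frequencies (BAFs) at heterozygous SNPs. Within an imbalanced segment, a BAF above 0.5 means the B allele sits on the more abundant parental copy, so comparing these signs between two samples from the same patient reveals whether the major copy-number belongs to the same physical allele in both. The sign-voting rule, the tolerance and the toy BAF values below are assumptions, not the published algorithm.

```python
def haplotype_signs(bafs, tol=0.05):
    """+1 if the B allele is on the more abundant copy, -1 if on the less abundant one,
    0 if the SNP looks balanced (|BAF - 0.5| <= tol) and is therefore uninformative."""
    return [0 if abs(b - 0.5) <= tol else (1 if b > 0.5 else -1) for b in bafs]


def same_major_allele(bafs_ref, bafs_other, tol=0.05):
    """Decide whether the major copy in `other` sits on the same physical allele as in `ref`,
    by voting over SNPs informative in both samples. Returns None if undecidable."""
    votes = sum(a * b for a, b in zip(haplotype_signs(bafs_ref, tol),
                                      haplotype_signs(bafs_other, tol)))
    if votes == 0:
        return None
    return votes > 0


# Hypothetical BAFs of the same four heterozygous SNPs in one segment of two tumor samples.
ref = [0.70, 0.30, 0.72, 0.28]
other = [0.35, 0.66, 0.30, 0.70]
print(same_major_allele(ref, other))
```

In this example every informative SNP flips between the samples, so the major copy in the second sample is on the other parental allele and its major/minor labels would be swapped before tree inference.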

Presenting author: Edith Ross, University of Cambridge, United Kingdom
Date: Tuesday, July 14, 2:40 pm - 3:00 pm
Room: The Auditorium

Additional authors:
Edith Ross, University of Cambridge, United Kingdom
Florian Markowetz, University of Cambridge, United Kingdom

Area Session Chair: Natasa Przulj

Presentation Overview:
Tumour evolution leads to genetic intra-tumour heterogeneity, which poses major challenges to cancer therapy. While this heterogeneity has been documented in several cases, many details of the underlying evolutionary processes are still unknown.

Studying pathways of tumour evolution promises to provide insights into early stages of cancer development and to allow predictions about whether or not early-stage tumours are likely to progress to more aggressive forms. So far, most methods for inferring tumour phylogenies use bulk sequencing data. However, they struggle to deconvolute the mixed signal into separate clones and their corresponding genotypes.

Here, we present oncoNEM, a probabilistic method for inferring intra-tumour evolutionary lineage trees from noisy exome- or genome-wide single-cell sequencing data. OncoNEM is based on the nested structure of mutations observed between cells and jointly infers the tree structure, the number of clones and their composition.

We evaluate the accuracy of oncoNEM in the controlled setting of a simulation study and demonstrate that (i) our method can accurately infer trees of tumour evolution despite the high allelic dropout rates of current single-cell sequencing technologies, (ii) it is robust to inaccuracies in the estimation of model parameters and (iii) it substantially outperforms competing methods.
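Scoring a candidate lineage tree against noisy single-cell data is the central step in methods of this kind. A minimal sketch, assuming a simple noise model with a false positive rate and a false negative rate (the latter absorbing allelic dropout), computes the log-likelihood of an observed cell-by-mutation matrix. The inputs are a tree, the clone each cell belongs to, and the clone where each mutation arose. The tree search and the inference of the number of clones are omitted, and all names are hypothetical rather than oncoNEM's own.

```python
import math


def ancestors(node, parent):
    """Return the node together with all its ancestors (the root has parent None)."""
    path = set()
    while node is not None:
        path.add(node)
        node = parent[node]
    return path


def log_likelihood(obs, parent, cell_node, mut_node, fpr, fnr):
    """Log-likelihood of noisy single-cell calls given a clone tree.

    obs[c][m] is 1 (mutation called), 0 (not called) or None (missing).
    A cell carries mutation m if m arose at its clone or at any ancestor clone.
    """
    ll = 0.0
    for c, calls in obs.items():
        lineage = ancestors(cell_node[c], parent)
        for m, d in calls.items():
            if d is None:
                continue
            if mut_node[m] in lineage:
                ll += math.log(1 - fnr) if d == 1 else math.log(fnr)
            else:
                ll += math.log(fpr) if d == 1 else math.log(1 - fpr)
    return ll


obs = {"c1": {"m1": 1, "m2": 0}, "c2": {"m1": 0, "m2": 1}, "c3": {"m1": 0, "m2": None}}
cell_node = {"c1": "A", "c2": "B", "c3": "root"}
mut_node = {"m1": "A", "m2": "B"}
linear = {"root": None, "A": "root", "B": "A"}      # root -> A -> B
branching = {"root": None, "A": "root", "B": "root"}  # A and B both branch from root
print(log_likelihood(obs, linear, cell_node, mut_node, fpr=0.01, fnr=0.2),
      log_likelihood(obs, branching, cell_node, mut_node, fpr=0.01, fnr=0.2))
```

Here cell c2 lacks m1 in the data. The branching tree explains this without invoking a dropout and therefore scores higher than the linear tree.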

Presenting author: Benjamin Hescott, Tufts University, United States
Date: Monday, July 13, 10:30 am - 10:50 am
Room: Liffey Hall 2

Additional authors:
Inbar Fried, Tufts University, United States
Anthony Cannistra, Tufts University, United States
Carter Casey, Tufts University, United States
Adam Piel, Tufts University, United States
Mark Crovella, Boston University, United States
Benjamin Hescott, Tufts University, United States

Area Session Chair: Yves Moreau

Presentation Overview:
In this work we shift the focus of the global network alignment problem away from identifying local structural similarities and towards finding coherent, functionally related groups of genes across species. We introduce CANDL (Coarsely Aligning Networks with Diffusion and Landmarks). Unlike previous methods that seek to conserve local motifs, CANDL identifies neighborhoods that are functionally similar. To do this, CANDL incorporates two key innovations. First, it uses a small set of known homologs to establish a set of landmarks that form the basis for a metric embedding of network nodes. Second, CANDL embeds the network using metrics known to capture functionally relevant network structure, namely random walk commute time and eigenvectors of the Laplacian heat kernel. We show that CANDL captures functionally coherent neighborhood mappings considerably better than current state-of-the-art aligners. To do so, we introduce two new validation tests based on functional coherence: cross-validation using known homologs, and similarity of GO terms in neighborhoods. In the process, we also identify and quantify previously overlooked limitations of structural network alignment techniques that arise from network automorphisms.
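Random-walk commute times, one of the two metrics named above, can be computed from the pseudoinverse L+ of the graph Laplacian as C(i, j) = vol(G) (L+_ii + L+_jj - 2 L+_ij), where vol(G) is the sum of all node degrees. The sketch below embeds each node by its commute times to landmark nodes and maps nodes across two networks by nearest embedding. It assumes that landmark i in one network is a known homolog of landmark i in the other. The heat-kernel component and CANDL's actual neighborhood-matching procedure are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def commute_times(adj):
    """All-pairs random-walk commute times from the Laplacian pseudoinverse."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    lp = np.linalg.pinv(lap)
    d = np.diag(lp)
    volume = adj.sum()  # sum of degrees = 2 * number of edges
    return volume * (d[:, None] + d[None, :] - 2 * lp)


def landmark_embedding(adj, landmarks):
    """Represent each node by its commute times to an ordered list of landmark nodes."""
    return commute_times(adj)[:, landmarks]


def coarse_align(adj1, lm1, adj2, lm2):
    """Map each node of network 1 to the network-2 node with the closest embedding.
    lm1[i] and lm2[i] are assumed to be known homologs."""
    e1, e2 = landmark_embedding(adj1, lm1), landmark_embedding(adj2, lm2)
    dists = np.linalg.norm(e1[:, None, :] - e2[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Toy example: the same 4-node path in both "species", with node labels reversed.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
print(coarse_align(path, [0], path, [3]))
```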

Presenting author: Michiel Adriaens, Maastricht University, Netherlands
Date: Monday, July 13, 12:00 pm - 12:20 pm
Room: Liffey Hall 2

Additional authors:
Michiel Adriaens, Maastricht University, Netherlands
Aida Moreno-Moral, Imperial College London, United Kingdom
Elisabeth Lodder, AMC, Netherlands
Carol Ann Remme, AMC, Netherlands
Rianne Wolswinkel, AMC, Netherlands
Enrico Petretto, Imperial College London, United Kingdom
Stuart Cook, Imperial College London, United Kingdom
Connie Bezzina, AMC, Netherlands

Area Session Chair: Yves Moreau

Presentation Overview:
Genome-wide association studies have identified many common genetic variants affecting susceptibility to cardiac arrhythmias and sudden cardiac death (SCD). However, uncovering the underlying disease mechanisms remains a substantial challenge, as the required resources for the human heart are sparse and underpowered. Hence, the only way to paint the full picture is to complement insights derived from human studies with systems genetics approaches in statistically powerful animal models. In this study we use 29 BXH/HXB recombinant inbred (RI) rat strains, a strong model for uncovering the mechanisms that modulate cardiac electrical function. Prolonged ECG indices of conduction and repolarization are risk factors for cardiac arrhythmias and SCD, and here we combine such indices with genotyping and RNA-seq transcriptomics data. In these data we search for quantitative trait loci (QTL): genetic markers associated with changes in a quantitative trait, i.e. an ECG index or a gene expression level. Using a Bayesian systems genetics framework, we identified multiple candidate genes and networks. One of these genes is Acbd4: a nearby genetic marker appears to modulate the expression of this gene (eQTL), and the same marker is associated with PR prolongation (ecgQTL). The protein product of Acbd4 plays a role in vesicle formation, deregulation of which is known to be linked to heart disease. Acbd4’s co-expression network is significantly positively correlated with PR duration and partly conserved in human, suggesting that the underlying mechanism may be of clinical relevance as well. Validation of our findings is ongoing.
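The notion of a QTL can be made concrete with a classical single-marker scan. At each marker, recombinant inbred strains fall into two homozygous genotype groups, and a LOD score compares a model with one mean per group against a model with a single overall mean. This is a frequentist illustration only; it is not the Bayesian systems genetics framework used in the study, and the marker names and trait values are hypothetical.

```python
import math


def lod_scores(genotypes, trait):
    """Single-marker LOD scores for recombinant inbred strains.

    genotypes[m] lists the allele (0 = one parental strain, 1 = the other) of each strain
    at marker m; trait lists one value per strain (e.g. an ECG index or expression level).
    LOD = n/2 * log10(RSS_null / RSS_marker), comparing a grand mean with per-allele means.
    """
    n = len(trait)
    grand = sum(trait) / n
    rss0 = sum((y - grand) ** 2 for y in trait)
    scores = {}
    for marker, g in genotypes.items():
        groups = {a: [y for a2, y in zip(g, trait) if a2 == a] for a in set(g)}
        means = {a: sum(v) / len(v) for a, v in groups.items()}
        rss1 = sum((y - means[a]) ** 2 for a, y in zip(g, trait))
        scores[marker] = n / 2 * math.log10(rss0 / rss1) if rss1 > 0 else math.inf
    return scores


# Hypothetical trait values (e.g. PR interval) for six RI strains and two markers.
trait = [10.1, 9.8, 10.3, 12.0, 12.2, 11.9]
genotypes = {"marker1": [0, 0, 0, 1, 1, 1], "marker2": [0, 1, 0, 1, 0, 1]}
print(lod_scores(genotypes, trait))
```

Applying the same scan with gene expression as the trait gives eQTL; a marker scoring highly for both an expression trait and an ECG index mirrors the Acbd4 observation.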

Presenting author: Tianyun Liu, Stanford University, United States
Date: Monday, July 13, 12:20 pm - 12:40 pm
Room: Liffey Hall 2

Area Session Chair: Yves Moreau

Presentation Overview:
We identified molecular mechanisms of drug side-effects by associating drugs to essential proteins using canonical component analysis.

Genes
Presenting author: Rene Warren, BC Cancer Agency, Canada
Date: Sunday, July 12, 10:50 am - 11:10 am
Room: The Liffey A

Additional authors:
Rene Warren, BC Cancer Agency, Canada
Benjamin Vandervalk, BC Cancer Agency, Canada
Steven Jones, BC Cancer Agency, Canada
Inanc Birol, BC Cancer Agency, Canada

Area Session Chair: Siu Ming Yiu

Presentation Overview:
Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. Established and emerging long read technologies show great promise in this regard, but their currently higher error rates typically require computational base correction and/or additional bioinformatics pre-processing before they can be of value. We present LINKS (Long Interval Nucleotide K-mer Scaffolder), a solution that uses the information in error-rich long reads without the need for read alignment or base correction. We show how the contiguity of an ABySS E. coli K-12 genome assembly could be increased over five-fold by the use of beta-released Oxford Nanopore Technologies Ltd. (ONT) long reads, and how LINKS leverages long-range information in S. cerevisiae W303 ONT reads to yield an assembly with less than half the errors of competing applications. By re-scaffolding the colossal white spruce (P. glauca) draft assembly (20 Gbp), we demonstrate how LINKS scales to larger genomes.
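The alignment-free idea can be sketched as follows. Pairs of k-mers separated by a fixed distance are extracted from each long read, both k-mers are looked up in an index of assembly k-mers, and every pair that falls in two different contigs counts as a link between them. This minimal sketch ignores reverse complements, contig orientation and gap-size estimation, which a real scaffolder must handle. The function names, parameters and sequences are hypothetical rather than taken from LINKS.

```python
from collections import Counter


def kmer_index(contigs, k):
    """Map each k-mer to the contig it occurs in, keeping only k-mers seen in exactly one contig."""
    seen = {}
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            seen.setdefault(seq[i:i + k], set()).add(name)
    return {kmer: next(iter(names)) for kmer, names in seen.items() if len(names) == 1}


def contig_links(long_reads, index, k, distance, step=1):
    """Count contig pairs joined by k-mer pairs taken `distance` bases apart in long reads.
    No alignment is performed; a pair links two contigs if both k-mers are in the index."""
    links = Counter()
    for read in long_reads:
        for i in range(0, len(read) - distance - k + 1, step):
            a, b = read[i:i + k], read[i + distance:i + distance + k]
            ca, cb = index.get(a), index.get(b)
            if ca and cb and ca != cb:
                links[tuple(sorted((ca, cb)))] += 1
    return links


# Hypothetical contigs and one long read spanning both with a 4-base gap between them.
contigs = {"c1": "ACGTACGGTT", "c2": "GGCATTCAGA"}
read = "ACGTACGGTT" + "CCCC" + "GGCATTCAGA"
idx = kmer_index(contigs, k=4)
print(contig_links([read], idx, k=4, distance=14))
```

Because only exact k-mer hits are used, sequencing errors in a long read simply remove some k-mer pairs instead of corrupting an alignment, which is why many pairs per read are sampled.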
Availability: http://www.bcgsc.ca/bioinfo/software/links

Presenting author: Davide Bau, National Center for Genomic Analysis, Spain
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: The Liffey A

Area Session Chair: Reinhard Schneider

Presentation Overview:
Advances in genomic technologies and the development of new analytical methods (e.g. Hi-C) have provided better insight into how the genome is organized inside the cell nucleus. Recently, it has been shown that chromatin is organized into Topologically Associating Domains (TADs), large interacting domains that are conserved among different cell types.
The Drosophila genome is also folded into TADs, which are packaged into a mosaic of five principal chromatin types, each defined by a unique combination of proteins. The five chromatin types differ substantially in their genome coverage, numbers of domains, and numbers of genes [1]. To determine whether these TADs correspond to functional domains defined by epigenetic marks, Hou et al. [2] examined the composition of chromatin types within physical domains, following the five-color classification described in [1]. To determine whether these “chromatin color blocks” have characteristic structural features, we studied the relationship between the 3D architecture of selected regions of the Drosophila genome and their chromatin color. Using Hi-C data at 10 kb resolution, we found that the analyzed regions have structural features characteristic of their functional signatures. Although the present data resolution does not allow different chromatin types to be distinguished unambiguously by simple comparison of their structural features, our results show that different chromatin types have specific structural characteristics that correlate with their functional roles, with active and inactive chromatin types showing significantly different structural characteristics.

[1] Filion et al. Cell, 143(2), 212–224.
[2] Hou et al. Molecular Cell, 48(3), 471–484.

Presenting author: Kyoung-Jae Won, University of Pennsylvania, United States
Date: Monday, July 13, 3:30 pm - 3:50 pm
Room: The Liffey A

Additional authors:
Kyoung-Jae Won, University of Pennsylvania, United States
Inchan Choi, University of Pennsylvania, United States
Benjamin Garcia, University of Pennsylvania, United States

Area Session Chair: Uwe Ohler

Presentation Overview:
Genome-wide localization analyses using chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) against four histone variants (H3.1, H3.3, H2A.Z and macroH2A) identified various combinations of histone variants (histone variant codes). While H2A.Z was highly enriched at promoters, H3.3 and H3.1 were observed in the bodies and 3’UTRs of active genes. While the majority of distal regulatory regions were enriched for H3.3 and/or H2A.Z, we newly identified a group of regulatory regions enriched for H3.1 and for macroH2A, the histone variant associated with repressive marks, indicating that histone variants are deposited at regulatory regions to assist gene regulation. Systematic analysis identified both symmetric and asymmetric patterns of histone variant (H3.3 and H2A.Z) occupancy at intergenic regulatory regions. Strikingly, these directional patterns were associated with RNA Polymerase II (PolII). These asymmetric patterns correlated with enhancer activity measured by global run-on sequencing (GRO-seq). We also showed that enhancers with skewed histone variant patterns effectively facilitate enhancer activity. Our study indicates that H2A.Z and H3.3 delineate the orientation of transcription at enhancers, as observed at promoters.
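A directional (asymmetric) occupancy pattern at a regulatory region can be quantified by comparing ChIP-seq read counts on either side of its center. This sketch assumes a log2 ratio with a pseudocount over fixed windows. The window size, pseudocount and read positions are hypothetical, and the study's own systematic analysis may use a different statistic.

```python
import math


def asymmetry(read_positions, center, window=1000, pseudocount=1):
    """log2 ratio of ChIP-seq read counts on the + side vs the - side of a regulatory region.
    Values near 0 indicate a symmetric pattern; large absolute values a skewed one."""
    right = sum(1 for p in read_positions if center < p <= center + window)
    left = sum(1 for p in read_positions if center - window <= p < center)
    return math.log2((right + pseudocount) / (left + pseudocount))


# Hypothetical H2A.Z read positions around an enhancer centered at position 1000.
reads = [900, 1100, 1200, 1300, 1400, 1500]
print(round(asymmetry(reads, center=1000), 3))
```

Orienting each score by the direction of nearby PolII or GRO-seq signal would then allow asymmetry to be compared with enhancer transcription.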

Presenting author: Guillaume Filion, Center for Genomic Regulation, Spain
Date: Monday, July 13, 3:50 pm - 4:10 pm
Room: The Liffey A

Area Session Chair: Uwe Ohler

Presentation Overview:
Recent genome-wide mapping studies in eukaryotes have shown that most transcriptionally silent domains lack repressive histone marks and transcriptional repressors, raising the question of what makes genes in these regions silent. Here we set out to answer this question by assaying position effects genome-wide for several reporters of transcription. To this end, we used a shotgun approach called TRIP (Thousands of Reporters Integrated in Parallel) to insert identical reporter genes at different loci of the Drosophila genome and measure their expression. We obtained expression data for more than 85,000 integrated reporters under eight different promoters, constituting the largest dataset of position effects available to date. We identified 10-100 kb domains of either high or low reporter activity. These domains are similar for different reporter constructs, showing that they correspond to the underlying organization of the genome. While these domains are similar between constructs, each promoter responds to the context to a different degree, yet the constructs are equally permeable to the neighboring chromatin. We identified novel protein signatures associated with the repression of reporter genes. One of them consists of chromatin proteins associated with transcriptionally active regions, with a deficit of DMAP1, which suggests that this protein is critical for the expression of reporters. Overall, our results reveal that the effect of the chromatin context on transcription results from multiple processes at work simultaneously.
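Calling domains of high or low reporter activity amounts to segmenting expression values ordered along the genome. As a minimal sketch, assume a rolling mean of log-expression over neighboring integration sites and a fixed threshold; consecutive reporters with the same smoothed label are then merged into domains. The study's segmentation method is not stated (hidden Markov models are a common alternative), and the positions and values below are hypothetical.

```python
def call_domains(positions, log_expr, window=5, threshold=0.0):
    """Smooth reporter expression along one chromosome and merge runs of consecutive
    reporters into 'high' or 'low' activity domains.
    Returns (first_position, last_position, label) tuples in genomic coordinates."""
    order = sorted(range(len(positions)), key=positions.__getitem__)
    pos = [positions[i] for i in order]
    val = [log_expr[i] for i in order]
    half = window // 2
    labels = []
    for i in range(len(val)):
        chunk = val[max(0, i - half): i + half + 1]
        labels.append("high" if sum(chunk) / len(chunk) > threshold else "low")
    domains, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            domains.append((pos[start], pos[i - 1], labels[start]))
            start = i
    return domains


# Hypothetical integration sites (kb) and log-expression of eight reporters.
positions = [0, 10, 20, 30, 40, 50, 60, 70]
log_expr = [2.0, 1.5, 2.2, 1.8, -2.0, -1.5, -2.1, -1.9]
print(call_domains(positions, log_expr, window=3))
```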

Presenting author: Maga Rowicka, University of Texas Medical Branch, United States
Date: Sunday, July 12, 2:00 pm - 2:20 pm
Room: The Liffey B

Additional authors:
Maga Rowicka, University of Texas Medical Branch, United States

Area Session Chair: Janet Kelso

Presentation Overview:
Double-stranded DNA breaks (DSBs) are the most dangerous form of DNA damage. Despite many studies on the mechanisms of DSB formation, our knowledge of them is very incomplete, owing to a lack of appropriate techniques for detecting DSBs accurately genome-wide. We recently developed a method to label DSBs in situ followed by deep sequencing (BLESS), and used it to map DSBs in human cells at a resolution 2-3 orders of magnitude better than previously achieved. Here, we will show how mathematical modelling and numerical simulations can elucidate and quantify various mechanisms of DSB formation. This paradigm of using in silico experiments as the method of choice for discovering and quantifying global, genome-wide rules and chromatin context dependence should also be beneficial for other systems studied using omics data.

Proteins
Presenting author: Michael Liam Tress, Spanish National Cancer Research Centre (CNIO), Spain
Date: Monday, July 13, 3:50 pm - 4:10 pm
Room: The Liffey B

Additional authors:
Michael Tress, Spanish National Cancer Research Centre (CNIO), Spain
Federico Abascal, Spanish National Cancer Research Centre (CNIO), Spain
Alfonso Valencia, Spanish National Cancer Research Centre (CNIO), Spain
Juan Rodriguez, Spanish National Cancer Research Centre (CNIO), Spain
Jose Manuel Rodriguez, Spanish National Cancer Research Centre (CNIO), Spain
Iakes Ezkurdia, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Spain
Jesus Vazquez, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Spain
Angela del Pozo, Hospital Universitario La Paz, Spain

Area Session Chair: Russell Schwartz

Presentation Overview:
Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Although large-scale mass spectrometry experiments have identified evidence of alternative splicing at the protein level, the results have been contradictory.

Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing detectable by high-resolution mass spectrometry. Although we identified peptides for almost 64% of human protein-coding genes, we detected just 282 splice events. We demonstrate that this is fewer splice events than expected, and show that most genes have a single dominant isoform at the protein level.
The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from their frequency in the genome. These homologous exon substitution events were remarkably conserved: all of the homologous exons we identified arose more than 460 million years ago. Moreover, eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing strongly indicates that isoforms generated from homologous exons may have important cellular roles.
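The abstract reports the conclusion of its enrichment test but not the test itself. One simple way to ask whether a class of events is over-represented is a one-sided binomial test. The sketch below applies it to the 282 detected splice events with an illustrative count of 57 (just over 20%). The 5% background frequency is a hypothetical placeholder, not a figure from the study.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """One-sided P(X >= k) for X ~ Binomial(n, p).

    X: number of detected splice events that are homologous exon substitutions.
    n: total number of detected splice events.
    p: assumed background frequency of such events in the genome.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Illustrative numbers only: 57 of 282 events against a hypothetical 5% background.
p_value = binomial_upper_tail(57, 282, 0.05)
print(f"one-sided p = {p_value:.2e}")
```

In practice the background frequency would come from counting homologous exon pairs across annotated genes. A per-gene or per-family null may be more appropriate than a single genome-wide rate.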

Presenting author: Sanne Abeln, VU University, Netherlands
Date: Tuesday, July 14, 10:10 am - 10:30 am
Room: The Liffey B

Area Session Chair: Francisco Melo Ledermann

Presentation Overview:
The hydrophobic effect is the main driving force in protein folding. One can estimate the relative strength of this hydrophobic effect for each amino acid by mining a large set of experimentally determined protein structures. However, the hydrophobic effect is known to be strongly temperature dependent. This temperature dependence is thought to explain the denaturation of proteins at low temperatures (cold denaturation). Here we investigate whether it is possible to extract this temperature dependence directly from a large set of protein structures determined at different temperatures.
Using NMR structures filtered for sequence identity, we were able to extract hydrophobicity propensities for all amino acids in five different temperature ranges (spanning 265-340 K). These propensities show that the hydrophobic effect becomes weaker at lower temperatures, in line with current theory. Viewed the other way around, the temperature dependence of the hydrophobic effect has a measurable influence on protein structures. Moreover, this work provides a method for probing the individual temperature dependence of the different amino acid types, which is difficult to obtain by direct experiment.
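The abstract does not define its hydrophobicity propensities. A common knowledge-based choice is the log-odds of burial, ln(P(buried | aa) / P(buried)), computed separately for the structures in each temperature bin. The sketch below assumes that definition. It also assumes residues have already been labelled buried or exposed, for example with a relative solvent accessibility cutoff, and that the input is non-empty.

```python
import math
from collections import Counter


def burial_propensities(residues):
    """Log-odds of burial per amino acid for one temperature bin.

    residues: iterable of (amino_acid, is_buried) pairs.
    Returns {aa: ln(P(buried | aa) / P(buried))}. Positive values mean the
    residue type is buried more often than average (more hydrophobic);
    residue types never observed buried are omitted (log of zero).
    """
    total, buried = Counter(), Counter()
    for aa, is_buried in residues:
        total[aa] += 1
        if is_buried:
            buried[aa] += 1
    p_buried = sum(buried.values()) / sum(total.values())
    return {
        aa: math.log((buried[aa] / total[aa]) / p_buried)
        for aa in total
        if buried[aa] > 0
    }


# Toy bin: leucine mostly buried, lysine mostly exposed.
toy = [("L", True)] * 8 + [("L", False)] * 2 + [("K", True)] * 2 + [("K", False)] * 8
props = burial_propensities(toy)
```

Running the same function on each of the temperature bins and comparing the values per amino acid is one way to expose a temperature trend. Small bins would need pseudocounts or bootstrapped error bars.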

Presenting author: John-Marc Chandonia, Lawrence Berkeley National Laboratory, United States
Date: Tuesday, July 14, 10:30 am - 10:50 am
Room: The Liffey B

Additional authors:
John-Marc Chandonia, Berkeley National Lab, United States
Steven Brenner, University of California, Berkeley, United States

Area Session Chair: Francisco Melo Ledermann

Presentation Overview:
The number of new protein structures deposited in the PDB every month has steadily increased and now exceeds 750 structures per month. On average, fewer than 15 of these structures (i.e., 2%) represent the first solved structure from a Pfam protein family. Fifteen families per month is the lowest rate at which families have been structurally characterized in nearly 20 years, despite vastly more efficient technology. Today, fewer than half as many families are newly structurally characterized every month as during the heyday of Structural Genomics, between 2003 and 2007. Because the rate of sequencing has outpaced the rate of structural characterization of families, the fraction of large protein families with a known structure peaked 7 years ago and is 10% lower today than at its peak. This makes curation of protein structure classification databases easier, but makes interpretation of sequence variation more challenging than it would otherwise be.

Presenting author: Argyris Politis, King's College London, United Kingdom
Date: Tuesday, July 14, 2:00 pm - 2:20 pm
Room: The Liffey B

Area Session Chair: Donna Slonim

Presentation Overview:
We present an integrated mass spectrometry (MS)-computational method for modelling the structure and dynamics of large protein assemblies. This method computationally integrates orthogonal data sets, derived from native MS, ion mobility MS and labelling MS experiments, that differ in resolution and information content. We assessed the method on its ability to reproduce the native structures of a set of five benchmark complexes with varying levels of MS-derived data. We then applied the method to characterize the 3D architecture of the yeast eukaryotic initiation factor eIF3 in complex with eIF5.

Presenting author: Nicholas Furnham, London School of Hygiene and Tropical Medicine, United Kingdom
Date: Tuesday, July 14, 4:10 pm - 4:30 pm
Room: The Liffey B

Additional authors:
Nidhi Tyagi, European Molecular Biology Laboratory, United Kingdom
Edward Farnell, University of Cambridge, United Kingdom
Colin Fitzsimmons, University of Cambridge, United Kingdom
Stephanie Ryan, University of Edinburgh, United Kingdom
Rick Maizels, University of Edinburgh, United Kingdom
David Dunne, University of Cambridge, United Kingdom
Janet Thornton, European Molecular Biology Laboratory, United Kingdom
Nicholas Furnham, London School of Hygiene & Tropical Medicine, United Kingdom

Area Session Chair: Donna Slonim

Presentation Overview:
Allergic reactions closely resemble the immune responses that confer a substantial degree of immunity against metazoan parasites: both elicit an immunoglobulin E (IgE) response. Based on the hypothesis that IgE-mediated immune responses evolved to provide extra protection against metazoan parasites rather than to cause allergy, we predict that environmental allergens share key molecular properties with the metazoan parasite antigens that are specifically targeted by IgE. Using large-scale computational studies, we have established molecular similarity between parasite proteins and allergens, and we are able to predict the regions of parasite proteins that potentially resemble the IgE-binding region(s) of allergens. Nearly half of the 2445 parasite proteins that show significant similarity to allergenic proteins fall within the 10 most abundant allergenic protein domain families. Our experimental studies support these predictions. We present the first confirmed example of a worm protein resembling the commonest plant pollen allergen, and confirm that it is targeted by IgE in people exposed to infection in a schistosomiasis-endemic area of Uganda. The identification of such similarities explains the ‘off-target’ effects of the IgE-mediated immune system in allergy.
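The abstract does not specify how parasite-allergen similarity was scored, and the authors' domain-family analysis is likely more elaborate. As a simpler, swapped-in illustration, the sketch below applies a FAO/WHO-style allergenicity screen: identity above 35% over an 80-residue window. Unlike the guideline, which uses gapped alignment, this version compares ungapped windows, so it is only a rough stand-in for an alignment-based search.

```python
def max_window_identity(query: str, allergen: str, window: int = 80) -> float:
    """Highest fraction of identical residues over any pair of ungapped
    windows of length `window` (one window from each sequence).

    Returns 0.0 if either sequence is shorter than the window. Runs in
    O(len(query) * len(allergen) * window) time, acceptable for a sketch.
    """
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            matches = sum(x == y for x, y in zip(q, allergen[j:j + window]))
            best = max(best, matches / window)
    return best


def allergen_like(query: str, allergen: str, window: int = 80, min_identity: float = 0.35) -> bool:
    """Flag `query` if any window pair exceeds the identity cutoff."""
    return max_window_identity(query, allergen, window) > min_identity
```

For example, `allergen_like(parasite_seq, allergen_seq)` screens one pair. Scanning thousands of parasite proteins against an allergen database would call for a proper aligner instead of this quadratic loop.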

Systems
Presenting author: Alex Cornish, Imperial College London, United Kingdom
Date: Sunday, July 12, 2:40 pm - 3:00 pm
Room: Liffey Hall 2

Additional authors:
Ioannis Filippis, Imperial College London, United Kingdom
Alessia David, Imperial College London, United Kingdom
Michael Sternberg, Imperial College London, United Kingdom

Area Session Chair: Nicolas Le Novere

Presentation Overview:
While the majority of diseases manifest within a specific anatomical structure, known disease-associated alleles are often inherited and therefore present throughout the body. Understanding how these ubiquitous alleles produce localized disease is key to understanding the mechanisms that drive disease. We have developed a novel approach, called gene set compactness (GSC), that contrasts the relative positions of disease-associated genes on cell type-specific interactomes to identify the cell types most likely to be affected by the alleles. Cell type-specific interactomes were created by integrating protein-protein interaction (PPI) data with cell type-specific expression data from the FANTOM5 project. We text-mined the PubMed database to produce an independent map of disease-associated cell types, which we used to validate our method. Our method identifies previously suggested associations, along with associations that warrant further study. These include mast cells and multiple sclerosis (MS), a cell population currently being targeted in a phase 2 MS clinical trial. Furthermore, we used the associations identified by our method to construct a pathogenic cell type-based diseasome, offering insight into diseases linked by common etiology. The resulting dataset represents the first large-scale mapping of diseases to their pathogenic cell types. Overall, we demonstrate that the GSC method links disease-associated genes to the phenotypes they produce, one of the key goals of systems biology.
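The abstract does not give the GSC formula. A minimal proxy for "compactness", assumed here for illustration, is the mean shortest-path distance between disease genes on a cell type-specific interactome. The interactome keeps only PPI edges between genes expressed in that cell type, and the expression calls would come from data such as FANTOM5. Comparing this statistic across cell types, and against random gene sets of the same size (not shown), would indicate which cell types bring the disease genes closest together.

```python
from collections import deque


def cell_type_interactome(ppi_edges, expressed_genes):
    """Keep only PPI edges whose two proteins are both expressed in the cell type."""
    graph = {}
    for u, v in ppi_edges:
        if u in expressed_genes and v in expressed_genes:
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_pairwise_distance(graph, genes):
    """Mean shortest-path distance over connected pairs of disease genes.

    Lower values mean the gene set is more compact on this interactome.
    Genes absent from the interactome are ignored; returns inf if no pair
    of genes is connected.
    """
    present = [g for g in genes if g in graph]
    dists = []
    for i, a in enumerate(present):
        reach = bfs_distances(graph, a)
        dists.extend(reach[b] for b in present[i + 1:] if b in reach)
    return sum(dists) / len(dists) if dists else float("inf")


ppi = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]  # toy PPI network
g_all = cell_type_interactome(ppi, {"A", "B", "C", "D", "E", "F"})
g_sub = cell_type_interactome(ppi, {"A", "C", "D"})  # B not expressed
```

Silencing a single bridging gene (B above) disconnects the disease genes. This is the kind of cell type-dependent change such a statistic is meant to capture.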

Presenting author: Antti Honkela, University of Helsinki, Finland
Date: Monday, July 13, 2:20 pm - 2:40 pm
Room: The Liffey B

Additional authors:
Antti Honkela, University of Helsinki, Finland
Jaakko Peltonen, Aalto University, Finland
Hande Topa, Aalto University, Finland
Iryna Charapitsa, Institute for Molecular Biology Mainz, Germany
Filomena Matarese, Radboud University Nijmegen, Netherlands
Korbinian Grote, Genomatix Software GmbH, Germany
Hendrik G. Stunnenberg, Radboud University Nijmegen, Netherlands
George Reid, Institute for Molecular Biology Mainz, Germany
Neil D. Lawrence, University of Sheffield, United Kingdom
Magnus Rattray, University of Manchester, United Kingdom

Area Session Chair: Russell Schwartz

Presentation Overview:
Genes with similar transcriptional activation kinetics can display very different temporal mRNA profiles due to differences in transcription time, degradation rate and RNA processing kinetics. Recent studies have shown that a splicing-associated RNA processing delay can be significant. We introduce a joint model of transcriptional activation and mRNA accumulation which can be used for inference of transcription rate, RNA processing delay and degradation rate given genome-wide data from high-throughput sequencing time course experiments. We combine a mechanistic differential equation model with a non-parametric statistical modelling approach which allows us to capture a broad range of activation kinetics, and use Bayesian parameter estimation to quantify the uncertainty in the estimates of the kinetic parameters.
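A delayed first-order model of the kind described can be written dm/dt = beta * p(t - Delta) - alpha * m(t). Here p(t) is transcriptional (pol-II) activity, beta a production rate, Delta the RNA processing delay and alpha the mRNA degradation rate. This exact form is assumed here for illustration. The sketch integrates the equation with forward Euler for a known activity profile. The non-parametric model of activation kinetics and the Bayesian parameter estimation described above are not shown.

```python
def simulate_mrna(pol2, t_end, dt, beta, alpha, delay, m0=0.0):
    """Forward-Euler integration of dm/dt = beta * pol2(t - delay) - alpha * m(t).

    pol2: callable returning transcriptional activity at time t; activity
    before time 0 is taken to be zero. Returns a list of (t, m) pairs.
    """
    steps = int(round(t_end / dt))
    m = m0
    trajectory = [(0.0, m)]
    for k in range(steps):
        lagged = k * dt - delay
        activity = pol2(lagged) if lagged >= 0 else 0.0
        m += dt * (beta * activity - alpha * m)
        trajectory.append(((k + 1) * dt, m))
    return trajectory


# Step activation at t = 0 with a 30-minute processing delay (times in minutes).
traj = simulate_mrna(lambda t: 1.0, t_end=240, dt=0.1, beta=1.0, alpha=0.05, delay=30)
```

With constant activity, mature mRNA stays at zero until the delay has elapsed and then relaxes towards beta / alpha. This is why a processing delay can shift the observed mRNA profile relative to the pol-II profile.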

We apply the model to data from estrogen receptor (ER-α) activation in the MCF-7 breast cancer cell line. We use RNA polymerase II (pol-II) ChIP-Seq time course data to characterise transcriptional activation and mRNA-Seq time course data to quantify mature transcripts. We find that 11% of genes with a good signal in the data display a delay of more than 20 minutes between completing transcription and mature mRNA production. The genes displaying these long delays are significantly more likely to be short. We also find a statistical association between high delay and late intron retention in pre-mRNA data, indicating significant splicing-associated processing delays in many genes.

Presenting author: Mario Flores, University of Texas at San Antonio, United States
Date: Monday, July 13, 3:30 pm - 3:50 pm
Room: The Liffey B

Area Session Chair: Russell Schwartz

Presentation Overview:
Post-transcriptional regulation of gene expression can be modeled as a competitive endogenous RNA (ceRNA) network in which mRNAs compete for binding by microRNAs (miRs). Previous research shows that this competition maintains and fine-tunes the levels of protein-coding genes, and that disruption of the network contributes to phenotypic conditions such as cancer. Based on our previous studies, we provided a tool (TraceRNA) for reconstructing ceRNA networks around a gene of interest (GoI). The approach used in TraceRNA, although practical and useful for gene-based studies, provides only a partial landscape of ceRNA mechanisms and phenotypes, and it is ad hoc. In this work we present a formal genome-wide approach to the study of ceRNA networks. This novel, formal treatment of the ceRNA phenomenon provides new perspectives on ceRNA networks and their specific phenotypes. We divide the study of genome-wide ceRNA networks into three main parts: network construction, analysis of network components by network perturbation, and network stability.
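The abstract does not say how edges are defined in the genome-wide network. A widely used construction, assumed here and not necessarily the one used in TraceRNA or in this work, links two mRNAs when they share more predicted miRNA regulators than expected by chance under a hypergeometric null. The miRNA sets and the universe size below are hypothetical.

```python
from math import comb


def shared_mirna_pvalue(mirnas_a, mirnas_b, n_total):
    """Hypergeometric P(overlap >= observed) for two genes' miRNA sets.

    Null model: gene B's len(mirnas_b) regulators are drawn at random from
    a universe of n_total miRNAs, of which len(mirnas_a) regulate gene A.
    """
    k = len(mirnas_a & mirnas_b)
    big_k, n = len(mirnas_a), len(mirnas_b)
    tail = sum(comb(big_k, i) * comb(n_total - big_k, n - i) for i in range(k, min(big_k, n) + 1))
    return tail / comb(n_total, n)


def cerna_edges(targets, n_total, alpha=0.01):
    """Link every pair of genes whose shared-miRNA p-value is below alpha.

    targets: {gene: set of miRNAs predicted to bind it}. No multiple-testing
    correction is applied in this sketch.
    """
    genes = sorted(targets)
    edges = []
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if shared_mirna_pvalue(targets[a], targets[b], n_total) < alpha:
                edges.append((a, b))
    return edges


# Hypothetical miRNA sets over a universe of 20 miRNAs.
targets = {
    "geneA": {"miR-1", "miR-2", "miR-3"},
    "geneB": {"miR-1", "miR-2", "miR-3"},
    "geneC": {"miR-10"},
}
edges = cerna_edges(targets, n_total=20)
```

The resulting graph is the input that the perturbation and stability analyses would then act on. A genome-wide version would also need a multiple-testing correction over all gene pairs.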

Presenting author: Brandon Malone, Max Planck Institute for Biology of Ageing, Germany
Date: Monday, July 13, 4:10 pm - 4:30 pm
Room: The Liffey B

Additional authors:
Brandon Malone, Max Planck Institute for Biology of Ageing, Germany
Florian Aeschimann, Friedrich Miescher Institute for Biomedical Research, Switzerland
Jieyi Xiong, Max Planck Institute for Biology of Ageing, Germany
Helge Grosshans, Friedrich Miescher Institute for Biomedical Research, Switzerland
Christoph Dieterich, Max Planck Institute for Biology of Ageing, Germany

Area Session Chair: Russell Schwartz

Presentation Overview:
Ribosome profiling via high-throughput sequencing (ribo-seq) is a promising new technique for characterizing ribosome occupancy on messenger RNA (mRNA) at base-pair resolution. Because the ribosome translates mRNA into protein, information about its occupancy offers a detailed view of ribosome density and position, which could be used to discover new upstream open reading frames, alternative start codons and new isoforms. Furthermore, these data allow the study of translational dynamics, such as decoding speed and ribosome pausing. Despite the wealth of information offered by ribo-seq, current analysis techniques have focused on coarse, gene-level statistics. In this work, we propose a hidden Markov model (HMM) approach to predict ribosome occupancy and translation at base-pair resolution. We use state-of-the-art learning algorithms to fit the parameters of our model, which correspond to biologically meaningful quantities such as expected ribosome occupancy. Furthermore, we extend the model with Bayesian hyperparameters to quantify the uncertainty of the learned parameters. Preliminary evaluation shows that, in identifying proteomics-verified coding regions, the HMM achieves a much higher true positive rate and a higher overall AUC than the raw profile.
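The model details are not given in the abstract. As a minimal illustration of HMM decoding on a ribo-seq profile, the sketch below assumes two hidden states ("non-coding" and "coding") with Poisson-distributed per-nucleotide read counts and hand-set parameters. It recovers the most likely state path with the Viterbi algorithm. The authors' state space, learned parameters and Bayesian hyperparameters are not reproduced here.

```python
import math


def poisson_logpmf(k, lam):
    """log P(K = k) for K ~ Poisson(lam), lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi(counts, states, log_start, log_trans, rates):
    """Most likely hidden-state path for per-nucleotide read counts.

    log_start[s], log_trans[s][t]: log probabilities; rates[s]: Poisson mean.
    """
    scores = [{s: log_start[s] + poisson_logpmf(counts[0], rates[s]) for s in states}]
    back = []
    for c in counts[1:]:
        row, ptr = {}, {}
        for t in states:
            prev = max(states, key=lambda s: scores[-1][s] + log_trans[s][t])
            row[t] = scores[-1][prev] + log_trans[prev][t] + poisson_logpmf(c, rates[t])
            ptr[t] = prev
        scores.append(row)
        back.append(ptr)
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):  # trace back-pointers from the last position
        path.append(ptr[path[-1]])
    return path[::-1]


STATES = ("non-coding", "coding")
log_start = {s: math.log(0.5) for s in STATES}
log_trans = {s: {t: math.log(0.9 if s == t else 0.1) for t in STATES} for s in STATES}
rates = {"non-coding": 0.5, "coding": 10.0}  # hypothetical mean reads per nucleotide
path = viterbi([0, 0, 0, 9, 10, 8, 11, 0, 0], STATES, log_start, log_trans, rates)
```

A realistic model would add frame-aware states to capture three-nucleotide periodicity and would learn the rates and transitions from data, for example with Baum-Welch.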

Presenting author: Inna Kuperstein, Institut Curie –U900 INSERM - Mines ParisTech, France
Date: Tuesday, July 14, 3:50 pm - 4:10 pm
Room: The Auditorium

Additional authors:
Inna Kuperstein, Institut Curie –U900 INSERM - Mines ParisTech, France
Eric Bonnet, Institut Curie –U900 INSERM - Mines ParisTech, France
Eric Viara, Institut Curie –U900 INSERM - Mines ParisTech, France
Maia Chanrion, Institut Curie –U900 INSERM - Mines ParisTech, France
Hien-Anh Nguyen, Institut Curie –U900 INSERM - Mines ParisTech, France
David Cohen, Institut Curie –U900 INSERM - Mines ParisTech, France
Laurence Calzone, Institut Curie –U900 INSERM - Mines ParisTech, France
Luca Grieco, Institut Curie –U900 INSERM - Mines ParisTech, France
Christophe Russo, Institut Curie –U900 INSERM - Mines ParisTech, France
Maria Kondratova, Institut Curie –U900 INSERM - Mines ParisTech, France
Marie Dutreix, Institut Curie –U900 INSERM - Mines ParisTech, France
Sylvie Robine, Institut Curie –U900 INSERM - Mines ParisTech, France
Emmanuel Barillot, Institut Curie –U900 INSERM - Mines ParisTech, France
Andrei Zinovyev, Institut Curie –U900 INSERM - Mines ParisTech, France

Area Session Chair: Natasa Przulj

Presentation Overview:
The successful application of bioinformatics and systems biology methods to the analysis of high-throughput data in cancer research depends on the availability of global and detailed reconstructions of signaling networks amenable to computational analysis. The Atlas of Cancer Signaling Network (ACSN) is an interactive and comprehensive map of molecular mechanisms implicated in cancer that includes tools for map navigation, visualization and analysis of molecular data in the context of signaling network maps. Constructing and updating ACSN involves manual literature curation and the participation of experts in the corresponding fields. The cancer-oriented content of ACSN is original and covers major mechanisms involved in cancer progression. Cell signaling mechanisms are depicted in detail, together creating a seamless ‘geographic-like’ map of molecular interactions frequently deregulated in cancer. The map can be browsed through the NaviCell web interface, which uses the Google Maps engine and the semantic zooming principle. The associated web blog provides a forum for commenting on and curating the ACSN content. ACSN allows users to upload heterogeneous omics data onto the maps for visualization and to perform functional analyses. We suggest several scenarios for applying ACSN to the visualization of high-throughput data in cancer research. In addition, we show a study on drug sensitivity prediction using ACSN. Finally, we describe how the epithelial to mesenchymal transition (EMT) signaling network from the ACSN collection has been used to find metastasis inducers in colon cancer through network analysis. ACSN may support data analysis and interpretation, patient stratification, prediction of treatment response and resistance to cancer drugs, and the design of novel treatment strategies.
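How ACSN computes its functional analyses is not described here. As a hypothetical illustration of overlaying user data on map modules, the sketch below averages gene-level values, for example log fold changes, over the genes assigned to each module. The module names and gene assignments are placeholders, not ACSN content.

```python
from statistics import mean


def module_scores(module_genes, gene_values):
    """Average user-supplied gene values over the genes of each map module.

    module_genes: {module name: list of genes drawn in that module}
    gene_values: {gene: value}, e.g. log fold changes from an omics experiment.
    Modules with no measured genes are left out.
    """
    scores = {}
    for module, genes in module_genes.items():
        values = [gene_values[g] for g in genes if g in gene_values]
        if values:
            scores[module] = mean(values)
    return scores


# Placeholder modules and values, not ACSN content.
modules = {"module_A": ["G1", "G2", "G3"], "module_B": ["G4", "G5"], "module_C": ["G9"]}
values = {"G1": 1.0, "G2": 2.0, "G4": -1.0}
print(module_scores(modules, values))  # {'module_A': 1.5, 'module_B': -1.0}
```

Per-module averages like these could then be colour-coded on the corresponding map areas. A statistical enrichment test per module would be a natural next step.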

Presenting author: Sarah-Jane Schramm, The University of Sydney, Australia
Date: Monday, July 13, 11:40 am - 12:00 pm
Room: The Liffey B

Additional authors:
Shila Ghazanfar, The University of Sydney, Australia
Sarah-Jane Schramm, The University of Sydney, Australia
John T. Ormerod, The University of Sydney, Australia
Graham J. Mann, The University of Sydney, Australia
Jean Yee Hwa Yang, The University of Sydney, Australia

Area Session Chair: Hidde de Jong

Presentation Overview:
A long-standing goal in cancer research is to describe the landscape of mutations responsible for neoplastic development and progression. Improved understanding of how gene and protein networks function in cancers would lead to the identification of potential therapeutic targets, paving the way for advances in disease management at the level of individual patients. Using melanoma as a model disease, we recently found that differences in the coordination of gene co-expression among protein-protein interaction (PPI) networks were significantly associated (p<0.05) with patient survival. Moreover, these survival-related networks showed significant increases in the number of functional mutations present, relative to networks without such gene co-expression disruption. These findings suggest that increased functional mutation burden may be a pathogenic mechanism behind the observed differential network behavior. If true, these mutations would form a selectable basis for the accumulation of disturbances during tumorigenesis and would be important drivers of disease progression and clinical outcome. Extending these analyses, we have recently shown in unpublished work that our original findings are reproducible in other cancers, including lung squamous cell carcinoma (p<0.02) and serous ovarian cancer (p<0.05). Subsequent literature-based analysis reveals that these survival-related networks are highly relevant to the biology underlying tumor behaviour. These findings may guide the identification of therapeutically targetable mutations, including those outside the exome.
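The abstract does not state how differences in co-expression coordination were measured. One standard way to compare a gene pair's co-expression between two patient groups is Fisher's z-test on the difference of Pearson correlations. The sketch below applies it to each PPI edge, with hypothetical good- and poor-prognosis groups standing in for the survival-based grouping. It is not the authors' statistic, and a network-level score would still need to aggregate the edge results.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def diff_correlation_z(r1, n1, r2, n2):
    """Fisher z statistic for H0: the two population correlations are equal."""
    return (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))


def two_sided_p(z):
    """Two-sided p-value under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def differential_coexpression(edges, expr_good, expr_poor):
    """Compare co-expression of each PPI edge between two patient groups.

    expr_good / expr_poor: {gene: expression values, one per patient}.
    Requires more than 3 patients per group and |r| < 1. Returns {edge: (z, p)}.
    """
    result = {}
    for u, v in edges:
        if all(g in expr_good and g in expr_poor for g in (u, v)):
            r_good = pearson(expr_good[u], expr_good[v])
            r_poor = pearson(expr_poor[u], expr_poor[v])
            z = diff_correlation_z(r_good, len(expr_good[u]), r_poor, len(expr_poor[u]))
            result[(u, v)] = (z, two_sided_p(z))
    return result


good = {"u": [1, 2, 3, 4, 5], "v": [1, 2, 3, 4, 6]}  # toy data, 5 patients per group
poor = {"u": [1, 2, 3, 4, 5], "v": [5, 3, 4, 1, 2]}
res = differential_coexpression([("u", "v")], good, poor)
```

With realistic cohort sizes, the per-edge p-values would need multiple-testing correction before they are summarised per network.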