HOME

Tweets by @ISMBinfo

Proceedings Track Presentations

As of May 13, 2014 (schedule subject to change)

Attention Conference Presenters - please review the Speaker Information Page available here.

All Highlights and Proceedings Track presentations are presented by scientific area part of the combined Paper Presentation schedule.
A full schedule of Paper Presentations can be found here.

Presenting Authors are are shown in bold:

LBR01 Transcriptome analysis reveals thousands of targets of nonsense-mediated mRNA decay that offer clues to the mechanism in different species

Room: 302
Date: Sunday, July 13, 10:30 a.m. - 10:55 a.m.

Author(s):
Steven Brenner, University of California, Berkeley, United States
Courtney E. French, University of California, Berkeley, United States
Gang Wei, Fudan University, China
Angela Brooks, Broad Institute of MIT AND Harvard, United States
Thomas Gallagher, Ohio State University, United States
Li Yang, Partner Institute FOR Computational Biology, China
Brenton Graveley, University of Connecticut Health Center, United States
Sharon Amacher, Ohio State University, United States
Steven Brenner, University of California, Berkeley, United States

Session Chair: Bonnie Berger

Abstract Show

Nonsense-mediated mRNA decay (NMD) is an RNA surveillance system that degrades isoforms containing a premature termination codon (PTC). NMD coupled with alternative splicing is a mechanism of post-transcriptional gene regulation. The canonical model of defining a PTC in mammals is the 50nt rule: a termination codon more than 50 nucleotides upstream of an exon-exon junction is premature and triggers degradation. There is evidence that this rule holds in Arabidopsis but not in other eukaryotes such as Drosophila. There is also evidence that a longer 3’ UTR triggers NMD in plants, flies, and mammals.

To survey the targets of NMD genome-wide in human, zebrafish, and fly, we performed RNA-Seq analysis on cells where NMD has been inhibited via knockdown of UPF1, a critical NMD protein. We found that thousands of genes produce alternative isoforms degraded by NMD in the three species. We found that the 50nt rule is a strong predictor of NMD degradation in human, and has an effect in zebrafish and in fly. In contrast, we found little correlation between the likelihood of degradation by NMD and 3' UTR length in any of the three species.

LBR02 Leveraging network structure to discover genetic interactions in genome-wide association studies

Room: 306
Date: Sunday, July 13, 10:30 a.m. - 10:55 a.m.

Author(s):
Wen Wang, University of Minnesota, United States
Gang Fang, Icahn School of Medicine at Mount Sinai, United States
Vanja Paunic, University of Minnesota, United States
Xiaoye Liu, University of Minnesota, United States
Benjamin Oatley, University of Minnesota, United States
Majda Haznadar, University of Minnesota, United States
Michael Steinbach, University of Minnesota, United States
Brian Van Ness, University of Minnesota, United States
Nathan Pankratz, University of Minnesota, United States
Vipin Kumar, University of Minnesota, United States
Chad Myers, University of Minnesota, United States

Session Chair: Jason Ernst

Abstract Show

Genetic interactions (epistasis) are important factors in complex diseases that may contribute to unexplained heritability in genome-wide association studies (GWAS). However, existing methods for identifying genetic interactions, which mainly focus on testing individual locus pairs, lack statistical power. We proposed a novel computational approach for discovering disease-specific, pathway-pathway genetic interactions from GWAS data. The key motivation, derived from the extensive analysis of genetic interaction networks in yeast, is that genetic interactions tend to occur between functionally compensatory modules rather than between isolated pairs of genes. We developed a method that explicitly searches for such large structures, guided by established sets of genes belonging to characterized pathways. We applied this approach to a Parkinson's disease (PD) GWAS study and found 50 statistically significant (FDR ?0.25) pathway level interactions, suggesting large genetic interaction structures indeed exist and can be discovered by leveraging structural properties with prior information on pathways. Interestingly, many of the discovered interactions are associated with reduced disease risk while a substantially smaller number are associated with increased disease risk. A significant fraction of them are validated in two independent cohorts. Our study highlights specific insights derived from analysis of the PD interactions and, more broadly, provides a general framework for systematic detection of genetic interactions from GWAS studies.

LBR03 Heterogeneous Network Link Prediction Prioritizes Disease-Associated Genes.

Room: 306
Date: Sunday, July 13, 11:00 a.m. - 11:25 a.m.

Author(s):
Daniel Himmelstein, University of California, San Francisco, United States
Sergio Baranzini, University of California, United States

Session Chair: Jason Ernst

Abstract Show

The first decade of Genome Wide Association Studies (GWAS) has uncovered a wealth of disease-associated variants. Two important tasks will be translating this information into a multiscale understanding of pathogenic variants, and increasing the power of existing and future studies through prioritization. We show that heterogeneous network link prediction accomplishes both these tasks. First we constructed a network with 22 node types and 24 edge types from high-throughput publicly-available resources. From this network we extracted features describing the topology between specific genes and diseases. Using a machine learning approach that relies on GWAS-discovered associations for positives, we predicted the probability of association between each protein-coding gene and each of 23 diseases. These predictions achieved a testing AUROC of 0.845 and a 200-fold enrichment in precision at 10% recall. We compared the informativeness of each included network component. The full model outperformed any individual domain, highlighting the benefit of integrative approaches. For multiple sclerosis (MS), we predicted 5 novel susceptibility genes, 4 of which (JAK2, TNFAIP3, REL, RUNX3) achieved Bonferroni validation on a 9,772-case GWAS masked from our analysis. Regions containing two of these genes were uncovered in a recent MS ImmunoChip-based study highlighting our ability to identify the causal gene within a locus.

LBR04 Genome annotation of multiple cell types and chromatin architecture using graph-based regularization

Room: 306
Date: Sunday, July 13, 3:05 p.m. - 3:30 p.m.

Author(s):
Maxwell Libbrecht, University of Washington, United States
Maxwell Libbrecht, University of Washington, United States
Michael Hoffman, Princess Margaret Cancer Centre, United States
Ferhat Ay, University of Washington, United States
David Gilbert, Florida State University, United States
Jeffrey Bilmes, University of Washington, United States
William Noble, University of Washington, United States

Session Chair: Dana Pe'er

Abstract Show

Semi-automated genome annotation algorithms are widely used to summarize functional genomics data (such as ChIP-seq) into human-interpretable form. We present a single solution to two seemingly quite different problems that existing algorithms fail to address: (1) performing genome annotation in multiple cell types and (2) integrating 3D genome architecture data into the annotation. Our solution uses an analytic framework based on the idea of a pairwise prior, which states that we have a prior belief that certain pairs of genomic positions should be more likely to receive the same label in our annotation. We developed a novel convex optimization method, called graph-based regularization (GBR) which admits efficient inference in the presence of a pairwise prior. We applied GBR in both settings mentioned above and, by comparing our annotations to functional genomics experiments not used in training, we demonstrated that GBR improves the quality of the resulting annotations in both cases

LBR05 Deconvolution of massively-parallel reporter assays tiling 15,000 human regulatory regions reveal activating and repressive regulatory sites at nucleotide-level resolution.

Room: 306
Date: Sunday, July 13, 3:35 p.m. - 4:00 p.m.

Author(s):
Jason Ernst, UCLA, United States
Tarjei Mikkelsen, Broad Institute, United States
Manolis Kellis, Massachusetts Institute of Technology, United States

Session Chair: Dana Pe'er

Abstract Show

Massively parallel reporter assay designs have been demonstrated that test a large number of regulatory elements or discover specific activating and repressive bases for a small number of regulatory elements, but effectively doing both simultaneously has been a limitation. Here, we overcome this limitation, and present a new Bayesian tiling deconvolution approach, which combines experimental tiling of regulatory regions using 31 sequences of length 145bp at 5bp intervals with computational deconvolution of the resulting signal to infer a nucleotide-level view of regulatory activity across thousands of regulatory regions. By exploiting the multiple overlapping sequences in a probabilistic framework, our method is also robust to noisy or missing measurements, and enables high resolution inferences with a very small number of tested sequences per target region. This enables the de novo discovery of individual binding sites, and inference of their activating or repressive action in a single experiment across thousands of candidate regions. We apply this method in two cell types to more than 15,000 regions in the human genome selected based on chromatin data to provide the first nucleotide-level view of activating and repressive sites across a sizeable fraction of the regulatory human genome.

LBR06 Linking tumor mutations to drug responses via a quantitative chemical-genetic interaction map

Room: 306
Date: Sunday, July 13, 4:05 p.m. - 4:30 p.m.

Author(s):
Sourav Bandyopadhyay, University of California, San Francisco, United States

Session Chair: Dana Pe'er

Abstract Show

There is an urgent need in oncology to link molecular aberrations in tumors with therapeutics that can be administered in a personalized fashion. One approach identifies synthetic-lethal genetic interactions or emergent dependencies that cancer cells acquire in the presence of specific mutations. Using engineered isogenic cells, we generated an unbiased, quantitative chemical-genetic interaction map that measures the influence of 51 aberrant cancer genes on 90 drug responses. The dataset strongly predicts drug responses found from profiling cancer cell lines, indicating that it accurately models more complex cellular contexts. Applied to triple-negative breast cancer, we interrogate several clinically actionable synthetic lethal interactions with the MYC oncogene, providing new drug and biomarker pairs for clinical investigation. This scalable approach enables the prediction of drug responses from patient data and can be used to accelerate the development of new genotype-directed therapies.

LBR07 Linking Signaling Pathways to Transcriptional Programs in Breast Cancer

Room: 306
Date: Monday, July 14, 2:10 p.m. - 2:35 p.m.

Author(s):
Hatice Osmanbeyoglu, MSKCC, United States
Hatice Ulku Osmanbeyoglu, MSKCC, United States
Raphael Pelossof, MSKCC, United States
Jacqueline F. Bromberg, MSKCC, United States
Christina Leslie, MSKCC, United States

Session Chair: Sourav Bandyopadhyay

Abstract Show

Cancer cells acquire genetic and epigenetic alterations that often lead to dysregulation of oncogenic signal transduction pathways, which in turn alters downstream transcriptional programs. Numerous methods attempt to deduce aberrant signaling pathways in tumors from mRNA data alone, but these pathway analysis approaches remain qualitative and imprecise. Here, we present a statistical method to link upstream signaling to downstream transcriptional response by exploiting reverse phase protein arrays and mRNA expression arrays in The Cancer Genome Atlas breast cancer project. Formally, we use an algorithm called affinity regression to learn an interaction matrix between upstream signal transduction proteins and downstream transcription factors (TFs) that explains target gene expression. The trained model can then predict the TF activity given a tumor sample’s protein expression profile or infer the signaling protein activity given a tumor sample’s gene expression profile. Breast cancers are comprised of molecularly distinct subtypes that respond differently to pathway-targeted therapies. We trained our model on the breast cancer data set and identified subtype-specific and common TF regulators of gene expression. Finally, inferred protein activity predicted clinical outcome within the METABRIC Luminal A cohort, identifying high- and low-risk patient groups within this heterogeneous subtype.

LBR08 Beyond Argonaute: understanding microRNA dysregulation in cancer and its effect on protein interaction and transcriptional regulatory networks

Room: 306
Date: Monday, July 14, 2:40 p.m. - 3:05 p.m.

Author(s):
Sara Gosline, MIT, United States
Sara Gosline, Massachusetts Institute of Technology, United States
Coyin Oh, Massachusetts Institute of Technology, United States
Ernest Fraenkel, Massachusetts Institute of Technology, United States

Session Chair: Sourav Bandyopadhyay

Abstract Show

microRNAs (miRNAs) cause changes in gene expression through repression of target mRNA and are highly dysregulated in cancer. However, many effects of mIRNA changes cannot be attributed to direct miRNA-mRNA interactions. As such, we propose an integrative approach that characterizes the effect miRNAs can have on protein-protein interaction networks with the hopes of identifying proteins and pathways that correlate with patient prognosis.

LBR09 Extensive trans and cis-QTLs revealed by large scale cancer genome analysis

Room: 306
Date: Monday, July 14, 3:10 p.m. - 3:35 p.m.

Author(s):
Kjong-Van Lehmann, Memorial Sloan-Kettering Cancer Hospital, United States
Andre Kahles, Memorial Sloan Kettering Cancer Center, United States
Cyriac Kandoth, Memorial Sloan Kettering Cancer Center, United States
William Lee, Memorial Sloan Kettering Cancer Center, United States
Nikolaus Schultz, Memorial Sloan Kettering Cancer Center, United States
Robert Klein, Memorial Sloan Kettering Cancer Center, United States
Oliver Stegle, EBI, United Kingdom
Gunnar Rätsch, Memorial Sloan-Kettering Cancer Center, United States

Session Chair: Sourav Bandyopadhyay

Abstract Show

While population structure can be one of the most severe confounding factors in QTL analysis, tumor samples open up many new additional challenges. Tumor specific somatic mutations and recurrence patterns are known to explain large amounts of the observed transcriptome variation and sample heterogeneity can lead to spurious associations. We have developed a new strategy to perform a common variant association study (CVAS) using mixed models on tumor samples, which enables us to account for tumor specific genotypic and phenotypic heterogeneity as well as population structure. We apply this strategy to investigate the relationship between germline and somatic variants as well as splicing patterns and expression changes in order to discover determinants of transcriptome variation. Due to sample size constraints, many QTL studies have been limited to the analysis of cis-associated variants. We use whole genome, exome and RNA-seq data from the TCGA project to overcome this limitation and discover trans-associated variants as well. A rare variant association study (RVAS) using variants from whole genome and exome sequencing data is being utilized to investigate the basis of rare mutations.

LBR10 Utilizing a Phylogeographic Generalized Linear Model for Identifying Predictors Driving H5N1 Diffusion within Egypt

Room: 306
Date: Monday, July 14, 3:40 p.m. - 4:05 p.m.

Author(s):
Matthew Scotch, Arizona State University, United States
Matthew Scotch, Arizona State University, United States
Daniel Magee, Arizona State University, United States
Rachel Beard, Arizona State University, United States

Session Chair: Sourav Bandyopadhyay

Abstract Show

Egypt has become an epicenter of highly pathogenic avian influenza H5N1 influenza transmission. Like many viruses, the diffusion of H5N1 is a highly complicated process that depends on a large number of factors, most of which are poorly understood. We adopted a Bayesian phylogeographic GLM as developed by Lemey et al. in which viral diffusion patterns are reconstructed while predictors are simultaneously assessed.

LBR11 Clonality Inference in Multiple Tumor Samples Using Phylogeny

Room: 306
Date: Tuesday, July 15, 10:30 a.m. - 10:55 a.m.

Author(s):
Nilgun Donmez, Simon Fraser University, Canada
Salem Malikic, Simon Fraser University, Canada
Andrew McPherson, British Columbia Cancer Agency, Canada
Nilgun Donmez, Vancouver Prostate Centre, Canada
Cenk Sahinalp, Indiana University, United States

Session Chair: Teresa Przytycka

Abstract Show

Most human tumors exhibit a large degree of heterogeneity that is not only apparent in histology but also presents itself in various features such as genomic copy number alterations and structural rearrangements as well as other aberrations. While the origins of the intra-tumor heterogeneity are still debated, research suggests that this diversity is likely to have clinical implications and may be linked to metastatic potential and drug response.

Although the multi-clonal nature is virtually common to most tumor samples, determining the clonal subpopulations is a challenging process. Currently, single-cell sequencing has a prohibitive cost in the scales that would be necessary to representatively sample a tumor tissue. Furthermore, methods such as Fluorescence in Situ Hybridization (FISH) or Silver in Situ Hybridization (SISH) can only assess a small number of probes in individual cells of a tumor sample.

In silico separation of the clonal subpopulations may provide a viable alternative to these aforementioned methods. Despite the importance of clonal diversity and its clinical implications, relatively few computational methods have been developed to date.

To address the problem of accurately determining subclonal frequencies in tumors as well as their evolutionary history, we have developed a novel combinatorial algorithm, named CITUP (Clonality Inference in Tumors Using Phylogeny), that determines subclonal frequencies in tumors as well as their evolutionary history. CITUP has the ability to exploit multiple samples from the same patient to achieve more accurate estimates and works on a variety of point mutations such as small indels and single nucleotide variants, as well as structural alterations. Through an efficient and robust multi-dimensional clustering approach, our method can handle a large number of mutations per patient. In addition to its exact Quadratic Integer Programming (QIP) formulation, CITUP also employs an approximate iterative module which achieves comparable accuracy to the QIP module for faster solutions.

Using extensive simulations where we experiment with a variety of phylogenetic trees with differing number of subclones and model parameters, we evaluated the performance of CITUP and compared it to the performance of other state-of-the-art tools. In these simulations, we used a comprehensive set of evaluation measures ranging from the ability to infer the correct evolutionary trajectory of the tumor to identifying mutational profile and relative abundance of the subclones. These measures show that CITUP consistently outperforms the other tools in estimating the subclonal frequencies and inferring phylogenetic relationships.

LBR12 Expansion of biological pathways based on evolutionary inference

Room: 306
Date: Tuesday, July 15, 11:00 a.m. - 11:25 a.m.

Author(s):
Sarah Calvo, Broad Institute, United States
Yang Li, Harvard University, United States
Roee Gutman, Brown University, United States
Jun Liu, Harvard University, United States
Vamsi Mootha, HHMI and Massachusetts General Hospital, United States

Session Chair: Teresa Przytycka

Abstract Show

One approach to predict gene function is to identify modules of genes that have been lost together multiple times across evolution. We developed CLIME, a principled “phylogenetic profiling” algorithm that clusters an input gene-set into modules based on shared evolutionary history, and then expands each module with additional genes that likely arose under the inferred model of evolution. CLIME models evolution of the input gene set using a Bayesian mixture of tree-based hidden Markov models (simultaneously learning module number and membership via Markov Chain Monte Carlo sampling for Dirichlet process mixture models). Using data from 138 diverse eukaryotic species, we applied CLIME to 1000 human pathways/complexes as well as to the entire genomes of three model organisms (yeast, malaria parasite, and red alga). These analyses revealed unexpected evolutionary modularity even in well-studied pathways and many novel, co-evolving components.

LBR13 Accurate prediction of mitochondrial presequences and their cleavage sites with MitoFates identifies hundreds of novel human mitochondrial protein candidates

Room: 306
Date: Tuesday, July 15, 11:30 a.m. - 11:55 p.m.

Author(s):
Yoshinori Fukasawa, University of Tokyo, Japan
Yoshinori Fukasawa, University of Tokyo, Japan
Kenichiro Imai, The National Institute of Advanced Industrial Science and Technology, Japan
Junko Tsuji, University of Tokyo, Japan
Szu-Chin Fu, University of Tokyo, Japan
Kentaro Tomii, The National Institute of Advanced Industrial Science and Technology, Japan
Paul Horton, The National Institute of Advanced Industrial Science and Technology, Japan

Session Chair: Teresa Przytycka

Abstract Show

Mitochondria provide numerous essential functions for cells, and their dysfunction causes diseases such as neurodegenerative diseases. Thus obtaining a complete mitochondrial proteome should be a crucial step towards understand the roles of mitochondria. Many mitochondrial proteins have been identified but a complete list is not available. Unfortunately, the accuracy of existing predictors is far from perfect and has not improved significantly for a decade!
Here, we report MitoFates, a predictor to accelerate the discovery of mitochondrial proteins. In developing MitoFates we introduced novel presequence features: a modified hydrophobic moment, novel motifs and refined PWM for the cleavage site. We combined those with classical features and presented them to an SVM.
According to our benchmarks on a non-redundant test set of proteins, MitoFates achieves significantly higher performance than the well known predictors TargetP, Predotar and MitoProtII.
To investigate the utility of MitoFates, we looked for undiscovered mitochondrial proteins from the human proteome. MitoFates predicts 1231 genes, and 633 of these were annotated as “mitochondria” in neither UniProt nor GO. Interestingly, these include candidate regulators of Parkin translocation to damaged mitochondria, a trigger of degradation of dysfunctional mitochondria. This suggests that careful investigation of other predictions will be helpful in elucidating the functions of mitochondria in health and disease.

LBR14 Stable identifiability of the human microbiome based on metagenomic hitting sets

Room: 306
Date: Tuesday, July 15, 12:00 p.m. - 12:25 p.m.

Author(s):
Eric Franzosa, Harvard School of Public Health, United States
Eric Franzosa, Harvard School of Public Health, United States
Katherine Huang, The Broad Institute, United States
James Meadow, University of Oregon, United States
Dirk Gevers, The Broad Institute, United States
Katherine Lemon, The Forsyth Insitute, United States
Brendan Bohannan, University of Oregon, United States
Curtis Huttenhower, Harvard School of Public Health, United States

Session Chair: Morris Quaid

Abstract Show

Recent large-scale investigations of the human microbiome have revealed great variability in body site-specific microbial community structure across healthy individuals. However, it remains unknown if this variability is sufficient to uniquely identify individuals within a large population, or if it is sufficiently stable to continue uniquely identifying individuals at later times. We investigated these questions by developing a hitting set-based coding algorithm and applying it to individuals from the Human Microbiome Project cohort. Specifically, our approach defined metagenomic fingerprints: sets of microbial taxa or genes that distinguished individuals from a background population, with features prioritized based on predicted stability. Fingerprints based on clade-specific marker genes were able to distinguish almost all individuals. However, at most body sites, these fingerprints uniquely identified their owners in only ~30% of cases when re-assessed after a period of 30-300 days (due to microbial strain loss). The gut microbiome was an exception, as over 80% of its marker gene-based fingerprints remained stable and unique at later times. In addition to highlighting patterns of temporal variation in the ecology of the human microbiome, this work places an upper bound on the identifiability of human-associated microbial communities over mid-to-long time scales, a result with important ethical implications for future microbiome study design.

LBR15 Novel Computational Approach for Integration of Omics-platforms with Application to Hypertension in Recombinant Rat Strains

Room: 306
Date: Tuesday, July 15, 2:00 p.m. - 2:25 p.m.

Author(s):
Stefka Tyanova, Max Planck Institute of Biochemistry, Germany
Kathrine Sylvestersen, The Novo Nordisk Foundation Center for Protein Research, Proteomics, Denmark
Matthias Mann, Max Planck Institute of Biochemistry, Germany
Michael Lund Nielsen, The Novo Nordisk Foundation Center for Protein Research, Proteomics, Denmark
Juergen Cox, Max Planck Institute of Biochemistry, Germany

Session Chair: Morris Quaid

Abstract Show

We propose a novel computational approach for efficient integration of genomic, transcriptomic and proteomic data. We investigate the disease phenotypes characterizing the set of recombinant rat strains HXB/BXH, which is of large relevance to metabolic and cardiovascular diseases. We employ proteomic and transcriptomic quantitative measurements of the founding and the recombinant strains in combination with a genetic markers map of the recombinant strains. First, the molecular feature spaces at the proteome and transcriptome levels are orthogonally transformed and components accounting for the variability explaining the phenotype of interest are extracted. This defines a quantitative measure of the disease phenotype along the recombinant strains. To incorporate genetic information, the map of alleles from the recombinant strains is transformed to a numeric matrix assigning 1 if the recombinant region comes from the diseased founding strain and 0 otherwise. Support Vector Machine Regression (SVR) is then used to build a model that can correctly assign the phenotypic association of the strains based on their genetic characteristics. To identify disease-related genetic loci, we tested different feature selection strategies based on mutual information and SVR and measured their performance for various combinations of features. We identified a small number of genetic markers that are strongly associated with the disease phenotypes.

LBR17 Utilizing Docking Score Distributions to Identify Novel Protein-Drug Interactions

Room: 306
Date: Tuesday, July 15, 3:00 p.m. - 3:25 p.m.

Author(s):
Ariel Feiglin, Bar-Ilan University, Israel
Ariel Feiglin, Bar-Ilan University, Israel
Olga Leiderman, Bar-Ilan University, Israel
Ron Unger, Bar-Ilan University, Israel
Yanay Ofran, Bar-Ilan University, Israel

Session Chair: Morris Quaid

Abstract Show

Predicting whether a given protein and drug interact, is an important yet greatly unresolved goal. We introduce a fast and computationally inexpensive approach for determining whether proteins and drugs bind each other. This is accomplished by training a machine learning algorithm to differentiate between docking results of real protein-drug pairs and docking results of pairs that do not interact. The features used for training include structural and biophysical features of specific poses. However, the “secret ingredient”, is the use of features derived from the distribution of the docking scores across all proposed binding modes for a given protein-drug pair. We used this approach to identify real protein-drug interactions from a pool of 488 real complexes and 194,770 presumably false ones with precision of 0.6 (i.e. 60% of the predicted interactions were true) at a recall of 0.2. This is >500 fold better than random and >30 fold better than the precision that would be obtained by using only the docking score of the best pose. Applying this method to a large dataset of proteins and FDA approved drugs, we identified novel protein-drug interactions and validated them experimentally. We also show that our predicted interactions are significantly enriched in a large dataset of known protein-drug interactions.

LBR18 Interactomics: Computational Analysis of Novel Drug Opportunities

Room: 306
Date: Tuesday, July 15, 3:30 p.m. - 3:55 p.m.

Author(s):
GAURAV CHOPRA, University of California, San Francisco, United States
Ram Samudrala, University of Washington, United States

Session Chair: Morris Quaid

Abstract Show

We have developed a Computational Analysis of Novel Drug Opportunities (CANDO) platform funded by a 2010 NIH Director’s Pioneer Award that analyses compound-proteome interaction signatures to determine drug behaviour, in contrast to traditional single (or few) target approaches. The platform uses similarity of interaction signatures across all proteins as indicative of similar functional behaviour and nonsimilar signatures for off- and anti-target (side) effects, in effect inferring homology of compound/drug behaviour at a proteomic level. This results in an interaction matrix between 3,733 human FDA approved drugs and supplements × 48,278 proteins using our hierarchical chem- and bio-informatic fragment-based docking with dynamics protocol (>500 million predicted interactions total) with a benchmarking success for over 650 indications/diseases. Using signatures we ranked compounds for existing indications and prospectively validate “high value” predictions in vitro, in vivo, and by clinical studies for more than twenty indications , including dental caries, dengue, tuberculosis, ovarian cancer, cholangiocarinomas, among many others, and 49/82 validations done thusfar show better activity to an existing drug or micromolar inhibition at the cellular level that may serve as novel repurposeable therapies . Our approach is applicable to any compound, includes models to enable personalization foreshadowing a new era of faster, safer, better and cheaper drug discovery.

PP01 Simultaneous Identification of Multiple Driver Pathways in Cancer

Room: 311
Date: Sunday, July 13, 10:30 am - 10:55 am

Author(s):
Mark Leiserson, Brown University, United States
Dima Blokh, Tel-Aviv University, Israel
Roded Sharan, Tel-Aviv University, Israel
Benjamin Raphael, Brown University, United States

Session Chair: Terry Gaasterland

Abstract Show

An important challenge in cancer genome sequencing is to distinguish the small subset of somatic driver mutations that cause cancer from the multitude of random passenger mutations in a tumor. Since patients with the same cancer type typically have different collections of mutations, single-gene tests of recurrence are insufficient for this task. We present Multi-Dendrix, an algorithm to identify combinations of mutations with combinatorial properties consistent with cancer pathways. Multi-Dendrix does not use prior knowledge of pathways, and finds multiple sets of mutations simultaneously since driver mutations target multiple pathways in a patient. We applied Multi-Dendrix to glioblastoma and breast cancer data from The Cancer Genome Atlas. In both cancers, Multi-Dendrix identified gene sets overlapping major signaling pathways -- including Rb, PI(3)K, and p53 -- that were manually annotated in the TCGA publications, as well as novel gene sets that include transcription factors and regulators.

PP02 Ragout - A reference-assisted assembly tool for bacterial genomes

Room: 304
Date: Sunday, July 13, 10:30 am - 10:55 am

Author(s):
Son Pham, University of California, San Diego, United States
Mikhail Kolmogorov, Saint-Petersburg Academic University, Russia
Benedict Paten, University of California, Santa Cruz, United States
Brian Raney, University of California, Santa Cruz, United States

Session Chair: Serafim Batzoglou

Abstract Show

Bacterial genomes are simpler than mammalian ones, and
yet assembling the former from the data currently generated by
high-throughput short read sequencing machines still results in
hundreds of contigs. To improve assembly quality, recent studies
have utilized longer Pacific Biosciences (PacBio) reads or jumping
libraries to connect contigs into larger scaffolds or help assemblers
resolve ambiguities in repetitive regions of the genome. However,
their popularity in contemporary genomic research is still limited
by high cost and error rates.

In this work, we explore the
possibility of improving assemblies by using complete genomes
from closely related species/strains. We present Ragout, a genome
rearrangement approach, to address this problem. In contrast with
most reference-guided algorithms, where only one reference genome
is used, Ragout uses multiple references along with the evolutionary
relationship among these references in order to determine the correct
order of the contigs. Additionally, Ragout uses the assembly graph
and multi-scale synteny blocks to reduce assembly gaps caused
by small contigs from the input assembly. In simulations as well
as real datasets, we believe that for common bacterial species,
where many complete genome sequences from related strains have
been available, the current high-throughput short read sequencing
paradigm is sufficient to obtain a single high-quality scaffold for
each chromosome. The Ragout software is freely available at:
https://github.com/fenderglass/Ragout.

PP03 Linking hypothetical patterns to disease molecular signatures in Alzheimer's disease

Room: 311
Date: Sunday, July 13, 11:00 a.m. - 11:25 a.m.

Author(s):
Ashutosh Malhotra, Fraunhofer institute for algorithms and scientific computing, Germany
Martin Hofmann-Apitius, Fraunhofer institute for algorithms and scientific computing, Germany
Erfan Younesi, Fraunhofer institute for algorithms and scientific computing, Germany

Session Chair: Terry Gaasterland

Abstract Show

Automated information extraction and knowledge acquisition technology (“text mining”) share the potential to possibly reduce manual reading and human curation efforts for the construction of knowledge bases. Particularly in reference to complex, mostly idiopathic diseases like Alzheimer’s disease (AD), automatic recognition of stage specific speculative statements communicating experimental finding can provide a new insights into the directions of disease etiology and progression. However, a systematic gathering of all scientific speculation that exists in a given context is a non-trivial task and, if done manually, is laborious and time-consuming.This work presents a methodology that demonstrates how using a dictionary of speculative patterns (HypothesisFinder approach) in combination with designed Alzheimer's disease ontology (ADO) enables the collection, interpretation, curation and discovery of a broad spectrum of knowledge needed for efficient and systematic AD research.

PP04 AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references

Room: 304
Date: Sunday, July 13, 11:00 am - 11:25 am

Author(s):
Ergude Bao, University of California, Riverside, United States
Tao Jiang, University of California, Riverside, United States
Thomas Girke, University of California, Riverside, United States

Session Chair: Serafim Batzoglou

Abstract Show

De novo assemblies of genomes remain one of the most challenging applications in next generation sequencing. Usually, their results are incomplete and fragmented into hundreds of contigs. Repeats in genomes and sequencing errors are the main reasons for these complications. With the rapidly growing number of sequenced genomes, it is now feasible to improve assemblies by guiding them with genomes from related species.

Here we introduce AlignGraph, an algorithm for extending and joining de novo assembled contigs or scaffolds guided by closely related reference genomes. It aligns paired-end (PE) reads and pre-assembled contigs or scaffolds to a close reference. From the obtained alignments, it builds a novel data structure, called the paired-end multi-positional de Bruijn graph. The incorporated positional information from the alignments and PE reads allows us to extend the initial assemblies, while avoiding incorrect extensions and early terminations. In our performance tests, AlignGraph was able to substantially improve the contigs and scaffolds from several assemblers. For instance, 28.7-62.3% of the contigs of Arabidopsis thaliana and human could be extended, resulting in improvements of common assembly metrics, such as an increase of the N50 of the extendable contigs by 89.9-94.5% and 80.3-165.8%, respectively. In another test, AlignGraph was able to improve the assembly of a published genome (Arabidopsis strain Landsberg) by increasing the N50 of its extendable scaffolds by 86.6%. These results demonstrate AlignGraph’s efficiency in improving genome assemblies by taking advantage of closely related references.

PP05 Cross-study validation for assessment of prediction models and algorithms

Room: 302
Date: Sunday, July 13, 11:00 am - 11:25 am

Author(s):
Christoph Bernau, Leibniz Supercomputing Center, Germany
Markus Riester, Harvard School of Public Health, United States
Anne-Laure Boulesteix, LMU Munich, Germany
Giovanni Parmigiani, Dana-Farber Cancer Institute, United States
Curtis Huttenhower, Harvard School of Public Health, United States
Levi Waldron, City University of New York School of Public Health, United States
Lorenzo Trippa, Dana-Farber Cancer Institute,, United States

Session Chair: Bonnie Berger

Abstract Show

Motivation: Numerous competing algorithms for prediction modeling
in high-dimensional settings have been developed in the statistical
and machine learning literature. Learning algorithms and the
prediction models they generate are typically evaluated on the basis
of cross-validation error estimates in a few examplary datasets.
However, in most applications, the ultimate goal of prediction
modeling is to provide accurate predictions for independent samples
processed in different laboratories, and cross-validation within
examplary datasets may not adequately reflect performance in this
context.

Methods: Systematic cross-study validation is performed in
simulations and in a collection of eight estrogen-receptor positive
breast cancer microarray gene expression datasets, with the objective
of predicting distant metastasis-free survival (DMFS). An evaluation
statistic, in this paper the C-index, is computed for all pairwise
combinations of training and validation datasets. We evaluate several
alternatives for summarizing the pairwise validation statistics, and
compare these to conventional cross-validation.

Results: We develop a systematic approach to “cross-study
validation” to replace or supplement conventional cross-validation for
evaluation of high-dimensional prediction models when independent
datasets are available. In data-driven simulations and in our
application to survival prediction with eight breast cancer microarray
datasets, standard cross-validation suggests inflated discrimination
accuracy for all competing algorithms when compared to cross-study
validation. Furthermore, the ranking of learning algorithms differs,
suggesting that algorithms performing best in cross-validation may
be suboptimal when evaluated through independent validation.
Availability: The survHD: Survival in High Dimensions package
(http://www.bitbucket.org/lwaldron/survhd) will be made available
through Bioconductor.

PP06 Constructing module maps for integrated analysis of heterogeneous biological networks

Room: 311
Date: Sunday, July 13, 11:30 am - 11:55 pm

Author(s):
David Amar, Tel Aviv University, Israel

Session Chair: Terry Gaasterland

Abstract Show

We developed a method that takes as input two types of gene interactions and constructs a summary module map, which integrates the two sources. The presentation will start with a thorough introduction to the concept of module maps as tools for summarizing heterogeneous interaction networks. Then we will discuss extant and novel methods to construct such maps. We shall show that our novel algorithm considerably improves over prior art on simulated and real data. We shall demonstrate the method in analyses of data from three distinct domains: (1) yeast protein-protein interactions and negative genetic interactions, (2) protein-protein interactions and DNA damage-specific positive genetic interactions in yeast, and (3) gene expression profiles of lung cancer patients. Each analysis provides confirmatory and novel insights, and demonstrates the power of module maps for deeper analysis of heterogeneous high throughput data.

PP07 ExSPAnder: a Universal Repeat Resolver for DNA Fragment Assembly

Room: 304
Date: Sunday, July 13, 11:30 am - 11:55 pm

Author(s):
Andrey D. Prjibelski, St. Petersburg Academic University, Russian Federation
Irina Vasilinetc, St. Petersburg Academic University, Russia
Anton Bankevich, St. Petersburg Academic University, Russia
Alexey Gurevich, St. Petersburg Academic University, Russia
Tatiana Krivosheeva, St. Petersburg Academic University, Russia
Sergey Nurk, St. Petersburg Academic University, Russia
Son Pham, University of California, San Diego, United States
Anton Korobeynikov, St. Petersburg Academic University, Russia
Alla Lapidus, St. Petersburg Academic University, Russia
Pavel Pevzner, University of California, San Diego, United States

Session Chair: Serafim Batzoglou

Abstract Show

Next-generation sequencing (NGS) technologies have raised a challenging de novo genome assembly problem that is furtheramplified in recently emerged single-cell sequencing projects. Whilevarious NGS assemblers can utilize information from several libraries of read-pairs, most of them were originally developed for a singlelibrary and do not fully benefit from multiple libraries. Moreover, mostassemblers assume uniform read coverage, condition that does nothold for single-cell projects where utilization of read-pairs is evenmore challenging. We have developed an exSPAnder algorithm thataccurately resolves repeats in the case of both single and multiplelibraries of read-pairs in both standard and single-cell assemblyprojects.

PP08 Large scale analysis of signal reachability

Room: 311
Date: Sunday, July 13, 12:00 pm - 12:25 pm

Author(s):
Andrei Todor, University of Florida, United States
Haitham Gabr, University of Florida, United States
Alin Dobra, University of Florida, United States
Tamer Kahveci, University of Florida, United States

Session Chair: Terry Gaasterland

Abstract Show

Motivation: Major disorders, such as leukemia, have been shown to alter the
transcription of genes. Understanding how gene regulation is
affected by such aberrations is of utmost importance. One promising
strategy towards this objective is to compute whether signals can
reach to the transcription factors through the transcription
regulatory network. Due to the uncertainty of the regulatory
interactions, this is a #P-complete problem and thus solving it for
very large transcription regulatory networks remains to be a
challenge.

Results: We develop a novel and scalable method to compute the probability that a signal originating at any given set of source genes can
arrive at any given set of target genes (i.e., transcription
factors) when the topology of the underlying signaling network is
uncertain. Our method tackles this problem for large networks while
providing a provably accurate result. Our method follows a
divide-and-conquer strategy. We break down the given network into a
sequence of non-overlapping subnetworks such that reachability can
be computed autonomously and sequentially on each subnetwork. We
represent each interaction using a small polynomial. The product of
these polynomials express different scenarios when a signal can or
cannot reach to target genes from the source genes. We introduce
polynomial collapsing operators for each subnetwork. These operators
reduce the size of the resulting polynomial and thus the
computational complexity dramatically. We show that our method
scales to entire human regulatory networks in only seconds, while
the existing methods fail beyond a few tens of genes and
interactions. We demonstrate that our method can successfully
characterize key reachability characteristics of the entire
transcriptions regulatory networks of patients affected by eight
different subtypes of leukemia, as well as those from healthy
control samples.

PP09 Complete Genome Assembly with Long Reads

Room: 304
Date: Sunday, July 13, 12:00 pm - 12:25 pm

Author(s):
Adam Phillippy, National Biodefense Analysis and Countermeasures Center, United States
Sergey Koren, National Biodefense Analysis and Countermeasures Center, United States
Gregory Harhay, U.S. Department of Agriculture, United States
Timothy Smith, U.S. Department of Agriculture, United States
James Bono, U.S. Department of Agriculture, United States
Dayna Harhay, U.S. Department of Agriculture, United States
Scott McVey, U.S. Department of Agriculture, United States
Diana Radune, National Biodefense Analysis and Countermeasures Center, United States
Nicholas Bergman, National Biodefense Analysis and Countermeasures Center, United States

Session Chair: Serafim Batzoglou

Abstract Show

The short reads generated by first- and second-generation sequencing often produce highly fragmented assemblies, even for small genomes. Single-molecule sequencing addresses this problem by greatly increasing read length, which simplifies assembly. By analyzing the repeat complexity of 2,267 complete microbial genomes, we have shown that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio sequencing library. This reduces the cost of microbial finishing by an order of magnitude. More recently, we assembled the eukaryotic genomes of Saccharomyces cerevisiae, Arabidopsis thaliana, and Drosophila melanogaster using only PacBio reads. In the case of D. melanogaster, our PacBio Corrected Reads (PBcR) algorithm assembled the genome more completely than the current reference, which involved over a decade of manual finishing. I will present both these past and present results, as well as a new approach for scaling long-read assembly to gigabase-sized genomes.

PP10 GRASP: Analysis of genotype-phenotype results from 1,390 genome-wide association studies and corresponding open access database

Room: 311
Date: Sunday, July 13, 3:05 pm - 3:30 pm

Author(s):
Richard Leslie, University of Massachusetts Medical School, United States
Christopher O'Donnell, National Institute's of Health, United States
Andrew Johnson, National Institute's of Health, United States

Session Chair: Fran Lewitter

Abstract Show

We created a deeply extracted and annotated database of GWAS study results. GRASP v1.0 contains >6.2 million SNP-phenotype association from among 1,390 GWAS studies. We re-annotated GWAS results with 16 annotation sources including some rarely compared to GWAS results (e.g., RNAediting sites, lincRNAs, PTMs). RESULTS. GWAS have grown exponentially, with increases in sample sizes and markers tested, and continuing bias toward European ancestry samples. GRASP contains >100,000 phenotypes, roughly: eQTLs (71.5%), metabolite QTLs (21.2%), methylation QTLs (4.4%), and diseases, biomarkers and other traits (2.8%). cis-eQTLs, meQTLs, mQTLs and MHC region SNPs are highly enriched among significant results. After removing these categories, GRASP still contains a greater proportion of studies and results than comparable GWAS catalogs.

Cardiovascular disease and related risk factors predominate remaining GWAS results, followed by immunological, neurological and cancer traits. Significant results in GWAS display a highly gene-centric tendency. Sex chromosome X (OR=0.18[0.16-0.20]) and Y (OR=0.003[0.001-0.01]) genes are depleted for GWAS results. Gene length is correlated with GWAS results at nominal significance (P<0.05) levels. We show this gene length correlation decays at increasingly more stringent P-value thresholds. Potential pleiotropic genes and SNPs enriched for multi-phenotype association in GWAS are identified. However, we note possible population stratification at some of these loci. Finally, via re-annotation we identify compelling functional hypotheses at GWAS loci, in some cases unrealized in studies to date. CONCLUSION. Pooling summary-level GWAS results and re-annotating with bioinformatics predictions and molecular features provides a good platform for new insights. The GRASP database is available at http://apps.nhlbi.nih.gov/grasp.

PP11 Capturing short tandem repeat variation from paired-end sequencing data

Room: 304
Date: Sunday, July 13, 3:05 pm - 3:30 pm

Author(s):
Mikael Boden, The University of Queensland, Australia
Minh Cao, The University of Queensland, Au
Edward Tasker, Monash University, Au
Sailaja Vishwanathan, Monash University, Au
Sridevi Sureshkumar, Monash University, Au
Sureshkumar Balasubramanian, Monash University, Au
Kai Willadsen, The University of Queensland, Au
Michael Imelfort, The University of Queensland, Au

Session Chair: Cenk Sahinalp

Abstract Show

Expansion of tri-nucleotide repeats is known to cause over twenty neurological diseases. While next-generation sequencing technologies offer unprecedented opportunities to assess variation in genomes, they have limitations in regard to repeat regions. We review options scientists have to estimate length variation of short tandem repeats of biological significance, and to investigate what causes their instability. We present a Bayesian method to statistically detect variation from paired-end sequence data to (for the first time) analyse repeat tracts of sizes beyond the read length of current technology. Using strains of A. thaliana, we experimentally validate estimates, and recover the only known unstable repeat locus IIL1. Extensive quantitative comparisons of alternative analysis pipelines provide guidance to the likely outcome in terms of repeat variant calling accuracy.

PP12 How antibodies chase antigens, how Antigens try to escape and how we can use this to predict antibody specificity

Room: 302
Date: Sunday, July 13, 3:05 pm - 3:30 pm

Author(s):
Inbal Sela-Culang, Bar-Ilan University, Israel
Yanay Ofan, Bar Ilan University, Israel
Vered Kunik, Bar Ilan University, Israel
Anat Burkovitz, Bar Ilan University, Israel
Guy Nimrod, Bar Ilan University, Israel
Mohammed Rafii-El-Idrissi Benhnia, La Jolla Institute for Allergy and Immunology, United States
Michael H. Matho, La Jolla Institute for Allergy and Immunology, United States
Thomas Kaever, La Jolla Institute for Allergy and Immunology, United States
Matt Maybeno, La Jolla Institute for Allergy and Immunology, United States
Andrew Schlossman, La Jolla Institute for Allergy and Immunology, United States
Dirk Zajonc, La Jolla Institute for Allergy and Immunology, United States
Shane Crotty, La Jolla Institute for Allergy and Immunology, United States
Bjoern Peters, La Jolla Institute for Allergy and Immunology, United States
Sheng Li, University of Texas Health Science Center, United States
Yan Xiang, University of Texas Health Science Center, United States

Session Chair: Toni Kazic

Abstract Show

Abs must bind indistinct patches on proteins that attempt to escape recognition. They must be able to recognize virtually any surface while strictly maintaining their own fold. A little is known about the mechanisms that allow Abs to do this. Thus, while most drugs that are in clinical development are Abs, there is currently no simple way to determine experimentally or computationally what exactly they bind.
We will review a series of studies that revealed key mechanisms that enable Abs to perform these tasks. We will present a novel prediction approach that utilizes these findings, combined with simple competition assays, to predict where on an Ag a given Ab will bind. The accuracy of these predictions is verified experimentally using crystallography and other methods. To conclude, we will bring more examples, and discuss the power of combining sophisticated predictions with simple experiments.

PP13 A comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer

Room: 311
Date: Sunday, July 13, 3:35 pm - 4:00 pm

Author(s):
Levi Waldron, City University of New York, United States
Benjamin Haibe-Kains, Princess Margaret Cancer Centre, Ca
Aedin Culhane, Dana-Farber Cancer Institute, Us
Markus Riester, Dana-Farber Cancer Institute, Us
Thomas Risch, Dana-Farber Cancer Institute, Us
Svitlana Tyekucheva, Dana-Farber Cancer Institute, Us
Ina Jazic, Dana-Farber Cancer Institute, Us
Xin Victoria Wang, Dana-Farber Cancer Institute, Us
Mahnaz Ahmadifar, Dana-Farber Cancer Institute, Us
Benjamin Frederick Ganzfried, Dana-Farber Cancer Institute, Us
Giovanni Parmigiani, Dana-Farber Cancer Institute, Us
Curtis Huttenhower, Havard School of Public Health, Us
Michael Birer, Massachusetts General Hospital, Us
Christoph Bernau, LMU Munich, De

Session Chair: Fran Lewitter

Abstract Show

The growth of genomic technologies has generated a bioinformatics cottage industry creating gene signatures of disease and disease outcomes. Specialized tools for regression and machine learning now make it relatively easy to tune and train high-dimensional models for the prediction of patient outcome from genomic data, but most such published models remain orphaned in the literature, without follow-up validation or clinical application. We therefore undertook a systematic evaluation of 14 published gene expression-based outcome prediction models for late-stage, high-grade, serous ovarian cancer, in a curated database of 1,251 patients from 10 microarray datasets. This work assesses: 1) the accuracy of published predictive models when applied to independent datasets, 2) which modeling approaches that have been most and least effective, and 3) the influence of popular validation datasets on the literature. This talk argues for changes in what constitutes “validation” of prediction models generated from genomic data.

PP14 BlockClust: efficient clustering and classification of non-coding RNAs from short read profiles

Room: 304
Date: Sunday, July 13, 3:35 pm - 4:00 pm

Author(s):
Fabrizio Costa, University of Freiburg, Germany
Dominic Rose, University of Freiburg, Germany
Rolf Backofen, University of Freiburg, Germany
Pavankumar Videm, University of Freiburg, Germany

Session Chair: Cenk Sahinalp

Abstract Show

Non-coding RNAs play a vital role in many cellular processes such as
RNA splicing, translation, gene regulation. However the vast
majority of ncRNAs still have no functional annotation. One prominent
approach for putative function assignment is clustering of
transcripts according to sequence and secondary structure. However
sequence information is changed by post-transcriptional
modifications, and secondary structure is only a proxy for the true
three dimensional conformation of the RNA polymer. A different type
of information that does not suffer from these issues and that can
be used for the detection of RNA classes, is the pattern of
processing and its traces in small RNA-seq reads data.

Here we introduce BlockClust, an efficient approach to detect
transcripts with similar processing patterns. We propose a novel way
to encode expression profiles in compact discrete structures, which
can then be processed using fast graph kernel techniques. We perform
both unsupervised clustering and develop family specific
discriminative models; finally we show how the proposed approach is
both scalable, accurate and robust across different organisms,
tissues and cell lines.

PP15 Tertiary structure-based prediction for conformational B-cell epitopes through B factors

Room: 302
Date: Sunday, July 13, 3:35 pm - 4:00 pm

Author(s):
Jing Ren, University of Technology Sydney, Australia
Qian Liu, University of Technology Sydney, Australia
John Ellis, University of Technology Sydney, Australia
Jinyan Li, University of Technology Sydney, Australia

Session Chair: Toni Kazic

Abstract Show

Motivation: B-cell epitope is a small area on the surface of an antigen that binds to an antibody. Accurately locating epitopes is of critical importance for vaccine development. Compared with wet-lab methods, computational methods have strong potential for efficient and large-scale epitope prediction for antigen candidates at much lower cost. However, it is still not clear which features are good determinants for accurate epitope prediction, leading to the unsatisfactory performance of existing prediction methods. Method and results: We propose a much more accurate B-cell epitope prediction method. Our method uses a new feature B factor (obtained from X-ray crystallography), combined with other basic physicochemical, statistical, evolutionary and structural features of each residue. These basic features are extended by a sequence window and a structure window. All these features are then learned by a two-stage random forest model to identify clusters of antigenic residues and to remove isolated outliers. Tested on a dataset of 55 epitopes from 45 tertiary structures, we prove that our method significantly outperforms all three existing structure-based epitope predictors. Following comprehensive analysis, it is found that features such as B factor, relative accessible surface area and protrusion index play an important role in characterizing B-cell epitopes. Our detailed case studies on an HIV antigen and an influenza antigen confirm that our second stage learning is effective for clustering true antigenic residues and for eliminating self-made prediction errors introduced by the first-stage learning.

PP16 Dissecting Cancer Heterogeneity with network based approach

Room: 311
Date: Sunday, July 13, 4:05 pm - 4:30 pm

Author(s):
Teresa Przytycka, National Institutes of Health, United States
DongYeon Cho, NIH, Us

Session Chair: Fran Lewitter

Abstract Show

One of the major obstacles in developing cancer treatment is cancer heterogeneity. Heterogeneity of genetic and epigenetic alterations leads heterogeneity in gene expression making the discovery of genetic drivers and key genes dysregulated by their aberration very challenging.
Pathway-centric approaches have emerged as methods that can empower studies of cancer heterogeneity. I will describe two approaches we have recently developed. First, combining the utility of algorithmic techniques with the power of network-centric approaches, we designed a novel approach that allows unsupervised detection of subnetworks that are dysregulated in a subgroup of patients. The second, complementary approach, builds in topic modeling and utilizes a mixture model. Our model is based on two components (i) a measure of phenotypic similarity between the patients (ii) a list of features - possible disease causes such as mutations, copy number variations. This works complements the appreciation of cancer diversity wight the ability to represent it.

PP17 WISECONDOR detects small fetal chromosomal aberrations in low-coverage NGS data of maternal plasma.

Room: 304
Date: Sunday, July 13, 4:05 pm - 4:30 pm

Author(s):
Roy Straver, VU University Medical Center Amsterdam, Netherlands
Erik Sistermans, VU University Medical Center Amsterdam, Nl
Henne Holstege, VU University Medical Center Amsterdam, Nl
Daphne van Beek, VU University Medical Center Amsterdam, Nl
Allerdien Visser, VU University Medical Center Amsterdam, Nl
Cees Oudejans, VU University Medical Center Amsterdam, Nl
Marcel Reinders, Delft University of Technology, Nl

Session Chair: Cenk Sahinalp

Abstract Show

The presentation gives a background of non-invasive prenatal testing with NGS and addresses the necessity of using low coverage data, as well as the resulting disadvantages on downstream analysis. We highlight our algorithmic contribution in the detection of fetal copy number aberrations which is based on a within-sample read depth comparison. Also, we address some of the underlying basic assumptions incorporated in our approach. Then we show that our method, called WISECONDOR, reliably detects small copy number changes in a cohort of more than 200 pregnancies. We take a few exemplary cases that highlight the potential of WISECONDOR and show how clinical geneticists can use this tool. Finally, we discuss how the tool is currently being implemented throughout hospitals in the Netherlands.

PP18 An Efficient Parallel Algorithm for Accelerating Computational Protein Design

Room: 302
Date: Sunday, July 13, 4:05 pm - 4:30 pm

Author(s):
Yichao Zhou, Tsinghua University, China
Wei Xu, Tsinghua University, China
Bruce R. Donald, Duke University, United States
Jianyang Zeng, Tsinghua University, China

Session Chair: Toni Kazic

Abstract Show

Motivation: Structure-based computational protein design is an
important topic in protein engineering. Under the assumption of a rigid
backbone and a finite set of discrete conformations for side-chains,
various methods have been proposed to address this problem. A
popular method is to combine the Dead-End Elimination (DEE) and
A* tree search algorithms, which provably finds the Global Minimum
Energy Conformation (GMEC) solution.

Results: In this paper, we improve the efficiency of computing A*
heuristic functions for protein design and also propose a variant of A*
algorithm in which the search process can be performed on GPUs in
a massively parallel fashion . In addition, we made some efforts to
address the memory exceeding problems in A* search. As a result,
our enhancements can achieve a significant speedup of the original
A* search for protein design in four orders of magnitude on big scale
test data, while still maintaining an acceptable memory overhead. Our
parallel A* search algorithm can be combined with iMinDEE, a recent
DEE criterion for rotamer pruning to further improve structure-based
computational protein design with the consideration of side-chain
flexibility.

PP19 Inductive Matrix Completion for Predicting Gene-Disease Associations

Room: 311
Date: Sunday, July 13, 4:35 pm - 5:00 pm

Author(s):
Nagarajan Natarajan, University of Texas at Austin, United States
Inderjit Dhillon, University of Texas at Austin, United States

Session Chair: Fran Lewitter

Abstract Show

Motivation: Most existing methods for predicting causal disease genes rely on specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies --- for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms by observing patients. Similarly, the type of evidence available for genes varies --- for example, specific microarray probes convey information only for certain sets of genes. In this paper, we apply a novel matrix completion method recently developed by Jain 2013 to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases emph{not} seen at training time, unlike traditional matrix completion approaches and network-based inference methods that are transductive.

Results: Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better - it has close to one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed method (second best) that has less than 15\% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e., genes that are previously not linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature curated by Bornigen.

Availability: Source code and datasets at http://www.cs.utexas edu/~naga86/research/IMC

PP20 Probabilistic Method for Detecting Copy Number Variation in a Fetal Genome using Maternal Plasma Sequencing

Room: 304
Date: Sunday, July 13, 4:35 pm - 5:00 pm

Author(s):
Ladislav Rampášek, University of Toronto, Canada
Aryan Arbabi, University of Toronto, Canada
Michael Brudno, University of Toronto, Canada

Session Chair: Cenk Sahinalp

Abstract Show

Motivation: The past several years have seen the development of
methodologies to identify genomic variation within a fetus through the
non-invasive sequencing of maternal blood plasma. These methods
are based on the observation that maternal plasma contains a
fraction of DNA (typically 5-15%) originating from the fetus, and such
methodologies have already been used for the detection of whole-
chromosome events (aneuploidies), and to a more limited extent for
smaller (typically several megabases long) Copy Number Variants
(CNVs).

Results: Here we present a probabilistic method for non-invasive
analysis of de novo CNVs in fetal genome based on maternal plasma
sequencing. Our novel method combines three types of information
within a unified Hidden Markov Model: the imbalance of allelic ratios
at SNP positions, the use of parental genotypes to phase nearby
SNPs, and depth of coverage to better differentiate between various
types of CNVs and improve precision. Our simulation results, based
on in silico introduction of novel CNVs into plasma samples with 13%
fetal DNA concentration, demonstrate a sensitivity of 90% for CNVs
>400 kilobases (with 13 calls in an unaffected genome), and 40% for
50-400kb CNVs (with 108 calls in an unaffected genome).

Availability: Implementation of our model and data simulation
method is available at http://github.com/compbio-UofT/fCNV

PP21 BioC: a minimalist approach to interoperability for biomedical text processing

Room: 302
Date: Sunday, July 13, 4:35 pm - 5:00 pm

Author(s):
Donald C. Comeau, U.S. National Library of Medicine, United States
Rezarta Islamaj Doğan, U.S. National Library of Medicine, United States
Paolo Ciccarese, Harvard University, United States
Kevin Bretonnel Cohen, University of Colorado, United States
Martin Krallinger, Spanish National Cancer Research Centre, Spain
Florian Leitner, Spanish National Cancer Research Centre, Spain
Zhiyong Lu, U.S. National Library of Medicine, United States
Yifan Peng, University of Delaware, United States
Fabio Rinaldi, University of Zurich, Ch
Manabu Torii, University of Delaware, United States
Alfonso Valencia, Spanish National Cancer Research Centre, Spain
Karin Verspoor, The University of Melbourne, Australia
Thomas C. Wiegers, North Carolina State University, United States
Cathy H. Wu, University of Delaware, United States
W. John Wilbur, U.S. National Library of Medicine, United States

Session Chair: Toni Kazic

Abstract Show

After a brief motivation, there will an overview of the BioC format and supporting input/output libraries. Then there will be a summary of the BioC implementations, tools, services, and corpora currently available. Implementations of BioC to hold this data, read it from and write it back to XML files are available in C++, Go, Java, Python, and Ruby. Online services using the format are available for semantic role labeling, sentence simplification, and entity labeling. Text preprocessing pipelines for sentence segmentation, tokenization, parts of speech, lemmatization, and parsing are available in C++ and Java. Named entity recognizers are available for disease, genes, chemicals, species and mutations. Annotated corpora are available for abbreviation definition detection, disease mentions, protein-protein interaction events and metabolites. The examples will focus on tools and applications that demonstrate the features and flexibility of BioC.

PP22 Country-specific antibiotic use practices impact the human gut resistome

Room: 311
Date: Monday, July 14, 10:30 am - 10:55 am

Author(s):
Kristoffer Forslund, European Molecular Biology Laboratory, Germany
Shinichi Sunagawa, EMBL, Germany
Jens Roat Kultima, EMBL, Germany
Daniel Mende, EMBL, Germany
Manimozhiyan Arumugam, Copenhagen University, Germany
Athanasios Typas, EMBL, Germany
Peer Bork, EMBL, Germany

Session Chair: Janet Kelso

Abstract Show

Despite increasing concerns over inappropriate use of antibiotics in medicine and food production, population-level resistance transfer into the human gut microbiota has not been demonstrated beyond individual case studies. To determine the "antibiotic resistance potential" for entire microbial communities, we employ metagenomic data and quantify the totality of known resistance genes in each community (its resistome) for 68 classes and subclasses of antibiotics. In 252 fecal metagenomes, we show that the most abundant resistance determinants are those for antibiotics also used in animals, and for antibiotics longer in use. Resistance potential is higher in samples from Spain, Italy and France than from Denmark, the US, or Japan. Differences in country-level data on antibiotic use in both humans and animals, where available, match the observed resistance potential differences. Antibiotic resistance determinants of individuals persist in the human gut flora for at least a year.

PP23 RNA-Skim: a rapid method for RNA-Seq quantification at transcript level

Room: 304
Date: Monday, July 14, 10:30 am - 10:55 am

Author(s):
Zhaojun Zhang, UNC - Chapel Hill, United States
Wei Wang, University of California, Los Angeles, United States

Session Chair: Bernard Moret

Abstract Show

Motivation: RNA-Seq technique has been demonstrated as a revolutionary means for exploring transcriptome because it provides deep coverage and base-pair level resolution. RNA-Seq quantification is proven to be an efficient alternative to Microarray technique in gene expression study, and it is a critical component in RNA-Seq differential expression analysis. Most existing RNA-Seq quantification tools require the alignments of fragments to either a genome or a transcriptome, entailing a time-consuming and intricate alignment step. In order to improve the performance of RNA-Seq quantification, an alignment-free method, Sailfish, has been recently proposed to quantify transcript abundances using all k-mers in the transcriptome, demonstrating the feasibility of designing an efficient alignment-free method for transcriptome quantification. Even though Sailfish is substantially faster than alternative alignment-dependent methods such as Cufflinks, using all k-mers in the transcriptome quantification impedes the scalability of the method.

Results: We propose a novel RNA-Seq quantification method, RNA-Skim, which partitions the transcriptome into disjoint transcript clusters based on sequence similarity and introduces the notion of sig-mers that are a special type of k-mers uniquely associated with each cluster. We demonstrate that the sig-mer counts within a cluster are sufficient for estimating transcript abundances with accuracy comparable to any state of the art method. This enables RNA-Skim to perform transcript quantification on each cluster independently, reducing a complex optimization problem into smaller optimization tasks that can be run in parallel. As a result, RNA-Skim uses less than 4% of the k-mers and less than 10% of the CPU time required by Sailfish. It is able to finish transcriptome quantification in less than 10 minutes per sample by using just a single thread on a commodity computer, which represents more than 100 speedup over the state of the art alignment-based methods, while delivering comparable or higher accuracy.

Availability:
The software is available at http://www.csbio.unc.edu/rs

PP24 Metabolite Identification through Multiple Kernel Learning on Fragmentation Trees

Room: 302
Date: Monday, July 14, 10:30 am - 10:55 am

Author(s):
Huibin Shen, Aalto University, Finland
Kai Dührkop, Friedrich-Schiller-University Jena, Germany
Sebastian Böcker, Friedrich-Schiller-University Jena, Germany
Juho Rousu, Aalto University, Finland

Session Chair: Yanay Ofran

Abstract Show

Motivation: Metabolite identification from tandem mass spectrometric data is a key task in metabolomics. Various computational methods has been proposed for the
identification of metabolites from tandem mass spectra. Fragmentation tree
methods explore the space of possible ways the metabolite can fragment, and
base the metabolite identification on scoring of these fragmentation trees.
Machine learning methods has been used to map mass spectra to molecular
fingerprints; predicted fingerprints, in turn, can be used to score candidate
molecular structures.

Results: Here, we combine fragmentation tree computations with kernel-based machine learning to predict molecular fingerprints and identify molecular structures.
We introduce a family of kernels capturing the similarity of fragmentation
trees, and combine these kernels using recently proposed multiple kernel
learning approaches. Experiments on two large reference datasets show that
the new methods significantly improve molecular fingerprint prediction
accuracy. These improvements result in better metabolite identification,
doubling the number of metabolites placed ranked at the top position of the
candidate list.

PP25 Methods for time series analysis of RNA-seq data with application to human Th17 cell differentiation

Room: 312
Date: Monday, July 14, 10:30 am - 10:55 am

Author(s):
Tarmo Äijö, Aalto University, Finland
Vincent Butty, Massachusetts Institute of Technology, United States
Zhi Chen, University of Turku, Finland
Verna Salo, University of Turku, Finland
Subhash Tripathi, University of Turku / Åbo Akademi University , Finland
Christopher Burge, Massachusetts Institute of Technology, United States
Riitta Lahesmaa, University of Turku, Finland
Harri Lähdesmäki, Aalto University,, Finland

Session Chair: Robert F. Murphy

Abstract Show

Motivation: Gene expression profiling using RNA-seq is a powerful technique for screening RNA species’ landscapes and their dynamics in an unbiased way. While several advanced methods exist for differential expression analysis of RNA-seq data, proper tools to analyze RNA-seq time-course have not been proposed.

Results: In this study, we use RNA-seq to measure gene expression during the early human T helper 17 (Th17) cell differentiation and T cell activation (Th0). To quantify Th17 specific gene expression dynamics, we present a novel statistical methodology, DyNB, for analyzing time-course RNA-seq data. We use non- parametric Gaussian process to model temporal correlation in gene expression and combine that with negative binomial likelihood for the count data. To account for experiment specific biases in gene expression dynamics, such as differences in cell differentiation efficiencies, we propose a method to rescale the dynamics between replicated measurements. We develop an MCMC sampling method to make inference of differential expression dynamics between conditions. DyNB identifies several known and novel genes involved in Th17 differentiation. Analysis of differentiation efficiencies revealed consistent patterns in gene expression dynamics between different cultures. We use qRT-PCR to validate differential expression and differentiation efficiencies for selected genes. Comparison of the results with those obtained via traditional time point wise analysis shows that time-course analysis together with time rescaling between cultures identifies differentially expressed genes which would not otherwise be detected.

Availability: An implementation of the proposed computational methods will be available at http://research.ics.aalto.fi/csb/software/

PP26 Relating the metatranscriptome and metagenome of the human gut

Room: 311
Date: Monday, July 14, 11:00 am - 11:25 am

Author(s):
Eric Franzosa, Harvard School of Public Health, United States
Xochitl Morgan, Harvard School of Public Health, United States
Nicola Segata, Harvard School of Public Health, United States
Levi Waldron, Harvard School of Public Health, United States
Joshua Reyes, Harvard School of Public Health, United States
Ashlee Earl, The Broad Institute, United States
Georgia Giannoukos, The Broad Institute, United States
Dawn Ciulla, The Broad Institute, United States
Dirk Gevers, The Broad Institute, United States
Matthew Boylan, Division of Gastroenterology, United States
Andrew Chan, Division of Gastroenterology, United States
Jacques Izard, Department of Microbiology, United States
Wendy Garrett, Department of Immunology and Infectious Diseases, United States

Session Chair: Janet Kelso

Abstract Show

We have conducted one of the first human microbiome studies in a well-described large prospective cohort incorporating taxonomic, metagenomic, and metatranscriptomic profiling at multiple body sites. Systematic comparison of the gut metagenome and metatranscriptome revealed that a substantial fraction of microbial transcripts were not differentially regulated relative to their genomic abundances. Of the remainder, consistently under-expressed pathways included sporulation and amino acid biosynthesis, while upregulated pathways included ribosome biogenesis and methanogenesis. Across subjects, metatranscriptional profiles were significantly more individualized than DNA-level functional profiles, indicative of subject-specific whole-community regulation. This work also identified a subset of abundant oral microbes that routinely survive transit to the gut, but with minimal transcriptional activity there. Together, these results provide a community-wide profile of biomolecular regulatory processes in the gut, as well as validating one of the first protocols appropriate for large-scale functional profiling of the microbiome in human populations.

PP27 Power and Limitations of RNA-Seq: findings from the SEQC (MAQC-III) consortium

Room: 304
Date: Monday, July 14, 11:00 am - 11:25 am

Author(s):
David Kreil, Boku University Vienna, Austria

Session Chair: Bernard Moret

Abstract Show

In the US-FDA led SEQC/MAQC-III project, different sequencing platforms were tested at over ten sites using well-established reference RNA samples with built-in truths to assess the discovery and expression-profiling performances of platforms and analysis pipelines. The results demonstrate that novel exon-exon junctions can still be discovered beyond existing comprehensive annotations and at high sequencing depths. Extensive investigations encompassing diverse performance metrics characterizing reproducibility, accuracy, and information content were combined with comparisons to qPCR and microarray platforms showing that good inter-site and cross-platform concordances for differentially expressed genes are possible, which is particularly critical in clinical and regulatory settings. In general, however, performance is application, platform, and pipeline dependent, with transcript-level profiling affected more strongly. Together with data from applications of RNA-Seq from several preclinical and clinical problems, the entire SEQC data sets comprise >100 billion reads (10Tb) and provide a unique resource for testing future developments of RNA-Seq.

PP28 Computational Biology in Medicine: Novel Targets and Drug Repositioning Use Cases

Room: 302
Date: Monday, July 14, 11:00 am - 11:25 am

Author(s):
Pankaj Agarwal, GSK, United States
Philippe Sanseau, GSK, United States
Mark Hurle, GSK, United States

Session Chair: Yanay Ofran

Abstract Show

Identifying the protein to target with a medicine is a critical step in drug discovery, and it is commonly thought that innovation in drug discovery is limited because pharmaceutical companies tend to work on the same drug targets, leading to ‘me too’ drugs. However, we found that 42% of targets were innovative and not duplicative at all. In fact, competition on targets increased with more target validation as should be the case (Nat Rev Drug Discov. 2013 Aug;12(8):575-6). We also discuss systematic drug repositioning techniques based on computational analysis of data from transcriptomics (such as, Connectivity Map), side effects, phenotypic screens, and genome-wide association studies (Clin Pharmacol Ther. 2013 Apr;93(4):335-41). I will also present some key bioinformatics problems in medicine discovery.

PP29 Cell-selective labeling using amino acid precursors for proteomic studies of multicellular environments

Room: 312
Date: Monday, July 14, 11:00 am - 11:25 am

Author(s):
Nicholas Gauthier, Memorial Sloan Kettering Cancer Center, United States
Boumediene Soufi, University of Tübingen, Germany
William Walkowicz, Memorial Sloan-Kettering Cancer Center, United States
Virginia Pedicord, Memorial Sloan-Kettering Cancer Center, United States
Konstantinos Mavrakis, Memorial Sloan-Kettering Cancer Center, United States
Boris Macek, University of Tübingen, Germany
Chris Sander, Memorial Sloan Kettering Cancer Center, United States
Martin Miller, Memorial Sloan Kettering Cancer Center, United States

Session Chair: Robert F. Murphy

Abstract Show

Tissue development, homeostasis, and pathogenesis involve complex signaling between many cell types through both secreted factors and direct cell-cell contact. We report a new technique to selectively and continuously label the proteomes of individual cell types in co-culture, named cell type-specific labeling using amino acid precursors (CTAP). In short, mammalian cell expression of exogenous amino acid biosynthesis enzymes from lower organisms allows specific populations of cells to produce their own supply of amino acids from supplemented amino acid precursors. The conversion of heavy isotope-labeled precursors to heavy labeled amino acids is restricted to enzyme-expressing populations, providing a way to genetically control protein labeling. Using quantitative mass spectrometry, we demonstrate the method’s ability to differentiate the cell-of-origin of intra- and intercellular proteins derived from multicellular cultures. Linking proteins to their cellular source using CTAP facilitates cell-cell communication studies and the discovery of cell type-specific biomarkers.

PP30 Pipasic: Similarity and Expression Correction for Strain-Level Identification and Quantification in Metaproteomics

Room: 311
Date: Monday, July 14, 11:30 am - 11:55 pm

Author(s):
Anke Penzlin, Robert Koch Institute, Germany
Martin S. Lindner, Robert Koch Institute, Germany
Joerg Doellinger, Robert Koch Institute, Germany
Piotr Wojtek Dabrowski, Robert Koch Institute, Germany
Nitsche Andreas, Robert Koch Institute, Germany
Bernhard Y. Renard, Robert Koch Institute, Germany

Session Chair: Janet Kelso

Abstract Show

Motivation: Metaproteomic analysis allows studying the interplay of organisms or functional groups and has become increasingly popular also for diagnostic purposes. However, difficulties arise due to the high sequence similarity between related organisms. Further, the state of conservation of proteins between species can be correlated with their expression level which can lead to significant bias in results and interpretation. These challenges are similar but not identical to the challenges arising in the analysis of metagenomic samples and require specific solutions.

Results: We introduce Pipasic (peptide intensity-weighted proteome abundance similarity correction) as a tool which corrects identification and spectral counting based quantification results using peptide similarity estimation and expression level weighting within a non-negative lasso framework. Pipasic has distinct advantages over approaches only regarding unique peptides or aggregating results to the lowest common ancestor, as demonstrated on examples of viral diagnostics and an acid mine drainage dataset.

PP31 Deep learning of the tissue-regulated splicing code

Room: 304
Date: Monday, July 14, 11:30 am - 11:55 pm

Author(s):
Michael Leung, University of Toronto, Canada
Hui Xiong, University of Toronto, Canada
Leo Lee, University of Toronto, Canada
Brendan Frey, University of Toronto, Canada

Session Chair: Bernard Moret

Abstract Show

Motivation: Alternative splicing is a regulated process that directs the generation of different transcripts from single genes. A computational model that can accurately predict splicing patterns based on genomic features and cellular context is highly desirable, both in understanding this widespread phenomenon, and in exploring the effects of genetic variations on alternative splicing.

Methods: Using a deep neural network, we developed a model inferred from mouse RNA-Seq data that can predict splicing patterns in individual tissues and differences in splicing patterns across tissues. Our architecture uses hidden variables that jointly represent features in genomic sequences and tissue types when making predictions. A graphics processing unit was used to greatly reduce the training time of our models with millions of parameters.

Results: We show that the deep architecture surpasses the performance of the previous Bayesian method for predicting alternative splicing patterns. With the proper optimization procedure and selection of hyperparameters, we demonstrate that deep architectures can be beneficial, even with a moderately sparse dataset. An analysis of what the model has learned in terms of the genomic features is presented.

PP32 DrugComboRanker: Drug Combination Discovery Based on Target Network Analysis

Room: 302
Date: Monday, July 14, 11:30 am - 11:55 pm

Author(s):
Lei Huang, Peking University, China
Fuhai Li, Houston Methodist Hospital Research Institute , United States
Jianting Sheng, Houston Methodist Hospital Research Institute, United States
Xiaofeng Xia, Houston Methodist Hospital Research Institute, United States
Jinwen Ma, Peking University
Ming Zhan, Houston Methodist Hospital Research Institute, United States
Stephen Wong, Houston Methodist Hospital Research Institute, United States

Session Chair: Yanay Ofran

Abstract Show

Motivation: Currently there are no curative anti-cancer drugs and drug resistance is often acquired after drug treatment. One of the reasons is that cancers are complex diseases, regulated by multiple signaling pathways and cross-talks among the pathways. It is expected that drug combinations can reduce drug resistance and improve patients’ outcomes. In clinical practice, the ideal and feasible drug combinations are combinations of existing FDA approved drugs or bioactive compounds that are already used on patients or have entered clinical trials and passed safety tests. These drug combinations could directly be used on patients with less concern of toxic effects. However, there is so far no effective computational approach to search effective drug combinations from the enormous number of possibilities.

Results: In this study, we propose a novel systematic computational tool DrugComboRanker to prioritize synergistic drug combinations and uncover their mechanisms of action. We first build a drug functional network based on their genomic profiles, and partition the network into numerous drug network communities by using a Bayesian non-negative matrix factorization approach. As drugs within overlapping community share common mechanisms of action, we next uncover potential targets of drugs by applying a recommendation system on drug communities. We meanwhile build disease-specific signaling networks based on patients’ genomic profiles and interactome data. We then identify drug combinations by searching drugs whose targets are enriched in the complementary signaling modules of the disease signaling network. The novel method was evaluated on lung adenocarcinoma and endocrine receptor (ER) positive breast cancer, and compared with other drug combination approaches. These case studies discovered a set of effective drug combinations top ranked in our prediction list, and mapped the drug targets on the disease signaling network to highlight the mechanisms of action of the drug combinations.

PP33 Automated detection and tracking of many cells by using 4D live-cell imaging data

Room: 312
Date: Monday, July 14, 11:30 am - 11:55 pm

Author(s):
Terumasa Tokunaga, The Institute of Statistical Mathematics, Japan
Osamu Hirose, Kanazawa University, Japan
Shotaro Kawaguchi, Kanazawa University, Japan
Yu Toyoshima, The University of Tokyo, Japan
Takayuki Teramoto, Kyushu University, Japan
Hisaki Ikebata, Graduate University of Advanced Studies, Japan
Sayuri Kuge, Kyushu University, Japan
Takeshi Ishihara, Kyushu University, Japan
Yuichi Iino, The University of Tokyo, Japan
Ryo Yoshida, The Institute of Statistical Mathematics, Japan

Session Chair: Robert F. Murphy

Abstract Show

Motivation:
Automated fluorescence microscopes produce massive amounts of images
observing cells often in four dimensions of space and time. This study
addresses two tasks of time-lapse imaging analyses; detection and
tracking of many imaged cells, especially intended for 4D live-cell
imaging of neuronal nuclei of C. elegans. Cells of interest appear
in little more generic forms than ellipsoids. They distribute densely
and move rapidly in a series of 3D images. In such cases, existing
tracking methods often fail due to that, for instance, many trackers
transit from one to the other of different objects during rapid moves.

Results:
The present method starts from converting each 3D image to a smooth
continuous function by performing the kernel density estimation. Cell
bodies in an image are assumed to lie in regions around multiple local
maxima of the density function. Then, the tasks of detecting and
tracking cells are addressed with two hill-climbing algorithms that we
derive. By applying the cell detection method to an image at the first
frame, the positions of trackers are initialized. The tracking algorithm
keeps attracting them to around local maxima changing over time in a
subsequent image sequence. To prevent the trackers from turnovers and
coalescences, we employ Markov random fields (MRFs) to model spatial and
temporal covariation of cells, and maximize the image forces and the MRF-
induced constraints on transitions of the trackers. The tracking
procedure is demonstrated with dynamic 3D images containing more than
one hundred neurons of C. elegans.

PP34 Primate Transcript and Protein Expression Levels Evolve under Compensatory Selection Pressures

Room: 311
Date: Monday, July 14, 12:00 pm - 12:25 pm

Author(s):
Zia Khan, University of Maryland, United States
Michael Ford, MS Bioworks, LLC, United States
Darren Cusanovich, University of Chicago, United States
Amy Mitrano, University of Chicago, United States
Jonathan Prichard, Stanford University, United States
Yoav Gilad, University of Chicago, United States

Session Chair: Janet Kelso

Abstract Show

Due to the technical and computational challenges of conducting comparative, genome-scale proteomics, essentially all studies of gene regulatory evolution across primates and other mammals have focused on mRNA levels rather than protein levels. Yet, proteins perform much of the work of the cell and are subject to regulation not revealed by mRNA levels alone. Using quantitative mass spectrometry and novel computational analysis methods, we obtained thousands of comparative mRNA and protein expression measurements from human, chimpanzee, and rhesus macaque lymphoblastoid cell lines. We used data from all three species to identify genes whose regulation might have evolved under natural selection, and considered jointly, our data allowed us to identify genes where lineage-specific changes might specifically affect post-transcriptional or post-translational regulation. Our analyses indicate that on an evolutionary timescale, there is surprising flexibility in primate mRNA levels, as these changes are often either buffered or compensated for at the protein level.

PP35 Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data

Room: 304
Date: Monday, July 14, 12:00 pm - 12:25 pm

Author(s):
Yuanfang Guan, University of Michigan, United States
Hongdong Li, University of Michigan, United States
Rajasree Menon, University of Michigan, United States
Yuchen Wen, University of Michigan, United States
Gilbert S. Omenn, University of Michigan, United States
Matthias Kretzler, University of Michigan, United States
Yuanfang Guan, University of Michigan, United States

Session Chair: Bernard Moret

Abstract Show

Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6.

PP36 Drug synergy screen and network modeling in dedifferentiated liposarcoma identifies CDK4 and IGF1R as synergistic drug targets

Room: 302
Date: Monday, July 14, 12:00 pm - 12:25 pm

Author(s):
Martin Miller, Memorial Sloan Kettering Cancer Center, United States
Evan Molinelli, Memorial Sloan Kettering Cancer Center, United States
Jayasree Nair, Memorial Sloan Kettering Cancer Center, United States
Tahir Sheikh, Memorial Sloan Kettering Cancer Center, United States
Rita Samy, Memorial Sloan Kettering Cancer Center, United States
Xiaohong Jing, Memorial Sloan Kettering Cancer Center, United States
Qin He, Memorial Sloan Kettering Cancer Center, United States
Anil Korkut, Memorial Sloan Kettering Cancer Center, United States
Aimee Crago, Memorial Sloan Kettering Cancer Center, United States
Samuel Singer, Memorial Sloan Kettering Cancer Center, United States
Gary Schwartz, Memorial Sloan Kettering Cancer Center, United States
Chris Sander, Memorial Sloan Kettering Cancer Center, United States

Session Chair: Yanay Ofran

Abstract Show

In this study, entitled “Drug synergy screen and network modeling in dedifferentiated liposarcoma identifies CDK4 and IGF1R as synergistic drug targets”, we have successfully deployed a powerful perturbation-based systems biology approach to discover and characterize drug combinations with translational relevance for cancer treatment. We demonstrate its applicability in dedifferentiated liposarcoma via the discovery of a synergistic IGF1R and CDK4 combination therapy, which we will further test in a clinical setting. From a drug combination screen with 14 anti-cancer drugs, proteomic and phenotypic response profiles serve as input for our de novo network inference method that is subsequently used to predict pathway-based mechanisms of drug synergy. We believe that this study is highly relevant for the interdisciplinary and systems biology-oriented participants of the ISMB conference. We anticipate that such integrated approaches to combinatorial therapeutics will be widely used and provide opportunities to bridge basic cancer research and clinical drug development.

PP37 NaviCell: a web-based environment for navigation, curation, maintenance and data analysis in the context of large molecular interaction maps

Room: 312
Date: Monday, July 14, 12:00 pm - 12:25 pm

Author(s):
Emmanuel Barillot, Institut Cuire, France
Inna Kuperstein, Institut Cuire, France
David Cohen, Institut Cuire, Fr
Stuart Pook, Institut Cuire, Fr
Eric Viara, Institut Cuire, Fr
Laurence Calzone, Institut Cuire, Fr
Emmanuel Barillot, Institut Cuire, Fr
Andrei Zinovyev, Institut Cuire, Fr

Session Chair: Robert F. Murphy

Abstract Show

Biological knowledge can be systematically represented in a computer-readable form as a comprehensive map of molecular interactions. NaviCell is a web-based environment for exploiting large maps of molecular interactions, created in CellDesigner. NaviCell is characterized by a combination of three essential features: (1) efficient map browsing based on Google Maps; (2) semantic zooming for viewing different levels of details or of abstraction of the map and (3) integrated web-based blog for collecting community feedback. NaviCell can be used for studying molecular entities of interest in the context of signaling pathways and crosstalk between pathways within a global signaling network. It greatly facilitates curation, maintenance and updating the comprehensive maps of molecular interactions in an interactive and user-friendly fashion due to an imbedded blogging system. In addition, NaviCell provides tools for omics data integration and overlaying several types of data; visualization and analysis in the context of signaling networks.

PP38 Efficient Modeling and Active Learning Discovery of Biological Responses

Room: 311
Date: Monday, July 14, 2:10 pm - 2:35 pm

Author(s):
Armaghan Naik, Carnegie Mellon University, United States
Joshua Kangas, Carnegie Mellon, United States
Christopher Langmead, Carnegie Mellon, United States

Session Chair: Reinhard Schneider

Abstract Show

High throughput and high content screening involve determination of the effect of many compounds on a given target. As currently practiced, screening for each new target typically makes little use of information from screens of prior targets. Further, choices of compounds to advance to drug development are made without significant screening against off-target effects. The overall drug development process could be made more effective if potential effects of all compounds on all possible targets could be considered, yet the cost of complete experimentation is prohibitive. Here we describe a potential solution: probabilistic models that can be used to predict results for unmeasured combinations, and active learning algorithms for efficiently selecting which experiments to perform in order to build those models and determining when to stop. Using simulated and experimental data, we show that our approaches can produce accurate estimates of unmeasured experiments much faster than by selecting experiments at random.

PP39 Analyzing DNase I biases reveals a novel DNA methylation readout mechanism

Room: 304
Date: Monday, July 14, 2:10 pm - 2:35 pm

Author(s):
Harmen Bussemaker, Columbia University, United States
Allan Lazarovici, Columbia University, Us
Tianyin Zhou, University of Southern California, Us
Anthony Shafer, University of Washington, Us
Ana Carolina Dantas Machado, University of Southern California, Us
Richard Sandstrom, University of Washington, Us
Peter Sabo, University of Washington, Us
Yan Lu, University of Southern California, Us
Remo Rohs, University of Southern California, Us
John Stamatoyannopoulos, University of Washington, Us

Session Chair: Michal Linial

Abstract Show

We have uncovered a novel and general mechanism by which cytosine methylation can dramatically strengthen specific protein-DNA interactions. By analyzing DNase I digests of purified human genomic DNA, we discovered that (i) cleavage rate varies over a thousand-fold range with the surrounding sequence, and that cleavage near CpG dinucleotides is ten-fold higher when the cytosine is methylated. Combining all-atom computer simulation predictions of DNA shape with statistical analysis of massively parallel sequencing data, we were able to find a unified explanation for these phenomena. It turns out that cytosine methylation narrows the DNA minor groove, which in turn strengthens interactions with positively charged amino-acid side chains. Such minor groove contacts occur for a wide range of transcription factors, as well as nucleosomes. The novel structural mechanism put forward in this study therefore has the potential to significantly deepen our understanding of how epigenetic information is "read" by the cell.

PP40 Statistical challenges in whole-exome sequencing of 7000 individuals with schizophrenia and controls

Room: 302
Date: Monday, July 14, 2:10 pm - 2:35 pm

Author(s):
Menachem Fromer, Icahn School of Medicine at Mount Sinai, United States
Shaun Purcell, Icahn School of Medicine at Mount Sinai, United States

Session Chair: Dietlind Gerloff

Abstract Show

In dealing with large next-generation sequencing data in the study of disease, the sheer number of potentially relevant neutral genetic variants results in a decreased signal-to-noise ratio. In this talk, we will discuss how we dealt with these issues in two recently published studies of schizophrenia. In one, ~2500 schizophrenia cases and ~2500 matched controls were whole-exome-sequenced. The magnitude of neutral mutations largely overwhelms the number of genetic variants more likely related to disease, which we focus on by frequency filtering, biological impact, and pathway analysis. In the second study, de novo mutations were sought out in father-mother-child trios to find mutations not yet subjected to selective pressures. In this instance, the overwhelming majority of such potential mutations (seemingly arising as new in the children) are false positives. We carefully sifted through these using sequencing and other metrics to find the real ones most likely associated with disease.

PP41 Pareto-Optimal Phylogenetic Tree Reconciliation

Room: 312
Date: Monday, July 14, 2:10 pm - 2:35 pm

Author(s):
Ran Libeskind-Hadas, Harvey Mudd College, United States
Yi-Chieh Wu, Massachusetts Institute of Technology, United States
Mukul S. Bansal, University of Connecticut, United States
Manolis Kellis, Massachusetts Institute of Technology, United States

Session Chair: Russell Schwartz

Abstract Show

Motivation: Phylogenetic tree reconciliation is a widely-used method for reconstructing the evolutionary histories of gene families and species, hosts and parasites, and other dependent pairs of entities. Reconciliation is typically performed using maximum parsimony, in which each evolutionary event type is assigned a cost and the objective is to find a reconciliation of minimum total cost. It is generally understood that reconciliations are sensitive to event costs, but little is understood about the relationship between events costs and solutions. Moreover, choosing appropriate event costs is a notoriously difficult problem.

Results: We address this problem by giving an efficient algorithm for computing Pareto-optimal sets of reconciliations, thus providing the first systematic method for understanding the relationship between event costs and reconciliations. This, in turn, results in new techniques for computing event support values and, for cophylogenetic analyses, performing robust statistical tests. We provide new software tools and demonstrate their use on a number of datasets from evolutionary genomic and cophylogenetic studies.

Availability: Our Python tools are freely available at www.cs.hmc.edu/~hadas/xscape
Contact: mukul@engr.uconn.edu

PP43 Constructing Hepitypes:Phasing Local Genotyping and DNA Methylation

Room: 304
Date: Monday, July 14, 2:40 pm - 3:05 pm

Author(s):
Wen-Yu Chung, National Kaohsiung University of Applied Sciences, Taiwan, China
Robert Schmitz, The Salk Institute for Biological Studies, United States
Tanya Biorac, Life Technologies Corp.-Ion Torrent, United States
Delia Ye, Life Technologies Corp.-Ion Torrent, United States
Miroslav Dudas, Life Technologies Corp.-Ion Torrent, United States
Gavin Meredith, Life Technologies Corp.-Ion Torrent, United States
Christopher Adams, Life Technologies Corp.-Ion Torrent, United States
Joseph Ecker, The Salk Institute for Biological Studies, United States
Michael Zhang, University of Texas at Dallas, United States

Session Chair: Michal Linial

Abstract Show

Whole-genome DNA methylation sequencing provides both methylation patterns and genetic information. We utilized base resolution methylomes to directly identify allelic linkage of DNA methylation and genomic variants. The paired association was further extended to construct hepitypes by the simultaneous phasing of genotype and methylation. Using such approach, the sequencing reads provide direct statistics of the interdependence between methylcytosines and nucleotide variations; consequently, the detailed patterns of genetic and epigenetic variations can be readily inferred by data. Moreover, the analysis is not limited by known single nucleotide variants. In addition to imprinted regions and SNV-in-CpG sites, we show numerous cis-regulatory sequence-associated DNA methylation sites. We extended this strategy to incorporate multiple nucleotide and methylation sites and ranked hepitypes according to the observed frequency. The top-ranked hepitypes indicate that methylated sites are often observed from the same allele.

PP44 Genome leaks

Room: 302
Date: Monday, July 14, 2:40 pm - 3:05 pm

Author(s):
Steven Brenner, University of California, Berkeley, United States

Session Chair: Dietlind Gerloff

Abstract Show

Genome science is reaching a critical juncture. More than 10,000 genetic variants have been associated with traits, allowing breakthroughs in basic biological research and medical applications. However, privacy concerns have excluded most researchers from directly analyzing the vast wealth of human genomic information. Nonetheless, risks of vast data breaches are rapidly rising—and further progress requires ever-larger cohorts. We currently inhibit research without effectively protecting human subjects; prospects for harm to both individuals and to medical research are growing.

An extended discussion of the genome leaks issues may be found at http://compbio.berkeley.edu/proj/leak/

PP45 Human disease locus discovery and mapping to molecular pathways through phylogenetic profiling

Room: 312
Date: Monday, JUly 14, 2:40 pm - 3:05 pm

Author(s):
Yuval Tabach, Massachusetts General Hospital, United States
Gary Ruvkun, Massachusetts General Hospital, United States
Carmit Levy, Tel Aviv University, United States

Session Chair: Russell Schwartz

Abstract Show

Genes with common profiles of the presence and absence in disparate genomes tend to function in the same pathway. By mapping all human genes into about 1000 clusters of genes with similar patterns of conservation across eukaryotic phylogeny, we determined that sets of genes associated with particular diseases have similar phylogenetic profiles. By focusing on those human phylogenetic gene clusters that significantly overlap some of the thousands of human gene sets defined by their coexpression or annotation to pathways or other molecular attributes, we reveal the evolutionary map that connects molecular pathways and human diseases. The other genes in the phylogenetic clusters enriched for particular known disease-genes or molecular pathways identify candidate genes for roles in those same disorders and pathways. Focusing on proteins coevolved with the microphthalmia-associated transcription factor(MITF), we identified the Notch pathway suppressor of hairless (RBP-Jk/SuH) transcription factor, and showed that RBP-Jk functions as an MITF cofactor.

PP46 Gene network inference by probabilistic scoring of relationships from a factorized model of interactions

Room: 311
Date: Monday, July 14, 3:10 pm - 3:35 pm

Author(s):
Marinka Zitnik, University of Ljubljana, Slovenia
Blaz Zupan, University of Ljubljana, Slovenia

Session Chair: Reinhard Schneider

Abstract Show

Motivation: Epistasis analysis is an essential tool of classical genetics for inferring the order of function of genes in a common pathway. Typically, it considers single and double mutant phenotypes and for a pair of genes observes if a change in the first gene masks the effects of the mutation in the second gene. Despite the recent emergence of biotechnology techniques that can provide gene interaction data on a large, possibly genomic scale, very few methods are available for quantitative epistasis analysis and epistasis-based network reconstruction.

Results: We here propose a conceptually new probabilistic approach to gene network inference from quantitative interaction data. The approach is founded on epistasis analysis. Its features are joint treatment of the mutant phenotype data with a factorized model and probabilistic scoring of pairwise gene relationships that are inferred from the latent gene representation. The resulting gene network is assembled from scored pairwise relationships. In an experimental study, we show that the proposed approach can accurately reconstruct several known pathways and that it surpasses the accuracy of current approaches.

PP47 A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes

Room: 304
Date: Monday, July 14, 3:10 pm - 3:35 pm

Author(s):
Erez Levanon, Bar-Ilan University, Israel
Yishay Pinto, Bar-Ilan University, Israel
Haim Cohen, Bar-Ilan University, Israel
Lily Bazk, Bar-Ilan University, Israel
Ami Haviv, Bar-Ilan University, Israel
Michal Barak, Bar-Ilan University, Israel
Jasmine Jacob-Hirsch, Bar-Ilan University, Israel
Patricia Deng, Stanford University, United States
Rui Zhang, Stanford University, United States
Jin Billy Li, Stanford University, United States
Gidi Rechavi, Chaim Sheba Medical Center, Israel

Session Chair: Michal Linial

Abstract Show

RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most editing in human takes place in the primate-specific Alu sequences, but the extent of this phenomenon is not yet clear. Here, we analyzed large-scale RNA-seq data and detected ∼1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultradeep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu that form double-stranded RNA undergo editing, although most sites exhibit editing at only low levels. We estimate that there are over 100 million human Alu editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.

PP48 Privacy Preserving Protocol for Detecting Genetic Relatives Using Rare Variants

Room: 302
Date: Monday, July 14, 3:10 pm - 3:35 pm

Author(s):
Farhad Hormozdiari, University of California, Los Angeles, United States
Jong Wha Joo, University of California, Los Angeles, United States
Feng Guan, University of California, Los Angeles, United States
Akshay Wadia, University of California, Los Angeles, United States
Rafail Ostrosky, University of California, Los Angeles, United States
Amit Sahai, University of California, Los Angeles, United States
Eleazar Eskin, University of California, Los Angeles, United States

Session Chair: Dietlind Gerloff

Abstract Show

Motivation: High-throughput sequencing technologies have impacted many
areas of genetic research. One such area is the identification of
relatives from genetic data. The standard approach for the
identification of genetic relatives collects the genomic data of
all individuals and stores it in a database. Then, each pair of individuals are compared to detect the set of genetic relatives, and the matched individuals are informed. The main drawback of this approach is the requirement of sharing your genetic data with a trusted third party to perform the relatedness test.

Results: In this work, we propose a secure protocol to detect the genetic relatives from sequencing data while not exposing any information about their genomes. We assume that individuals have access to their genome sequences but do not want to share their genomes with anyone else. Unlike previous approaches, our approach uses both common and rare variants which provides the ability to detect much more distant relationships securely. We use a simulated data generated from the 1000 genomes data and illustrate that we can easily detect up to fifth degree cousins
which was not possible using the existing methods. We also show in the
1000 genomes data with cryptic relationships that our method can detect these individuals.

PP49 Arboretum: reconstruction and analysis of the evolutionary history of condition-specific transcriptional modules

Room: 312
Date: Monday, July 14, 3:10 pm - 3:35 pm

Author(s):
Sushmita Roy, University of Wisconsin, Madison, United States
Ilan Wapinski, Harvard Medical School, United States
Jenna Pfiffner, Broad institute, United States
Courtney French, University of California, United States
Amanda Socha, Darthmouth College, United States
Jay Konieczka, Broad institute, United States
Naomi Habib, Broad institute, United States
Manolis Kellis, MIT, United States
Dawn Thompson, Broad institute, United States
Aviv Regev, MIT & Broad institute, United States

Session Chair: Russell Schwartz

Abstract Show

Comparative functional genomics seeks to measure and compare functional measurements such as mRNA, chromatin states across multiple species. A major challenge is to develop effective tools to systematically compare these data across multiple species. In this talk, we will present a new computational approach, Arboretum to systematically identify modules of co-expressed genes in a species phylogeny. Arboretum is based on a probabilistic model of expression data that is applicable to complex phylogenies with multiple gene duplication and loss events. We applied Arboretum to study the evolution of transcriptional modules in yeast and mammalian species. In yeast, we find substantial conservation in the module expression patterns, although the specific genes in each module diverge in a life-style or clade-specific manner. We will also present some recent results on application of Arboretum to identify conservation and divergence of tissue-specific modules in mammalian species.

PP50 Functional Association Networks as Priors for Gene Regulatory Network Inference

Room: 311
Date: Monday, July 14, 3:40 pm - 4:05 pm

Author(s):
Matthew Studham, SciLifeLab, Sweden
Andreas Tjärnberg, SciLifeLab, Sweden
Torbjörn Nordling, SciLifeLab, Sweden
Sven Nelander, Uppsala University, Sweden
Erik Sonnhammer, SciLifeLab, Sweden

Session Chair: Reinhard Schneider

Abstract Show

Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data is inadequate to fully explain the network, informative priors have been shown to improve the accuracy of inferences. This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic data sets indicates that if the prior networks have enough causal information then they can improve GRN inference accuracy, and if not then accuracy may decrease. This opens the door to the possibility that functional association databases can be used as priors to make GRN inference more reliable.

PP51 Ubiquitously transcribed genes use alternative polyadenylation to achieve tissue-specific expression

Room: 304
Date: Monday, July 14, 3:40 pm - 4:05 pm

Author(s):
Steve Lianoglou, Memorial Sloan Kettering Cancer Center, United States
Christina Leslie, Memorial Sloan Kettering Cancer Center, United States
Julie Yang, Memorial Sloan Kettering Cancer Center, United States
Christine Mayr, Memorial Sloan Kettering Cancer Center, United States
Vidur Garg, Memorial Sloan Kettering Cancer Center, United States

Session Chair: Michal Linial

Abstract Show

More than half of human genes use alternative cleavage and polyadenylation (ApA) to generate mRNA transcripts that differ in the lengths of their 3′ untranslated regions (UTRs), thus altering the post-transcriptional fate of the message and likely the protein output. We developed a sequencing method called 3′-seq to quantitatively map the 3′ ends of the transcriptome of diverse human tissues and isogenic transformation systems. We found that most tissue-restricted genes have single 3′ UTRs, whereas most ubiquitously transcribed genes generate multiple 3′ UTRs. During transformation and differentiation, single-UTR genes change their mRNA abundance levels, while multi-UTR genes typically change 3′ UTR isoform ratios to achieve tissue specificity. However, these regulation programs target genes that function in the same pathways and processes that characterize the new cell type. Finally, tissue-specific usage of ApA sites appears to be a mechanism for changing the landscape targetable by ubiquitously expressed microRNAs.

PP52 Detecting chromatin modifications in cancer samples: challenges and solutions

Room: 302
Date: Monday, July 14, 3:40 pm - 4:05 pm

Author(s):
Valentina Boeva, Institut Curie, France
Haitham Ashoor, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Sa
Aurelie Herault, UMR 144 CNRS, Subcellular Structure and Cellular Dynamics, Fr
Aurelie Kamoun, Institut Curie, Fr
Francois Radvanyi, Institut Curie, Fr
Vladimir B. Bajic, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Sa
Emmanuel Barillot, Institut Curie, Mines ParisTech, Fr

Session Chair: Dietlind Gerloff

Abstract Show

Changes in gene expression in cancer cells are usually associated with specific changes in epigenetic profiles (e.g., histone modification profiles). In order to characterize these changes, ChIP-seq experiments are often employed. Our recent work (Ashoor et al., 2013) has demonstrated that for detection of histone modifications in cancer cells one should apply specific methods that take into account possible DNA copy number aberrations. We apply the method we developed for the detection of histone modifications to reanalyze ENCODE ChIP-seq datasets generated for cancer cell lines. We show that current ENCODE histone modification profiles and called regions with histone modifications have a systematic copy number bias: irrespective of the particular histone modification, regions of genomic gain tend to contain more called histone modifications than regions of genomic loss. Our results suggest that the ENCODE cancer cell datasets should be reanalyzed in order to eliminate the copy number bias we have observed.

PP53 Proteomic Universal Correlate of Evolution

Room: 312
Date: Monday, July 14, 3:40 pm - 4:05 pm

Author(s):
David Horn, Tel-Aviv University, Israel

Session Chair: Russell Schwartz

Abstract Show

We introduce a novel unifying methodology for the investigation of Compositional Order (CO) of protein sequences. It accounts for all types of low-complexity regions and repetitive phenomena, including the existence of large periodic structures in protein sequences. We define new CO measures providing insights into the correlation of CO with protein function and with evolution. In particular, a large-scale analysis of 94 proteomes shows that the CO vocabulary of frequently appearing amino acid triplets serves as a measure of taxonomic ordering separating major clades from each other. It serves as a novel phylogenetic tool and suggests that major CO generation occurs during the creation of a completely new species, i.e. during macroevolutionary events. It provides an alternative to the traditional ordering of species based on effective population size x mutation rate, Neu, with which it anti-correlates well, signifying that increasing FT vocabulary is associated with low evolutionary pressure.

PP54 Novel Burkholderia mallei Virulence Factors Linked to Specific Host-Pathogen Protein Interactions

Room: 311
Date: Tuesday, July 15, 10:30 am - 10:55 am

Author(s):
Jaques Reifman, U.S. Army Medical Research and Materiel Command, United States
Vesna Memisevic, U.S. Army Medical Research and Materiel Command, United States
Nela Zavaljevski, U.S. Army Medical Research and Materiel Command, United States
Rembert Pieper, J. Craig Venter Institute, United States
Seesandra Rajagopala, J. Craig Venter Institute, United States
Keehwan Kwon, J. Craig Venter Institute, United States
Katherine Townsend, J. Craig Venter Institute, United States
Chenggang Yu, U.S. Army Medical Research and Materiel Command, United States
Xueping Yu, U.S. Army Medical Research and Materiel Command, United States
David DeShazer, U.S. Army Medical Research Institute of Infectious Diseases, United States
Jaques Reifman, U.S. Army Medical Research and Materiel Command, United States
Anders Wallqvist, U.S. Army Medical Research and Materiel Command, United States

Session Chair: Scott Markel

Abstract Show

Bacterial proteins required for virulence, i.e., virulence factors, are a key component of bacterial pathogenicity, as they control and promote pathogenic infection and intracellular survival. Here, we present a combined in silico, in vitro, and in vivo strategy to identify and characterize novel virulence factors of Burkholderia mallei, an infectious intracellular pathogen and the causative agent of glanders. First, we used bioinformatics approaches to identify 49 putative virulent factors involved in B. mallei pathogenicity. Using yeast two-hybrid assays against normalized whole human and whole murine proteome libraries, we identified interactions between each of the putative virulent factors and host proteins. The analysis of these interactions helped us identify and characterize three novel B. mallei virulence factors, as well as host processes and pathways that can be exploited for drug and vaccine design. Finally, using murine aerosol challenge model experiments we verified that three novel virulence factors did indeed attenuate virulence.

PP55 Mapping Functional Transcription Factor Networks from Gene Expression Data

Room: 304
Date: Tuesday, July 15, 10:30 am - 10:55 am

Author(s):
Michael Brent, Washington University, United States

Session Chair: Robert F. Murphy

Abstract Show

I will present an algorithm, NetProphet, for inferring transcriptional regulatory networks from expression profiling of transcription factor (TF) deletion mutants. I will then show that a network constructed from this type of expression data identifies direct binding targets more accurately than one constructed from a large chromatin immunoprecipitation (ChIP) data set. However, ChIP networks contain many edges with no evidence of functional effect on target expression levels; in our network, every edge is functional. Furthermore, gene expression experiments are much easier, more reliable, and more high throughput than ChIP experiments. We conclude that gene expression, not ChIP, is currently the optimal method for network reconstruction. Finally, we show some new biological discoveries we've made with NetProphet, and describe a large deletion and expression profiling experiment that is being driven by NetProphet. TFs likely to participate in a specific biological process are targeted for deletion using a method we call PhenoProphet.

PP56 Protein Expansion Is Primarily due to Indels in Intrinsically Disordered Regions.

Room: 302
Date: Tuesday, July 15, 10:30 am - 10:55 am

Author(s):
Arne Elofsson, Stockholm University, Sweden
Sara Light, Stockholm University, Sweden
Rauan Sagit, Stockholm University, Sweden
Oxana Sachenkova, Stockholm University, Sweden
Diana Ekman, Stockholm University, Sweden

Session Chair: Predrag Radivojac

Abstract Show

Proteins evolve not only through point mutations but also by insertion
and deletion events, which affect the length of the protein. It is known that such indel events most frequently occur in
surface-exposed loops. However, detailed analysis of indel events in
distantly related and fast evolving proteins is hampered by the
difficulty involved in correctly aligning such sequences. We
circumvent this problem by first only analyzing homologous proteins
based on length variation rather than pairwise alignments. Using this
approach we find a surprisingly strong relationship between difference
in length and difference in the number of intrinsically disordered
residues, where up to 75% of the length variation can be
explained by changes in the number of intrinsically disordered
residues. Further, we find that disorder is common in both insertions
and deletions. A more detailed analysis reveals that indel events do
not induce disorder but rather that already disordered regions accrue
indels, suggesting that there is a lowered selective pressure for
indels within intrinsically disordered regions.

PP57 Graph Regularized Dual Lasso for Robust eQTL Mapping

Room: 312
Date: Tuesday, July 15, 10:30 am - 10:55 am

Author(s):
Wei Cheng, UNC at Chapel Hill, United States
Xiang Zhang, Case Western Reserve University, United States
Zhishan Guo, UNC at Chapel Hill, United States
Yu Shi, University of Science and Technology of China
Wei Wang, University of California, Los Angeles, United States

Session Chair: Toni Kazic

Abstract Show

As a promising tool for dissecting the genetic basis of complex traits, expression quantitative trait loci (eQTL) mapping has attracted increasing research interest. An important issue in eQTL mapping is how to effectively integrate networks representing interactions among genetic markers and genes. Recently, several Lasso-based methods have been proposed to leverage such network information. Despite their success, existing methods have three common limitations: 1) a preprocessing step is usually needed to cluster the networks; 2) the incompleteness of the networks and the noise in them are not considered; 3) other available information, such as location of genetic markers and pathway information, are not integrated.

To address the limitations of the existing methods, we propose Graph-regularized Dual Lasso (GDL), a robust approach for eQTL mapping. GDL integrates the correlation structures among genetic markers and traits simultaneously. It also takes into account the incompleteness of the networks and is robust to the noise. GDL utilizes graph-based regularizers to model the prior networks and does not require an explicit clustering step. Moreover, it enables further refinement of the partial and noisy networks. We further generalize GDL to incorporate the location of genetic makers and gene pathway information. We perform extensive experimental evaluations using both simulated and real datasets. Experimental results demonstrate that the proposed methods can effectively integrate various available priori knowledge and significantly outperform the state-of-the-art eQTL mapping methods.

PP58 Mathematical Modeling of Virus-Host Interactions

Room: 311
Date: Tuesday, July 15, 11:00 am - 11:25 am

Author(s):
Lars Kaderali, Technische Universität Dresden, Germany
Marco Binder, University of Heidelberg, Germany
Nurgazy Sulaimanov, University of Heidelberg, Germany
Diana Clausznitzer, Technische Universität Dresden, Germany
Manuel Schulze, Technische Universität Dresden, Germany
Cristian Hüber, University of Heidelberg, Germany
Simon Lenz, University of Heidelberg, Germany
Johannes Schloeder, University of Heidelberg, Germany
Martin Trippler, University Hospital Essen, Germany
Ralf Bartenschlager, University of Heidelberg, Germany
Volker Lohmann, University of Heidelberg, Germany

Session Chair: Scott Markel

Abstract Show

As obligate intracellular parasites, viruses rely on host factors for every single step of their lifecycle. This gives rise to complex interaction networks between virus and host cell, constituting a prime example necessitating a systems biology approach. I will show how we tightly integrated mathematical modeling, bioinformatics and wetlab experiments to decipher interactions between hepatitis C virus and its host cell. Through an iterative cycle between modeling and experiment, including genome-wide siRNA screening and expression profiling, we identified key host processes determining differences in infection between different cell lines, and set up predictive mathematical models that quantitatively and mechanistically explain differences in viral replication. Mathematical model analysis has implications for drug design for hepatitis C virus infection, which I will discuss in the presentation. I will furthermore present ongoing work on integrating cellular immune response, as well as extensions of the model to other viruses, and implications for antiviral treatment.

PP59 Chromatin landscapes and long-range interactions of retroviral and transposon integrations

Room: 304
Date: Tuesday, July 15, 11:00 am - 11:25 am

Author(s):
Jeroen de Ridder, Delft University of Technology, Netherlands
Johann de Jong, Netherlands Cancer Institute, Netherlands
Lodewyk Wessels, Netherlands Cancer Institute, Netherlands
Sepideh Babaei, Delft University of Technology, Netherlands
Marcel Reinders, Delft University of Technology, Netherlands
Waseem Akhtar, Netherlands Cancer Institute, Netherlands

Session Chair: Robert F. Murphy

Abstract Show

The ability of retroviruses and transposons to insert their genetic material into host DNA makes them widely used tools for cancer gene discovery and gene therapy. These integrating elements have distinct integration biases. To study these biases, we generated very large datasets consisting of ∼120000 to ∼180000 unselected genomic integrations for 3 types of integrating elements. We overlaid these integration profiles with ∼80 (epi)genomic features to generate bias maps at both local and genome-wide scales. We moreover overlay a large collection of retroviral cancer-causing insertions with genome-wide chromatin capture conformation (Hi-C) data. This enables the exploration of the occurrence of 3D hot-spots of recurrent mutations that are in spatial proximity of putative cancer genes. Taken together, our results provide an assessment of integration bias at unprecedented resolution and provide new insights into the mechanisms through which retroviral integrations deregulate cellular processes in cancer cells.

PP61 EPIQ - Efficient detection of SNP-SNP epistatic interactions for quantitative traits

Room: 312
Date: Tuesday, July 15, 11:00 am - 11:25 am

Author(s):
Yaara Arkin, Tel-Aviv University, Israel
Elior Rahmani, Tel Aviv University, Israel
Marcus E. Kleber, University of Heidelberg, Germany
Reijo Laaksonen, University of Tampere, Finland
Winfried Maerz, University of Heidelberg, Germany
Eran Halperin, Tel-Aviv University, Israel

Session Chair: Toni Kazic

Abstract Show

Motivation: Gene-gene interactions are of potential biological and medical interest, as they can shed light on both the inheritance mechanism of a trait and on the underlying biological mechanisms. Evidence of epistatic interactions has been reported in both humans and other organisms. Unlike single-locus genome wide association studies (GWAS), which proved efficient in detecting numerous genetic loci related with various traits, interaction-based GWAS have so far produced very few reproducible discoveries. Such studies introduce a great computational and statistical burden by necessitating a large number of hypotheses to be tested including all pairs of SNPs. Thus, many software tools have been developed for interaction-based case- control studies, some leading to reliable discoveries. For quantitative data, on the other hand, only a handful of tools exist, and the computational burden is still substantial.

Results: We present an efficient algorithm for detecting epistasis in quantitative GWAS, achieving a substantial runtime speedup by avoiding the need to exhaustively test all SNP pairs using metric embedding and random projections. Unlike previous metric embedding methods for case-control studies, we introduce a new embedding, where each SNP is mapped to two Euclidean spaces. We implemented our method in a tool named EPIQ (EPIstasis detection for Quantitative GWAS), and we show by simulations that EPIQ requires hours of processing time where other methods require days and sometimes weeks. Applying our method to a dataset from the Ludwigshafen Risk and Cardiovascular Health study discovered a pair of SNPs with a near-significant interaction (p=2.2×10^-13), in only 1.5 hours on 10 processors. Availability: https://github.com/yaarasegre/EPIQ

PP62 Accurate viral population assembly from ultra-deep sequencing data

Room: 311
Date: Tuesday, July 15, 11:30 am - 11:55 pm

Author(s):
Serghei Mangul, University of California, Los Angeles, United States
Nicholas Wu, University of California, Los Angeles, United States
Nicholas Mancuso, Georgia State University, United States
Alex Zelikovsky, Georgia State University, United States
Ren Sun, University of California, Los Angeles, United States
Eleazar Eskin, University of California, Los Angeles, United States

Session Chair: Toni Kazic

Abstract Show

Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish between rare variants and sequencing errors. In this paper we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population which allows for the detection of previously undiscovered rare variants.

The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allows VGA to assemble rare variants. VGA utilizes an expectation-maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method which scales to millions of sequencing reads. The open source C++/Python implementation of VGA is freely available for download at http://genetics.cs.ucla.edu/vga/ Contact: serghei@cs.ucla.edu

PP63 Large-scale reconstruction of 3D structures of human chromosomes from chromosomal contact data

Room: 304
Date: Tuesday, July 15, 11:30 am - 11:55 pm

Author(s):
Jianlin Cheng, University of Missouri Columbia, United States
Tuan Trieu, University of Missouri, Columbia, United States

Session Chair: Robert F. Murphy

Abstract Show

The three-dimensional (3D) structure of a genome is critical for studying genome folding, genome function, and spatial gene regulation, but it has not been well studied. In this presentation, I will first describe a novel chromosomal-contact driven method to take Hi-C chromosomal interaction data as input in order to reconstruct the 3D structures of chromosomes. The method will be followed by a live video demonstrating how the 3D shape of a chromosome is constructed from the Hi-C data of human B-cells. Then I will show that the 3D chromosomal structures reconstructed from the Hi-C data of human B-Cells not only satisfy the observed Hi-C chromosomal contact data and some known chromatin organization features well, but also predict new Hi-C contacts accurately according to the validation test. Finally, I will describe how to assemble chromosomal structures into the 3D shape of the whole genome and discuss its dynamics.

PP64 Why N-Terminal domains tend to be shorter than C-Terminal domains?

Room: 302
Date: Tuesday, July 15, 11:30 am - 11:55 pm

Author(s):
Ron Unger, Bar Ilan University, Israel
Etai Jacob, Bar-Ilan University, Il
Amnon Horovitz, Weizmann Institute , Il

Session Chair: Predrag Radivojac

Abstract Show

Computational analysis of proteomes in all kingdoms of life reveals a strong tendency for N-terminal domains in two-domain proteins to have shorter sequences than their neighboring
C-terminal domains. Given that folding rates are affected by chain length, we asked whether the tendency for N-terminal domains to be shorter than their neighboring C-terminal domains reflects selection for faster folding N-terminal domains. Calculations of
contact order, another predictor of folding rate, provide additional evidence that N-terminal domains tend to fold faster than their C-terminal neighboring domains. A possible explanation for this bias, which is more pronounced in prokaryotes than in eukaryotes, is that faster folding of N-terminal domains reduces the risk of protein aggregation during folding by preventing formation of non-native interdomain interactions. This explanation is supported by our finding that two-domain proteins with a shorter N-terminal domain are more abundant than those with a shorter C-terminal domain.

PP65 A combinatorial approach for analyzing intra-tumor heterogeneity from high-throughput sequencing data

Room: 312
Date: Tuesday, July 15, 11:30 am - 11:55 pm

Author(s):
Iman Hajirasouliha, Brown University, United States
Ahmad Mahmoody, Brown University, United States
Ben Raphael, Brown University, United States

Session Chair: Toni Kazic

Abstract Show

High-throughput sequencing of tumor samples has shown that most tumors exhibit extensive intra-tumor heterogeneity, with multiple subpopulations of tumor cells containing different somatic mutations. Recent studies have quantified this intra- tumor heterogeneity by clustering mutations into subpopulations according to the observed counts of DNA sequencing reads containing the variant allele. However, these clustering approaches do not consider that the population frequencies of different tumor subpopulations are correlated by their shared ancestry in the same population of cells.

In this paper, we introduce the binary tree partition, a novel combinatorial formulation of the problem of constructing the subpopulations of tumor cells from the variant allele frequencies of somatic mutations. We show that finding a binary tree partition is an NP-complete problem; derive an approximation algorithm for an optimization version of the problem; and present a recursive algorithm to find a binary tree partition with errors in the input. We show that the resulting algorithm outperforms existing clustering approaches on simulated and real sequencing data.

PP66 Deciphering human disease mutations through the atomic-resolution protein interactome network

Room: 311
Date: Tuesday, July 15, 12:00 pm - 12:25 pm

Author(s):
Haiyuan Yu, Cornell University, United States
Yu Guo, Cornell University, United States
Jishnu Das, Cornell University, United States
Hao Ran Lee, Cornell University, United States
Xiaomu Wei, Cornell University, United States
Jin Liang, Cornell University, United States
Robert Fragoza, Cornell University, United States
Adithya Sagar, Cornell University, United States
Xiujuan Wang, Cornell University, United States
Matthew Mort, Cardiff University, United Kingdom
Peter Stenson, Cardiff University, United Kingdom
David Cooper, Cardiff University, United Kingdom
Andrew Grimson, Cornell University, United States
Steven Lipkin, Weill Cornell Medical College, United States
Andrew Clark, Cornell University, United States

Session Chair: Toni Kazic

Abstract Show

To better understand the molecular mechanisms and genetic basis of human disease, we combined the massive scale of network systems biology with the supreme resolution of traditional structural biology to generate the first comprehensive atomic-resolution interactome-network comprising 3,398 interactions between 2,890 proteins with structurally-defined interface residues for each interaction. We found that disease mutations are significantly enriched both among interface residues and other non-interface ones within the same domains, contradicting the previous assumption that only a few interface residues are mutation hot spots for disease. We further classified 94,476 disease-associated mutations according to their inheritance modes and found that the widely-accepted “guilt-by-association” principle does not apply to dominant mutations. Furthermore, recessive truncating mutations on the same interface are much more likely to cause the same disease, even if they are close to the N-terminus of the protein, indicating that a significant fraction of truncating mutations can generate functional protein products.

PP67 A statistical approach for inferring the 3D structure of the genome

Room: 304
Date: Tuesday, July 15, 12:00 pm - 12:25 pm

Author(s):
Nelle Varoquaux, Mines ParisTech, France
Ferhat Ay, University of Washington, United States
William Noble, University of Washington, United States
Jean-Philippe Vert, Mines ParisTech, France

Session Chair: Robert F. Murphy

Abstract Show

Motivation: Recent technological advances allow the measurement,
in a single Hi-C experiment, of the frequencies of physical contacts
among pairs of genomic loci at a genome-wide scale. The next
challenge is to infer, from the resulting DNA-DNA contact maps,
accurate three dimensional models of how chromosomes fold and
fit into the nucleus. Many existing inference methods rely upon
multidimensional scaling (MDS), in which the pairwise distances of
the inferred model are optimized to resemble pairwise distances
derived directly from the contact counts. These approaches, however,
often optimize a heuristic objective function and require strong
assumptions about the biophysics of DNA to transform interaction
frequencies to spatial distance, and thereby may lead to incorrect
structure reconstruction.

Methods: We propose a novel approach to infer a consensus three-
dimensional structure of a genome from Hi-C data. The method
incorporates a statistical model of the contact counts, assuming
that the counts between two loci follow a Poisson distribution whose
intensity decreases with the physical distances between the loci. The
method can automatically adjust the transfer function relating the
spatial distance to the Poisson intensity and infer a genome structure
that best explains the observed data.

Results: We compare two variants of our Poisson method, with or
without optimization of the transfer function, to four different MDS-
based algorithms—two metric MDS methods using different stress
functions, a nonmetric version of MDS, and ChromSDE, a recently
described, advanced MDS method—on a wide range of simulated
datasets. We demonstrate that the Poisson models reconstruct better
structures than all MDS-based methods, particularly at low coverage
and high resolution, and we highlight the importance of optimizing
the transfer function. On publicly available Hi-C data from mouse
embryonic stem cells, we show that the Poisson methods lead to
more reproducible structures than MDS-based methods when we
use data generated using different restriction enzymes, and when we
reconstruct structures at different resolutions.

Availability: A Python implementation of the proposed method is
available at http://cbio.ensmp.fr/pastis.

PP68 New Directions for Diffusion-Based Network Prediction of Protein Function: Incorporating Pathways with Confidence

Room: 302
Date: Tuesday, July 15, 12:00 pm - 12:25 pm

Author(s):
Mengfei Cao, Tufts University, United States
Christopher Pietras, Tufts University, United States
Xian Feng, Tufts University, United States
Kathryn Doroschak, University of Minnesota, United States
Thomas Schaffner, Tufts University, United States
Jisoo Park, Tufts University, United States
Hao Zhang, Tufts University, United States
Lenore Cowen, Tufts University, United States
Benjamin Hescott, Tufts University, United States

Session Chair: Predrag Radivojac

Abstract Show

Motivation: It has long been hypothesized that incorporating models of network
noise as well as edge directions and known pathway information
into the representation of protein-protein interaction networks might
improve their utility for functional inference. However, a simple way
to do this has not been obvious. We find that DSD,
our recent diffusion-based metric for measuring dissimilarity in protein-protein
interaction (PPI) networks, has natural extensions that incorporate
confidence, directions, and can even express coherent pathways by
calculating DSD on an augmented graph.

Results: We define three incremental versions of DSD which we term cDSD, caDSD,
and capDSD, where the capDSD matrix incorporates confidence, known
directed edges, and pathways into the measure of how similar each
pair of nodes is according to the structure of the PPI network. We
test four popular function prediction methods (majority
vote, weighted majority vote, multiway cut, and functional flow) using these different
matrices on the Baker's yeast PPI network in cross-validation. The
best performing method is weighted majority vote using capDSD.
We then test the performance of our augmented DSD
methods on an integrated heterogeneous set of protein association
edges from the STRING database. The superior performance of
capDSD in this context confirms that treating the pathways as
probabilistic units is more powerful than simply incorporating pathway
edges independently into the network.

Availability: All source code for calculating the confidences,
for extracting pathway information from KEGG XML files, and for
calculating the cDSD, caDSD and capDSD matrices is available from
http://dsd.cs.tufts.edu/capdsd

PP69 Reconstructing tumor evolution using simple somatic mutation frequencies

Room: 312
Date: Tuesday, July 15, 12:00 pm - 12:25 pm

Author(s):
Quaid Morris, University of Toronto, Canada
Wei Jiao, Ontario Institute for Cancer Research, Ca
Shankar Vembu, University of Toronto, Ca
Amit Deshwar, University of Toronto, Ca
Lincoln Stein, Ontario Institute for Cancer Research, Ca

Session Chair: Toni Kazic

Abstract Show

Tumors often contain multiple, genetically diverse subclonal populations of cells. To aid in the identification of driver mutations and improve understanding of tumor development, there is considerable interest in reconstructing the evolutionary history of these subclonal populations. I will describe when this it is possible to do this reconstruction using only the allelic frequencies of individual ‘simple somatic mutations (SSMs)’ (i.e., single nucleotide variants or small indels) from one or more tumor samples. I will also describe a new model, PhyloSub, that automatically performs this reconstruction. PhyloSub uses Bayesian inference, so it explicitly represents its uncertainty when multiple phylogenies are consistent with the frequency data. PhyloSub has promising results on real and simulated data, including one example where PhyloSub provides a near perfect reconstruction of three subclonal populations based on a single set of SSM frequencies from acute myeloid leukemia.

PP70 Genomic underpinnings for network patterns and evolution

CancelledRoom: 311
Date: Tuesday, July 15, 2:00 pm - 2:25 pm

Author(s):
Luay Nakhleh, Rice University, United States

Session Chair: Cenk Sahinalp

Abstract Show

In this talk, we discuss novel studies of the evolution of regulatory networks by tightly connecting them to the underlying genomes and shedding "the light of evolution" on the combined genome-network genotype. In particular, we conduct extensive population genetic simulations and show how network motifs arise due to neutral evolutionary forces when accounting for genomic features. Further, we use data on whole-genome duplication pairs in yeast to estimate the rate of evolution of protein interactions.

PP71 Inferring Gene Ontologies from Pairwise Similarity Data

Room: 304
Date: Tuesday, July 15, 2:00 pm - 2:25 pm

Author(s):
Michael Kramer, University of California, San Diego, United States
Janusz Dutkowski, University of California, San Diego, United States
Michael Yu, University of California, San Diego, United States
Vineet Bafna, University of California, San Diego, United States
Trey Ideker, University of California, San Diego, United States

Session Chair: Lenore Cowen

Abstract Show

Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that ontologies are a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that:

(1) analyze a full matrix of gene–gene pairwise similarities from -omics data;
(2) infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and
(3) respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher level terms.

Methods addressing these requirements are just beginning to emerge—none has been evaluated for GO inference.

Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method’s ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast.

Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ~30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20–25% precision, recall).

Conclusion: This study provides algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data.

PP72 Evaluating Synteny for Improved Comparative Studies

Room: 302
Date: Tuesday, July 15, 2:00 pm - 2:25 pm

Author(s):
Cristina G. Ghiurcuta, EPFL, Switzerland
Bernard M.E. Moret, EPFL, Switzerland

Session Chair: Alex Bateman

Abstract Show

Motivation: Comparative genomics aims to understand the structure and
function of genomes by translating knowledge gained about some genomes
to the object of study. Early approaches used pairwise comparisons, but
today researchers are attempting to leverage the larger potential of multiway
comparisons. Comparative genomics relies on the structuring of genomes into
syntenic blocks: blocks of sequence that exhibit conserved features across the
genomes. Syntenic blocks are required for complex computations to scale to
the billions of nucleotides present in many genomes; they enable comparisons
across broad ranges of genomes because they filter out much of the individual
variability; they highlight candidate regions for in-depth studies; and they
facilitate whole-genome comparisons through visualization tools. However, the
concept of syntenic block remains loosely defined. Tools for the identification
of syntenic blocks yield quite different results, thereby preventing a systematic
assessment of the next steps in an analysis. Current tools do not include
measurable quality objectives and thus cannot be benchmarked against
themselves. Comparisons among tools have also been neglected—what few
results are given use superficial measures unrelated to quality or consistency.
Results: We present a theoretical model as well as an experimental basis with
quality measures for comparing syntenic blocks and thus also for improving
or designing tools for the identification of syntenic blocks. We illustrate the
application of the model and the measures by applying them to syntenic blocks
produced by 3 different contemporary tools (DRIMM-Synteny, i-ADHoRe and
Cyntenator) on a dataset of 8 yeast genomes. Our findings highlight the need
for a well founded, systematic approach to the decomposition of genomes into
syntenic blocks. Our experiments demonstrate widely divergent results among
these tools, throwing into question the robustness of the basic approach in
comparative genomics. We have taken the first step towards a formal approach
to the construction of syntenic blocks by developing a simple quality criterion
based on sound evolutionary principles.

PP73 Network-based stratification of tumor mutations

Room: 312
Date: Tuesday, July 15, 2:00 pm - 2:25 pm

Author(s):
Matan Hofree, University of California, San Diego, United States
John P. Shen, University of California, San Diego, United States
Andrew Gross, University of California, San Diego, United States
Hannah Carter, University of California, San Diego, United States

Session Chair: Paul Horton

Abstract Show

Classification of cancer is predominantly organ based and fails to account for considerable heterogeneity of clinical outcomes such as survival or response to therapy. Somatic tumor genomes provide a rich new source of data for uncovering subtypes, but have proven difficult to compare, as tumors rarely share the same mutations. Here we introduce network-based stratification (NBS), a method to integrate somatic tumor genomes with gene networks. This approach allows for stratification of cancer into subtypes by clustering together patients with mutations in similar network regions. We demonstrate NBS in multiple cancer cohorts from The Cancer Genome Atlas. In each case, NBS identifies subtypes that are predictive of clinical outcomes such as patient survival, response to therapy or histology. We identify network regions characteristic of each subtype and show how mutation-derived subtypes can be used to train an mRNA expression signature, which provides similar information in the absence of DNA sequence.

PP74 Scale-space measures for graph topology link protein network architecture to function

Room: 311
Date: Tuesday, July 15, 2:30 pm - 2:55 pm

Author(s):
Marc Hulsman, Delft University of Technology, Netherlands
Christos Dimitrakopoulos, ETH Zurich, Switzerland
Jeroen de Ridder, Delft University of Technology, Netherlands

Session Chair: Cenk Sahinalp

Abstract Show

Motivation: The network architecture of physical protein interactions is an important determinant for the molecular functions that are carried out within each cell. To study this relation, the network architecture can be characterized by graph-topological characteristics such as shortest paths and network hubs. These characteristics have an important shortcoming: they do not take into account that interactions occur across different scales. This is important since some cellular functions may involve a single direct protein interaction (small scale) while others require more and/or indirect interactions, such as protein complexes (medium scale) and interactions between large modules of proteins (large scale).

Results: In this work, we derive generalized, scale-aware versions of known graph-topological measures based on diffusion kernels. We apply these to characterize the topology of networks across all scales simultaneously, generating a so called graph-topological scale-space. The comprehensive physical interaction network in yeast is used to show that scale-space based measures consistently give superior performance when distinguishing protein functional categories and three major types of functional interactions: genetic interaction, co-expression and perturbation interactions. Moreover, we demonstrate that graph-topological scale-spaces capture biologically meaningful features that provides new insights into the link between function and protein network architecture.

Availability: Matlab code to calculate the STMs is available from: http://bioinformatics.tudelft.nl/TSSA Contact: j.deridder@tudelft.nl

PP75 Using association rule mining to determine promising secondary phenotyping hypotheses

CancelledRoom: 304
Date: Tuesday, July 15, 2:30 pm - 2:55 pm

Author(s):
Anika Oellrich, Wellcome Trust Sanger Institute, United States
Julius Jacobsen, Wellcome Trust Sanger Institute, United Kingdom
Irene Papatheodorou, Wellcome Trust Sanger Institute, United Kingdom
The Sanger Mouse Genetics Project, Wellcome Trust Sanger Institute, United Kingdom
Damian Smedley, Wellcome Trust Sanger Institute, United Kingdom

Session Chair: Lenore Cowen

Abstract Show

Motivation: Large-scale phenotyping projects such as the Sanger Mouse Genetics project are ongoing efforts to help to identify the influences of genes and their modification on phenotypes. Gene–phenotype relations are crucial to the improvement of our understanding of human heritable diseases as well as the development of drugs. However, given that there are about 20,000 genes in higher vertebrate genomes and the experimental verification of gene–phenotype relations requires a lot of resources, methods are needed that determine good candidates for testing.

Results: In this study, we applied an association rule mining approach to the identification of promising secondary phenotype candidates. The predictions rely on a large gene–phenotype annotation set that is used to find occurrence patterns of phenotypes. Applying an association rule mining approach, we could identify 1,967 secondary phenotype hypotheses that cover 243 genes and 136 phenotypes. Using two automated and one manual evaluation strategies, we demonstrate that the secondary phenotype candidates possess biological relevance to the genes they are predicted for. From the results we conclude that the predicted secondary phenotypes constitute good candidates to be experimentally tested and confirmed.
Availability: The secondary phenotype candidates can be browsed through at http://www.sanger.ac.uk/resources/databases/phenodigm/ gene/secondaryphenotype/list.
Contact: ao5@sanger.ac.uk

PP76 Robust Clinical Outcome Prediction based on Bayesian Analysis of Transcriptional Profiles and Prior Causal Networks

Room: 302
Date: Tuesday, July 15, 2:30 pm - 2:55 pm

Author(s):
Kourosh Zarringhalam, UMass Boston/Pfizer, United States
Ahmed Enayetallah, Biogen Idec, United States
Padmalatha Reddy, Pfizer, United States
Daniel Ziemek, Pfizer, Germany

Session Chair: Alex Bateman

Abstract Show

Motivation: Understanding and predicting an individual’s response in a clinical trial is key to better treatments and cost effective medicine. Over the coming years, more and more large-scale omics datasets will become available to characterize patients with complex and heterogeneous diseases at a molecular level. Unfortunately, genetic, phenotypical, and environmental variation is much higher in a human trial population than currently modeled or measured in most animal studies. In our experience, this high variability can lead to failure of trained predictors in independent studies and undermines the credibility and utility of promising high-dimensional datasets.

Methods: We propose a method that utilizes patient-level genome- wide expression data in conjunction with causal networks based on prior knowledge. Our approach infers a differential expression profile for each patient and uses a Bayesian approach to infer corresponding upstream regulators. These regulators and their corresponding posterior probabilities of activity are used in a regularized regression framework to predict response.

Results: We validated our approach using two clinically relevant phenotypes, namely acute rejection in kidney transplantation and response to Infliximab in ulcerative colitis. To demonstrate pitfalls in translating trained predictors across independent trials, we analyze performance characteristics of our and alternative approaches on two independent datasets for each phenotype and show that the proposed approach is able to successfully incorporate causal prior knowledge to give robust performance estimates.

PP77 Detecting independent and recurrent copy number aberrations using interval graphs

Room: 312
Date: Tuesday, July 15, 2:30 pm - 2:55 pm

Author(s):
Hsin-Ta Wu, Brown University, United States
Iman Hajirasouliha, Brown University, United States
Benjamin Raphael, Brown University, United States

Session Chair: Paul Horton

Abstract Show

Somatic copy number aberrations are frequent in cancer genomes, but many of these are random, passenger events. A common strategy to distinguish functional aberrations from passengers is to identify those aberrations that are recurrent across multiple samples. However, the extensive variability in the length and position of copy number aberrations makes the problem of identifying recurrent aberrations notoriously difficult. We introduce a combinatorial approach to the problem of identifying independent and recurrent copy number aberrations, focusing on the key challenging of separating the overlaps in aberrations across individuals into independent events.

We derive independent and recurrent copy number aberrations as maximal cliques in an interval graph constructed from overlaps between aberrations. We efficiently enumerate all such cliques, and derive a dynamic programming algorithm to find an optimal selection of non-overlapping cliques, resulting in a very fast algorithm, which we call RAIG (Recurrent Aberrations from Interval Graphs). We show that RAIG outperforms other methods on simulated data and performs well on data from three cancer types from The Cancer Genome Atlas (TCGA). In contrast to existing approaches that employ various heuristics to select independent aberrations, RAIG optimizes a well-defined objective function. We show that this allows RAIG to identify rare aberrations that are likely functional, but are obscured by overlaps with larger passenger aberrations.

PP78 GraphProt: How to make sense out of CLIP-seq data

Room: 311
Date: Tuesday, July 15, 3:00 pm - 3:25 pm

Author(s):
Rolf Backofen, University of Freiburg, Germany
Sita Lange, University Freiburg, De
Daniel Maticzka, University Freiburg, De
Fabrizio Costa, University Freiburg, De

Session Chair: Cenk Sahinalp

Abstract Show

The paper deals with one of today's hottest topics in biology, namely the analysis of RNA-protein interactions. Recent studies revealed that hundreds of RNA-binding proteins (RPBs) regulate a plethora of post-transcriptional processes. The gold standard for identifying RBP targets are experimental CLIP-seq approaches. However, a large number of binding sites remain unidentified, which is a major yet underestimated problem. The reason is simply that CLIP-seq is sensitive to expression levels. Thus, available CLIP-seq experiments for a specific protein in liver cells cannot be used to infer targets say in kidney cells.

We provide a solution by learning an accurate protein-binding model based on an efficient graph-kernel approach that learns sequence-structure properties from several thousands binding sites. Transcripts targeted in any other cells can be identified with high specificity. E.g. we show that the up-regulation in an AGO-knockdown cannot be explained with existing AGO-CLIP-seq data, but it can when using our predictions.

PP79 Improved exome prioritization of disease genes through cross-species phenotype comparison

Room: 304
Date: Tuesday, July 15, 3:00 pm - 3:25 pm

Author(s):
Peter Robinson, Charite University Hospital, Germany
Sebastian Köhler, Charité, Germany
Anika Oellrich, Sanger Institute, United Kingdom
Kai Wang, UCS, United States
Christopher Mungall, Lawrence Berkeley National Laboratory, United States
Suzanna Lewis, Lawrence Berkeley National Laboratory, United States
Nicole Washington, Lawrence Berkeley National Laboratory-, United States
Sebastian Bauer, Charité- , Germany
Dominik Seelow, Charité- , United States
Peter Krawitz, Charité, Germany
Christian Gilissen, Nijmegen, Netherlands
Melissa Haendel, U Oregon, United States
Damian Smedley, Sanger Institute- , United Kingdom

Session Chair: Lenore Cowen

Abstract Show

I will present an explanation of how cross-species phenotype analysis works. The International Mouse Phenotyping Consortium is currently creating KOs for all mammalian genes, resulting in an extremely useful resource for computational analysis. This is particularly interesting for human genetics, since about 3000 Mendelian disease genes are known in humans, but ca. 8000 genes have phenotypes in mouse ko models. Our work shows how to exploint this information to identify novel disease genes. We will present some examples of disease gene identifications in current projects.

PP80 MIRA: Mutual Information-Based Reporter Algorithm for Metabolic Networks

Room: 302
Date: Tuesday, July 15, 3:00 pm - 3:25 pm

Author(s):
A. Ercument Cicek, Carnegie Mellon University, United States
Kathryn Roeder, Carnegie Mellon University, United States
Gultekin Ozsoyoglu, Case Western Reserve University, United States

Session Chair: Alex Bateman

Abstract Show

Motivation: Discovering the transcriptional regulatory architecture of the metabolism has been an important topic to understand the implications of transcriptional fluctuations on metabolism. The reporter algorithm (RA) was proposed to determine the hot spots in metabolic networks, around which transcriptional regulation is focused due to a disease or a genetic perturbation. Using a z-score based scoring scheme, RA calculates the average statistical change in the expression levels of genes that are neighbors to a target metabolite in the metabolic network. The RA approach has been used in numerous studies to analyze cellular responses to the downstream genetic changes. In this paper, we propose a mutual information-based multivariate reporter algorithm (MIRA) with the goal of eliminating the following problems in detecting reporter metabolites: (1) conventional statistical methods suffer from small sample sizes, (2) as z-score ranges from minus to plus infinity, calculating average scores can lead to canceling out opposite effects, and (3) analyzing genes one by one, then aggregating results can lead to information loss. MIRA is a multivariate and combinatorial algorithm that calculates the aggregate transcriptional response around a metabolite using mutual information. We show that MIRA’s results are biologically sound, empirically significant and more reliable than RA.

Results: We apply MIRA to gene expression analysis of six knock-out strains of E. coli, and show that MIRA captures the underlying metabolic dynamics of the switch from aerobic to anaerobic respiration. We also apply MIRA, to an Autism Spectrum Disorder gene expression dataset. Results indicate that MIRA’s reports metabolites that highly overlap with recently found metabolic biomarkers in the autism literature. Overall, MIRA is a promising algorithm for detecting metabolic drug targets and understanding the relation between gene expression and metabolic activity.

PP81 Emerging landscape of oncogenic signatures across human cancers

Room: 312
Date: Tuesday, July 15, 3:00 pm - 3:25 pm

Author(s):
Giovanni Ciriello, Memorial Sloan Kettering Cancer Center, United States
Martin Miller, Memorial Sloan Kettering Cancer Center, United States
Bulent Arman Aksoy, Memorial Sloan Kettering Cancer Center, United States
Yasin Senbabaoglu, Memorial Sloan Kettering Cancer Center, United States
Nikolaus Schultz, Memorial Sloan Kettering Cancer Center, United States
Chris Sander, Memorial Sloan Kettering Cancer Center, United States

Session Chair: Paul Horton

Abstract Show

Cancer therapy is challenged by the diversity of molecular implementations of oncogenic processes and by the resulting variation in therapeutic responses. Projects such as The Cancer Genome Atlas (TCGA) provide molecular tumor maps in unprecedented detail. The interpretation of these maps remains a major challenge. Here we distilled thousands of genetic and epigenetic features altered in cancers to ~500 selected functional events (SFEs). Using this simplified description,
we derived a hierarchical classification of 3,299 TCGA tumors from 12 cancer types. The top classes are dominated by either mutations (M class) or copy number changes (C class).
This distinction is clearest at the extremes of genomic instability, reflecting different oncogenic processes. The full hierarchy shows event signatures characteristic of cross-tissue tumor classes. Targetable functional events are suggestive of class-specific combination therapy. These results may assist in the definition of clinical trials to match actionable oncogenic signatures with personalized therapies.

PP82 From 1D to 3D and back: Genome scaffolding from DNA interaction frequency

Room: 311
Date: Tuesday, July 15, 3:30 pm - 3:55 pm

Author(s):
Noam Kaplan, University of Massachusetts Medical School, United States
Job Dekker, University of Massachusetts Medical School, United States

Session Chair: Cenk Sahinalp

Abstract Show

Despite the advancement of DNA sequencing technologies, assembly of complex genomes remains a major challenge. Surprisingly, the quality of published complex genomes has decreased, due to the growing use of short read sequencing.

We have developed a high-throughput scaffolding approach, based on the notion that loci that are near each other in the genomic sequence have a high probability of interacting with each other. We demonstrate that genome-wide in vivo chromatin interaction frequency measurements can be used as genomic distance proxies to accurately detect the positions of contigs over large distances without requiring any sequence overlap. Furthermore, we demonstrate our approach can karyotype and scaffold an entire genome de novo. Applying our approach to incomplete regions of the human genome, we predict the positions of 65 previously unplaced contigs, in agreement with alternative methods. Our approach can theoretically bridge any gap size, is simple, robust, scalable and applicable to any species.

PP83 A community effort to assess drug sensitivity prediction algorithms identifies approaches for improved performance

Room: 304
Date: Tuesday, July 15, 3:30 pm - 3:55 pm

Author(s):
James Costello, University of Colorado Anschutz Medical Campus, United States
Laura Heiser, OHSU, United States
Elisabeth Georgii, Aalto University, Finland
Michael Menden, EMBL, United Kingdom
Nicholas Wang, OHSU, United States
Mukesh Bansal, Columbia University, United States
Mohammad Ammad-ud-din, Aalto University, Finland
Petteri Hintsanen, University of Helsinki, Finland
Suleiman Khan, Aalto University, Finland
John-Patrick Mpindi, University of Helsinki, Finland
Olli Kallioniemi, University of Helsinki, Finland
Antti Honkela, University of Helsinki, Finland
Tero Aittokallio, University of Helsinki, Finland
Krister Wennerberg, University of Helsinki, Finland
James Collins, Boston University, United States
Dan Gallahan, NIH, United States
Dinah Singer, NIH, United States
Julio Saez-Rodriguez, EMBL, United Kingdom
Samuel Kaski, Aalto University, Finland
Joe Gray, OHSU, United States
Gustavo Stolovitzky, IBM, United States
Mehmet Gonen, Aalto University , Finland

Session Chair: Lenore Cowen

Abstract Show

Predicting the best treatment strategy from genomic information is a core goal of personalized medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling datasets measured in human breast cancer cell lines. Through a collaborative effort between the NCI and the DREAM project, we present a total of 44 drug sensitivity prediction algorithms. We identify characteristics of top-performing methodologies, namely modeling nonlinear relationships and the application of biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling datasets; however, performance was increased by including multiple, independent datasets. We present the top-performing methodology, Bayesian Multitask MKL, which implements kernelized regression, multiview learning, multitask learning and Bayesian inference. This study establishes benchmarks for drug sensitivity prediction and identifies features that can be leveraged for future method development. We provide detailed descriptions of all methods at:http://www.the-dream-project.org/

PP84 Metabolome-scale prediction of intermediate compounds in multi-step metabolic pathways with a recursive supervised approach

Room: 302
Date: Tuesday, July 15, 3:30 pm - 3:55 pm

Author(s):
Masaaki Kotera, Kyoto University, Japan
Yasuo Tabei, Japan Science and Technology Agency, Japan
Yoshihiro Yamanishi, Kyushu University, Japan
Ai Muto, Kyoto University, Japan
Yuki Moriya, Kyoto University, Japan
Toshiaki Tokimatsu, Kyoto University, Japan
Susumu Goto, Kyoto University, Japan

Session Chair: Alex Bateman

Abstract Show

Motivation: Metabolic pathway analysis is crucial not only in syste- matic metabolic engineering but also in rational drug design. Howe- ver, the biosynthetic/biodegradation pathways are known only for a small portion of metabolites, and a vast amount of pathways remain uncharacterized. Therefore, an important challenge in metabolomics is the de novo reconstruction of potential reaction networks on a metabolome-scale.

Results: In this paper we develop a novel method to predict the multi-step reaction sequences for de novo reconstruction of metabolic pathways in the reaction-filling framework. We propose a supervised approach to learn what we refer to as ”multi-step reaction sequence likeness”, i.e., whether or not a compound-compound pair is possibly converted to each other by a sequence of enzymatic reactions. In the algorithm we propose a recursive procedure of using step-specific classifiers to predict the intermediate compounds in the multi-step reaction sequences, based on chemical substructure fingerprints of compounds. In the results, we demonstrate the usefulness of our pro- posed method on the prediction of enzymatic reaction networks from a metabolome-scale compound set, and discuss characteristic featu- res of the extracted chemical substructure transformation patterns in multi-step reaction sequences. Our comprehensively predicted reac- tion networks help to fill the metabolic gap and to infer new reaction sequences in metabolic pathways.

PP85 MicroRNA-gene association as a prognostic biomarker in cancer exposes disease mechanisms

Room: 312
Date: Tuesday, July 15, 3:30 pm - 3:55 pm

Author(s):
Rotem Ben-Hamo, Bar Ilan University, Israel

Session Chair: Paul Horton

Abstract Show

This work demonstrates a new metric to uncover clinical stratifications hidden in the association between microRNAs and genes. We will explain the methods and algorithms used in this paper and highlight the importance of finding these underlying mechanisms that may be at the core of progression disease progression. The potential of microRNAs to act both as therapeutic agents and as disease biomarkers places this family of molecules at the forefront of biomedical interest, and the identification of genomic regulatory mechanisms, their affiliation with clinical outcome and the association between specific modifications in genome sequences that may explain gain and loss of such regulatory activity, combine to suggest specific disease mechanisms and possible means of intervention in the course of the disease. This discovery has been made possible by employing regulation as a quantifiable metric, combined with the availability of whole genome sequences.