Attention Conference Presenters - please review the Speaker Information Page available here.
All Highlights and Proceedings Track presentations are presented by scientific area as part of the combined Paper Presentation schedule.
A full schedule of Paper Presentations can be found here.
Presenting Authors are shown in bold:
Date: Sunday, July 13, 10:30 a.m. - 10:55 a.m.
Author(s):
Steven Brenner, University of California, Berkeley, United States
Courtney E. French, University of California, Berkeley, United States
Gang Wei, Fudan University, China
Angela Brooks, Broad Institute of MIT and Harvard, United States
Thomas Gallagher, Ohio State University, United States
Li Yang, Partner Institute for Computational Biology, China
Brenton Graveley, University of Connecticut Health Center, United States
Sharon Amacher, Ohio State University, United States
Steven Brenner, University of California, Berkeley, United States
Session Chair: Bonnie Berger
Nonsense-mediated mRNA decay (NMD) is an RNA surveillance system that degrades isoforms containing a premature termination codon (PTC). NMD coupled with alternative splicing is a mechanism of post-transcriptional gene regulation. The canonical model of defining a PTC in mammals is the 50nt rule: a termination codon more than 50 nucleotides upstream of an exon-exon junction is premature and triggers degradation. There is evidence that this rule holds in Arabidopsis but not in other eukaryotes such as Drosophila. There is also evidence that a longer 3’ UTR triggers NMD in plants, flies, and mammals.
To survey the targets of NMD genome-wide in human, zebrafish, and fly, we performed RNA-Seq analysis on cells where NMD has been inhibited via knockdown of UPF1, a critical NMD protein. We found that thousands of genes produce alternative isoforms degraded by NMD in the three species. We found that the 50nt rule is a strong predictor of NMD degradation in human, and has an effect in zebrafish and in fly. In contrast, we found little correlation between the likelihood of degradation by NMD and 3' UTR length in any of the three species.
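The 50nt rule described in this abstract can be written as a simple check on transcript coordinates. The following is a minimal sketch, not the authors' analysis pipeline: the function name, the coordinate convention (transcript positions counted from the 5' end, with the stop codon given by the position just past its last nucleotide), and the example exon lengths are assumptions made for illustration.

```python
def is_ptc_by_50nt_rule(stop_codon_end, exon_lengths, threshold=50):
    """Hypothetical helper: apply the 50nt rule to one spliced transcript.

    stop_codon_end -- transcript coordinate just past the stop codon
                      (0-based, exclusive), counted from the 5' end.
    exon_lengths   -- exon lengths in 5'->3' order.
    Returns True when the stop codon lies more than `threshold` nt
    upstream of an exon-exon junction.
    """
    if len(exon_lengths) < 2:
        return False  # a single-exon transcript has no exon-exon junction

    # The most downstream junction is the farthest from any upstream stop
    # codon, so checking it alone covers every junction.
    last_junction = sum(exon_lengths[:-1])
    distance = last_junction - stop_codon_end
    return distance > threshold


# Assumed example: three exons, so the last junction is at position 350.
exons = [200, 150, 300]
premature = is_ptc_by_50nt_rule(250, exons)   # 100 nt upstream of the junction
normal = is_ptc_by_50nt_rule(500, exons)      # stop codon in the last exon
```

Checking only the last junction is a design shortcut: if any junction lies more than 50 nt downstream of the stop codon, the last one does too.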
Date: Sunday, July 13, 10:30 a.m. - 10:55 a.m.
Author(s):
Wen Wang, University of Minnesota, United States
Gang Fang, Icahn School of Medicine at Mount Sinai, United States
Vanja Paunic, University of Minnesota, United States
Xiaoye Liu, University of Minnesota, United States
Benjamin Oatley, University of Minnesota, United States
Majda Haznadar, University of Minnesota, United States
Michael Steinbach, University of Minnesota, United States
Brian Van Ness, University of Minnesota, United States
Nathan Pankratz, University of Minnesota, United States
Vipin Kumar, University of Minnesota, United States
Chad Myers, University of Minnesota, United States
Session Chair: Jason Ernst
Genetic interactions (epistasis) are important factors in complex diseases that may contribute to unexplained heritability in genome-wide association studies (GWAS). However, existing methods for identifying genetic interactions, which mainly focus on testing individual locus pairs, lack statistical power. We proposed a novel computational approach for discovering disease-specific, pathway-pathway genetic interactions from GWAS data. The key motivation, derived from the extensive analysis of genetic interaction networks in yeast, is that genetic interactions tend to occur between functionally compensatory modules rather than between isolated pairs of genes. We developed a method that explicitly searches for such large structures, guided by established sets of genes belonging to characterized pathways. We applied this approach to a Parkinson's disease (PD) GWAS study and found 50 statistically significant (FDR ≤ 0.25) pathway level interactions, suggesting large genetic interaction structures indeed exist and can be discovered by leveraging structural properties with prior information on pathways. Interestingly, many of the discovered interactions are associated with reduced disease risk while a substantially smaller number are associated with increased disease risk. A significant fraction of them are validated in two independent cohorts. Our study highlights specific insights derived from analysis of the PD interactions and, more broadly, provides a general framework for systematic detection of genetic interactions from GWAS studies.
Date: Sunday, July 13, 11:00 a.m. - 11:25 a.m.
Author(s):
Daniel Himmelstein, University of California, San Francisco, United States
Sergio Baranzini, University of California, United States
Session Chair: Jason Ernst
The first decade of Genome Wide Association Studies (GWAS) has uncovered a wealth of disease-associated variants. Two important tasks will be translating this information into a multiscale understanding of pathogenic variants, and increasing the power of existing and future studies through prioritization. We show that heterogeneous network link prediction accomplishes both these tasks. First we constructed a network with 22 node types and 24 edge types from high-throughput publicly-available resources. From this network we extracted features describing the topology between specific genes and diseases. Using a machine learning approach that relies on GWAS-discovered associations for positives, we predicted the probability of association between each protein-coding gene and each of 23 diseases. These predictions achieved a testing AUROC of 0.845 and a 200-fold enrichment in precision at 10% recall. We compared the informativeness of each included network component. The full model outperformed any individual domain, highlighting the benefit of integrative approaches. For multiple sclerosis (MS), we predicted 5 novel susceptibility genes, 4 of which (JAK2, TNFAIP3, REL, RUNX3) achieved Bonferroni validation on a 9,772-case GWAS masked from our analysis. Regions containing two of these genes were uncovered in a recent MS ImmunoChip-based study highlighting our ability to identify the causal gene within a locus.
Date: Sunday, July 13, 3:05 p.m. - 3:30 p.m.
Author(s):
Maxwell Libbrecht, University of Washington, United States
Maxwell Libbrecht, University of Washington, United States
Michael Hoffman, Princess Margaret Cancer Centre, Canada
Ferhat Ay, University of Washington, United States
David Gilbert, Florida State University, United States
Jeffrey Bilmes, University of Washington, United States
William Noble, University of Washington, United States
Session Chair: Dana Pe'er
Semi-automated genome annotation algorithms are widely used to summarize functional genomics data (such as ChIP-seq) into human-interpretable form. We present a single solution to two seemingly quite different problems that existing algorithms fail to address: (1) performing genome annotation in multiple cell types and (2) integrating 3D genome architecture data into the annotation. Our solution uses an analytic framework based on the idea of a pairwise prior, which states that we have a prior belief that certain pairs of genomic positions should be more likely to receive the same label in our annotation. We developed a novel convex optimization method, called graph-based regularization (GBR), which admits efficient inference in the presence of a pairwise prior. We applied GBR in both settings mentioned above and, by comparing our annotations to functional genomics experiments not used in training, we demonstrated that GBR improves the quality of the resulting annotations in both cases.
Date: Sunday, July 13, 3:35 p.m. - 4:00 p.m.
Author(s):
Jason Ernst, UCLA, United States
Tarjei Mikkelsen, Broad Institute, United States
Manolis Kellis, Massachusetts Institute of Technology, United States
Session Chair: Dana Pe'er
Massively parallel reporter assay designs have been demonstrated that test a large number of regulatory elements or discover specific activating and repressive bases for a small number of regulatory elements, but effectively doing both simultaneously has been a limitation. Here, we overcome this limitation, and present a new Bayesian tiling deconvolution approach, which combines experimental tiling of regulatory regions using 31 sequences of length 145bp at 5bp intervals with computational deconvolution of the resulting signal to infer a nucleotide-level view of regulatory activity across thousands of regulatory regions. By exploiting the multiple overlapping sequences in a probabilistic framework, our method is also robust to noisy or missing measurements, and enables high resolution inferences with a very small number of tested sequences per target region. This enables the de novo discovery of individual binding sites, and inference of their activating or repressive action in a single experiment across thousands of candidate regions. We apply this method in two cell types to more than 15,000 regions in the human genome selected based on chromatin data to provide the first nucleotide-level view of activating and repressive sites across a sizeable fraction of the regulatory human genome.
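The tiling geometry stated in this abstract (31 sequences of length 145bp at 5bp intervals) can be checked with a little arithmetic. The sketch below only enumerates tile positions and per-base coverage; it does not attempt the Bayesian deconvolution itself, and the function names and 0-based coordinates are assumptions for illustration.

```python
def tile_starts(n_tiles=31, step=5):
    """Start offsets of each tile relative to the first tile (0-based)."""
    return [i * step for i in range(n_tiles)]


def per_base_coverage(n_tiles=31, tile_len=145, step=5):
    """Number of tiles overlapping each base of the tiled region."""
    starts = tile_starts(n_tiles, step)
    region_len = starts[-1] + tile_len  # last start plus one full tile
    cov = [0] * region_len
    for s in starts:
        for pos in range(s, s + tile_len):
            cov[pos] += 1
    return cov


cov = per_base_coverage()
region_length = len(cov)   # total bases spanned by the 31 tiles
max_overlap = max(cov)     # most tiles covering any single base
```

Under these assumptions the 31 tiles span 295 bp, and a base in the middle of the region is covered by up to 29 overlapping tiles, which is the redundancy a deconvolution step could exploit.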
Date: Sunday, July 13, 4:05 p.m. - 4:30 p.m.
Author(s):
Sourav Bandyopadhyay, University of California, San Francisco, United States
Session Chair: Dana Pe'er
There is an urgent need in oncology to link molecular aberrations in tumors with therapeutics that can be administered in a personalized fashion. One approach identifies synthetic-lethal genetic interactions or emergent dependencies that cancer cells acquire in the presence of specific mutations. Using engineered isogenic cells, we generated an unbiased, quantitative chemical-genetic interaction map that measures the influence of 51 aberrant cancer genes on 90 drug responses. The dataset strongly predicts drug responses found from profiling cancer cell lines, indicating that it accurately models more complex cellular contexts. Applied to triple-negative breast cancer, we interrogate several clinically actionable synthetic lethal interactions with the MYC oncogene, providing new drug and biomarker pairs for clinical investigation. This scalable approach enables the prediction of drug responses from patient data and can be used to accelerate the development of new genotype-directed therapies.
Date: Monday, July 14, 2:10 p.m. - 2:35 p.m.
Author(s):
Hatice Osmanbeyoglu, MSKCC, United States
Hatice Ulku Osmanbeyoglu, MSKCC, United States
Raphael Pelossof, MSKCC, United States
Jacqueline F. Bromberg, MSKCC, United States
Christina Leslie, MSKCC, United States
Session Chair: Sourav Bandyopadhyay
Cancer cells acquire genetic and epigenetic alterations that often lead to dysregulation of oncogenic signal transduction pathways, which in turn alters downstream transcriptional programs. Numerous methods attempt to deduce aberrant signaling pathways in tumors from mRNA data alone, but these pathway analysis approaches remain qualitative and imprecise. Here, we present a statistical method to link upstream signaling to downstream transcriptional response by exploiting reverse phase protein arrays and mRNA expression arrays in The Cancer Genome Atlas breast cancer project. Formally, we use an algorithm called affinity regression to learn an interaction matrix between upstream signal transduction proteins and downstream transcription factors (TFs) that explains target gene expression. The trained model can then predict the TF activity given a tumor sample’s protein expression profile or infer the signaling protein activity given a tumor sample’s gene expression profile. Breast cancers are comprised of molecularly distinct subtypes that respond differently to pathway-targeted therapies. We trained our model on the breast cancer data set and identified subtype-specific and common TF regulators of gene expression. Finally, inferred protein activity predicted clinical outcome within the METABRIC Luminal A cohort, identifying high- and low-risk patient groups within this heterogeneous subtype.
Date: Monday, July 14, 2:40 p.m. - 3:05 p.m.
Author(s):
Sara Gosline, MIT, United States
Sara Gosline, Massachusetts Institute of Technology, United States
Coyin Oh, Massachusetts Institute of Technology, United States
Ernest Fraenkel, Massachusetts Institute of Technology, United States
Session Chair: Sourav Bandyopadhyay
microRNAs (miRNAs) cause changes in gene expression through repression of target mRNA and are highly dysregulated in cancer. However, many effects of miRNA changes cannot be attributed to direct miRNA-mRNA interactions. As such, we propose an integrative approach that characterizes the effect miRNAs can have on protein-protein interaction networks in the hope of identifying proteins and pathways that correlate with patient prognosis.
Date: Monday, July 14, 3:10 p.m. - 3:35 p.m.
Author(s):
Kjong-Van Lehmann, Memorial Sloan-Kettering Cancer Hospital, United States
Andre Kahles, Memorial Sloan Kettering Cancer Center, United States
Cyriac Kandoth, Memorial Sloan Kettering Cancer Center, United States
William Lee, Memorial Sloan Kettering Cancer Center, United States
Nikolaus Schultz, Memorial Sloan Kettering Cancer Center, United States
Robert Klein, Memorial Sloan Kettering Cancer Center, United States
Oliver Stegle, EBI, United Kingdom
Gunnar Rätsch, Memorial Sloan-Kettering Cancer Center, United States
Session Chair: Sourav Bandyopadhyay
While population structure can be one of the most severe confounding factors in QTL analysis, tumor samples pose many additional challenges. Tumor-specific somatic mutations and recurrence patterns are known to explain large amounts of the observed transcriptome variation, and sample heterogeneity can lead to spurious associations. We have developed a new strategy to perform a common variant association study (CVAS) using mixed models on tumor samples, which enables us to account for tumor-specific genotypic and phenotypic heterogeneity as well as population structure. We apply this strategy to investigate the relationship between germline and somatic variants as well as splicing patterns and expression changes in order to discover determinants of transcriptome variation. Due to sample size constraints, many QTL studies have been limited to the analysis of cis-associated variants. We use whole genome, exome and RNA-seq data from the TCGA project to overcome this limitation and discover trans-associated variants as well. A rare variant association study (RVAS) using variants from whole genome and exome sequencing data is being utilized to investigate the basis of rare mutations.
Date: Monday, July 14, 3:40 p.m. - 4:05 p.m.
Author(s):
Matthew Scotch, Arizona State University, United States
Matthew Scotch, Arizona State University, United States
Daniel Magee, Arizona State University, United States
Rachel Beard, Arizona State University, United States
Session Chair: Sourav Bandyopadhyay
Egypt has become an epicenter of highly pathogenic avian influenza H5N1 transmission. As with many viruses, the diffusion of H5N1 is a highly complicated process that depends on a large number of factors, most of which are poorly understood. We adopted a Bayesian phylogeographic GLM as developed by Lemey et al. in which viral diffusion patterns are reconstructed while predictors are simultaneously assessed.
Date: Tuesday, July 15, 10:30 a.m. - 10:55 a.m.
Author(s):
Nilgun Donmez, Simon Fraser University, Canada
Salem Malikic, Simon Fraser University, Canada
Andrew McPherson, British Columbia Cancer Agency, Canada
Nilgun Donmez, Vancouver Prostate Centre, Canada
Cenk Sahinalp, Indiana University, United States
Session Chair: Teresa Przytycka
Most human tumors exhibit a large degree of heterogeneity that is not only apparent in histology but also presents itself in various features such as genomic copy number alterations and structural rearrangements as well as other aberrations. While the origins of the intra-tumor heterogeneity are still debated, research suggests that this diversity is likely to have clinical implications and may be linked to metastatic potential and drug response.
Although a multi-clonal nature is common to most tumor samples, determining the clonal subpopulations is a challenging process. Currently, single-cell sequencing has a prohibitive cost at the scales that would be necessary to representatively sample a tumor tissue. Furthermore, methods such as Fluorescence in Situ Hybridization (FISH) or Silver in Situ Hybridization (SISH) can only assess a small number of probes in individual cells of a tumor sample.
In silico separation of the clonal subpopulations may provide a viable alternative to these methods. Despite the importance of clonal diversity and its clinical implications, relatively few computational methods have been developed to date.
To address this problem, we have developed a novel combinatorial algorithm, named CITUP (Clonality Inference in Tumors Using Phylogeny), that determines subclonal frequencies in tumors as well as their evolutionary history. CITUP can exploit multiple samples from the same patient to achieve more accurate estimates and works on a variety of point mutations such as small indels and single nucleotide variants, as well as structural alterations. Through an efficient and robust multi-dimensional clustering approach, our method can handle a large number of mutations per patient. In addition to its exact Quadratic Integer Programming (QIP) formulation, CITUP also employs an approximate iterative module that reaches solutions faster with accuracy comparable to the QIP module.
Using extensive simulations where we experiment with a variety of phylogenetic trees with differing number of subclones and model parameters, we evaluated the performance of CITUP and compared it to the performance of other state-of-the-art tools. In these simulations, we used a comprehensive set of evaluation measures ranging from the ability to infer the correct evolutionary trajectory of the tumor to identifying mutational profile and relative abundance of the subclones. These measures show that CITUP consistently outperforms the other tools in estimating the subclonal frequencies and inferring phylogenetic relationships.
Date: Tuesday, July 15, 11:00 a.m. - 11:25 a.m.
Author(s):
Sarah Calvo, Broad Institute, United States
Yang Li, Harvard University, United States
Roee Gutman, Brown University, United States
Jun Liu, Harvard University, United States
Vamsi Mootha, HHMI and Massachusetts General Hospital, United States
Session Chair: Teresa Przytycka
One approach to predict gene function is to identify modules of genes that have been lost together multiple times across evolution. We developed CLIME, a principled “phylogenetic profiling” algorithm that clusters an input gene-set into modules based on shared evolutionary history, and then expands each module with additional genes that likely arose under the inferred model of evolution. CLIME models evolution of the input gene set using a Bayesian mixture of tree-based hidden Markov models (simultaneously learning module number and membership via Markov Chain Monte Carlo sampling for Dirichlet process mixture models). Using data from 138 diverse eukaryotic species, we applied CLIME to 1000 human pathways/complexes as well as to the entire genomes of three model organisms (yeast, malaria parasite, and red alga). These analyses revealed unexpected evolutionary modularity even in well-studied pathways and many novel, co-evolving components.
Date: Tuesday, July 15, 11:30 a.m. - 11:55 a.m.
Author(s):
Yoshinori Fukasawa, University of Tokyo, Japan
Yoshinori Fukasawa, University of Tokyo, Japan
Kenichiro Imai, The National Institute of Advanced Industrial Science and Technology, Japan
Junko Tsuji, University of Tokyo, Japan
Szu-Chin Fu, University of Tokyo, Japan
Kentaro Tomii, The National Institute of Advanced Industrial Science and Technology, Japan
Paul Horton, The National Institute of Advanced Industrial Science and Technology, Japan
Session Chair: Teresa Przytycka
Here, we report MitoFates, a predictor to accelerate the discovery of mitochondrial proteins. In developing MitoFates we introduced novel presequence features: a modified hydrophobic moment, novel motifs and refined PWM for the cleavage site. We combined those with classical features and presented them to an SVM.
According to our benchmarks on a non-redundant test set of proteins, MitoFates achieves significantly higher performance than the well known predictors TargetP, Predotar and MitoProtII.
To investigate the utility of MitoFates, we looked for undiscovered mitochondrial proteins from the human proteome. MitoFates predicts 1231 genes, and 633 of these were annotated as “mitochondria” in neither UniProt nor GO. Interestingly, these include candidate regulators of Parkin translocation to damaged mitochondria, a trigger of degradation of dysfunctional mitochondria. This suggests that careful investigation of other predictions will be helpful in elucidating the functions of mitochondria in health and disease.
Date: Tuesday, July 15, 12:00 p.m. - 12:25 p.m.
Author(s):
Eric Franzosa, Harvard School of Public Health, United States
Eric Franzosa, Harvard School of Public Health, United States
Katherine Huang, The Broad Institute, United States
James Meadow, University of Oregon, United States
Dirk Gevers, The Broad Institute, United States
Katherine Lemon, The Forsyth Institute, United States
Brendan Bohannan, University of Oregon, United States
Curtis Huttenhower, Harvard School of Public Health, United States
Session Chair: Morris Quaid
Recent large-scale investigations of the human microbiome have revealed great variability in body site-specific microbial community structure across healthy individuals. However, it remains unknown if this variability is sufficient to uniquely identify individuals within a large population, or if it is sufficiently stable to continue uniquely identifying individuals at later times. We investigated these questions by developing a hitting set-based coding algorithm and applying it to individuals from the Human Microbiome Project cohort. Specifically, our approach defined metagenomic fingerprints: sets of microbial taxa or genes that distinguished individuals from a background population, with features prioritized based on predicted stability. Fingerprints based on clade-specific marker genes were able to distinguish almost all individuals. However, at most body sites, these fingerprints uniquely identified their owners in only ~30% of cases when re-assessed after a period of 30-300 days (due to microbial strain loss). The gut microbiome was an exception, as over 80% of its marker gene-based fingerprints remained stable and unique at later times. In addition to highlighting patterns of temporal variation in the ecology of the human microbiome, this work places an upper bound on the identifiability of human-associated microbial communities over mid-to-long time scales, a result with important ethical implications for future microbiome study design.
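The idea of a hitting set-based fingerprint can be illustrated with a small greedy search. This is a sketch under stated assumptions, not the authors' coding algorithm: it ignores the stability-based feature prioritization mentioned in the abstract, uses a plain greedy approximation, and the function name and toy feature labels are hypothetical.

```python
def greedy_fingerprint(target, others):
    """Pick features of `target` that together distinguish it from everyone else.

    target -- set of features (e.g. marker genes) present in one individual.
    others -- list of feature sets, one per background individual.
    Returns a set of features such that every background individual lacks
    at least one of them, or None if no such set exists.
    """
    # For each background individual, the target features it does not carry.
    to_hit = [target - other for other in others]
    if any(not diff for diff in to_hit):
        return None  # someone carries every target feature

    fingerprint = set()
    remaining = list(range(len(to_hit)))
    while remaining:
        # Greedy step: take the feature that rules out the most remaining
        # individuals (sorted() makes tie-breaking deterministic).
        best = max(
            sorted(target - fingerprint),
            key=lambda f: sum(f in to_hit[j] for j in remaining),
        )
        fingerprint.add(best)
        remaining = [j for j in remaining if best not in to_hit[j]]
    return fingerprint


# Toy example with made-up feature labels.
code = greedy_fingerprint({"a", "b", "c"}, [{"a", "b"}, {"b", "c"}, {"a"}])
```

A greedy search does not guarantee the smallest possible fingerprint; it is used here only because it keeps the example short.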
Date: Tuesday, July 15, 2:00 p.m. - 2:25 p.m.
Author(s):
Stefka Tyanova, Max Planck Institute of Biochemistry, Germany
Kathrine Sylvestersen, The Novo Nordisk Foundation Center for Protein Research, Proteomics, Denmark
Matthias Mann, Max Planck Institute of Biochemistry, Germany
Michael Lund Nielsen, The Novo Nordisk Foundation Center for Protein Research, Proteomics, Denmark
Juergen Cox, Max Planck Institute of Biochemistry, Germany
Session Chair: Morris Quaid
We propose a novel computational approach for efficient integration of genomic, transcriptomic and proteomic data. We investigate the disease phenotypes characterizing the set of recombinant rat strains HXB/BXH, which is of large relevance to metabolic and cardiovascular diseases. We employ proteomic and transcriptomic quantitative measurements of the founding and the recombinant strains in combination with a genetic markers map of the recombinant strains. First, the molecular feature spaces at the proteome and transcriptome levels are orthogonally transformed and components accounting for the variability explaining the phenotype of interest are extracted. This defines a quantitative measure of the disease phenotype along the recombinant strains. To incorporate genetic information, the map of alleles from the recombinant strains is transformed to a numeric matrix assigning 1 if the recombinant region comes from the diseased founding strain and 0 otherwise. Support Vector Machine Regression (SVR) is then used to build a model that can correctly assign the phenotypic association of the strains based on their genetic characteristics. To identify disease-related genetic loci, we tested different feature selection strategies based on mutual information and SVR and measured their performance for various combinations of features. We identified a small number of genetic markers that are strongly associated with the disease phenotypes.
Date: Tuesday, July 15, 3:00 p.m. - 3:25 p.m.
Author(s):
Ariel Feiglin, Bar-Ilan University, Israel
Ariel Feiglin, Bar-Ilan University, Israel
Olga Leiderman, Bar-Ilan University, Israel
Ron Unger, Bar-Ilan University, Israel
Yanay Ofran, Bar-Ilan University, Israel
Session Chair: Morris Quaid
Predicting whether a given protein and drug interact is an important yet largely unresolved goal. We introduce a fast and computationally inexpensive approach for determining whether proteins and drugs bind each other. This is accomplished by training a machine learning algorithm to differentiate between docking results of real protein-drug pairs and docking results of pairs that do not interact. The features used for training include structural and biophysical features of specific poses. However, the “secret ingredient” is the use of features derived from the distribution of the docking scores across all proposed binding modes for a given protein-drug pair. We used this approach to identify real protein-drug interactions from a pool of 488 real complexes and 194,770 presumably false ones with precision of 0.6 (i.e. 60% of the predicted interactions were true) at a recall of 0.2. This is >500 fold better than random and >30 fold better than the precision that would be obtained by using only the docking score of the best pose. Applying this method to a large dataset of proteins and FDA approved drugs, we identified novel protein-drug interactions and validated them experimentally. We also show that our predicted interactions are significantly enriched in a large dataset of known protein-drug interactions.
Date: Tuesday, July 15, 3:30 p.m. - 3:55 p.m.
Author(s):
Gaurav Chopra, University of California, San Francisco, United States
Ram Samudrala, University of Washington, United States
Session Chair: Morris Quaid
Date: Sunday, July 13, 10:30 am - 10:55 am
Author(s):
Mark Leiserson, Brown University, United States
Dima Blokh, Tel-Aviv University, Israel
Roded Sharan, Tel-Aviv University, Israel
Benjamin Raphael, Brown University, United States
Session Chair: Terry Gaasterland
Date: Sunday, July 13, 10:30 am - 10:55 am
Author(s):
Son Pham, University of California, San Diego, United States
Mikhail Kolmogorov, Saint-Petersburg Academic University, Russia
Benedict Paten, University of California, Santa Cruz, United States
Brian Raney, University of California, Santa Cruz, United States
Session Chair: Serafim Batzoglou
yet assembling the former from the data currently generated by high-throughput short read sequencing machines still results in hundreds of contigs. To improve assembly quality, recent studies have utilized longer Pacific Biosciences (PacBio) reads or jumping libraries to connect contigs into larger scaffolds or help assemblers resolve ambiguities in repetitive regions of the genome. However, their popularity in contemporary genomic research is still limited by high cost and error rates.
In this work, we explore the possibility of improving assemblies by using complete genomes from closely related species/strains. We present Ragout, a genome rearrangement approach, to address this problem. In contrast with most reference-guided algorithms, where only one reference genome is used, Ragout uses multiple references along with the evolutionary relationship among these references in order to determine the correct order of the contigs. Additionally, Ragout uses the assembly graph and multi-scale synteny blocks to reduce assembly gaps caused by small contigs from the input assembly. In simulations as well as real datasets, we believe that for common bacterial species, where many complete genome sequences from related strains have been available, the current high-throughput short read sequencing paradigm is sufficient to obtain a single high-quality scaffold for each chromosome. The Ragout software is freely available at: https://github.com/fenderglass/Ragout.
Date: Sunday, July 13, 11:00 a.m. - 11:25 a.m.
Author(s):
Ashutosh Malhotra, Fraunhofer Institute for Algorithms and Scientific Computing, Germany
Martin Hofmann-Apitius, Fraunhofer Institute for Algorithms and Scientific Computing, Germany
Erfan Younesi, Fraunhofer Institute for Algorithms and Scientific Computing, Germany
Session Chair: Terry Gaasterland
Date: Sunday, July 13, 11:00 am - 11:25 am
Author(s):
Ergude Bao, University of California, Riverside, United States
Tao Jiang, University of California, Riverside, United States
Thomas Girke, University of California, Riverside, United States
Session Chair: Serafim Batzoglou
Here we introduce AlignGraph, an algorithm for extending and joining de novo assembled contigs or scaffolds guided by closely related reference genomes. It aligns paired-end (PE) reads and pre-assembled contigs or scaffolds to a close reference. From the obtained alignments, it builds a novel data structure, called the paired-end multi-positional de Bruijn graph. The incorporated positional information from the alignments and PE reads allows us to extend the initial assemblies, while avoiding incorrect extensions and early terminations. In our performance tests, AlignGraph was able to substantially improve the contigs and scaffolds from several assemblers. For instance, 28.7-62.3% of the contigs of Arabidopsis thaliana and human could be extended, resulting in improvements of common assembly metrics, such as an increase of the N50 of the extendable contigs by 89.9-94.5% and 80.3-165.8%, respectively. In another test, AlignGraph was able to improve the assembly of a published genome (Arabidopsis strain Landsberg) by increasing the N50 of its extendable scaffolds by 86.6%. These results demonstrate AlignGraph’s efficiency in improving genome assemblies by taking advantage of closely related references.
Date: Sunday, July 13, 11:00 am - 11:25 am
Author(s):
Christoph Bernau, Leibniz Supercomputing Center, Germany
Markus Riester, Harvard School of Public Health, United States
Anne-Laure Boulesteix, LMU Munich, Germany
Giovanni Parmigiani, Dana-Farber Cancer Institute, United States
Curtis Huttenhower, Harvard School of Public Health, United States
Levi Waldron, City University of New York School of Public Health, United States
Lorenzo Trippa, Dana-Farber Cancer Institute, United States
Session Chair: Bonnie Berger
in high-dimensional settings have been developed in the statistical
and machine learning literature. Learning algorithms and the
prediction models they generate are typically evaluated on the basis
of cross-validation error estimates in a few examplary datasets.
However, in most applications, the ultimate goal of prediction
modeling is to provide accurate predictions for independent samples
processed in different laboratories, and cross-validation within
examplary datasets may not adequately reflect performance in this
context.
Methods: Systematic cross-study validation is performed in
simulations and in a collection of eight estrogen-receptor positive
breast cancer microarray gene expression datasets, with the objective
of predicting distant metastasis-free survival (DMFS). An evaluation
statistic, in this paper the C-index, is computed for all pairwise
combinations of training and validation datasets. We evaluate several
alternatives for summarizing the pairwise validation statistics, and
compare these to conventional cross-validation.
Results: We develop a systematic approach to “cross-study
validation” to replace or supplement conventional cross-validation for
evaluation of high-dimensional prediction models when independent
datasets are available. In data-driven simulations and in our
application to survival prediction with eight breast cancer microarray
datasets, standard cross-validation suggests inflated discrimination
accuracy for all competing algorithms when compared to cross-study
validation. Furthermore, the ranking of learning algorithms differs,
suggesting that algorithms performing best in cross-validation may
be suboptimal when evaluated through independent validation.
Availability: The survHD: Survival in High Dimensions package
(http://www.bitbucket.org/lwaldron/survhd) will be made available
through Bioconductor.
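The pairwise evaluation described in the Methods can be sketched as a matrix whose entry (i, j) is the C-index of a model trained on study i and validated on study j. The code below is a minimal illustration under stated assumptions: it is not the survHD package, the dataset layout (dicts with `x`, `time`, `event`) is hypothetical, and the one-feature "sign" learner is a toy stand-in for a real learning algorithm.

```python
from itertools import combinations


def c_index(risk, time, event):
    """Harrell's C: fraction of comparable pairs that the risk score orders correctly.

    A pair is comparable when the sample with the shorter time had an event.
    Ties in risk count as half-concordant; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue
        first, second = (i, j) if time[i] < time[j] else (j, i)
        if not event[first]:
            continue  # earlier sample was censored, so the pair is not comparable
        comparable += 1
        if risk[first] > risk[second]:
            concordant += 1.0
        elif risk[first] == risk[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def cross_study_matrix(datasets, fit, predict):
    """Entry [i][j] is the C-index of the model trained on study i, validated on study j.

    Diagonal entries are left as None; within-study cross-validation would fill them.
    """
    n = len(datasets)
    matrix = [[None] * n for _ in range(n)]
    for i, train in enumerate(datasets):
        model = fit(train)
        for j, valid in enumerate(datasets):
            if i != j:
                risk = predict(model, valid)
                matrix[i][j] = c_index(risk, valid["time"], valid["event"])
    return matrix


def fit_sign(train):
    """Toy learner: decide whether larger x means higher or lower risk."""
    c = c_index(train["x"], train["time"], train["event"])
    return 1.0 if c >= 0.5 else -1.0


def predict_sign(model, data):
    """Toy predictor: risk score is x, flipped if the learner saw the opposite trend."""
    return [model * x for x in data["x"]]
```

The off-diagonal entries can then be summarized (for example by row or column) and compared with the within-study cross-validation estimates.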
Date: Sunday, July 13, 11:30 am - 11:55 am
Author(s):
David Amar, Tel Aviv University, Israel
Session Chair: Terry Gaasterland
Date: Sunday, July 13, 11:30 am - 11:55 am
Author(s):
Andrey D. Prjibelski, St. Petersburg Academic University, Russian Federation
Irina Vasilinetc, St. Petersburg Academic University, Russia
Anton Bankevich, St. Petersburg Academic University, Russia
Alexey Gurevich, St. Petersburg Academic University, Russia
Tatiana Krivosheeva, St. Petersburg Academic University, Russia
Sergey Nurk, St. Petersburg Academic University, Russia
Son Pham, University of California, San Diego, United States
Anton Korobeynikov, St. Petersburg Academic University, Russia
Alla Lapidus, St. Petersburg Academic University, Russia
Pavel Pevzner, University of California, San Diego, United States
Session Chair: Serafim Batzoglou
Date: Sunday, July 13, 12:00 pm - 12:25 pm
Author(s):
Andrei Todor, University of Florida, United States
Haitham Gabr, University of Florida, United States
Alin Dobra, University of Florida, United States
Tamer Kahveci, University of Florida, United States
Session Chair: Terry Gaasterland
transcription of genes. Understanding how gene regulation is
affected by such aberrations is of utmost importance. One promising
strategy towards this objective is to compute whether signals can
reach the transcription factors through the transcription
regulatory network. Due to the uncertainty of the regulatory
interactions, this is a #P-complete problem and thus solving it for
very large transcription regulatory networks remains a
challenge.
Results: We develop a novel and scalable method to compute the probability that a signal originating at any given set of source genes can
arrive at any given set of target genes (i.e., transcription
factors) when the topology of the underlying signaling network is
uncertain. Our method tackles this problem for large networks while
providing a provably accurate result. Our method follows a
divide-and-conquer strategy. We break down the given network into a
sequence of non-overlapping subnetworks such that reachability can
be computed autonomously and sequentially on each subnetwork. We
represent each interaction using a small polynomial. The product of
these polynomials expresses the different scenarios in which a signal can or
cannot reach the target genes from the source genes. We introduce
polynomial collapsing operators for each subnetwork. These operators
reduce the size of the resulting polynomial and thus the
computational complexity dramatically. We show that our method
scales to entire human regulatory networks in only seconds, while
the existing methods fail beyond a few tens of genes and
interactions. We demonstrate that our method can successfully
characterize key reachability characteristics of the entire
transcription regulatory networks of patients affected by eight
different subtypes of leukemia, as well as those from healthy
control samples.
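To make the quantity being computed concrete, the sketch below evaluates the reachability probability exactly on a tiny network by enumerating every combination of present and absent interactions. This brute-force enumeration is exponential in the number of interactions and is not the divide-and-conquer polynomial method of the abstract; it only illustrates the problem definition. The function name, the edge representation, and the reading of "reach" as "at least one target is reached from at least one source" are assumptions.

```python
from itertools import product


def reach_probability(edges, sources, targets):
    """Probability that some target is reachable from some source.

    edges maps a directed pair (u, v) to the probability that the
    interaction exists; interactions are assumed independent.
    """
    edge_list = list(edges.items())
    target_set = set(targets)
    total = 0.0
    for present in product((False, True), repeat=len(edge_list)):
        prob = 1.0
        adjacency = {}
        for keep, ((u, v), p) in zip(present, edge_list):
            prob *= p if keep else 1.0 - p
            if keep:
                adjacency.setdefault(u, []).append(v)
        if prob == 0.0:
            continue
        # Depth-first search from all sources in this realization of the network.
        seen = set(sources)
        stack = list(sources)
        while stack:
            node = stack.pop()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if seen & target_set:
            total += prob
    return total


# Hypothetical three-gene example: a direct route a->c and a detour a->b->c.
network = {("a", "c"): 0.5, ("a", "b"): 0.5, ("b", "c"): 0.5}
print(reach_probability(network, {"a"}, {"c"}))
```

With n uncertain interactions the loop visits 2^n network realizations, which is why exact enumeration stops being practical after a few tens of interactions.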
Date: Sunday, July 13, 12:00 pm - 12:25 pm
Author(s):
Adam Phillippy, National Biodefense Analysis and Countermeasures Center, United States
Sergey Koren, National Biodefense Analysis and Countermeasures Center, United States
Gregory Harhay, U.S. Department of Agriculture, United States
Timothy Smith, U.S. Department of Agriculture, United States
James Bono, U.S. Department of Agriculture, United States
Dayna Harhay, U.S. Department of Agriculture, United States
Scott McVey, U.S. Department of Agriculture, United States
Diana Radune, National Biodefense Analysis and Countermeasures Center, United States
Nicholas Bergman, National Biodefense Analysis and Countermeasures Center, United States
Session Chair: Serafim Batzoglou
Date: Sunday, July 13, 3:05 pm - 3:30 pm
Author(s):
Richard Leslie, University of Massachusetts Medical School, United States
Christopher O'Donnell, National Institutes of Health, United States
Andrew Johnson, National Institutes of Health, United States
Session Chair: Fran Lewitter
Cardiovascular disease and related risk factors predominate remaining GWAS results, followed by immunological, neurological and cancer traits. Significant results in GWAS display a highly gene-centric tendency. Sex chromosome X (OR=0.18[0.16-0.20]) and Y (OR=0.003[0.001-0.01]) genes are depleted for GWAS results. Gene length is correlated with GWAS results at nominal significance (P<0.05) levels. We show this gene length correlation decays at increasingly more stringent P-value thresholds. Potential pleiotropic genes and SNPs enriched for multi-phenotype association in GWAS are identified. However, we note possible population stratification at some of these loci. Finally, via re-annotation we identify compelling functional hypotheses at GWAS loci, in some cases unrealized in studies to date. CONCLUSION. Pooling summary-level GWAS results and re-annotating with bioinformatics predictions and molecular features provides a good platform for new insights. The GRASP database is available at http://apps.nhlbi.nih.gov/grasp.
Date: Sunday, July 13, 3:05 pm - 3:30 pm
Author(s):
Mikael Boden, The University of Queensland, Australia
Minh Cao, The University of Queensland, Australia
Edward Tasker, Monash University, Australia
Sailaja Vishwanathan, Monash University, Australia
Sridevi Sureshkumar, Monash University, Australia
Sureshkumar Balasubramanian, Monash University, Australia
Kai Willadsen, The University of Queensland, Australia
Michael Imelfort, The University of Queensland, Australia
Session Chair: Cenk Sahinalp
Date: Sunday, July 13, 3:05 pm - 3:30 pm
Author(s):
Inbal Sela-Culang, Bar-Ilan University, Israel
Yanay Ofran, Bar-Ilan University, Israel
Vered Kunik, Bar-Ilan University, Israel
Anat Burkovitz, Bar-Ilan University, Israel
Guy Nimrod, Bar-Ilan University, Israel
Mohammed Rafii-El-Idrissi Benhnia, La Jolla Institute for Allergy and Immunology, United States
Michael H. Matho, La Jolla Institute for Allergy and Immunology, United States
Thomas Kaever, La Jolla Institute for Allergy and Immunology, United States
Matt Maybeno, La Jolla Institute for Allergy and Immunology, United States
Andrew Schlossman, La Jolla Institute for Allergy and Immunology, United States
Dirk Zajonc, La Jolla Institute for Allergy and Immunology, United States
Shane Crotty, La Jolla Institute for Allergy and Immunology, United States
Bjoern Peters, La Jolla Institute for Allergy and Immunology, United States
Sheng Li, University of Texas Health Science Center, United States
Yan Xiang, University of Texas Health Science Center, United States
Session Chair: Toni Kazic
We will review a series of studies that revealed key mechanisms that enable Abs to perform these tasks. We will present a novel prediction approach that utilizes these findings, combined with simple competition assays, to predict where on an Ag a given Ab will bind. The accuracy of these predictions is verified experimentally using crystallography and other methods. To conclude, we will present further examples and discuss the power of combining sophisticated predictions with simple experiments.
Date: Sunday, July 13, 3:35 pm - 4:00 pm
Author(s):
Levi Waldron, City University of New York, United States
Benjamin Haibe-Kains, Princess Margaret Cancer Centre, Canada
Aedin Culhane, Dana-Farber Cancer Institute, United States
Markus Riester, Dana-Farber Cancer Institute, United States
Thomas Risch, Dana-Farber Cancer Institute, United States
Svitlana Tyekucheva, Dana-Farber Cancer Institute, United States
Ina Jazic, Dana-Farber Cancer Institute, United States
Xin Victoria Wang, Dana-Farber Cancer Institute, United States
Mahnaz Ahmadifar, Dana-Farber Cancer Institute, United States
Benjamin Frederick Ganzfried, Dana-Farber Cancer Institute, United States
Giovanni Parmigiani, Dana-Farber Cancer Institute, United States
Curtis Huttenhower, Harvard School of Public Health, United States
Michael Birer, Massachusetts General Hospital, United States
Christoph Bernau, LMU Munich, Germany
Session Chair: Fran Lewitter
Date: Sunday, July 13, 3:35 pm - 4:00 pm
Author(s):
Fabrizio Costa, University of Freiburg, Germany
Dominic Rose, University of Freiburg, Germany
Rolf Backofen, University of Freiburg, Germany
Pavankumar Videm, University of Freiburg, Germany
Session Chair: Cenk Sahinalp
RNA splicing, translation, gene regulation. However, the vast
majority of ncRNAs still have no functional annotation. One prominent
approach for putative function assignment is clustering of
transcripts according to sequence and secondary structure. However,
sequence information is changed by post-transcriptional
modifications, and secondary structure is only a proxy for the true
three dimensional conformation of the RNA polymer. A different type
of information that does not suffer from these issues and that can
be used for the detection of RNA classes, is the pattern of
processing and its traces in small RNA-seq reads data.
Here we introduce BlockClust, an efficient approach to detect
transcripts with similar processing patterns. We propose a novel way
to encode expression profiles in compact discrete structures, which
can then be processed using fast graph kernel techniques. We perform
unsupervised clustering and develop family-specific
discriminative models; finally, we show that the proposed approach is
scalable, accurate, and robust across different organisms,
tissues and cell lines.
Date: Sunday, July 13, 3:35 pm - 4:00 pm
Author(s):
Jing Ren, University of Technology Sydney, Australia
Qian Liu, University of Technology Sydney, Australia
John Ellis, University of Technology Sydney, Australia
Jinyan Li, University of Technology Sydney, Australia
Session Chair: Toni Kazic
Date: Sunday, July 13, 4:05 pm - 4:30 pm
Author(s):
Teresa Przytycka, National Institutes of Health, United States
DongYeon Cho, National Institutes of Health, United States
Session Chair: Fran Lewitter
Pathway-centric approaches have emerged as methods that can empower studies of cancer heterogeneity. I will describe two approaches we have recently developed. First, combining the utility of algorithmic techniques with the power of network-centric approaches, we designed a novel approach that allows unsupervised detection of subnetworks that are dysregulated in a subgroup of patients. The second, complementary approach builds on topic modeling and utilizes a mixture model. Our model is based on two components: (i) a measure of phenotypic similarity between the patients, and (ii) a list of features, that is, possible disease causes such as mutations and copy number variations. This work complements the appreciation of cancer diversity with the ability to represent it.
Date: Sunday, July 13, 4:05 pm - 4:30 pm
Author(s):
Roy Straver, VU University Medical Center Amsterdam, Netherlands
Erik Sistermans, VU University Medical Center Amsterdam, Netherlands
Henne Holstege, VU University Medical Center Amsterdam, Netherlands
Daphne van Beek, VU University Medical Center Amsterdam, Netherlands
Allerdien Visser, VU University Medical Center Amsterdam, Netherlands
Cees Oudejans, VU University Medical Center Amsterdam, Netherlands
Marcel Reinders, Delft University of Technology, Netherlands
Session Chair: Cenk Sahinalp
Date: Sunday, July 13, 4:05 pm - 4:30 pm
Author(s):
Yichao Zhou, Tsinghua University, China
Wei Xu, Tsinghua University, China
Bruce R. Donald, Duke University, United States
Jianyang Zeng, Tsinghua University, China
Session Chair: Toni Kazic
important topic in protein engineering. Under the assumption of a rigid
backbone and a finite set of discrete conformations for side-chains,
various methods have been proposed to address this problem. A
popular method is to combine the Dead-End Elimination (DEE) and
A* tree search algorithms, which provably finds the Global Minimum
Energy Conformation (GMEC) solution.
Results: In this paper, we improve the efficiency of computing A*
heuristic functions for protein design and also propose a variant of A*
algorithm in which the search process can be performed on GPUs in
a massively parallel fashion. In addition, we take steps to
address memory exhaustion in A* search. As a result,
our enhancements achieve a speedup of four orders of magnitude over the original
A* search for protein design on large-scale
test data, while still maintaining an acceptable memory overhead. Our
parallel A* search algorithm can be combined with iMinDEE, a recent
DEE criterion for rotamer pruning to further improve structure-based
computational protein design with the consideration of side-chain
flexibility.
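The A* search over rotamer assignments mentioned above can be illustrated on a toy problem. The sketch below is a plain single-threaded A* with a simple admissible lower bound; it includes neither DEE pruning nor the GPU parallelization of the paper, and the energy tables, the function name `astar_gmec`, and the data layout are hypothetical. Residues are assigned in a fixed order, g is the energy of the rotamers assigned so far, and h lower-bounds the energy still to come by taking, for each unassigned residue, its best rotamer against the assigned ones plus the best possible pairings with later residues.

```python
import heapq


def astar_gmec(self_e, pair_e):
    """Return (energy, assignment) of the global minimum energy conformation.

    self_e[i][r]       : self energy of rotamer r at residue i
    pair_e[(i, j)][r][s]: pairwise energy of rotamer r at i and s at j, for i < j
    """
    n = len(self_e)

    def pair(i, r, j, s):
        return pair_e[(i, j)][r][s]

    def g_cost(assign):
        k = len(assign)
        total = sum(self_e[i][assign[i]] for i in range(k))
        total += sum(pair(i, assign[i], j, assign[j])
                     for i in range(k) for j in range(i + 1, k))
        return total

    def h_cost(assign):
        # Admissible bound: each unassigned residue independently picks its best case.
        k = len(assign)
        bound = 0.0
        for j in range(k, n):
            best = float("inf")
            for s in range(len(self_e[j])):
                e = self_e[j][s]
                e += sum(pair(i, assign[i], j, s) for i in range(k))
                e += sum(min(pair(j, s, m, t) for t in range(len(self_e[m])))
                         for m in range(j + 1, n))
                best = min(best, e)
            bound += best
        return bound

    heap = [(h_cost(()), ())]
    while heap:
        f, assign = heapq.heappop(heap)
        if len(assign) == n:
            return f, assign  # h is zero for a full assignment, so f is the energy
        for r in range(len(self_e[len(assign)])):
            child = assign + (r,)
            heapq.heappush(heap, (g_cost(child) + h_cost(child), child))
    raise ValueError("empty search space")


# Hypothetical energies: three residues with two rotamers each.
self_e = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (0, 2): [[0.0, 0.5], [0.5, 0.0]],
    (1, 2): [[0.0, 1.0], [1.0, -1.0]],
}
print(astar_gmec(self_e, pair_e))
```

Because the bound never overestimates the remaining energy, the first complete assignment popped from the queue is optimal. The priority queue is also where memory grows, which is the problem the abstract's memory-related enhancements target.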
Date: Sunday, July 13, 4:35 pm - 5:00 pm
Author(s):
Nagarajan Natarajan, University of Texas at Austin, United States
Inderjit Dhillon, University of Texas at Austin, United States
Session Chair: Fran Lewitter
Results: Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has a close to one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed method (second best) that has less than a 15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e., genes not previously linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature curated by Bornigen.
Availability: Source code and datasets at http://www.cs.utexas.edu/~naga86/research/IMC
Date: Sunday, July 13, 4:35 pm - 5:00 pm
Author(s):
Ladislav Rampášek, University of Toronto, Canada
Aryan Arbabi, University of Toronto, Canada
Michael Brudno, University of Toronto, Canada
Session Chair: Cenk Sahinalp
methodologies to identify genomic variation within a fetus through the
non-invasive sequencing of maternal blood plasma. These methods
are based on the observation that maternal plasma contains a
fraction of DNA (typically 5-15%) originating from the fetus, and such
methodologies have already been used for the detection of whole-
chromosome events (aneuploidies), and to a more limited extent for
smaller (typically several megabases long) Copy Number Variants
(CNVs).
Results: Here we present a probabilistic method for non-invasive
analysis of de novo CNVs in the fetal genome based on maternal plasma
sequencing. Our novel method combines three types of information
within a unified Hidden Markov Model: the imbalance of allelic ratios
at SNP positions, the use of parental genotypes to phase nearby
SNPs, and depth of coverage to better differentiate between various
types of CNVs and improve precision. Our simulation results, based
on in silico introduction of novel CNVs into plasma samples with 13%
fetal DNA concentration, demonstrate a sensitivity of 90% for CNVs
>400 kilobases (with 13 calls in an unaffected genome), and 40% for
50-400kb CNVs (with 108 calls in an unaffected genome).
Availability: Implementation of our model and data simulation
method is available at http://github.com/compbio-UofT/fCNV
Date: Sunday, July 13, 4:35 pm - 5:00 pm
Author(s):
Donald C. Comeau, U.S. National Library of Medicine, United States
Rezarta Islamaj Doğan, U.S. National Library of Medicine, United States
Paolo Ciccarese, Harvard University, United States
Kevin Bretonnel Cohen, University of Colorado, United States
Martin Krallinger, Spanish National Cancer Research Centre, Spain
Florian Leitner, Spanish National Cancer Research Centre, Spain
Zhiyong Lu, U.S. National Library of Medicine, United States
Yifan Peng, University of Delaware, United States
Fabio Rinaldi, University of Zurich, Switzerland
Manabu Torii, University of Delaware, United States
Alfonso Valencia, Spanish National Cancer Research Centre, Spain
Karin Verspoor, The University of Melbourne, Australia
Thomas C. Wiegers, North Carolina State University, United States
Cathy H. Wu, University of Delaware, United States
W. John Wilbur, U.S. National Library of Medicine, United States
Session Chair: Toni Kazic
Date: Monday, July 14, 10:30 am - 10:55 am
Author(s):
Kristoffer Forslund, European Molecular Biology Laboratory, Germany
Shinichi Sunagawa, EMBL, Germany
Jens Roat Kultima, EMBL, Germany
Daniel Mende, EMBL, Germany
Manimozhiyan Arumugam, Copenhagen University, Denmark
Athanasios Typas, EMBL, Germany
Peer Bork, EMBL, Germany
Session Chair: Janet Kelso
Date: Monday, July 14, 10:30 am - 10:55 am
Author(s):
Zhaojun Zhang, UNC - Chapel Hill, United States
Wei Wang, University of California, Los Angeles, United States
Session Chair: Bernard Moret
Results: We propose a novel RNA-Seq quantification method, RNA-Skim, which partitions the transcriptome into disjoint transcript clusters based on sequence similarity and introduces the notion of sig-mers, a special type of k-mers uniquely associated with each cluster. We demonstrate that the sig-mer counts within a cluster are sufficient for estimating transcript abundances with accuracy comparable to any state-of-the-art method. This enables RNA-Skim to perform transcript quantification on each cluster independently, reducing a complex optimization problem to smaller optimization tasks that can be run in parallel. As a result, RNA-Skim uses less than 4% of the k-mers and less than 10% of the CPU time required by Sailfish. It is able to finish transcriptome quantification in less than 10 minutes per sample using just a single thread on a commodity computer, which represents a more than 100-fold speedup over state-of-the-art alignment-based methods, while delivering comparable or higher accuracy.
Availability: The software is available at http://www.csbio.unc.edu/rs
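The idea of k-mers uniquely associated with one cluster can be sketched in a few lines. This is an illustration only, not RNA-Skim's implementation: it ignores reverse complements and any further selection the tool applies, and the function names, the cluster layout, and the sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sig_mers(clusters, k):
    """Map each cluster id to the k-mers that occur in that cluster and in no other.

    clusters maps a cluster id to a list of transcript sequences.
    """
    owners = defaultdict(set)
    for cluster_id, sequences in clusters.items():
        for seq in sequences:
            for kmer in kmers(seq, k):
                owners[kmer].add(cluster_id)
    result = {cluster_id: set() for cluster_id in clusters}
    for kmer, cluster_ids in owners.items():
        if len(cluster_ids) == 1:
            result[next(iter(cluster_ids))].add(kmer)
    return result


# Hypothetical two-cluster example with k = 3.
clusters = {"A": ["ACGTA"], "B": ["CGTAC"]}
print(sig_mers(clusters, 3))
```

Because each such k-mer points to exactly one cluster, read counts for these k-mers can be split by cluster and each cluster can be quantified on its own, which is what permits the independent, parallel optimization described above.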
Date: Monday, July 14, 10:30 am - 10:55 am
Author(s):
Huibin Shen, Aalto University, Finland
Kai Dührkop, Friedrich-Schiller-University Jena, Germany
Sebastian Böcker, Friedrich-Schiller-University Jena, Germany
Juho Rousu, Aalto University, Finland
Session Chair: Yanay Ofran
identification of metabolites from tandem mass spectra. Fragmentation tree
methods explore the space of possible ways the metabolite can fragment, and
base the metabolite identification on scoring of these fragmentation trees.
Machine learning methods have been used to map mass spectra to molecular
fingerprints; predicted fingerprints, in turn, can be used to score candidate
molecular structures.
Results: Here, we combine fragmentation tree computations with kernel-based machine learning to predict molecular fingerprints and identify molecular structures.
We introduce a family of kernels capturing the similarity of fragmentation
trees, and combine these kernels using recently proposed multiple kernel
learning approaches. Experiments on two large reference datasets show that
the new methods significantly improve molecular fingerprint prediction
accuracy. These improvements result in better metabolite identification,
doubling the number of metabolites ranked at the top position of the
candidate list.
Date: Monday, July 14, 10:30 am - 10:55 am
Author(s):
Tarmo Äijö, Aalto University, Finland
Vincent Butty, Massachusetts Institute of Technology, United States
Zhi Chen, University of Turku, Finland
Verna Salo, University of Turku, Finland
Subhash Tripathi, University of Turku / Åbo Akademi University, Finland
Christopher Burge, Massachusetts Institute of Technology, United States
Riitta Lahesmaa, University of Turku, Finland
Harri Lähdesmäki, Aalto University, Finland
Session Chair: Robert F. Murphy
Results: In this study, we use RNA-seq to measure gene expression during early human T helper 17 (Th17) cell differentiation and T cell activation (Th0). To quantify Th17-specific gene expression dynamics, we present a novel statistical methodology, DyNB, for analyzing time-course RNA-seq data. We use a non-parametric Gaussian process to model temporal correlation in gene expression and combine it with a negative binomial likelihood for the count data. To account for experiment-specific biases in gene expression dynamics, such as differences in cell differentiation efficiencies, we propose a method to rescale the dynamics between replicated measurements. We develop an MCMC sampling method to infer differential expression dynamics between conditions. DyNB identifies several known and novel genes involved in Th17 differentiation. Analysis of differentiation efficiencies revealed consistent patterns in gene expression dynamics between different cultures. We use qRT-PCR to validate differential expression and differentiation efficiencies for selected genes. Comparison of the results with those obtained via traditional time-point-wise analysis shows that time-course analysis together with time rescaling between cultures identifies differentially expressed genes which would not otherwise be detected.
Availability: An implementation of the proposed computational methods will be available at http://research.ics.aalto.fi/csb/software/
Date: Monday, July 14, 11:00 am - 11:25 am
Author(s):
Eric Franzosa, Harvard School of Public Health, United States
Xochitl Morgan, Harvard School of Public Health, United States
Nicola Segata, Harvard School of Public Health, United States
Levi Waldron, Harvard School of Public Health, United States
Joshua Reyes, Harvard School of Public Health, United States
Ashlee Earl, The Broad Institute, United States
Georgia Giannoukos, The Broad Institute, United States
Dawn Ciulla, The Broad Institute, United States
Dirk Gevers, The Broad Institute, United States
Matthew Boylan, Division of Gastroenterology, United States
Andrew Chan, Division of Gastroenterology, United States
Jacques Izard, Department of Microbiology, United States
Wendy Garrett, Department of Immunology and Infectious Diseases, United States
Session Chair: Janet Kelso
Date: Monday, July 14, 11:00 am - 11:25 am
Author(s):
David Kreil, Boku University Vienna, Austria
Session Chair: Bernard Moret
Date: Monday, July 14, 11:00 am - 11:25 am
Author(s):
Pankaj Agarwal, GSK, United States
Philippe Sanseau, GSK, United States
Mark Hurle, GSK, United States
Session Chair: Yanay Ofran
Date: Monday, July 14, 11:00 am - 11:25 am
Author(s):
Nicholas Gauthier, Memorial Sloan Kettering Cancer Center, United States
Boumediene Soufi, University of Tübingen, Germany
William Walkowicz, Memorial Sloan-Kettering Cancer Center, United States
Virginia Pedicord, Memorial Sloan-Kettering Cancer Center, United States
Konstantinos Mavrakis, Memorial Sloan-Kettering Cancer Center, United States
Boris Macek, University of Tübingen, Germany
Chris Sander, Memorial Sloan Kettering Cancer Center, United States
Martin Miller, Memorial Sloan Kettering Cancer Center, United States
Session Chair: Robert F. Murphy
Date: Monday, July 14, 11:30 am - 11:55 am
Author(s):
Anke Penzlin, Robert Koch Institute, Germany
Martin S. Lindner, Robert Koch Institute, Germany
Joerg Doellinger, Robert Koch Institute, Germany
Piotr Wojtek Dabrowski, Robert Koch Institute, Germany
Andreas Nitsche, Robert Koch Institute, Germany
Bernhard Y. Renard, Robert Koch Institute, Germany
Session Chair: Janet Kelso
Results: We introduce Pipasic (peptide intensity-weighted proteome abundance similarity correction) as a tool which corrects identification and spectral counting based quantification results using peptide similarity estimation and expression level weighting within a non-negative lasso framework. Pipasic has distinct advantages over approaches only regarding unique peptides or aggregating results to the lowest common ancestor, as demonstrated on examples of viral diagnostics and an acid mine drainage dataset.
Date: Monday, July 14, 11:30 am - 11:55 am
Author(s):
Michael Leung, University of Toronto, Canada
Hui Xiong, University of Toronto, Canada
Leo Lee, University of Toronto, Canada
Brendan Frey, University of Toronto, Canada
Session Chair: Bernard Moret
Methods: Using a deep neural network, we developed a model inferred from mouse RNA-Seq data that can predict splicing patterns in individual tissues and differences in splicing patterns across tissues. Our architecture uses hidden variables that jointly represent features in genomic sequences and tissue types when making predictions. A graphics processing unit was used to greatly reduce the training time of our models with millions of parameters.
Results: We show that the deep architecture surpasses the performance of the previous Bayesian method for predicting alternative splicing patterns. With the proper optimization procedure and selection of hyperparameters, we demonstrate that deep architectures can be beneficial, even with a moderately sparse dataset. An analysis of what the model has learned in terms of the genomic features is presented.
Date: Monday, July 14, 11:30 am - 11:55 am
Author(s):
Lei Huang, Peking University, China
Fuhai Li, Houston Methodist Hospital Research Institute, United States
Jianting Sheng, Houston Methodist Hospital Research Institute, United States
Xiaofeng Xia, Houston Methodist Hospital Research Institute, United States
Jinwen Ma, Peking University, China
Ming Zhan, Houston Methodist Hospital Research Institute, United States
Stephen Wong, Houston Methodist Hospital Research Institute, United States
Session Chair: Yanay Ofran
Results: In this study, we propose a novel systematic computational tool, DrugComboRanker, to prioritize synergistic drug combinations and uncover their mechanisms of action. We first build a drug functional network based on the drugs' genomic profiles, and partition the network into numerous drug network communities using a Bayesian non-negative matrix factorization approach. As drugs within overlapping communities share common mechanisms of action, we next uncover potential targets of drugs by applying a recommendation system to the drug communities. In parallel, we build disease-specific signaling networks based on patients’ genomic profiles and interactome data. We then identify drug combinations by searching for drugs whose targets are enriched in the complementary signaling modules of the disease signaling network. The novel method was evaluated on lung adenocarcinoma and endocrine receptor (ER) positive breast cancer, and compared with other drug combination approaches. These case studies discovered a set of effective drug combinations ranked at the top of our prediction list, and mapped the drug targets onto the disease signaling network to highlight the mechanisms of action of the drug combinations.
Date: Monday, July 14, 11:30 am - 11:55 am
Author(s):
Terumasa Tokunaga, The Institute of Statistical Mathematics, Japan
Osamu Hirose, Kanazawa University, Japan
Shotaro Kawaguchi, Kanazawa University, Japan
Yu Toyoshima, The University of Tokyo, Japan
Takayuki Teramoto, Kyushu University, Japan
Hisaki Ikebata, Graduate University of Advanced Studies, Japan
Sayuri Kuge, Kyushu University, Japan
Takeshi Ishihara, Kyushu University, Japan
Yuichi Iino, The University of Tokyo, Japan
Ryo Yoshida, The Institute of Statistical Mathematics, Japan
Session Chair: Robert F. Murphy
Automated fluorescence microscopes produce massive amounts of images
observing cells, often in four dimensions of space and time. This study
addresses two tasks of time-lapse imaging analysis: detection and
tracking of many imaged cells, intended especially for 4D live-cell
imaging of neuronal nuclei of C. elegans. The cells of interest appear
in forms little more generic than ellipsoids. They are densely distributed
and move rapidly in a series of 3D images. In such cases, existing
tracking methods often fail because, for instance, many trackers
jump from one object to another during rapid moves.
Results:
The present method starts from converting each 3D image to a smooth
continuous function by performing the kernel density estimation. Cell
bodies in an image are assumed to lie in regions around multiple local
maxima of the density function. Then, the tasks of detecting and
tracking cells are addressed with two hill-climbing algorithms that we
derive. By applying the cell detection method to an image at the first
frame, the positions of trackers are initialized. The tracking algorithm
keeps attracting them to around local maxima changing over time in a
subsequent image sequence. To prevent the trackers from turnovers and
coalescences, we employ Markov random fields (MRFs) to model spatial and
temporal covariation of cells, and maximize the image forces and the MRF-
induced constraints on transitions of the trackers. The tracking
procedure is demonstrated with dynamic 3D images containing more than
one hundred neurons of C. elegans.
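The idea of climbing a kernel density estimate toward a local maximum can be sketched with the standard mean-shift update, which moves a point to the kernel-weighted mean of the data and thereby ascends a Gaussian KDE. This is a generic stand-in and not the two hill-climbing algorithms derived in the paper; it also omits the Markov random field constraints. The function name, the bandwidth, and the toy coordinates are assumptions.

```python
import math


def mean_shift(start, points, weights, bandwidth, tol=1e-6, max_iter=500):
    """Climb a weighted Gaussian kernel density estimate from start to a nearby mode.

    points  : sequence of coordinate tuples (for example voxel positions)
    weights : one non-negative weight per point (for example voxel intensity)
    """
    x = list(start)
    for _ in range(max_iter):
        numerator = [0.0] * len(x)
        denominator = 0.0
        for p, w in zip(points, weights):
            d2 = sum((a - b) ** 2 for a, b in zip(x, p))
            kernel = w * math.exp(-d2 / (2 * bandwidth ** 2))
            denominator += kernel
            for d in range(len(x)):
                numerator[d] += kernel * p[d]
        if denominator == 0.0:
            break  # no density within numerical reach of the current position
        new_x = [v / denominator for v in numerator]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x


# Hypothetical 3D points forming two well-separated blobs.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10), (11, 10, 10)]
weights = [1.0] * len(points)
print(mean_shift((0.5, 0.5, 0.0), points, weights, bandwidth=1.0))
print(mean_shift((10.4, 10.0, 10.0), points, weights, bandwidth=1.0))
```

In a tracking setting, a tracker's converged position in one frame would serve as the starting point for the climb in the next frame; when cells move quickly, independent climbs can land on the wrong maximum, which is the failure mode the MRF constraints are meant to prevent.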
Date: Monday, July 14, 12:00 pm - 12:25 pm
Author(s):
Zia Khan, University of Maryland, United States
Michael Ford, MS Bioworks, LLC, United States
Darren Cusanovich, University of Chicago, United States
Amy Mitrano, University of Chicago, United States
Jonathan Pritchard, Stanford University, United States
Yoav Gilad, University of Chicago, United States
Session Chair: Janet Kelso
Date: Monday, July 14, 12:00 pm - 12:25 pm
Author(s):
Yuanfang Guan, University of Michigan, United States
Hongdong Li, University of Michigan, United States
Rajasree Menon, University of Michigan, United States
Yuchen Wen, University of Michigan, United States
Gilbert S. Omenn, University of Michigan, United States
Matthias Kretzler, University of Michigan, United States
Yuanfang Guan, University of Michigan, United States
Session Chair: Bernard Moret
Date: Monday, July 14, 12:00 pm - 12:25 pm
Author(s):
Martin Miller, Memorial Sloan Kettering Cancer Center, United States
Evan Molinelli, Memorial Sloan Kettering Cancer Center, United States
Jayasree Nair, Memorial Sloan Kettering Cancer Center, United States
Tahir Sheikh, Memorial Sloan Kettering Cancer Center, United States
Rita Samy, Memorial Sloan Kettering Cancer Center, United States
Xiaohong Jing, Memorial Sloan Kettering Cancer Center, United States
Qin He, Memorial Sloan Kettering Cancer Center, United States
Anil Korkut, Memorial Sloan Kettering Cancer Center, United States
Aimee Crago, Memorial Sloan Kettering Cancer Center, United States
Samuel Singer, Memorial Sloan Kettering Cancer Center, United States
Gary Schwartz, Memorial Sloan Kettering Cancer Center, United States
Chris Sander, Memorial Sloan Kettering Cancer Center, United States
Session Chair: Yanay Ofran
Date: Monday, July 14, 12:00 pm - 12:25 pm
Author(s):
Emmanuel Barillot, Institut Curie, France
Inna Kuperstein, Institut Curie, France
David Cohen, Institut Curie, France
Stuart Pook, Institut Curie, France
Eric Viara, Institut Curie, France
Laurence Calzone, Institut Curie, France
Emmanuel Barillot, Institut Curie, France
Andrei Zinovyev, Institut Curie, France
Session Chair: Robert F. Murphy
Date: Monday, July 14, 2:10 pm - 2:35 pm
Author(s):
Armaghan Naik, Carnegie Mellon University, United States
Joshua Kangas, Carnegie Mellon, United States
Christopher Langmead, Carnegie Mellon, United States
Session Chair: Reinhard Schneider
Date: Monday, July 14, 2:10 pm - 2:35 pm
Author(s):
Harmen Bussemaker, Columbia University, United States
Allan Lazarovici, Columbia University, United States
Tianyin Zhou, University of Southern California, United States
Anthony Shafer, University of Washington, United States
Ana Carolina Dantas Machado, University of Southern California, United States
Richard Sandstrom, University of Washington, United States
Peter Sabo, University of Washington, United States
Yan Lu, University of Southern California, United States
Remo Rohs, University of Southern California, United States
John Stamatoyannopoulos, University of Washington, United States
Session Chair: Michal Linial
Date: Monday, July 14, 2:10 pm - 2:35 pm
Author(s):
Menachem Fromer, Icahn School of Medicine at Mount Sinai, United States
Shaun Purcell, Icahn School of Medicine at Mount Sinai, United States
Session Chair: Dietlind Gerloff
Date: Monday, July 14, 2:10 pm - 2:35 pm
Author(s):
Ran Libeskind-Hadas, Harvey Mudd College, United States
Yi-Chieh Wu, Massachusetts Institute of Technology, United States
Mukul S. Bansal, University of Connecticut, United States
Manolis Kellis, Massachusetts Institute of Technology, United States
Session Chair: Russell Schwartz
Results: We address this problem by giving an efficient algorithm for computing Pareto-optimal sets of reconciliations, thus providing the first systematic method for understanding the relationship between event costs and reconciliations. This, in turn, results in new techniques for computing event support values and, for cophylogenetic analyses, performing robust statistical tests. We provide new software tools and demonstrate their use on a number of datasets from evolutionary genomic and cophylogenetic studies.
Availability: Our Python tools are freely available at www.cs.hmc.edu/~hadas/xscape
Contact: mukul@engr.uconn.edu
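The idea of a Pareto-optimal set can be illustrated with a minimal Python sketch. It assumes each candidate reconciliation has already been summarized as a tuple of event counts (the layout `(duplications, transfers, losses)` and the function names are hypothetical), and it filters by brute force. It is not the efficient algorithm described in the abstract, only an illustration of what "Pareto-optimal" means for event counts.

```python
from typing import Iterable, List, Tuple

# Hypothetical layout: (duplications, transfers, losses)
Counts = Tuple[int, ...]


def dominates(a: Counts, b: Counts) -> bool:
    """True if a is no worse than b in every count and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: Iterable[Counts]) -> List[Counts]:
    """Return the non-dominated count vectors, sorted for readability."""
    unique = set(candidates)
    return sorted(c for c in unique if not any(dominates(o, c) for o in unique))


def best_for_costs(front: List[Counts], costs: Tuple[float, ...]) -> Counts:
    """Pick the front member that minimizes the weighted event cost."""
    return min(front, key=lambda c: sum(w * n for w, n in zip(costs, c)))
```

Any assignment of non-negative event costs is minimized by some member of the front, which is why the front summarizes how the optimal reconciliation changes as the costs vary.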
Date: Monday, July 14, 2:40 pm - 3:05 pm
Author(s):
Wen-Yu Chung, National Kaohsiung University of Applied Sciences, Taiwan
Robert Schmitz, The Salk Institute for Biological Studies, United States
Tanya Biorac, Life Technologies Corp.-Ion Torrent, United States
Delia Ye, Life Technologies Corp.-Ion Torrent, United States
Miroslav Dudas, Life Technologies Corp.-Ion Torrent, United States
Gavin Meredith, Life Technologies Corp.-Ion Torrent, United States
Christopher Adams, Life Technologies Corp.-Ion Torrent, United States
Joseph Ecker, The Salk Institute for Biological Studies, United States
Michael Zhang, University of Texas at Dallas, United States
Session Chair: Michal Linial
Date: Monday, July 14, 2:40 pm - 3:05 pm
Author(s):
Steven Brenner, University of California, Berkeley, United States
Session Chair: Dietlind Gerloff
An extended discussion of the genome leaks issues may be found at http://compbio.berkeley.edu/proj/leak/
Date: Monday, July 14, 2:40 pm - 3:05 pm
Author(s):
Yuval Tabach, Massachusetts General Hospital, United States
Gary Ruvkun, Massachusetts General Hospital, United States
Carmit Levy, Tel Aviv University, Israel
Session Chair: Russell Schwartz
Date: Monday, July 14, 3:10 pm - 3:35 pm
Author(s):
Marinka Zitnik, University of Ljubljana, Slovenia
Blaz Zupan, University of Ljubljana, Slovenia
Session Chair: Reinhard Schneider
Results: We here propose a conceptually new probabilistic approach to gene network inference from quantitative interaction data. The approach is founded on epistasis analysis. Its features are joint treatment of the mutant phenotype data with a factorized model and probabilistic scoring of pairwise gene relationships that are inferred from the latent gene representation. The resulting gene network is assembled from scored pairwise relationships. In an experimental study, we show that the proposed approach can accurately reconstruct several known pathways and that it surpasses the accuracy of current approaches.
Date: Monday, July 14, 3:10 pm - 3:35 pm
Author(s):
Erez Levanon, Bar-Ilan University, Israel
Yishay Pinto, Bar-Ilan University, Israel
Haim Cohen, Bar-Ilan University, Israel
Lily Bazak, Bar-Ilan University, Israel
Ami Haviv, Bar-Ilan University, Israel
Michal Barak, Bar-Ilan University, Israel
Jasmine Jacob-Hirsch, Bar-Ilan University, Israel
Patricia Deng, Stanford University, United States
Rui Zhang, Stanford University, United States
Jin Billy Li, Stanford University, United States
Gidi Rechavi, Chaim Sheba Medical Center, Israel
Session Chair: Michal Linial
Date: Monday, July 14, 3:10 pm - 3:35 pm
Author(s):
Farhad Hormozdiari, University of California, Los Angeles, United States
Jong Wha Joo, University of California, Los Angeles, United States
Feng Guan, University of California, Los Angeles, United States
Akshay Wadia, University of California, Los Angeles, United States
Rafail Ostrovsky, University of California, Los Angeles, United States
Amit Sahai, University of California, Los Angeles, United States
Eleazar Eskin, University of California, Los Angeles, United States
Session Chair: Dietlind Gerloff
areas of genetic research. One such area is the identification of relatives from genetic data. The standard approach for the identification of genetic relatives collects the genomic data of all individuals and stores it in a database. Then, each pair of individuals is compared to detect the set of genetic relatives, and the matched individuals are informed. The main drawback of this approach is the requirement of sharing your genetic data with a trusted third party to perform the relatedness test.
Results: In this work, we propose a secure protocol to detect genetic relatives from sequencing data while not exposing any information about their genomes. We assume that individuals have access to their genome sequences but do not want to share their genomes with anyone else. Unlike previous approaches, our approach uses both common and rare variants, which provides the ability to detect much more distant relationships securely. We use simulated data generated from the 1000 Genomes data and illustrate that we can easily detect up to fifth-degree cousins, which was not possible using the existing methods. We also show in the 1000 Genomes data with cryptic relationships that our method can detect these individuals.
Date: Monday, July 14, 3:10 pm - 3:35 pm
Author(s):
Sushmita Roy, University of Wisconsin, Madison, United States
Ilan Wapinski, Harvard Medical School, United States
Jenna Pfiffner, Broad Institute, United States
Courtney French, University of California, Berkeley, United States
Amanda Socha, Dartmouth College, United States
Jay Konieczka, Broad Institute, United States
Naomi Habib, Broad Institute, United States
Manolis Kellis, MIT, United States
Dawn Thompson, Broad Institute, United States
Aviv Regev, MIT & Broad Institute, United States
Session Chair: Russell Schwartz
Date: Monday, July 14, 3:40 pm - 4:05 pm
Author(s):
Matthew Studham, SciLifeLab, Sweden
Andreas Tjärnberg, SciLifeLab, Sweden
Torbjörn Nordling, SciLifeLab, Sweden
Sven Nelander, Uppsala University, Sweden
Erik Sonnhammer, SciLifeLab, Sweden
Session Chair: Reinhard Schneider
Date: Monday, July 14, 3:40 pm - 4:05 pm
Author(s):
Steve Lianoglou, Memorial Sloan Kettering Cancer Center, United States
Christina Leslie, Memorial Sloan Kettering Cancer Center, United States
Julie Yang, Memorial Sloan Kettering Cancer Center, United States
Christine Mayr, Memorial Sloan Kettering Cancer Center, United States
Vidur Garg, Memorial Sloan Kettering Cancer Center, United States
Session Chair: Michal Linial
Date: Monday, July 14, 3:40 pm - 4:05 pm
Author(s):
Valentina Boeva, Institut Curie, France
Haitham Ashoor, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Aurelie Herault, UMR 144 CNRS, Subcellular Structure and Cellular Dynamics, France
Aurelie Kamoun, Institut Curie, France
Francois Radvanyi, Institut Curie, France
Vladimir B. Bajic, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Emmanuel Barillot, Institut Curie, Mines ParisTech, France
Session Chair: Dietlind Gerloff
Date: Monday, July 14, 3:40 pm - 4:05 pm
Author(s):
David Horn, Tel-Aviv University, Israel
Session Chair: Russell Schwartz
Date: Tuesday, July 15, 10:30 am - 10:55 am
Author(s):
Jaques Reifman, U.S. Army Medical Research and Materiel Command, United States
Vesna Memisevic, U.S. Army Medical Research and Materiel Command, United States
Nela Zavaljevski, U.S. Army Medical Research and Materiel Command, United States
Rembert Pieper, J. Craig Venter Institute, United States
Seesandra Rajagopala, J. Craig Venter Institute, United States
Keehwan Kwon, J. Craig Venter Institute, United States
Katherine Townsend, J. Craig Venter Institute, United States
Chenggang Yu, U.S. Army Medical Research and Materiel Command, United States
Xueping Yu, U.S. Army Medical Research and Materiel Command, United States
David DeShazer, U.S. Army Medical Research Institute of Infectious Diseases, United States
Jaques Reifman, U.S. Army Medical Research and Materiel Command, United States
Anders Wallqvist, U.S. Army Medical Research and Materiel Command, United States
Session Chair: Scott Markel
Date: Tuesday, July 15, 10:30 am - 10:55 am
Author(s):
Michael Brent, Washington University, United States
Session Chair: Robert F. Murphy
Date: Tuesday, July 15, 10:30 am - 10:55 am
Author(s):
Arne Elofsson, Stockholm University, Sweden
Sara Light, Stockholm University, Sweden
Rauan Sagit, Stockholm University, Sweden
Oxana Sachenkova, Stockholm University, Sweden
Diana Ekman, Stockholm University, Sweden
Session Chair: Predrag Radivojac
and deletion events, which affect the length of the protein. It is known that such indel events most frequently occur in surface-exposed loops. However, detailed analysis of indel events in distantly related and fast evolving proteins is hampered by the difficulty involved in correctly aligning such sequences. We circumvent this problem by first only analyzing homologous proteins based on length variation rather than pairwise alignments. Using this approach we find a surprisingly strong relationship between difference in length and difference in the number of intrinsically disordered residues, where up to 75% of the length variation can be explained by changes in the number of intrinsically disordered residues. Further, we find that disorder is common in both insertions and deletions. A more detailed analysis reveals that indel events do not induce disorder but rather that already disordered regions accrue indels, suggesting that there is a lowered selective pressure for indels within intrinsically disordered regions.
Date: Tuesday, July 15, 10:30 am - 10:55 am
Author(s):
Wei Cheng, UNC at Chapel Hill, United States
Xiang Zhang, Case Western Reserve University, United States
Zhishan Guo, UNC at Chapel Hill, United States
Yu Shi, University of Science and Technology of China, China
Wei Wang, University of California, Los Angeles, United States
Session Chair: Toni Kazic
To address the limitations of the existing methods, we propose Graph-regularized Dual Lasso (GDL), a robust approach for eQTL mapping. GDL integrates the correlation structures among genetic markers and traits simultaneously. It also takes into account the incompleteness of the networks and is robust to noise. GDL utilizes graph-based regularizers to model the prior networks and does not require an explicit clustering step. Moreover, it enables further refinement of the partial and noisy networks. We further generalize GDL to incorporate the location of genetic markers and gene pathway information. We perform extensive experimental evaluations using both simulated and real datasets. Experimental results demonstrate that the proposed methods can effectively integrate various available prior knowledge and significantly outperform the state-of-the-art eQTL mapping methods.
Date: Tuesday, July 15, 11:00 am - 11:25 am
Author(s):
Lars Kaderali, Technische Universität Dresden, Germany
Marco Binder, University of Heidelberg, Germany
Nurgazy Sulaimanov, University of Heidelberg, Germany
Diana Clausznitzer, Technische Universität Dresden, Germany
Manuel Schulze, Technische Universität Dresden, Germany
Cristian Hüber, University of Heidelberg, Germany
Simon Lenz, University of Heidelberg, Germany
Johannes Schloeder, University of Heidelberg, Germany
Martin Trippler, University Hospital Essen, Germany
Ralf Bartenschlager, University of Heidelberg, Germany
Volker Lohmann, University of Heidelberg, Germany
Session Chair: Scott Markel
Date: Tuesday, July 15, 11:00 am - 11:25 am
Author(s):
Jeroen de Ridder, Delft University of Technology, Netherlands
Johann de Jong, Netherlands Cancer Institute, Netherlands
Lodewyk Wessels, Netherlands Cancer Institute, Netherlands
Sepideh Babaei, Delft University of Technology, Netherlands
Marcel Reinders, Delft University of Technology, Netherlands
Waseem Akhtar, Netherlands Cancer Institute, Netherlands
Session Chair: Robert F. Murphy
Date: Tuesday, July 15, 11:00 am - 11:25 am
Author(s):
Yaara Arkin, Tel-Aviv University, Israel
Elior Rahmani, Tel Aviv University, Israel
Marcus E. Kleber, University of Heidelberg, Germany
Reijo Laaksonen, University of Tampere, Finland
Winfried Maerz, University of Heidelberg, Germany
Eran Halperin, Tel-Aviv University, Israel
Session Chair: Toni Kazic
Results: We present an efficient algorithm for detecting epistasis in quantitative GWAS, achieving a substantial runtime speedup by avoiding the need to exhaustively test all SNP pairs using metric embedding and random projections. Unlike previous metric embedding methods for case-control studies, we introduce a new embedding, where each SNP is mapped to two Euclidean spaces. We implemented our method in a tool named EPIQ (EPIstasis detection for Quantitative GWAS), and we show by simulations that EPIQ requires hours of processing time where other methods require days and sometimes weeks. Applying our method to a dataset from the Ludwigshafen Risk and Cardiovascular Health study discovered a pair of SNPs with a near-significant interaction (p = 2.2×10^-13), in only 1.5 hours on 10 processors.
Availability: https://github.com/yaarasegre/EPIQ
Date: Tuesday, July 15, 11:30 am - 11:55 am
Author(s):
Serghei Mangul, University of California, Los Angeles, United States
Nicholas Wu, University of California, Los Angeles, United States
Nicholas Mancuso, Georgia State University, United States
Alex Zelikovsky, Georgia State University, United States
Ren Sun, University of California, Los Angeles, United States
Eleazar Eskin, University of California, Los Angeles, United States
Session Chair: Toni Kazic
The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allows VGA to assemble rare variants. VGA utilizes an expectation-maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method which scales to millions of sequencing reads. The open source C++/Python implementation of VGA is freely available for download at http://genetics.cs.ucla.edu/vga/
Contact: serghei@cs.ucla.edu
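The expectation-maximization step mentioned in the abstract can be sketched generically in Python. This is a minimal sketch, not VGA's implementation. It assumes each read has already been mapped to the set of assembled variants it is compatible with, and that a read is equally likely to come from any position of a compatible variant. The function name and input format are hypothetical.

```python
from typing import Dict, List, Set


def em_abundances(read_hits: List[Set[str]], variants: List[str],
                  iterations: int = 200) -> Dict[str, float]:
    """Estimate variant frequencies from reads that may match several variants.

    read_hits[i] is the set of variants that read i is compatible with
    (an assumed input format, not VGA's).
    """
    freq = {v: 1.0 / len(variants) for v in variants}  # uniform start
    for _ in range(iterations):
        expected = {v: 0.0 for v in variants}
        for hits in read_hits:
            total = sum(freq[v] for v in hits)
            if total == 0.0:
                continue  # read matches no variant with nonzero frequency
            # E-step: split the read among compatible variants by frequency
            for v in hits:
                expected[v] += freq[v] / total
        assigned = sum(expected.values())
        if assigned == 0.0:
            break
        # M-step: new frequency is the share of fractionally assigned reads
        freq = {v: expected[v] / assigned for v in variants}
    return freq
```

Reads that match a single variant anchor the estimate, while ambiguous reads are divided in proportion to the current frequencies until the values stop changing.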
Date: Tuesday, July 15, 11:30 am - 11:55 am
Author(s):
Jianlin Cheng, University of Missouri Columbia, United States
Tuan Trieu, University of Missouri, Columbia, United States
Session Chair: Robert F. Murphy
Date: Tuesday, July 15, 11:30 am - 11:55 am
Author(s):
Ron Unger, Bar-Ilan University, Israel
Etai Jacob, Bar-Ilan University, Israel
Amnon Horovitz, Weizmann Institute, Israel
Session Chair: Predrag Radivojac
C-terminal domains. Given that folding rates are affected by chain length, we asked whether the tendency for N-terminal domains to be shorter than their neighboring C-terminal domains reflects selection for faster folding N-terminal domains. Calculations of contact order, another predictor of folding rate, provide additional evidence that N-terminal domains tend to fold faster than their C-terminal neighboring domains. A possible explanation for this bias, which is more pronounced in prokaryotes than in eukaryotes, is that faster folding of N-terminal domains reduces the risk of protein aggregation during folding by preventing formation of non-native interdomain interactions. This explanation is supported by our finding that two-domain proteins with a shorter N-terminal domain are more abundant than those with a shorter C-terminal domain.
Date: Tuesday, July 15, 11:30 am - 11:55 am
Author(s):
Iman Hajirasouliha, Brown University, United States
Ahmad Mahmoody, Brown University, United States
Ben Raphael, Brown University, United States
Session Chair: Toni Kazic
In this paper, we introduce the binary tree partition, a novel combinatorial formulation of the problem of constructing the subpopulations of tumor cells from the variant allele frequencies of somatic mutations. We show that finding a binary tree partition is an NP-complete problem; derive an approximation algorithm for an optimization version of the problem; and present a recursive algorithm to find a binary tree partition with errors in the input. We show that the resulting algorithm outperforms existing clustering approaches on simulated and real sequencing data.
Date: Tuesday, July 15, 12:00 pm - 12:25 pm
Author(s):
Haiyuan Yu, Cornell University, United States
Yu Guo, Cornell University, United States
Jishnu Das, Cornell University, United States
Hao Ran Lee, Cornell University, United States
Xiaomu Wei, Cornell University, United States
Jin Liang, Cornell University, United States
Robert Fragoza, Cornell University, United States
Adithya Sagar, Cornell University, United States
Xiujuan Wang, Cornell University, United States
Matthew Mort, Cardiff University, United Kingdom
Peter Stenson, Cardiff University, United Kingdom
David Cooper, Cardiff University, United Kingdom
Andrew Grimson, Cornell University, United States
Steven Lipkin, Weill Cornell Medical College, United States
Andrew Clark, Cornell University, United States
Session Chair: Toni Kazic
Date: Tuesday, July 15, 12:00 pm - 12:25 pm
Author(s):
Nelle Varoquaux, Mines ParisTech, France
Ferhat Ay, University of Washington, United States
William Noble, University of Washington, United States
Jean-Philippe Vert, Mines ParisTech, France
Session Chair: Robert F. Murphy
in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate three dimensional models of how chromosomes fold and fit into the nucleus. Many existing inference methods rely upon multidimensional scaling (MDS), in which the pairwise distances of the inferred model are optimized to resemble pairwise distances derived directly from the contact counts. These approaches, however, often optimize a heuristic objective function and require strong assumptions about the biophysics of DNA to transform interaction frequencies to spatial distance, and thereby may lead to incorrect structure reconstruction.
Methods: We propose a novel approach to infer a consensus three-dimensional structure of a genome from Hi-C data. The method incorporates a statistical model of the contact counts, assuming that the counts between two loci follow a Poisson distribution whose intensity decreases with the physical distances between the loci. The method can automatically adjust the transfer function relating the spatial distance to the Poisson intensity and infer a genome structure that best explains the observed data.
Results: We compare two variants of our Poisson method, with or without optimization of the transfer function, to four different MDS-based algorithms (two metric MDS methods using different stress functions, a nonmetric version of MDS, and ChromSDE, a recently described, advanced MDS method) on a wide range of simulated datasets. We demonstrate that the Poisson models reconstruct better structures than all MDS-based methods, particularly at low coverage and high resolution, and we highlight the importance of optimizing the transfer function. On publicly available Hi-C data from mouse embryonic stem cells, we show that the Poisson methods lead to more reproducible structures than MDS-based methods when we use data generated using different restriction enzymes, and when we reconstruct structures at different resolutions.
Availability: A Python implementation of the proposed method is available at http://cbio.ensmp.fr/pastis.
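The Poisson model described in the Methods can be made concrete with a small Python sketch of the objective it implies. The power-law transfer function `lambda_ij = beta * d_ij ** alpha`, the default parameter values, and the function name are assumptions for illustration. The abstract only states that the intensity decreases with distance, and this is not the code distributed at the URL above.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def poisson_nll(coords: List[Point], counts: Dict[Tuple[int, int], int],
                alpha: float = -3.0, beta: float = 1.0) -> float:
    """Negative Poisson log-likelihood of contact counts given 3D coordinates.

    Assumes intensity lambda_ij = beta * d_ij ** alpha with alpha < 0, so the
    expected count decays with distance. Loci must not coincide (d_ij > 0).
    """
    nll = 0.0
    for (i, j), c in counts.items():
        d = math.dist(coords[i], coords[j])
        lam = beta * d ** alpha
        # -log P(c | lam) = lam - c * log(lam) + log(c!)
        nll += lam - c * math.log(lam) + math.lgamma(c + 1)
    return nll
```

A structure would be inferred by minimizing this value over the coordinates, and optionally over `alpha` and `beta`, with a numerical optimizer. Candidate structures can be compared directly by their scores.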
Date: Tuesday, July 15, 12:00 pm - 12:25 pm
Author(s):
Mengfei Cao, Tufts University, United States
Christopher Pietras, Tufts University, United States
Xian Feng, Tufts University, United States
Kathryn Doroschak, University of Minnesota, United States
Thomas Schaffner, Tufts University, United States
Jisoo Park, Tufts University, United States
Hao Zhang, Tufts University, United States
Lenore Cowen, Tufts University, United States
Benjamin Hescott, Tufts University, United States
Session Chair: Predrag Radivojac
noise as well as edge directions and known pathway information into the representation of protein-protein interaction networks might improve their utility for functional inference. However, a simple way to do this has not been obvious. We find that DSD, our recent diffusion-based metric for measuring dissimilarity in protein-protein interaction (PPI) networks, has natural extensions that incorporate confidence, directions, and can even express coherent pathways by calculating DSD on an augmented graph.
Results: We define three incremental versions of DSD which we term cDSD, caDSD, and capDSD, where the capDSD matrix incorporates confidence, known directed edges, and pathways into the measure of how similar each pair of nodes is according to the structure of the PPI network. We test four popular function prediction methods (majority vote, weighted majority vote, multiway cut, and functional flow) using these different matrices on the baker's yeast PPI network in cross-validation. The best performing method is weighted majority vote using capDSD. We then test the performance of our augmented DSD methods on an integrated heterogeneous set of protein association edges from the STRING database. The superior performance of capDSD in this context confirms that treating the pathways as probabilistic units is more powerful than simply incorporating pathway edges independently into the network.
Availability: All source code for calculating the confidences, for extracting pathway information from KEGG XML files, and for calculating the cDSD, caDSD and capDSD matrices is available from http://dsd.cs.tufts.edu/capdsd
Date: Tuesday, July 15, 12:00 pm - 12:25 pm
Author(s):
Quaid Morris, University of Toronto, Canada
Wei Jiao, Ontario Institute for Cancer Research, Canada
Shankar Vembu, University of Toronto, Canada
Amit Deshwar, University of Toronto, Canada
Lincoln Stein, Ontario Institute for Cancer Research, Canada
Session Chair: Toni Kazic
Date: Tuesday, July 15, 2:00 pm - 2:25 pm
Author(s):
Luay Nakhleh, Rice University, United States
Session Chair: Cenk Sahinalp
Date: Tuesday, July 15, 2:00 pm - 2:25 pm
Author(s):
Michael Kramer, University of California, San Diego, United States
Janusz Dutkowski, University of California, San Diego, United States
Michael Yu, University of California, San Diego, United States
Vineet Bafna, University of California, San Diego, United States
Trey Ideker, University of California, San Diego, United States
Session Chair: Lenore Cowen
(1) analyze a full matrix of gene–gene pairwise similarities from -omics data;
(2) infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and
(3) respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher level terms.
Methods addressing these requirements are just beginning to emerge—none has been evaluated for GO inference.
Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method’s ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast.
Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ~30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20–25% precision, recall).
Conclusion: This study provides algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data.
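The phrase "maximal cliques in a network induced by progressive thresholding of a similarity matrix" can be illustrated with a simplified Python sketch. It is not the published CliXO algorithm. It rebuilds the thresholded graph at each level and enumerates cliques with a basic Bron-Kerbosch recursion, and the function names and input format are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            found.append(frozenset(r))  # r cannot be extended: maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v is done as a candidate
            x = x | {v}  # and excluded from later branches

    expand(set(), set(adj), set())
    return found


def terms_by_threshold(sim: Dict[Tuple[str, str], float],
                       thresholds: List[float]) -> List[Tuple[float, FrozenSet[str]]]:
    """Record each gene set (size >= 2) at the highest threshold where it
    first appears as a maximal clique of the thresholded similarity graph."""
    genes = sorted({g for pair in sim for g in pair})
    seen: Set[FrozenSet[str]] = set()
    terms: List[Tuple[float, FrozenSet[str]]] = []
    for t in sorted(thresholds, reverse=True):
        adj: Dict[str, Set[str]] = {g: set() for g in genes}
        for (a, b), s in sim.items():
            if s >= t:
                adj[a].add(b)
                adj[b].add(a)
        for clique in maximal_cliques(adj):
            if len(clique) >= 2 and clique not in seen:
                seen.add(clique)
                terms.append((t, clique))
    return terms
```

Because cliques may overlap, a gene or a smaller term can sit under more than one larger term, which is how a clique-based construction can respect the pleiotropy requirement listed above.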
Date: Tuesday, July 15, 2:00 pm - 2:25 pm
Author(s):
Cristina G. Ghiurcuta, EPFL, Switzerland
Bernard M.E. Moret, EPFL, Switzerland
Session Chair: Alex Bateman
function of genomes by translating knowledge gained about some genomes to the object of study. Early approaches used pairwise comparisons, but today researchers are attempting to leverage the larger potential of multiway comparisons. Comparative genomics relies on the structuring of genomes into syntenic blocks: blocks of sequence that exhibit conserved features across the genomes. Syntenic blocks are required for complex computations to scale to the billions of nucleotides present in many genomes; they enable comparisons across broad ranges of genomes because they filter out much of the individual variability; they highlight candidate regions for in-depth studies; and they facilitate whole-genome comparisons through visualization tools. However, the concept of syntenic block remains loosely defined. Tools for the identification of syntenic blocks yield quite different results, thereby preventing a systematic assessment of the next steps in an analysis. Current tools do not include measurable quality objectives and thus cannot be benchmarked against themselves. Comparisons among tools have also been neglected; what few results are given use superficial measures unrelated to quality or consistency.
Results: We present a theoretical model as well as an experimental basis with quality measures for comparing syntenic blocks and thus also for improving or designing tools for the identification of syntenic blocks. We illustrate the application of the model and the measures by applying them to syntenic blocks produced by 3 different contemporary tools (DRIMM-Synteny, i-ADHoRe and Cyntenator) on a dataset of 8 yeast genomes. Our findings highlight the need for a well-founded, systematic approach to the decomposition of genomes into syntenic blocks. Our experiments demonstrate widely divergent results among these tools, throwing into question the robustness of the basic approach in comparative genomics. We have taken the first step towards a formal approach to the construction of syntenic blocks by developing a simple quality criterion based on sound evolutionary principles.
Date: Tuesday, July 15, 2:00 pm - 2:25 pm
Author(s):
Matan Hofree, University of California, San Diego, United States
John P. Shen, University of California, San Diego, United States
Andrew Gross, University of California, San Diego, United States
Hannah Carter, University of California, San Diego, United States
Session Chair: Paul Horton
Date: Tuesday, July 15, 2:30 pm - 2:55 pm
Author(s):
Marc Hulsman, Delft University of Technology, Netherlands
Christos Dimitrakopoulos, ETH Zurich, Switzerland
Jeroen de Ridder, Delft University of Technology, Netherlands
Session Chair: Cenk Sahinalp
Results: In this work, we derive generalized, scale-aware versions of known graph-topological measures based on diffusion kernels. We apply these to characterize the topology of networks across all scales simultaneously, generating a so-called graph-topological scale-space. The comprehensive physical interaction network in yeast is used to show that scale-space based measures consistently give superior performance when distinguishing protein functional categories and three major types of functional interactions: genetic interaction, co-expression and perturbation interactions. Moreover, we demonstrate that graph-topological scale-spaces capture biologically meaningful features that provide new insights into the link between function and protein network architecture.
Availability: Matlab code to calculate the STMs is available from: http://bioinformatics.tudelft.nl/TSSA
Contact: j.deridder@tudelft.nl
Date: Tuesday, July 15, 2:30 pm - 2:55 pm
Author(s):
Anika Oellrich, Wellcome Trust Sanger Institute, United Kingdom
Julius Jacobsen, Wellcome Trust Sanger Institute, United Kingdom
Irene Papatheodorou, Wellcome Trust Sanger Institute, United Kingdom
The Sanger Mouse Genetics Project, Wellcome Trust Sanger Institute, United Kingdom
Damian Smedley, Wellcome Trust Sanger Institute, United Kingdom
Session Chair: Lenore Cowen
Results: In this study, we applied an association rule mining approach to the identification of promising secondary phenotype candidates. The predictions rely on a large gene–phenotype annotation set that is used to find occurrence patterns of phenotypes. Applying an association rule mining approach, we could identify 1,967 secondary phenotype hypotheses that cover 243 genes and 136 phenotypes. Using two automated and one manual evaluation strategies, we demonstrate that the secondary phenotype candidates possess biological relevance to the genes they are predicted for. From the results we conclude that the predicted secondary phenotypes constitute good candidates to be experimentally tested and confirmed.
Availability: The secondary phenotype candidates can be browsed at http://www.sanger.ac.uk/resources/databases/phenodigm/gene/secondaryphenotype/list
Contact: ao5@sanger.ac.uk
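The association rule mining step can be sketched in Python for the simplest case of rules between single phenotypes. The thresholds, function names, and the restriction to pairwise rules are assumptions for illustration, and the study's actual mining procedure may differ.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, str, int, float]  # (antecedent, consequent, support, confidence)


def pairwise_rules(annotations: Dict[str, Set[str]], min_support: int = 2,
                   min_confidence: float = 0.6) -> List[Rule]:
    """Find rules A -> B, read as 'genes annotated with A tend to have B'."""
    phenotypes = sorted({p for ps in annotations.values() for p in ps})
    rules: List[Rule] = []
    for a, b in permutations(phenotypes, 2):
        with_a = [g for g, ps in annotations.items() if a in ps]
        both = [g for g in with_a if b in annotations[g]]
        confidence = len(both) / len(with_a)  # with_a is never empty here
        if len(both) >= min_support and confidence >= min_confidence:
            rules.append((a, b, len(both), confidence))
    return rules


def secondary_candidates(annotations: Dict[str, Set[str]],
                         rules: List[Rule]) -> List[Tuple[str, str]]:
    """Suggest phenotype B for genes annotated with A but not yet with B."""
    return sorted({(g, b) for a, b, _, _ in rules
                   for g, ps in annotations.items() if a in ps and b not in ps})
```

Each returned (gene, phenotype) pair is only a hypothesis to be tested experimentally, in line with the abstract's conclusion.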
Date: Tuesday, July 15, 2:30 pm - 2:55 pm
Author(s):
Kourosh Zarringhalam, UMass Boston/Pfizer, United States
Ahmed Enayetallah, Biogen Idec, United States
Padmalatha Reddy, Pfizer, United States
Daniel Ziemek, Pfizer, Germany
Session Chair: Alex Bateman
Methods: We propose a method that utilizes patient-level genome-wide expression data in conjunction with causal networks based on prior knowledge. Our approach infers a differential expression profile for each patient and uses a Bayesian approach to infer corresponding upstream regulators. These regulators and their corresponding posterior probabilities of activity are used in a regularized regression framework to predict response.
Results: We validated our approach using two clinically relevant phenotypes, namely acute rejection in kidney transplantation and response to Infliximab in ulcerative colitis. To demonstrate pitfalls in translating trained predictors across independent trials, we analyze performance characteristics of our and alternative approaches on two independent datasets for each phenotype and show that the proposed approach is able to successfully incorporate causal prior knowledge to give robust performance estimates.
Date: Tuesday, July 15, 2:30 pm - 2:55 pm
Author(s):
Hsin-Ta Wu, Brown University, United States
Iman Hajirasouliha, Brown University, United States
Benjamin Raphael, Brown University, United States
Session Chair: Paul Horton
We derive independent and recurrent copy number aberrations as maximal cliques in an interval graph constructed from overlaps between aberrations. We efficiently enumerate all such cliques, and derive a dynamic programming algorithm to find an optimal selection of non-overlapping cliques, resulting in a very fast algorithm, which we call RAIG (Recurrent Aberrations from Interval Graphs). We show that RAIG outperforms other methods on simulated data and performs well on data from three cancer types from The Cancer Genome Atlas (TCGA). In contrast to existing approaches that employ various heuristics to select independent aberrations, RAIG optimizes a well-defined objective function. We show that this allows RAIG to identify rare aberrations that are likely functional, but are obscured by overlaps with larger passenger aberrations.
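The two steps described above, enumerating maximal cliques of an interval graph and then selecting non-overlapping cliques by dynamic programming, can be sketched in Python. This is a minimal sketch under stated assumptions, not the RAIG implementation. Aberrations are closed intervals on one chromosome, two cliques count as overlapping when they share an aberration, and a clique's weight is its size. All function names are hypothetical.

```python
from typing import Callable, FrozenSet, List, Tuple

Interval = Tuple[float, float]  # closed interval (start, end), start <= end


def interval_cliques(intervals: List[Interval]) -> List[FrozenSet[int]]:
    """Maximal cliques of the interval graph, left to right, by a sweep."""
    events = []
    for idx, (start, end) in enumerate(intervals):
        events.append((start, 0, idx))  # 0 sorts first: starts precede ends at ties
        events.append((end, 1, idx))
    events.sort()
    active, grew, cliques = set(), False, []
    for _, kind, idx in events:
        if kind == 0:
            active.add(idx)
            grew = True
        else:
            if grew:  # active set is maximal just before a member ends
                cliques.append(frozenset(active))
                grew = False
            active.remove(idx)
    return cliques


def select_disjoint(cliques: List[FrozenSet[int]],
                    weight: Callable[[FrozenSet[int]], float] = len) -> List[FrozenSet[int]]:
    """Maximum-weight set of cliques that share no interval (assumed objective)."""
    n = len(cliques)
    best = [0.0] * (n + 1)  # best[k]: optimum using only the first k cliques
    prev = [0] * (n + 1)
    taken = [False] * (n + 1)
    for k in range(1, n + 1):
        c = cliques[k - 1]
        j = k - 1
        # Cliques containing a given interval are consecutive in sweep order,
        # so every clique up to the last disjoint one is also disjoint from c.
        while j > 0 and cliques[j - 1] & c:
            j -= 1
        prev[k] = j
        take = weight(c) + best[j]
        taken[k] = take > best[k - 1]
        best[k] = max(take, best[k - 1])
    chosen, k = [], n
    while k > 0:  # trace back the decisions
        if taken[k]:
            chosen.append(cliques[k - 1])
            k = prev[k]
        else:
            k -= 1
    return chosen[::-1]
```

In a case where one long aberration spans two smaller clusters, the selection keeps the heavier cluster and any clusters that share no aberration with it.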
Date: Tuesday, July 15, 3:00 pm - 3:25 pm
Author(s):
Rolf Backofen, University of Freiburg, Germany
Sita Lange, University of Freiburg, Germany
Daniel Maticzka, University of Freiburg, Germany
Fabrizio Costa, University of Freiburg, Germany
Session Chair: Cenk Sahinalp
We provide a solution by learning an accurate protein-binding model based on an efficient graph-kernel approach that learns sequence-structure properties from several thousand binding sites. Transcripts targeted in any other cells can be identified with high specificity. For example, we show that the up-regulation in an AGO knockdown cannot be explained with existing AGO-CLIP-seq data, but can be explained using our predictions.
Date: Tuesday, July 15, 3:00 pm - 3:25 pm
Author(s):
Peter Robinson, Charité University Hospital, Germany
Sebastian Köhler, Charité, Germany
Anika Oellrich, Sanger Institute, United Kingdom
Kai Wang, UCS, United States
Christopher Mungall, Lawrence Berkeley National Laboratory, United States
Suzanna Lewis, Lawrence Berkeley National Laboratory, United States
Nicole Washington, Lawrence Berkeley National Laboratory, United States
Sebastian Bauer, Charité, Germany
Dominik Seelow, Charité, United States
Peter Krawitz, Charité, Germany
Christian Gilissen, Nijmegen, Netherlands
Melissa Haendel, U Oregon, United States
Damian Smedley, Sanger Institute, United Kingdom
Session Chair: Lenore Cowen
Date: Tuesday, July 15, 3:00 pm - 3:25 pm
Author(s):
A. Ercument Cicek, Carnegie Mellon University, United States
Kathryn Roeder, Carnegie Mellon University, United States
Gultekin Ozsoyoglu, Case Western Reserve University, United States
Session Chair: Alex Bateman
Results: We apply MIRA to gene expression analysis of six knock-out strains of E. coli, and show that MIRA captures the underlying metabolic dynamics of the switch from aerobic to anaerobic respiration. We also apply MIRA to an Autism Spectrum Disorder gene expression dataset. Results indicate that MIRA reports metabolites that highly overlap with recently found metabolic biomarkers in the autism literature. Overall, MIRA is a promising algorithm for detecting metabolic drug targets and understanding the relation between gene expression and metabolic activity.
Date: Tuesday, July 15, 3:00 pm - 3:25 pm
Author(s):
Giovanni Ciriello, Memorial Sloan Kettering Cancer Center, United States
Martin Miller, Memorial Sloan Kettering Cancer Center, United States
Bulent Arman Aksoy, Memorial Sloan Kettering Cancer Center, United States
Yasin Senbabaoglu, Memorial Sloan Kettering Cancer Center, United States
Nikolaus Schultz, Memorial Sloan Kettering Cancer Center, United States
Chris Sander, Memorial Sloan Kettering Cancer Center, United States
Session Chair: Paul Horton
We derived a hierarchical classification of 3,299 TCGA tumors from 12 cancer types. The top classes are dominated by either mutations (M class) or copy number changes (C class).
This distinction is clearest at the extremes of genomic instability, reflecting different oncogenic processes. The full hierarchy shows event signatures characteristic of cross-tissue tumor classes. Targetable functional events are suggestive of class-specific combination therapy. These results may assist in the definition of clinical trials to match actionable oncogenic signatures with personalized therapies.
Date: Tuesday, July 15, 3:30 pm - 3:55 pm
Author(s):
Noam Kaplan, University of Massachusetts Medical School, United States
Job Dekker, University of Massachusetts Medical School, United States
Session Chair: Cenk Sahinalp
We have developed a high-throughput scaffolding approach, based on the notion that loci that are near each other in the genomic sequence have a high probability of interacting with each other. We demonstrate that genome-wide in vivo chromatin interaction frequency measurements can be used as genomic distance proxies to accurately detect the positions of contigs over large distances without requiring any sequence overlap. Furthermore, we demonstrate our approach can karyotype and scaffold an entire genome de novo. Applying our approach to incomplete regions of the human genome, we predict the positions of 65 previously unplaced contigs, in agreement with alternative methods. Our approach can theoretically bridge any gap size, is simple, robust, scalable and applicable to any species.
Date: Tuesday, July 15, 3:30 pm - 3:55 pm
Author(s):
James Costello, University of Colorado Anschutz Medical Campus, United States
Laura Heiser, OHSU, United States
Elisabeth Georgii, Aalto University, Finland
Michael Menden, EMBL, United Kingdom
Nicholas Wang, OHSU, United States
Mukesh Bansal, Columbia University, United States
Mohammad Ammad-ud-din, Aalto University, Finland
Petteri Hintsanen, University of Helsinki, Finland
Suleiman Khan, Aalto University, Finland
John-Patrick Mpindi, University of Helsinki, Finland
Olli Kallioniemi, University of Helsinki, Finland
Antti Honkela, University of Helsinki, Finland
Tero Aittokallio, University of Helsinki, Finland
Krister Wennerberg, University of Helsinki, Finland
James Collins, Boston University, United States
Dan Gallahan, NIH, United States
Dinah Singer, NIH, United States
Julio Saez-Rodriguez, EMBL, United Kingdom
Samuel Kaski, Aalto University, Finland
Joe Gray, OHSU, United States
Gustavo Stolovitzky, IBM, United States
Mehmet Gonen, Aalto University, Finland
Session Chair: Lenore Cowen
Date: Tuesday, July 15, 3:30 pm - 3:55 pm
Author(s):
Masaaki Kotera, Kyoto University, Japan
Yasuo Tabei, Japan Science and Technology Agency, Japan
Yoshihiro Yamanishi, Kyushu University, Japan
Ai Muto, Kyoto University, Japan
Yuki Moriya, Kyoto University, Japan
Toshiaki Tokimatsu, Kyoto University, Japan
Susumu Goto, Kyoto University, Japan
Session Chair: Alex Bateman
Results: In this paper we develop a novel method to predict the multi-step reaction sequences for de novo reconstruction of metabolic pathways in the reaction-filling framework. We propose a supervised approach to learn what we refer to as "multi-step reaction sequence likeness", i.e., whether or not a compound-compound pair is possibly converted to each other by a sequence of enzymatic reactions. In the algorithm we propose a recursive procedure of using step-specific classifiers to predict the intermediate compounds in the multi-step reaction sequences, based on chemical substructure fingerprints of compounds. In the results, we demonstrate the usefulness of our proposed method on the prediction of enzymatic reaction networks from a metabolome-scale compound set, and discuss characteristic features of the extracted chemical substructure transformation patterns in multi-step reaction sequences. Our comprehensively predicted reaction networks help to fill the metabolic gap and to infer new reaction sequences in metabolic pathways.
Date: Tuesday, July 15, 3:30 pm - 3:55 pm
Author(s):
Rotem Ben-Hamo, Bar Ilan University, Israel
Session Chair: Paul Horton