20th Annual International Conference on
Intelligent Systems for Molecular Biology

Paper Presentation Schedule

AA01 -

Date: Monday, July 16, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA02 -

Date: Tuesday, July 17, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA03 -

Date: Monday, July 16, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA04 -

Date: Tuesday, July 17, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA05 -

Date: Monday, July 16, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA06 -

Date: Tuesday, July 17, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA07 -

Date: Monday, July 16, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA08 -

Date: Tuesday, July 17, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA09 -

Date: Monday, July 16, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA10 -

Date: Tuesday, July 17, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA11 -

Date: Monday, July 16, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA12 -

Date: Tuesday, July 17, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA13 -

Date: Monday, July 16, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA14 -

Date: Tuesday, July 17, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA15 -

Date: Monday, July 16, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA16 -

Date: Tuesday, July 17, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

AA17 -

Date: Monday, July 16, 12:40 p.m. - 2:30 p.m., Room: Exhibit Hall B

KN1 - Seeing forward by looking back

Date: Sunday, July 15, 9:00 a.m. – 10:00 a.m., Room: Ballroom
Presenting author: Richard H. Lathrop, Lawrence Hunter

KN2 - Data integration for understanding dynamic biological systems

Date: Sunday, July 15, 4:30 p.m. – 5:30 p.m., Room: Ballroom
Presenting author: Ziv Bar-Joseph, Carnegie Mellon University, United States

KN3 - Analysis of transcriptome structure and chromatin landscapes

Date: Monday, July 16, 9:00 a.m. – 10:00 a.m., Room: Ballroom
Presenting author: Barbara Wold, California Institute of Technology, Pasadena, United States

KN4 - Progress, challenges and opportunities in population genome sequencing

Date: Monday, July 16, 4:30 p.m. – 5:30 p.m., Room: Ballroom
Presenting author: Richard Durbin, Wellcome Trust Sanger Institute, United Kingdom

KN5 - Integrative Structural Biology

Date: Tuesday, July 17, 9:00 a.m. - 10:00 a.m., Room: Ballroom
Presenting author: Andrej Sali, University of California, San Francisco, United States

KN6 - The other Third: Coming to grips with membrane proteins

Date: Tuesday, July 17, 4:30 p.m. – 5:30 p.m., Room: Ballroom
Presenting author: Gunnar von Heijne, Stockholm University, Sweden

LBR01 - Identifying tissue specificity of protein complexes based on a global map of human expression data

Date: Sunday, July 15, 10:45 a.m. - 11:10 a.m., Room: 202B/C
Presenting author: Daniela Börnigen, Harvard University, Harvard School of Public Health, United States

Additional authors:
Daniela Börnigen, Harvard School of Public Health, Harvard University, United States
Tune Pers, Technical University of Denmark, Denmark
Lieven Thorrez, KU Leuven, Belgium
Curtis Huttenhower, Harvard School of Public Health, Harvard University, United States
Yves Moreau, KU Leuven, Belgium
Søren Brunak, Technical University of Denmark, Denmark

Session Chair: Carl Kingsford

Disease-causing human genetic variants are often highly tissue specific, but for most disease genes the primarily affected tissue is unknown. We hypothesized that the degree of coordinated expression between genes coding for distinct protein complex subunits might pinpoint the tissues in which linked diseases are manifested.

We thus developed a method to predict the tissue involvement of disease-linked protein complexes. For each susceptibility gene, we ranked tissues according to the gene’s concordant expression with its protein interaction partners under normal conditions. The analysis thus combined a high-quality human interactome, its constituent set of protein complexes, a global map of human gene expression data in healthy tissues, and a predefined set of disease-linked genes.

We validated our hypothesis using this method by comparing our predictive tissue ranking with a literature-based gold standard ranking of 260 unique protein-disease associations across 35 tissues. Our predictions achieved an average AUC of 0.78 over all tissues, with some (such as adipose or placental) tissues obtaining AUCs over 0.9. These high values were due to less heterogeneous cell types within the tissues, in contrast to tissues such as the blood or lymphatic system, in which tissue-specific disease involvement proved more difficult to predict. Our overall accuracy, however, suggests that the degree of coordinated expression of a disease gene and its protein interaction partners indeed provides insight into which tissue is most likely to be affected or causal in human disease.
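
As a rough illustration of the ranking-and-evaluation idea in this abstract (not the authors' implementation), the sketch below scores each tissue by how strongly a gene and its interaction partners are expressed together there, then computes an AUC against a gold-standard tissue set. The scoring rule (the minimum of the gene's z-score and the mean partner z-score), the function names, and the toy expression values are all assumptions made for the example.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize one gene's expression values across tissues."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]

def rank_tissues(gene_expr, partner_exprs, tissues):
    """Rank tissues by a toy concordance score (an assumption, not the published score)."""
    gz = zscores(gene_expr)
    pz = [zscores(p) for p in partner_exprs]
    scores = {}
    for i, tissue in enumerate(tissues):
        partner_mean = mean(p[i] for p in pz)
        # The score is high only when the gene AND its partners are both elevated.
        scores[tissue] = min(gz[i], partner_mean)
    ranking = sorted(tissues, key=lambda t: scores[t], reverse=True)
    return ranking, scores

def auc(scores, positives):
    """AUC as the fraction of (positive, negative) tissue pairs ranked correctly."""
    pos = [s for t, s in scores.items() if t in positives]
    neg = [s for t, s in scores.items() if t not in positives]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical expression values for one disease gene and two interaction partners.
tissues = ["heart", "liver", "brain", "muscle"]
ranking, scores = rank_tissues([9, 1, 2, 8], [[8, 2, 1, 9], [10, 1, 3, 7]], tissues)
print(ranking)
print(auc(scores, {"heart", "muscle"}))
```

Averaging such per-gene AUCs over all gold-standard associations would give a summary figure comparable in kind to the average AUC reported above.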

LBR02 - Quantifying the Systemic Consequences of Point Mutations in Proteins through Pathway Dynamics and Protein Structures

Date: Sunday, July 15, 11:15 a.m. - 11:40 a.m., Room: 202B/C
Presenting author: Tammy Cheng, Cancer Research UK London Research Institute, United Kingdom

Additional authors:
Tammy Cheng, Cancer Research UK London Research Institute, United Kingdom
Lucas Goehring, Max Planck Institute, Germany
Linda Jeffery, Cancer Research UK London Research Institute, United Kingdom
Yu-En Lu, University of Cambridge, United Kingdom
Jacqueline Hayles, Cancer Research UK London Research Institute, United Kingdom
Béla Novák, University of Oxford, United Kingdom
Paul Bates, Cancer Research UK London Research Institute, United Kingdom

Session Chair: Carl Kingsford

Gauging the systemic effect of point mutations in proteins is an important topic in the current post-GWAS era. However, it is not a trivial task to understand how a change at the protein structure level eventually affects a cell's phenotypic outcome. This is because complex, multi-scale information, ranging from proteins to pathways, is usually required for obtaining analytical results with physiological meaning. Because the idea of integrating both protein and pathway dynamics to estimate the systemic impact of point mutations in proteins remains largely unexplored, we investigate the practicality of this approach by formulating mathematical models to study point mutations that involve the cell cycle control mechanism (G2 to Mitosis transition) in yeast and the neuro-cardio-facial-cutaneous syndrome associated with the human MAPK signalling pathway.

LBR03 - Regulatory Network Structure as the Dominant Determinant of Transcription Factor Evolutionary Rate in Yeast

Date: Sunday, July 15, 11:45 a.m. - 12:10 p.m., Room: 202B/C
Presenting author: Jasmin Coulombe-Huntington, Boston University, United States

Session Chair: Carl Kingsford

The evolution of transcriptional regulatory networks has thus far mostly been studied at the level of cis-regulatory elements. However, since trans-level variation is known to account for much of the gene expression variation between strains, studying the evolution of trans-factors is crucial to understanding regulatory network evolution. Here, we systematically assess the different genomic and network-level determinants of transcription factor (TF) evolutionary rate in yeast and how they compare to those of generic proteins. We develop a novel method to demonstrate that transcription factors possess significantly distinct trends relating evolutionary rate to various genomic features, such as mRNA expression level, codon adaptation index, the evolutionary rate of physical interaction partners, and, confirming previous reports, to protein-protein interaction degree and regulatory in-degree. We then go on to show that the strongest predictor of transcription factor evolutionary rate is the median evolutionary rate of its target genes, followed by the fraction of target genes that are species-specific. After decomposing the regulatory network into positive and negative edges, we found that this effect is limited to activating regulatory relationships. This work is the first to establish the modularity of TF-target protein evolution and highlights key evolutionary differences between positive and negative regulation systems. We have also demonstrated that systems-level properties can leave evolutionary traces of comparable effect size to physical features such as interaction degree and expression level and that TF evolution in particular is best understood through a regulatory network-level perspective.

LBR04 - Global and specific Regulation of mRNA Decay analyzed by Dynamic Transcriptome Analysis

Date: Sunday, July 15, 12:15 p.m. - 12:40 p.m., Room: 202B/C
Presenting author: Achim Tresch, Ludwig-Maximilians-University Munich, Germany

Session Chair: Carl Kingsford

To measure eukaryotic mRNA turnover, we developed comparative Dynamic Transcriptome Analysis (cDTA). cDTA provides absolute rates of mRNA synthesis and decay in Saccharomyces cerevisiae (Sc) cells with the use of Schizosaccharomyces pombe (Sp) as an internal standard. We apply cDTA to Sc mutants of the transcription and degradation machinery. We find that mutants with decreased degradation also show decreased transcriptional activity. Surprisingly, this negative feedback is mutual, i.e., mutants that are globally impaired in their RNA synthesis have a globally decreased decay. Extended kinetic modeling reveals that this mutual feedback is achieved by a factor that inhibits synthesis and a factor that enhances degradation.

LBR05 - Fractionation, rearrangement and subgenome dominance

Date: Sunday, July 15, 2:30 p.m. - 2:55 p.m., Room: 202A
Presenting author: David Sankoff, University of Ottawa, Canada

Session Chair: Olga Vitek

Fractionation, the loss of duplicate genes after whole genome duplication (WGD), causes more gene order disruption than classical chromosomal rearrangements such as inversion or reciprocal translocation. WGD and fractionation are particularly prevalent in flowering plants. Gene order disruption follows from the partly random choice of which of the two copies is deleted. This artificially inflates the inferred amount of chromosomal rearrangement observed between the WGD descendant and an unduplicated sister genome. Our work is designed to computationally detect, characterize and correct for this impediment to the study of evolution.

We developed the "consolidation algorithm" to assess and correct for the gross errors in rearrangement inference caused by fractionation. In simulations our procedure almost completely wipes out this distortion.

In applying our method to the poplar genome, an ancient tetraploid, compared to a diploid sister genome, grapevine, we discovered that the majority of the apparent rearrangement is actually attributable to fractionation. Among the consolidated regions detected by our algorithm, a number are much longer than those in the simulations, suggesting a non-independence of deletion events affecting neighboring genes and a clear tendency for genes to be deleted in one of the two homeologs, as would be predicted by the recent theory of subgenome dominance.
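
To make concrete how fractionation alone can mimic rearrangement, the following sketch simulates random loss of one copy of each duplicated gene and counts ancestral adjacencies that no longer appear on either homeolog, even though no rearrangement has occurred. It is a simplified toy model, not the consolidation algorithm itself; the function names and the `p_delete_from_b` bias parameter (a crude stand-in for subgenome dominance) are assumptions.

```python
import random

def fractionate(n_genes, p_delete_from_b=0.5, seed=0):
    """After WGD, keep each gene on exactly one of two homeologs (A or B)."""
    rng = random.Random(seed)
    copy_a, copy_b = [], []
    for gene in range(n_genes):
        if rng.random() < p_delete_from_b:
            copy_a.append(gene)  # the copy on B is deleted, A retains the gene
        else:
            copy_b.append(gene)  # the copy on A is deleted, B retains the gene
    return copy_a, copy_b

def lost_adjacencies(n_genes, copy_a, copy_b):
    """Count ancestral neighbours (g, g+1) that are adjacent on neither homeolog."""
    kept = set()
    for chrom in (copy_a, copy_b):
        kept.update(zip(chrom, chrom[1:]))
    return sum((g, g + 1) not in kept for g in range(n_genes - 1))

n = 1000
a, b = fractionate(n)
print(lost_adjacencies(n, a, b))                      # unbiased deletion
print(lost_adjacencies(n, *fractionate(n, 1.0)))      # all deletions on homeolog B
```

In this toy model, unbiased deletion breaks roughly half of the ancestral adjacencies, while fully biased deletion breaks none, which is the kind of distortion a naive rearrangement count would misread as inversions or translocations.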

LBR06 - Internal pseudo-symmetry in proteins

Date: Sunday, July 15, 3:00 p.m. - 3:25 p.m., Room: 202A
Presenting author: Andreas Prlic, University of California San Diego, United States

Additional authors:
Andreas Prlic, UCSD, United States
Spencer Bliven, UCSD, United States
Philippe Youkharibache, InPharmatics Corporation, United States
Peter Rose, UCSD, United States
Phil Bourne, UCSD, United States

Session Chair: Olga Vitek

Symmetry in the quaternary structure of proteins is frequently associated with function. For example, symmetry plays a prominent role in models of enzyme activity. While the observation of symmetry in quaternary structure goes back to the very first protein structures, more and more cases of pseudo-symmetry within protein domains have been described. It is hypothesized that such symmetries can be linked to function and folding of proteins. Here, we attempt to verify this hypothesis both by systematically detecting pseudo-symmetry via a new algorithm and by manually investigating crafted alignments of symmetric proteins. The new algorithm detects internal pseudo-symmetry and repeats in protein chains and is available in the software CE-Symm. By applying it systematically we can detect such structural features in many examples that have previously not been described. We investigate the hypothesis that symmetry is related to function by manually analyzing many of the detected cases. Our results show that symmetry plays an important functional role not only in quaternary structure, but also within protein chains. We can identify local alignments between distant folds, in which symmetric subunits, here called “protodomains”, are conserved. This allows us to gain novel insights into distant evolutionary relationships. Knowledge of internal symmetry is important for a better understanding of evolution, function and folding, and newly resolved protein structures should be investigated for hidden internal pseudo-symmetries.

LBR07 - Technology to identify global dynamics of protein interaction networks

Date: Sunday, July 15, 3:30 p.m. - 3:55 p.m., Room: 202A
Presenting author: Nozomu Yachie, University of Toronto, Canada

Additional authors:
Nozomu Yachie, University of Toronto, Canada
Sedide Ozturk, University of Toronto, Canada
Joseph Mellor, University of Toronto, Canada
Atina Cote, University of Toronto, Canada
Anna Karkhanina, University of Toronto, Canada
Haiyuan Yu, Cornell University, United States
Pascal Braun, Dana Farber Cancer Institute, United States
David Hill, Dana Farber Cancer Institute, United States
Marc Vidal, Dana Farber Cancer Institute, United States
Frederick Roth, University of Toronto, Canada

Session Chair: Olga Vitek

Cancer and other genetic diseases are mediated by a web of macromolecular interactions that are regulated dynamically (for example, through post-transcriptional modification). Thus, a technology that captures the regulated dynamics of a global-scale protein interaction network would be important to accelerate our understanding of complex diseases. In vivo assays such as affinity purification followed by mass spectrometry (AP-MS) capture interactions under one condition, while in vitro assays such as Y2H capture interactions that could occur under different conditions, so long as these interactions do not require a third co-factor or post-translational modifier. No current method has the ability to economically produce many “conditional interactome” maps, each in the presence of different co-factors or modifiers. Here we describe a new technology, BFG-Y2H (Barcode Fusion Genetics-Y2H), which exploits the efficiencies of deep short-read sequencing and offers the potential for one researcher to map dozens of genome-scale conditional interactomes for a given species within one year at a cost of less than $1,000 per interactome.

LBR08 - Assembling Acute Myeloid Leukemia RNA-seq Data to Infer Alternative Polyadenylation Site Usage

Date: Sunday, July 15, 4:00 p.m. - 4:25 p.m., Room: 202A
Presenting author: Inanc Birol, Genome Sciences Centre BC Cancer Agency, Canada

Additional authors:
Inanc Birol, BC Cancer Agency, Canada

Session Chair: Olga Vitek

Alternative polyadenylation in 3’ UTRs is known to affect post-transcriptional gene regulation, and can be dysregulated in tumor cells. Thus, identification of alternative polyadenylation site usage and measurement of expression levels of the resulting 3’ UTRs will be valuable for understanding tumor biology. In this study, we use RNA-seq data from the Illumina HiSeq 2000 platform to characterize the transcriptome repertoires of several Acute Myeloid Leukemia (AML) samples with and without NPM1 insertions, and investigate the association of this biomarker with 3’ UTR usage and expression.
To interrogate RNA-seq data for unbiased 3’ UTR reconstruction, we expanded the functionality of Trans-ABySS, our de novo transcriptome assembly tool. Trans-ABySS assembles RNA-seq data using a range of read-to-read overlap stringency levels to account for the sensitivity-specificity balance while reconstructing transcripts with a range of expression levels. Our preliminary analysis of AML transcriptomes indicates that our approach can assemble one or more 3’ UTRs for about 80% of genes that are expressed at 10-fold or more coverage, and offers a number of novel 3’ UTR predictions, which we will study further to assess their relationships to disease biology.

LBR09 - CAGI: The Critical Assessment of Genome Interpretation, a community experiment to evaluate phenotype prediction

Date: Monday, July 16, 10:45 a.m. - 11:10 a.m., Room: 202A
Presenting author: Steven Brenner, University of California, Berkeley, United States

Additional authors:
Steven Brenner, University of California, Berkeley, United States

Session Chair: Chad Myers

The Critical Assessment of Genome Interpretation (CAGI, 'kā-jē) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. In this assessment, participants are provided genetic variants and make predictions of resulting phenotype. These predictions are evaluated against experimental characterizations by independent assessors. The CAGI experiment culminates with a community workshop and publications to disseminate results, assess our collective ability to make accurate and meaningful phenotypic predictions, and better understand progress in the field. A long-term goal for CAGI is to improve the accuracy of phenotype and disease predictions in clinical settings.
This presentation will focus on the practical implications of CAGI 2011 results on a diversity of challenges. The presentation will summarize the state-of-the-art in identifying the impact of variants in a metabolic enzyme and in an oncogene, and thus the appropriate use of such methods in basic and clinical research. CAGI has revealed the relative strengths of different prediction approaches, and the best will be described.
CAGI also explored genome-scale data, showing unexpected successes in predicting Crohn’s disease from exomes, as well as disappointing failures in using genome and transcriptome data to distinguish discordant monozygotic twins with asthma. Predictors had promising complementary approaches in predicting the distinct responses of breast cancer cell lines to a panel of drugs. Predictors also made measurable progress in predicting a diversity of phenotypes present in the Personal Genome Project participants.
Current information including additional challenges is available at the CAGI website at http://genomeinterpretation.org.

LBR10 - Chromatin Structure and Genomic Context Influence Mitochondrial DNA Insertion in Mammalian Nuclear Genomes

Date: Monday, July 16, 11:15 a.m. - 11:40 a.m., Room: 202A
Presenting author: Junko Tsuji, University of Tokyo, Japan

Additional authors:
Martin Frith, National Institute of Advanced Industrial Science and Technology, Japan
Kentaro Tomii, National Institute of Advanced Industrial Science and Technology, Japan
Paul Horton, National Institute of Advanced Industrial Science and Technology, Japan

Session Chair: Chad Myers

It is known that remnants of partial or whole copies of mitochondrial DNAs are found in nuclear genomes. Such mtDNA-like sequences are called NUMTs (Nuclear MiTochondrial sequences), and are integrated in the double-strand break sites of the nuclear genomes via non-homologous end joining repair. Several computational studies have investigated NUMTs; however, those studies have not used appropriate methodology for sensitive detection of NUMTs and precise delineation of their boundaries. We developed a carefully considered protocol to redefine NUMT datasets of four mammalian species (human, rhesus, mouse, and rat). The issues we considered include appropriate alignment parameters, correct handling of circular mtDNA, masking of low complexity sequences, post-insertion duplication of NUMTs, long indels and validation of E-value thresholds. By analyzing the redefined datasets, we found new characteristics of NUMT integration sites. Most of the inferred insertion points of NUMTs in all organisms tested occur in the vicinity of retrotransposons (82.9-90.4%), and the insertion sites show a significant level of over-representation of A+T oligomers (p<0.0001). In addition to such genomic contexts, chromatin structures also influenced NUMT insertion. We found that NUMT insertion sites show a strong tendency to have high predicted DNA curvature, and often occur in experimentally defined nucleosome depleted regions. In light of the above results, mtDNA insertion events are clearly influenced by the observed chromatin structures and genomic contexts.
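
One of the issues listed above, correct handling of circular mtDNA, can be illustrated with a small sketch. A reference mtDNA sequence is stored linearly with an arbitrary origin, so a fragment spanning that origin would be missed by a naive search. A common workaround, shown here with exact substring matching as a stand-in for real alignment, is to extend the linear sequence with its own prefix. This is an assumption about one reasonable approach, not a description of the authors' protocol, and `find_on_circle` is a hypothetical helper.

```python
def find_on_circle(circular_seq, query):
    """Return (start, wraps_origin) for each exact match of query on a circular sequence."""
    n = len(circular_seq)
    if not query or len(query) > n:
        return []
    # Append the first len(query) - 1 bases so matches crossing the origin become visible.
    linear = circular_seq + circular_seq[:len(query) - 1]
    hits = []
    start = linear.find(query)
    while start != -1:
        hits.append((start, start + len(query) > n))
        start = linear.find(query, start + 1)
    return hits

mtdna = "ACGTTGCA"                       # toy circular sequence
print(find_on_circle(mtdna, "GTTG"))     # match inside the linear representation
print(find_on_circle(mtdna, "CAAC"))     # match that spans the origin
print(mtdna.find("CAAC"))                # the naive linear search misses it
```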

LBR11 - Computing with Chromatin Modification

Date: Monday, July 16, 11:45 a.m. - 12:10 p.m., Room: 202A
Presenting author: Barbara Bryant, Constellation Pharmaceuticals, United States

Additional authors:
Barbara Bryant, Constellation Pharmaceuticals, United States
Greg Tucker-Kellogg, National University of Singapore, Singapore

Session Chair: Chad Myers

In living cells, DNA is wrapped around histone octamers to make the nucleosomes that comprise chromatin. The histones and DNA can be modified with chemical groups that are added, removed and recognized by multi-functional molecular complexes. Here we present a computational model, in which chromatin modifications are information units that can be written onto a one-dimensional chromatin memory. Chromatin-modifying complexes are modeled as read-write rules that operate on several adjacent nucleosomes. We illustrate the use of this “chromatin computer” by writing programs to solve problems that cannot be solved with finite state automata or logic circuits. We show the execution of these programs on a chromatin computer simulator, and provide animated snapshots of the intermediate states of the nucleosome memory. We model additional features of biological chromatin, resulting in more efficient computation. This formalism is useful both analytically, to model chromatin biology, and theoretically, as a programming paradigm.
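
The notion of read-write rules acting on adjacent nucleosomes can be sketched as a tiny rewriting system. In the toy below, each character is one nucleosome state (`M` modified, `U` unmodified, `B` a boundary element), and a rule rewrites a short window of adjacent nucleosomes. The state alphabet, the rule format, and the leftmost-match scheduling are assumptions chosen for illustration; they are not the formalism or the simulator presented in the talk.

```python
def step(memory, rules):
    """Apply the first rule that matches at the leftmost position, or return None."""
    for i in range(len(memory)):
        for pattern, replacement in rules:
            if memory[i:i + len(pattern)] == pattern:
                return memory[:i] + replacement + memory[i + len(pattern):]
    return None

def run(memory, rules, max_steps=1000):
    """Run until no rule matches; return every intermediate memory state."""
    trace = [memory]
    for _ in range(max_steps):
        nxt = step(memory, rules)
        if nxt is None:
            break
        memory = nxt
        trace.append(memory)
    return trace

# A hypothetical "spreading" complex: it reads a modified nucleosome next to an
# unmodified one and writes the mark onto the neighbour. Boundaries block it.
spread = [("MU", "MM")]
for state in run("MUUBUU", spread):
    print(state)
```

Each printed line corresponds to one snapshot of the nucleosome memory, so the trace shows the mark spreading rightward until it reaches the boundary.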

LBR12 - Transcription factor target gene identification based on ChIP-seq data

Date: Monday, July 16, 12:15 p.m. - 12:40 p.m., Room: 202A
Presenting author: Andreas Beyer, TU Dresden, Germany

Additional authors:
Andreas Beyer, TU Dresden, Germany
Weronika Sikora-Wohlfeld, TU Dresden, Germany
Marit Ackermann, TU Dresden, Germany
Eleni Christodoulou, TU Dresden, Germany

Session Chair: Chad Myers

Chromatin immunoprecipitation coupled with deep sequencing (ChIP-seq) has been instrumental for elucidating transcriptional networks by measuring the genome-wide binding of proteins at high resolution. Despite the precision of these experiments it is not trivial to identify the genes that are regulated through the observed bindings. A lot of recent research has been devoted to the correct identification of binding sites, but very little to predicting target genes. Here we present a comprehensive evaluation of computational methods used to define target genes of transcription factors (TFs) based on ChIP-seq data. In order to systematically analyze target gene prediction we structured the process into three steps and we evaluated alternatives for each of these steps. Using 66 ChIP-seq and 23 expression datasets we could show that parameter-free methods (not requiring any tunable parameters) better adapt to the specificities of a particular ChIP-seq dataset. Our analysis revealed a potential bias when comparing ChIP-seq and perturbation expression data sets due to unregulated genes. We show that target genes with the highest TF association scores tend to respond later than medium scoring targets, which partly explains the poor overlap typically observed between ChIP-seq and expression data. Finally, we investigated the clustering of TF target genes in the genome, revealing 95 regions with highly significant enrichment of targets of 42 different factors.
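
For readers unfamiliar with the target-assignment step, the sketch below shows one simple baseline: assign each ChIP-seq peak to the gene with the nearest transcription start site (TSS) within a distance cutoff. The abstract does not say which methods were compared, so this baseline, the `max_dist` cutoff (a tunable parameter of the kind the abstract contrasts with parameter-free methods), and all names and coordinates are assumptions for illustration.

```python
def assign_targets(peaks, tss, max_dist=10_000):
    """Map each peak (chrom, position) to the gene with the nearest TSS within max_dist."""
    targets = {}
    for chrom, pos in peaks:
        candidates = [
            (abs(pos - tss_pos), gene)
            for gene, (tss_chrom, tss_pos) in tss.items()
            if tss_chrom == chrom
        ]
        if not candidates:
            continue  # no annotated gene on this chromosome
        dist, gene = min(candidates)
        if dist <= max_dist:
            targets.setdefault(gene, []).append((chrom, pos))
    return targets

# Hypothetical annotation and peak summits.
tss = {"g1": ("chr1", 1_000), "g2": ("chr1", 50_000), "g3": ("chr2", 1_000)}
peaks = [("chr1", 1_500), ("chr1", 48_000), ("chr1", 25_000), ("chr3", 10)]
print(assign_targets(peaks, tss))
```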

LBR13 - Fast and accurate metagenomic profiling of microbial community composition using unique clade-specific marker genes

Date: Monday, July 16, 2:30 p.m. - 2:55 p.m., Room: 202A
Presenting author: Nicola Segata, Harvard School of Public Health, United States

Additional authors:
Nicola Segata, Harvard School of Public Health, United States
Levi Waldron, Harvard School of Public Health, United States
Annalisa Ballarini, University of Trento, Italy
Vagheesh Narasimhan, Harvard School of Public Health, United States
Olivier Jousson, University of Trento, Italy
Curtis Huttenhower, Harvard School of Public Health, United States

Session Chair: Predrag Radivojac

Identifying which organisms populate a microbial community and in what proportions is crucial for characterizing human-associated microbiomes. Shotgun sequencing allows biological function and phylogenetic composition to be assayed simultaneously, but existing taxonomic profiling methods are impractical for the scope of current datasets. We propose MetaPhlAn, a novel approach incorporating clade-specific marker genes identified computationally using 2,887 reference genomes. The resulting catalog of 400,000 genes permits unambiguous taxonomic assignments from metagenomic data more accurately and >50 times faster than current approaches. The method was evaluated on terabases of short reads in addition to ten synthetic metagenomes, achieving correlations with true organismal relative abundances over 0.99 for high-complexity and log-normally distributed communities. Applied to the 691 metagenomes of the Human Microbiome Project, MetaPhlAn profiled the microbial species populating all 15 assayed body sites together with their abundance pattern signatures. Specifically, on 51 vaginal microbiomes, MetaPhlAn agreed closely with 16S-based results and further identified the Lactobacillus species forming five distinct microbiome types. An analysis of marine ecosystems confirmed detection of archaeal organisms and MetaPhlAn's applicability and accuracy even in communities with limited numbers of sequenced reference genomes. Finally, MetaPhlAn allowed us to perform a meta-analysis integrating 263 samples from the HMP and MetaHIT projects, providing the largest metagenomic community profiling to date of the human gut microbiota. This dataset highlights a range of dominant Bacteroides species among these American and European cohorts, and it suggests complexity at the species level beyond that captured by the recently proposed gut enterotypes.
MetaPhlAn is available at http://huttenhower.sph.harvard.edu/metaphlan.
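
The idea behind marker-based profiling can be sketched in a few lines: if each marker gene belongs unambiguously to one clade, then length-normalized read counts on those markers give a per-clade coverage that can be rescaled to relative abundances. The code below is a simplified illustration under that assumption and is not MetaPhlAn's actual algorithm or interface; the function name, the use of a plain mean across markers, and the toy counts are all hypothetical.

```python
def relative_abundance(marker_hits, marker_lengths, marker_clade):
    """Estimate clade relative abundances from reads mapped to clade-specific markers."""
    coverage = {}
    for marker, reads in marker_hits.items():
        clade = marker_clade[marker]
        # Normalize by marker length so long markers do not dominate.
        coverage.setdefault(clade, []).append(reads / marker_lengths[marker])
    per_clade = {clade: sum(vals) / len(vals) for clade, vals in coverage.items()}
    total = sum(per_clade.values())
    return {clade: cov / total for clade, cov in per_clade.items()} if total else {}

# Hypothetical read counts on three markers from two clades.
hits = {"m1": 100, "m2": 300, "m3": 50}
lengths = {"m1": 1000, "m2": 1000, "m3": 500}
clade_of = {"m1": "clade_A", "m2": "clade_A", "m3": "clade_B"}
profile = relative_abundance(hits, lengths, clade_of)
print(profile)
```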

LBR14 - Optimizing functional genomics screening strategies for drug target prediction

Date: Monday, July 16, 3:00 p.m. - 3:25 p.m., Room: 202A
Presenting author: Raamesh Deshpande, University of Minnesota, United States

Session Chair: Predrag Radivojac

Developing new drugs is a lengthy and expensive process. In comparison, many compounds have been identified from natural sources but their activity on living cells has not been characterized. Recent studies have proven the utility of chemical genomics based on yeast functional genomics tools for the discovery of compounds’ modes of action. Specifically, the chemical genetic interactions of a particular compound across a large nonessential deletion strain collection should mimic the genetic interactions of the corresponding target. One limitation of this approach, however, is that it requires a relatively high volume of compound given the size of the deletion collection to be queried. As a solution to this problem, we propose a method to identify a small subset of the deletion collection that is the most informative in discovering compounds’ modes of action. We have applied this method in the context of yeast and identified a diagnostic strain set comprising around 5% of the non-essential deletion mutant collection. We show that even with a small fraction of the genome, this diagnostic set performs comparably to complete chemical-genetic profiles. We also demonstrate that our method provides substantial improvement over baseline strategies based on selection of either random genes or hubs. Large-scale chemical genomic screens of natural compound libraries based on this diagnostic set of genes are currently in progress.

LBR15 - Structure-Based Ligand Discovery for Solute Carrier Transporters

Date: Monday, July 16, 3:30 p.m. - 3:55 p.m., Room: 202A
Presenting author: Avner Schlessinger, University of California, San Francisco, United States

Session Chair: Predrag Radivojac

Polypharmacology is a phenomenon in which a drug binds multiple targets, rather than a single one, with significant affinity. The effect of polypharmacology on therapy can be positive (effective therapy) and/or negative (side effects). Solute Carrier (SLC) Transporters are membrane proteins that control the uptake and efflux of various solutes such as amino acids, sugars, and drugs. SLCs can be drug targets themselves or be responsible for absorption, targeting, and disposition of drugs. We describe an integrated structure-based approach for identifying protein-small molecule interactions. In particular, we use comparative modeling, virtual screening, and experimental validation (with kinetic measurements of uptake) to identify interactions between SLC transporters and small-molecule ligands, including prescription drugs and metabolites. For example, we discovered that several existing prescription drugs interact with the norepinephrine transporter, NET, which may explain some of the pharmacological effects (i.e., efficacy and/or side effects) of these drugs. We also apply our approach to related transporters, to identify rules for substrate specificity in a key membrane transporter family of the nervous system. Our systems pharmacology approach is generally applicable to structural characterization of protein families other than SLCs, including receptors, ion-channels, and enzymes, as well as their interactions with small-molecule ligands.

Keyword:

TOP

LBR16 - Data-driven Prediction of Drug Effects and Interactions

Date: Monday, July 16, 4:00 p.m. - 4:25 p.m., Room: 202A
Presenting author: Nicholas Tatonetti, Stanford University, United States

Session Chair: Predrag Radivojac

Presentation Overview: Show/Hide

Adverse drug events remain a leading cause of morbidity and mortality around the world and many are not detected during clinical trials. Fortunately, regulatory agencies and other institutions maintain large collections of adverse event reports, and these databases present an opportunity to study drug effects from patient population data. However, confounding factors such as concomitant medications, patient demographics, and reasons for prescribing a drug often are uncharacterized in spontaneous reporting systems (for example, patient medical histories), and these omissions can limit the use of quantitative signal detection methods used in the analysis of such data. Here, we present an adaptive data-driven approach for correcting these factors in cases for which the covariates are unknown or unmeasured and combine this approach with existing methods to improve analyses of drug effects using three test data sets. We also present a comprehensive database of drug effects (OFFSIDES) and a database of drug-drug interaction side effects (TWOSIDES). To demonstrate the biological use of these new resources, we used them to identify drug targets, predict drug indications, and discover drug class interactions. We then corroborated 47 (P < 0.0001) of the drug class interactions using an independent analysis of electronic medical records. Patients taking combined treatment of selective serotonin reuptake inhibitors and thiazides had a significantly increased incidence of prolonged QT. We conclude that confounding effects from covariates in observational clinical data can be controlled in data analyses and thus improve the detection and prediction of adverse drug effects and interactions.
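As a point of reference for the "quantitative signal detection methods" mentioned in this overview, the sketch below computes a proportional reporting ratio (PRR) from a 2x2 table of report counts. It is a generic disproportionality measure, not the adaptive confounder correction the abstract describes, and the counts are invented purely for illustration.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR for a 2x2 table of spontaneous reports.

    a: reports mentioning the drug and the event
    b: reports mentioning the drug but not the event
    c: reports mentioning the event but not the drug
    d: reports mentioning neither
    """
    rate_with_drug = a / (a + b)
    rate_without_drug = c / (c + d)
    return rate_with_drug / rate_without_drug


# Hypothetical counts, not taken from any real database.
prr = proportional_reporting_ratio(a=20, b=980, c=100, d=98900)
print(round(prr, 2))
```

A PRR well above 1 only flags a drug-event pair for follow-up; unmeasured covariates such as co-medication can inflate it, which is the problem the overview addresses.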

Keyword:

TOP

LBR17 - MalaCards – the integrated Human Malady Compendium

Date: Tuesday, July 17, 10:45 a.m. - 11:10 a.m., Room: 202B/C
Presenting author: Marilyn Safran, Weizmann Institute of Science, Israel

Additional authors:
Marilyn Safran, Weizmann Institute of Science, Israel

Session Chair: Yinyin Yuan

Presentation Overview: Show/Hide

We introduce MalaCards, an integrated database of human maladies and their annotations (malacards.weizmann.ac.il), modeled on the architecture and richness of the popular GeneCards human genes database (www.genecards.org). MalaCards mines varied sources to generate a ‘card’ for each disease via: 1. Identifying sources of nomenclature/annotation, targets for disease data mining; 2. Developing algorithms for merging heterogeneous disease names, and defining unique identifiers. For example, alzheimer’s disease, ad, and dementia alzheimer’s type are merged under Alzheimer Disease, acronym AD, ID=ALZ001, with others listed as aliases (see malacards.weizmann.ac.il/card/index/ALZ001); 3. Engineering scripts to mine annotations; 4. Building MalaCards V1.01(alpha), with thousands of user-friendly ‘cards’ for all incorporated maladies, containing a variety of sections; 5. Implementing a strategy whereby detailed gene-disease relationships within GeneCards are used to create disease-specific content, leveraging the GeneCards relational database and search engine; 6. Constructing a second-tier annotator, based on GeneDecks Set Distiller, a GeneCards suite member. For example, diseases related to the key disease are computed to be those maximally associated with the set of found genes. Similarly, we obtain drugs/compounds, publications and mouse phenotypes contextually related to the disease; 7. Formulating scores for prioritizing derived annotations; 8. Initiating QA based on extensive knowledge within the Crown Human Genome Center. As our R&D continues, we plan to expand the list of annotation sources and sections, and include genetic variation details. This will be enhanced by collaborations with researchers outside of our group, and expanded by the initiation of systems biology tools, towards the goal of enabling novel biomedical discoveries.
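Step 2 of the overview (merging heterogeneous disease names under one identifier) can be pictured with a minimal sketch. The normalisation rules and the lookup table below are assumptions made for illustration; only the example names and the identifier ALZ001 come from the text, and the actual MalaCards merging algorithm is not described here.

```python
import re


def normalize(name):
    """Lowercase, drop possessive endings and punctuation, collapse spaces."""
    name = name.lower().replace("’s", "").replace("'s", "")
    name = re.sub(r"[^a-z0-9 ]+", " ", name)
    return " ".join(name.split())


# Hypothetical alias table: normalised name -> card identifier.
ALIAS_TO_ID = {
    "alzheimer disease": "ALZ001",
    "ad": "ALZ001",
    "dementia alzheimer type": "ALZ001",
}


def card_id(name):
    """Return the card identifier for a disease name, or None if unknown."""
    return ALIAS_TO_ID.get(normalize(name))


for alias in ["alzheimer’s disease", "AD", "Dementia, Alzheimer's type"]:
    print(alias, "->", card_id(alias))
```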

Keyword:

TOP

LBR18 - Simultaneous host-parasite transcriptomes provide insight into malarial host-parasite interactomes

Date: Tuesday, July 17, 11:15 a.m. - 11:40 a.m., Room: 202B/C
Presenting author: Adam Reid, Wellcome Trust Sanger Institute, United Kingdom

Additional authors:
Adam Reid, Wellcome Trust Sanger Institute, United Kingdom
Matthew Berriman, Wellcome Trust Sanger Institute, United Kingdom

Session Chair: Yinyin Yuan

Presentation Overview: Show/Hide

Molecular interactions are key to the ability of a parasite to enter and persist in its host. However, our understanding of the genes and proteins involved in these interactions is no more than partial, even in the best-understood systems. We have extended the popular concept of using correlated gene expression profiles to identify molecular interactions within one species to the interspecific (host-parasite) case. We show for the first time that genes in different species with correlated expression are more likely to encode proteins which interact or are otherwise involved in host-parasite interaction. We go on to examine predicted host-parasite interactions between the malaria parasite and both its mammalian host and insect vector.
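The idea of correlating expression across species can be sketched as follows, assuming host and parasite transcripts were measured in the same samples. The gene names, expression values, and the 0.9 cut-off are hypothetical; the abstract does not state which correlation measure or threshold was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def candidate_pairs(host, parasite, threshold=0.9):
    """Host-parasite gene pairs whose profiles are strongly (anti)correlated."""
    pairs = []
    for h, hx in host.items():
        for p, px in parasite.items():
            r = pearson(hx, px)
            if abs(r) >= threshold:
                pairs.append((h, p, r))
    return pairs


# Hypothetical profiles over four paired samples.
host = {"host_gene_1": [1.0, 2.0, 3.0, 4.0], "host_gene_2": [4.0, 1.0, 3.0, 2.0]}
parasite = {"parasite_gene_1": [2.1, 3.9, 6.2, 8.0]}
pairs = candidate_pairs(host, parasite)
print(pairs)
```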

Keyword:

TOP

LBR19 - A Predictive Gene Expression Model for quantifying Plasmodium falciparum red blood cell stages

Date: Tuesday, July 17, 11:45 a.m. - 12:10 p.m., Room: 202B/C
Presenting author: Vagheesh Narasimhan, Harvard University, United States

Additional authors:
Vagheesh Narasimhan, Harvard University, United States
Regina Joice, Harvard University, United States
Curtis Huttenhower, Harvard University, United States
Matthias Marti, Harvard University, United States
Jacqui Montgomery, Malawi-Liverpool-Wellcome Trust, United Kingdom
Karl Seydel, Michigan State University, United States
Daouda Ndiaye, University of Cheikh Anta Diop, Senegal
Johanna Daily, Albert Einstein College of Medicine, United States
Kim Williamson, Loyola University, Chicago, United States
Terrie Taylor, Michigan State University, United States
Danny Milner, Harvard University, United States

Session Chair: Yinyin Yuan

Presentation Overview: Show/Hide

P. falciparum, the parasitic causative agent of malaria, undergoes a complex staged life cycle during its infection of human hosts. The transcriptional expression program of this cycle has been well-modeled, but not that of the small minority of these stages that are transmissible among hosts and thus offer a potential target for preventative interventions. We have thus developed a quantitative model for determining the proportions of transmissible morphological stages of P. falciparum in a mixed population based on transcript levels. Our model consists of a constrained linear regression, in which each transcript's total measured expression level is the sum of parasites' contributions from each life cycle stage. The model was trained and initially cross-validated using a set of five published in vitro microarray time courses in which stage distributions were determined by light and fluorescence microscopy. To apply this method in vivo, we selected the minimum number of markers needed to quantify the stage distribution using a combination of model selection, stage-specificity, and qRT-PCR primer design. We then assessed the model on microarray and qRT-PCR expression measurements from blood samples of 40 malaria patients from Valingara, Senegal, revealing that only a small subset of patients carry transmissible parasite stages. In addition to the model's ability to capture enriched biomolecular processes within transmissible malaria stages, we believe the field-applicable qRT-PCR assay may be a useful tool for future control of malaria transmission through stage-specific targeted interventions.
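The constrained linear regression described above can be illustrated with a small non-negative least-squares sketch, where each transcript's measured level is modelled as a sum of per-stage contributions. The signature matrix, the observed values, and the choice of projected gradient descent as the solver are assumptions for illustration; the abstract does not specify the constraints or the fitting procedure.

```python
def estimate_stage_mix(signatures, observed, steps=2000, lr=0.005):
    """Estimate stage proportions from mixed expression.

    signatures[g][s]: expression of transcript g per parasite in stage s.
    observed[g]: total measured expression of transcript g.
    Minimises the squared error subject to non-negative stage weights
    (projected gradient descent), then rescales the weights to sum to 1.
    """
    n_stages = len(signatures[0])
    w = [1.0 / n_stages] * n_stages
    for _ in range(steps):
        residual = [
            sum(a * b for a, b in zip(row, w)) - y
            for row, y in zip(signatures, observed)
        ]
        grad = [
            sum(r * row[s] for r, row in zip(residual, signatures))
            for s in range(n_stages)
        ]
        # Gradient step followed by projection onto w >= 0.
        w = [max(0.0, ws - lr * g) for ws, g in zip(w, grad)]
    total = sum(w)
    return [x / total for x in w]


# Hypothetical stage signatures (4 transcripts x 3 stages) and a sample
# generated from a 70/20/10 mixture of the three stages.
signatures = [[10, 1, 0], [0, 8, 1], [1, 0, 6], [2, 2, 2]]
observed = [7.2, 1.7, 1.3, 2.0]
mix = estimate_stage_mix(signatures, observed)
print([round(x, 3) for x in mix])
```

The learning rate is chosen small enough for this toy matrix; a different signature matrix would need its own step size or a dedicated solver.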

Keyword:

TOP

LBR20 - Achieving better agreement among microarray disease studies through automatic correction for latent variables

Date: Tuesday, July 17, 12:15 p.m. - 12:40 p.m., Room: 202B/C
Presenting author: Maria Chikina, Mount Sinai Medical School, United States

Additional authors:
Stuart Sealfon, Mount Sinai School of Medicine, United States

Session Chair: Yinyin Yuan

Presentation Overview: Show/Hide

Microarray studies with human subjects often have limited sample sizes, hampering the ability of differential expression analysis to make trustworthy predictions of biomarkers associated with disease. Existing techniques for meta-analysis address this problem by aggregating the results of multiple datasets to gain statistical power, but the performance of this kind of approach is limited by the fact that human gene expression is influenced by many non-random factors, such as genetics, sample preparation, and tissue heterogeneity, that may contribute to the lack of inter-study agreement.

We show that it is in fact possible to carry out an automatic correction of individual datasets to reduce the effect of such 'latent variables' (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected, allowing for more trustworthy aggregated predictions. We demonstrate our approach, which involves a crucial modification of the method of "surrogate variable analysis", on studies of multiple sclerosis. We find improved agreement across varying study designs, platforms, and tissues, and are able to make a number of novel predictions. Our analysis implicates several metabolic pathways contributing to the emerging understanding of metabolic involvement in MS pathology.
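To make the notion of correcting for an unknown latent variable concrete, the sketch below removes the dominant residual direction from a small expression matrix. It is a deliberately simplified stand-in in the spirit of surrogate variable analysis, not the modified method the authors describe; the data, the single-factor assumption, and the power-iteration solver are all illustrative choices.

```python
def remove_leading_latent_factor(data, groups, iters=200):
    """data[g][j]: expression of gene g in sample j; groups[j]: condition label.

    1. Subtract per-condition means so the known effect is set aside.
    2. Find the dominant sample-space direction of the residuals
       (power iteration on R^T R).
    3. Remove each gene's projection onto that direction from the data.
    """
    n = len(groups)
    resid = []
    for row in data:
        means = {}
        for lab in set(groups):
            vals = [x for x, l in zip(row, groups) if l == lab]
            means[lab] = sum(vals) / len(vals)
        resid.append([x - means[l] for x, l in zip(row, groups)])

    v = [1.0 + j for j in range(n)]  # arbitrary non-zero start vector
    for _ in range(iters):
        rv = [sum(a * b for a, b in zip(row, v)) for row in resid]
        v = [sum(rv[g] * resid[g][j] for g in range(len(resid))) for j in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]

    corrected = []
    for row, rrow in zip(data, resid):
        coef = sum(a * b for a, b in zip(rrow, v))
        corrected.append([x - coef * vj for x, vj in zip(row, v)])
    return corrected, v


# Hypothetical data: a condition effect plus a hidden factor [1, -1, 1, -1].
groups = ["case", "case", "control", "control"]
data = [
    [3.0, 1.0, 1.0, -1.0],   # effect 2 in cases, loading +1 on hidden factor
    [2.0, -2.0, 2.0, -2.0],  # no condition effect, loading +2
    [0.0, 2.0, -1.0, 1.0],   # effect 1 in cases, loading -1
]
corrected, factor = remove_leading_latent_factor(data, groups)
print([[round(x, 6) for x in row] for row in corrected])
```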

Keyword:

TOP

LBR21 - Matrix geometry determines optimal cancer migration strategy and modulates response to therapeutic agents

Date: Tuesday, July 17, 2:30 p.m. - 2:55 p.m., Room: 202B/C
Presenting author: Melda Tozluoglu, Cancer Research UK, United Kingdom

Session Chair: Christina Curtis

Presentation Overview: Show/Hide

Cell motility is required for many biological processes, including cancer metastasis. The molecular requirements for migration and morphology of migrating cells can vary considerably depending on matrix geometry; therefore, predicting the optimal migration strategy or the effect of experimental perturbation is difficult. Here, we present a model of single cell motility that encompasses actin polymerisation based protrusions, cell cortex asymmetry, membrane blebbing, local heterogeneity, cell-extracellular matrix adhesion, and varying extracellular matrix geometries. This is used to explore the theoretical requirements for rapid migration in different matrix geometries. Confined matrix geometries cause profound changes in the relationship of adhesion and contractility to cell velocity; indeed cell-matrix adhesion is dispensable for migration in discontinuous confined environments. The utility of the model is shown by predicting the effect of different drugs and integrin depletion in vivo based only on simple in vitro measurements. Multiphoton intravital imaging of melanoma is used to verify bleb-driven migration of both melanoma and endothelial cells at tumour margins, and the predicted response to drugs.

Keyword:

TOP

LBR22 - Single Sample Expression-Anchored Mechanisms Predict Survival in Head and Neck Cancer

Date: Tuesday, July 17, 3:00 p.m. - 3:25 p.m., Room: 202B/C
Presenting author: Yves Lussier, University of Illinois at Chicago, United States

Additional authors:
Yves Lussier, University of Illinois at Chicago, United States

Session Chair: Christina Curtis

Presentation Overview: Show/Hide

Gene expression signatures that are predictive of therapeutic response or prognosis are increasingly useful in clinical care; however, mechanistic interpretation of expression arrays remains challenging. Additionally, there is surprisingly little gene overlap among distinct clinically validated signatures. These “causality challenges” hinder the adoption of signatures as compared to functionally well-characterized single gene biomarkers. To increase the utility of multi-gene signatures in survival studies, we developed a novel approach to generate “personal mechanism signatures” of molecular pathways and functions from gene expression arrays. FAIME, the Functional Analysis of Individual Microarray Expression, computes mechanism scores using rank-weighted gene expression of an individual sample. Comparing head and neck squamous cell carcinoma (HNSCC) samples with non-tumor controls, the precision and recall of deregulated FAIME-derived mechanisms of pathways and molecular functions are comparable to those produced by conventional cohort-wide methods (e.g. GSEA). The overlap of “Oncogenic FAIME Features of HNSCC” among three HNSCC datasets is more significant than the gene overlap. These Oncogenic FAIME Features of HNSCC accurately discriminated tumors from control tissues and stratified recurrence-free survival in patients from two independent studies. Previous approaches depending on group assignment of individual samples before learning a classifier are limited by design to discrete-class prediction. In contrast, FAIME calculates mechanism profiles for individual patients without requiring group assignment in validation sets. FAIME is more amenable to clinical deployment since it translates the gene-level measurements of each given sample into pathway and molecular function profiles that can be applied to analyze continuous phenotypes in clinical outcome studies.
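A single-sample, rank-weighted mechanism score of the kind described above might look like the following sketch. The linear rank weight and the "inside minus outside" contrast are assumptions chosen for simplicity; the abstract does not give the FAIME weighting function, so this should not be read as the published formula. Gene names and values are hypothetical.

```python
def mechanism_score(sample, gene_set):
    """Score one pathway in ONE sample from rank-weighted expression.

    sample: dict mapping gene -> expression value for a single sample.
    gene_set: set of genes annotated to the pathway or function.
    Genes are ranked within the sample (lowest expression gets rank 1),
    each gene is weighted by rank / number_of_genes, and the score is the
    mean weight inside the gene set minus the mean weight outside it.
    """
    ordered = sorted(sample, key=sample.get)
    n = len(ordered)
    weight = {g: (i + 1) / n for i, g in enumerate(ordered)}
    inside = [weight[g] for g in sample if g in gene_set]
    outside = [weight[g] for g in sample if g not in gene_set]
    return sum(inside) / len(inside) - sum(outside) / len(outside)


# Hypothetical single-sample profile and pathway membership.
sample = {"g1": 5.0, "g2": 1.0, "g3": 9.0, "g4": 3.0}
score = mechanism_score(sample, {"g1", "g3"})
print(score)
```

Because the score depends only on ranks within one sample, no cohort or group assignment is needed to compute it, which is the property the overview emphasises.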

Keyword:

TOP

LBR23 - Exploring the subclonal architecture of breast cancer

Date: Tuesday, July 17, 3:30 p.m. - 3:55 p.m., Room: 202B/C
Presenting author: David Wedge, Wellcome Trust Sanger Institute, United Kingdom

Session Chair: Christina Curtis

Presentation Overview: Show/Hide

Although the existence of substantial genetic heterogeneity within a tumour is now widely accepted, fundamental questions remain about the dynamics of Darwinian evolution in cancer. Our work aims to answer some of these questions using a variety of bioinformatic algorithms to characterise the subclonal architecture of 21 breast cancers from their whole-genome sequences.

We gain substantial statistical power to discriminate copy number aberrations (CNAs) present in a small fraction of tumor cells through the application of haplotype phasing. Further, by combining novel segmentation algorithms, including a Hierarchical Dirichlet Process - Hidden Markov Model, with constraints that reflect the known structure of the sequence data, we are able to detect CNAs present in less than 5% of the sampled cells.

We model the patterns of clonal and subclonal single nucleotide mutations using a Bayesian Dirichlet process, which simultaneously identifies the number of subclones, the fraction of tumour cells within each subclone and the mutation burden within each subclone. Using novel methods to phase mutations relative to each other and to heterozygous SNP loci, this information is used to discern the phylogenetic relationships between the subclones.

Applying our methods to 20 breast cancers reveals a complex subclonal landscape, reflecting the variety of previous genomic aberrations and clonal expansions that have shaped the tumours. In particular, they show that every tumour harbours a dominant subclone, whose expansion may represent the final rate-limiting step in carcinogenesis.

Keyword:

TOP

LBR24 - The Landscape of Somatic Structural Variations in Human Cancer Genomes

Date: Tuesday, July 17, 4:00 p.m. - 4:25 p.m., Room: 202B/C
Presenting author: Lixing Yang, Harvard University, United States

Additional authors:
Lixing Yang, Harvard, United States
Peter Park, Harvard, United States

Session Chair: Christina Curtis

Presentation Overview: Show/Hide

The cancer genome is known to harbor various somatic rearrangements. However, the full spectrum of these alterations and their underlying mechanisms remain poorly understood. Here, we performed a comprehensive identification of somatic Structural Variations (SVs) and the mechanisms generating them, using high-coverage whole-genome sequencing data of tumor and matched normal samples from 48 individuals across five tumor types (glioblastoma, ovarian, colon, prostate and multiple myeloma). By analyzing a total of 160 billion Illumina short reads, we identified 4555 somatic SVs with a true positive rate of 91%. The patterns of rearrangements are highly variable across tumor types and among individuals, with translocations (46%) being the most abundant, followed by deletions (36%) and tandem duplications (18%). Our detailed reconstruction of the events responsible for CDKN2 loss and for EGFR and CDK4 gain in glioblastoma revealed much more complex sets of events than previously assumed, sometimes involving dozens of fragments. Our analysis of the breakpoints at base pair resolution shows that focal CDKN2 loss is often generated by non-homologous end joining but could also be generated by microhomology-mediated end joining or template switching mechanisms. Focal amplifications are sometimes generated by complex tandem duplications via a template switching mechanism. This study provides new insights into cancer genome rearrangements and their contribution to cancer progression.

Keyword:

TOP

OPT01 -

Date: Sunday, July 15, 10:45 a.m. - 11:10 a.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT02 -

Date: Sunday, July 15, 10:45 a.m. - 11:10 a.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT03 -

Date: Sunday, July 15, 10:45 a.m. - 11:10 a.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT04 -

Date: Sunday, July 15, 11:15 a.m. - 11:40 a.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT05 -

Date: Sunday, July 15, 11:15 a.m. - 11:40 a.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT06 -

Date: Sunday, July 15, 11:15 a.m. - 11:40 a.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT07 -

Date: Sunday, July 15, 11:45 a.m. - 12:10 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT08 -

Date: Sunday, July 15, 11:45 a.m. - 12:10 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT09 -

Date: Sunday, July 15, 11:45 a.m. - 12:10 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT10 -

Date: Sunday, July 15, 12:15 p.m. - 12:40 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT11 -

Date: Sunday, July 15, 12:15 p.m. - 12:40 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT12 -

Date: Sunday, July 15, 12:15 p.m. - 12:40 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT13 -

Date: Sunday, July 15, 2:30 p.m. - 2:55 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT14 -

Date: Sunday, July 15, 2:30 p.m. - 2:55 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT15 -

Date: Sunday, July 15, 2:30 p.m. - 2:55 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT16 -

Cancelled

Date: Sunday, July 15, 3:00 p.m. - 3:25 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT17 -

Date: Sunday, July 15, 3:00 p.m. - 3:25 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT18 -

Date: Sunday, July 15, 3:00 p.m. - 3:25 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT19 -

Date: Sunday, July 15, 3:30 p.m. - 3:55 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT20 -

Date: Sunday, July 15, 3:30 p.m. - 3:55 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT21 -

Date: Sunday, July 15, 3:30 p.m. - 3:55 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT22 -

Date: Sunday, July 15, 4:00 p.m. - 4:25 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT23 -

Date: Sunday, July 15, 4:00 p.m. - 4:25 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

OPT24 -

Date: Sunday, July 15, 4:00 p.m. - 4:25 p.m., Room: 104C
Presenting author: , ,

Session Chair:

Presentation Overview: Show/Hide

Keyword:

TOP

PP01 (PT) - GenomeRing: alignment visualization based on SuperGenome coordinates

Date: Sunday, July 15, 10:45 a.m. - 11:10 a.m., Room: Grand Ballroom
Presenting author: Alexander Herbig, University of Tübingen, Germany

Additional authors:
Günter Jäger, University of Tübingen
Florian Battke, University of Tübingen
Kay Nieselt, University of Tübingen

Session Chair: Robert Murphy

Presentation Overview: Show/Hide

Motivation: The number of completely sequenced genomes is continuously rising, allowing for comparative analyses of genomic variation. Such analyses are often based on whole-genome alignments to elucidate structural differences arising from insertions, deletions or from rearrangement events. Computational tools which can visualize genome alignments in a meaningful manner are needed to help researchers gain new insights into the underlying data. Such visualizations typically are either realized in a linear fashion as in genome browsers or by using a circular approach, where relationships between genomic regions are indicated by arcs. Both methods allow for the integration of additional information such as experimental data or annotations. However, providing a visualization that still allows for a quick and comprehensive interpretation of all important genomic variations together with various supplemental data, which may be highly heterogeneous, remains a challenge. Results: Here we present two complementary approaches to tackle this problem. Firstly, we propose the SuperGenome concept for the computation of a common coordinate system for all genomes in a multiple alignment. This coordinate system allows for the consistent placement of genome annotations in the presence of insertions, deletions, and rearrangements. Secondly, we present the GenomeRing visualization which, based on the SuperGenome, creates an interactive visualization of the multiple genome alignment in a circular layout. We demonstrate our methods by applying them to an alignment of Campylobacter jejuni strains for the discovery of genomic islands as well as to an alignment of Helicobacter pylori, which we visualize in combination with gene expression data.
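The idea of a common coordinate system can be illustrated for the simplest case, a single collinear alignment block with gaps. The sketch below maps each genome's own positions to alignment columns, so an annotation given in one genome's coordinates can be placed consistently for all genomes. The function name, the toy alignment, and the restriction to one block are assumptions; handling rearrangements across several blocks, as the SuperGenome does, is not attempted here.

```python
def common_coordinates(alignment):
    """Map genome positions to shared alignment columns.

    alignment: dict mapping genome name -> gapped sequence; all rows have
    the same length and '-' marks a gap.
    Returns a dict mapping genome name -> list where entry i is the
    alignment column (0-based) that holds base i of that genome.
    """
    return {
        name: [col for col, ch in enumerate(row) if ch != "-"]
        for name, row in alignment.items()
    }


# Hypothetical two-genome alignment block.
alignment = {
    "genome_A": "ACG--TTA",
    "genome_B": "AC-GGTTA",
}
maps = common_coordinates(alignment)

# An annotation covering bases 3..5 of genome_A lands on columns 5..7,
# the same columns that hold bases 4..6 of genome_B.
print([maps["genome_A"][i] for i in range(3, 6)])
print([maps["genome_B"][i] for i in range(4, 7)])
```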

Keyword: Bioimaging

TOP

PP02 (HT) - Enriching the human apoptosis pathway by predicting the structures of protein-protein complexes

Date: Sunday, July 15, 10:45 a.m. - 11:10 a.m., Room: 104A
Presenting author: Saliha Ece Acuner Ozbabacan, Koc University, Turkey

Additional authors:
Ozlem Keskin, Koc University, Turkey
Ruth Nussinov, NCI-Frederick, United States
Attila Gursoy, Koc University, Turkey

Session Chair: Yanay Ofran

Presentation Overview: Show/Hide

The structures of protein–protein complexes in the apoptosis signaling pathway are important as the structural pathway helps in understanding the mechanism of the regulation and information transfer, and in identifying targets for drug design. Here, we aim to predict the structures toward a more informative pathway than currently available. Based on the 3D structures of complexes in the target pathway and a protein–protein interaction modeling tool which allows accurate and proteome-scale applications, we modeled the structures of 29 interactions, 21 of which were previously unknown. Next, 27 interactions which were not listed in the KEGG apoptosis pathway were predicted and subsequently validated by the experimental data in the literature. Additional interactions are also predicted. The multi-partner hub proteins are analyzed and interactions that can and cannot co-exist are identified. Overall, our results enrich the understanding of the pathway with interactions and provide structural details for the human apoptosis pathway.

Keyword: Protein Interactions and Molecular Networks, Protein Structure & Function

TOP

PP03 (HT) - Prediction by promoter logic in bacterial quorum sensing

Date: Sunday, July 15, 10:45 a.m. - 11:10 a.m., Room: 104B
Presenting author: Mukund Thattai, National Centre for Biological Sciences, India

Additional authors:
Navneet Rai, National Centre for Biological Sciences, India
Rajat Anand, National Centre for Biological Sciences, India
Krishna Ramkumar, Indian Institute of Technology Bombay, India
Varun Sreenivasan, St. Xavier’s College, India
Sugat Dabholkar, National Centre for Biological Sciences, India
Kareenhalli Venkatesh, Indian Institute of Technology Bombay, India

Session Chair: Paul Horton

Presentation Overview: Show/Hide

Bacterial cells communicate with one another by exchanging chemical signals, which can be used to coordinate actions across a cell population. Such coordination, regulated by so-called quorum-sensing systems, works on the following principle: every cell secretes a specific signal; the more cells there are, the more signal is generated; when the population density crosses a critical threshold, cells respond by driving transcription at a specific promoter. In our experiments, we find that quorum-sensing feedback systems can generate a diverse array of response types; this diversity arises through the complex interaction of microscopic parameters with feedback topology. I will show how, treating the promoter as a black-box characterized only by its input/output response or ‘promoter logic’, we are able to qualitatively and quantitatively predict the entire range of experimentally observed responses: smooth activation; hysteretic behavior; and even synchronized oscillations. Promoter logic is thus a necessary and sufficient representation of microscopic biochemistry.
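The link between promoter input/output response and population-level behaviour can be sketched with a toy model. Here the promoter is a black box described only by a Hill-type response to the signal, signal production scales with cell density, and the signal decays at unit rate. The functional form, all parameter values, and the assumption that the promoter drives signal synthesis (positive feedback) are illustrative and are not taken from the authors' experiments.

```python
def promoter(signal, basal=0.05, k=1.0, hill=4):
    """Black-box promoter response: basal activity plus a Hill term."""
    return basal + signal**hill / (k**hill + signal**hill)


def steady_state(density, start, dt=0.05, steps=4000):
    """Euler-integrate ds/dt = density * promoter(s) - s from `start`."""
    s = start
    for _ in range(steps):
        s += dt * (density * promoter(s) - s)
    return s


def sweep(densities, start):
    """Follow the steady state while the density is changed step by step."""
    out, s = [], start
    for n in densities:
        s = steady_state(n, s)
        out.append(s)
    return out


densities = [1.0, 2.0, 8.0]
up = sweep(densities, start=0.0)                      # growing population
down = sweep(densities[::-1], start=up[-1])[::-1]     # shrinking population
for n, lo, hi in zip(densities, up, down):
    print(n, round(lo, 3), round(hi, 3))
```

At the intermediate density the two sweeps settle on different signal levels, which is the hysteretic behaviour mentioned in the overview; at low and high density both sweeps agree.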

Keyword: Gene Regulation and Transcriptomics, Protein Interactions & Molecular Networks

TOP

PP04 (PT) - Joint Stage Recognition and Anatomical Annotation of Drosophila Gene Expression Patterns

Date: Sunday, July 15, 11:15 a.m. - 11:40 a.m., Room: Grand Ballroom
Presenting author: Heng Huang, University of Texas at Arlington

Additional authors:
Hua Wang, University of Texas at Arlington
Chris Ding, University of Texas at Arlington
Xiao Cai, University of Texas at Arlington, United States

Session Chair: Robert Murphy

Presentation Overview: Show/Hide

Motivation: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo delivers the detailed spatio-temporal patterns of the gene expression. Many related biological problems such as the detection of co-expressed genes, co-regulated genes, and transcription factor binding motifs rely heavily on the analysis of these image patterns. To provide text-based pattern searching that facilitates related biological studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are manually annotated by domain experts with a developmental stage term and anatomical ontology terms. Due to the rapidly increasing number of such images and the inevitably biased annotations by human curators, it is necessary to develop an automatic method to recognize the developmental stage and annotate anatomical terms. Results: In this paper, we propose a novel computational model for joint stage classification and anatomical term annotation of Drosophila gene expression patterns. We introduce a new Tri-Relational Graph (TG) model that comprises the data graph, the anatomical term graph, and the developmental stage term graph, and connects them by three additional graphs induced from stage or annotation label assignments. On top of the TG model, we present a Preferential Random Walk (PRW) method to jointly recognize developmental stage and annotate anatomical terms by utilizing the interrelations between the two tasks. The experimental results on two refined BDGP data sets demonstrate that our joint learning method achieves better prediction results on both tasks than state-of-the-art methods.
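How a random walk over a graph that mixes images, stage terms, and anatomical terms can propagate labels is sketched below with a plain random walk with restart. This is a generic technique used here for illustration, not the Preferential Random Walk of the paper; the graph, the node names, and the restart probability are hypothetical.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, iters=100):
    """Stationary visiting probabilities of a walk that restarts at `seeds`.

    edges: list of undirected (node, node) pairs.
    seeds: set of nodes the walker jumps back to with probability `restart`.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    p = dict(p0)
    for _ in range(iters):
        nxt = {v: restart * p0[v] for v in adj}
        for u in adj:
            share = (1 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


# Hypothetical graph: image-image similarity, image-label assignments,
# and stage-term co-occurrence edges.
edges = [
    ("image_new", "image_1"), ("image_1", "image_2"),
    ("image_1", "stage_A"), ("image_1", "term_X"),
    ("image_2", "stage_B"), ("image_2", "term_Y"),
    ("stage_A", "term_X"), ("stage_B", "term_Y"),
]
scores = random_walk_with_restart(edges, seeds={"image_new"})
print(round(scores["stage_A"], 4), round(scores["stage_B"], 4))
```

Labels attached to images similar to the query end up with higher scores, and the stage-term edges let evidence for one task influence the other.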

Keyword: Bioimaging

TOP

PP05 (HT) - Understanding human disease through 3D protein interactome network

Date: Sunday, July 15, 11:15 a.m. - 11:40 a.m., Room: 104A
Presenting author: Haiyuan Yu, Cornell University, United States

Additional authors:
Xiujuan Wang, Cornell University, United States
Xiaomu Wei, Weill Cornell Medical College, United States
Bram Thijssen, Maastricht University, Netherlands
Jishnu Das, Cornell University, United States
Steven Lipkin, Weill Cornell Medical College, United States

Session Chair: Yanay Ofran

Presentation Overview: Show/Hide

To better understand the molecular mechanisms and genetic basis of human disease, we systematically examine relationships between 3,949 genes, 62,663 mutations and 3,453 associated disorders by generating a three-dimensional, structurally resolved human interactome. This network consists of 4,222 high-quality binary protein-protein interactions with their atomic-resolution interfaces. We find that in-frame mutations (missense mutations and in-frame insertions and deletions) are enriched on the interaction interfaces of proteins associated with the corresponding disorders, and that the disease specificity for different mutations of the same gene can be explained by their location within an interface. We also predict 292 candidate genes for 694 unknown disease-to-gene associations with proposed molecular mechanism hypotheses. This work indicates that knowledge of how in-frame disease mutations alter specific interactions is critical to understanding pathogenesis. Structurally resolved interaction networks should be valuable tools for interpreting the wealth of data being generated by large-scale structural genomics and disease association studies.

Keyword: Protein Interactions and Molecular Networks, Disease Models & Epidemiology

TOP

PP06 (HT) - An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways

Date: Sunday, July 15, 11:15 a.m. - 11:40 a.m., Room: 104B
Presenting author: Pavel Sumazin, Columbia, United States

Additional authors:
Andrea Califano, Columbia, United States

Session Chair: Paul Horton

Presentation Overview: Show/Hide

By analyzing gene expression data in glioblastoma in combination with matched microRNA profiles, we have uncovered a post-transcriptional regulation layer of surprising magnitude, comprising hundreds of thousands of microRNA-mediated interactions. These include thousands of genes whose transcripts act as microRNA ‘sponges’ and hundreds of genes that act through alternative, non-sponge interactions. Biochemical analyses in cell lines confirmed that this network regulates established drivers of glioblastoma tumor initiation and subtype, including P53, PTEN, PDGFRA, RB1, VEGFA, STAT3, and RUNX1, suggesting that these interactions mediate crosstalk between canonical oncogenic pathways. RNA silencing of 13 microRNA-mediated PTEN regulators, whose locus deletions are predictive of PTEN expression variability, was sufficient to downregulate PTEN in a 3' UTR-dependent manner and to increase tumor-cell growth rates. Thus, this microRNA-mediated network provides a mechanistic, experimentally validated rationale for the loss of PTEN expression in a large number of glioma samples with an intact PTEN locus.

Keyword: Gene Regulation and Transcriptomics, Applied Bioinformatics

TOP

PP07 (PT) - Improved synapse detection for mGRASP-assisted brain connectivity mapping

Date: Sunday, July 15, 11:45 a.m. - 12:10 p.m. Room: Grand Ballroom
Presenting author: Linqing Feng, Korea Institute of Science and Technology, Republic of Korea

Additional authors:
Ting Zhao, Zhejiang University
Jinhyun Kim, Korea Institute of Science and Technology

Session Chair: Robert Murphy

Presentation Overview: Show/Hide

Motivation: A new technique, mammalian GFP reconstitution across synaptic partners (mGRASP), enables mapping mammalian synaptic connectivity with light microscopy. To characterize the locations and distribution of synapses in complex neuronal networks visualized by mGRASP, it is essential to detect mGRASP fluorescence signals with high accuracy.

Results: We developed a fully automatic method for detecting mGRASP-labeled synapse puncta. By modeling each punctum as a Gaussian distribution, our method enables accurate detection even when puncta of varying size and shape partially overlap. The method consists of three stages: blob detection by global thresholding; blob separation by watershed; and punctum modeling by a Variational Bayesian Gaussian Mixture Model. Extensive testing shows that the three-stage method markedly improves detection accuracy and, in particular, reduces under-segmentation. The method provides a goodness-of-fit score for each detected punctum, allowing efficient error detection. We also used this score to develop an efficient interactive method for correcting errors.

Availability: The software is available at http://jinny.kist.re.kr

Keyword: Bioimaging

TOP

PP08 (HT) - Guilt by association is the exception rather than the rule in gene networks

Date: Sunday, July 15, 11:45 a.m. - 12:10 p.m. Room: 104A
Presenting author: Jesse Gillis, University of British Columbia, Canada

Session Chair: Yanay Ofran

Presentation Overview: Show/Hide

This paper concerns a central issue in the analysis of biological networks, which is how functional information can be discovered or exploited through their use. Our key finding is that almost all the available information on gene function is concentrated in a tiny part of networks. A striking demonstration is that a mouse gene network of 4.5 million edges can be reduced to one with just 23 edges, while retaining key features commonly thought to involve widely distributed properties. At a basic level, the “guilt-by-association” approach that is practised by biologists all the time to study genes one-by-one does not scale up to networks, despite numerous claims to the contrary. Attempts to adjust or validate networks based on gene function are highly misleading, and attempts to predict gene function using computational means are based on deeply flawed assumptions. We offer concrete suggestions to help others avoid these pitfalls.

Keyword: Protein Interactions and Molecular Networks

TOP

PP09 (PT) - Nonparametric Bayesian Inference for Perturbed and Orthologous Gene Regulatory Networks

Date: Sunday, July 15, 11:45 a.m. - 12:10 p.m. Room: 104B
Presenting author: Christopher A. Penfold, University of Warwick, United Kingdom

Additional authors:
Vicky Buchanan-Wollaston, University of Warwick
Katherine J. Denby, University of Warwick
David L. Wild, University of Warwick

Session Chair: Paul Horton

Presentation Overview: Show/Hide

The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that each of the time series was generated from networks with identical topology. Here we outline a hierarchical, nonparametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations, including: (i) where different but overlapping sets of transcription factors are expected to bind in the different experimental conditions, i.e., where switching events could potentially arise under the different treatments; and (ii) for inference in evolutionarily related species in which orthologous GRNs exist. The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one-hybrid and microarray time series data to infer potential transcriptional switches in the Arabidopsis thaliana response to abiotic stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses.

Keyword: Gene Regulation and Transcriptomics

TOP

PP10 (PT) - Protein Subcellular Location Pattern Classification in Cellular Images Using Latent Discriminative Models

Date: Sunday, July 15, 12:15 p.m. - 12:40 p.m. Room: Grand Ballroom
Presenting author: Jieyue Li, Carnegie Mellon University, United States

Additional authors:
Liang Xiong, Carnegie Mellon University
Robert Murphy, Carnegie Mellon University
Jeff Schneider, Carnegie Mellon University

Session Chair: Robert Murphy

Presentation Overview: Show/Hide

In the human proteome, the subcellular location pattern is crucial for understanding the functions of a protein. This pattern is essentially characterized by the co-localization of the protein and the components in the cell. In this paper, we address the protein pattern classification problem based on the confocal immunofluorescence cellular images from the Human Protein Atlas (HPA) project. In our HPA data set, each cell has the staining images of one protein and three reference components, while many other components remain invisible. Given one such cell, the task is to classify the pattern type of the stained protein. We first randomly select local image regions within the cells, and then extract various carefully designed features from these regions. Compared to traditional cell-based methods, this region-based approach enables us to explicitly study the relationship between proteins and different cell components, as well as the interactions between these components. To achieve these two goals, we propose two discriminative models that extend logistic regression with structured latent variables. The first model allows the same protein pattern class to be expressed differently according to the underlying components in different regions. The second model further captures the spatial dependencies between the components within the same cell so that we can better infer these components. To learn these models, we propose a fast approximate algorithm for inference, and then use gradient-based methods to maximize the data likelihood. In the experiments, we show that the proposed models improve classification accuracy on both synthetic data and real cellular images. The best overall accuracy we report in this paper for classifying 942 proteins into 13 classes of patterns is about 84.6%, which to our knowledge is the best so far. In addition, we can give these results biological interpretations.

Keyword: Bioimaging

TOP

PP11 (HT) - Putting genetic interactions in context through a global modular decomposition

Date: Sunday, July 15, 12:15 p.m. - 12:40 p.m. Room: 104A
Presenting author: Chad Myers, University of Minnesota, United States

Additional authors:
Jeremy Bellay, University of Minnesota, United States
Gowtham Atluri, University of Minnesota, United States
Tina Sing, University of Toronto, Canada
Kiana Toufighi, Centre for Genomic Regulation, Spain
Michael Costanzo, University of Toronto, Canada
Philippe Souza Moraes Ribeiro, University of Minnesota, United States
Gaurav Pandey, Mount Sinai School of Medicine, United States
Joshua Baller, University of Minnesota, United States
Benjamin VanderSluis, University of Minnesota, United States
Magali Michaut, University of Toronto, Canada
Sangjo Han, University of Toronto, Canada
Philip Kim, University of Toronto, Canada
Grant Brown, University of Toronto, Canada
Brenda Andrews, University of Toronto, Canada
Charles Boone, University of Toronto, Canada
Vipin Kumar, University of Minnesota, United States

Session Chair: Yanay Ofran

Presentation Overview: Show/Hide

Genetic interactions provide a powerful perspective into biological processes that is fundamentally different from other high-throughput genome-wide studies. We developed a data mining approach based on association rule learning to exhaustively discover all statistically significant block structures within the yeast genetic interaction network, producing a complete modular decomposition of the network. This provides a first opportunity for a global, unbiased assessment of the structure of the genetic interaction network and the relationship between structure and individual gene function. The genetic interaction network is highly structured with over half of interactions appearing in modular structures, and genetic interactions contained within modules exhibit strikingly different functional properties relative to isolated interactions. In addition, gene module membership provides a specific and unbiased assessment of the prevalence of multi-functionality among genes. Our modular decomposition also provided a basis for testing the between-pathway model of negative genetic interactions and within-pathway model of positive genetic interactions.

Keyword: Protein Interactions and Molecular Networks, Gene Regulation & Transcriptomics

TOP

PP12 (PT) - NOrMAL: Accurate Nucleosome Positioning using a Modified Gaussian Mixture Model

Date: Sunday, July 15, 12:15 p.m. - 12:40 p.m. Room: 104B
Presenting author: Anton Polishko, UC Riverside, United States

Additional authors:
Nadia Ponts, UC Riverside
Karine Le Roch, UC Riverside
Stefano Lonardi, UC Riverside

Session Chair: Paul Horton

Presentation Overview: Show/Hide

Motivation: Nucleosomes are the basic elements of DNA chromatin structure. They control the packaging of DNA and play a critical role in gene regulation by allowing physical access to transcription factors. The advent of second-generation sequencing has enabled landmark genome-wide studies of nucleosome position for several model organisms. Current methods to determine nucleosome positioning first compute an occupancy coverage profile by mapping nucleosome-enriched sequenced reads to a reference genome; then, nucleosomes are placed according to the peaks of the coverage profile. These methods are quite accurate at placing isolated nucleosomes, but they do not properly handle "overlapping" nucleosomes. Also, they can only provide the positions of nucleosomes and their occupancy level, while it is very beneficial to supply molecular biologists with additional information about nucleosomes, such as the probability of placement, the size of DNA fragments enriched for nucleosomes, and/or whether nucleosomes are well-positioned or "fuzzy" in the sequenced cell sample. Results: We address these issues by providing a novel method based on a parametric probabilistic model. An expectation maximization (EM) algorithm is used to infer the parameters of the mixture of distributions. We compare the performance of our method on two real datasets against Template Filtering, which is considered the current state-of-the-art. Experimental results show that our method detects a significantly higher number of nucleosomes than Template Filtering.

Keyword: Gene Regulation and Transcriptomics

TOP

PP13 (HT) - Disrupting human pathways by minimal miRNA sets

Date: Sunday, July 15, 2:30 p.m. - 2:55 p.m. Room: Grand Ballroom
Presenting author: Ohad Balaga, The Hebrew University of Jerusalem, Israel

Additional authors:
Guy Naamati, The Hebrew University of Jerusalem, Israel
Yitzhak Friedman, The Hebrew University of Jerusalem, Israel
Michal Linial, The Hebrew University of Jerusalem, Israel

Session Chair: Lenore Cowen

Presentation Overview: Show/Hide

In humans, over 1000 microRNAs (miRNAs) regulate the expression of about half of the genes. This study addresses the potential of a coordinated action of miRNAs to manipulate hundreds of human pathways. Specifically, we analyzed the effectiveness of disrupting the topology of human pathway graphs through regulation by miRNAs. We will present the combination of our concept of miRNAs ‘working together’ (Friedman et al., Bioinformatics, 2010) with the pathways’ topology considerations. From a set of miRNA candidates, an exhaustive search for all possible pairs and triplets (coined miR-Duos and miR-Trios) that impact the integrity of a pathway is performed. We will discuss the surprising finding that 85% of all pathways are effectively disconnected by a remarkably small number of miRNA sets. Significantly, the combination of the most effective miR-Trios is unique for each pathway. The impact of the selected miR-Duos/Trios on various diseases will be discussed.
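
The exhaustive search over miRNA pairs and triplets described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the disruption criterion, so the sketch assumes that a pathway counts as disrupted when the genes left after removing all targeted genes no longer form a single connected component. The names is_connected, disrupting_sets, adjacency and targets are hypothetical.

```python
from itertools import combinations

def is_connected(adjacency, removed):
    """True if the nodes left after deleting `removed` form one component."""
    nodes = [n for n in adjacency if n not in removed]
    if not nodes:
        return False
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor not in removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(nodes)

def disrupting_sets(adjacency, targets, sizes=(2, 3)):
    """List every miRNA combination of the given sizes that disconnects the graph.

    adjacency: undirected pathway graph, gene -> set of neighboring genes.
    targets:   miRNA name -> set of genes it is assumed to silence.
    """
    hits = []
    for size in sizes:
        for combo in combinations(sorted(targets), size):
            removed = set().union(*(targets[m] for m in combo))
            if not is_connected(adjacency, removed):
                hits.append(combo)
    return hits

# Toy pathway: a linear chain g1 - g2 - g3 - g4 - g5 (illustrative data only).
pathway = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3", "g5"},
    "g5": {"g4"},
}
mirna_targets = {"miR-a": {"g2"}, "miR-b": {"g4"}, "miR-c": {"g1"}}
result = disrupting_sets(pathway, mirna_targets)
```

The number of candidate sets grows combinatorially with the number of miRNAs, which is presumably why the search is restricted to pairs and triplets.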

Keyword: Protein Interactions and Molecular Networks, Gene Regulation & Transcriptomics

TOP

PP14 (HT) - cn.MOPS: mixture of Poissons for discovering copy number variations in next generation sequencing data with a low false discovery rate

Date: Sunday, July 15, 2:30 p.m. - 2:55 p.m. Room: 104A
Presenting author: Guenter Klambauer, Johannes Kepler University of Linz, Austria

Additional authors:
Karin Schwarzbauer, Johannes Kepler University of Linz, Austria
Andreas Mayr, Johannes Kepler University of Linz, Austria
Djork-Arné Clevert, Johannes Kepler University of Linz, Austria
Andreas Mitterecker, Johannes Kepler University of Linz, Austria
Ulrich Bodenhofer, Johannes Kepler University of Linz, Austria
Sepp Hochreiter, Johannes Kepler University of Linz, Austria

Session Chair: Eran Halperin

Presentation Overview: Show/Hide

Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Technological or genomic variations in the depth of coverage lead to a high false discovery rate (FDR), even upon correction for GC content. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise, which is the reason for the superior performance.
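
To make the mixture-of-Poissons idea concrete, here is a minimal sketch of the posterior over integer copy numbers for a single read count. It is a simplified illustration, not the cn.MOPS implementation: it assumes the Poisson rate scales linearly with copy number relative to a known diploid rate, uses fixed mixture weights instead of estimating them across samples, and substitutes a small background rate (eps, a hypothetical parameter) for copy number 0.

```python
import math

def poisson_logpmf(k, rate):
    """Log of the Poisson probability mass function."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def copy_number_posterior(read_count, diploid_rate, max_cn=4, weights=None, eps=0.05):
    """Posterior probabilities of copy numbers 0..max_cn for one read count."""
    n = max_cn + 1
    if weights is None:
        weights = [1.0 / n] * n  # assumed uniform prior over copy numbers
    log_post = []
    for cn in range(n):
        # Expected coverage scales with copy number; copy number 0 gets a
        # small background rate so the logarithm stays finite.
        rate = diploid_rate * (cn / 2.0 if cn > 0 else eps)
        log_post.append(math.log(weights[cn]) + poisson_logpmf(read_count, rate))
    top = max(log_post)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(v - top) for v in log_post]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

posterior = copy_number_posterior(read_count=150, diploid_rate=100.0)
most_likely_cn = max(range(len(posterior)), key=posterior.__getitem__)
```

In the full method the mixture weights and the rate would be estimated jointly across samples at each genomic position; fixing them here keeps the example short.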

Keyword: Population Genomics, Evolution & Comparative Genomics

TOP

PP15 (HT) - RNA secondary structure mediates alternative 3’ss selection in Saccharomyces cerevisiae

Date: Sunday, July 15, 2:30 p.m. - 2:55 p.m. Room: 104B
Presenting author: Eduardo Eyras, Universitat Pompeu Fabra, Spain

Additional authors:
Mireya Plass, Universitat Pompeu Fabra, Spain
Josep Vilardell, CSIC, Spain

Session Chair: Janet Kelso

Presentation Overview: Show/Hide

Splicing is generally regulated by protein factors binding the pre-mRNA. Yeast lacks many of the splicing factors present in metazoans; hence it is thought to have limited regulated splicing. We present experimental evidence that the structure adopted by the pre-mRNA can function as a regulator of 3’ splice site selection in yeast, bringing the selected site close to the branch-site (BS) and occluding the rest.

Based on these observations we built a computational classifier that explains most of the annotated 3’ss in yeast. Our model also predicts the usage of alternative 3’ss at low and/or high temperatures, some of which we validated experimentally. Our results are consistent with the presence of alternative 3’ss selection in yeast that is mediated by the pre-mRNA structure, which can be responsive to external cues, like temperature, and which is possibly related to the control of gene expression.

Keyword: Gene Regulation and Transcriptomics, Sequence Analysis

TOP

PP16 (HT) - Identification of a Novel Class of Farnesylation Targets by Structure-Based Modeling of Binding Specificity

Date: Sunday, July 15, 3:00 p.m. - 3:25 p.m. Room: Grand Ballroom
Presenting author: Ora Schueler-Furman, The Hebrew University, Israel

Additional authors:
Nir London, Hebrew University, Israel
James Hougland, Syracuse University, United States
Carol Fierke, University of Michigan, United States
Yousef Abu-Kwaik, University of Louisville, United States
Tasneem Al-Qadan, University of Louisville, United States
Christopher Price, University of Louisville, United States

Session Chair: Lenore Cowen

Presentation Overview: Show/Hide

Prenylation is an important post-translational modification in which a lipid prenyl group is covalently attached to a protein, thereby changing its functional role. As an example, ras uses this mechanism to reach the membrane where it is active.

In this talk I will describe our recent work on the structure-based modeling of prenylation substrates based on Rosetta FlexPepDock, our peptide docking protocol. Based on structural models of the C-terminal peptide sequence of a protein bound to the enzyme farnesyltransferase, our protocol FlexPepBind identifies both known and novel farnesylation substrates. In vitro validation of the latter demonstrates the high accuracy of this approach: 26/29 peptides are indeed farnesylated. Application of our protocol to human as well as pathogenic genomes has identified many new and interesting targets. This work provides a link between the structure of a peptide-protein complex and its biological importance.

Keyword: Protein Interactions and Molecular Networks, Protein Structure & Function

TOP

PP17 (HT) - Systematic Detection of Epistatic Interactions Based on Allele Pair Frequencies

Date: Sunday, July 15, 3:00 p.m. - 3:25 p.m. Room: 104A
Presenting author: Marit Ackermann, Technical University Dresden, Germany

Additional authors:
Andreas Beyer, Technical University Dresden, Germany

Session Chair: Eran Halperin

Presentation Overview: Show/Hide

Epistatic interactions between genes are crucial for understanding the molecular mechanisms of complex diseases. While systematic testing of genetic interactions with an impact on physiological fitness is possible in simple model organisms, such screens have not been successful in mammals. Here, we propose a computational screening method that only requires genotype information of family trios for predicting epistasis. Based on a Chi-squared test approach, it detects the under-representation of allele pairs in a given population.
We tested our framework on a set of 2,000 heterozygous mice and found 168 imbalanced allele pairs, which is substantially more than expected by chance. We confirmed many of the interactions using independent data and found that interacting loci are enriched for developmental genes. The number of imbalanced allele pairs that we detected is surprisingly large and was not expected based on published evidence. This framework sets the stage for similar work in human trios.
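
As an illustration of the kind of test involved, the following minimal sketch computes a Pearson chi-squared statistic on a 2x2 table of allele-pair counts at two loci. It is a generic textbook test, not the authors' trio-based framework: the table layout, the function name chi2_allele_pair and the example counts are assumptions, and all row and column totals are assumed to be non-zero.

```python
import math

def chi2_allele_pair(n_ab, n_aB, n_Ab, n_AB):
    """Pearson chi-squared test on a 2x2 table of allele-pair counts.

    Rows are the alleles at locus 1 (a / A), columns the alleles at
    locus 2 (b / B). Returns (statistic, p_value, deficit), where deficit
    is observed minus expected count for the A-B pair; a negative value
    means the pair is under-represented.
    """
    observed = [[n_ab, n_aB], [n_Ab, n_AB]]
    total = n_ab + n_aB + n_Ab + n_AB
    row = [n_ab + n_aB, n_Ab + n_AB]
    col = [n_ab + n_Ab, n_aB + n_AB]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-squared survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    deficit = n_AB - row[1] * col[1] / total
    return stat, p_value, deficit

# Illustrative counts only: the A-B pair occurs 40 times where 50 are expected.
stat, p_value, deficit = chi2_allele_pair(40, 60, 60, 40)
```

A genome-wide screen would apply such a test to a very large number of locus pairs, so some form of multiple-testing correction would be needed before calling a pair imbalanced.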

Keyword: Population Genomics

TOP

PP18 (HT) - Deciphering the Gene Translation Code and its Modeling

Date: Sunday, July 15, 3:00 p.m. - 3:25 p.m. Room: 104B
Presenting author: Tamir Tuller, Tel-Aviv University, Israel

Additional authors:
Hadas Zur, Tel Aviv University, Israel
Nir Gazit, Tel Aviv University, Israel
Martin Kupiec, Tel Aviv University, Israel
Eytan Ruppin, Tel Aviv University, Israel
Michal Ziv-Ukelson, Ben Gurion University, Israel
Isana Veksler-Lublinsky, Ben Gurion University, Israel

Session Chair: Janet Kelso

Presentation Overview: Show/Hide

Gene translation is a central process in all living organisms. Thus, attaining a better understanding of this complex process has ramifications for every biomedical discipline. In this talk, I will survey recent results related to this topic.
I will show that features of the transcript, such as its folding strength, the adaptation of its codons to the tRNA pool, and the charge of the amino acids encoded in it, contribute to translation efficiency in a causal and/or non-causal way. Specifically, highly expressed genes have stronger mRNA folding, possibly to prevent aggregation of mRNA molecules. In addition, each of these features contributes to: 1) The spatial distribution of ribosomes along transcripts; 2) Slowing down ribosomes at the beginning of the coding regions, presumably to reduce ribosomal traffic-jams and decrease the translation cost.
I will also demonstrate how these results can be integrated into a comprehensive computational predictive model of translation.

Keyword: Gene Regulation and Transcriptomics, Applied Bioinformatics

TOP

PP19 (PT) - How networks change with time

Date: Sunday, July 15, 3:30 p.m. - 3:55 p.m. Room: Grand Ballroom
Presenting author: Yongjin Park, Johns Hopkins University, United States

Additional authors:
Joel Bader, Johns Hopkins University

Session Chair: Lenore Cowen

Presentation Overview: Show/Hide

Motivation: Biological networks change in response to genetic and environmental cues. Changes are reflected in the abundances of biomolecules, the composition of protein complexes, and other descriptors of the biological state. Methods to infer the dynamic state of a cell would have great value for understanding how cells change over time to accomplish biological goals. Results: A new method predicts the dynamic state of protein complexes in a cell, with protein expression inferred from transcription profile time courses and protein complexes inferred by joint analysis of protein co-expression and protein-protein interaction maps. Two algorithmic advances are presented: a new method, DHAC (Dynamical Hierarchical Agglomerative Clustering), for clustering time-evolving networks; and a companion method, MATCH-EM, for matching corresponding clusters across time-points. With link prediction as an objective assessment metric, DHAC provides a substantial advance over existing clustering methods. An application to the yeast metabolic cycle demonstrates how waves of gene expression correspond to individual protein complexes. Our results suggest regulatory mechanisms for assembling the mitochondrial ribosome and illustrate dynamic changes in the components of the nuclear pore. Availability: All source code and data will be available through a BSD open source license as supplementary material and at www.baderzone.org.

Keyword: Protein Interactions and Molecular Networks

TOP

PP20 (PT) - Leveraging Input and Output Structures For Joint Mapping of Epistatic and Marginal eQTLs

Date: Sunday, July 15, 3:30 p.m. - 3:55 p.m. Room: 104A
Presenting author: Seunghak Lee, Carnegie Mellon University, United States

Additional authors:
Eric Xing, Carnegie Mellon University

Session Chair: Eran Halperin

Presentation Overview: Show/Hide

Motivation: Since many complex disease and expression phenotypes are the outcome of intricate perturbations of the molecular networks underlying gene regulation, resulting from interdependent genome variations, association mapping of causal QTLs or eQTLs must consider both additive and epistatic effects of multiple candidate genotypes. This problem poses a significant challenge to contemporary genome-wide-association (GWA) mapping technologies because of its computational complexity. Fortunately, a plethora of recent developments in the biological network community, especially the availability of genetic interaction networks, make it possible to construct informative priors of complex interactions between genotypes, which can substantially reduce the complexity and increase the statistical power of GWA inference. Results: In this paper, we consider the problem of learning a multi-task regression model while taking advantage of the prior information on structures on both the inputs (genetic variations) and outputs (expression levels). We propose a novel regularization scheme over multi-task regression called structured jointly input/output lasso based on an L1/L2 norm, which allows shared sparsity patterns for related inputs and outputs to be optimally estimated. Such patterns capture multiple related SNPs that jointly influence multiple related expression traits. In addition, we generalize this new multi-task regression to structurally regularized polynomial regression to detect epistatic interactions with manageable complexity by exploiting the prior knowledge on candidate epistatic SNPs from biological experiments. We demonstrate our method on simulated and yeast eQTL datasets.

Keyword: Population Genomics

TOP

PP21 (PT) - Lineage based identification of cellular states and expression programs

Date: Sunday, July 15, 3:30 p.m. - 3:55 p.m. Room: 104B
Presenting author: Tatsunori Hashimoto, Massachusetts Institute of Technology, United States

Additional authors:
Tommi Jaakkola, Massachusetts Institute of Technology
Richard Sherwood, Brigham and Women's Hospital
Esteban Mazzoni, Columbia University Medical Center
Hynek Wichterle, Columbia University Medical Center
David Gifford, Massachusetts Institute of Technology

Session Chair: Janet Kelso

Presentation Overview: Show/Hide

We present a method, LineageProgram, that uses the developmental lineage relationship of observed gene expression measurements to improve the learning of developmentally relevant cellular states and expression programs. We find that incorporating lineage information allows us to significantly improve both the predictive power and interpretability of expression programs that are derived from expression measurements from in vitro differentiation experiments. The lineage tree of a differentiation experiment is a tree graph whose nodes describe all of the unique expression states in the input expression measurements, and whose edges describe the experimental perturbations applied to cells. LineageProgram is based on a log-linear model with parameters that reflect changes along the lineage tree. Regularization with L1-based methods controls the parameters in three distinct ways: the number of genes that change between two cellular states, the number of unique cellular states, and the number of underlying factors responsible for changes in cell state. The model is estimated with proximal operators to quickly discover a small number of key cell states and gene sets. Comparisons with existing factorization techniques such as singular value decomposition and nonnegative matrix factorization show that our method provides higher predictive power in held-out tests while inducing sparse and biologically relevant gene sets.

Keyword: Gene Regulation and Transcriptomics

TOP

PP22 (PT) - A single-source k shortest paths algorithm to infer regulatory pathways in a gene network

Date: Sunday, July 15, 4:00 p.m. - 4:25 p.m. Room: Grand Ballroom
Presenting author: Yu-Keng Shih, The Ohio State University, United States

Additional authors:
Srinivasan Parthasarathy, The Ohio State University

Session Chair: Lenore Cowen

Presentation Overview: Show/Hide

Motivation: Inferring the underlying signaling pathways within a gene interaction network is a fundamental problem in Systems Biology to help understand the complex interactions and the transmission and flow of information within a system-of-interest. Given a weighted gene network and a gene in this network, the goal of an inference algorithm is to identify the potential signaling pathways passing through this gene. Results: In a departure from previous approaches that largely rely on the random walk model, we propose a novel single-source k shortest paths based algorithm to address this inference problem. An important element of our approach is to explicitly account for and enhance the diversity of paths discovered by our algorithm. The intuition here is that diversity in paths can help enrich different functions and thereby better position one to understand the underlying system-of-interest. Results on the yeast gene network demonstrate the utility of the proposed approach over extant state-of-the-art inference algorithms. Beyond utility, our algorithm achieves a significant speedup over these baselines.
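
For readers unfamiliar with the k shortest paths problem itself, the following minimal sketch enumerates the k lowest-cost loop-free paths between two genes by best-first search over partial paths. It is a generic baseline, not the single-source algorithm proposed in the talk, and it does not model path diversity. It assumes non-negative edge weights; the function name and the toy network are hypothetical.

```python
import heapq

def k_shortest_simple_paths(graph, source, target, k):
    """Return up to k lowest-cost loop-free paths as (cost, path) tuples.

    graph maps each node to a dict {neighbor: non-negative edge weight}.
    Partial paths are expanded in order of increasing cost, so complete
    paths are found in non-decreasing order of total cost.
    """
    heap = [(0.0, [source])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # keep paths simple (no repeated genes)
                heapq.heappush(heap, (cost + weight, path + [neighbor]))
    return found

# Toy weighted gene network (illustrative only).
network = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 1, "D": 5},
    "C": {"D": 1},
    "D": {},
}
paths = k_shortest_simple_paths(network, "A", "D", k=2)
```

This enumeration can take exponential time on dense networks, which illustrates why specialised algorithms are needed at the scale of a full gene network.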

Keyword: Protein Interactions and Molecular Networks

TOP

PP23 (PT) - Incorporating Prior Information into Association Studies

Date: Sunday, July 15, 4:00 p.m. - 4:25 p.m. Room: 104A
Presenting author: Gregory Darnell, UCLA, United States

Additional authors:
Dat Duong, University of California Berkeley
Buhm Han, University of California
Eleazar Eskin, UCLA

Session Chair: Eran Halperin

Presentation Overview: Show/Hide

Recent technological developments in measuring genetic variation have ushered in an era of genome-wide association studies, which have discovered many genes involved in human disease. Current methods to perform association studies collect genetic information and compare the frequency of variants in individuals with and without the disease. Standard approaches do not take into account any information on whether or not a given variant is likely to have an effect on the disease. We propose a novel method for computing an association statistic which takes into account prior information. Our method improves power and resolution by 43.5% and 45%, respectively, over traditional methods for performing association studies when applied to simulations using the HapMap data. Advantages of our method are that it is as simple to apply to association studies as standard methods, the results of the method are interpretable since the method reports p-values, and the method is optimal in its use of prior information with regard to statistical power.

Keyword: Population Genomics

TOP

PP24 (PT) - Matching experiments across species using expression values and textual information

Date: Sunday, July 15, 4:00 p.m. - 4:25 p.m. Room: 104B
Presenting author: Aaron Wise, Carnegie Mellon University, United States

Additional authors:
Zoltan Oltvai, University of Pittsburgh
Ziv Bar-Joseph, Carnegie Mellon University

Session Chair: Janet Kelso

Presentation Overview: Show/Hide

Motivation: With the vast increase in the number of gene expression datasets deposited in public databases, novel techniques are required to analyze and mine this wealth of data. Similar to the way BLAST enables cross-species comparison of sequence data, tools that enable cross-species expression comparison will allow us to better utilize these datasets: Cross-species expression comparison enables us to address questions in evolution and development, and further allows the identification of disease related genes and pathways that play similar roles in humans and model organisms. Unlike sequence, which is static, expression data changes over time and under different conditions. Thus, a prerequisite for performing cross-species analysis is the ability to match experiments across species. Results: To enable better cross-species comparisons, we developed methods for automatically identifying pairs of similar expression datasets across species. Our method uses a co-training algorithm to combine a model of expression similarity with a model of the text which accompanies the expression experiments. The co-training method outperforms previous methods based on expression similarity alone. Using expert analysis, we show that the new matches identified by our method indeed capture biological similarities across species. We then use the matched expression pairs between human and mouse to recover known and novel cycling genes as well as to identify genes with possible involvement in diabetes. By providing the ability to identify novel candidate genes in model organisms, our method opens the door to new models for studying diseases.

Keyword: Gene Regulation and Transcriptomics

PP25 (HT) - HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment

Date: Monday, July 16, 10:45 a.m. - 11:10 a.m., Room: Grand Ballroom
Presenting author: Johannes Soeding, Ludwig-Maximilians-Universitaet Muenchen, Germany

Additional authors:
Michael Remmert, Ludwig-Maximilians-Universitaet Muenchen, Germany
Andreas Biegert, genedata.com, Germany
Andreas Hauser, Ludwig-Maximilians-Universitaet Muenchen, Germany

Session Chair: David Gifford

Presentation Overview:

Sequence-based protein function and structure prediction depends critically on sequence-search sensitivity and on the accuracy of the resulting sequence alignments. I will present HHblits (HMM-HMM–based lightning-fast iterative sequence search), an open-source, general-purpose search tool that represents both query and database sequences by profile hidden Markov models (HMMs). Compared to PSI-BLAST, HHblits is faster owing to its discretized-profile prefilter, has 50–100% higher sensitivity and generates more accurate alignments. It thus has the potential to improve many downstream analysis and prediction methods. I will first explain how HHblits achieves its sensitivity and speed and then show benchmarks and biological applications. If possible, I will finish with a short software demo.

Keyword: Applied Bioinformatics

PP26 (PT) - Toward 3D structure prediction of large RNA molecules: An integer programming framework to insert local 3D motifs in RNA secondary structure

Date: Monday, July 16, 10:45 a.m. - 11:10 a.m., Room: 104A
Presenting author: Vladimir Reinharz, McGill University, Canada

Additional authors:
Francois Major, Institute for Research in Immunology and Cancer
Jerome Waldispuhl, McGill University

Session Chair: Cenk Sahinalp

Presentation Overview:

Motivation: Predicting the three-dimensional structure of an RNA from its sequence alone is a milestone toward RNA function analysis and prediction. In recent years, many methods have addressed this challenge, ranging from cycle decomposition and fragment assembly to molecular dynamics simulations. However, their predictions remain fragile and limited to small RNAs. To expand the range and accuracy of these techniques, we need to develop algorithms that make use of all the structural information available. In particular, the energetic contribution of secondary structure interactions is now well documented, but the quantification of non-canonical interactions – those shaping the tertiary structure – is poorly understood. Nonetheless, even if a complete RNA tertiary structure energy model is currently unavailable, we now have catalogues of local 3D structural motifs including non-canonical base pairings. A practical objective is thus to develop techniques enabling us to use this knowledge for robust RNA tertiary structure predictors. Results: In this work, we introduce RNA-MoIP, a program that benefits from the progress made over the last 30 years in the field of RNA secondary structure prediction and expands these methods to incorporate the novel local motif information available in databases. Using an integer programming framework, our method refines predicted secondary structures (i.e. removes incorrect canonical base-pairs) to accommodate the insertion of RNA 3D motifs (i.e. hairpins, internal loops and k-way junctions). Then, we use the predictions as templates to generate complete 3D structures with the MC-Sym program. We benchmarked RNA-MoIP on a set of 9 RNAs with sizes varying from 53 to 128 nucleotides. We show that our approach (i) improves the accuracy of canonical base pair predictions, (ii) identifies the best secondary structures in a pool of sub-optimal structures, and (iii) predicts accurate 3D structures of large RNA molecules.
RNA-MoIP is publicly available at: http://csb.cs.mcgill.ca/RNAMoIP

Keyword: Sequence Analysis

PP27 (HT) - Differential oestrogen receptor binding is associated with clinical outcome in breast cancer

Date: Monday, July 16, 10:45 a.m. - 11:10 a.m., Room: 104B
Presenting author: Rory Stark, Cancer Research UK, United Kingdom

Additional authors:
Caryn Ross-Innes, Hutchison/MRC, United Kingdom
Andrew Teschendorff, University College London, United Kingdom
Kelly Holmes, Cancer Research UK, United Kingdom
Raza Ali, Cancer Research UK, United Kingdom
Mark Dunning, Cancer Research UK, United Kingdom
Gordon Brown, Cancer Research UK, United Kingdom
Ondrej Gojis, Charles University, Czech Republic
Ian Ellis, Nottingham University, United Kingdom
Andrew Green, Nottingham University, United Kingdom
Simak Ali, Imperial College London, United Kingdom
Suet-Feung Chin, Cancer Research UK, United Kingdom
Carlo Palmieri, Imperial College London, United Kingdom
Carlos Caldas, Cancer Research UK, United Kingdom
Jason Carroll, Cancer Research UK, United Kingdom

Session Chair: Carl Kingsford

Presentation Overview:

In this paper, which maps ERα binding via ChIP-seq in tumour tissue from twenty ER+ breast cancer patients, we develop a novel technique for quantitative differential analysis of protein/DNA binding events, identifying ERα sites significantly differentially bound between good prognosis patients vs. those with poor prognosis and metastases. Gene signatures that predict clinical outcome in ER+ disease, validated in publicly available breast cancer gene expression datasets, are derived from these sites. These signatures are enriched for genes with relevant proximal cis-regulatory events. Statistical characterization of differentially bound ERα sites enables further downstream analysis, including identification of a differentially enriched motif for the transcription factor FoxA1. Focusing our analysis on differential binding in primary tumour material allows us to show distinct combinations of cis-regulatory elements linked with the different clinical outcomes. These techniques are applicable to other cancers (and indeed other diseases) where master transcription factor regulators are known.

Keyword: Gene Regulation and Transcriptomics, Sequence Analysis

PP28 (HT) - A structural systems biology approach to polypharmacological drug discovery

Date: Monday, July 16, 11:15 a.m. - 11:40 a.m., Room: Grand Ballroom
Presenting author: Lei Xie, The City University of New York, United States

Additional authors:
Li Xie, University of California, San Diego, United States
Philip Bourne, University of California, San Diego, United States
Thomas Evangelidis, Biomedical Research Foundation Academy of Athens, Greece

Session Chair: David Gifford

Presentation Overview:

The conventional approach to drug discovery of “one drug – one target – one disease” is insufficient, especially for complex diseases. This inadequacy is partially addressed by accepting the notion of polypharmacology – one drug is likely to bind to multiple targets with varying affinity. However, to identify proteome-wide multiple targets for a drug is a complex and challenging task. We have developed a structural systems biology approach to quantitatively predict potential off-targets for known drugs. This method is applied to identify human off-targets for Nelfinavir, an antiretroviral drug with anti-cancer behavior. We propose inhibition by Nelfinavir of multiple protein kinases. We suggest that broad-spectrum low affinity binding by a drug or drugs to multiple targets may lead to a collective effect important in treating complex diseases such as cancer.

Keyword: Protein Structure and Function, Protein Interactions & Molecular Networks

PP29 (PT) - Identification of Sequence-Structure RNA Binding Motifs for SELEX Derived Aptamers

Date: Monday, July 16, 11:15 a.m. - 11:40 a.m., Room: 104A
Presenting author: Jan Hoinka, NCBI, NIH, United States

Additional authors:
Elena Zotenko, Garvan Institute for Medical Research
Adam Friedman, UNC Chapel Hill
Zuben E. Sauna, US Food and Drug Administration
Teresa Przytycka, NIH

Session Chair: Cenk Sahinalp

Presentation Overview:

Motivation: Systematic Evolution of Ligands by EXponential Enrichment (SELEX) represents a state-of-the-art technology to isolate single-stranded (ribo)nucleic acid fragments, named aptamers, that bind to a molecule (or molecules) of interest via specific structural regions induced by their sequence-dependent fold. This powerful method has applications in designing protein inhibitors, molecular detection systems, therapeutic drugs, and antibody replacement, among others. However, full understanding and consequently optimal utilization of the process has lagged behind its wide application due to the lack of dedicated computational approaches. At the same time, the combination of SELEX with novel sequencing technologies is beginning to provide the data that will allow the examination of a variety of properties of the selection process. Results: To close this gap we developed Aptamotif, a computational method for the identification of sequence-structure motifs in SELEX-derived aptamers. To increase the chances of identifying functional motifs, Aptamotif uses an ensemble-based approach. Our new algorithmic solutions are accompanied by rigorous statistical analysis. We validated the method using two published aptamer datasets containing experimentally determined motifs of increasing complexity. We were able to recreate the authors' findings to a high degree, thus demonstrating the capability of our approach to identify binding motifs in SELEX data. Additionally, using our new experimental dataset, we illustrate the application of Aptamotif to elucidate several properties of the selection process.

Keyword: Sequence Analysis / RNA

PP30 (HT) - Construction of regulatory networks using expression time-series data of a genotyped population

Date: Monday, July 16, 11:15 a.m. - 11:40 a.m., Room: 104B
Presenting author: Ka Yee Yeung, University of Washington, United States

Additional authors:
Kenneth Dombek, University of Washington, United States
Kenneth Lo, University of Washington, United States
John Mittler, University of Washington, United States
Jun Zhu, Sage Bionetworks, United States
Eric Schadt, Pacific Biosciences, United States
Roger Bumgarner, University of Washington, United States
Adrian Raftery, University of Washington, United States

Session Chair: Carl Kingsford

Presentation Overview:

The goal of network inference is to generate testable hypotheses of gene-to-gene influences and subsequently design bench experiments to confirm network predictions. In [Yeung et al. 2011], we used both time-series and genetics data to infer directionality of edges in regulatory networks: time-series data contain information about the chronological order of regulatory events and genetics data allow us to map DNA variations to variations at the RNA level. We generated time-series gene-expression data profiling 95 genotyped yeast segregants subjected to a drug perturbation. We developed a Bayesian model averaging regression algorithm that incorporates external information from diverse data types to infer regulatory networks from the time-series and genetics data. Our algorithm is capable of generating feedback loops. We showed that our inferred network recovered existing and novel regulatory relationships, and discovered de novo transcription-factor binding sites. We generated independent microarray data on selected deletion mutants to prospectively test network predictions.

Keyword: Gene Regulation and Transcriptomics, Applied Bioinformatics

PP31 (HT) - Integrating energy calculations with functional assays to decipher the specificity of G-protein inactivation by RGS proteins

Date: Monday, July 16, 11:45 a.m. - 12:10 p.m., Room: Grand Ballroom
Presenting author: Mickey Kosloff, University of Haifa, Israel

Additional authors:
Vadim Arshavsky, Duke University, United States
Amanda Travis, Duke University, United States
Dustin Bosch, University of North Carolina at Chapel Hill, United States
David Siderovski, University of North Carolina at Chapel Hill, United States

Session Chair: David Gifford

Presentation Overview:

Cellular signaling requires that particular protein-protein interactions be tailored to each signaling cascade with either broad or narrow specificity. Understanding the structural code for such selectivity is a major goal in signal transduction research, as well as in drug design. Yet, beyond single representative examples, little is known of how specificity is determined among large protein families, including those involved in signal transduction.

The talk will present a “bottom-up” approach to decipher interaction specificity, using G-protein signaling as a model system. This approach integrates experimental and structure-based energy calculations to map specificity determinants at the protein family level. The resulting residue-level maps are then used to redesign proteins with altered activities and specificities, offering new insights into G-protein signaling and paving the way for the rewiring of signaling networks at the cellular level.

Keyword: Protein Structure and Function

PP32 (PT) - GraphClust: alignment-free structural clustering of local RNA secondary structures

Date: Monday, July 16, 11:45 a.m. - 12:10 p.m., Room: 104A
Presenting author: Fabrizio Costa, Albert-Ludwigs-University Freiburg, Germany

Additional authors:
Dominic Rose, Albert-Ludwigs-University Freiburg
Rolf Backofen, Albert-Ludwigs-University Freiburg

Session Chair: Cenk Sahinalp

Presentation Overview:

Motivation: Clustering according to sequence-structure similarity has now become a generally accepted scheme for ncRNA annotation. Its application to complete genomic sequences as well as whole transcriptomes is therefore desirable but hindered by extremely high computational costs. Results: We present a novel linear-time, alignment-free method for comparing and clustering RNAs according to sequence and structure. The approach scales to datasets of hundreds of thousands of sequences. The quality of the retrieved clusters has been benchmarked against known ncRNA datasets and is comparable to state-of-the-art sequence-structure methods while achieving speed-ups of several orders of magnitude. A selection of applications aimed at the detection of novel structural non-coding RNAs is presented. As an example, we predicted local structural elements specific to lincRNAs that likely associate the involved transcripts functionally with vital processes of the human nervous system. In total, we predicted 349 local structural RNA elements.

Keyword: Sequence Analysis / RNA

PP33 (HT) - A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information

Date: Monday, July 16, 11:45 a.m. - 12:10 p.m., Room: 104B
Presenting author: Xiaotu Ma, The University of Texas at Dallas, United States

Additional authors:
Ashwinikumar Kulkarni, The University of Texas at Dallas, United States
Zhihua Zhang, The University of Texas at Dallas, United States
Zhenyu Xuan, The University of Texas at Dallas, United States
Michael Zhang, The University of Texas at Dallas, United States

Session Chair: Carl Kingsford

Presentation Overview:

Identification of DNA motifs from ChIP-seq/ChIP-chip [chromatin immunoprecipitation (ChIP)] data is a powerful method for understanding the transcriptional regulatory network. Here we propose a new k-mer occurrence model to reflect the fact that functional DNA k-mers often cluster around ChIP peak summits. With this model, we introduce a new measure to discover functional k-mers. Using simulation, we demonstrate that our method is more robust against noise in ChIP data than available methods. A novel word clustering method is also implemented to group similar k-mers into position weight matrices (PWMs). Our method was applied to a diverse set of ChIP experiments to demonstrate its high sensitivity and specificity. Importantly, our method is much faster than several other methods for large sample sizes.

Keyword: Gene Regulation and Transcriptomics, Sequence Analysis

PP34 (HT) - Text Mining Improves Prediction of Protein Functional Sites

Date: Monday, July 16, 12:15 p.m. - 12:40 p.m., Room: Grand Ballroom
Presenting author: Karin Verspoor, National ICT Australia, Australia

Additional authors:
Michael Wall, Los Alamos National Laboratory, United States
Judith Cohn, Los Alamos National Laboratory, United States
Komandur Ravikumar, Mayo Clinic, United States

Session Chair: David Gifford

Presentation Overview:

We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites in about 100,000 publicly available protein structures. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.

Keyword: Protein Structure and Function, Text Mining

PP35 (PT) - Detection of Allele-Specific Methylations through a Generalized Heterogeneous Epigenome Model

Date: Monday, July 16, 12:15 p.m. - 12:40 p.m., Room: 104A
Presenting author: Qian Peng, UCSD, United States

Additional authors:
Joseph Ecker, The Salk Institute for Biological Studies

Session Chair: Cenk Sahinalp

Presentation Overview:

Motivation: High-throughput sequencing has made it possible to sequence DNA methylation of a whole genome at single-base resolution. A sample, however, may contain a number of distinct methylation patterns. For instance, cells of different types and in different developmental stages may have different methylation patterns. Alleles may be differentially methylated, which may partially explain why large portions of epigenomes from single cell types are partially methylated, and which may have major effects on transcriptional output. Approaches relying on DNA sequence polymorphism to identify individual patterns from a mixture of heterogeneous epigenomes are insufficient, as methylcytosines occur at a much higher density than SNPs. Results: We have developed a mixture model-based approach for resolving distinct epigenomes from a heterogeneous sample. In particular, the model is applied to the detection of allele-specific methylations (ASM). The methods are tested on a synthetic methylome and applied to an Arabidopsis single root cell methylome.

Keyword: Sequence Analysis / RNA

PP36 (HT) - Functional conservation of enhancers without sequence conservation

Date: Monday, July 16, 12:15 p.m. - 12:40 p.m., Room: 104B
Presenting author: Ivan Ovcharenko, NIH, United States

Additional authors:
Leila Taher, NIH, United States
Andrew McCallion, JHU, United States
Marcelo Nobrega, University of Chicago, United States

Session Chair: Carl Kingsford

Presentation Overview:

Enhancers often diverge much faster than exonic sequence. The role of gene regulatory changes in adaptation of species is one of the factors leading to the accelerated rate of enhancer sequence divergence, and the plasticity of the underlying enhancer encoding is the other contributor. We developed a computational approach capable of using DNA sequence motifs within enhancers to identify their functional orthologs when their sequence has diverged beyond recognition by the classical alignment methods. Experimental validation confirmed the enhancer activity of 88% of our functional ortholog predictions. Moreover, 71% of the tested predicted functional enhancer ortholog pairs directed largely identical patterns of expression in zebrafish embryos, confirming both the sensitivity and accuracy of our method. Our study argues that motif composition is often necessary to retain and sufficient to predict regulatory function in the absence of overt sequence conservation, revealing an entire class of functionally conserved, evolutionarily diverged regulatory elements.

Keyword: Gene Regulation and Transcriptomics, Evolution & Comparative Genomics

PP37 (HT) - Why CDRs are not what you think they are or How to identify the real antigen binding sites

Date: Monday, July 16, 2:30 p.m. - 2:55 p.m., Room: 202 B/C
Presenting author: Vered Kunik, Bar Ilan University, Israel

Additional authors:
Yanay Ofran, Bar Ilan University, Israel
Bjoern Peters, La Jolla Institute for Allergy and Immunology, United States

Session Chair: Bonnie Berger

Presentation Overview:

Identification of the residues within an antibody (Ab) that recognize and bind the antigen (Ag), which is at the heart of immunological research, is typically done using computational tools for identifying the so-called Complementarity Determining Regions (CDRs). We show that CDR identification tools miss up to 22% of the residues that actually bind the Ag. We show that essentially all antigen-binding residues are located within structural consensus regions between antibodies and that these regions can be identified from sequence. Moreover, we demonstrate that Ag-binding residues that fall within Ab structural consensus regions and are not identified by the most commonly used CDR identification methods have a substantial energetic contribution to Ag binding. Finally, we suggest a computational tool for the identification of the Ag binding site from Ab sequence, and we show that this tool identifies 94% of the residues that actually bind the Ag.

Keyword: Protein Structure and Function

PP38 (HT) - Putative amino acid determinants of the emergence of the 2009 influenza A (H1N1) virus in the human population

Date: Monday, July 16, 2:30 p.m. - 2:55 p.m., Room: 104A
Presenting author: Nir Ben-Tal, Tel Aviv University, Israel

Session Chair: Burkhard Rost

Presentation Overview:

The constellation of molecular factors leading to the emergence of the human pandemic H1N1 (pH1N1) influenza A virus in 2009 is unclear. Using a computational approach, we identified molecular determinants that may discriminate this strain from other strains. Amino acid positions discriminating pH1N1 from seasonal human strains were located in or near known antigenic sites on the hemagglutinin (HA) protein, thus camouflaging pH1N1 from immune recognition. We also detected positions in HA differentiating classical swine viruses from pH1N1. These positions were mostly located around the receptor-binding pocket, possibly influencing binding affinity to the human cell. Such alterations may be liable in part for the virus’s efficient infection and adaptation to humans. Significantly, we showed that the substitutions R133AK and R149K, predicted to be pH1N1 characteristics, each altered virus binding to erythrocytes and conferred virulence to A/swine/NC/18161/02 in mice, reinforcing the computational findings reported here.

Keyword: Sequence Analysis, Protein Structure & Function

PP39 (HT) - Mapping and analysis of chromatin state dynamics in nine human cell types

Date: Monday, July 16, 2:30 p.m. - 2:55 p.m., Room: 104B
Presenting author: Jason Ernst, University of California Los Angeles, United States

Additional authors:
Pouya Kheradpour, Massachusetts Institute of Technology, United States
Tarjei Mikkelsen, Broad Institute, United States
Noam Shoresh, Broad Institute, United States
Lucas Ward, Broad Institute, United States
Charles Epstein, Broad Institute, United States
Xiaolan Zhang, Broad Institute, United States
Li Wang, Broad Institute, United States
Robyn Issner, Broad Institute, United States
Michael Coyne, Broad Institute, United States
Manching Ku, Massachusetts General Hospital, United States
Timothy Durham, Broad Institute, United States
Manolis Kellis, Massachusetts Institute of Technology, United States
Bradley Bernstein, Massachusetts General Hospital, United States

Session Chair: Reinhard Schneider

Presentation Overview:

Chromatin profiling has emerged as a powerful means of genome annotation and detection of regulatory activity. The approach is especially well suited to the characterization of non-coding portions of the genome, which critically contribute to cellular phenotypes yet remain largely uncharted. Using maps of nine chromatin marks across nine cell types we systematically characterize regulatory elements, their cell-type specificities and their functional interactions. Focusing on cell-type-specific patterns of promoters and enhancers, we define multicell activity profiles for chromatin state, gene expression, regulatory motif enrichment and regulator expression. We then link enhancers to putative target genes, and predict the cell-type-specific activators and repressors that modulate them. The resulting annotations and regulatory predictions have implications for the interpretation of genome-wide association studies. Top-scoring disease SNPs are frequently positioned within enhancer elements specifically active in relevant cell types. Our study presents a general framework for deciphering cis-regulatory connections and their roles in disease.

Keyword: Gene Regulation and Transcriptomics

PP40 (HT) - Image-derived, Three-dimensional Generative Models of Cellular Organization

Date: Monday, July 16, 2:30 p.m. - 2:55 p.m., Room: 104C
Presenting author: Robert Murphy, Carnegie Mellon University, United States

Additional authors:
Tao Peng, Microsoft Research, United States

Session Chair: Hagit Shatkay

Presentation Overview:

Computational modeling of cell behavior requires information on the spatiotemporal distribution of proteins. We previously developed the first system for automatically constructing generative models of subcellular location directly from microscope images (Zhao & Murphy, Cytometry 71A, 978-990, 2007). Those models were for 2D images, and the Peng & Murphy 2011 paper made the crucial extension to 3D. The Murphy 2011 paper described using these models for active learning of the effects of many perturbagens on many proteins. Subsequent work has integrated these approaches with generative models of microtubules into a cohesive, open source system, CellOrganizer (http://CellOrganizer.org). The system can output images as an idealized cell or as a convolved image as might have been acquired with a specific microscope. The former is suitable for use in cell simulations, while the latter is useful for testing analysis software with images for which the ground truth is known.

Keyword: Bioimaging & Data Visualization, other

PP41 (HT) - Molecular architecture of the 26S proteasome holocomplex determined by an integrative approach

Date: Monday, July 16, 3:00 p.m. - 3:25 p.m., Room: 202 B/C
Presenting author: Keren Lasker, Stanford University, United States

Additional authors:
Friedrich Förster, Max-Planck-Institute of Biochemistry, Germany
Stefan Bohn, Max-Planck-Institute of Biochemistry, Germany
Thomas Walzthoeni, University of Zürich, Switzerland
Elizabeth Villa, Max-Planck-Institute of Biochemistry, Germany
Pia Unverdorben, Max-Planck-Institute of Biochemistry, Germany
Florian Beck, Max-Planck-Institute of Biochemistry, Germany
Ruedi Aebersold, University of Zürich, Switzerland
Andrej Sali, University of California San Francisco, United States
Wolfgang Baumeister, Max-Planck-Institute of Biochemistry, Germany

Session Chair: Bonnie Berger

Presentation Overview:

In eukaryotes, the ubiquitin–proteasome pathway regulates fundamental cellular processes. The 26S proteasome resides at the downstream end of the pathway and degrades defective proteins. While the structure of its 20S core particle (CP) has been determined by X-ray crystallography, the structure of the 19S regulatory particle (RP), which recruits substrates and translocates them to the CP for degradation, has remained elusive. We have revealed the entire structure of the RP and describe a complete molecular architecture of the 26S proteasome. By integrating data from cryo-electron microscopy, X-ray crystallography, residue-specific chemical cross-linking, and additional proteomics techniques, we were able to produce a more accurate and higher-resolution structural model than any of the data sets alone can provide. In addition, we have identified previously unpublished protein-protein interactions. The modular structure of the proteasome provides insights into the sequence of events that occur prior to the degradation of ubiquitylated substrates.

Keyword: Protein Structure and Function, Protein Interactions & Molecular Networks

PP42 (HT) - Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels

Date: Monday, July 16, 3:00 p.m. - 3:25 p.m., Room: 104A
Presenting author: Marcel Schulz, Carnegie Mellon University, United States

Additional authors:
Daniel Zerbino, University of California Santa Cruz, United States
Martin Vingron, Max Planck Institute for Molecular Genetics, Germany
Ewan Birney, European Bioinformatics Institute, United Kingdom

Session Chair: Burkhard Rost

Presentation Overview:

Next generation sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics for genetics and medical research. De novo transcriptome assembly has become a feasible alternative for transcriptome analysis of novel model organisms, as de novo genome assembly is still a time-consuming process. De novo transcriptome assembly has other important applications, for example gene-fusion detection in cancer or detection of trans-splicing events.

This talk will introduce the Oases de novo transcriptome assembler, which exploits the relationship between de Bruijn graphs and splicing graphs to accurately model alternative gene isoforms in RNA-Seq data. The dynamic range of expression levels, alternative splicing events and repetitive sequences make de novo transcriptome assembly a challenging task, and we will show how to strike a balance in dealing with these overlapping problems. Further, the talk will reveal new insights into the importance of RNA-Seq data preprocessing and its tremendous effect on assembly performance.

Keyword: Sequence Analysis, Gene Regulation & Transcriptomics

PP43 (HT) - Proteomics Signature Profiling (PSP): A novel contextualization approach applied towards cancer proteomics

Date: Monday, July 16, 3:00 p.m. - 3:25 p.m., Room: 104B
Presenting author: Wilson Wen Bin Goh, Imperial College London, United Kingdom

Session Chair: Reinhard Schneider

Presentation Overview:

Traditional proteomics analysis is plagued by the use of arbitrary thresholds, resulting in a large loss of information. We propose here a novel method utilizing all detected proteins. Its efficacy is demonstrated in a liver cancer proteomics screen. Utilizing biological and predicted complexes, a Proteomics Signature Profile (PSP) for each patient was derived. Although consistency of individual proteins between patients is low, we found that the reported proteins tend to hit clusters in a meaningful and informative manner. By extracting this information in the form of a Proteomics Signature Profile, we confirm that this information is conserved and can be used for (1) clustering of patient samples, (2) identification of significant clusters based on real biological complexes, and (3) overcoming consistency and coverage issues prevalent in proteomics data sets.

Keyword: Mass Spectrometry and Proteomics, Mass Spectrometry & Proteomics

PP44 (HT) - Toward interoperable bioscience data

Date: Monday, July 16, 3:00 p.m. - 3:25 p.m., Room: 104C
Presenting author: Susanna-Assunta Sansone, University of Oxford, United Kingdom

Additional authors:
Philippe Rocca-Serra, University of Oxford, United Kingdom
Eamonn Maguire, University of Oxford, United Kingdom
Dawn Field, NERC, United Kingdom
Chris Taylor, EMBL, United Kingdom
Oliver Hofmann, Harvard School of Public Health, United States
Hong Fang, ICF International Company, United States
Steffen Neumann, Leibniz Institute of Plant Biochemistry, Germany
Weida Tong, FDA, United States
Linda Amaral-Zettler, Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, International Census of Marine Microbes, United States
Kimberly Begley, Ontario Institute for Cancer Research, Canada
Tim Booth, NERC, United Kingdom
Lydie Bougueleret, SIB, Switzerland
Gully Burns, Information Sciences Institute, United States
Brad Chapman, Harvard School of Public Health, United States
Tim Clark, Harvard Medical School, United States
Lee-Ann Coleman, The British Library, United Kingdom
Jay Copeland, Harvard Medical School, United States
Sudeshna Das, Harvard Medical School, United States
Antoine de Daruvar, Université de Bordeaux, France
Paula de Matos, EMBL, United Kingdom
Ian Dix, AstraZeneca, United Kingdom
Scott Edmunds, GigaScience, China
Chris T. Evelo, The Netherlands Bioinformatics Centre, Netherlands
Mark J. Forster, Syngenta, United Kingdom
Pascale Gaudet, SIB, Switzerland
Jack Gilbert, Argonne National Laboratory, United States
Carole Goble, University of Manchester, United Kingdom
Julian L. Griffin, University of Cambridge, United Kingdom
Daniel Jacob, Université de Bordeaux, CBiB, France
Jos Kleinjans, Netherlands Toxicogenomics Centre, Netherlands
Lee Harland, ConnectedDiscovery Ltd, United Kingdom
Kenneth Haug, EMBL, United Kingdom
Henning Hermjakob, EMBL, United Kingdom
Shannan J. Ho Sui, Harvard School of Public Health, United States
Alain Laederach, University of North Carolina, United States
Shaoguang Liang, GigaScience, China
Stephen Marshall, The Novartis Institutes for BioMedical Research, United Kingdom
Annette McGrath, CSIRO, Australia
Emily M. Merrill, Massachusetts General Hospital, United States
Dorothy Reilly, The Novartis Institutes for BioMedical Research, United States
Magali Roux, University of Pierre and Marie Curie CNRS UMS 7606, France
Caroline E. Shamu, Harvard Medical School, United States
Catherine A. Shang, Bioplatforms Australia Ltd, Australia
Christoph Steinbeck, EMBL, United Kingdom
Anne Trefethen, University of Oxford, United Kingdom
Bryn Williams-Jones, ConnectedDiscovery Ltd, United Kingdom
Ioannis Xenarios, SIB, Switzerland
Katherine Wolstencroft, University of Manchester, United Kingdom

Session Chair: Hagit Shatkay

Presentation Overview:

The ISA commons (www.isacommons.org) is a growing exemplar ecosystem of data curation and sharing solutions built on a common metadata tracking framework, providing tools and resources to create and manage large, heterogeneous data sets in a coherent manner, and allowing users of (parts of) data sets to ‘connect the metadata dots’. We invite new communities interested in breaching the boundary of their own biodomain to join the growing ISA network to empower ever more scientists to take data management, biocuration and sharing into their own hands, using community standards while remaining blissfully unaware of the underlying complexities of the implementation of those standards.

Keyword: Databases and Ontologies, Applied Bioinformatics

PP45 (PT) - A Conditional Neural Fields model for protein threading

Date: Monday, July 16, 3:30 p.m. - 3:55 p.m., Room: 202 B/C
Presenting author: Jianzhu Ma, Toyota Technological Institute at Chicago, United States

Additional authors:
Jian Peng, Toyota Technological Institute at Chicago
Sheng Wang, Toyota Technological Institute at Chicago
Jinbo Xu, Toyota Technological Institute at Chicago

Session Chair: Bonnie Berger

Presentation Overview:

Motivation: Alignment errors are still the main bottleneck of current template-based protein modeling (TM) methods including protein threading and homology modeling, especially when the sequence identity between two proteins under consideration is low (<30%). Results: We present a novel protein threading method for much more accurate sequence-template alignment by employing a probabilistic graphical model Conditional Neural Fields (CNF), which aligns one protein sequence to its remote template using a nonlinear scoring function. This scoring function can account for correlation among a variety of protein sequence and structure features, make use of information in the neighborhood of two residues to be aligned, and thus, is much more sensitive than the widely-used linear function or profile-based scoring function. To train this CNF threading model, we employ a novel quality-sensitive method that can directly maximize the expected quality of a set of training alignments, instead of the standard maximum-likelihood method. Experimental results show that our CNF method generates significantly better alignments than the best profile-based and threading methods on several public (but small) benchmarks and very large in-house datasets. Our method outperforms others regardless of protein classes and lengths and works particularly well for proteins with sparse sequence profile due to the effective utilization of structure information. The methodology presented here can also be adapted to protein sequence alignment.

Keyword: Protein Structure and Function

PP46 (HT) - Multiple reference genomes and transcriptomes for Arabidopsis thaliana

Date: Monday, July 16, 3:30 p.m. - 3:55 p.m., Room: 104A
Presenting author: Gunnar Ratsch, Memorial Sloan-Kettering Cancer Center, United States

Additional authors:
Xiangchao Gan, University of Oxford, United Kingdom
Oliver Stegle, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, Germany
Jonas Behr, Friedrich Miescher Laboratory, Germany
Philipp Drewe, Friedrich Miescher Laboratory, Germany
Joshua G. Steffen, University of Utah, United States
Richard Clark, University of Utah, United States
Edward J. Osborne, University of Utah, United States
Sebastian Schultheiss, Friedrich Miescher Laboratory, Germany
Vipin T. Sreedharan, Friedrich Miescher Laboratory, Germany
Andre Kahles, Friedrich Miescher Laboratory, Germany
Regina Bohnert, Friedrich Miescher Laboratory, Germany
Geraldine Jean, Friedrich Miescher Laboratory, Germany
Katie L. Hildebrand, Kansas State University, United States
Christopher Toomajian, Kansas State University, United States
Rune Lyngsoe, University of Oxford, United Kingdom
Paul Derwent, European Bioinformatics Institute, United Kingdom
Paul Kersey, European Bioinformatics Institute, United Kingdom
Eric Belfield, University of Oxford, United Kingdom
Nicholas Harberd, University of Oxford, United Kingdom
Eric Kemen, The Sainsbury Laboratory, United Kingdom
Paula X. Kover, University of Bath, United Kingdom

Session Chair: Burkhard Rost

Presentation Overview:

Genetic differences between Arabidopsis thaliana accessions underlie the plant's extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in at least one accession. However, re-annotation of each genome revealed that alternative gene models often restore coding potential. Gene expression in seedlings differed for nearly half of expressed genes and was frequently associated with cis variants within 5 kilobases, as were intron retention alternative splicing events. Sequence and expression variation is most pronounced in genes that respond to the biotic environment. Our data further promote evolutionary and functional studies in A. thaliana, especially the MAGIC genetic reference population descended from these accessions.

Keyword: Sequence Analysis, Population Genomics

PP47 (HT) - Identifying the unknowns by aligning fragmentation trees

Date: Monday, July 16, 3:30 p.m. - 3:55 p.m., Room: 104B
Presenting author: Sebastian Böcker, Friedrich-Schiller-University Jena, Germany

Additional authors:
Florian Rasche, Friedrich-Schiller-University Jena, Germany
Kerstin Scheubert, Friedrich-Schiller-University Jena, Germany
Franziska Hufsky, Friedrich-Schiller-University Jena, Germany
Thomas Zichner, European Molecular Biology Laboratory, Germany
Marco Kai, Max Planck Institute for Chemical Ecology, Germany

Session Chair: Reinhard Schneider

Presentation Overview:

The structural elucidation of organic compounds in complex biofluids and tissues remains a significant challenge. For mass spectrometry, the manual interpretation of tandem mass spectra is cumbersome and requires expert knowledge, as the fragmentation mechanisms of small molecules are not completely understood. Thus, the automated identification of compounds is generally limited to searching in spectral libraries.
We have developed a fully automated pipeline for the identification of truly unknown compounds. First, it annotates the spectra with fragmentation trees, and then compares these trees via tree alignment. This allows for the retrieval of similar compounds from a reference library, even if it contains spectra from a different instrument type. A decoy database strategy enables false discovery rate (FDR) calculation. In addition, clustering based on tree similarities agrees well with known compound classes. This allows for a basic identification of unknown metabolites in a high-throughput setup.
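
The abstract does not say how the decoy database is used. One common target-decoy estimate, assumed here purely for illustration, divides the number of decoy hits scoring at or above a threshold by the number of target hits at or above the same threshold. The function name and the scores below are hypothetical.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as (#decoy hits) / (#target hits).

    Assumes decoy hits model false target hits; the estimate is capped at 1.0.
    """
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, decoys / targets)


# Hypothetical similarity scores for hits against target and decoy libraries.
target_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
decoy_scores = [0.75, 0.35, 0.2, 0.1, 0.05]

for threshold in (0.7, 0.3):
    print(threshold, decoy_fdr(target_scores, decoy_scores, threshold))
```

Raising the threshold trades fewer accepted hits for a lower estimated FDR, which is how such an estimate is typically used to pick a score cutoff.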

Keyword: Mass Spectrometry and Proteomics, Applied Bioinformatics

PP48 (HT) - The Three-Dimensional Architecture of a Bacterial Genome and Its Alteration by Genetic Perturbation

Date: Monday, July 16, 3:30 p.m. - 3:55 p.m., Room: 104C
Presenting author: Davide Bau, National Center for Genomic Analysis, Spain

Additional authors:
Mark Umbarger, Harvard Medical School, United States
Esteban Toro, School of Medicine, United States
Matthew Wright, Harvard Medical School, United States
Gregory Porreca, Harvard Medical School, United States
Sun-Hae Hong, School of Medicine, United States
Michael Fero, School of Medicine, United States
Lihua Zhu, Program in Gene Function and Expression, United States
Marc Marti-Renom, National Center for Genomic Analysis, Spain
Harley McAdams, School of Medicine, United States
Lucy Shapiro, School of Medicine, United States
Job Dekker, University of Massachusetts Medical School, United States
George Church, Harvard Medical School, United States

Session Chair: Hagit Shatkay

Presentation Overview:

We have determined the three-dimensional (3D) architecture of the Caulobacter crescentus genome by combining genome-wide chromatin interaction detection, live-cell imaging, and computational modeling. Using chromosome conformation capture carbon copy (5C), we derive around 13 kb resolution 3D models of the Caulobacter genome. The resulting models illustrate that the genome is ellipsoidal with periodically arranged arms. The parS sites, a pair of short contiguous sequence elements known to be involved in chromosome segregation, are positioned at one pole, where they anchor the chromosome to the cell and contribute to the formation of a compact chromatin conformation. Repositioning these elements resulted in rotations of the chromosome that changed the subcellular positions of most genes. Such rotations did not lead to large-scale changes in gene expression, indicating that genome folding does not strongly affect gene regulation. Collectively, our data suggest that genome folding is globally dictated by the parS sites and chromosome segregation.

Keyword: other, Applied Bioinformatics

PP49 (PT) - Novel domain combinations in proteins encoded by chimeric transcripts

Cancelled

Date: Monday, July 16, 4:00 p.m. - 4:25 p.m., Room: TBA
Presenting author: Milana Frenkel-Morgenstern, Spanish National Cancer Research Centre (CNIO), Spain

Additional authors:
Alfonso Valencia, Spanish National Cancer Research Centre (CNIO)

Session Chair:

Presentation Overview:

Chimeric RNA transcripts are generated by different mechanisms, including pre-mRNA trans-splicing, chromosomal translocation and/or gene fusion, and it was recently shown that at least some chimeric transcripts may be translated into functional chimeric proteins. To gain a better understanding of the design principles behind the production of chimeric proteins, we have analyzed 7,424 chimeric RNAs from humans. We focused on the specific domains present in these proteins, comparing their permutations with those of known human proteins. We found that chimeras contain complete protein domains more often than random datasets do and, specifically, that eight different types of domains are overrepresented among all chimeras, as well as in those chimeras confirmed by RNA-seq experiments. Moreover, we discovered that some chimeras potentially encode proteins with novel and unique combinations of such domains. Given the prevalence of complete protein domains observed in chimeras, we predict that putative chimeras that lack activation domains may actively compete with their parental proteins, thereby exerting a dominant negative effect. In more general terms, the generation of chimeric transcripts produces a combinatorial increase in the number of protein products available, which may disturb the function of parental genes and influence their protein-protein interaction network.

Keyword: Protein Structure and Function

PP50 (PT) - Xenome - A Tool for Classifying Reads from Xenograft Samples

Date: Monday, July 16, 4:00 p.m. - 4:25 p.m., Room: 104A
Presenting author: Thomas Conway, NICTA, Australia

Additional authors:
Jeremy Wazny, NICTA
Andrew Bromage, NICTA
Martin Tymms, Monash Institute for Medical Research
Dhanya Sooraj, Monash Institute for Medical Research
Elizabeth Williams, Monash Institute for Medical Research
Bryan Beresford-Smith, NICTA

Session Chair: Burkhard Rost

Presentation Overview:

Motivation: Shotgun sequence read data derived from xenograft material contains a mixture of reads arising from the host and reads arising from the graft. Classifying the read mixture to separate the two allows for more precise analysis to be performed. Results: We present a technique, with an associated tool Xenome, which performs fast, accurate and specific classification of xenograft derived sequence read data. We have evaluated it on RNA-Seq data from human, mouse and human-in-mouse xenograft data sets. Availability: Xenome is available for non-commercial use from http://www.nicta.com.au/bioinformatics

Keyword: Sequence Analysis

PP51 (PT) - Fast alignment of fragmentation trees

Date: Monday, July 16, 4:00 p.m. - 4:25 p.m., Room: 104B
Presenting author: Franziska Hufsky, Friedrich-Schiller-University Jena, Germany

Additional authors:
Kai Dührkop, Friedrich-Schiller-University Jena
Florian Rasche, Friedrich-Schiller-University Jena
Markus Chimani, Friedrich-Schiller-University Jena
Sebastian Böcker, Friedrich-Schiller-University Jena

Session Chair: Reinhard Schneider

Presentation Overview:

Mass spectrometry allows sensitive, automated and high-throughput analysis of small molecules such as metabolites. One major bottleneck in metabolomics is the identification of "unknown" small molecules not in any database. Recently, fragmentation tree alignments have been introduced for the automated comparison of the fragmentation patterns of small molecules. Fragmentation pattern similarities are strongly correlated with the chemical similarity of molecules, and allow us to cluster compounds based solely on their fragmentation patterns. Aligning fragmentation trees is computationally hard. Nevertheless, we present three exact algorithms for the problem: a dynamic programming (DP) algorithm, a sparse variant of the DP, and an Integer Linear Program (ILP). Evaluation of our methods on three different datasets showed that thousands of alignments can be computed in a matter of minutes using DP, even for "challenging" instances. Running times of the sparse DP were an order of magnitude better than for the classical DP. The ILP was clearly outperformed by both DP approaches. We also found that for both DP algorithms, computing the 1% slowest alignments required as much time as computing the 99% fastest.

Keyword: Mass Spectrometry and Proteomics

PP52 (PT) - Dissect: Detection and Characterization of Novel Structural Alterations in Transcribed Sequences

Date: Monday, July 16, 4:00 p.m. - 4:25 p.m., Room: 104C
Presenting author: Deniz Yorukoglu, Massachusetts Institute of Technology, United States

Additional authors:
Faraz Hach, Simon Fraser University
Lucas Swanson, Simon Fraser University
Colin C. Collins, Vancouver Prostate Centre
Inanc Birol, Genome Sciences Centre
S. Cenk Sahinalp, Simon Fraser University

Session Chair: Hagit Shatkay

Presentation Overview:

Motivation: Computational identification of genomic structural variants via high throughput sequencing is an important problem for which a number of highly sophisticated solutions have been developed recently. With the advent of high-throughput transcriptome sequencing (RNA-Seq), the problem of identifying structural alterations in the transcriptome is now attracting significant attention. In this paper, we introduce two novel algorithmic formulations for identifying transcriptomic structural variants through aligning transcripts to the reference genome under the consideration of such variation. The first formulation is based on a nucleotide-level alignment model; a second, potentially faster formulation is based on chaining fragments shared between each transcript and the reference genome. Based on these formulations, we introduce a novel transcriptome-to-genome alignment tool, Dissect, which can identify and characterize transcriptomic events such as duplications, inversions, rearrangements and fusions. Dissect is suitable for whole transcriptome structural variation discovery problems involving sufficiently long reads or accurately assembled contigs.

Keyword: Sequence Analysis

PP53 (PT) - MoRFpred, a computational tool for sequence-based prediction and characterization of disorder-to-order transitioning binding sites in proteins

Date: Tuesday, July 17, 10:45 a.m. - 11:10 a.m., Room: Grand Ballroom
Presenting author: Deniz Yorukoglu, Massachusetts Institute of Technology, United States

Additional authors:
Wei-Lun Hsu, Indiana University
Marcin J. Mizianty, University of Alberta
Christopher J. Oldfield, Indiana University
Bin Xue, University of South Florida
A. Keith Dunker, Indiana University
Vladimir N. Uversky, University of South Florida
Lukasz Kurgan, University of Alberta
Fatemeh Miri Disfani, University of Alberta, Canada

Session Chair: Nir Ben-Tal

Presentation Overview:

Motivation: Molecular Recognition Feature (MoRF) regions are disordered binding sites that become structured upon binding. MoRFs are implicated in important biological processes, including signaling and regulation. However, only a limited number of experimentally validated MoRFs is known, which motivates development of computational methods that predict MoRFs from protein chains. Results: We introduce a new MoRF predictor, MoRFpred, which identifies all MoRF types (alpha, beta, coil, and complex). We develop a comprehensive dataset of annotated MoRFs and use it to build and empirically compare our method. MoRFpred is based on a novel design in which annotations generated by sequence alignment are fused with predictions generated by a Support Vector Machine (SVM), which uses a custom-designed set of sequence-derived features. The features provide information about evolutionary profiles, selected physicochemical properties of amino acids, and predicted disorder, solvent accessibility, and B-factors. Empirical evaluation shows that MoRFpred statistically significantly outperforms existing predictors, alpha-MoRF-Pred and ANCHOR, by 0.07 in AUC and 10% in success rate. We show that our predicted (new) MoRF regions have non-random sequence similarity with native MoRFs. We use this observation, along with the fact that predictions with higher probability are more accurate, to identify putative MoRF regions. We present case studies to analyze these putative MoRFs. We also identify a few sequence-derived hallmarks of MoRFs. They are characterized by dips in the disorder predictions and higher hydrophobicity and stability when compared to adjacent (in the chain) residues. Availability: http://biomine.ece.ualberta.ca/MoRFpred/ Supplementary information: http://biomine.ece.ualberta.ca/MoRFpred/Supplement.pdf Contact: lkurgan@ece.ualberta.ca

Keyword: Protein Structure and Function

PP54 (HT) - Uncovering Ancient Networks from Present-Day Interactions

Date: Tuesday, July 17, 10:45 a.m. - 11:10 a.m., Room: 104A
Presenting author: Carl Kingsford, University of Maryland, College Park, United States

Additional authors:
Saket Navlakha, Carnegie Mellon University, United States
Rob Patro, University of Maryland, College Park, United States
Emre Sefer, University of Maryland, College Park, United States
Justin Malin, University of Maryland, College Park, United States
Guillaume Marçais, University of Maryland, College Park, United States

Session Chair: Alex Bateman

Presentation Overview:

I will present our recent work on reconstructing ancient biological networks. We have developed several methods for recovering interactions between molecules that were present in ancestral species, starting with only the present-day networks that we are able to measure. We have shown that, using these algorithms, ancestral interactions can be inferred with high accuracy. I will discuss several applications of these approaches, including predicting missing interactions between present-day viral proteins, identifying functionally related proteins, and modeling how protein complexes have rewired over time in yeast.

Keyword: Protein Interactions and Molecular Networks, Evolution & Comparative Genomics

PP55 (HT) - An effective statistical evaluation of ChIPseq dataset similarity

Date: Tuesday, July 17, 10:45 a.m. - 11:10 a.m., Room: 104B
Presenting author: Maria Chikina, Mount Sinai Medical School, United States

Additional authors:
Olga G. Troyanskaya, Princeton University, United States

Session Chair: Terry Gaasterland

Presentation Overview:

ChIPseq technology has become the state-of-the-art whole-genome technique for analyzing protein-DNA interactions, making it necessary to have rigorous methods for quantifying similarity between datasets, and defining interactions among chromatin features. This presents a statistical problem for which several solutions have been proposed.
While other methods for obtaining significance of similarity must make somewhat arbitrary choices of distance metrics, parametric distributions, or procedures for simulating the null hypothesis, we present a simple and intuitive approach for calculating exact p-values that is essentially assumption-free. Our approach is robust to non-biological variations and involves an asymmetric comparison, allowing one to tease out hierarchical relationships among chromatin proteins.

Keyword: Applied Bioinformatics

PP56 (PT) - Extending ontologies by finding siblings using set expansion techniques

Date: Tuesday, July 17, 10:45 a.m. - 11:10 a.m., Room: 104C
Presenting author: Götz Fabian, Technische Universität Dresden, Germany

Additional authors:
Thomas Wächter, Technische Universität Dresden
Michael Schroeder, Technische Universität Dresden

Session Chair: Michal Linial

Presentation Overview:

Motivation: Ontologies are an everyday tool in biomedicine to capture and represent knowledge. However, many ontologies lack a high degree of coverage in their domain and need to improve their overall quality and maturity. Automatically extending sets of existing terms will enable ontology engineers to systematically improve text-based ontologies level by level. Results: We developed an approach to extend ontologies by discovering new terms which are in a sibling relationship to existing terms of an ontology. For this purpose, we combined two approaches which retrieve new terms from the web. The first approach extracts siblings by exploiting the structure of HTML documents, whereas the second approach uses text mining techniques to extract siblings from unstructured text. Our evaluation against MeSH shows that our method for sibling discovery is able to suggest first-class ontology terms and can be used as an initial step towards assessing the completeness of ontologies. The evaluation yields a recall of 80% at a precision of 61%, with the two independent approaches complementing each other. For MeSH in particular, we show that it can be considered complete in its medical focus area. We integrated the work into DOG4DAG, an ontology generation plugin for the editors OBO-Edit and Protégé, making it the first plugin that supports sibling discovery on-the-fly. Availability: Sibling discovery for ontology is available as part of DOG4DAG (www.biotec.tu-dresden.de/research/schroeder/dog4dag) for both Protégé 4.1 and OBO-Edit 2.1.

Keyword: Databases and Ontologies / Disease Models and Epid

PP57 (PT) - Recognition Models to Predict DNA-binding Specificities of Homeodomain Proteins

Date: Tuesday, July 17, 11:15 a.m. - 11:40 a.m., Room: Grand Ballroom
Presenting author: Gary D. Stormo, Washington University School of Medicine

Additional authors:
Metewo Selase Enuameh, University of Massachusetts Medical School
Marcus B. Noyes, University of Massachusetts Medical School
Michael H. Brodsky, University of Massachusetts Medical School
Scot A. Wolfe, University of Massachusetts Medical School
Ryan Christensen, Washington University School of Medicine, United States

Session Chair: Nir Ben-Tal

Presentation Overview:

Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors the best available methods utilize k-nearest neighbor algorithms to make specificity predictions based on the average of the specificities of the k most similar proteins with defined specificities. Homeodomain proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes, and as a consequence an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes. Using extensive experimental data, we have tested several machine learning approaches and find that both support vector machines and random forests can produce recognition models for homeodomain proteins that are significant improvements over k-nearest neighbor based methods. Cross-validation analyses show that the resulting models are capable of predicting specificities with high accuracy. We have produced a web-based prediction tool, PreMoTF (Predicted Motifs for Transcription Factors) (http://stormo.wustl.edu/PreMoTF), for predicting PFMs from protein sequence using a random forest based model.
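
The k-nearest-neighbor baseline described above can be sketched as follows: find the k proteins most similar to the query and average their position frequency matrices (PFMs). The similarity measure (fraction of identical positions between equal-length sequences), the function names, and the toy data are assumptions made for illustration, not the authors' implementation.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def knn_predict_pfm(query, known, k):
    """Average the PFMs of the k known proteins most similar to the query.

    known: list of (sequence, pfm) pairs; each pfm is a list of
    [A, C, G, T] frequency rows, all with the same number of rows.
    """
    ranked = sorted(known, key=lambda item: identity(query, item[0]), reverse=True)
    neighbors = [pfm for _, pfm in ranked[:k]]
    n_rows = len(neighbors[0])
    return [
        [sum(pfm[r][c] for pfm in neighbors) / len(neighbors) for c in range(4)]
        for r in range(n_rows)
    ]


# Hypothetical four-residue "recognition" sequences with one-column PFMs.
known = [
    ("KRQN", [[1.0, 0.0, 0.0, 0.0]]),
    ("KRQA", [[0.0, 0.0, 0.0, 1.0]]),
    ("AAAA", [[0.0, 1.0, 0.0, 0.0]]),
]
predicted = knn_predict_pfm("KRQN", known, k=2)
print(predicted)
```

Because the prediction is a plain average over neighbors, it cannot extrapolate beyond specificities already observed, which is one reason learned models such as the random forests mentioned in the abstract may do better.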

Keyword: Protein Structure and Function

PP58 (HT) - A three-dimensional map of protein networks within and between species

Date: Tuesday, July 17, 11:15 a.m. - 11:40 a.m., Room: 104A
Presenting author: Yu Xia, Boston University, United States

Additional authors:
Eric Franzosa, Boston University, United States

Session Chair: Alex Bateman

Presentation Overview:

General properties of the largely antagonistic biomolecular interactions between pathogens and their hosts remain poorly understood, and may differ significantly from known principles governing the cooperative interactions within the host. Recent host-pathogen systems biology efforts have generated global, but low-resolution, maps of host-pathogen protein-protein interaction networks. Here, we integrate three-dimensional homology models of protein complexes with interaction networks among human and viral proteins to construct the first human-virus structural interaction network. Subsequent analyses reveal significant biophysical, functional, and evolutionary differences between host-virus and within-host structural interaction networks. We find that viral proteins tend to bind to existing within-host interfaces. Compared to within-host protein-protein interfaces, host-virus protein-protein interfaces tend to be more transient, targeted by more host proteins, more regulatory in function, faster evolving, and rely less on sequence similarity to achieve interface mimicry. These results highlight the distinct consequences of cooperation versus antagonism in biological networks within and between species.

Keyword: Protein Interactions and Molecular Networks, Protein Structure & Function

PP59 (PT) - DELISHUS: An Efficient and Exact Algorithm for Genome-Wide Detection of Deletion Polymorphism in Autism

Date: Tuesday, July 17, 11:15 a.m. - 11:40 a.m., Room: 104B
Presenting author: Derek Aguiar, Brown University, United States

Additional authors:
Bjarni Halldorsson, Reykjavik University
Eric Morrow, Brown University
Sorin Istrail, Brown University

Session Chair: Terry Gaasterland

Presentation Overview:

The understanding of the genetic determinants of complex disease is undergoing a paradigm shift. Genetic heterogeneity of rare mutations with deleterious effects is more commonly being viewed as a major component of disease. Autism is an excellent example where research is active in identifying matches between the phenotypic and genomic heterogeneities. A substantial portion of autism appears to be correlated with copy number variation, which is not directly probed by single nucleotide polymorphism (SNP) array technologies. Identifying the genetic heterogeneity of small deletions remains a major unresolved computational problem due, in part, to the inability of algorithms to detect them. In this paper we present an algorithmic framework, which we term DELISHUS, that implements three highly efficient algorithms for inferring genomic deletions of all sizes and frequencies in SNP array data. We implement a polynomial-time backtracking algorithm -- that finishes on a 1 billion entry genome-wide association study (GWAS) SNP matrix in a few minutes -- to compute all potential deletions in a dataset. Given a set of called deletions, we also give a polynomial-time algorithm for detecting regions that contain multiple recurrent deletions. Finally, we give an algorithm for detecting de novo deletions. Because our algorithms consider all individuals in the sample at once, they achieve significantly lower false positive rates and higher power when compared to previously published single-individual algorithms. Our method may be used to identify the deletion spectrum for GWAS where deletion polymorphism was previously not analyzed. DELISHUS is available at http://www.brown.edu/Research/Istrail_Lab/

Keyword: Population Genomics

PP60 (PT) - Ranking of Multidimensional Drug Profiling Data by Fractional Adjusted Bi-Partitional Scores

Date: Tuesday, July 17, 11:15 a.m. - 11:40 a.m., Room: 104C
Presenting author: Dorit Hochbaum, University of California at Berkeley, United States

Additional authors:
Chun-Nan Hsu, University of Southern California, Marina del Rey
Yan T. Yang, University of California at Berkeley

Session Chair: Michal Linial

Presentation Overview:

Motivation: The recent development of high-throughput drug profiling (high content screening or HCS) provides a large amount of quantitative multidimensional data. Despite its potential, it poses several challenges for academia and industry analysts alike. This is especially true for ranking the effectiveness of several drugs from many thousands of images directly. This paper introduces, for the first time, a new framework for automatically ordering the performance of drugs, called fractional adjusted bi-partitional score (FABS). This general strategy takes advantage of graph-based formulations and solutions and avoids many shortfalls of traditionally used methods in practice. We experimented with the FABS framework by implementing it with a specific algorithm, a variant of normalized cut called normalized cut prime (FABS-NC′), producing a ranking of drugs. This algorithm is known to run in polynomial time and therefore can scale well in high-throughput applications.

Results: We compare the performance of FABS-NC′ to other methods that could be used for drug ranking. We devise two variants of the FABS algorithm: FABS-SVM, which utilizes a support vector machine (SVM) as a black box, and FABS-Spectral, which utilizes the eigenvector technique (spectral) as a black box. We also compare the performance of FABS-NC′ to three other methods that have been previously considered: center ranking (Center), PCA ranking (PCA), and graph transition energy method (GTEM). The conclusion is encouraging: FABS-NC′ consistently outperforms all these five alternatives. FABS-SVM has the second best performance among these six methods, but is far behind FABS-NC′: in some cases FABS-NC′ produces over half correctly predicted ranking experiment trials than FABS-SVM.

Availability: The system and data for the evaluation reported here will be made available upon request to the authors after this manuscript is accepted for publication.

Keyword: Disease Models and Epidemiology


PP61 (PT) - TMBMODEL: Toward 3D modeling of transmembrane beta barrel proteins based on z-coordinate and topology prediction

Date: Tuesday, July 17, 11:45 a.m. - 12:10 p.m., Room: Grand Ballroom
Presenting author: Sikander Hayat, Stockholm University, Sweden

Additional authors:
Arne Elofsson, Stockholm University

Session Chair: Nir Ben-Tal

Presentation Overview:

Motivation: Two types of transmembrane proteins exist, alpha-helical membrane proteins and transmembrane beta-barrels. The latter type exists in the outer membrane of gram-negative bacteria and in chloroplasts and mitochondria, where they play a major role in the translocation machinery. Here, we aim to build three-dimensional models for transmembrane beta-barrels based on a large set of predicted topologies used to generate alternative three-dimensional models. Thereafter, the predicted Z-coordinate, i.e. the distance of a residue from the membrane center, is used to identify the best model.

Results: We present TMBMODEL, a method for generating three-dimensional models based on predicted topologies. TMBMODEL employs theoretic principles from known structures to construct a model for a barrel of a given transmembrane beta-barrel sequence. First, different topologies are obtained from running the BOCTOPUS topology predictor, and then three-dimensional models are constructed for different shear numbers. The best model is then selected based on a novel Z-coordinate predictor. Based on a leave-one-out cross-validation, the Z-coordinate predictor predicts 74% of residues within 2 Å on a non-redundant dataset of 36 transmembrane beta-barrels. The average error and the fraction of correctly identified membrane residues are 1.61 Å and 71%, respectively. TMBMODEL chose the correct topology for 75% of proteins in the data set, which is a slight improvement over BOCTOPUS. More importantly, TMBMODEL provides a C-alpha template for more detailed structural analysis. The average RMSD for this template is 7.24 Å.

Availability: TMBMODEL is freely available as a web-server at: http://tmbmodel.cbr.su.se/. The data sets used for training and evaluations are also available from this site.

Contact: arne@bioinfo.se

Abbreviations: TMB, transmembrane beta-barrel protein; HMM, Hidden Markov Model; SVM, support vector machine.

Keyword: Protein Structure and Function


PP62 (HT) - Network-Based Prediction and Analysis of HIV Dependency Factors

Date: Tuesday, July 17, 11:45 a.m. - 12:10 p.m., Room: 104A
Presenting author: T. Murali, Virginia Tech, United States

Additional authors:
Matthew Dyer, Applied Biosystems, United States
David Badger, Virginia Tech, United States
Brett Tyler, Virginia Tech, United States
Michael Katze, University of Washington, United States

Session Chair: Alex Bateman

Presentation Overview:

HIV Dependency Factors (HDFs) are human proteins that are essential for HIV replication, but are not lethal to the host cell when silenced. Three genome-wide RNAi experiments identified HDF sets with little overlap. We discuss how we combined these three datasets with a human PPI network to predict new HDFs, using an algorithm called SinkSource and four other algorithms published in the literature. A number of HDFs that we predicted are known to interact with HIV proteins. Many predicted HDF genes showed significantly different programs of expression in early response to SIV infection in two non-human primate species that differ in AIDS progression. Our results suggest that many HDFs are yet to be discovered and that they have potential value as prognostic markers of AIDS development.

We conclude with recent results on predicting dependency factors for multiple viruses, in an effort to discover human proteins that may serve as broad-spectrum drug targets for infectious diseases.

Keyword: Protein Interactions and Molecular Networks


PP63 (PT) - SEQuel: Improving the Accuracy of Genome Assemblies

Date: Tuesday, July 17, 11:45 a.m. - 12:10 p.m., Room: 104B
Presenting author: Roy Ronen, University of California, San Diego, United States

Additional authors:
Christina Boucher, University of California, San Diego
Hamidreza Chitsaz, Wayne State University
Pavel Pevzner, University of California, San Diego

Session Chair: Terry Gaasterland

Presentation Overview:

Motivation: Assemblies of next generation sequencing data, while accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e., insertions, deletions, and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model.

Results: SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell E. coli data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium SAR324 genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly.

Availability: SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at http://bix.ucsd.edu/SEQuel/.
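
The idea of a positional de Bruijn graph can be illustrated with a small sketch. This is not SEQuel's implementation: the function names, the input format (a read paired with an approximate start position on a contig), and the use of exact rather than tolerance-merged positions are all assumptions made for illustration. Each node is a (k-mer, position) pair, and an edge joins two nodes when the k-mers overlap by k-1 bases and the positions differ by one.

```python
from collections import defaultdict


def positional_kmers(reads, k):
    """Count (k-mer, position) nodes.

    `reads` is a list of (sequence, approximate_start) pairs; this input
    format is a hypothetical simplification, not SEQuel's actual input.
    """
    counts = defaultdict(int)
    for seq, start in reads:
        for offset in range(len(seq) - k + 1):
            counts[(seq[offset:offset + k], start + offset)] += 1
    return counts


def positional_edges(counts):
    """Connect node (kmer, pos) to (kmer[1:] + base, pos + 1) when both exist."""
    edges = set()
    for kmer, pos in counts:
        for base in "ACGT":
            successor = (kmer[1:] + base, pos + 1)
            if successor in counts:
                edges.add(((kmer, pos), successor))
    return edges


# Two overlapping reads aligned at positions 0 and 1 of a contig.
reads = [("ACGT", 0), ("CGTA", 1)]
nodes = positional_kmers(reads, k=3)
edges = positional_edges(nodes)
```

Keying nodes on position as well as sequence keeps repeated k-mers from different parts of a contig apart, which is the property the abstract attributes to the positional graph. A realistic version would merge nodes whose positions fall within some tolerance rather than requiring exact agreement.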

Keyword: Sequence Analysis


PP64 (PT) - DACTAL: divide-and-conquer trees (almost) without alignments

Date: Tuesday, July 17, 11:45 a.m. - 12:10 p.m., Room: 104C
Presenting author: Tandy Warnow, University of Texas at Austin

Additional authors:
Kevin Liu, Rice University
Li-San Wang, University of Pennsylvania
C. Randal Linder, University of Texas at Austin
Serita Nelesen, Calvin College, United States

Session Chair: Michal Linial

Presentation Overview:

We present DACTAL, a method for phylogeny estimation that produces trees from unaligned sequence datasets without ever needing to estimate an alignment on the entire dataset. DACTAL combines iteration with a novel divide-and-conquer approach, so that each iteration begins with a tree produced in the prior iteration, decomposes the taxon set into overlapping subsets, estimates trees on each subset, and then combines the smaller trees into a tree on the full taxon set using a new supertree method. We prove that DACTAL is guaranteed to produce the true tree under certain conditions. We compare DACTAL to SATé and maximum likelihood trees on estimated alignments using simulated and real datasets with 1000 to 27,643 taxa. Our studies show that DACTAL dramatically outperforms two-phase methods with respect to tree accuracy. The comparison to SATé shows that both have the same tree accuracy, but that DACTAL achieves this accuracy in a fraction of the time. Furthermore, DACTAL can analyze larger datasets than SATé, including a dataset with almost 28,000 sequences.

DACTAL source code is available at www.cs.utexas.edu/users/phylo/software/dactal

Keyword: Evolution and Comparative Genomics


PP65 (PT) - Minimum Message Length Inference of Secondary Structure from Protein Coordinate Data.

Date: Tuesday, July 17, 12:15 p.m. - 12:40 p.m., Room: Grand Ballroom
Presenting author: Arun Konagurthu, Monash University, Australia

Additional authors:
Arthur Lesk, Pennsylvania State University
Lloyd Allison, Monash University

Session Chair: Nir Ben-Tal

Presentation Overview:

Motivation: Secondary structure underpins the folding pattern and architecture of most proteins. Accurate assignment of the secondary structure elements is therefore an important problem. Although many approximate solutions of the secondary structure assignment problem exist, the statement of the problem has resisted a consistent and mathematically rigorous definition. A variety of comparative studies have highlighted major disagreements in the way the available methods define and assign secondary structure to coordinate data.

Results: We report a new method to infer secondary structure based on the Bayesian method of Minimum Message Length (MML) inference. It treats assignments of secondary structure as hypotheses that explain the given coordinate data. The method seeks to maximise the joint probability of a hypothesis and the data. There is a natural null hypothesis and any assignment that cannot better it is unacceptable. We developed a program SST based on this approach and compared it to popular programs such as DSSP and STRIDE amongst others. Our evaluation suggests that SST gives reliable assignments even on low resolution structures.

Keyword: Protein Structure and Function


PP66 (PT) - Weighted Pooling - Practical and Cost Effective Techniques for Pooled High Throughput Sequencing

Date: Tuesday, July 17, 12:15 p.m. - 12:40 p.m., Room: 104B
Presenting author: David Golan, Tel Aviv University, Israel

Additional authors:
Saharon Rosset, Tel Aviv University
Yaniv Erlich, Whitehead Institute

Session Chair: Terry Gaasterland

Presentation Overview:

Motivation: Despite the rapid decline in sequencing costs, sequencing large cohorts of individuals is still prohibitively expensive. Recently, several sophisticated pooling designs were suggested that can identify carriers of rare alleles in large cohorts with a significantly smaller number of pools, thus dramatically reducing the cost of such large scale sequencing projects (Erlich et al. 2009). These approaches use combinatorial pooling designs where each individual is either present or absent from a pool. One can then infer the number of carriers in a pool, and by combining information across pools, reconstruct the identity of the carriers.

Results: We show that one can gain further efficiency and cost reduction by using "weighted" designs, in which different individuals donate different amounts of DNA to the pools. Intuitively, in this situation the number of mutant reads in a pool indicates not only the number of carriers but also their identity. We describe and study a powerful example of such weighted designs, using non-overlapping pools. We demonstrate that this approach is not only easier to implement and analyze but is also competitive in terms of accuracy with combinatorial designs when identifying rare variants, and is superior when sequencing common variants. We then discuss how weighting can be incorporated into existing combinatorial designs to increase their accuracy and demonstrate the resulting improvement using simulations. Finally, we argue that weighted designs have enough power to facilitate detection of common alleles, so they can be used as a cornerstone of whole-exome sequencing projects.
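
The intuition that a weighted pool reveals carrier identity, not just carrier count, can be shown with a toy sketch. Everything here is an assumption made for illustration rather than the authors' design: the power-of-two weights, the function names, the noise-free mutant-read fraction, and the simplification that all of a carrier's contributed DNA carries the mutant allele.

```python
def mutant_fraction(weights, carriers):
    """Expected mutant-read fraction in one pool (idealized, noise-free).

    Assumes each carrier's DNA is fully mutant, so the fraction is the
    carriers' share of the total DNA in the pool.
    """
    return sum(weights[i] for i in carriers) / sum(weights)


def decode_carriers(weights, observed_fraction):
    """Recover carrier indices from the fraction.

    Relies on the hypothetical choice of distinct power-of-two weights,
    which makes every carrier subset map to a unique weighted sum.
    """
    code = round(observed_fraction * sum(weights))
    return {i for i, w in enumerate(weights) if code & w}


# Four individuals donate DNA in ratio 1:2:4:8 to a single pool.
weights = [1, 2, 4, 8]
fraction = mutant_fraction(weights, carriers={0, 2})
recovered = decode_carriers(weights, fraction)
```

With equal weights the same fraction would only say that two of the four individuals are carriers. Real sequencing adds sampling noise and heterozygous carriers, so a practical decoder would need a statistical model rather than exact rounding.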

Keyword: Sequence Analysis


PP67 (HT) - Predicting relapse in medulloblastoma patients by integrating evidence from clinical and genomic features

Date: Tuesday, July 17, 2:30 p.m. - 2:55 p.m., Room: Grand Ballroom
Presenting author: Pablo Tamayo, Broad Institute, United States

Additional authors:
Yoon-Jae Cho, Stanford University, United States
Aviad Tsherniak, Broad Institute, United States
Marcel Kool, Amsterdam Medical Center, Netherlands
Scott Pomeroy, Children's Hospital, United States
Jill Mesirov, Broad Institute, United States

Session Chair: Serafim Batzoglou

Presentation Overview:

Despite significant progress in the molecular understanding of medulloblastoma, stratification of risk in patients remains a challenge. Focus has shifted from clinical parameters to molecular markers, such as expression of specific genes and selected genomic abnormalities, to improve accuracy of treatment outcome prediction. Here, we show how integration of high-level clinical and genomic features or risk factors, including disease subtype, can yield more comprehensive, accurate, and biologically interpretable prediction models for relapse versus no-relapse classification. We also introduce a novel Bayesian nomogram indicating the amount of evidence that each feature contributes on a patient-by-patient basis.

Keyword: Disease Models and Epidemiology


PP68 (HT) - Chemical-Protein Interactome and its Application in Personalized Medicine and Drug Repositioning

Date: Tuesday, July 17, 2:30 p.m. - 2:55 p.m., Room: 104A
Presenting author: Lun Yang, GlaxoSmithKline, United States

Additional authors:
Lin He, Shanghai Jiao Tong University, China
Kejian Wang, Shanghai Jiao Tong University, China
Heng Luo, Shanghai Jiao Tong University, China

Session Chair: Terry Gaasterland

Presentation Overview:

Chemical-Protein Interactome is a computational methodology with a focus on characterizing differential drug efficacy and side effects through the combined analysis of genetic polymorphisms and their impact on chemical-protein interactions and gene expression perturbations. The methodology opens opportunities for developing patient-specific medication in terms of decreasing adverse drug reactions and broadening new uses for old drugs.

Keyword: Applied Bioinformatics


PP69 (HT) - Large-scale DNA editing of retrotransposons accelerates mammalian genome evolution.

Date: Tuesday, July 17, 2:30 p.m. - 2:55 p.m., Room: 104B
Presenting author: Erez Levanon, Bar-Ilan University, Israel

Additional authors:
Shai Carmi, Columbia University, United States
George Church, Harvard Medical School, United States

Session Chair: Jaques Reifman

Presentation Overview:

Genomic innovation is thought to be mediated by slow accumulation of uncorrelated mutations. Here, we show that mammals utilized an antiviral mechanism to accelerate their genome evolution by large-scale, parallel editing of their retrotransposons. We found thousands of clusters of G-to-A mismatches between pairs of retrotransposon sequences, indicating massive editing of retrotransposons prior to their integration. Such clusters are the hallmark of the activity of APOBEC3, a potent antiretroviral protein family with cytidine deamination function. We found DNA editing to span many mammalian genomes and retrotransposon families, as well as human-specific elements. Since DNA editing simultaneously generates a large number of mutations, each affected element can begin its evolutionary trajectory from a unique starting point, thereby increasing the probability of developing a novel function.

Keyword: Evolution and Comparative Genomics


PP70 (HT) - Interpreting human disease associations using comparative genomic and epigenomic signatures

Date: Tuesday, July 17, 3:00 p.m. - 3:25 p.m., Room: Grand Ballroom
Presenting author: Manolis Kellis, MIT, United States

Additional authors:
Luke Ward, MIT, United States
29-mammals Consortium, Broad Institute, United States

Session Chair: Serafim Batzoglou

Presentation Overview:

The large number of single-nucleotide polymorphisms (SNP) from genome-wide association studies (GWAS) that implicate non-coding regions in human disease poses the important challenge of interpreting their molecular mechanisms of action, needed for drug targets and therapeutics. Comparison of many related genomes has emerged as a powerful lens for genome interpretation, which complements large-scale experimental datasets of gene and genome activity by providing information on selective pressures for functional nucleotides. We have used the comparative analysis of 29 eutherian genomes to provide a high-resolution map of selective constraint in the human genome, revealing 3 million novel elements, and used distinct evolutionary signatures and chromatin information to suggest their candidate functions. We have further automated their use for interpreting disease-associated regions, by exploiting the population-specific linkage disequilibrium (LD) structure from the 1000 Genomes Project, to facilitate development of mechanistic hypotheses of the impact of non-coding variants on clinical phenotypes and normal variation.

Keyword: Disease Models and Epidemiology, Evolution & Comparative Genomics


PP71 (HT) - A data integration approach illustrates evolutionary mechanisms of ligand selectivity between related protein targets

Date: Tuesday, July 17, 3:00 p.m. - 3:25 p.m., Room: 104A
Presenting author: Felix Kruger, European Bioinformatics Institute, United Kingdom

Additional authors:
John P Overington, European Bioinformatics Institute, United Kingdom

Session Chair: Terry Gaasterland

Presentation Overview:

We integrated small molecule bioactivity data and homology information and compared small molecule binding between pairs of human paralogs and also between curated pairs of human to rat orthologs. To account for noise in the data set, we further compared measurements of the same ligand and target in different assays. We found that differences in small molecule binding between human paralogs are greater than the assay sample error. In contrast, differences between human to rat orthologs are no greater than the sample error. We then analyzed, for pairs of human paralogs, the relationship between sequence identity and differences in small molecule binding. For a subset of the data, differences in small molecule binding are greater for pairs with more divergent sequences. We conclude that small molecule binding between human to rat orthologs is largely conserved while selectivity of small molecule binding was observed between pairs of human paralogs.

Keyword: Applied Bioinformatics, Evolution & Comparative Genomics


PP72 (HT) - Domain architecture conservation in orthologs

Date: Tuesday, July 17, 3:00 p.m. - 3:25 p.m., Room: 104B
Presenting author: Erik Sonnhammer, Stockholm University, Sweden

Additional authors:
Kristoffer Forslund, SBC, Stockholm University, Sweden

Session Chair: Jaques Reifman

Presentation Overview:

According to the “ortholog conjecture”, orthologous proteins are expected to retain function more often than other homologs. Several proxies for functional conservation have been used, such as GO annotations and tissue expression. We here test the ortholog conjecture using conservation of domain architecture as an alternative proxy for protein function.

We studied domain architecture conservation in orthologs and paralogs between human and 40 other species. The analysis shows that orthologs exhibit greater domain architecture conservation than paralogs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold.

Our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.

Keyword: Evolution and Comparative Genomics, Protein Structure & Function


PP73 (PT) - Statistical model-based testing to evaluate the recurrence of genomic aberrations

Date: Tuesday, July 17, 3:30 p.m. - 3:55 p.m., Room: Grand Ballroom
Presenting author: Atsushi Niida, University of Tokyo, Japan

Additional authors:
Seiya Imoto, University of Tokyo
Teppei Shimamura, University of Tokyo
Satoru Miyano, University of Tokyo

Session Chair: Serafim Batzoglou

Presentation Overview:

Motivation: In cancer genomes, chromosomal regions harboring cancer genes are often subjected to genomic aberrations like copy number alteration and loss of heterozygosity (LOH). Given this, finding recurrent genomic aberrations is considered an apt approach for screening cancer genes. Although several permutation-based tests have been proposed for this purpose, none of them are designed to find recurrent aberrations from the genomic data set without paired normal sample controls. Their application to unpaired genomic data may lead to false discoveries, because they retrieve pseudo-aberrations that exist in normal genomes as polymorphisms.

Results: We develop a new parametric method named parametric aberration recurrence test (PART) to test for the recurrence of genomic aberrations. The introduction of Poisson-binomial statistics allows us to compute small p-values more efficiently and precisely than the previously proposed permutation-based approach. Moreover, we extended PART to cover unpaired data (PART-up) so that there is a statistical basis for analyzing unpaired genomic data. PART-up utilizes information from unpaired normal sample controls to remove pseudo-aberrations in unpaired genomic data. Using PART-up, we successfully predict recurrent genomic aberrations in cancer cell line samples whose paired normal sample controls are unavailable. This paper thus proposes a powerful statistical framework for the identification of driver aberrations, which would be applicable to ever-increasing amounts of cancer genomic data seen in the era of next generation sequencing.
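
To make the role of Poisson-binomial statistics concrete, here is a minimal sketch of an exact upper-tail p-value for the number of samples showing an aberration at one locus. It is not the PART implementation: the function names and the assumption that each sample has its own known background aberration probability, independent of the others, are introduced here for illustration only.

```python
def poisson_binomial_pmf(probs):
    """Return pmf where pmf[k] = P(exactly k of the independent events occur).

    Built by dynamic programming: each Bernoulli(p) trial is folded into
    the running distribution in turn, for O(n^2) total work.
    """
    pmf = [1.0]
    for p in probs:
        updated = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            updated[k] += mass * (1.0 - p)  # this sample is not aberrant
            updated[k + 1] += mass * p      # this sample is aberrant
        pmf = updated
    return pmf


def upper_tail_p_value(probs, observed):
    """P(at least `observed` samples are aberrant) under the null."""
    return sum(poisson_binomial_pmf(probs)[observed:])


# Hypothetical per-sample background aberration rates at one locus.
sample_rates = [0.1, 0.2, 0.05, 0.3]
p_value = upper_tail_p_value(sample_rates, observed=3)
```

Because the tail probability is computed exactly, very small p-values do not depend on the number of permutations drawn, which is the efficiency and precision advantage the abstract describes over permutation-based tests.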

Keyword: Disease Models and Epidemiology


PP74 (HT) - Materiomics: instructing cell fate using topographical biomaterials

Date: Tuesday, July 17, 3:30 p.m. - 3:55 p.m., Room: 104A
Presenting author: Marc Hulsman, TU Delft, Netherlands

Additional authors:
Hemant Unadkat, University of Twente, Netherlands
Kamiel Cornelissen, University of Twente, Netherlands
Bernke Papenburg, University of Twente, Netherlands
Roman Truckenmüller, University of Twente, Netherlands
Gerhard Post, University of Twente, Netherlands
Marc Uetz, University of Twente, Netherlands
Marcel Reinders, Delft University of Technology, Netherlands
Dimitrios Stamatialis, University of Twente, Netherlands
Clemens van Blitterswijk, University of Twente, Netherlands
Jan de Boer, University of Twente, Netherlands

Session Chair: Terry Gaasterland

Presentation Overview:

In the development of cellular tissues, not only growth factors play a role, but also the structural environment. With material surfaces shown to affect stem cell fate, new avenues are opening for improving the biological performance of (implant) surfaces used in the human body. In the present study, we developed a chip, allowing us to chart cell – surface topography interactions in high-throughput. Human mesenchymal stromal cells (hMSCs) were grown on the chips, and using high-content imaging, each individual cell response was measured. The results reveal formerly unknown surface topographies that are able to induce MSC proliferation or osteogenic differentiation. Moreover, using machine learning techniques, we correlate parameters of the surface designs to cellular responses, yielding new insight into surface design criteria, and enabling us to predict the performance of untested surfaces.

Keyword: Applied Bioinformatics


PP75 (HT) - Viral-host coevolution: Playing 'seek and hide'

Date: Tuesday, July 17, 3:30 p.m. - 3:55 p.m., Room: 104B
Presenting author: Michal Linial, The Hebrew University of Jerusalem, Israel

Additional authors:
Nadav Rappoport, The Hebrew University of Jerusalem, Israel

Session Chair: Jaques Reifman

Presentation Overview:

Ample studies focus on the exchange of genetic material between viruses and cellular hosts. The common view claims that along the evolutionary history (bacteria to humans), viruses have shaped the host genomes. We will present evidence that, in addition to codon usage adaptation (Bahir et al. MSB 5:311), shaping viral proteomes is executed by ‘stealing and refinement’ of genetic material from the host. Tracing such events is challenging as the origin of the sequences is masked by viruses’ high mutation rate. We will present evidence for “stolen” genetic material from metazoan hosts to their viruses. For about 75% of the cross-taxa families, viral proteins are significantly shorter than their counterpart host proteins. We expose instances of active trimming of domain tails, and removal of internal domains by viruses. The inventory of viral stolen proteins provides insights on the overlooked intimacy of viruses and their multicellular hosts.

Keyword: Evolution and Comparative Genomics, Sequence Analysis


PP76 (PT) - Data-Driven Integration Of Epidemiological And Toxicological Data To Select Candidate Interacting Genes And Environmental Factors In Association With Disease

Date: Tuesday, July 17, 4:00 p.m. - 4:25 p.m., Room: Grand Ballroom
Presenting author: Chirag Patel, Stanford University, United States

Additional authors:
Rong Chen, Stanford University
Atul Butte, Stanford University

Session Chair: Serafim Batzoglou

Presentation Overview:

Complex diseases, such as Type 2 Diabetes Mellitus (T2D), result from the interplay of both environmental and genetic factors. However, most studies investigate either the genetics or the environment in the context of disease, and few study their possible interaction. One key challenge in documenting interactions between genes and environment is choosing which of each to test jointly. Here, we attempt to address this challenge through a data-driven integration of epidemiological and toxicological studies. Specifically, we derive lists of candidate interacting genetic and environmental factors by integrating findings from genome-wide and environment-wide association studies (GWAS and EWAS). Next, we search for evidence of toxicological relationships between these genetic and environmental factors that may have an etiological role in the disease. We illustrate our method by selecting candidate interacting factors for Type 2 Diabetes.

Keyword: Disease Models and Epidemiology


PP77 (PT) - Identifying Disease Sensitive and Quantitative Trait Relevant Biomarkers from Heterogeneous Imaging Genetics Data via Sparse Multi-Modal Multi-Task Learning

Date: Tuesday, July 17, 4:00 p.m. - 4:25 p.m., Room: 104A
Presenting author: Hua Wang, University of Texas at Arlington, United States

Additional authors:
Feiping Nie, University of Texas at Arlington
Heng Huang, University of Texas at Arlington
Shannon Leigh Risacher, Indiana University
Andrew Saykin, Indiana University School of Medicine
Li Shen, Indiana University School of Medicine

Session Chair: Terry Gaasterland

Presentation Overview:

Motivation: Recent advances in brain imaging and high-throughput genotyping techniques enable new approaches to study the influence of genetic and anatomical variations on brain functions and disorders. Traditional association studies typically perform independent and pairwise analysis among neuroimaging measures, cognitive scores, and disease status, and ignore the important underlying interacting relationships between these units.

Results: To overcome this limitation, in this paper, we propose a new sparse multi-modal multi-task learning method to reveal complex relationships from gene to brain to symptom. Our main contributions are three-fold: 1) utilizing a joint classification and regression learning model to identify disease-sensitive and cognition-relevant biomarkers; 2) introducing combined structured sparsity regularizations into multi-modal multi-task learning to integrate heterogeneous imaging genetics data and identify multi-modal biomarkers; 3) deriving a new efficient optimization algorithm to solve our non-smooth objective function and providing rigorous theoretical analysis on the global optimum convergence. Using the imaging genetics data from the Alzheimer’s Disease Neuroimaging Initiative database, the effectiveness of the proposed method is demonstrated by clearly improved performance on predicting both cognitive scores and disease status. The identified multi-modal biomarkers could predict not only disease status but also cognitive function to help elucidate the biological pathway from gene to brain structure and function, and to cognition and disease.

Keyword: Applied Bioinformatics / Disease Models


PP78 (PT) - Efficient Algorithms for the Reconciliation Problem with Gene Duplication, Horizontal Transfer, and Loss

Date: Tuesday, July 17, 4:00 p.m. - 4:25 p.m., Room: 104B
Presenting author: Mukul S. Bansal, Massachusetts Institute of Technology, United States

Additional authors:
Eric J. Alm, Massachusetts Institute of Technology
Manolis Kellis, Massachusetts Institute of Technology

Session Chair: Jaques Reifman

Presentation Overview:

Motivation: Gene family evolution is driven by evolutionary events like speciation, gene duplication, horizontal gene transfer, and gene loss, and inferring these events in the evolutionary history of a given gene family is a fundamental problem in comparative and evolutionary genomics with numerous important applications. Solving this problem requires the use of a reconciliation framework, where the input consists of a gene family phylogeny and the corresponding species phylogeny, and the goal is to reconcile the two by postulating speciation, gene duplication, horizontal gene transfer, and gene loss events. This reconciliation problem is referred to as Duplication-Transfer-Loss (DTL) reconciliation and has been extensively studied in the literature. Yet, even the fastest existing algorithms for DTL-reconciliation are too slow for reconciling large gene families and for use in more sophisticated applications such as gene tree or species tree reconstruction.

Results: We present two new algorithms for the DTL-reconciliation problem that are dramatically faster than existing algorithms, both asymptotically and in practice. We also extend the standard DTL-reconciliation model by considering distance-dependent transfer costs that allow for more accurate reconciliation, and give an efficient algorithm for DTL-reconciliation under this extended model. We implemented our new algorithms and demonstrate up to 100,000-fold speed-up over existing methods, using both simulated and biological datasets. This dramatic improvement makes it possible to use DTL-reconciliation for performing rigorous evolutionary analyses of large gene families, and enables its use in advanced reconciliation-based gene and species tree reconstruction methods.

Keyword: Evolution and Comparative Genomics

SS2_partB - Reconstructing the Regulatory Network of TB: Deconstruction of the Hypoxic Response

Date: Sunday, July 15, 3:00 p.m. - 3:25 p.m., Room: 202BC
Presenting author: Elham Azizi, Boston University, United States

Presentation Overview:

We have generated the first genome-scale model of the M. tuberculosis (MTB) regulatory network and combined this network with the first comprehensive profiling of mRNA, proteins, metabolites and lipids in MTB during hypoxia and re-aeration. We have developed a high-throughput system based on ChIP-Seq for comprehensively mapping regulatory binding, and integrated this with expression data from the induction of the same factors. Our method allows us to map DNA binding of all MTB regulators in a consistent and comparable manner independent of regulatory function. Using this method we have reconstructed a regulatory network model based on over 50 transcription factors. The network doubles the number of regulators whose interactions have been studied in MTB, discovers thousands of interactions and assigns functions to a substantial number, suggests many more potentially functional interactions for even well-studied regulators, and displays predictive power for gene expression. The network model also reveals a direct interconnection between the hypoxic response, lipid catabolism, lipid anabolism and the production of known immunomodulatory lipids, and protein degradation. Consistent with this, we observe substantial alterations in lipid, amino acid, and protein content in response to oxygen availability. The regulatory network provides insight into the transcription factors underlying these changes. Using our regulatory network data, generated under independent normoxic conditions, we are able to generate models of steady-state gene expression that allow us to predict MTB gene expression during hypoxia and re-aeration.

SS2_partD - Systems Biology of Infectious Disease

Date: Sunday, July 15, 4:00 p.m. - 4:25 p.m., Room: 202BC
Presenting author: Jason McDermott, Pacific Northwest National Laboratory, United States

Presentation Overview:

The study of infectious disease and the complex interplay between pathogens and their hosts has benefitted greatly from the ability to generate many different high-throughput measurements of systems, including transcriptomics, proteomics, and metabolomics. Ways to represent, interpret, and model such multimodal datasets allow improved understanding of the host-pathogen relationship at a systems biology level. We present recent results from systems biology studies of bacterial enteropathogens, Salmonella Typhimurium and Yersinia pestis, as well as respiratory viruses, influenza H5N1 and SARS coronavirus, interacting with their hosts. We will describe the use of network-based approaches to interpretation of high-throughput data and prediction of important components of the system, including experimental validation of some of these predictions. We will also describe how predictive modeling approaches can be used to model important aspects of the interaction and provide predictions of control points for pathogenesis and host response. Finally, we will discuss critical gaps that exist in the systems biology study of infectious diseases and future directions to address those gaps.

SS7_partA - Assessing the contribution of scientists to Wikipedia for Pfam and Rfam annotation

Date: Tuesday, July 17, 2:30 p.m. - 2:55 p.m., Room: 104C
Presenting author: Alex Bateman, Wellcome Trust Sanger Institute, United Kingdom

Presentation Overview:

In this presentation I will show the latest survey of the scientific community's engagement with Wikipedia and its relevance to the annotation in Pfam and Rfam. Major challenges remain in: (1) persuading experts in the field that Wikipedia contributions are a valuable communication tool; (2) giving non-technical scientists the confidence and knowledge to edit Wikipedia content.

SS7_partB - WikiPathways and How to Change the World (or at least your small corner of the world)

Date: Tuesday, July 17, 3:00 p.m. - 3:25 p.m., Room: 104C
Presenting author: Alexander Pico, Gladstone Institutes, United States

Presentation Overview:

WikiPathways is a collaborative platform for collecting, curating and distributing biological pathway knowledge in the research community. We started WikiPathways with almost a decade of experience archiving pathway models as a conventional resource maintained by a small internal team of experts. Switching to a community curation approach has dramatically increased the size, quality and relevance of our content. Increased relevance is a distinctive advantage of ‘community intelligence’ efforts that directly engage researchers in real time. More and more, we are finding research communities eager to participate in data and knowledge repositories that utilize their contributions directly and transparently. Over the past four years, WikiPathways has grown from 100 registered users to over 2000, with a steadily increasing percentage making edits and contributing new content. The number of visits has doubled in the last year to over 10,000 per month. In this special session, we will present the lessons we have gleaned from launching and developing WikiPathways as a ‘community intelligence’ effort: how to set milestones for early success, how to utilize open source code and culture, how to tap into already established communities and resources, how to build data mining and analytical tools and services around your content, and how to make use of new models of data sharing and publishing.

SS7_partC - The Gene Wiki: Crowdsourcing the annotation of human gene function

Date: Tuesday, July 17, 3:30 p.m. - 3:55 p.m., Room: 104C
Presenting author: Andrew Su, The Scripps Research Institute, United States

Presentation Overview:

Comprehensively annotating the function of human genes is a formidable challenge for the biomedical research community. The goal of the Gene Wiki project is to create a continuously updated, community-reviewed and collaboratively written review article for every human gene. The Gene Wiki currently takes the form of 10,000 articles in the online encyclopedia Wikipedia. This collection of articles is viewed over 50 million times and edited over 15,000 times per year. In this talk, we will describe our efforts to create a critical mass of users, to mine structured gene annotations from Gene Wiki text, and to integrate these data in bioinformatics analyses.

SS7_partD - Distributed Community Intelligence through the Scientific Discovery Game Foldit

Date: Tuesday, July 17, 4:00 p.m. - 4:25 p.m., Room: 104C
Presenting author: Firas Khatib, University of Washington, United States

Presentation Overview:

Foldit is a graphical user interface representation of the Rosetta algorithm where players manipulate protein structures with the corresponding Rosetta energy shown in real time as their score. By leveraging human puzzle-solving, pattern recognition, and 3D spatial reasoning, players are able to outperform many state-of-the-art prediction methods. Foldit players have generated models accurate enough for successful molecular replacement and subsequent structure determination of a monomeric retroviral protease, despite not being given any experimental data. Foldit players have also been provided tools to encode their folding strategies, and within seven months one of these player-developed folding algorithms outperformed a previously published algorithm. Most recently, players were challenged to remodel the backbone of a computationally designed bimolecular Diels-Alderase to enable additional interactions with substrates. Several iterations of design and characterization generated a 24-residue helix-turn-helix motif, including a 13-residue insertion, that increased enzyme activity over 18-fold. X-ray crystallography showed that the large insertion adopts a helix-turn-helix structure positioned as in the Foldit model. The ability of an online gaming community to successfully guide large-scale protein structure prediction and design problems suggests that human creativity can extend down to molecular scale when given the appropriate tools.

TT01 -

Date: Sunday, July 15, 10:45 a.m. - 11:40 a.m., Room: 201A

TT02 -

Date: Sunday, July 15, 10:45 a.m. - 11:10 a.m., Room: 201B

TT03 -

Date: Sunday, July 15, 11:15 a.m. - 11:40 a.m., Room: 201B

TT04 -

Date: Sunday, July 15, 11:45 a.m. - 12:10 p.m., Room: 201A

TT05 -

Date: Sunday, July 15, 11:45 a.m. - 12:10 p.m., Room: 201B

TT06 -

Date: Sunday, July 15, 12:15 p.m. - 12:40 p.m., Room: 201A

TT07 -

Date: Sunday, July 15, 12:15 p.m. - 12:40 p.m., Room: 201B

TT08 -

Date: Sunday, July 15, 2:30 p.m. - 2:55 p.m., Room: 201A

TT09 -

Date: Sunday, July 15, 2:30 p.m. - 3:25 p.m., Room: 201B

TT10 -

Date: Sunday, July 15, 3:00 p.m. - 3:25 p.m., Room: 201A

TT11 -

Date: Sunday, July 15, 3:30 p.m. - 3:55 p.m., Room: 201A

TT12 -

Date: Sunday, July 15, 3:30 p.m. - 3:55 p.m., Room: 201B

TT13 -

Date: Sunday, July 15, 4:00 p.m. - 4:25 p.m., Room: 201A

TT14 -

Date: Sunday, July 15, 4:00 p.m. - 4:25 p.m., Room: 201B

TT15 -

Date: Monday, July 16, 10:45 a.m. - 11:10 a.m., Room: 202B/C

TT16 -

Date: Monday, July 16, 10:45 a.m. - 11:40 a.m., Room: 201A

TT17 -

Date: Monday, July 16, 10:45 a.m. - 11:40 a.m., Room: 201B

TT18 -

Date: Monday, July 16, 11:15 a.m. - 11:40 a.m., Room: 202B/C

TT19 -

Date: Monday, July 16, 11:45 a.m. - 12:10 p.m., Room: 202B/C

TT20 -

Date: Monday, July 16, 11:45 a.m. - 12:40 p.m., Room: 201A

TT21 -

Date: Monday, July 16, 11:45 a.m. - 12:10 p.m., Room: 201B

TT22 -

Date: Monday, July 16, 12:15 p.m. - 12:40 p.m., Room: 202B/C

TT23 -

Date: Monday, July 16, 12:15 p.m. - 12:40 p.m., Room: 201B

TT24 -

Date: Monday, July 16, 2:30 p.m. - 3:25 p.m., Room: 201B

TT25 -

Date: Monday, July 16, 3:30 p.m. - 3:55 p.m., Room: 201B

TT26 -

Date: Monday, July 16, 4:00 p.m. - 4:25 p.m., Room: 201B

TT27 -

Date: Tuesday, July 17, 10:45 a.m. - 11:10 a.m., Room: 201B

TT28 -

Date: Tuesday, July 17, 11:15 a.m. - 11:40 a.m., Room: 201B

TT29 -

Date: Tuesday, July 17, 11:45 a.m. - 12:10 p.m., Room: 201B

TT30 -

Date: Tuesday, July 17, 12:15 p.m. - 12:40 p.m., Room: 104A

TT31 -

Date: Tuesday, July 17, 12:15 p.m. - 12:40 p.m., Room: 104C

TT32 -

Date: Tuesday, July 17, 12:15 p.m. - 12:40 p.m., Room: 201B

TT33 -

Date: Tuesday, July 17, 2:30 p.m. - 2:55 p.m., Room: 201A

TT34 -

Date: Tuesday, July 17, 2:30 p.m. - 2:55 p.m., Room: 201B

TT35 -

Date: Tuesday, July 17, 3:00 p.m. - 3:25 p.m., Room: 201A

TT36 -

Date: Tuesday, July 17, 3:00 p.m. - 3:25 p.m., Room: 201B

TT37 -

Date: Tuesday, July 17, 3:30 p.m. - 3:55 p.m., Room: 201A

TT38 -

Date: Tuesday, July 17, 3:30 p.m. - 3:55 p.m., Room: 201B

TT39 -

Date: Tuesday, July 17, 4:00 p.m. - 4:25 p.m., Room: 201A