20th Annual International Conference on
Intelligent Systems for Molecular Biology

Late Breaking Research Presentation Schedule

As of May 1, 2012 (schedule subject to change)

Presenting authors are shown in bold:

LBR01 Sunday, July 15: 10:45 a.m. - 11:10 a.m.
Identifying tissue specificity of protein complexes based on a global map of human expression data
Room: 202B/C

Author(s):
Daniela Börnigen, Harvard University, Harvard School of Public Health, United States
Tune Pers, Technical University of Denmark, Denmark
Lieven Thorrez, KU Leuven, Belgium
Curtis Huttenhower, Harvard School of Public Health, Harvard University, United States
Yves Moreau, KU Leuven, Belgium
Søren Brunak, Technical University of Denmark, Denmark

Session Chair: Carl Kingsford
Abstract
Disease-causing human genetic variants are often highly tissue specific, but for most disease genes the primarily affected tissue is unknown. We hypothesized that the degree of coordinated expression between genes coding for distinct protein complex subunits might pinpoint the tissues in which linked diseases are manifested.

We thus developed a method to predict the tissue involvement of disease-linked protein complexes. For each susceptibility gene, we ranked tissues according to the gene’s concordant expression with its protein interaction partners under normal conditions. The analysis thus combined a high-quality human interactome, its constituent set of protein complexes, a global map of human gene expression data in healthy tissues, and a predefined set of disease-linked genes.

We validated our hypothesis by comparing our predicted tissue ranking with a literature-based gold-standard ranking of 260 unique protein-disease associations across 35 tissues. Our predictions achieved an average AUC of 0.78 over all tissues, with some tissues (such as adipose or placental) obtaining AUCs over 0.9. These high values were due to less heterogeneous cell types within those tissues, in contrast to tissues such as the blood or lymphatic system, in which tissue-specific disease involvement proved more difficult to predict. Our overall accuracy, however, suggests that the degree of coordinated expression of a disease gene and its protein interaction partners indeed provides insight into which tissue is most likely to be affected or causal in human disease.
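
The tissue-ranking idea can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the score used here (the mean across-tissue z-score of a gene and its interaction partners) is an assumption chosen for illustration, and the gene names and expression values are made up.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize one gene's expression values across tissues."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

def rank_tissues(gene, partners, expression, tissues):
    """Rank tissues by a hypothetical concordance score: the mean z-score
    of the gene and its interaction partners in each tissue."""
    members = [gene] + list(partners)
    z = {g: zscores(expression[g]) for g in members}
    scores = {t: mean(z[g][i] for g in members) for i, t in enumerate(tissues)}
    return sorted(tissues, key=lambda t: scores[t], reverse=True)

# Toy data (hypothetical): three complex members measured in three tissues.
tissues = ["heart", "liver", "brain"]
expression = {
    "GENE_A": [9.0, 2.0, 1.0],
    "GENE_B": [8.0, 1.0, 3.0],
    "GENE_C": [7.0, 3.0, 2.0],
}
ranking = rank_tissues("GENE_A", ["GENE_B", "GENE_C"], expression, tissues)
print(ranking[0])  # top-ranked tissue for GENE_A and its partners
```

A ranking of this kind could then be compared against a gold-standard tissue assignment, for example with an AUC per tissue as the abstract describes.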

LBR02 Sunday, July 15: 11:15 a.m. - 11:40 a.m.
Quantifying the Systemic Consequences of Point Mutations in Proteins through Pathway Dynamics and Protein Structures
Room: 202B/C

Author(s):
Tammy Cheng, Cancer Research UK London Research Institute, United Kingdom
Lucas Goehring, Max Planck Institute, Germany
Linda Jeffery, Cancer Research UK London Research Institute, United Kingdom
Yu-En Lu, University of Cambridge, United Kingdom
Jacqueline Hayles, Cancer Research UK London Research Institute, United Kingdom
Béla Novák, University of Oxford, United Kingdom
Paul Bates, Cancer Research UK London Research Institute, United Kingdom

Session Chair: Carl Kingsford
Abstract
Gauging the systemic effect of point mutations in proteins is an important topic in the current post-GWAS era. However, it is not a trivial task to understand how a change at the protein structure level eventually affects a cell's phenotypic outcome, because complex, multi-scale information, ranging from proteins to pathways, is usually required to obtain analytical results with physiological meaning. Because the idea of integrating both protein and pathway dynamics to estimate the systemic impact of point mutations in proteins remains largely unexplored, we investigate the practicality of this approach by formulating mathematical models to study point mutations that involve the cell cycle control mechanism (G2 to Mitosis transition) in yeast and the neuro-cardio-facial-cutaneous syndrome associated with the human MAPK signalling pathway.

LBR03 Sunday, July 15: 11:45 a.m. - 12:10 p.m.
Regulatory Network Structure as the Dominant Determinant of Transcription Factor Evolutionary Rate in Yeast
Room: 202B/C

Author(s):
Jasmin Coulombe-Huntington, Boston University, United States

Session Chair: Carl Kingsford
Abstract
The evolution of transcriptional regulatory networks has thus far mostly been studied at the level of cis-regulatory elements. However, since trans-level variation is known to account for much of the gene expression variation between strains, studying the evolution of trans-factors is crucial to understanding regulatory network evolution. Here, we systematically assess the different genomic and network-level determinants of transcription factor (TF) evolutionary rate in yeast and how they compare to those of generic proteins. We develop a novel method to demonstrate that transcription factors possess significantly distinct trends relating evolutionary rate to various genomic features, such as mRNA expression level, codon adaptation index, the evolutionary rate of physical interaction partners, and, confirming previous reports, to protein-protein interaction degree and regulatory in-degree. We then go on to show that the strongest predictor of transcription factor evolutionary rate is the median evolutionary rate of its target genes, followed by the fraction of target genes that are species-specific. After decomposing the regulatory network into positive and negative edges, we found that this effect is limited to activating regulatory relationships. This work is the first to establish the modularity of TF-target protein evolution and highlights key evolutionary differences between positive and negative regulation systems. We have also demonstrated that systems-level properties can leave evolutionary traces of comparable effect size to physical features such as interaction degree and expression level, and that TF evolution in particular is best understood through a regulatory network-level perspective.

LBR04 Sunday, July 15: 12:15 p.m. - 12:40 p.m.
Global and specific Regulation of mRNA Decay analyzed by Dynamic Transcriptome Analysis
Room: 202B/C

Author(s):
Achim Tresch, Ludwig-Maximilians-University Munich, Germany

Session Chair: Carl Kingsford
Abstract
To measure eukaryotic mRNA turnover, we developed comparative Dynamic Transcriptome Analysis (cDTA). cDTA provides absolute rates of mRNA synthesis and decay in Saccharomyces cerevisiae (Sc) cells with the use of Schizosaccharomyces pombe (Sp) as an internal standard. We apply cDTA to Sc mutants of the transcription and degradation machinery. We find that mutants with decreased degradation also show decreased transcriptional activity. Surprisingly, this negative feedback is mutual, i.e., mutants that are globally impaired in their RNA synthesis have a globally decreased decay. Extended kinetic modeling reveals that this mutual feedback is achieved by a factor that inhibits synthesis and a factor that enhances degradation.

LBR05 Sunday, July 15: 2:30 p.m. - 2:55 p.m.
Fractionation, rearrangement and subgenome dominance
Room: 202A

Author(s):
David Sankoff, University of Ottawa, Canada

Session Chair: Olga Vitek
Abstract
Fractionation, the loss of duplicate genes after whole genome duplication (WGD), causes more gene order disruption than classical chromosomal rearrangements such as inversion or reciprocal translocation. WGD and fractionation are particularly prevalent in flowering plants. Gene order disruption follows from the partly random choice of which of the two copies is deleted. This artificially inflates the inferred amount of chromosomal rearrangement observed between the WGD descendant and an unduplicated sister genome. Our work is designed to computationally detect, characterize and correct for this impediment to the study of evolution.

We developed the "consolidation algorithm" to assess and correct for the gross errors in rearrangement inference caused by fractionation. In simulations our procedure almost completely wipes out this distortion.

In applying our method to the poplar genome, an ancient tetraploid, compared to a diploid sister genome, grapevine, we discovered that the majority of the apparent rearrangement is actually attributable to fractionation. Among the consolidated regions detected by our algorithm, a number are much longer than those in the simulations, suggesting a non-independence of deletion events affecting neighboring genes and a clear tendency for genes to be deleted in one of the two homeologs, as would be predicted by the recent theory of subgenome dominance.

LBR06 Sunday, July 15: 3:00 p.m. - 3:25 p.m.
Internal pseudo-symmetry in proteins
Room: 202A

Author(s):
Andreas Prlic, University of California San Diego, United States
Spencer Bliven, UCSD, United States
Philippe Youkharibache, InPharmatics Corporation, United States
Peter Rose, UCSD, United States
Phil Bourne, UCSD, United States

Session Chair: Olga Vitek
Abstract
Symmetry in the quaternary structure of proteins is frequently associated with function. For example, symmetry plays a prominent role in models of enzyme activity. While the observation of symmetry in quaternary structure goes back to the very first protein structures, more and more cases of pseudo-symmetry within protein domains have been described. It is hypothesized that such symmetries can be linked to function and folding of proteins. Here, we attempt to verify this hypothesis by both systematically detecting pseudo-symmetry via a new algorithm and by manually investigating crafted alignments of symmetric proteins. The new algorithm detects internal pseudo-symmetry and repeats in protein chains and is available in the software CE-Symm. By applying it systematically we can detect such structural features in many examples that have previously not been described. We investigate the hypothesis that symmetry is related to function by manually analyzing many of the detected cases. Our results show that symmetry plays an important functional role not only in quaternary structure, but also within protein chains. We can identify local alignments between distant folds, in which symmetric subunits, here called “protodomains” are conserved. This allows us to gain novel insights into distant evolutionary relationships. Knowledge of internal symmetry is important for a better understanding of evolution, function and folding and newly resolved protein structures should be investigated for hidden internal pseudo-symmetries.

LBR07 Sunday, July 15: 3:30 p.m. - 3:55 p.m.
Technology to identify global dynamics of protein interaction networks
Room: 202A

Author(s):
Nozomu Yachie, University of Toronto, Canada
Sedide Ozturk, University of Toronto, Canada
Joseph Mellor, University of Toronto, Canada
Atina Cote, University of Toronto, Canada
Anna Karkhanina, University of Toronto, Canada
Haiyuan Yu, Cornell University, United States
Pascal Braun, Dana Farber Cancer Institute, United States
David Hill, Dana Farber Cancer Institute, United States
Marc Vidal, Dana Farber Cancer Institute, United States
Frederick Roth, University of Toronto, Canada

Session Chair: Olga Vitek
Abstract
Cancer and other genetic diseases are mediated by a web of macromolecular interactions that are regulated dynamically (for example, through post-transcriptional modification). Thus, a technology that captures the regulated dynamics of a global-scale protein interaction network would be important to accelerate our understanding of complex diseases. In vivo assays such as affinity purification followed by mass spectrometry (AP-MS) capture interactions under one condition, while in vitro assays such as Y2H capture interactions that could occur under different conditions, so long as these interactions do not require a third co-factor or post-translational modifier. No current method has the ability to economically produce many “conditional interactome” maps, each in the presence of different co-factors or modifiers. Here we describe a new technology, BFG-Y2H (Barcode Fusion Genetics-Y2H), which exploits the efficiencies of deep short-read sequencing and offers the potential for one researcher to map dozens of genome-scale conditional interactomes for a given species within one year at a cost of less than $1,000 per interactome.

LBR08 Sunday, July 15: 4:00 p.m. - 4:25 p.m.
Assembling Acute Myeloid Leukemia RNA-seq Data to Infer Alternative Polyadenylation Site Usage
Room: 202A

Author(s):
Inanc Birol, Genome Sciences Centre BC Cancer Agency, Canada

Session Chair: Olga Vitek
Abstract
Alternative polyadenylation in 3’ UTRs is known to affect post-transcriptional gene regulation, and can be dysregulated in tumour cells. Thus identification of alternative polyadenylation site usage and measurement of expression levels of the resulting 3’ UTRs will be valuable for understanding tumor biology. In this study, we use RNA-seq data from the Illumina HiSeq 2000 platform to characterize the transcriptome repertoires of several Acute Myeloid Leukemia (AML) samples with and without NPM1 insertions, and investigate the association of this biomarker with 3’ UTR usage and expression.
To interrogate RNA-seq data for unbiased 3’ UTR reconstruction, we expanded the functionality of Trans-ABySS, our de novo transcriptome assembly tool. Trans-ABySS assembles RNA-seq data using a range of read-to-read overlap stringency levels to account for the sensitivity-specificity balance while reconstructing transcripts with a range of expression levels. Our preliminary analysis of AML transcriptomes indicates that our approach can assemble one or more 3’ UTRs for about 80% of genes that are expressed at 10-fold or more coverage, and offers a number of novel 3’ UTR predictions, which we will study further to assess their relationships to disease biology.

LBR09 Monday, July 16: 10:45 a.m. - 11:10 a.m.
CAGI: The Critical Assessment of Genome Interpretation, a community experiment to evaluate phenotype prediction
Room: 202A

Author(s):
Steven Brenner, University of California, Berkeley, United States

Session Chair: Chad Myers
Abstract
The Critical Assessment of Genome Interpretation (CAGI, 'kā-jē) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. In this assessment, participants are provided genetic variants and make predictions of resulting phenotype. These predictions are evaluated against experimental characterizations by independent assessors. The CAGI experiment culminates with a community workshop and publications to disseminate results, assess our collective ability to make accurate and meaningful phenotypic predictions, and better understand progress in the field. A long-term goal for CAGI is to improve the accuracy of phenotype and disease predictions in clinical settings.
This presentation will focus on the practical implications of CAGI 2011 results on a diversity of challenges. The presentation will summarize the state-of-the-art in identifying the impact of variants in a metabolic enzyme and in an oncogene, and thus the appropriate use of such methods in basic and clinical research. CAGI has revealed the relative strengths of different prediction approaches, and the best will be described.
CAGI also explored genome-scale data, showing unexpected successes in predicting Crohn’s disease from exomes, as well as disappointing failures in using genome and transcriptome data to distinguish discordant monozygotic twins with asthma. Predictors had promising complementary approaches in predicting distinct response of breast cancer cell lines to a panel of drugs. Predictors also made measurable progress in predicting a diversity of phenotypes present in the personal genome project participants.
Current information including additional challenges is available at the CAGI website at http://genomeinterpretation.org.

LBR10 Monday, July 16: 11:15 a.m. - 11:40 a.m.
Chromatin Structure and Genomic Context Influence Mitochondrial DNA Insertion in Mammalian Nuclear Genomes
Room: 202A

Author(s):
Junko Tsuji, University of Tokyo, Japan
Martin Frith, National Institute of Advanced Industrial Science and Technology, Japan
Kentaro Tomii, National Institute of Advanced Industrial Science and Technology, Japan
Paul Horton, National Institute of Advanced Industrial Science and Technology, Japan

Session Chair: Chad Myers
Abstract
It is known that remnants of partial or whole copies of mitochondrial DNAs are found in nuclear genomes. Such mtDNA-like sequences are called NUMTs (Nuclear MiTochondrial sequences) and are integrated into double-strand break sites of nuclear genomes via non-homologous end joining repair. Several computational studies have investigated NUMTs; however, those studies have not used appropriate methodology for sensitive detection of NUMTs and precise delineation of their boundaries. We developed a carefully considered protocol to redefine NUMT datasets of four mammalian species (human, rhesus, mouse, and rat). The issues we considered include appropriate alignment parameters, correct handling of circular mtDNA, masking of low complexity sequences, post-insertion duplication of NUMTs, long indels and validation of E-value thresholds. By analyzing the redefined datasets, we found new characteristics of NUMT integration sites. Most of the inferred insertion points of NUMTs in all organisms tested occur in the vicinity of retrotransposons (82.9-90.4%), and the insertion sites show a significant level of over-representation of A+T oligomers (p<0.0001). In addition to such genomic contexts, chromatin structures also influenced NUMT insertion. We found that NUMT insertion sites show a strong tendency to have high predicted DNA curvature, and often occur in experimentally defined nucleosome-depleted regions. In light of these results, we conclude that mtDNA insertion events are influenced by the observed chromatin structures and genomic contexts.

LBR11 Monday, July 16: 11:45 a.m. - 12:10 p.m.
Computing with Chromatin Modification
Room: 202A

Author(s):
Barbara Bryant, Constellation Pharmaceuticals, United States
Greg Tucker-Kellogg, National University of Singapore, Singapore

Session Chair: Chad Myers
Abstract
In living cells, DNA is wrapped around histone octamers to make the nucleosomes that comprise chromatin. The histones and DNA can be modified with chemical groups that are added, removed and recognized by multi-functional molecular complexes. Here we present a computational model, in which chromatin modifications are information units that can be written onto a one-dimensional chromatin memory. Chromatin-modifying complexes are modeled as read-write rules that operate on several adjacent nucleosomes. We illustrate the use of this “chromatin computer” by writing programs to solve problems that cannot be solved with finite state automata or logic circuits. We show the execution of these programs on a chromatin computer simulator, and provide animated snapshots of the intermediate states of the nucleosome memory. We model additional features of biological chromatin, resulting in more efficient computation. This formalism is useful both analytically, to model chromatin biology, and theoretically, as a programming paradigm.
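
The read-write rule formalism can be illustrated with a toy simulator. This is only a sketch under stated assumptions: the mark alphabet (`M` for modified, `U` for unmodified, `B` for a boundary), the rule format, and the leftmost-first application order are all hypothetical, and the single "spreading" rule shown is not one of the authors' programs.

```python
def step(memory, rules):
    """Apply the first matching rule at the leftmost matching position.
    Returns True if a rule fired."""
    for i in range(len(memory)):
        for pattern, replacement in rules:
            if tuple(memory[i:i + len(pattern)]) == pattern:
                memory[i:i + len(pattern)] = replacement
                return True
    return False

def run(memory, rules, max_steps=1000):
    """Run rules until none applies; return every intermediate memory state."""
    memory = list(memory)
    trace = ["".join(memory)]
    for _ in range(max_steps):
        if not step(memory, rules):
            break
        trace.append("".join(memory))
    return trace

# Hypothetical rule: a modified nucleosome writes its mark onto an
# unmodified right-hand neighbour; the boundary mark "B" is never rewritten.
rules = [(("M", "U"), ("M", "M"))]
trace = run("MUUBU", rules)
for state in trace:
    print(state)
# MUUBU
# MMUBU
# MMMBU
```

Each line of the trace is one snapshot of the nucleosome memory, which is the kind of intermediate state the abstract describes animating.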

LBR12 Monday, July 16: 12:15 p.m. - 12:40 p.m.
Transcription factor target gene identification based on ChIP-seq data
Room: 202A

Author(s):
Andreas Beyer, TU Dresden, Germany
Weronika Sikora-Wohlfeld, TU Dresden, Germany
Marit Ackermann, TU Dresden, Germany
Eleni Christodoulou, TU Dresden, Germany

Session Chair: Chad Myers
Abstract
Chromatin immunoprecipitation coupled with deep sequencing (ChIP-seq) has been instrumental for elucidating transcriptional networks by measuring the genome-wide binding of proteins at high resolution. Despite the precision of these experiments it is not trivial to identify the genes that are regulated through the observed bindings. A lot of recent research has been devoted to the correct identification of binding sites, but very little to predicting target genes. Here we present a comprehensive evaluation of computational methods used to define target genes of transcription factors (TFs) based on ChIP-seq data. In order to systematically analyze target gene prediction we structured the process into three steps and we evaluated alternatives for each of these steps. Using 66 ChIP-seq and 23 expression datasets we could show that parameter-free methods (not requiring any tunable parameters) better adapt to the specificities of a particular ChIP-seq dataset. Our analysis revealed a potential bias when comparing ChIP-seq and perturbation expression data sets due to unregulated genes. We show that target genes with the highest TF association scores tend to respond later than medium scoring targets, which partly explains the poor overlap typically observed between ChIP-seq and expression data. Finally, we investigated the clustering of TF target genes in the genome, revealing 95 regions with highly significant enrichment of targets of 42 different factors.

LBR13 Monday, July 16: 2:30 p.m. - 2:55 p.m.
Fast and accurate metagenomic profiling of microbial community composition using unique clade-specific marker genes
Room: 202A

Author(s):
Nicola Segata, Harvard School of Public Health, United States
Levi Waldron, Harvard School of Public Health, United States
Annalisa Ballarini, University of Trento, Italy
Vagheesh Narasimhan, Harvard School of Public Health, United States
Olivier Jousson, University of Trento, Italy
Curtis Huttenhower, Harvard School of Public Health, United States

Session Chair: Predrag Radivojac
Abstract
Identifying which organisms populate a microbial community and in what proportions is crucial for characterizing human-associated microbiomes. Shotgun sequencing allows biological function and phylogenetic composition to be assayed simultaneously, but existing taxonomic profiling methods are impractical for the scope of current datasets. We propose MetaPhlAn, a novel approach incorporating clade-specific marker genes identified computationally using 2,887 reference genomes. The resulting catalog of 400,000 genes permits unambiguous taxonomic assignments from metagenomic data more accurately and >50 times faster than current approaches. The method was evaluated on terabases of short reads in addition to ten synthetic metagenomes, achieving correlations with true organismal relative abundances over 0.99 for high-complexity and log-normally distributed communities. Applied to the 691 metagenomes of the Human Microbiome Project, MetaPhlAn profiled the microbial species populating all 15 assayed body sites together with their abundance pattern signatures. Specifically, on 51 vaginal microbiomes, MetaPhlAn agreed closely with 16S-based results and further identified the Lactobacillus species forming five distinct microbiome types. An analysis of marine ecosystems confirmed detection of archaeal organisms and MetaPhlAn's applicability and accuracy even in communities with limited numbers of sequenced reference genomes. Finally, MetaPhlAn allowed us to perform a meta-analysis integrating 263 samples from the HMP and MetaHIT projects, providing the largest metagenomic community profiling to date of the human gut microbiota. This dataset highlights a range of dominant Bacteroides species among these American and European cohorts, and it suggests complexity at the species level beyond that captured by the recently proposed gut enterotypes.
MetaPhlAn is available at http://huttenhower.sph.harvard.edu/metaphlan.
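
The marker-gene idea can be sketched as follows. This is a simplified illustration and not MetaPhlAn's actual implementation: it assumes reads have already been mapped to markers, and it estimates each clade's abundance as length-normalized marker coverage. The function name, marker names, and counts are hypothetical.

```python
def relative_abundance(read_counts, marker_lengths, marker_clade):
    """Estimate clade relative abundances from reads hitting clade-specific
    markers. Coverage per clade = reads on its markers / total marker length."""
    reads, length = {}, {}
    for marker, clade in marker_clade.items():
        reads[clade] = reads.get(clade, 0) + read_counts.get(marker, 0)
        length[clade] = length.get(clade, 0) + marker_lengths[marker]
    coverage = {c: reads[c] / length[c] for c in reads}
    total = sum(coverage.values())
    return {c: (cov / total if total > 0 else 0.0) for c, cov in coverage.items()}

# Toy input (hypothetical): three markers assigned to two clades.
marker_clade = {"m1": "Bacteroides", "m2": "Bacteroides", "m3": "Lactobacillus"}
marker_lengths = {"m1": 1000, "m2": 1000, "m3": 500}
read_counts = {"m1": 30, "m2": 30, "m3": 5}
profile = relative_abundance(read_counts, marker_lengths, marker_clade)
print(profile)
```

Because each marker belongs to exactly one clade, a read that hits a marker can be assigned without ambiguity, which is what makes a simple count-and-normalize step possible.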

LBR14 Monday, July 16: 3:00 p.m. - 3:25 p.m.
Optimizing functional genomics screening strategies for drug target prediction
Room: 202A

Author(s):
Raamesh Deshpande, University of Minnesota, United States

Session Chair: Predrag Radivojac
Abstract
Developing new drugs is a lengthy and expensive process. At the same time, many compounds have been identified from natural sources, but their activity on living cells has not been characterized. Recent studies have proven the utility of chemical genomics based on yeast functional genomics tools for the discovery of compounds’ modes of action. Specifically, the chemical genetic interactions of a particular compound across a large nonessential deletion strain collection should mimic the genetic interactions of the corresponding target. One limitation of this approach, however, is that it requires a relatively high volume of compound given the size of the deletion collection to be queried. As a solution to this problem, we propose a method to identify a small subset of the deletion collection that is the most informative in discovering compounds’ modes of action. We have applied this method in the context of yeast and identified a diagnostic strain set comprising around 5% of the non-essential deletion mutant collection. We show that even with a small fraction of the genome, this diagnostic set performs comparably to complete chemical-genetic profiles. We also demonstrate that our method provides substantial improvement over baseline strategies based on selection of either random genes or hubs. Large-scale chemical genomic screens of natural compound libraries based on this diagnostic set of genes are currently in progress.
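
One way to picture the selection of a small informative strain subset is a greedy search. The abstract does not describe the selection procedure, so the criterion below (pick the strain that separates the most still-indistinguishable target pairs) is an assumption for illustration, as are the binary interaction calls and all names.

```python
from itertools import combinations

def select_strains(profiles, k):
    """Greedily pick up to k strains that distinguish the most target pairs.
    profiles maps each target to a dict of strain -> 0/1 interaction call."""
    targets = sorted(profiles)
    strains = sorted(next(iter(profiles.values())))
    unresolved = set(combinations(targets, 2))
    chosen = []
    while unresolved and len(chosen) < k:
        candidates = [s for s in strains if s not in chosen]
        if not candidates:
            break

        def gain(s):
            # Number of unresolved target pairs that strain s would separate.
            return sum(profiles[a][s] != profiles[b][s] for a, b in unresolved)

        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # no remaining strain adds information
        chosen.append(best)
        unresolved = {(a, b) for a, b in unresolved
                      if profiles[a][best] == profiles[b][best]}
    return chosen

# Toy genetic interaction profiles (hypothetical) for three targets.
profiles = {
    "T1": {"s1": 1, "s2": 0, "s3": 0},
    "T2": {"s1": 1, "s2": 1, "s3": 0},
    "T3": {"s1": 0, "s2": 1, "s3": 0},
}
diagnostic = select_strains(profiles, k=3)
print(diagnostic)  # ['s1', 's2']
```

Strain `s3` is never selected because it carries no information about any target, which mirrors the motivation for screening only a diagnostic subset.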

LBR15 Monday, July 16: 3:30 p.m. - 3:55 p.m.
Structure-Based Ligand Discovery for Solute Carrier Transporters
Room: 202A

Author(s):
Avner Schlessinger, University of California, San Francisco, United States

Session Chair: Predrag Radivojac
Abstract
Polypharmacology is a phenomenon in which a drug binds multiple targets, rather than a single one, with significant affinity. The effect of polypharmacology on therapy can be positive (effective therapy) and/or negative (side effects). Solute Carrier (SLC) Transporters are membrane proteins that control the uptake and efflux of various solutes such as amino acids, sugars, and drugs. SLCs can be drug targets themselves or be responsible for absorption, targeting, and disposition of drugs. We describe an integrated structure-based approach for identifying protein-small molecule interactions. In particular, we use comparative modeling, virtual screening, and experimental validation (with kinetic measurements of uptake) to identify interactions between SLC transporters and small-molecule ligands, including prescription drugs and metabolites. For example, we discovered that several existing prescription drugs interact with the norepinephrine transporter, NET, which may explain some of the pharmacological effects (i.e., efficacy and/or side effects) of these drugs. We also apply our approach to related transporters, to identify rules for substrate specificity in a key membrane transporter family of the nervous system. Our systems pharmacology approach is generally applicable to structural characterization of protein families other than SLCs, including receptors, ion channels, and enzymes, as well as their interactions with small-molecule ligands.

LBR16 Monday, July 16: 4:00 p.m. - 4:25 p.m.
Data-driven Prediction of Drug Effects and Interactions
Room: 202A

Author(s):
Nicholas Tatonetti, Stanford University, United States

Session Chair: Predrag Radivojac
Abstract
Adverse drug events remain a leading cause of morbidity and mortality around the world and many are not detected during clinical trials. Fortunately, regulatory agencies and other institutions maintain large collections of adverse event reports, and these databases present an opportunity to study drug effects from patient population data. However, confounding factors such as concomitant medications, patient demographics, and reasons for prescribing a drug often are uncharacterized in spontaneous reporting systems (for example, patient medical histories), and these omissions can limit the use of quantitative signal detection methods used in the analysis of such data. Here, we present an adaptive data-driven approach for correcting these factors in cases for which the covariates are unknown or unmeasured and combine this approach with existing methods to improve analyses of drug effects using three test data sets. We also present a comprehensive database of drug effects (OFFSIDES) and a database of drug-drug interaction side effects (TWOSIDES). To demonstrate the biological use of these new resources, we used them to identify drug targets, predict drug indications, and discover drug class interactions. We then corroborated 47 (P < 0.0001) of the drug class interactions using an independent analysis of electronic medical records. Patients taking combined treatment of selective serotonin reuptake inhibitors and thiazides had a significantly increased incidence of prolonged QT. We conclude that confounding effects from covariates in observational clinical data can be controlled in data analyses and thus improve the detection and prediction of adverse drug effects and interactions.

LBR17 Tuesday, July 17: 10:45 a.m. - 11:10 a.m.
MalaCards – the integrated Human Malady Compendium
Room: 202B/C

Author(s):
Marilyn Safran, Weizmann Institute of Science, Israel

Session Chair: Yinyin Yuan
Abstract
We introduce MalaCards, an integrated database of human maladies and their annotations (malacards.weizmann.ac.il), modeled on the architecture and richness of the popular GeneCards human genes database (www.genecards.org). MalaCards mines varied sources to generate a ‘card’ for each disease via: 1. Identifying sources of nomenclature/annotation, targets for disease data mining; 2. Developing algorithms for merging heterogeneous disease names, and defining unique identifiers. For example, alzheimer’s disease, ad, dementia alzheimer’s type, are merged under Alzheimer Disease, acronym AD, ID=ALZ001, with others listed as aliases (see malacards.weizmann.ac.il/card/index/ALZ001); 3. Engineering scripts to mine annotations; 4. Building MalaCards V1.01(alpha), with thousands of user-friendly ‘cards’ for all incorporated maladies, containing a variety of sections; 5. Implementing a strategy whereby detailed gene-disease relationships within GeneCards are used to create disease-specific content, leveraging the GeneCards relational database and search engine; 6. Constructing a second-tier annotator, based on GeneDecks Set Distiller, a GeneCards suite member. For example, diseases related to the key disease are computed to be those maximally associated with the set of found genes. Similarly, we obtain drugs/compounds, publications and mouse phenotypes contextually related to the disease; 7. Formulating scores for prioritizing derived annotations; 8. Initiating QA based on extensive knowledge within the Crown Human Genome Center. As our R&D continues, we plan to expand the list of annotation sources and sections, and include genetic variation details. This will be enhanced by collaborations with researchers outside of our group, and expanded by the initiation of systems biology tools, towards the goal of enabling novel biomedical discoveries.

LBR18 Tuesday, July 17: 11:15 a.m. - 11:40 a.m.
Simultaneous host-parasite transcriptomes provide insight into malarial host-parasite interactomes
Room: 202B/C

Author(s):
Adam Reid, Wellcome Trust Sanger Institute, United Kingdom
Matthew Berriman, Wellcome Trust Sanger Institute, United Kingdom

Session Chair: Yinyin Yuan
Abstract
Molecular interactions are key to the ability of a parasite to enter and persist in its host. However, our understanding of the genes and proteins involved in these interactions is no more than partial in even the best-understood systems. We have applied the popular concept of using correlated gene expression profiles to identify molecular interactions in one species to the interspecific (host-parasite) case. We show for the first time that genes in different species with correlated expression are more likely to encode proteins that interact or are otherwise involved in host-parasite interaction. We go on to examine predicted host-parasite interactions between the malaria parasite and both its mammalian host and insect vector.
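
As an illustration only, the idea of scoring cross-species gene pairs by correlated expression can be sketched as below. This is not the authors' pipeline: the gene names, the toy expression values, the use of Pearson correlation and the 0.9 threshold are all assumptions made for the example. It assumes host and parasite expression were measured in the same samples, in the same order.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)

def candidate_pairs(host, parasite, threshold=0.9):
    """Return (host_gene, parasite_gene, r) with |r| >= threshold, strongest first."""
    pairs = []
    for host_gene, host_profile in host.items():
        for parasite_gene, parasite_profile in parasite.items():
            r = pearson(host_profile, parasite_profile)
            if abs(r) >= threshold:
                pairs.append((host_gene, parasite_gene, r))
    return sorted(pairs, key=lambda item: -abs(item[2]))

# Hypothetical toy data: five shared samples, made-up gene names and values.
host = {
    "HOST_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "HOST_B": [5.0, 1.0, 4.0, 2.0, 3.0],
}
parasite = {
    "PARA_X": [2.1, 3.9, 6.2, 8.0, 9.8],
    "PARA_Y": [3.0, 3.1, 2.9, 3.0, 3.2],
}

for host_gene, parasite_gene, r in candidate_pairs(host, parasite):
    print(host_gene, parasite_gene, round(r, 3))
```

A high correlation in such a sketch only nominates a pair for follow-up; it does not by itself establish a physical interaction.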

Presentation PDF: Download Abstract

LBR19 Tuesday, July 17: 11:45 a.m. - 12:10 p.m.
A Predictive Gene Expression Model for quantifying Plasmodium falciparum red blood cell stages
Room: 202B/C

Author(s):
Vagheesh Narasimhan, Harvard University, United States
Regina Joice, Harvard University, United States
Curtis Huttenhower, Harvard University, United States
Matthias Marti, Harvard University, United States
Jacqui Montgomery, Malawi-Liverpool-Wellcome Trust, United Kingdom
Karl Seydel, Michigan State University, United States
Daouda Ndiaye, University of Cheikh Anta Diop, Senegal
Johanna Daily, Albert Einstein College of Medicine, United States
Kim Williamson, Loyola University, Chicago, United States
Terrie Taylor, Michigan State University, United States
Danny Milner, Harvard University, United States

Session Chair: Yinyin Yuan
Abstract
P. falciparum, the parasitic causative agent of malaria, undergoes a complex staged life cycle during its infection of human hosts. The transcriptional expression program of this cycle has been well-modeled, but not that of the small minority of these stages that are transmissible among hosts and thus offer a potential target for preventative interventions. We have thus developed a quantitative model for determining the proportions of transmissible morphological stages of P. falciparum in a mixed population based on transcript levels. Our model consists of a constrained linear regression, in which each transcript's total measured expression level is the sum of parasites' contributions from each life cycle stage. The model was trained and initially cross-validated using a set of five published in vitro microarray time courses in which stage distributions were determined by light and fluorescence microscopy. To apply this method in vivo, we selected the minimum number of markers needed to quantify the stage distribution using a combination of model selection, stage-specificity, and qRT-PCR primer design. We then assessed the model on microarray and qRT-PCR expression measurements from blood samples of 40 malaria patients from Valingara, Senegal, revealing that only a small subset of patients carry transmissible parasite stages. In addition to the model's ability to capture enriched biomolecular processes within transmissible malaria stages, we believe the field-applicable qRT-PCR assay may be a useful tool for future control of malaria transmission through stage-specific targeted interventions.
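
The constrained linear regression described above can be illustrated with a small sketch. It assumes that the measured level of each transcript is a mixture of known per-stage signature levels, weighted by stage proportions that are non-negative and sum to one. The signature matrix, the proportions and the projected-gradient solver are all assumptions chosen for the example; the abstract does not state which constraints or solver the authors used.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    ordered = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        shift = (cumulative - 1.0) / k
        if value - shift > 0:
            theta = shift  # keep the shift from the last index that stays positive
    return [max(x - theta, 0.0) for x in v]

def estimate_stage_proportions(signatures, mixture, steps=2000):
    """Least-squares fit of stage proportions by projected gradient descent.

    signatures[g][s]: expression of transcript g in a pure population of stage s.
    mixture[g]: measured expression of transcript g in the mixed sample.
    """
    n_genes, n_stages = len(signatures), len(signatures[0])
    # The squared Frobenius norm bounds the gradient's Lipschitz constant,
    # so 1 / that value is a safe (if conservative) step size.
    step = 1.0 / sum(x * x for row in signatures for x in row)
    p = [1.0 / n_stages] * n_stages
    for _ in range(steps):
        residual = [
            sum(row[s] * p[s] for s in range(n_stages)) - observed
            for row, observed in zip(signatures, mixture)
        ]
        gradient = [
            sum(signatures[g][s] * residual[g] for g in range(n_genes))
            for s in range(n_stages)
        ]
        p = project_to_simplex([p[s] - step * gradient[s] for s in range(n_stages)])
    return p

# Hypothetical signatures: 4 marker transcripts (rows) by 3 stages (columns).
signatures = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 2.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_proportions = [0.7, 0.25, 0.05]  # made-up mixture used to simulate a sample
mixture = [sum(row[s] * true_proportions[s] for s in range(3)) for row in signatures]

print([round(x, 4) for x in estimate_stage_proportions(signatures, mixture)])
```

On noise-free simulated data like this the estimate recovers the simulated proportions; with real measurements the fit would only be approximate.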

Presentation PDF: Download Abstract

LBR20 Tuesday, July 17: 12:15 p.m. - 12:40 p.m.
Achieving better agreement among microarray disease studies through automatic correction for latent variables
Room: 202B/C

Author(s):
Maria Chikina, Mount Sinai School of Medicine, United States
Stuart Sealfon, Mount Sinai School of Medicine, United States

Session Chair: Yinyin Yuan
Abstract
Microarray studies with human subjects often have limited sample sizes, hampering the ability of differential expression analysis to make trustworthy predictions of biomarkers associated with disease. Existing techniques for meta-analysis address this problem by aggregating the results of multiple datasets to gain statistical power, but the performance of this kind of approach is limited by the fact that human gene expression is influenced by many non-random factors, such as genetics, sample preparation and tissue heterogeneity, that may contribute to the lack of inter-study agreement.

We show that it is in fact possible to carry out an automatic correction of individual datasets to reduce the effect of such ‘latent variables’ (without prior knowledge of the variables), so that datasets addressing the same condition show better agreement once each is corrected, allowing for more trustworthy aggregated predictions. We demonstrate our approach, which involves a crucial modification of the method of "surrogate variable analysis", on studies of multiple sclerosis. We find improved agreement across varying study designs, platforms, and tissues, and are able to make a number of novel predictions. Our analysis implicates several metabolic pathways, contributing to the emerging understanding of metabolic involvement in MS pathology.
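
The abstract does not describe the authors' modification, so the sketch below only illustrates the general residual-based idea that surrogate variable analysis builds on, reduced to a single factor. It removes the known variable (here, case/control group means), estimates the dominant remaining pattern across samples, and subtracts that pattern from each gene. All function names and data are hypothetical, and a full surrogate variable analysis involves further steps (choosing the number of factors, weighting genes) that are omitted here.

```python
from math import sqrt

def remove_group_means(matrix, groups):
    """Residuals after subtracting each gene's per-group mean (the known variable)."""
    residuals = []
    for row in matrix:
        means = {}
        for label in set(groups):
            values = [x for x, g in zip(row, groups) if g == label]
            means[label] = sum(values) / len(values)
        residuals.append([x - means[g] for x, g in zip(row, groups)])
    return residuals

def _unit(v):
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0] * len(v)

def leading_sample_pattern(residuals, iterations=50):
    """Unit-length dominant pattern across samples, found by power iteration."""
    # Start from the largest residual row so the start is not orthogonal to the signal.
    v = _unit(max(residuals, key=lambda r: sum(x * x for x in r)))
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in residuals]
        v = _unit([
            sum(score * row[j] for score, row in zip(scores, residuals))
            for j in range(len(v))
        ])
    return v

def correct_latent_variable(matrix, groups):
    """Subtract each gene's projection onto the estimated latent pattern."""
    residuals = remove_group_means(matrix, groups)
    pattern = leading_sample_pattern(residuals)
    corrected = []
    for row, res in zip(matrix, residuals):
        loading = sum(a * b for a, b in zip(res, pattern))
        corrected.append([x - loading * p for x, p in zip(row, pattern)])
    return corrected

# Hypothetical data: 3 genes, 6 samples, plus an unrecorded batch effect.
groups = [0, 0, 0, 1, 1, 1]            # known variable: control vs. case
batch = [1, -1, 1, -1, 1, -1]          # latent variable, unknown to the analysis
genes = [(5.0, 2.0, 1.5), (3.0, 0.0, -0.8), (7.0, -1.0, 0.4)]  # base, effect, batch loading
matrix = [
    [base + effect * g + loading * b for g, b in zip(groups, batch)]
    for base, effect, loading in genes
]

for row in correct_latent_variable(matrix, groups):
    print([round(x, 3) for x in row])
```

In this noise-free toy case the within-group variation caused by the batch pattern disappears after correction. Any part of a latent variable that is confounded with the group labels cannot be separated this way.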

Presentation PDF: Download Abstract

LBR21 Tuesday, July 17: 2:30 p.m. - 2:55 p.m.
Matrix geometry determines optimal cancer migration strategy and modulates response to therapeutic agents
Room: 202B/C

Author(s):
Melda Tozluoglu, Cancer Research UK, United Kingdom

Session Chair: Christina Curtis
Abstract
Cell motility is required for many biological processes, including cancer metastasis. The molecular requirements for migration and the morphology of migrating cells can vary considerably depending on matrix geometry; therefore, predicting the optimal migration strategy or the effect of experimental perturbation is difficult. Here, we present a model of single cell motility that encompasses actin polymerisation-based protrusions, cell cortex asymmetry, membrane blebbing, local heterogeneity, cell-extracellular matrix adhesion, and varying extracellular matrix geometries. This is used to explore the theoretical requirements for rapid migration in different matrix geometries. Confined matrix geometries cause profound changes in the relationship of adhesion and contractility to cell velocity; indeed, cell-matrix adhesion is dispensable for migration in discontinuous confined environments. The utility of the model is shown by predicting the effect of different drugs and integrin depletion in vivo based only on simple in vitro measurements. Multiphoton intravital imaging of melanoma is used to verify bleb-driven migration of both melanoma and endothelial cells at tumour margins, and the predicted response to drugs.

Presentation PDF: Download Abstract

LBR22 Tuesday, July 17: 3:00 p.m. - 3:25 p.m.
Single Sample Expression-Anchored Mechanisms Predict Survival in Head and Neck Cancer
Room: 202B/C

Author(s):
Yves Lussier, University of Illinois at Chicago, United States

Session Chair: Christina Curtis
Abstract
Gene expression signatures that are predictive of therapeutic response or prognosis are increasingly useful in clinical care; however, mechanistic interpretation of expression arrays remains challenging. Additionally, there is surprisingly little gene overlap among distinct clinically validated signatures. These “causality challenges” hinder the adoption of signatures as compared to functionally well-characterized single gene biomarkers. To increase the utility of multi-gene signatures in survival studies, we developed a novel approach to generate “personal mechanism signatures” of molecular pathways and functions from gene expression arrays. FAIME, the Functional Analysis of Individual Microarray Expression, computes mechanism scores using rank-weighted gene expression of an individual sample. Comparing head and neck squamous cell carcinoma (HNSCC) samples with non-tumor controls, the precision and recall of deregulated FAIME-derived mechanisms of pathways and molecular functions are comparable to those produced by conventional cohort-wide methods (e.g. GSEA). The overlap of “Oncogenic FAIME Features of HNSCC” among three HNSCC datasets is more significant than the gene overlap. These Oncogenic FAIME Features of HNSCC accurately discriminated tumors from control tissues and stratified recurrence-free survival in patients from two independent studies. Previous approaches that depend on group assignment of individual samples before learning a classifier are limited by design to discrete-class prediction. In contrast, FAIME calculates mechanism profiles for individual patients without requiring group assignment in validation sets. FAIME is more amenable to clinical deployment since it translates the gene-level measurements of each given sample into pathway and molecular function profiles that can be applied to analyze continuous phenotypes in clinical outcome studies.
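
To make the notion of a single-sample, rank-weighted mechanism score concrete, here is a minimal sketch. It is not the published FAIME formula: the exponential rank weight and the "mean inside the gene set minus mean outside" contrast are assumptions chosen for illustration, and the gene and pathway names are made up. The point it shows is that the score needs only one sample's expression values, with no cohort or group labels.

```python
from math import exp

def rank_weights(expression):
    """Weight each gene by its within-sample expression rank (rank 1 = highest)."""
    n = len(expression)
    ordered = sorted(expression, key=expression.get, reverse=True)
    # Assumed weighting: decays exponentially as the rank gets worse.
    return {gene: exp(-rank / n) for rank, gene in enumerate(ordered, start=1)}

def mechanism_score(expression, gene_set):
    """Mean weight of genes in the set minus mean weight of all other genes."""
    weights = rank_weights(expression)
    inside = [w for gene, w in weights.items() if gene in gene_set]
    outside = [w for gene, w in weights.items() if gene not in gene_set]
    if not inside or not outside:
        raise ValueError("gene set must overlap the sample without covering it entirely")
    return sum(inside) / len(inside) - sum(outside) / len(outside)

# Hypothetical single sample and pathway definitions.
sample = {"G1": 9.0, "G2": 7.5, "G3": 2.0, "G4": 1.0, "G5": 0.5, "G6": 6.0}
pathways = {"PATH_UP": {"G1", "G2"}, "PATH_DOWN": {"G4", "G5"}}

for name, members in pathways.items():
    print(name, round(mechanism_score(sample, members), 3))
```

Because each sample yields its own vector of pathway scores, those scores could then be related to a continuous outcome such as survival time without first assigning samples to classes.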

Presentation PDF: Download Abstract

LBR23 Tuesday, July 17: 3:30 p.m. - 3:55 p.m.
Exploring the subclonal architecture of breast cancer
Room: 202B/C

Author(s):
David Wedge, Wellcome Trust Sanger Institute, United Kingdom

Session Chair: Christina Curtis
Abstract
Although the existence of substantial genetic heterogeneity within a tumour is now widely accepted, fundamental questions remain about the dynamics of Darwinian evolution in cancer. Our work aims to answer some of these questions using a variety of bioinformatic algorithms to characterise the subclonal architecture of 21 breast cancers from their whole-genome sequences.

We gain substantial statistical power to discriminate copy number aberrations (CNAs) present in a small fraction of tumour cells through the application of haplotype phasing. Further, by combining novel segmentation algorithms, including a Hierarchical Dirichlet Process - Hidden Markov Model, with constraints that reflect the known structure of the sequence data, we are able to detect CNAs present in less than 5% of the sampled cells.

We model the patterns of clonal and subclonal single nucleotide mutations using a Bayesian Dirichlet process, which simultaneously identifies the number of subclones, the fraction of tumour cells within each subclone and the mutation burden within each subclone. Using novel methods to phase mutations relative to each other and to heterozygous SNP loci, this information is used to discern the phylogenetic relationships between the subclones.

Applying our methods to 21 breast cancers reveals a complex subclonal landscape, reflecting the variety of previous genomic aberrations and clonal expansions that have shaped the tumours. In particular, our results show that every tumour harbours a dominant subclone, whose expansion may represent the final rate-limiting step in carcinogenesis.

Presentation PDF: Download Abstract

LBR24 Tuesday, July 17: 4:00 p.m. - 4:25 p.m.
The Landscape of Somatic Structural Variations in Human Cancer Genomes
Room: 202B/C

Author(s):
Lixing Yang, Harvard University, United States
Peter Park, Harvard, United States

Session Chair: Christina Curtis
Abstract
The cancer genome is known to harbor various somatic rearrangements. However, the full spectrum of these alterations and their underlying mechanisms remain poorly understood. Here, we performed a comprehensive identification of somatic Structural Variations (SVs) and the mechanisms generating them, using high-coverage whole-genome sequencing data of tumor and matched normal samples from 48 individuals across five tumor types (glioblastoma, ovarian, colon, prostate and multiple myeloma). By analyzing a total of 160 billion Illumina short reads, we identified 4555 somatic SVs with a true positive rate of 91%. The patterns of rearrangements are highly variable across tumor types and among individuals, with translocations (46%) being the most abundant, followed by deletions (36%) and tandem duplications (18%). Our detailed reconstruction of the events responsible for CDKN2 loss and for EGFR and CDK4 gain in glioblastoma revealed much more complex sets of events than previously assumed, sometimes involving dozens of fragments. Our analysis of the breakpoints at base pair resolution shows that focal CDKN2 loss is often generated by non-homologous end joining but can also be generated by microhomology-mediated end joining or template switching mechanisms. Focal amplifications are sometimes generated by complex tandem duplications via a template switching mechanism. This study provides new insights into cancer genome rearrangements and their contribution to cancer progression.

Presentation PDF: Download Abstract
