20th Annual International Conference on
Intelligent Systems for Molecular Biology

Oral Presentation Schedule

All Morning OPs
Session Chair: Janita Thusberg, Buck Institute, United States
All Afternoon OPs
Session Chair: Yana Bromberg, Rutgers, The State University of New Jersey, United States


OPT01 - Comparing Diverse Pluripotent Stem Cell Fate Programs Using Cell-Type-Specific Data Integration and Machine Learning Methods
Date: Sunday, July 15: 10:45 a.m. - 11:10 a.m.
Scientific Area: Systems Biology and Networks
Room: 104C

Presenting author: Karen Dowell, The Jackson Laboratory, United States

Additional authors:
Allen Simons, The Jackson Laboratory, United States
Zack Wang, Maine Medical Center Research Institute, United States
Matthew Hibbs, The Jackson Laboratory, United States

Presentation Overview:
Machine learning techniques that apply Bayesian networks for genomic data integration were originally developed to predict functional linkages among proteins in yeast and have since been successfully applied to explore functional gene relationships in mouse and human systems. These types of predictive functional networks enable researchers to explore proteins likely to be related based on data collected from scores of studies, under hundreds of conditions, applying many different high-throughput experimental techniques. However, little work to date has considered the importance of cellular context for data integration in mammalian systems, in which the same protein may perform very different functions in different cell types. We developed a Bayesian network machine learning methodology designed to generate predictive functional networks using cell-type-specific, high-throughput mammalian data. For this study, we assembled separate compendiums of mouse and human pluripotent stem cell data that we integrated into cell-type-specific functional relationship networks focused on biological processes known to be active in those cell types: self-renewal and cell fate determination. We have compared networks generated for different pluripotent stem cell types in different species, analyzed and compared the biological content of these predictive networks, and selected top predictions for experimental validation. These pluripotent stem cell networks will be publicly available at StemSight.org, our portal for high-throughput stem cell data analysis. Our results demonstrate that Bayesian network integration of high-throughput data restricted to a single cell type can significantly enhance the predictive clarity of mammalian functional relationship networks.


OPT02 - The effect of ribosome collisions and queuing on gene expression
Date: Sunday, July 15: 10:45 a.m. - 11:10 a.m.
Scientific Area: Sequence analysis
Room: 104C

Presenting author: Marlena Siwiak, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Poland

Additional authors:
Piotr Zielenkiewicz, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Poland

Presentation Overview:
The movement of translating ribosomes on a transcript allows collisions and the formation of ribosome traffic jams, leading to perturbed and improper synthesis of the protein product. Earlier reports revealed several mechanisms that evolved to prevent this phenomenon; however, no quantitative measure of ribosome queuing has been defined so far.

We introduced two theoretical measures of mRNA susceptibility to ribosome collisions: dE, the total time wasted by the second translating ribosome due to collisions with the first, and Z, the total number of collisions between two translating ribosomes. Using these measures, we examined the role of ribosome collisions in shaping gene expression levels and analyzed transcript-specific features responsible for collision-free ribosome movement. We discovered that the measures correlate negatively with expression levels over distinct sets of genes, and that the strength of this relationship can reach the level of the correlations observed for GC content, mRNA secondary structure and codon usage. Additionally, although codons translated with slower velocity tend to cause collisions more often, the final effect depends on their position in the coding sequence. We also confirm that transcripts more resistant to ribosome collisions are able to persist longer in a cell.
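
As a rough illustration of how two such measures could be computed, the sketch below follows two ribosomes along a toy transcript. It is not the authors' model: the deterministic per-codon decoding times, the initiation delay of the second ribosome, and the `footprint` spacing are all assumptions made for this example, and the function name `collision_measures` is hypothetical.

```python
from itertools import accumulate


def collision_measures(codon_times, delay, footprint=1):
    """Return (dE, Z) for a second ribosome trailing a first one.

    Assumptions of this toy model (not taken from the abstract):
    - codon_times[j] is the fixed time needed to decode codon j;
    - the first ribosome starts at time 0, the second at time `delay`;
    - the second ribosome may leave codon j only once the first has
      finished decoding codon j + footprint (or the last codon).
    """
    n = len(codon_times)
    # Time at which the leading ribosome finishes decoding each codon.
    lead_done = list(accumulate(codon_times))

    wasted = 0      # dE: total time the second ribosome spends blocked
    collisions = 0  # Z: number of times the second ribosome is blocked
    now = delay     # arrival time of the second ribosome at codon j
    for j, t in enumerate(codon_times):
        ready = now + t  # second ribosome has decoded codon j
        free_at = lead_done[min(j + footprint, n - 1)]
        if ready < free_at:
            # Blocked by the leading ribosome: count it and wait.
            collisions += 1
            wasted += free_at - ready
            ready = free_at
        now = ready
    return wasted, collisions


# A slow codon in the middle blocks the trailing ribosome once.
print(collision_measures([1, 1, 5, 1, 1], delay=1))
# The same slow codon at the start slows both ribosomes equally.
print(collision_measures([5, 1, 1, 1, 1], delay=1))
```

In this toy setting the same slow codon produces a collision in the middle of the sequence but none at the start, which is consistent with the position dependence described above.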

Based on these results, we state that mRNA susceptibility to queuing is another feature that must be taken into account when considering translation productivity. The presented results may have important implications for research on translational productivity and heterologous expression.


OPT03 - A Metagenomic Study of Diet-Dependent Interaction Between Gut Microflora and Host in Infants
Date: Sunday, July 15: 10:45 a.m. - 11:10 a.m.
Scientific Area: Bioinformatics of Health and Disease
Room: 104C

Presenting author: Iddo Friedberg, Miami University, United States

Additional authors:
Scott Schwartz, Texas A&M University, United States
Iddo Friedberg, Miami University, United States
Laurie Davidson, Texas A&M University, United States
Jennifer Goldsby, Texas A&M University, United States
David Dahl, Texas A&M University, United States
Damir Herman, University of Arkansas for Medical Sciences, United States
Mei Wang, University of Illinois, Urbana, United States
Sharon Donovan, University of Illinois, Urbana, United States
Robert Chapkin, Texas A&M University, United States

Presentation Overview:
Gut flora species and functional composition strongly affect the health and well-being of the host. With the advent of genomic-based personalized medicine, it is important to develop a synthetic approach to study the host transcriptome and the microbiome simultaneously. Early microbial colonization in infants is critically important for directing neonatal intestinal and immune development, and is especially attractive for studying the development of human-commensal interactions. Here we report the results from a study of the gut microbiome and host epithelial transcriptome of three-month-old exclusively breast-fed and formula-fed infants. Both host mRNA expression and the microbiome phylogenetic and functional profiles provided strong feature sets that distinctly classified the formula-fed from the breast-fed infants. To determine the relationship between host epithelial cell gene expression and the bacterial metagenomic-based profiles, the host transcriptome and functionally profiled microbiome data were subjected to novel multivariate statistical analyses. Gut microbiota metagenome virulence characteristics differed strongly between the formula-fed and the breast-fed infants, and immunity- and mucosa-related gene expression in epithelial cells differed as well. Our data provide insight into the integrated responses of the host and microbiome to dietary substrates in the early neonatal period. We demonstrate that differences in diet can affect, via gut colonization, both infant gut development and the innate immune system. Furthermore, the methodology presented in this study can be easily adapted to assess other host-commensal and host-pathogen interactions using genomic and transcriptomic data, providing a synthetic genomics-based picture of host-commensal relationships.


OPT04 - The Transcriptomic Landscape of Learning and Memory Formation
Date: Sunday, July 15: 11:15 a.m. - 11:40 a.m.
Scientific Area: Microarrays
Room: 104C

Presenting author: Lucia Peixoto, University of Pennsylvania, United States

Additional authors:
Mathieu Wimmer, University of Pennsylvania, United States
Shane Poplawski, University of Pennsylvania, United States
Nancy Zhang, University of Pennsylvania, United States
Ted Abel, University of Pennsylvania, United States

Presentation Overview:
Long-term memory reflects the persistent changes in the brain that result from learning. Several studies have shown that long-term memory formation requires transcription and translation, and that this requirement is limited to defined “critical periods”. Genome-wide microarray studies in our lab 30 minutes after contextual fear conditioning show regulation of a substantial number of genes, validating the potential of a genome-wide approach to understand the transcriptional changes that underlie memory formation. In this study, gene expression was examined using microarrays before contextual fear conditioning, during the established critical periods for memory consolidation (30 minutes and 4, 12 and 24 hours after learning) and after memory retrieval in C57BL/6J mice. A similar time course was performed without the learning experience to model the effect of circadian time. The study was randomized, collecting one sample per time point per day, for a total of n=9 mice per time point. Normalization was carried out using Affymetrix Power Tools and statistical analysis was performed using R. Our study shows that the biggest changes in gene expression happen 30 minutes after learning and after retrieval of memory. Up-regulated genes after acquisition and retrieval overlap greatly and are involved in transcriptional control. This was verified by qPCR of known genes induced at 30 minutes. Interestingly, memory consolidation down-regulates chromatin assembly while retrieval down-regulates RNA processing. Almost no transcriptional changes can be detected 4 and 12 hours after learning. In addition, several novel non-coding RNAs induced after memory formation and retrieval have been identified and selected for follow-up studies.


OPT05 - Improving PPI predictions from coAP-MS data through sampling
Date: Sunday, July 15: 11:15 a.m. - 11:40 a.m.
Scientific Area: Interactions
Room: 104C

Presenting author: George Tucker, Massachusetts Institute of Technology, United States

Additional authors:
Po-Ru Loh, Massachusetts Institute of Technology, United States
Bonnie Berger, Massachusetts Institute of Technology, United States

Presentation Overview:
Comprehensive protein-protein interaction (PPI) maps are a powerful resource for uncovering the molecular basis of genetic interactions and providing mechanistic insights. Coaffinity purification combined with tandem mass spectrometry (coAP-MS) has been used to generate PPI maps at proteome scale, but results are confounded by both high false positive and false negative rates.

To address these issues, several methods have been developed to post-process coAP-MS datasets. These generally fall into two classes: spoke and matrix models. Spoke models produce confidence scores on directly observed interaction data, whereas matrix models additionally infer interactions that are not directly observed and hence have broader coverage at the expense of increased false positives.

Recent literature has shown promising results from matrix model methods. However, with few exceptions, these methods only consider binary experimental data (where each possible interaction is deemed either observed or unobserved), throwing away any quantitative information from the experiment such as spectral counts.

We propose a novel approach to incorporating quantitative interaction information into coAP-MS PPI prediction. Our methodology introduces a probabilistic framework that addresses the uncertainty of observed interactions, for example interactions with low spectral counts. Using a sampling-based approach, we model this uncertainty with an ensemble of possible alternative experimental outcomes. Importantly, this procedure allows us to directly harness previous methods without modification, thus extending previous methods to use quantitative information. We demonstrate that our approach improves interaction prediction performance on the recently published Drosophila Protein interaction Map (DPiM), the largest Drosophila coAP-MS dataset to date, which includes nearly 5000 proteins.


OPT06 - Discovery and Interaction Analysis of Alternatively Spliced Isoforms of Autism Candidate Genes
Date: Sunday, July 15: 11:15 a.m. - 11:40 a.m.
Scientific Area: Interactions
Room: 104C

Presenting author: Shuli Kang, University of San Diego, United States

Additional authors:
Xinping Yang, Dana-Farber Cancer Institute, United States
Guan Lin, University of San Diego, United States
Roser Corominas, University of San Diego, United States
Yun Shen, Dana-Farber Cancer Institute, United States
Lila Ghamsari, Dana-Farber Cancer Institute, United States
Shelley Wanamaker, Dana-Farber Cancer Institute, United States
Stanley Tam, Dana-Farber Cancer Institute, United States
Maria Rodriguez, Dana-Farber Cancer Institute, United States
Martin Broly, Dana-Farber Cancer Institute, United States
Jonathan Sebat, University of San Diego, United States
Kourosh Salehi-Ashtiani, Dana-Farber Cancer Institute, United States
David Hill, Dana-Farber Cancer Institute, United States
Marc Vidal, Dana-Farber Cancer Institute, United States
Tong Hao, Dana-Farber Cancer Institute, United States
Lilia Iakoucheva, University of San Diego, United States

Presentation Overview:
Autism is a neurodevelopmental disorder involving a large number of functionally diverse genes. Currently, it is not completely understood how these genes interact with each other and with other human genes. Even less is known about the influence of alternative splicing (AS) on protein-protein interactions. Here, we performed high-throughput splice isoform discovery for 191 autism candidate genes from fetal and adult human brain RNA, and then screened them for interactions with 15,000 human ORFs. We have identified 373 brain-expressed AS isoforms, 226 of which were novel. This increased the isoform space of autism candidate genes by 29%. We then built an autism-centered “splice-actome” consisting of 630 isoform-level interactions. By incorporating isoform interactions into the network, we were able to expand it by 25%. The comparison between fetal and adult isoform networks demonstrated 59% overlap, emphasizing network-level similarities and differences between these two brain tissues. In order to evaluate the probability that isoforms share partners, we have implemented the Fraction of Shared Interactions (FSI) score. This score allowed identification of contrasting isoform groups: those that share the majority of their interaction partners, and those that have unique partners. Furthermore, isoform interactions also influenced topological properties of the network such as its connectivity and modularity. This work clearly demonstrates that the “splice-actome” adds another layer of complexity to the autism network, and may be a necessary step towards a better understanding of other disease networks.
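
The abstract does not define the FSI score, so the following is only a sketch of one plausible reading: the number of interaction partners two isoforms share divided by the number of partners either of them has (a Jaccard-style ratio). The function name `fsi` and the partner identifiers are hypothetical.

```python
def fsi(partners_a, partners_b):
    """Fraction of shared interaction partners between two isoforms.

    Assumed definition (not from the abstract): shared partners divided
    by all partners of either isoform. Returns 0.0 when neither isoform
    has any partner.
    """
    a, b = set(partners_a), set(partners_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Two isoforms sharing two of four distinct partners.
print(fsi({"P1", "P2", "P3"}, {"P2", "P3", "P4"}))
```

Under this reading, values near 1 would mark isoforms that share most of their partners and values near 0 would mark isoforms with unique partners.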


OPT07 - iFad: an integrative factor analysis model for drug-pathway association inference
Date: Sunday, July 15: 11:45 a.m. - 12:10 p.m.
Scientific Area: Bioinformatics of Health and Disease
Room: 104C

Presenting author: Haisu Ma, Yale University, United States

Additional authors:
Hongyu Zhao, Yale University, United States

Presentation Overview:
Pathway-based approaches for drug discovery consider the therapeutic effects of compounds in the global physiological environment. However, for many compounds, the target pathways and mechanism of action are still unknown. In addition, rationally designed drugs may also have unexpected off-target effects. Therefore, the inference of drug-pathway associations is a crucial step to fully realize the potential of system-based pharmacological research. Pathway activities are also reflected in gene expression levels. We developed a new Bayesian sparse factor analysis model to jointly analyze paired gene expression and drug sensitivity datasets measured across the same panel of samples. This model enables direct incorporation of prior knowledge regarding gene-pathway and/or drug-pathway associations to aid the discovery of new associations. A collapsed Gibbs sampling algorithm was implemented for inference. The proposed factor analysis model achieved satisfactory performance on both simulated datasets with various patterns and real datasets from the NCI-60 cell lines. Our study demonstrates that the combination of pathway analysis, gene expression and drug response is a promising approach for the prediction of drug targets. This model also provides a general statistical framework for pathway-based integrative analysis of multiple types of omics data.


OPT08 - HotLink: Identifying causal paths linking genomic perturbations to expression states in cancer.
Date: Sunday, July 15: 11:45 a.m. - 12:10 p.m.
Scientific Area: Systems Biology and Networks
Room: 104C

Presenting author: Evan Paull, University of California, Santa Cruz, United States

Additional authors:
Dan Carlin, University of California, Santa Cruz, United States
Josh Stuart, University of California, Santa Cruz, United States

Presentation Overview:
Samples from the same cohort are characterized by any number of genomic perturbations involving gene mutations, focal copy number gains and losses, and distinct promoter methylation events. One goal of cancer genomics is to connect these observed and imposed perturbations to the molecular changes that occur in cancer cells. Identifying genetic pathways activated in response to perturbations will lead to a mechanistic understanding of drug response and disease progression.

We have developed a method based on a heat-diffusion kernel approach that connects genomic perturbations to gene expression changes. The method computes a subnetwork solution that interconnects protein level data to gene expression level data using protein-protein interactions, predicted transcription factor to target connections, and curated interactions from literature. Permutation-based analysis is then used to gauge the significance of the solutions resulting from the HotLink network.
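
To make the heat-diffusion idea concrete, here is a minimal sketch of the textbook graph heat kernel, h(t) = exp(-tL) h(0), where L is the graph Laplacian. It is not the HotLink implementation: the toy path graph, the diffusion time `t`, the truncated Taylor series and the function names are all assumptions chosen for illustration.

```python
def laplacian(n, edges):
    """Unweighted graph Laplacian L = D - A as a list of lists."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L


def diffuse(L, heat0, t, terms=40):
    """Approximate exp(-t L) @ heat0 with a truncated Taylor series.

    Adequate for small toy graphs only; a real analysis would use a
    numerical linear algebra library instead.
    """
    n = len(heat0)
    result = list(heat0)
    term = list(heat0)
    for k in range(1, terms):
        # term_k = (-t L / k) applied to term_{k-1}
        term = [
            -t / k * sum(L[i][j] * term[j] for j in range(n))
            for i in range(n)
        ]
        result = [r + x for r, x in zip(result, term)]
    return result


# Hypothetical path graph 0 - 1 - 2 with all heat placed on node 0,
# standing in for a perturbed gene and its network neighbours.
L = laplacian(3, [(0, 1), (1, 2)])
heat = diffuse(L, [1.0, 0.0, 0.0], t=0.5)
print(heat)
```

Heat placed on a perturbed node spreads to its neighbours and decays with network distance, while the total amount of heat is conserved.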

We have applied our method to four Cancer Genome Atlas (TCGA) datasets, including glioblastoma multiforme, ovarian cystadenocarcinoma, colorectal, and breast, and found that the method identifies the expected major pathways in these different tumor types. In the breast cancer TCGA dataset, our method identified a key signaling pathway through beta-catenin that explains MYC activity in basal tumors, as well as additional signaling pathways involved in the basal tumor phenotype. In each case, these pathways contain genes lacking any genomic perturbation data, and can only be identified with a pathway-based approach. In addition to uncovering these key genes, our results provide a mechanistic explanation of tumor behavior that may suggest subtype-specific drug targets.


OPT09 - Reconstructing targetable pathways in lung cancer
Date: Sunday, July 15: 11:45 a.m. - 12:10 p.m.
Scientific Area: Systems Biology and Networks
Room: 104C

Presenting author: Alejandro Balbin, University of Michigan, United States

Additional authors:
John Presner, University of Michigan, United States
Anirban Sahu, University of Michigan, United States
Sunita Shankar, University of Michigan, United States
Anastasia Yocum, University of Michigan, United States
Mohan Dhanasekaran, University of Michigan, United States
Xuhong Cao, University of Michigan, United States
Alexey Nesvizhskii, University of Michigan, United States
Arul Chinnaiyan, University of Michigan, United States

Presentation Overview:
Signaling networks are frequently perturbed in cancer cells, and their aberrant activity leads to cancer initiation and progression. Although oncogenic pathways have been extensively characterized, in many cases, such as the KRAS oncogenic pathway, the specific network of effector proteins that drives carcinogenesis in a particular tissue is still far from being understood. In this work, we studied the KRAS signaling pathway in KRAS-dependent non-small cell lung cancer (NSCLC) cell lines by integrating transcriptome, proteome and phosphoproteome data. Gene expression was quantified for each of 12 cell lines using RNA-sequencing (RNA-seq), and protein abundance and protein phosphorylation were quantified using label-free quantitative tandem mass spectrometry (LC-MS/MS). In order to reconstruct active and targetable networks associated with KRAS dependency, we formulated this network reconstruction task as a Prize Collecting Steiner Tree (PCST) problem, allowing us to synthesize transcriptome, proteome and phosphoproteome signatures with a human protein interaction network derived from public repositories. The network reconstruction formulation has several advantages when compared with traditional pathway enrichment methods: it uses the topology of the network; it can find hidden modules or nodes relevant for the network which were not directly measured; and it does not require very large data sets in order to reconstruct the network, as is the case with network inference methods. Using the above strategy, we suggest a druggable pathway (MET, LCK, PAK1) that is active in the KRAS-dependent but not in the KRAS-independent phenotype, thereby defining new potential druggable targets for treating KRAS-dependent lung cancer.


OPT10 - PARADIGM-SHIFT predicts the function of mutations in multiple cancers using pathway impact analysis
Date: Sunday, July 15: 12:15 p.m. - 12:40 p.m.
Scientific Area: Bioinformatics of Health and Disease
Room: 104C

Presenting author: Sam Ng, University of California Santa Cruz, United States

Additional authors:
Eric Collisson, University of California San Francisco, United States
Theodore Goldstein, University of California Santa Cruz, United States
Abel Gonzalez-Perez, Parc de Recerca Biomedica de Barcelona, Spain
Nuria Lopez-Bigas, Universitat Pompeu Fabra, Spain
Christopher Benz, Buck Institute, United States
David Haussler, University of California Santa Cruz, United States
Joshua Stuart, University of California Santa Cruz, United States

Presentation Overview:
The major mechanism by which cancer arises is through somatic mutations. This can lead to alterations in gene regulation and changes in protein structure and function. It is critical to distinguish mutations that have an important role – driver mutations – from unimportant ones – passenger mutations. Differentiating driver and passenger events is essential for understanding cancer disease mechanisms, which can help guide treatment decisions as well as identify novel targets for treatment.

We are developing a method called PARADIGM-SHIFT based on integrated pathway analysis to discriminate loss-of-function, neutral, and gain-of-function mutations. Utilizing regulatory interactions annotated for a given gene, we can detect a shift in the downstream effects of an altered gene compared to what is expected. We show that a score based on this shift is highly predictive of the presence of a mutation and that the directionality of this shift also reflects the gain- or loss-of-function.

Application of our method to a set of known driver mutations reveals a significant signal for loss- and gain-of-function mutations in the surrounding network, demonstrating the sensitivity of this approach. In addition, when applied to the negative control of passenger mutations, the method predicts little pathway impact, indicating that this approach also has high specificity. Application of this approach to recurrent mutations in cancers from the TCGA project identifies several important driver mutations across these cohorts. We also highlight the novel utility of this specific approach by comparison to earlier published approaches including SIFT, PolyPhen2, and MutationAssessor.


OPT11 - Network-based Survival Analysis on Ovarian Cancer
Date: Sunday, July 15: 12:15 p.m. - 12:40 p.m.
Scientific Area: Bioinformatics of Health and Disease
Room: 104C

Presenting author: Rui Kuang, University of Minnesota, United States

Additional authors:
Wei Zhang, University of Minnesota Twin Cities, United States
Takayo Oat, Mayo Clinic College of Medicine, United States
Jeremy Chien, Mayo Clinic College of Medicine, United States
Viji Shridhar, Mayo Clinic College of Medicine, United States
Baolin Wu, University of Minnesota Twin Cities, United States

Presentation Overview:
This poster is based on Proceedings Submission 183. Survival analysis is commonly used to predict the time to an event of interest and identify relevant features in cancer genomics studies. Existing survival models suffer from the high dimensionality and strong dependence in genomic features, and often lead to inconsistent relevant features across independent datasets for similar studies. We investigate a network-based Cox proportional hazards model called Net-Cox to cope with the dependence and high dimensionality by exploring the structural relation among the genomic features in a network. In this study, we focused on survival and recurrence in ovarian carcinoma, since there are no available molecular signatures that can predict these events. We applied Net-Cox to three independent ovarian cancer gene expression datasets, including the TCGA ovarian cancer dataset that only became available recently. In the analysis of the three ovarian cancer datasets, Net-Cox with network information from gene co-expression or known gene relations can significantly improve the accuracy of survival prediction over the Cox model in cross-validation on the same dataset or across the three datasets. Net-Cox also identified more consistent relevant genes across the three independent datasets. In addition, the top-ranked genes form dense protein-protein interaction sub-networks and are enriched in known cancer pathways. Our literature survey confirmed 16 signature genes with supporting evidence. We further validated with a tumor array in an independent patient cohort from Mayo Clinic that FBN1, the gene ranked highest by Net-Cox, is a signature that could predict recurrence after 12 months of treatment. Availability: http://compbio.cs.umn.edu/Net_Cox/.


OPT12 - Integrating Many Co-Splicing Networks to Reconstruct Splicing Regulatory Modules
Date: Sunday, July 15: 12:15 p.m. - 12:40 p.m.
Scientific Area: Systems Biology and Networks
Room: 104C

Presenting author: Chao Dai, University of Southern California, United States

Additional authors:
Wenyuan Li, University of Southern California, United States
Juan Liu, Wuhan University, China
Xianghong Zhou, University of Southern California, United States

Presentation Overview:
Alternative splicing is a ubiquitous gene regulatory mechanism that dramatically increases the complexity of the proteome. However, the mechanism for regulating alternative splicing is poorly understood, and the study of coordinated splicing regulation has been limited to individual cases. To study genome-wide splicing regulation, we integrate many human RNA-seq datasets from the Sequence Read Archive to identify splicing modules, each defined as a set of cassette exons co-regulated by the same splicing factors. We have designed a tensor-based approach to identify co-splicing clusters that frequently appear across multiple conditions and are thus very likely to represent splicing modules, the units of the splicing regulatory network. In particular, we model each RNA-seq dataset as a co-splicing network, where the nodes represent exons and the edges are weighted by the correlations between exon inclusion rate profiles. We apply our tensor-based method to the 38 co-splicing networks derived from human RNA-seq datasets and identify an atlas of frequent co-splicing clusters. We demonstrate that these identified clusters represent potential splicing modules by validating against four biological knowledge databases. The likelihood that a frequent co-splicing cluster is biologically meaningful increases with its recurrence across multiple datasets, highlighting the importance of the integrative approach. Co-splicing clusters reveal novel functional groups that cannot be identified by co-expression clusters. In particular, they can grant new insights into functions associated with post-transcriptional regulation, and the same exons can dynamically participate in different pathways depending on the conditions and on the other exons that are co-spliced.
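
The construction of a single co-splicing network can be sketched as follows. This is only an illustration of the description above, with nodes as exons and edges weighted by the correlation of inclusion-rate profiles: the use of Pearson correlation, the optional `min_abs_r` cutoff, the exon names and the toy values are assumptions, and the tensor-based clustering step is not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def co_splicing_network(inclusion, min_abs_r=0.0):
    """Build a weighted edge dict {(exon1, exon2): r} from inclusion rates.

    inclusion maps an exon name to its inclusion-rate profile across the
    samples of one dataset. Edges with |r| below min_abs_r (a hypothetical
    cutoff) are dropped.
    """
    edges = {}
    for e1, e2 in combinations(sorted(inclusion), 2):
        r = pearson(inclusion[e1], inclusion[e2])
        if abs(r) >= min_abs_r:
            edges[(e1, e2)] = r
    return edges


# Toy inclusion-rate profiles for three exons across three samples.
profiles = {
    "exonA": [0.1, 0.5, 0.9],
    "exonB": [0.2, 0.6, 1.0],
    "exonC": [0.9, 0.5, 0.1],
}
for pair, weight in co_splicing_network(profiles).items():
    print(pair, round(weight, 3))
```

One such network would be built per RNA-seq dataset; the integration across the resulting networks is the part handled by the tensor-based method.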


OPT13 - Deciphering genomic alterations in colorectal cancer through subtype-specific driver networks
Date: Sunday, July 15: 2:30 p.m. - 2:55 p.m.
Scientific Area: Systems Biology and Networks
Room: 104C

Presenting author: Jing Zhu, Vanderbilt University, United States

Additional authors:
Bing Zhang, Vanderbilt University, United States

Presentation Overview:
High-throughput genomic studies have identified thousands of genomic alterations in colorectal cancer (CRC). Distinguishing driver from passenger mutations is critical for developing rational therapeutic strategies. Because only a few transcriptional subtypes exist in previously studied tumor types (e.g. breast and ovarian), we hypothesized that highly heterogeneous genomic alterations may converge to a limited number of distinct mechanisms that drive unique cancer biology in different transcriptional subtypes. In this study, we seek to define transcriptional subtypes for CRC and to identify subtype-specific driver mutations and networks. Consensus clustering using a patient cohort with 1173 samples identified three transcriptional subtypes, which were validated in an independent cohort with 485 samples. Survival analysis demonstrated that the subtypes were associated with statistically different prognoses. For each subtype, we mapped somatic mutation and copy number variation data onto an integrated signaling network and identified subtype-specific driver networks using a random walk-based strategy. For the subtype with the worst prognosis, we found that the driver network was enriched with genomic alterations in the Wnt signaling pathway and the VEGF signaling pathway. Consistently, Wnt targets were significantly enriched in the transcriptional signature of this subtype, as were genes involved in biological processes regulated by these pathways such as “cell migration” and “blood vessel morphogenesis”. Functional correlations between inferred upstream driver networks and the downstream expression signatures were also observed for the other two subtypes. These results support the hypothesis stated above, and our work provides a general framework for identifying subtype-specific driver mutations and networks.


OPT14 - PAA - A New R Package for ProtoArray Data Analysis
Date: Sunday, July 15: 2:30 p.m. - 2:55 p.m.
Scientific Area: Microarrays
Room: 104C

Presenting author: Michael Turewicz, Ruhr-University Bochum, Germany

Additional authors:
Maike Ahrens, Ruhr-University Bochum, Germany
Caroline May, Ruhr-University Bochum, Germany
Dirk Woitalla, Ruhr-University Bochum, Germany
Beate Pesch, Ruhr-University Bochum, Germany
Swaantje Casjens, Ruhr-University Bochum, Germany
Helmut E. Meyer, Ruhr-University Bochum, Germany
Christian Stephan, Ruhr-University Bochum, Germany
Martin Eisenacher, Ruhr-University Bochum, Germany

Presentation Overview:
Background: Protein microarrays like the ProtoArray® (Life Technologies, Carlsbad, California, USA) are used for autoimmune antibody screening studies to discover biomarker panels. For ProtoArray data analysis the software Prospector (provided by the ProtoArray vendor) is often used, because it comes with an advantageous feature ranking approach (“M score”). Unfortunately, Prospector provides no capabilities regarding multivariate feature selection, classification, manufacturing batch normalization and computational biomarker candidate validation.
Results: Therefore, we have adopted Prospector’s M score approach and implemented a new R package called Protein Array Analyzer (PAA) that provides these features and a complete data analysis pipeline. Besides ProtoArray data, PAA is also suitable for all other single color microarray data that comes in GenePix® results (gpr) file format. After optional data pre-processing and M score-based feature pre-selection a multivariate feature selection is performed. For this purpose, a backwards elimination (wrapper) approach (“gene shaving” using random forests for feature sub-group evaluation) has been implemented. To validate the performance of the selected protein features, a test set classification is performed. Furthermore, different plots and results files can be obtained to outline the data analysis results.
Conclusions: We propose the new R package PAA for protein microarray data analysis. PAA has been used to successfully analyse several different ProtoArray data sets (e.g. “Parkinson”, “Alzheimer”, “Amyotrophic Lateral Sclerosis”), which demonstrates its suitability for protein microarray data analysis. PAA is now the default tool for protein microarray analysis at our facility. The first publicly available version will be published in the coming months.


OPT15 - Variants Affecting Exon Skipping Contribute to Complex Traits
Date: Sunday, July 15: 2:30 p.m. - 2:55 p.m.
Scientific Area: Functional Genomics
Room: 104C

Presenting author: Younghee Lee, The University of Chicago, United States

Additional authors:
Eric Gamazon, University of Chicago, United States
Ellen Rebman, University of Chicago, United States
Yeunsook Lee, Iowa State University, United States
Sanghyuk Lee, Ewha Womans University, Korea, Rep
M. Eileen Dolan, University of Chicago, United States
Nancy Cox, University of Chicago, United States
Yves Lussier, University of Illinois at Chicago, United States

Presentation Overview:
DNA variants that affect alternative splicing and the relative quantities of different gene product transcripts have been shown to be risk alleles for some Mendelian diseases. However, for complex traits with low odds ratios for any single contributing gene or variant, very few studies have investigated splicing variants. The overarching goal of this study is to discover and characterize the role that variants affecting alternative splicing may play in the emergence of complex traits, which include a significant number of the common human diseases. Specifically, we hypothesize that single nucleotide polymorphisms (SNPs) in splicing regulatory elements can be computationally characterized to accurately identify variants affecting splicing, and that these variants may contribute to the etiology of complex diseases as well as inter-individual variability. We leverage high-throughput expression profiling to 1) experimentally validate our in silico identified skipped exons and 2) characterize the molecular role of intronic genetic variations in alternative splicing events in the context of complex human traits and diseases. Furthermore, we propose that intronic SNPs play a role as genetic regulators within splicing regulatory elements and show that their associated exon skipping events often affect protein domains. We find that human complex trait-associated SNPs are enriched among intronic splicing enhancers. This finding raises the possibility that therapies targeting alternative splicing mechanisms may be of value in treating complex diseases.


OPT16 - IIIDB: A database for Isoform-Isoform Interactions and isoform network modules
    Cancelled
Date: Sunday, July 15: 3:00 p.m. - 3:25 p.m.
Scientific Area: Machine Learning
Room: 104C

Presenting author: Chun-Chi Liu, University of Southern California, United States

Additional authors:
Yu-Ting Tseng, Institute of Genomics and Bioinformatics, Taiwan
Wenyuan Li, Molecular and Computational Biology, United States
Shihua Zhang, National Center for Mathematics and Interdisciplinary Sciences, China
Xianghong Zhou, University of Southern California, United States

Presentation Overview:
Knowledge of protein-protein interactions (PPIs) is key to understanding diverse cellular processes and disease mechanisms. However, current PPI databases only provide low-resolution knowledge of PPIs, in the sense that "proteins" of currently known PPIs generally refer to "genes." In reality, a transcribed gene can often be spliced into multiple transcript isoforms, each of which, in turn, can be translated into a protein. It is known that alternative splicing often impacts PPIs, either by directly affecting protein interacting domains or by indirectly impacting other domains, which, in turn, impacts PPI binding. Thus, proteins translated from different isoforms of the same gene can have different interaction partners. Due to the limitations of current experimental capacities, little data is available for PPIs at the resolution of isoforms, although such high-resolution data is crucial to map pathways and to understand protein functions. To fill the gap, we systematically predicted genome-wide isoform-isoform interactions (IIIs) by a logistic regression approach that integrates information from RNA-seq datasets, orthology, domain-domain interaction, and GO annotation. The results allowed us to develop the first III database, the Isoform-Isoform Interaction Database (IIIDB). It contains 74,408 high confidence predictions and 592,342 low confidence predictions. The IIIDB provides a new resource on human protein-protein interactions at the high resolution of transcript isoforms that can facilitate detailed understanding of protein functions and biological pathways. IIIDB is freely available at http://syslab.nchu.edu.tw/IIIDB.


OPT17 - Comparative Dynamic Transcriptome Analysis (cDTA) reveals mutual feedback between mRNA synthesis and degradation
Date: Sunday, July 15: 3:00 p.m. - 3:25 p.m.
Scientific Area: Regulation
Room: 104C

Presenting author: Björn Schwalb, University of Munich, Germany

Additional authors:
Achim Tresch, University of Munich, Germany
Mai Sun, University of Munich, Germany
Daniel Schulz, University of Munich, Germany
Patrick Cramer, University of Munich, Germany
Nicole Pirkl, University of Munich, Germany
Stefanie Etzold, University of Munich, Germany
Laurent Lariviere, University of Munich, Germany
Kerstin Maier, University of Munich, Germany
Martin Seizl, University of Munich, Germany

Presentation Overview:
To monitor eukaryotic mRNA metabolism, we developed comparative Dynamic Transcriptome Analysis (cDTA). cDTA provides absolute rates of mRNA synthesis and decay in Saccharomyces cerevisiae (Sc) cells with the use of Schizosaccharomyces pombe (Sp) as an internal standard. cDTA uses non-perturbing metabolic labeling that supersedes conventional methods for mRNA turnover analysis. cDTA reveals that Sc and Sp transcripts that encode orthologous proteins have similar synthesis rates, whereas decay rates are five-fold lower in Sp, resulting in similar mRNA concentrations despite the larger Sp cell volume. cDTA of Sc mutants reveals that a eukaryote can buffer mRNA levels. Impairing transcription with a point mutation in RNA polymerase (Pol) II causes decreased mRNA synthesis rates as expected, but also decreased decay rates. Impairing mRNA degradation by deleting deadenylase subunits of the Ccr4-Not complex causes decreased decay rates as expected, but also decreased synthesis rates. Extended kinetic modeling reveals mutual feedback between mRNA synthesis and degradation that may be achieved by a factor that inhibits synthesis and enhances degradation. cDTA is provided with a statistical methodology and all required bioinformatics steps that allow the accurate absolute quantification and comparison of RNA turnover. cDTA can be applied to reveal rate changes for all kinds of perturbations, e.g. in knock-out or point mutation strains, as responses to stress stimuli, or in interference assays such as treatments with miRNA or siRNA inhibitors. The cDTA approach is in principle applicable to virtually every organism.
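The buffering effect described above can be illustrated with the simplest kinetic model of mRNA turnover, dm/dt = synthesis rate - decay rate * m, whose steady state is the ratio of the two rates. This first-order model and all numbers below are assumptions for illustration; they are not the extended kinetic model or the measurements of the study.

```python
def steady_state_level(synthesis_rate, decay_rate):
    """Steady state of dm/dt = synthesis_rate - decay_rate * m."""
    return synthesis_rate / decay_rate


def simulate(synthesis_rate, decay_rate, m0=0.0, dt=0.01, steps=100_000):
    """Integrate the same equation numerically with forward Euler steps."""
    m = m0
    for _ in range(steps):
        m += dt * (synthesis_rate - decay_rate * m)
    return m


# Hypothetical rates in arbitrary units, not values from the study.
wild_type = steady_state_level(synthesis_rate=10.0, decay_rate=0.5)
# Buffering: synthesis and decay both drop by the same factor,
# so the steady-state mRNA level is unchanged.
mutant = steady_state_level(synthesis_rate=5.0, decay_rate=0.25)
simulated = simulate(synthesis_rate=10.0, decay_rate=0.5)
```

Under this assumed model, halving both rates leaves the mRNA level unchanged, which is why a compensating change in decay can mask a transcription defect when only total mRNA levels are measured.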


OPT18 - An accurate and efficient discriminative motif discovery method for large ChIP-seq datasets
Date: Sunday, July 15: 3:00 p.m. - 3:25 p.m.
Scientific Area: Regulation
Room: 104C

Presenting author: Chuanbin Du, The University of North Carolina at Charlotte, United States

Additional authors:
Zhengchang Su, The University of North Carolina at Charlotte, United States

Presentation Overview:
ChIP-seq has become a powerful and efficient method to study protein-DNA interactions during gene transcriptional regulation. However, it has been a major challenge to identify the cis-regulatory elements in a ChIP-seq dataset that typically contains thousands of sequences, as many popular motif discovery methods do not scale well to such large sequence datasets and lack the ability to find multiple co-factor motifs. In this work, we developed an accurate and efficient discriminative motif discovery method to address these issues. In our approach, a k-mer enumeration based technique is first used to enrich over-represented motifs, thereby reducing the space of potential motifs, and then position weight matrices (PWMs) are constructed and updated by using a modified Gibbs sampling strategy. When tested on large simulated and real biological benchmark datasets, our method can very quickly detect binding motifs of various lengths for the ChIP-ed TF and co-factors simultaneously, and it outperforms some commonly used motif discovery tools, e.g. DREME, in both accuracy and efficiency.
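As a rough illustration of the k-mer enumeration step described above, the sketch below counts k-mers in a set of peak sequences and in a background set, then ranks them by a smoothed enrichment ratio. The function names, the pseudocount, and the toy sequences are assumptions for illustration; this is not the authors' implementation, and it omits the PWM construction and Gibbs sampling steps.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count, for each k-mer, the number of sequences that contain it."""
    counts = Counter()
    for seq in sequences:
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


def enriched_kmers(foreground, background, k, pseudocount=1.0):
    """Rank k-mers by the ratio of foreground to background frequency."""
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    scores = {}
    for kmer, n_fg in fg.items():
        f_fg = (n_fg + pseudocount) / (len(foreground) + pseudocount)
        f_bg = (bg.get(kmer, 0) + pseudocount) / (len(background) + pseudocount)
        scores[kmer] = f_fg / f_bg
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): "TGACTCA" is planted in every foreground sequence.
foreground = ["AATGACTCAGG", "CCTGACTCATT", "GTGACTCAAAC"]
background = ["AACCGGTTAAC", "GGTTAACCGGA", "CATCATCATCA"]
top_kmer, top_score = enriched_kmers(foreground, background, k=7)[0]
```

Only the highest-ranked k-mers would then be passed on as seeds for PWM refinement, which is how an enumeration step can shrink the motif search space before a sampling step runs.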


OPT19 - Faster and More Accurate Sequence Alignment with SNAP
Date: Sunday, July 15: 3:30 p.m. - 3:55 p.m.
Scientific Area: Sequence analysis
Room: 104C

Presenting author: Matei Zaharia, University of California, Berkeley, United States

Additional authors:
William Bolosky, Microsoft, United States
Kristal Curtis, University of California, Berkeley, United States
Armando Fox, University of California, Berkeley, United States
David Patterson, University of California, Berkeley, United States
Scott Shenker, University of California, Berkeley, United States
Ion Stoica, University of California, Berkeley, United States
Richard Karp, University of California, Berkeley, United States
Taylor Sittler, University of California, San Francisco, United States

Presentation Overview:
As the cost of DNA sequencing continues to drop faster than Moore's Law, there is a growing need for tools that can efficiently analyze larger bodies of sequence data. By mid-2013, sequencing a human genome is expected to cost $1000, at which point this technology enters the realm of routine clinical practice. For example, it is expected that each cancer patient will have their genome and their cancer's genome sequenced. Assembling and interpreting the short read data produced by sequencers in a timely fashion, however, is a significant challenge, with current pipelines taking thousands of CPU-hours per genome.

Here, we address the first and most expensive step of this process: aligning reads to a reference genome. We present the Scalable Nucleotide Alignment Program (SNAP), a new aligner that is 10-100x faster and simultaneously more accurate than existing tools like BWA, Bowtie2 and SOAP2. Unlike recent aligners that use graphics processing units (GPUs), SNAP runs on commodity processors. Furthermore, whereas existing fast aligners limit the number and types of differences from the reference genome they allow per read, SNAP supports a rich error model and can cheaply match reads with more differences. This gives it up to 2x lower error rates than current tools and lets it match classes of mutations, such as longer indels, that these tools miss.

Today, SNAP can align a human genome in 1.5 hours on a 16-core machine, compared to 1.5 days for BWA, while offering higher accuracy. In addition, the algorithm scales well to upcoming long-read technologies.


OPT20 - Multiscale Representation of Genomic Signals
Date: Sunday, July 15: 3:30 p.m. - 3:55 p.m.
Scientific Area: Genomics
Room: 104C

Presenting author: Theo Knijnenburg, Netherlands Cancer Institute, Netherlands

Additional authors:
Lodewyk Wessels, Netherlands Cancer Institute, Netherlands
Ilya Shmulevich, Institute for Systems Biology, United States
Stephen Ramsey, Seattle Biomedical Research Institute, United States

Presentation Overview:
In the genome, information is encoded on a wide range of spatial scales. Functional genomic regions can range from the order of base pairs (bps), e.g. for transcription factor binding sites, up to Mbps for nuclear lamina associated domains. As a consequence, measurements derived from the genome will exhibit structure at different spatial scales, a fact that should be taken into account when analyzing such data. In this work, we present a fundamentally new approach to analyze genomic signals at different spatial scales. Genomic signals are defined as quantitative measurements as a function of genomic position and include DNA sequence based data, such as GC-content, as well as (epi-)genomic measurements, such as ChIP-seq data.
We developed a multiscale segmentation method to obtain the multiscale representation (MSR) of a genomic signal. The MSR is a representation of signal enrichment and depletion as a function of spatial scale and genomic position. We applied this approach to a variety of genomic signals in the mouse, including intra-species sequence conservation data, GC-content and ChIP-seq data of TFs, RNA polymerase II and epigenetic marks. The MSR offers a novel way to summarize and visualize the information content across spatial scales. Using correlation analysis and genomic annotation, we demonstrate that a genomic signal indeed contains functional information at multiple scales. This multiscale information can be employed to accurately predict gene expression and function. Using a machine learning framework, we show substantially improved prediction accuracy when compared to approaches that analyze genomic signals at a single scale.


OPT21 - A Robust Linear Framework for Transcript Quantification using MultiSplice Features.
Date: Sunday, July 15: 3:30 p.m. - 3:55 p.m.
Scientific Area: Genomics
Room: 104C

Presenting author: Yan Huang, University of Kentucky, United States

Additional authors:
Yin Hu, University of Kentucky, United States
Corbin Jones, University of North Carolina at Chapel Hill, United States
James MacLeod, University of Kentucky, United States
Derek Chiang, University of North Carolina at Chapel Hill, United States
Yufeng Liu, University of North Carolina at Chapel Hill, United States
Jan Prins, University of North Carolina at Chapel Hill, United States
Jinze Liu, University of Kentucky, United States

Presentation Overview:
The advent of high throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation or transcript quantification of isoforms is critical for downstream differential analysis (e.g. healthy vs. diseased cells) but remains a challenging problem for several reasons. First, while various types of algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoform from which they were sampled, so that in some conditions the quantification problem is not identifiable, i.e. lacks a unique solution. We develop a generalized linear model for transcript quantification that leverages reads spanning multiple splice junctions to improve identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific bias. We extend our method to simultaneously learn bias parameters during transcript quantification to improve accuracy. Third, transcript quantification is often provided with a candidate set of isoforms, not all of which are likely to be significantly expressed in a given condition. By solving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts, while existing methods tend to assign positive expression to every candidate isoform. Using simulated RNA-seq datasets, our method demonstrated better quantification accuracy than existing methods. The application of our method to real data experimentally demonstrated that transcript quantification is effective for differential analysis of transcriptomes.


OPT22 - Fishing for Virulent Factors: Machine Learning Predictions and Experimental Validations of Bacterial Effectors
Date: Sunday, July 15: 4:00 p.m. - 4:25 p.m.
Scientific Area: Machine Learning
Room: 104C

Presenting author: David Burstein, Tel Aviv University, Israel

Additional authors:
Tal Zusman, Tel Aviv University, Israel
Ziv Lifshitz, Tel Aviv University, Israel
Michael Peeri, Tel Aviv University, Israel
Gil Segal, Tel Aviv University, Israel
Tal Pupko, Tel Aviv University, Israel

Presentation Overview:
Numerous pathogenic bacteria exert their function by translocating a set of proteins, termed effectors, into the cytoplasm of their host cell. The primary goal of this study was to identify novel effectors on a genomic scale, towards a better understanding of the molecular mechanisms of bacterial pathogenesis. We applied a machine learning approach for the detection of effectors in the intracellular pathogen Legionella pneumophila, the causative agent of Legionnaires' disease, a severe pneumonia-like disease. Our approach is based on the combination of several classification algorithms trained on a variety of features collected on a genomic scale. We applied this methodology to predict and experimentally validate dozens of new effectors. Notably, our computational predictions had a high accuracy rate of over 90%. Having a large pool of identified effectors, we studied the signals that enable the secretion of effectors. We have implemented a hidden semi-Markov model (HSMM) to characterize regions that are recognized by the bacterial secretion machinery. Using the HSMM we were able to detect novel effectors in different species of Legionella, as well as in Coxiella burnetii, an extremely infectious pathogen and a potential bio-terrorism agent. Based on the HSMM we were able to synthesize, for the first time, an artificial secretion signal, and experimentally prove its translocation. Furthermore, we are using similar machine learning approaches to identify pathogenic determinants in several other pathogens, including the food-borne Salmonella enterica, the plant pathogen Xanthomonas campestris, and Pseudomonas aeruginosa, the predominant respiratory pathogen in cystic fibrosis (CF) patients.


OPT23 - Sources of Experimental Function Annotation in UniProt-GOA, Implications for Function Prediction
Date: Sunday, July 15: 4:00 p.m. - 4:25 p.m.
Scientific Area: Structure and Function Prediction
Room: 104C

Presenting author: Alexandra Schnoes, University of California San Francisco, United States

Additional authors:
Alexander Thorman, Miami University, United States
Iddo Friedberg, Miami University, United States

Presentation Overview:
Computational protein function prediction programs rely upon well-annotated databases for testing and training their algorithms. These databases, in turn, rely upon the work of curators to capture experimental findings from the scientific literature and apply them to protein sequence data. However, due to high-throughput experimental assays, it is possible that a small number of experimental papers could dominate the functional protein annotations collected in databases. Here we investigate just how prevalent the “few papers – many proteins” bias is. We examine the annotation of experimental protein function in the UniProt Gene Ontology Annotation project (GOA), and show that the distribution of proteins per paper is exponential, with a small number of papers contributing a large number of annotations. We additionally investigate the impact that this bias has on the available function annotations per species. We find that for several important model species, a significant fraction of the annotations available are provided by only a few dominant papers. Given that most high-throughput techniques can find only one function (or a small group of functions), it appears that some level of experimental protein function annotation bias is unavoidable. We discuss how this bias affects our view of the protein function universe, and consequently our ability to predict protein function. Knowing that this bias exists and understanding its extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.
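One simple way to quantify the “few papers – many proteins” bias is to ask what share of all annotated proteins is covered by the largest papers. The sketch below does this for a list of (protein, paper) pairs. The data layout, the function names, and the identifiers are hypothetical and do not reflect the actual UniProt-GOA file format.

```python
from collections import defaultdict


def proteins_per_paper(annotations):
    """Map each paper ID to the set of proteins it annotates."""
    by_paper = defaultdict(set)
    for protein, paper in annotations:
        by_paper[paper].add(protein)
    return by_paper


def top_paper_share(annotations, n_top=1):
    """Fraction of annotated proteins covered by the n_top largest papers."""
    by_paper = proteins_per_paper(annotations)
    ranked = sorted(by_paper.values(), key=len, reverse=True)
    covered = set().union(*ranked[:n_top])
    all_proteins = set().union(*ranked)
    return len(covered) / len(all_proteins)


# Hypothetical (protein, paper) pairs; the identifiers are made up.
annotations = [
    ("P1", "paper_A"), ("P2", "paper_A"), ("P3", "paper_A"),
    ("P4", "paper_A"), ("P5", "paper_A"), ("P6", "paper_A"),
    ("P1", "paper_B"), ("P7", "paper_B"),
    ("P8", "paper_C"),
]
share = top_paper_share(annotations, n_top=1)
```

In this toy example a single paper covers six of the eight annotated proteins, so `share` is 0.75. Computing the same quantity per species would show where a few papers dominate the available annotations.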


OPT24 - Evolution of function in the alkaline phosphatase superfamily
Date: Sunday, July 15: 4:00 p.m. - 4:25 p.m.
Scientific Area: Evolution
Room: 104C

Presenting author: Alan Barber, University of California, San Francisco, United States

Additional authors:
Jonathan Lassila, Stanford University, United States
Helen Wiersma-Koch, Stanford University, United States
Daniel Herschlag, Stanford University, United States
Michael Hicks, University of California, San Francisco, United States
Patricia Babbitt, University of California, San Francisco, United States

Presentation Overview:
Mechanistically diverse enzyme superfamilies are composed of evolutionarily related enzymes that share common mechanistic features yet catalyze different chemical reactions. The study of the evolution of these superfamilies provides a unique opportunity to understand how nature has modified specific catalytic scaffolds to catalyze numerous enzymatic reactions. The alkaline phosphatase (AP) superfamily provides an especially compelling system because its founding member, AP, is a prototypic phosphoryl transfer catalyst and its mechanism is well characterized. Homologues of AP, including phosphatases, sulfatases, phosphodiesterases and phosphomutases, catalyze a range of phosphoryl and sulfuryl transfer reactions.

Protein similarity networks (PSNs) are graphical representations of sequence, structural, and other types of similarities among a group of proteins in which pairwise all-by-all similarity connections are calculated. For example, nodes can be used to represent one or more protein sequences or structures and edges drawn between two nodes represent some measure of their similarity. Mapping biological information to network nodes or edges enables hypothesis creation about sequence-structure-function relationships across entire sets of related proteins.

We present an investigation of the AP superfamily using PSNs to create an evolutionary model for their divergence from a common ancestor, which is supported by phylogenetic analysis. This model demonstrates the elaboration of an ancient minimal precursor structure to produce the various contemporary reactions of known and unknown function. This work has applications in selection of targets for structural genomics, automated function prediction, and enzyme engineering.
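The protein similarity network construction described above can be sketched in a few lines: compute all-by-all pairwise similarities, keep an edge when the score passes a threshold, and read off the connected groups. The sketch assumes a Jaccard similarity over 3-mers as a stand-in for the alignment-based scores a real PSN would use; the sequences, names, and threshold are hypothetical.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets; a stand-in for alignment scores."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    return len(ka & kb) / len(ka | kb)


def build_network(proteins, threshold):
    """All-by-all comparison; keep an edge when similarity >= threshold."""
    edges = {name: set() for name in proteins}
    for a, b in combinations(proteins, 2):
        if similarity(proteins[a], proteins[b]) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def clusters(edges):
    """Connected components of the thresholded network."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(edges[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Made-up sequences: two pairs of near-identical toy proteins.
proteins = {
    "enzA1": "MKTLLVAGHSDD",
    "enzA2": "MKTLLVAGHSDE",
    "enzB1": "QWERPPYYCCNN",
    "enzB2": "QWERPPYYCCNM",
}
network = build_network(proteins, threshold=0.5)
groups = clusters(network)
```

Raising or lowering the threshold merges or splits groups, which is how such networks are typically explored before functional annotations are mapped onto the nodes.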