Posters

Poster numbers will be assigned on May 30th.
If you cannot find your poster below, you have probably not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the confirmation link in the poster acceptance email, click on it, and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category M - Proteomics
M01 - Correcting Bias in Mass Spectrometry-Based Protein Abundance Calculation
Shiyong Ma, Australia
Jason Wong, Prince of Wales Clinical School and Lowy Cancer Research Centre, University of New South Wales, Australia
Short Abstract: Protein quantification has long been a central problem in proteomics research. Nevertheless, studies have shown that while quantification is reasonably accurate when protein abundance is high, accuracy declines for low-abundance proteins, for which few peptides can be detected. The aim of this study is to assess the bias of existing protein quantification methods and design normalization algorithms to improve quantification accuracy.

Using published mRNA and protein copy number data from NIH3T3 cells (Schwanhäusser et al. 2011), we found that proteins with a substantially lower copy number relative to their mRNA copy number have significantly more theoretically detectable peptides than proteins with a substantially higher copy number relative to their mRNA copy number. Applying a linear regression model to normalize this bias improved the Pearson correlation between mRNA and protein copy number over the whole dataset (4,211 proteins) from 0.6363 to 0.6405, demonstrating that the bias is present throughout the dataset.
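The normalization idea can be sketched as a simple least-squares detrending step. This is an illustrative toy example, not the authors' actual model or data; the variable names and numbers are invented:

```python
# Illustrative sketch (toy data, hypothetical variable names): fit a linear
# model of the log protein/mRNA ratio against the number of theoretically
# detectable peptides, then subtract the fitted trend to remove the bias.

def fit_line(x, y):
    """Ordinary least-squares fit y = a*x + b (pure Python)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def detrend(log_ratio, n_peptides):
    """Remove the peptide-count-dependent trend from log(protein/mRNA)."""
    a, b = fit_line(n_peptides, log_ratio)
    return [r - (a * p + b) for r, p in zip(log_ratio, n_peptides)]

# Toy data reflecting the reported bias: the ratio falls as the number of
# theoretically detectable peptides grows.
peps = [5, 10, 20, 40, 80]
ratio = [2.1, 1.8, 1.4, 1.1, 0.6]
corrected = detrend(ratio, peps)
```

After detrending, the residuals no longer correlate with peptide count, which is the intended effect of the normalization.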

To determine the contribution of theoretically detectable peptide number to protein abundance calculation, we further applied an Artificial Neural Network model to predict protein abundance using mRNA copy number, mRNA half-life, protein half-life, transcription and translation rate. Pearson correlation between predicted and observed protein abundance increases substantially from 0.92675 to 0.98886 when peptide number is considered.

Our research suggests that normalizing protein intensity by the number of theoretically detectable peptides in the iBAQ calculation introduces bias, and we propose new methods to improve the accuracy of MS-based protein quantification.
M02 - Investigating the Fold Space in Membrane Proteins
Usman Saeed, TUM, Germany
Dmitrij Frishman, Department of Genome Oriented Bioinformatics, Technische Universität München, Germany
Short Abstract: Given the experimental difficulties in determining the three-dimensional structure of transmembrane proteins, sequence-based prediction methods continue to be the only option for structurally characterizing the vast majority of membrane proteins encoded in genomes. We present CAMPS 3.0, a comprehensive update of the membrane protein fold space based on sequence clustering and structure prediction. CAMPS 3.0 features a more than two-fold increase in data, covering over 1 million membrane proteins from 2,432 genomes of bacteria, archaea, viruses, and eukaryotes. We identified 1,524 structurally homogeneous sequence clusters, of which only 74 are associated with an experimentally determined three-dimensional structure. Domain content analysis of CAMPS clusters using the CATH, SCOP and Pfam databases confirms that these clusters are indeed structurally homogeneous. We estimate that approximately 1,450 experimental structures are still required to obtain a reasonably complete structural coverage of transmembrane folds.
M03 - Drug target prediction by thermal proteome profiling experiments
Dorothee Childs, EMBL Heidelberg, Germany
Mikhail Savitski, Cellzome GmbH, Germany
Holger Franken, Cellzome GmbH, Germany
Wolfgang Huber, EMBL Heidelberg, Germany
Short Abstract: Detecting the binding partners of a drug is one of the biggest challenges in drug research, yet it is crucial for understanding the drug's effects at the molecular level and for inferring its mode of action, as well as potential causes of side effects.

The recently introduced approach of thermal proteome profiling (TPP) addresses this question by combining the cellular thermal shift assay concept with mass spectrometry-based proteome-wide protein quantitation. Drug-target interactions can thereby be inferred, in an unbiased manner, from changes in the thermal stability of a protein upon drug binding or upon downstream cellular regulatory events. However, the analysis of TPP experiments requires a chain of complex data-analytic and statistical modeling steps for normalization and for assessing the thermal stability of each protein.

To facilitate this process, we developed the R package 'TPP' for analyzing thermal profiling experiments. Here, we highlight the statistical challenges in the data processing and analysis and demonstrate how they are currently addressed in the package. We hope that the availability of standardized, executable workflows will promote the adoption of the powerful TPP method by the community and aid drug discovery.
M04 - NextSearch: A search engine for mass spectrometry data against a compact nucleotide exon graph
Hyunwoo Kim, Hanyang Univ., Republic of Korea
Heejin Park, Hanyang Univ., Republic of Korea
Eunok Paek, Hanyang Univ., Republic of Korea
Short Abstract: Proteogenomics research has used six-frame translation of the whole genome or amino acid exon graphs to overcome the limitations of reference protein sequence databases. However, six-frame translation is not suitable for annotating genes that span multiple exons, and amino acid exon graphs cannot conveniently represent novel splice variants and exon skipping events between exons of incompatible reading frames. We propose NextSearch (Nucleotide EXon-graph Transcriptome Search), an efficient proteogenomic pipeline based on a nucleotide exon graph. The pipeline consists of constructing a compact nucleotide exon graph that systematically incorporates novel splice variations, and a search tool that identifies peptides by directly searching the nucleotide exon graph against tandem mass spectra. Since our exon graph stores nucleotide sequences, it can easily represent novel splice variations and exon skipping events between exons of incompatible reading frames. Peptide identification from tandem mass spectra is performed directly against this nucleotide exon graph, without converting it into protein sequences in FASTA format, achieving an order-of-magnitude reduction in sequence database storage. NextSearch outputs the proteome-genome/transcriptome mapping results in a general feature format (GFF) file, which can easily be visualized by public tools such as the UCSC Genome Browser. In addition, we developed a Hadoop/MapReduce version of NextSearch to speed up the search on a cluster of computers.
M05 - Active Data Canvas: a collaborative web-based visual analytic tool to link data to knowledge
Samuel Payne, PNNL, United States
Joon-Yong Lee, PNNL, United States
Ryan Wilson, PNNL, United States
Richard Smith, PNNL, United States
Nick Cramer, PNNL, United States
Short Abstract: Deriving knowledge from big data requires productive collaboration across disciplines; especially essential is the ability of non-computational scientists to apply their domain knowledge in data analysis. We present Active Data Canvas, a web-based visual analytic tool suite for multi-omic data. Active Data allows dynamic data interaction in familiar and intuitive contexts, like pathways and heatmaps, to facilitate user-driven data browsing. Gene Set Enrichment Analysis and other common statistical tests are performed on the fly to provide support for potential hypotheses. Users can track these clues by pinning data to the Canvas, a Pinterest-style board for aggregating and assimilating thoughts. As data are pinned, the Canvas proactively searches structured and unstructured knowledge bases (e.g., KEGG, PubMed, Wikipedia) to fetch relevant information. In this manner it becomes a productive digital assistant, lessening the burden on the researcher and freeing time for contemplative analysis. The Active Data Canvas promotes collaboration by using GitHub as the back-end data store. Via the ambitious task of versioning data, analyses and emergent hypotheses, users who collaborate on a project can share discoveries in real time. We used the Active Data Canvas to analyze the CPTAC global proteomics dataset of ovarian cancer, covering 174 tumors. After data acquisition, quality control, and initial analysis, the R object was imported directly along with the clinical metadata. Interaction with the Active Data Canvas yielded multiple new hypotheses and conclusions that had not emerged from months of analysis with statistical programming scripts.
M06 - FUSE: Multiple Network Alignment via Data Fusion
Noel Malod-Dognin, Imperial College London
Vladimir Gligorijevic, Imperial College London
Natasa Przulj, Imperial College London
Short Abstract: Discovering patterns in protein-protein interaction networks (PINs) is a central problem in systems biology. Alignments between PINs aid functional understanding as they uncover important information, such as evolutionary conserved pathways, protein complexes and functional orthologs. However, the complexity of the multiple network alignment problem grows exponentially with the number of networks being aligned, and the alignment methods proposed thus far are guided by pairwise scores that do not utilize the entire functional and topological information across all networks. Thus, designing a multiple network aligner that is both scalable and that produces biologically meaningful alignments is a challenging task that has not been fully addressed.

To overcome these weaknesses, we propose FUSE, a computationally efficient multiple network aligner that utilizes all functional and topological information in all PINs. It works in two steps. First, it computes novel similarity scores of proteins across the PINs by fusing the protein wiring patterns and sequence similarities from all aligned networks using Non-negative Matrix Tri-Factorization (NMTF). When applied to the five largest PINs from BioGRID, NMTF finds a larger number of functionally conserved protein pairs across the PINs than can be found using protein sequence similarities alone. In the second step, FUSE uses a novel maximum-weight k-partite matching approximation algorithm to find a multiple network alignment. We compare FUSE with state-of-the-art multiple network aligners and show that it produces the largest number of functionally consistent clusters covering all aligned PINs.
M08 - Integration of Cardiac Proteome Biology and Medicine by a Specialized Knowledgebase
Tevfik Umut Dincer, University of California, Los Angeles, United States
Vincent Kyi, University of California, Los Angeles, United States
Howard Choi, University of California, Los Angeles, United States
Brian Bleakley, University of California, Los Angeles, United States
Maggie P. Y. Lam, University of California, Los Angeles, United States
Edward Lau, University of California, Los Angeles, United States
Haomin Li, Zhejiang University, China
Peipei Ping, University of California, Los Angeles, United States
Short Abstract: The Cardiac Organellar Protein Atlas Knowledgebase (COPaKB; www.HeartProteome.org) is a centralized platform featuring high-quality cardiac proteomics data and relevant cardiovascular phenotype information. Currently, COPaKB features 10 organellar modules, comprising 4,203 LC-MS/MS experiments from human, mouse, Drosophila, and Caenorhabditis elegans, as well as expression images of 10,924 proteins in human myocardium from the Human Protein Atlas. These modules constitute the mass spectral library and are used by COPaKB's novel high-performance search engine to identify and annotate proteins in the mass spectrum files submitted by the user in mzML or DTA format.

COPaKB aggregates and integrates data from prominent community resources and its own relational MySQL database, enabling cardiovascular investigators in all disciplines to retrieve and analyze pertinent organellar protein properties of interest, including cardiovascular disease relevance, protein expression profiles and gene expression levels. Datasets in COPaKB are available for download in Excel XLS, XML and JSON formats, and REST APIs that will enhance interoperability with other resources are under development. The APIs will also facilitate the incorporation of COPaKB into the community's own bioinformatics workflows and other public resources.

With its intuitive interface, multi-resource integration capabilities and powerful bioinformatics tools, COPaKB empowers cardiac researchers within or outside of the proteomics field to dissect the molecular signatures of cardiovascular phenotypes and uncover hidden relationships. This unified knowledgebase aims to serve the community as a premier resource to advance cardiovascular biology and medicine.
M09 - Using the bacterial proteogenomic pipeline to refine peptide identifications and genome annotations
Julian Uszkoreit, Ruhr-University Bochum, Germany
Nicole Plohnke, Ruhr-University Bochum, Germany
Sascha Rexroth, Ruhr-University Bochum, Germany
Katrin Marcus, Ruhr-University Bochum, Germany
Martin Eisenacher, Ruhr-University Bochum, Germany
Short Abstract: Proteogenomics combines cutting-edge methods from genomics and proteomics. While it has become cheap to sequence whole genomes, the correct annotation of protein-coding regions in the genome is still tedious and error-prone. Mass spectrometry, on the other hand, relies on good characterization of the proteins derived from the genome, but can also be used to help improve genome annotation or to find species-specific peptides. Additionally, proteomics is widely used to find evidence for differential expression of proteins under different conditions, e.g. growth conditions for bacteria. The concept of proteogenomics is not altogether new: in-house scripts are used by different labs, and some specialized tools for eukaryotic and human analyses are available.
The Bacterial Proteogenomic Pipeline, written entirely in Java, simplifies conducting proteogenomic analyses of bacteria. From a given genome sequence, a naïve six-frame translation is performed and, if desired, a decoy database is generated. This database is used to identify MS/MS spectra with common peptide identification algorithms. After combining the search results and optionally flagging them for different experimental conditions, the results can be browsed and further inspected. In particular, for each peptide the number of identifications under each condition and the positions in the corresponding protein sequences are shown. Intermediate and final results can be exported to GFF3 format for visualization in common genome browsers.
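The naïve six-frame translation step can be sketched as follows. This is an illustrative Python sketch with a deliberately abbreviated codon table, not the pipeline's actual Java implementation:

```python
# Illustrative sketch of naive six-frame translation (abbreviated codon
# table covering only the codons in the toy sequence; a real pipeline
# would use the full 64-codon genetic code).
CODONS = {"ATG": "M", "GCT": "A", "AAA": "K", "TGA": "*",
          "TCA": "S", "TTT": "F", "AGC": "S", "CAT": "H"}

def revcomp(seq):
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def translate(seq):
    # codons missing from the abbreviated table become 'X'
    return "".join(CODONS.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame(seq):
    """Translations of the three forward and three reverse frames."""
    rc = revcomp(seq)
    return [translate(seq[f:]) for f in range(3)] + \
           [translate(rc[f:]) for f in range(3)]

frames = six_frame("ATGGCTAAATGA")
```

Each of the six translations would then be indexed as a search space for the MS/MS identification step.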
M10 - ProDomAs: A Web Server for Protein Domain Assignment Using Neural Networks
Changiz Eslahchi, Shahid Beheshti University, Iran
Elnaz Saberi Ansari, Institute for Research in Fundamental Science (IPM), Iran
Short Abstract: Decomposition of proteins into structural domains is a challenging task in bioinformatics. It is of particular importance for protein structure classification, protein function prediction, protein fold recognition, homology identification in distantly related proteins, determination of proteins' evolutionary relationships, and other proteomics problems. Structural domains are considered the basic units of protein folding and are conserved through evolution. Due to the exponential growth of known protein structures, the need for accurate automatic domain decomposition methods becomes ever more essential. Previously, we developed the ProDomAs algorithm for assigning structural domains using graph-theoretical methods and a center-based clustering approach; results show that ProDomAs outperforms other automatic methods. Recently, we improved the performance of ProDomAs using a neural network and a direct-search method. A neural network is trained to distinguish correct domains from incorrect ones and serves as the stopping criterion for returning the final domains. The features used to train this network are the hydrophobic moment profiles of each domain and information extracted from the structure of the protein chain. Further, since the thresholds of the previous algorithm were determined manually, a direct-search method is used, applying an error function at each level of the clustering to optimize the thresholds. The new version of ProDomAs achieves higher performance than the previous one. It is implemented as a web server using C++, Perl and PHP. The ProDomAs web server can be used to decompose and visualize structural domains in 1D and 3D representations and is freely available at http://bs.ipm.ir/softwares/prodomas.
M11 - iPQF: A new peptide-to-protein summarization method improving iTRAQ quantification
Martina Fischer, Research Group Bioinformatics, Robert Koch-Institute, Germany
Bernhard Y Renard, Research Group Bioinformatics, Robert Koch-Institute, Germany
Short Abstract: Isobaric labelling techniques such as iTRAQ and TMT allow for simultaneous absolute and relative protein quantification in different samples within a single run. This enables investigation of changes in protein expression across different sample conditions, which is crucial for the study of regulation processes.
Label intensities are measured at the peptide level and must subsequently be combined to estimate the corresponding protein ratios. Generally, all peptides mapped to a protein are assumed to share the same expression profile; however, large variance heterogeneity is observed due to random and systematic errors.
Several options exist for inferring protein ratios, such as computing the median, the mean, or the sum of the peptide intensities, using either all peptides or the five with the highest intensity. More sophisticated strategies estimate noise models or use internal experimental variation or replicate information. However, so far most approaches focus only on quantitative peptide information, although additional peptide characteristics are available that also reflect the overall reliability of the measurements.
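The simple summarization strategies mentioned here can be sketched as follows. This is illustrative Python, not the iPQF implementation; the method names and toy intensities are made up:

```python
# Illustrative sketch of common peptide-to-protein summarization strategies
# (median, mean, sum, mean of the top-N most intense peptides); the
# feature-weighted iPQF approach itself is more involved and not shown.
from statistics import mean, median

def summarize(peptide_intensities, method="median", top_n=5):
    xs = sorted(peptide_intensities, reverse=True)
    if method == "median":
        return median(xs)
    if method == "mean":
        return mean(xs)
    if method == "sum":
        return sum(xs)
    if method == "topn":
        return mean(xs[:top_n])  # top-N most intense peptides only
    raise ValueError(f"unknown method: {method}")

# Toy intensities for one protein's peptides.
intensities = [120.0, 80.0, 60.0, 40.0, 10.0, 5.0]
```

On this toy input, the choice of summarization method already changes the protein-level estimate noticeably, which is the motivation for a principled, feature-aware approach.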
We propose a new data integration approach that uses diverse peptide features as well as quantitative information. Features considered include the number of unique and redundant peptides and their expression similarities, identification scores, sequence length, charge, and peptide modifications. We investigate the impact of these features and show how their integration results in more accurate protein quantification.
We evaluate our approach on three different published iTRAQ data sets with predefined protein fold-changes and highlight its value in a comparison to eight commonly used summarization methods.
M12 - Improving protein knowledge by integrating proteomics data with UniProt Reference protein sets
Maria-Jesus Martin, EMBL-EBI
Short Abstract: MS-based proteomics experiments rely on searching up-to-date, stable and complete protein sequence databases. UniProt provides a broad range of Reference protein sets for a large number of species, specifically tailored for effective coverage of sequence space while maintaining high-quality sequence annotations and mappings to genomic and proteomics information. The human Reference proteome data set consists of 20,196 canonical and 38,619 manually curated isoform protein sequences, and includes translations from high-quality gene models in Ensembl and RefSeq for GRCh38. For each protein we provide the corresponding chromosomal and gene coordinates, as well as variants from the 1000 Genomes Project and COSMIC.

Data from high-throughput proteomics experiments constitute a rich source of annotations for UniProt, providing supporting evidence for the existence of specific protein isoforms and post-translational modifications. We have developed a method to identify experimental peptides in the PeptideAtlas and MaxQB proteomics repositories and to map them to proteins in UniProtKB. All proteins in UniProtKB/TrEMBL mapped to unique peptides have been annotated with a new keyword, 'Proteomics identification', to indicate that the protein sequence has been partially or completely confirmed using publicly available mass spectrometry-based data from the above repositories. The complete list of experimental human peptides mapped to UniProtKB is now distributed via FTP.
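The core of the peptide-to-protein mapping idea can be illustrated with a toy substring search. The protein IDs and sequences below are invented; the actual UniProt mapping pipeline is far more involved:

```python
# Toy illustration of peptide-to-protein mapping: a peptide supports a
# protein if it occurs as a substring of its sequence, and only peptides
# unique to a single protein count as identification evidence.
def map_peptides(peptides, proteins):
    """Return {peptide: [matching protein ids]}."""
    return {pep: [pid for pid, seq in proteins.items() if pep in seq]
            for pep in peptides}

def uniquely_identified(peptides, proteins):
    """Proteins supported by at least one peptide mapping only to them."""
    hits = map_peptides(peptides, proteins)
    return {m[0] for m in hits.values() if len(m) == 1}

# Invented protein IDs and sequences for illustration only.
proteins = {"P1": "MKTAYIAKQR", "P2": "MKTAGGQRLL", "P3": "AYIAKQRWWW"}
peptides = ["MKTA", "TAYI", "GGQR"]
hits = map_peptides(peptides, proteins)
```

A shared peptide like "MKTA" maps to two proteins and is therefore not evidence for either one alone, which mirrors the role of unique peptides in the keyword annotation described above.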

We present how genomics and MS-based proteomics data and results deposited in the main public repositories flow into UniProt to enrich protein sequence annotations, providing an integrated view of existing protein knowledge for genes in Reference species.
M13 - Tandem mass spectrometry peptide fragment ion prediction by Hidden Markov Models
Jan Refsgaard, Denmark
Christian Kelstrup, Copenhagen University, Denmark
Aybuge Altay, İzmir University, Turkey
Christian Grønbæk, Copenhagen University, Denmark
Jesper Velgaard Olsen, Copenhagen University, Denmark
Lars Juhl Jensen, Copenhagen University, Denmark
Short Abstract: Tandem mass spectrometry (MS/MS) peptide fragmentation is a complex process involving multiple competing fragmentation pathways. Accurate modeling of peptide fragmentation is essential for designing better peptide-spectrum match (PSM) algorithms. Here we propose a novel algorithm for inferring theoretical peptide MS/MS fragmentation spectra. This data-driven approach is made feasible by the increasing availability of MS/MS data. The varying lengths of peptides prove challenging for existing algorithms. Typical solutions include: 1) using a different model for each peptide length, thereby training only on a subset of the available data, as in the Random Forest used in MS2PIP; 2) extracting a fixed number of features from the sequences, as in the Artificial Neural Network employed in PeptideArt and the boosting algorithm employed in PepNovo+. We instead use Hidden Markov Models, which are invariant to sequence length, so there is no need to train multiple models or to extract a fixed number of features from peptides of variable length. Our preliminary benchmark shows average Pearson and Spearman correlation coefficients of 0.651 and 0.645, respectively, between observed and predicted y-ions.
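The two benchmark metrics can be computed with minimal pure-Python implementations of Pearson and Spearman correlation. These are illustrative only; a real analysis would typically use a tested library such as scipy.stats:

```python
# Minimal pure-Python Pearson and Spearman correlations, the metrics used
# to compare observed and predicted fragment-ion intensities.
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(x):
    """Ranks (1-based), with ties assigned their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))
```

Spearman rewards getting the intensity ordering right even when absolute values differ, which is why both metrics are usually reported together.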
M14 - The Rostlab Metaproteome Analysis Pipeline (RMAP)
Guy Yachdav, TUM, Germany
Jonas Raedle, TUM, Germany
Diana Iaacob, TUM, Germany
Burkhard Rost, TUM, Germany
Short Abstract: The Rostlab Metaproteome Analysis Pipeline is an automatic tool that helps extract biological meaning from metaproteomics data. The pipeline integrates a host of prediction methods that annotate subcellular localization, protein disorder, transmembrane segments, GO terms, Pfam domains and secondary structure. In contrast to many other pipelines, the use of prediction methods enables the analysis of sparsely annotated datasets. We perform gene enrichment on predicted GO terms to find significant adaptations to the particular environment of the dataset. We then map over- and underrepresented GO terms to interesting metabolic pathways to determine which molecular functions are affected by these adaptations. We map the pathways to subcellular localization data to confirm that the predicted functional annotations are consistent with known experimental data. The functional analysis is weighed against protein domain data to lend additional credibility to our functional interpretation. Finally, we constructed a method that calculates an "adaptability measure" for the datasets at hand by examining the disorder state of each expressed protein. As a test case, we use data from a study of the effects of a high-fat diet on the gut microbiome of mice, conducted by Daniel, Clavel et al.
M15 - Proteomic analysis of spatial tumor tissue heterogeneity
Li Li, University of Cologne, Germany
Andreas Beyer, University of Cologne, Germany
Short Abstract: Introduction

The surge of targeted cancer therapies necessitates an understanding of intra-tumor heterogeneity at the molecular level. High-throughput technologies have revealed substantial intra-tumor heterogeneity at the DNA and transcript levels. Although antibody-based studies have confirmed intra-tumor heterogeneity, it remains unclear to what extent protein levels vary between different tumor sites. Here, we present an analysis of intra-tumor heterogeneity using quantitative proteomics.

Methods
Normal tissues and two prostate carcinoma subtypes (acinar and ductal) were histo-pathologically characterized in samples from three patients. Altogether, we obtained 30 tissue punches, each of which was divided into two aliquots and analyzed using pressure cycling technology-SWATH-MS (PCT-SWATH). In order to robustly quantify inter-punch variability of protein levels, we developed a customized analysis pipeline.

Results and Discussion
3,101 proteins were quantified in each punch, with high correlation between technical replicates (Pearson correlation > 0.9). Our analysis pipeline estimates the biological variation among punches by evaluating the difference between the total punch variance and the technical variance. We observed a strong dependence of the biological variance on the patient and a weaker dependence on tissue type. To distinguish proteins that are often variable within the same tissue from robust proteins, we averaged the biological variances across patients and scored proteins accordingly. The biological variances of eight proteins were validated by Tissue Microarray Analysis in an independent cohort.
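The variance decomposition described here can be sketched as a back-of-the-envelope calculation. This assumes two technical aliquots per punch, as in the Methods; it is a toy illustration, not the authors' actual pipeline:

```python
# Toy sketch of estimating biological variance for one protein as the
# total between-punch variance minus the technical contribution, assuming
# each punch was measured in two technical aliquots.
from statistics import mean, variance

def biological_variance(punches):
    """punches: list of (aliquot1, aliquot2) measurements for one protein."""
    punch_means = [mean(p) for p in punches]
    total_var = variance(punch_means)  # variance of punch means across punches
    # E[(a - b)^2] = 2 * tech_var, and each punch mean carries tech_var / 2,
    # so the technical contribution to total_var is mean((a - b)^2) / 4.
    tech_contrib = mean((a - b) ** 2 for a, b in punches) / 4
    return max(total_var - tech_contrib, 0.0)  # clamp at zero
```

With perfectly reproducible aliquots all observed spread is attributed to biology; with identical punch means and noisy aliquots the estimate is clamped to zero.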

Conclusions
Biological variation of about 2,000 proteins in 30 tumorous and non-tumorous prostate biopsies from three patients was quantified. We robustly identified spatially varying proteins and subsequently categorized them based on their variability in different tissue types.