Accepted Posters

Attention conference presenters: please review the Speaker Information Page.

If you need assistance, please contact submissions@iscb.org and provide your poster title or submission ID.

Category M - 'Proteomics'

M01 - Primate Transcript and Protein Expression Levels Evolve under Compensatory Selection Pressures

Zia Khan, University of Maryland, United States

Short Abstract: Because of the technical and computational challenges of conducting comparative, genome-scale proteomics, essentially all studies of gene regulatory evolution across primates and other mammals have focused on mRNA levels rather than protein levels. Yet proteins perform much of the work of the cell and are subject to regulation not revealed by mRNA levels alone. Using quantitative mass spectrometry and novel computational analysis methods, we obtained thousands of comparative mRNA and protein expression measurements from human, chimpanzee, and rhesus macaque lymphoblastoid cell lines. We used data from all three species to identify genes whose regulation might have evolved under natural selection. Considered jointly, the mRNA and protein data also allowed us to identify genes where lineage-specific changes might specifically affect post-transcriptional or post-translational regulation. Our analyses indicate that, on an evolutionary timescale, primate mRNA levels are surprisingly flexible, as changes in them are often either buffered or compensated for at the protein level.

M02 - Interpreting copy number alterations in colon and rectal cancer through integrative proteogenomic analysis

Jing Wang, Vanderbilt University, United States

Short Abstract: The Cancer Genome Atlas (TCGA) has identified 17 regions of significant focal amplification and 28 regions of significant focal deletion in colon and rectal cancers (CRCs). However, candidate driver genes for most of these regions remain unclear. Several studies have integrated mRNA expression and copy number alteration (CNA) data to identify candidate driver genes in focal alteration regions. However, it is uncertain to what extent the prioritized genes are supported by protein expression. The proteomics data for 90 TCGA CRC tumor samples generated by the Vanderbilt Center of the Clinical Proteomics Tumor Analysis Consortium (CPTAC) offered the opportunity to answer this question.
We first examined the degree of concordance between mRNA and protein variation of individual genes across all tumors. We found that mRNA abundance did not reliably predict protein expression differences between tumors. Then, we compared the impact of CNAs on mRNA and protein expression, including both cis- and trans-effects. Although CNAs showed strong cis- and trans-effects on mRNA expression, relatively few of these extended to the protein level. These results imply that integrating proteomics data can help narrow down candidate driver genes.
The chromosome 20q amplicon was associated with the largest global changes at both the mRNA and protein levels. HNF4A, located in this region, was not only relatively highly expressed but also showed significant CNA-mRNA, CNA-protein, and mRNA-protein correlations. Using shRNA knockdown data, we further established an important role for HNF4A in CRC, which illustrates the value of this approach for understanding the roles of CNAs in cancer.
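The mRNA-protein concordance check described in this abstract can be illustrated with a per-gene rank correlation across tumors. This is a minimal sketch under stated assumptions: the abstract does not name the concordance measure, so Spearman correlation is assumed here, and the function names, gene labels (GENE_A, GENE_B), and toy values are hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mrna_protein_concordance(expression):
    """expression: {gene: (mrna_values, protein_values)}, one value per tumor,
    with tumors in the same order in both lists. Returns {gene: rho}."""
    return {gene: spearman(m, p) for gene, (m, p) in expression.items()}


# Hypothetical toy data for four tumors.
example = {
    "GENE_A": ([1.0, 2.0, 3.0, 4.0], [0.5, 0.9, 1.4, 2.0]),  # protein tracks mRNA
    "GENE_B": ([1.0, 2.0, 3.0, 4.0], [2.0, 1.5, 1.0, 0.2]),  # protein opposes mRNA
}
print(mrna_protein_concordance(example))
```

Genes with a coefficient near zero or below it would be those where mRNA abundance does not track protein abundance across tumors, which is the situation the abstract reports as common.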

M03 - Detecting biomarkers on label-free mass spectrometry data using biclustering

Andreas Mitterecker, Johannes Kepler University, Austria

Short Abstract: Mass spectrometry (MS) is a major tool in proteomics that is evolving at a rapid pace. Significant advances in instrumentation have turned it into a high-throughput field that still lacks suitable data-driven analysis tools. Major goals in this area include the detection of reliable biomarkers and their quantitation.

To tackle these challenges, we propose a novel unsupervised approach based on the FABIA biclustering algorithm. The core idea is to apply the algorithm to MS level 1 data, preclustered by retention time, in order to find similar spectra across all samples. FABIA searches for subsets of samples and retention times that together show similar patterns of m/z ratios. The resulting biclusters both facilitate the alignment of retention times and help to detect informative biomarkers. In a subsequent step, the results can also be used for protein quantitation.

We show that our approach outperforms competing methods on benchmark data sets, and we therefore conclude that it can make pivotal contributions to the detection and quantitation of differentially expressed proteins.

M04 - Optimization of statistical methods for proteome data mining

Laura Elo, University of Turku, Finland

Short Abstract: Quantitative label-free mass spectrometry is a rapidly growing technique for determining the relative quantities of proteins in a sample. Meaningful interpretation of the quantitative data is required to extract useful information, e.g., to compare healthy and diseased samples. Although several efficient methods are available for analyzing gene expression microarray data, a more limited selection of tools is currently available for proteome data mining. Furthermore, because the field is innovating rapidly, there is no apparent consensus on best practices, which makes the choice of methods difficult. To assist in this choice, we compared and optimized statistical tests for the analysis of proteome data. The primary goal was to optimize the detection of differentially expressed proteins through a data-adaptive procedure that learns an appropriate statistic directly from the data. To compare the methods systematically, we used spiked yeast data with different concentrations of Universal Proteomics Standard proteins (UPS1, equimolar amounts of 48 proteins). The performance of the statistical tests was assessed in terms of the identification of differentially expressed UPS proteins and the number of false positive detections. Five statistical methods were evaluated: our reproducibility-optimized test statistic (ROTS), the ordinary t-test, significance analysis of microarrays (SAM), rank product, and linear models for microarray data (limma). The extensive statistical analyses carried out in this study revealed that ROTS can improve the reliability of detecting differential expression in proteome data. In particular, the ordinary t-test performed poorly with small sample sizes, SAM detected many false positives in our data, and rank product failed to detect several true positives as significant.
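The link between the ordinary t-test and a data-adaptive statistic, and the spike-in benchmark described above, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the ROTS statistic family takes the form |m1 − m2| / (a1 + a2·s), with s the pooled standard error, so that a1 = 0 and a2 = 1 give the absolute ordinary t statistic. The reproducibility-based choice of (a1, a2), which is the core of ROTS, is not shown, and the function names and protein IDs are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_standard_error(x, y):
    """Standard error of the difference in means under a pooled-variance model."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def rots_style_statistic(x, y, a1, a2):
    """Assumed ROTS-style statistic |m1 - m2| / (a1 + a2 * s).
    a1 = 0, a2 = 1 reproduces the absolute ordinary (Student) t statistic;
    a large a1 pushes the ranking toward the plain difference in means.
    Undefined when a1 = 0 and s = 0."""
    return abs(mean(x) - mean(y)) / (a1 + a2 * pooled_standard_error(x, y))


def benchmark(detected, spike_ins):
    """Count true positives (detected spike-in proteins, e.g. UPS1) and
    false positives (detected background proteins, e.g. yeast)."""
    detected, spike_ins = set(detected), set(spike_ins)
    return len(detected & spike_ins), len(detected - spike_ins)


# Hypothetical usage with toy intensities and IDs.
print(rots_style_statistic([1, 2, 3], [4, 5, 6], a1=0, a2=1))
print(benchmark(["UPS_1", "UPS_2", "YEAST_9"], ["UPS_1", "UPS_2", "UPS_3"]))
```

A positive a1 damps proteins whose very small sample variance would otherwise inflate the t statistic, which is one way such a family can behave better than the ordinary t-test when sample sizes are small.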

M05 - Sequence and Structure Analysis of Tissue-Specific Lysine Acetylation Sites

Nermin Pinar Karabulut, Technische Universitaet Muenchen, Germany

Short Abstract: Lysine acetylation is a well-studied reversible post-translational modification that regulates a broad spectrum of biological activities across various cellular compartments, cell types, tissues, and disease states. While compartment-specific trends in lysine acetylation have recently been investigated by Lundby et al., its tissue-specific preferences remain unexplored. Here we present a comprehensive analysis of sequence and structural features of lysine acetylation sites (LAS) based on the experimental data of Lundby et al. and known three-dimensional structures of proteins. We show that acetylated substrates are characterized by both tissue-specific sequence motifs and tissue-specific 3D structural environments. We further demonstrate that LAS in different tissues have different preferences for disordered regions. In particular, LAS in the brain reside in disordered regions much more frequently than in other tissues, whereas in testis fat they tend to occur in structurally ordered regions. LAS also show a strong preference for α-helices in all tissues except the pancreas. Using the top level of the SCOP classification, we show that the occurrence of LAS in protein structural classes also correlates with tissue type: in lung, liver, heart, and other tissues, LAS preferentially reside in all-α proteins, whereas in muscle, testis fat, and brown fat they are found in α/β proteins. In addition, analysis of domain structures reveals that LAS are enriched in tissue-specific protein domains. KEGG pathway analysis supports this finding and indicates that LAS have diverse cellular functions in different tissues.

M06 - Analyzing and visualizing interactome data

James Knight, Lunenfeld-Tanenbaum Research Institute, Canada

Short Abstract: Large-scale, and sometimes small-scale, protein-protein interaction studies can make the data challenging to analyze correctly and especially to visualize. Newer techniques for identifying protein-protein interactions, such as BioID, magnify these challenges. BioID identifies protein interactors through in vivo biotinylation: for a bait protein of interest, it can identify not only direct, stable interactors but also transient or weak interactors, and even proteins that simply colocalize with the bait. Because BioID generally identifies more “interactors” than classical AP-MS techniques, even a network generated from a small number of baits can be quite dense and hard to interpret, which calls for appropriate visualization methods. With this type of approach, it would also be ideal to differentiate between direct, indirect, strong, and weak interactors and neighbouring/colocalizing preys (i.e., neither direct nor indirect but simply vicinal). We have developed and applied approaches for efficiently distinguishing contaminants from true interactors when using BioID, clustering the resulting data, and visualizing large- and small-scale experiments. We have also made progress in distinguishing the various types of “interactors” that BioID can generate, by using variations of the technique (for example, multiple lysis strategies) and by comparing results with complementary methods for identifying interacting proteins. Through BioID and the techniques we use to analyze the resulting data, we are elaborating and understanding the networks of a number of disease-related proteins, and we are embarking on a larger project to map the entire interactome of the cell.

M07 - Meta-inference of protein-protein interactions using author collaboration networks

Jesse Lingeman, University of Massachusetts Amherst, United States

Short Abstract: Understanding protein-protein interaction networks is a critical task in bioinformatics. Researchers studying protein interactions tend to explore small groups of proteins. We hypothesize that new protein-protein links can be predicted by exploring which proteins related groups of researchers publish about. For example, if two research groups have a high overlap in the proteins they study and each publishes a paper about a different protein, then a link between those two proteins would be weighted higher. We therefore present a novel approach in which inference of links between proteins is augmented using an author collaboration network.
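The weighting idea in the example above can be made concrete with a small sketch. Everything here is an illustrative assumption rather than the authors' method: overlap between research groups is measured with the Jaccard index, every pair of proteins studied by two different groups receives that overlap as added link weight, and the group and protein names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def collaboration_link_weights(group_proteins):
    """group_proteins: {group_name: set_of_protein_ids}.
    Returns {(protein_a, protein_b): weight}; keys are sorted tuples so that
    (a, b) and (b, a) accumulate into the same undirected link."""
    weights = defaultdict(float)
    for g, h in combinations(group_proteins, 2):
        overlap = jaccard(group_proteins[g], group_proteins[h])
        if overlap == 0:
            continue  # groups with no shared proteins contribute nothing
        for a in group_proteins[g]:
            for b in group_proteins[h]:
                if a != b:
                    weights[tuple(sorted((a, b)))] += overlap
    return dict(weights)


# Hypothetical groups: G1 and G2 share proteins A and B (Jaccard = 2/4 = 0.5).
groups = {"G1": {"A", "B", "C"}, "G2": {"A", "B", "D"}}
print(collaboration_link_weights(groups))
```

In this toy case the unshared proteins C and D gain a link of weight 0.5 purely because their groups overlap. Such scores would augment, not replace, the interaction evidence an inference model already uses.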

M08 - Improving protein and peptide identification in tandem mass spectrometry by peptide search space reduction

Avinash Shanmugam, University of Michigan, United States

Short Abstract: Tandem mass spectrometry (MS/MS)-based shotgun proteomics is the method of choice for high-throughput protein identification. It employs spectral matching algorithms and statistical models to identify the proteins present in a sample from the MS/MS spectra generated. However, these methods generally do not take into account any prior information available about the sample in their protein inference step. In this study, we specifically investigate the use of prior information to create a reduced search space for database searching. We also compare the results with previously developed methods that use prior information to re-score protein identifications.

Identification frequencies of all peptides mapped to human Ensembl proteins were retrieved from GPMdb. A reduced peptide search space database was created by filtering out rarely identified peptides, and MS/MS data were searched against this reduced database.
The re-scoring method of incorporating prior information, which was also developed in our lab, uses Bayes' theorem to adjust the identification probabilities of proteins based on GPMdb protein identification frequencies or RNA-seq information. For both methods, MS/MS data were searched using the X!Tandem search engine, and downstream processing was done using the Trans-Proteomic Pipeline (TPP). RNA-seq data were processed using the TopHat aligner and custom R scripts using Bioconductor.
Preliminary data: Searching against the reduced search space database clearly increased the number of proteins identified compared with searching against the complete human proteome, while still maintaining a 1% FDR.
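The two uses of prior information described above can be sketched in a few lines. This is a hypothetical illustration, not the lab's code: it assumes a minimum identification count as the filtering rule (the abstract does not give a cutoff), and it writes Bayes' theorem in odds form, assuming the search pipeline's probability was computed under a known baseline prior of 0.5. The function names, peptide strings, and counts are invented for the example.

```python
def reduce_search_space(peptide_counts, min_count):
    """Keep peptides identified at least `min_count` times (e.g. in GPMdb).
    The cutoff is an illustrative assumption."""
    return {pep for pep, count in peptide_counts.items() if count >= min_count}


def rescore_probability(p, prior, baseline_prior=0.5):
    """Bayes' theorem in odds form: posterior odds = likelihood ratio x prior odds.
    Assumes `p` was computed under `baseline_prior`, so the likelihood ratio is
    recovered by dividing the odds of `p` by the baseline prior odds."""
    if not (0 < p < 1 and 0 < prior < 1 and 0 < baseline_prior < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    likelihood_ratio = (p / (1 - p)) / (baseline_prior / (1 - baseline_prior))
    posterior_odds = likelihood_ratio * prior / (1 - prior)
    return posterior_odds / (1 + posterior_odds)


# Hypothetical usage.
print(reduce_search_space({"PEPTIDEK": 120, "RAREPEPK": 2}, min_count=5))
print(rescore_probability(0.9, prior=0.8))  # frequently seen protein: raised
print(rescore_probability(0.9, prior=0.1))  # rarely seen protein: lowered
```

A protein frequently observed in earlier experiments (high prior) has its probability raised and a rarely observed one has it lowered, which is the intended effect of the re-scoring step; the reduced-database approach instead removes rare peptides before the search.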

M09 - proBAM for proteomics data analysis, integration and visualization

Bing Zhang, Vanderbilt University, United States

Short Abstract: Recent advances in sequencing technologies have reshaped our conception of genomic data analysis, storage, and interpretation, spurring research interest in exploring the human proteome at a comparable scale. Shotgun proteomics holds this promise by surveying the proteome both qualitatively and quantitatively. Large amounts of proteomics data have accumulated in recent years, and there is an emerging demand to combine these efforts to catalogue the wide dynamic range of protein expression and the complexity of alternative isoforms. However, this task is daunting because different studies use varying databases, search engines, and assembly tools. Such a challenge calls for an efficient approach to integrating data from different proteomics studies, and even with genomic data.
Here we propose a generic scheme that maps identified peptide-spectrum matches (PSMs) to the genome in BAM format, a binary format for efficient data storage and fast access used in genomics research. This method differs from other approaches in its ability to connect peptides to genomic locations while simultaneously maintaining spectral count information. PSMs are aligned within the same coordinate framework regardless of the annotation system (e.g., RefSeq, ENSEMBL) used for the input proteomics data, which allows protein assembly to switch flexibly between annotations or between levels (gene or protein). When genomic or transcriptomic information from the same individual is available, this approach allows the proteomics data to be co-analyzed with those -omics data.
We demonstrate the value of proBAM by confirming alternative splicing isoforms from combined proteomics data. We also discuss its potential applications in protein/gene assembly and genome-wide proteomics analysis.
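The central step of such a mapping, projecting a peptide's position in a protein onto genomic coordinates, can be sketched as follows. This is a simplified illustration, not the proBAM implementation: it assumes a plus-strand transcript whose CDS exon coordinates are known (1-based, inclusive, sorted), it ignores minus-strand genes and other annotation subtleties, and the function names and exon coordinates are hypothetical. A peptide that spans a splice junction yields several genomic blocks, which map naturally onto a BAM CIGAR string built from M (aligned) and N (skipped region) operations.

```python
def peptide_to_genome_blocks(cds_exons, peptide_start, peptide_length):
    """cds_exons: [(start, end), ...] plus-strand CDS segments, 1-based inclusive.
    peptide_start: 1-based position of the peptide's first residue in the protein.
    Returns the genomic (start, end) blocks covered by the peptide's codons."""
    cds_lo = (peptide_start - 1) * 3           # 0-based offset into the CDS
    cds_hi = cds_lo + peptide_length * 3       # exclusive end offset
    blocks = []
    exon_offset = 0                            # CDS offset where this exon begins
    for exon_start, exon_end in cds_exons:
        exon_length = exon_end - exon_start + 1
        lo = max(cds_lo, exon_offset)
        hi = min(cds_hi, exon_offset + exon_length)
        if lo < hi:                            # peptide overlaps this exon
            blocks.append((exon_start + lo - exon_offset,
                           exon_start + hi - exon_offset - 1))
        exon_offset += exon_length
    if cds_hi > exon_offset:
        raise ValueError("peptide extends beyond the annotated CDS")
    return blocks


def blocks_to_cigar(blocks):
    """Encode aligned blocks as a CIGAR string (M = aligned, N = intron gap)."""
    parts = []
    for i, (start, end) in enumerate(blocks):
        if i > 0:
            parts.append(f"{start - blocks[i - 1][1] - 1}N")
        parts.append(f"{end - start + 1}M")
    return "".join(parts)


# Hypothetical transcript: two CDS exons; a 5-residue peptide starting at residue 7.
blocks = peptide_to_genome_blocks([(100, 129), (400, 459)], 7, 5)
print(blocks, blocks_to_cigar(blocks))
```

Here the peptide's 15 nucleotides split into 12 bases at the end of the first exon and 3 bases at the start of the second, so a splice-spanning peptide like this one is direct evidence for the exon junction, which is how such alignments can help confirm alternative splicing isoforms.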

M10 - Domain-centric cancer-type-specific landscapes of human missense somatic mutations

Fan Yang, University, Canada

Short Abstract: Understanding the functional impacts of mutations and selecting driver mutations is a bottleneck in systematic cancer genome studies. We depicted a novel domain-centric view of cancer-type-specific missense somatic mutation landscapes across 22 cancer types and identified domain position-based mutation hotspots in different cancers by combining functional and structural information about proteins. Here, each gene-specific protein domain of a given domain family is referred to as a domain instance. We observed a cancer-type-specific bias in mutation distribution among different domain instances encoded by a single gene, e.g., domain instances in EGFR and PIK3CA. This domain-level bias suggests how a single gene can play distinct roles in different cancers. We also found significantly different domain-based mutation distribution patterns between oncogenes and tumor suppressors. Oncogenic mutations tend to occur at several positions of the significantly mutated domain instances in a given cancer type. Although mutations in tumor suppressors have previously been reported to spread throughout the whole protein, only a small fraction of mutations in a given cancer fall outside the corresponding significantly mutated domain instances. All the oncogenic mutations located at functional sites, e.g., ATP/GTP binding sites, are domain position-based mutation hotspots shared by more than five cancer types. Our study adds functional and structural characterization of mutations across 21 cancers and suggests different mechanisms by which mutation hotspots promote cancer development. Finally, we identified domain-centric mutation-phenotype correlations to prioritize cancer mutations for further functional studies and precision cancer treatment.
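The domain-position aggregation described above can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: gene names, domain boundaries, and the input layout are invented, and positions are aligned by a simple offset from each domain instance's start, whereas a real analysis would more likely use the domain family's model or alignment coordinates. It only shows how mutations from different genes can be pooled by (domain family, position within domain) and a hotspot called when it is shared by more than five cancer types.

```python
from collections import defaultdict


def domain_position_hotspots(domains, mutations, min_cancer_types=6):
    """domains: {gene: [(family, start, end), ...]} with 1-based protein positions.
    mutations: iterable of (gene, protein_position, cancer_type).
    Returns {(family, position_in_domain): set_of_cancer_types} for positions
    mutated in at least `min_cancer_types` cancer types (6 = "more than five")."""
    cancers_at = defaultdict(set)
    for gene, pos, cancer in mutations:
        for family, start, end in domains.get(gene, []):
            if start <= pos <= end:
                # Offset from the domain start, so instances of the same family
                # in different genes line up (a simplifying assumption).
                cancers_at[(family, pos - start + 1)].add(cancer)
    return {key: types for key, types in cancers_at.items()
            if len(types) >= min_cancer_types}


# Hypothetical data: two genes carrying the same domain family "FamX".
domains = {"GENE1": [("FamX", 10, 50)], "GENE2": [("FamX", 100, 140)]}
mutations = ([("GENE1", 20, c) for c in "ABC"]
             + [("GENE2", 110, c) for c in "DEF"]
             + [("GENE1", 60, "A")])               # outside any domain
print(domain_position_hotspots(domains, mutations))
```

Neither gene alone reaches six cancer types at its mutated position, but pooled at domain position 11 of FamX they do, which is the kind of signal a domain-centric view is meant to expose.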
