20th Annual International Conference on
Intelligent Systems for Molecular Biology


Posters

Poster numbers will be assigned May 30th.
If you cannot find your poster below, that probably means you have not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category N - 'Sequence Analysis'
N01 - Cell Subset Prediction for Blood Genomic Studies
Short Abstract: Background: Genome-wide transcriptional profiling of patient blood samples offers a powerful tool to investigate underlying disease mechanisms and personalized treatment. Most studies are based on analysis of total peripheral blood mononuclear cells (PBMCs). In this case, accuracy is inherently limited since cell subset-specific differential expression will be diluted by RNA from other cells. While using specific PBMC subsets would improve our ability to extract knowledge from these data, it is rarely obvious which cell subset(s) will be the most informative.
Results: We have developed a computational method (Subset Prediction from Enrichment Correlation, SPEC) to predict the cellular source for a pre-defined gene list (i.e. a gene signature) using only data from total PBMCs. SPEC takes advantage of correlations of signature gene expression with subset-specific genes across a set of samples. Validation using multiple experimental datasets demonstrates that SPEC can accurately identify the source of a gene signature as myeloid or lymphoid, as well as differentiate between B cells, T cells, NK cells and monocytes. Using SPEC, we predict that myeloid cells are the source of the interferon-therapy response gene signature associated with HCV patients who are non-responsive to standard therapy.
Conclusions: SPEC is a powerful technique for blood genomic studies. It can help identify specific cell subsets that are important for understanding disease and therapy response. SPEC is widely applicable since only gene expression profiles from total PBMCs are required, and thus it can easily be used to mine the massive amount of existing microarray or RNA-seq data.
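The correlation idea behind SPEC can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a gene list is summarized by its per-sample mean expression and that the predicted source is simply the subset whose marker profile has the highest Pearson correlation with the signature profile. All gene names and expression values below are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_profile(expr, genes):
    """Average expression of a gene list in each sample."""
    rows = [expr[g] for g in genes]
    return [sum(col) / len(col) for col in zip(*rows)]

def predict_subset(expr, signature, subset_markers):
    """Return the subset whose marker profile correlates best with the signature."""
    sig = mean_profile(expr, signature)
    scores = {name: pearson(sig, mean_profile(expr, markers))
              for name, markers in subset_markers.items()}
    return max(scores, key=scores.get), scores

# Toy total-PBMC data: gene -> expression across five samples (hypothetical values).
expr = {
    "b_marker_1": [1, 2, 3, 4, 5],
    "b_marker_2": [2, 3, 4, 5, 6],
    "mono_marker_1": [5, 3, 4, 1, 2],
    "mono_marker_2": [6, 4, 5, 2, 3],
    "sig_gene_1": [5.1, 3.2, 4.0, 1.1, 2.3],
    "sig_gene_2": [6.0, 4.1, 5.2, 2.0, 2.9],
}
subset_markers = {
    "B cell": ["b_marker_1", "b_marker_2"],
    "monocyte": ["mono_marker_1", "mono_marker_2"],
}
best, scores = predict_subset(expr, ["sig_gene_1", "sig_gene_2"], subset_markers)
print(best, scores)
```

In this toy example the signature tracks the monocyte markers across samples, so the monocyte subset receives the highest score.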
N02 - Correction of spatial bias in oligonucleotide array data
Short Abstract: Oligonucleotide microarrays allow for high-throughput gene expression profiling assays. The technology relies on the fundamental assumption that observed hybridization signal intensities (HSIs) for each intended target, on average, correlate with their target’s true concentration in the sample. However, systematic, non-biological variation from several sources undermines this hypothesis. Background hybridization signal has previously been identified as an important such source, one manifestation of which appears in the form of spatial autocorrelation. We propose an algorithm for the elimination of spatial autocorrelation in HSIs, exploiting the duality of desirable mutual information shared by probes in a common probe set and undesirable mutual information shared by spatially proximate probes. We show that this correction procedure reduces spatial autocorrelation in HSIs; increases HSI reproducibility across replicate arrays; increases differentially expressed gene detection power; and performs better than previously published spatial correction methods. The proposed algorithm thus increases both precision and accuracy, while requiring virtually no changes to users’ current analysis pipelines: the correction consists merely of a transformation of raw HSIs (e.g. CEL files for Affymetrix arrays). A free, open-source implementation is provided as an R package, compatible with standard Bioconductor tools. The approach may also be tailored to other platform types and other sources of bias.
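To make the notion of spatial autocorrelation concrete, the sketch below computes Moran's I, a standard autocorrelation statistic, on a small grid of intensities. This is a swapped-in illustration of how such bias could be measured, not the correction algorithm proposed in the abstract; the rook (four-neighbour) adjacency and the toy grids are assumptions.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid, using rook (4-neighbour) adjacency."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    dev = [[v - mean for v in row] for row in grid]
    cross, weight_sum = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Each neighbouring pair contributes with weight 1.
                    cross += dev[r][c] * dev[rr][cc]
                    weight_sum += 1
    denom = sum(d * d for row in dev for d in row)
    return (len(values) / weight_sum) * (cross / denom)

# Alternating pattern: neighbours always differ (negative autocorrelation).
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
# Smooth left-to-right trend: neighbours are similar (positive autocorrelation).
gradient = [[c for c in range(4)] for _ in range(4)]

i_checker = morans_i(checkerboard)
i_gradient = morans_i(gradient)
print(i_checker, i_gradient)
```

A successful correction would be expected to move this kind of statistic toward zero for raw intensities on an array.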
N03 - Pathway activities in aggressive Non-Hodgkin lymphomas identified via Guided Clustering
Short Abstract: Aggressive non-Hodgkin lymphomas (aNHL) are a group of heterogeneous malignancies derived from germinal center B (GC B) cells. aNHLs can be divided on the molecular level using gene expression profiling into distinct subgroups: molecular Burkitt lymphoma (mBL) and diffuse large B cell lymphoma (DLBCL). However, the deregulated pathways that underlie the differences in lymphoma biology, and the extent to which these pathways are responsible for the different clinical outcomes of lymphomas, are still not well understood.

Therefore, we investigated gene expression modules which display the “activation status” of oncogenic pathways in aNHL.
Human transformed GC B cells were activated in vitro using distinct B cell paracrine stimuli (crosslinking of the BCR, CD40L, IL21, BAFF or LPS), and gene expression profiling was performed. We used a new method developed in our group called guided clustering (Maneck et al. 2011) to identify signaling modules that are uniquely regulated by one of the five stimuli and form clusters within the aNHL gene expression profiles.

We were able to identify two different signaling modules per stimulus, six of which were found to hold prognostic information. Most interestingly, we could show that ABC (activated B cell-like) type lymphomas are characterized by unbalanced BCR and CD40 signalling. Using these newly identified pathway modules, we achieved a better understanding of the deregulated pathways involved in lymphoma biology and of the pathomechanisms underlying the so far poorly described “unclassified lymphoma”, taking into account the need for new targeted therapies.
N04 - The Transcriptomic Landscape of Learning and Memory Formation
Short Abstract: Long-term memory reflects the persistent changes in the brain that result from learning. Several studies have shown that long-term memory formation requires transcription and translation, and that this requirement is limited to defined “critical periods”. Genome-wide microarray studies in our lab 30 minutes after contextual fear conditioning show regulation of a substantial number of genes, validating the potential of a genome-wide approach to understand the transcriptional changes that underlie memory formation. In this study, gene expression was examined using microarrays before contextual fear conditioning, during the established critical periods for memory consolidation (30 minutes and 4, 12 and 24 hours after learning) and after memory retrieval in C57BL/6J mice. A similar time course was performed without the learning experience to model the effect of circadian time. The study was randomized, collecting one sample per time point per day, for a total of n=9 mice per time point. Normalization was carried out using Affymetrix Power Tools and statistical analysis was performed using R. Our study shows that the biggest changes in gene expression happen 30 minutes after learning and after retrieval of memory. Genes up-regulated after acquisition and retrieval overlap greatly and are involved in transcriptional control. This was verified by qPCR of known genes induced at 30 minutes. Interestingly, memory consolidation down-regulates chromatin assembly while retrieval down-regulates RNA processing. Almost no transcriptional changes can be detected 4 and 12 hours after learning. In addition, several novel non-coding RNAs induced after memory formation and retrieval have been identified and selected for follow-up studies.
N05 - Comprehensive analysis of Illumina Infinium 450K data
Short Abstract: DNA methylation is the most widely studied epigenetic mechanism and has been shown to be essential to normal development and cell differentiation. Aberrant methylation events have been linked to a variety of diseases.
The Illumina Infinium 450K assay is a microarray-based technique for the detection of DNA methylation in humans. The main advantages of the assay are its high quality and reproducibility, comparatively low cost, as well as its ability to quantify methylation as a fraction between 0 and 1.
We developed a flexible computational pipeline that provides comprehensive quality control and methylation analysis of Illumina Infinium 450K assays. Our analysis pipeline extends current approaches to processing Infinium data (such as Genome Studio and the methylumi R package) by providing downstream analysis and visualization. The four steps of the pipeline are quality control, filtering, profile inspection and identification of differential methylation. The first step provides an overall assessment of the quality of bisulfite conversion and hybridization. The second step filters out unreliable probes and samples based on a set of predefined criteria. The third step quantifies the intra- and intergroup variability in methylation. The fourth step ranks genomic regions by the statistical significance of the differential methylation observed between two or more sample groups. We provide a customizable definition of differential methylation. The figures and tables generated in each step are summarized in HTML reports. The pipeline is fully implemented in R, making it platform-independent, as well as easy to install, use and modify.
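The filtering and ranking steps can be sketched as follows. This is a simplified illustration in Python, not the R pipeline described above: the beta-value formula with an offset of 100, the detection p-value cutoff of 0.01, and ranking probes by absolute mean difference (instead of regions by statistical significance) are all assumptions, and the probe names and numbers are hypothetical.

```python
def beta_value(meth, unmeth, offset=100.0):
    """Methylation fraction in [0, 1) from methylated/unmethylated intensities."""
    return meth / (meth + unmeth + offset)

def filter_probes(detection_p, max_p=0.01):
    """Keep probes whose detection p-value is below max_p in every sample."""
    return [probe for probe, ps in detection_p.items()
            if all(p < max_p for p in ps)]

def rank_by_difference(betas, group_a, group_b, probes):
    """Rank probes by absolute difference in mean beta between two sample groups."""
    def mean(values):
        return sum(values) / len(values)
    diff = {p: mean([betas[p][i] for i in group_a]) -
               mean([betas[p][i] for i in group_b]) for p in probes}
    return sorted(probes, key=lambda p: abs(diff[p]), reverse=True)

# Toy data: four samples, indices 0-1 form group A and 2-3 form group B.
betas = {
    "probe_a": [0.9, 0.8, 0.1, 0.2],
    "probe_b": [0.5, 0.5, 0.5, 0.4],
    "probe_c": [0.1, 0.1, 0.9, 0.9],
}
detection_p = {
    "probe_a": [0.001, 0.001, 0.001, 0.001],
    "probe_b": [0.001, 0.001, 0.001, 0.001],
    "probe_c": [0.001, 0.2, 0.001, 0.001],  # fails in one sample
}
kept = filter_probes(detection_p)
ranked = rank_by_difference(betas, [0, 1], [2, 3], kept)
print(kept, ranked)
```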
N06 - PAA - A New R Package for ProtoArray Data Analysis
Short Abstract: Background: Protein microarrays like the ProtoArray® (Life Technologies, Carlsbad, California, USA) are used for autoimmune antibody screening studies to discover biomarker panels. For ProtoArray data analysis the software Prospector (provided by the ProtoArray vendor) is often used, because it comes with an advantageous feature ranking approach (“M score”). Unfortunately, Prospector provides no capabilities regarding multivariate feature selection, classification, manufacturing batch normalization and computational biomarker candidate validation.
Results: Therefore, we have adopted Prospector’s M score approach and implemented a new R package called Protein Array Analyzer (PAA) that provides these features and a complete data analysis pipeline. Besides ProtoArray data, PAA is also suitable for all other single color microarray data that comes in GenePix® results (gpr) file format. After optional data pre-processing and M score-based feature pre-selection a multivariate feature selection is performed. For this purpose, a backwards elimination (wrapper) approach (“gene shaving” using random forests for feature sub-group evaluation) has been implemented. To validate the performance of the selected protein features, a test set classification is performed. Furthermore, different plots and results files can be obtained to outline the data analysis results.
Conclusions: We propose the new R package PAA for protein microarray data analysis. PAA has been used to successfully analyse several different ProtoArray data sets (e.g. “Parkinson”, “Alzheimer”, “Amyotrophic Lateral Sclerosis”), demonstrating its suitability for protein microarray data analysis. PAA is now the default tool for protein microarray analysis at our facility. The first publicly available version will be published in the coming months.
N07 - Comparative analysis of gene expression pattern similarity between human and mouse by gene functions
Short Abstract: Differences in gene sequence between species are smaller than differences in phenotype. A mutation in a coding region can easily be lethal, while a mutation in a non-coding region such as a promoter has a smaller effect. Therefore, non-coding regions evolve faster than coding regions. Since gene regulatory regions are more divergent than coding regions, it is important to compare gene expression to understand differences between species. However, it is hard to perform experiments under the same conditions. We can run experiments in roughly the same organs, but it is very hard to match time courses or fine tissues. To overcome this problem, we compared coexpression patterns instead of gene expression patterns. Coexpression represents the similarity of two gene expression patterns. We defined coexpression similarity as the number of common genes among the top N genes in the two species' coexpression tables.
Among 14,604 homologous gene pairs between human and mouse in COXPRESdb, about 20% of the genes have lower similarities than expected from random conservation. To understand which gene functions are conserved between human and mouse, we compared the mean coexpression similarity for every Gene Ontology (GO) term. We compared 14,028 GO terms and found that 299 GO terms show significant differences (p≤0.05 with Bonferroni correction). We found that housekeeping genes are well conserved between human and mouse, and that genes involved in the nervous system and signal transduction are divergent. This observation suggests that divergence of the nervous system and signal transduction is one of the key factors of evolution.
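The coexpression similarity defined above can be sketched in a few lines. This is a minimal reading of the definition, assuming each coexpression table maps partner genes to a coexpression strength for one guide gene and that a mouse-to-human ortholog map is available; the gene names, values, and function names are hypothetical.

```python
def top_n_partners(coexp_row, n):
    """Return the n genes most strongly coexpressed with the guide gene."""
    ranked = sorted(coexp_row, key=coexp_row.get, reverse=True)
    return set(ranked[:n])

def coexpression_similarity(human_row, mouse_row, mouse_to_human, n):
    """Count genes shared by the human and mouse top-n coexpression lists."""
    human_top = top_n_partners(human_row, n)
    mouse_top = {mouse_to_human[g] for g in top_n_partners(mouse_row, n)
                 if g in mouse_to_human}
    return len(human_top & mouse_top)

# Toy coexpression tables for one homologous gene pair (made-up values).
human_row = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1}
mouse_row = {"a": 0.85, "c": 0.8, "d": 0.75, "b": 0.2}
mouse_to_human = {"a": "A", "b": "B", "c": "C", "d": "D"}

similarity = coexpression_similarity(human_row, mouse_row, mouse_to_human, n=3)
print(similarity)
```

Here the human top three are A, B and C and the mouse top three map to A, C and D, so two genes are shared.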
N08 - Pathway correlation profiles for the identification of pathway perturbation
Short Abstract: Identification of pathway perturbation is essential for understanding changes in biological processes within an experiment. Previously developed methods identify important pathways by calculating a pathway’s enrichment from a set of differentially expressed genes. However, these methods cannot account for small, coordinated changes in gene expression that amass across the whole pathway, nor identify important pathways when few if any genes are differentially expressed. To overcome these limitations, we use microarray gene expression data to identify pathway perturbation based on pathway correlation profiles. This method first computes the correlation between gene-gene pairs within a specified KEGG pathway. Using the resulting correlation distribution for each condition, pathway perturbation and dysregulation can be quantified through changes in the gene-gene pair correlation profiles. Pathways can then be ranked based on how these profiles change between two conditions. Using this method, we have successfully shown differences between two experimental conditions in Escherichia coli and changes within time series data in Saccharomyces cerevisiae. Our method made significant predictions as to the pathway perturbations that are involved in these experimental conditions. In addition, this method can be adapted to specific disease datasets and can utilize additional predefined gene ontologies.
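A pathway correlation profile and its change between two conditions can be sketched as below. The abstract does not state how the change is scored, so the mean absolute difference of pairwise Pearson correlations used here is an assumption, as are the gene names and expression values.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_profile(expr, genes):
    """Correlation for every gene-gene pair in the pathway under one condition."""
    return {(g1, g2): pearson(expr[g1], expr[g2])
            for g1, g2 in combinations(sorted(genes), 2)}

def perturbation_score(expr_a, expr_b, genes):
    """Mean absolute change in pairwise correlation between two conditions."""
    prof_a = correlation_profile(expr_a, genes)
    prof_b = correlation_profile(expr_b, genes)
    return sum(abs(prof_a[pair] - prof_b[pair]) for pair in prof_a) / len(prof_a)

# Toy pathway with three genes measured in four samples per condition.
pathway = ["g1", "g2", "g3"]
condition_a = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
condition_b = {"g1": [1, 2, 3, 4], "g2": [8, 6, 4, 2], "g3": [4, 3, 2, 1]}

score = perturbation_score(condition_a, condition_b, pathway)
print(score)
```

Pathways could then be ranked by this score; in the toy data only the behaviour of g2 flips, which changes two of the three pairwise correlations.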
N09 - Gene signature of papillary thyroid carcinoma induced by BRAF V600E mutation
Short Abstract: Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer, representing 80% of all thyroid cancer cases. The BRAF V600E activating point mutation is the major genetic alteration in PTC, observed in 40-70% of cases, while RET rearrangements, the second major genetic factor, are found in only 5-10% of cases. The presence of the BRAF mutation has been associated with more aggressive disease and poor prognosis.
The aim of the study was to develop a gene signature characteristic of PTC induced by the BRAF V600E mutation. Additionally, gene expression profiles of BRAF-associated PTC, RET/PTC, other PTCs and normal thyroid were characterized.
Mouse Gene Arrays (Affymetrix) were employed to analyse a mouse model of PTC induced by the V600E mutation, obtained in our laboratory. In total, material from 21 mice was analysed, including 10 BRAF-associated PTCs, 2 borderline lesions and 9 healthy thyroids. A large microarray dataset of human samples, including 181 PTC and 71 normal thyroid samples (Giordano et al., Genrisk-T and Gliwice cohorts), was also used.
Almost 3000 genes differentiating PTC and control mouse samples were identified. To distinguish genes associated specifically with the presence of the BRAF mutation, a series of two-way ANOVA analyses was performed on the human samples with respect to the presence or absence of the BRAF mutation and RET rearrangements.
A specific gene signature of BRAF-positive PTCs was confirmed, including genes associated with poor prognosis and oncogenes known from other cancers.
This project is supported by Polish MSHE grant no NN401612440.
N10 - Weighted Principal Component Analysis for Gene Set Enrichment Analysis
Short Abstract: In analyzing data from microarray experiments, Gene Set Enrichment Analysis (GSEA) has advantages over the traditional approach of dealing with genes individually. GSEA organizes genes into gene sets based on prior knowledge, such as KEGG pathways, Gene Ontology, and so on. In this way, subtle but coordinated changes in a given gene set can be more readily identified. Principal component analysis (PCA) is used for standard dimension reduction, so that the results of GSEA can be better interpreted.

Our proposed methodology is to use the experimental data itself as guidance in conducting PCA. Specifically, when calculating the covariance of a given gene set, a weighted method is used. The weighting coefficients can be designed to reflect the correlation r_j (or other ranking metric) of each gene. Suppose a gene set has N genes g_j (j = 1, ..., N) with correlations r_j. We define the weighting coefficients as c_j = r_j / ∑_j r_j. The total variance used in PCA is then V = ∑_j c_j · var(g_j). By applying this supervised weighting methodology to the data, we can obtain preliminary results.
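The two formulas can be checked with a short numerical sketch. It assumes that var(g_j) is the sample variance of a gene's expression values and that the ranking metrics are positive so the weights sum to 1; the abstract does not specify either point, and the numbers and function names are invented for illustration.

```python
from statistics import variance

def weighting_coefficients(r):
    """c_j = r_j / sum_j r_j, so that the coefficients sum to 1."""
    total = sum(r)
    return [r_j / total for r_j in r]

def weighted_total_variance(gene_values, r):
    """V = sum_j c_j * var(g_j), with var taken as the sample variance."""
    c = weighting_coefficients(r)
    return sum(c_j * variance(g_j) for c_j, g_j in zip(c, gene_values))

# Toy gene set with N = 3 genes measured in three samples.
genes = [
    [1.0, 2.0, 3.0],   # var = 1
    [2.0, 4.0, 6.0],   # var = 4
    [5.0, 5.0, 5.0],   # var = 0
]
r = [0.5, 0.3, 0.2]    # hypothetical ranking metric per gene

c = weighting_coefficients(r)
V = weighted_total_variance(genes, r)
print(c, V)
```

With these values V = 0.5·1 + 0.3·4 + 0.2·0 = 1.7, so genes with a higher ranking metric contribute more of the variance that PCA sees.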

References upon request
N11 - Cross-experimental Gene Expression Analysis on Obesity
Short Abstract: Obesity, a major problem in developed nations that was once considered to result only from personal diet and exercise habits, is now believed to be a genomics-related disease[1]. A large number of obesity-related gene expression microarray studies have been done[2,3,4], and their data are publicly available. Cross-experimental analysis of those datasets is therefore possible, and it is necessary because of the noisy nature of gene expression data[5]; it can further uncover new biological knowledge on obesity.
We plan to obtain obesity-related gene expression microarray datasets from the Gene Expression Omnibus (GEO). After selecting comparable datasets, we will perform a cross-experimental analysis to discover significantly differentially expressed genes. Such an analysis increases the power of the study, so that more genes can be found whose significance is hidden by noise in any single study. In addition, as there are some potential signaling pathways related to obesity, we will use gene set enrichment analysis (GSEA) for further discovery of significant KEGG pathways and Gene Ontology (GO) terms. A result may be significant when genes in the same signaling pathway are considered together, even when the study of those genes individually is not.
Some studies of aging-related gene expression have been done by meta-analysis of public microarray datasets[5], and we expect our obesity-related study to be fruitful. In our results, significant differential expression of well-accepted obesity-related genes will be confirmed, and newly discovered significant genes, along with significant signaling pathways, will be considered potentially related to obesity and worthy of further study.
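The gain in power from combining studies can be illustrated with Fisher's method for combining independent p-values. The abstract does not name a combination method, so this is only one possible choice, and the per-study p-values below are made up.

```python
from math import exp, log

def fisher_combined_p(p_values):
    """Combine independent p-values for one gene with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose upper tail has a closed form for even df.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i      # (X/2)^i / i!
        total += term
    return exp(-half_stat) * total

# One gene tested in three hypothetical studies; none reaches 0.05 alone.
per_study_p = [0.08, 0.06, 0.09]
combined = fisher_combined_p(per_study_p)
print(combined)
```

In this toy case the combined p-value falls below 0.05 although no single study does, which is the effect the planned analysis aims to exploit.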

References upon request
N12 - Expectation Maximization for Finding Common Motifs of Body Proteins and HIV-1 Virus
Short Abstract: Protein-protein interactions are frequently mediated by protein linear motifs, described as short (usually 3-8 residues long), common sequences of polypeptide chains that are associated with a specific function. Some viruses, such as HIV, compete with human proteins by mimicking part of their structure (the motif) so that they can hijack their role in interacting with the target protein. We focused on HIV-1 as the pathogen and human as the host. The objective is to find the proteins that interact with HIV-1 by extracting common motifs among body proteins and the HIV-1 virus. We applied the Expectation Maximization (EM) algorithm to learn the common motif model and find the start position of the motif in body protein and HIV-1 sequences. HIV-1 proteins have several mutant sequences that are very similar. We compress these mutants into one sequence with a Conditional Random Field probabilistic model. The compressed sequence participates in motif extraction together with the body proteins. EM is a very powerful algorithm and we showed that it performs efficiently for short genetic sequences. However, our experimental results indicate that the performance of EM in this problem depends on the probabilistic model assumed for common motifs. Assuming conditional dependence of each amino acid in the motif on the amino acid at the previous position was observed to perform better than the independence assumption in finding common motifs for HIV-1 and body proteins. Our method was capable of extracting 26% of known interacting body proteins.
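The EM procedure for locating a shared motif can be sketched as follows. This is a simplified illustration, not the authors' method: it assumes exactly one motif occurrence per sequence, a uniform background, and the simpler independence model (each motif position has its own residue distribution), and it omits the Conditional Random Field compression step. The sequences, the seed word, and all function names are made up.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def windows(seq, w):
    """All length-w substrings of seq, in order of start position."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def normalise(counts, pseudo):
    """Turn per-position residue counts into probabilities with pseudocounts."""
    model = []
    for col in counts:
        total = sum(col.values()) + pseudo * len(ALPHABET)
        model.append({a: (col.get(a, 0.0) + pseudo) / total for a in ALPHABET})
    return model

def window_prob(win, model):
    """Probability of a window under the position-independent motif model."""
    p = 1.0
    for pos, residue in enumerate(win):
        p *= model[pos][residue]
    return p

def em_motif(seqs, seed_word, iters=20, pseudo=0.01):
    """Return the motif consensus and the most likely start in each sequence."""
    w = len(seed_word)
    # EM only finds a local optimum, so it starts from a rough guess.
    model = normalise([{a: 1.0} for a in seed_word], pseudo)
    starts = []
    for _ in range(iters):
        counts = [{} for _ in range(w)]
        starts = []
        for seq in seqs:
            wins = windows(seq, w)
            # E-step: posterior over start positions (uniform prior and background).
            probs = [window_prob(win, model) for win in wins]
            z = sum(probs)
            post = [p / z for p in probs]
            starts.append(post.index(max(post)))
            # Accumulate expected residue counts for the M-step.
            for win, q in zip(wins, post):
                for pos, residue in enumerate(win):
                    counts[pos][residue] = counts[pos].get(residue, 0.0) + q
        # M-step: re-estimate the motif model from the expected counts.
        model = normalise(counts, pseudo)
    consensus = "".join(max(col, key=col.get) for col in model)
    return consensus, starts

# Toy sequences that all contain the same four-residue word.
seqs = ["AGKLWHCMEE", "WHCMGGSKLD", "DDEKSWHCML", "TKVWHCMQRS"]
consensus, starts = em_motif(seqs, seed_word="AHCM")
print(consensus, starts)
```

Replacing the per-position distributions with distributions conditioned on the previous residue would give the first-order dependence model that the abstract reports as performing better.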