ISMB/ECCB 2011 Posters

19th Annual International Conference on
Intelligent Systems for Molecular Biology and
10th European Conference on Computational Biology

Accepted Posters

Category 'N' - Microarrays

Poster N01

Power and minimal sample size for multivariate analysis of microarrays

Maarten van Iterson Leiden University Medical Center

Judith Boer (Erasmus MC – Sophia Children’s Hospital Rotterdam, Pediatrics – Oncology and Hematology); Renee Menezes (VUmc Amsterdam, Epidemiology and Biostatistics); Jose Ferreira (RIVM, -);

Short Abstract: Choosing the appropriate sample size for high-throughput experiments, such as those involving microarrays and next-generation sequencing, is complicated. Traditional univariate sample size determinations relate power and significance level to sample size, effect size and sample variability. However, for high-dimensional data these quantities need to be redefined: power is replaced by average power, the significance level must take multiple testing into account, and both effect sizes and variances take many values.
Some authors (Ferreira and Zwinderman, 2006; Dobbin and Simon, 2005) have proposed such methods for two-group comparisons of high-dimensional data. Ferreira's method is the most general, although it assumes that the test statistics are normally distributed. It uses the entire set of test statistics from pilot data to estimate the effect size distribution, the power and the minimal sample size.
We aimed at a generalization of power and sample size estimation that is more applicable to high-throughput genomics data. First, we extended Ferreira's method to the case of a Student-t test. Second, we considered t-test statistics generated by testing whether a coefficient of a general linear model is equal to zero. Furthermore, we considered Student-t tests that use a shrunken variance estimator, such as those produced by empirical Bayes linear models. As we illustrate via a simulation study, these extensions considerably improve power and sample size estimation compared with using the normal assumption. The extensions will be implemented in our Bioconductor package SSPA (van Iterson et al., 2009), providing a valuable tool for the design of microarray experiments.
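To make average power and minimal sample size concrete, the Python sketch below computes the mean power over a set of standardized effect sizes and the smallest per-group sample size reaching a target average power. It deliberately uses the normal approximation (the very assumption the abstract sets out to relax) and a fixed per-test significance level rather than one adjusted for multiple testing; the function names are hypothetical and this is not SSPA code.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def average_power(effect_sizes, n_per_group, alpha):
    """Mean two-sided power over standardized effect sizes.

    Assumes a two-group comparison with n_per_group samples per group and a
    normally distributed test statistic, so a gene with standardized effect
    size d has statistic ~ N(d * sqrt(n / 2), 1).
    """
    crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    total = 0.0
    for d in effect_sizes:
        shift = d * (n_per_group / 2) ** 0.5
        # P(|Z + shift| > crit) for Z ~ N(0, 1)
        total += (1 - _STD_NORMAL.cdf(crit - shift)) + _STD_NORMAL.cdf(-crit - shift)
    return total / len(effect_sizes)


def minimal_sample_size(effect_sizes, alpha, target=0.8, n_max=1000):
    """Smallest per-group n whose average power reaches target, or None."""
    for n in range(2, n_max + 1):
        if average_power(effect_sizes, n, alpha) >= target:
            return n
    return None


# Hypothetical pilot estimate: a mixture of small and large effects.
print(minimal_sample_size([0.5, 1.0, 1.5], alpha=0.001))
```

In the approach described in the abstract, the effect size distribution would instead be estimated from the full set of pilot test statistics.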

Poster N02

A Genomic Variation Analyzing Tool using Array-based Comparative Genome Hybridization

Kwang Su Jung Korea National Institute of Health

Mikyung Lee (University of California, David Geffen School of Medicine); Kiejung Park (Korea National Institute of Health, Division of Bio-Medical Informatics); Sanghoon Moon (Korea National Institute of Health, Division of Structural and Functional Genomics); Young Jin Kim (Korea National Institute of Health, Division of Structural and Functional Genomics);

Short Abstract: It is generally known that most genes are present in two copies in a genome. However, recent investigations have reported that large segments of DNA, ranging from thousands to millions of base pairs, can vary in copy number. Genes that were thought to always be present in two copies per genome have now been found in one copy or in more than two copies, and sometimes genes are missing altogether. Copy number variations (CNVs) play important roles in both human disease and drug response because they often include genes, and understanding the whole process of CNV formation could help us better grasp human genome evolution. To address this issue, we have implemented a Java-based program named Conovar that discovers CNVs from array CGH data and lets users analyze them through a user-friendly interface. The Smith-Waterman Array algorithm is embedded in our system to identify copy number variants. Our system summarizes statistics of a user-selected CGH region across samples, and Conovar displays the CGH values of user-chosen samples so that differences in log ratios can be compared between samples. Conovar also provides a map view that shows differences in CNV regions between samples. Because users want to verify whether the CNV regions they found have already been reported, the system can automatically report known CNV regions listed in the Database of Genomic Variants (DGV, http://projects.tcag.ca/variation). Conovar needs a connection to a MySQL database to use the DGV data, so users must be able to manage a MySQL database; DGV provides its genomic variant content as text files.
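The core idea behind a Smith-Waterman-style CNV search can be sketched as a local maximum-scoring segment search over log2 ratios ordered along a chromosome: each probe scores its log ratio minus a threshold, and the highest-scoring contiguous run is the candidate gained region. The threshold value and function name below are hypothetical, and this is a simplified illustration rather than Conovar's implementation or the complete Smith-Waterman Array procedure.

```python
def best_gain_segment(log_ratios, threshold=0.2):
    """Return (start, end, score) of the highest-scoring contiguous run.

    Each probe scores log_ratio - threshold; end is exclusive. This is the
    one-dimensional analogue of Smith-Waterman local alignment: the running
    score is reset to zero whenever it drops to zero or below.
    """
    best = (0, 0, 0.0)
    run_start, run_score = 0, 0.0
    for i, r in enumerate(log_ratios):
        run_score += r - threshold
        if run_score <= 0:
            # a non-positive prefix can never improve a later segment
            run_start, run_score = i + 1, 0.0
        elif run_score > best[2]:
            best = (run_start, i + 1, run_score)
    return best


ratios = [0.0, 0.1, 0.9, 1.0, 0.8, 0.0, -0.1]
print(best_gain_segment(ratios))
```

To look for losses instead of gains, the same function can be called on the negated ratios, e.g. `best_gain_segment([-r for r in ratios])`. An empty result `(0, 0, 0.0)` means no run scored above zero.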

Poster N03

Survival prediction of invasive bladder tumor based on gene expression signature

In-Sun Chu Korean Bioinformation Center

Short Abstract: Bladder cancer patients with superficial and invasive tumors have remarkably different 5-year survival rates, reflecting the biological differences between these tumors. It is therefore important to use microarray technology to predict bladder cancer survival from gene expression signatures. Gene expression data were collected from tumor specimens of 165 Korean patients with bladder cancer. We focused on survival prediction for invasive bladder tumors, as opposed to the progression of superficial tumors to invasive ones.

In this study, we used a Cox proportional hazards model to select genes whose expression patterns are associated with the length of cancer-specific survival (CSS). The expression levels of 256 probes were correlated with CSS time (p < 0.005). In the poor-prognosis subtype, for example, STK6 (Aurora kinase A), AURKB (Aurora kinase B) and TPX2 (a regulator of Aurora kinases) are all up-regulated.

These results suggest that invasive bladder cancer patients whose gene expression signatures indicate a poor prognosis may wish to pursue more aggressive therapy following surgery, while those with a good prognosis may be spared unnecessary treatment. To apply the gene expression profile in clinical practice, we will need to improve its predictive ability and confirm the reliability of the survival profile in prospective studies.
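Univariate Cox screening of the kind described above can be sketched with the Breslow partial likelihood for a single covariate, fitted by Newton-Raphson with step halving and tested with a Wald statistic. This is a minimal, dependency-free illustration under those assumptions (one probe at a time, Breslow handling of ties); the function name is hypothetical, and in practice an established survival analysis package would be used.

```python
import math
from statistics import NormalDist


def cox_univariate(x, time, event, max_iter=50, tol=1e-9):
    """Fit h(t | x) = h0(t) * exp(beta * x) for one covariate.

    x: covariate (e.g. one probe's expression), time: follow-up times,
    event: 1 if the event was observed, 0 if censored.
    Returns (beta, wald_p). Ties are handled with Breslow's approximation.
    """
    n = len(x)

    def loglik_score_info(beta):
        ll = score = info = 0.0
        for i in range(n):
            if not event[i]:
                continue
            risk = [j for j in range(n) if time[j] >= time[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            ll += beta * x[i] - math.log(s0)
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        return ll, score, info

    beta = 0.0
    ll, score, info = loglik_score_info(beta)
    for _ in range(max_iter):
        step = score / info
        # halve the Newton step until the partial log-likelihood does not decrease
        while True:
            new_ll, new_score, new_info = loglik_score_info(beta + step)
            if new_ll >= ll or abs(step) < tol:
                break
            step /= 2
        beta += step
        ll, score, info = new_ll, new_score, new_info
        if abs(step) < tol:
            break
    z = beta * math.sqrt(info)  # beta / standard error
    wald_p = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta, wald_p
```

Applying this to each probe and keeping those with `wald_p < 0.005` mirrors the screening threshold quoted in the abstract; a positive `beta` means higher expression is associated with shorter survival.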

Poster N04

Comparative study and optimization of Mutual Information based algorithms for the inference of gene regulatory networks using transcriptomic data

Jorge Ayuso University Pontificia of Salamanca

Manuel Martín-Merino (University Pontificia of Salamanca); Javier De Las Rivas (Cancer Research Center of Salamanca (CIC-IBMCC, CSIC/USAL), Functional Genomics and Bioinformatics);

Short Abstract: The inference of gene regulatory networks from genome-wide expression data provides valuable insight into biological systems. Several learning algorithms based on mutual information (MI) measures have been applied for this purpose with encouraging results. Two-way MI methods such as ARACNE or CLR focus on the discovery of direct regulatory interactions and cannot detect more complex relationships between gene regulators and targets. Other methods have been proposed to address this issue based on three-way mutual information, synergy or partial correlation indexes. They can identify more complex relationships between genes, allowing the discovery of causal relationships. However, the performance and properties of the different algorithms remain difficult to assess in the context of real biological problems.

We present an extensive empirical study of several MI learning algorithms for the inference of gene regulatory networks. To this end, we considered two publicly available expression datasets (E. coli regulators and MYC oncogene regulation) for which the regulatory interactions are well studied. The algorithms were compared using several objective measures. Additionally, a biological analysis of the cofactors and the regulatory interactions has provided a deeper understanding of the different methods.

The experimental results suggest that the specific MI estimator used does not have a great impact on the results, although the best estimator is based on the empirical distribution. The results also suggest that three-way mutual information helps to discover more complex regulatory mechanisms involving more than two genes and helps to reduce both false-positive and false-negative interactions.
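Two of the building blocks mentioned above can be sketched as follows: a plug-in MI estimator computed from the empirical joint distribution of discretized expression values, and ARACNE-style pruning with the data processing inequality, which removes the weakest edge of every fully connected gene triplet. The equal-width binning, the bin count and the zero-tolerance pruning rule are simplifying assumptions, not the settings used in the study.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins=3):
    """Equal-width binning of a continuous expression vector."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against constant vectors
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(a, b):
    """Plug-in MI (in nats) from the empirical joint distribution of two label lists."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    return sum(
        (c / n) * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in joint.items()
    )


def dpi_prune(mi):
    """Drop the weakest edge of every triangle (data processing inequality).

    mi maps frozenset({gene1, gene2}) -> MI value for candidate edges.
    """
    genes = sorted({g for edge in mi for g in edge})
    to_drop = set()
    for g1, g2, g3 in combinations(genes, 3):
        edges = [frozenset(p) for p in ((g1, g2), (g1, g3), (g2, g3))]
        if all(e in mi for e in edges):
            to_drop.add(min(edges, key=mi.get))
    return {e: v for e, v in mi.items() if e not in to_drop}
```

Three-way MI methods extend this by estimating quantities such as I(X; Y, Z) over triplets; that extension is not shown here.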

Poster N05

caArray: make microarray data easily accessible and exchangeable

Xiaopeng Bian National Cancer Institute

Juli Klemm (National Cancer Institute, Center for Biomedical Informatics and Information Technology); Maureen Colbert (National Cancer Institute, Center for Biomedical Informatics and Information Technology); Rashmi Srinivasa (5 AM solution Inc, n/a); Makioko Duncan (5 AM Solution Inc, n/a);

Short Abstract: caArray is an open-source, open-development, web- and programmatically accessible array data management system developed at the National Cancer Institute. Being comprehensive in annotation yet easy to use has always been a challenge for any data repository system. To alleviate this difficulty, caArray accepts data uploads in MAGE-TAB, a spreadsheet-based format for annotating and communicating microarray data in a MIAME-compliant fashion. MAGE-TAB is built on community standards: MAGE, MIAME and ontologies. The components and workflow of MAGE-TAB files are organized in a way that is already familiar to bench scientists, which minimizes the time and frustration of reorganizing their data before submission. MAGE-TAB files are also structured to be machine readable, so they can be easily parsed into a database. Users can control public access to experiment- and sample-level data and can create collaboration groups to support data exchange among a defined set of partners. All data submitted to caArray at NCI undergo strict curation against these standards by a group of scientists. The purpose of curation is to ensure easy comparison of results from different labs and unambiguous reporting of results. Data deposited in caArray are readily accessible through the portal or through the API. The data can also be imported from or exported to other MAGE-TAB-compliant databases, such as GEO or ArrayExpress, without much effort. Interested parties are encouraged to review the installation package, documentation and source code available from https://array.nci.nih.gov.
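Because MAGE-TAB is plain tab-delimited text, a Sample and Data Relationship Format (SDRF) file can be read with a few lines of standard-library Python. The sketch keeps each row as a list of (column header, value) pairs because SDRF headers such as 'Protocol REF' may legitimately repeat, which a plain dictionary would collapse. The example columns and values are illustrative, and this is not caArray's parser.

```python
import csv
import io


def read_sdrf(text):
    """Parse SDRF text into a list of rows of (header, value) pairs.

    Pairs are used instead of a dict because column headers such as
    'Protocol REF' can appear more than once in a valid SDRF file.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return [list(zip(header, row)) for row in reader if any(row)]


def column(rows, name):
    """All values in the columns called `name`, row by row."""
    return [[value for h, value in row if h == name] for row in rows]


sdrf = (
    "Source Name\tCharacteristics[OrganismPart]\tProtocol REF\t"
    "Hybridization Name\tProtocol REF\tArray Data File\n"
    "sample1\tliver\tP-EXTRACT-1\thyb1\tP-SCAN-1\tsample1.CEL\n"
    "sample2\tkidney\tP-EXTRACT-1\thyb2\tP-SCAN-1\tsample2.CEL\n"
)
rows = read_sdrf(sdrf)
print(column(rows, "Protocol REF"))
```

A real loader would also read the accompanying IDF file and resolve the protocol and ontology references it declares.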

Poster N06

A combination technique for microarray dataset to improve classification accuracy

Minseok Seo University of Dankook

Sejong Oh (University of Dankook)

Short Abstract: Classification analysis has been developed continuously since 1990. In this area, data preprocessing has advanced alongside the development of efficient classifiers, and feature selection in particular is a very active topic. Despite its advantages, feature selection has a critical problem: it loses part of the original information in the dataset. LDA and MDS have been proposed to minimize this loss of information. In this study, we devise a combined voting algorithm that can be used with existing feature selection algorithms. The proposed algorithm reduces the loss of information during feature selection and, as a result, improves classification accuracy.
First, we calculate the central point of each class for every feature over the whole dataset and find the nearest class for each feature. Second, we classify the sample using the features selected by a feature selection algorithm. Third, we combine both results to decide the final class of an unknown sample.
To compare the proposed algorithm with previous algorithms, we performed classification experiments on 9 real microarray datasets using well-known classifiers such as k-nearest neighbor (KNN) and support vector machine (SVM). The experimental results show that the proposed algorithm improves the quality of feature selection.
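The three steps can be sketched as follows: every feature votes for the class whose mean (central point) on that feature is closest to the unknown sample, a k-nearest-neighbour classifier on the selected features makes its own prediction, and the two are combined. The abstract does not state how the results are combined, so the `weight` parameter and the weighted-score rule below are hypothetical choices for illustration.

```python
import math
from collections import Counter


def class_centroids(X, y):
    """Per-class mean of every feature: {label: [mean_f1, mean_f2, ...]}."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def knn_predict(X, y, sample, features, k=3):
    """k-nearest-neighbour majority vote using only the selected feature indices."""
    def dist(row):
        return math.sqrt(sum((row[j] - sample[j]) ** 2 for j in features))
    nearest = sorted(range(len(X)), key=lambda i: dist(X[i]))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def combined_vote(X, y, sample, selected, k=3, weight=0.5):
    centroids = class_centroids(X, y)
    # Step 1: every feature (not only the selected ones) votes for its nearest class.
    feature_votes = Counter(
        min(centroids, key=lambda c: abs(centroids[c][j] - v))
        for j, v in enumerate(sample)
    )
    # Step 2: classifier restricted to the selected features.
    knn_label = knn_predict(X, y, sample, selected, k)
    # Step 3: weighted combination of the vote share and the classifier output.
    scores = {
        c: (1 - weight) * feature_votes[c] / len(sample) + weight * (c == knn_label)
        for c in centroids
    }
    return max(scores, key=scores.get)
```

Because step 1 uses all features, information discarded by the feature selection step can still influence the final decision, which is the motivation the abstract gives.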

Poster N07

Using Generalized Additive Mixed Models to identify genes similarly expressed through time in microarray experiments

Sergey Mastitsky German Cancer Research Center

Navid Bazzazzadeh (German Cancer Research Center, Theoretical Bioinformatics); Benedikt Brors (German Cancer Research Center, Theoretical Bioinformatics);

Short Abstract: Microarray experiments that involve biological and/or technical replicates are commonly used to examine temporal changes in gene expression (e.g., in studies of the cell cycle, organism development or response to a drug treatment). One of the main goals of such experiments is to identify genes whose expression changes similarly through time. Detecting such genes is not a trivial statistical task, because the resulting time-series datasets are highly multidimensional and may contain strongly correlated observations obtained close in time and/or from the same experimental units. We present a novel statistical pipeline that combines generalized additive mixed models (GAMMs) with cluster analysis to detect groups of genes that behave similarly through time. The results of our simulation study suggest that this approach can be used successfully to reveal statistically significant nonlinear temporal changes in gene expression levels while accounting for the aforementioned correlations among observations. The identified groups of similarly behaving genes can then be examined further (e.g., by pathway analysis) to gain deeper insight into the biological mechanisms of the phenomenon under study.

Poster N08

Feature Selection using LASSO Regression for Gene Expression Data Classification

Ho Sun Shon Chungbuk National University

Keun Ho Ryu (Chungbuk National University, School of Electrical & Computer Engineering);

Short Abstract: Much research into biological phenomena and the origins of disease relies on classifying or diagnosing the state of a cell. This is usually done by measuring the strength of gene expression under certain circumstances using microarrays, which can observe tens of thousands of gene expression profiles. We propose a feature selection method, WF-LASSO, that combines a filter method and the discrete wavelet transform (DWT) with the LASSO, a statistical regression method. We select features by applying the DWT and the filter method and then the LASSO. The significant genes extracted by this feature selection are then fed into a classification algorithm to build the classification and prediction model. The best classification results were obtained by applying, in order, the DWT, the filter method and finally the LASSO; that is, WF-LASSO was the feature selection method with the best classification performance. In other words, we first removed noise using the DWT, then selected the significant genes with the filter method, and finally applied the LASSO model.
The contribution of this paper is that the proposed method makes it possible to solve problems involving high volumes of data by reducing their dimensionality. It can be applied in medical science, for example to identify biomarkers or drug targets.
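The LASSO step can be illustrated with cyclic coordinate descent and soft-thresholding, one standard way to fit the LASSO (not necessarily the solver the authors used). The sketch assumes that the feature columns (e.g. genes retained after the DWT and filter steps) and the response are centred, so no intercept is fitted; genes with non-zero coefficients are the selected ones.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma, returning 0.0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-10):
    """Minimise (1/2n) * ||y - Xb||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of n rows with p columns; columns and y are assumed centred.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # y - Xb with b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # correlation of feature j with the partial residual that excludes j
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]  # 2 * column 0 + 0.1 * column 1
print(lasso_cd(X, y, lam=0.5))
```

Here the weak second feature is shrunk exactly to zero while the strong one is kept (with its coefficient reduced by `lam`), which is the selection behaviour WF-LASSO relies on.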

Poster N09

ASPIRE: An improved tool for the analysis of alternative splicing using AltSplice microarrays

Melis Kayikci MRC Laboratory of Molecular Biology

Matteo Cereda (Scientific Institute IRCCS 'E.Medea', Medicine and Life Sciences);

Short Abstract: Objectives:

Alternative splicing microarrays have often been used to understand the regulation of alternative splicing on a global scale. We developed a new version of the ASPIRE (Analysis of Splicing Isoform Reciprocity) software, which analyses data produced by high-resolution AltSplice Affymetrix microarrays.

Methods:

ASPIRE version 3 performs pairwise comparisons between experimental samples (knockdown of an RNA-binding protein) and control (wild-type) samples to determine the changes in abundance of full transcripts, which are used to normalise the probe signals before splicing changes are analysed. Exons with splicing changes are then identified by analysing reciprocal changes in the signals of probesets that detect exon-including and exon-skipping isoforms, and these exons are ranked by significance.

Results & Conclusions:

We show the identification of exons regulated by three RNA-binding proteins, TIA1/TIAL1, hnRNP C and TDP-43, which were validated by RT-PCR with a success rate of around 85-95%. We compare the ASPIRE results for exons regulated by hnRNP C with those of MADS+, another program for analysing AltSplice microarrays, and with RNA-Seq data, and we discuss the relative benefits of these three methods. A new feature of ASPIRE is thorough annotation of exons, which allows us to design and draw new RNA maps for each type of exon.
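The reciprocity idea can be illustrated with a toy score. Each probeset's log2 ratio (knockdown versus control) is first corrected by the change in whole-transcript abundance; an exon is called only when the inclusion-isoform and skipping-isoform signals move in opposite directions, and the score is the difference between the two corrected changes. The formula, the `min_change` cutoff and the example values are hypothetical simplifications and are not ASPIRE's actual statistic.

```python
def reciprocity_score(incl_log2fc, skip_log2fc, transcript_log2fc, min_change=0.0):
    """Toy splicing-change score for one exon.

    All inputs are log2(knockdown / control). Returns a positive value when
    inclusion rises and skipping falls after knockdown, a negative value for
    the opposite pattern, and 0.0 when the corrected changes are not reciprocal.
    """
    incl = incl_log2fc - transcript_log2fc
    skip = skip_log2fc - transcript_log2fc
    reciprocal = incl * skip < 0 and min(abs(incl), abs(skip)) > min_change
    return incl - skip if reciprocal else 0.0


exons = {
    "exonA": (1.2, -0.4, 0.1),   # inclusion up, skipping down
    "exonB": (0.5, 0.6, 0.5),    # whole transcript up, no splicing change
    "exonC": (-0.9, 0.8, 0.0),   # skipping up, inclusion down
}
ranked = sorted(exons, key=lambda e: abs(reciprocity_score(*exons[e])), reverse=True)
print(ranked)
```

Subtracting the transcript-level change first is what keeps exonB from being called: its probesets move only because the whole transcript is more abundant.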

Poster N10

Meta-analysis of gene expression microarrays with missing replicates

Fan Shi National ICT Australia

Gad Abraham (The University of Melbourne, Computer Science and Software Engineering); Christopher Leckie (The University of Melbourne, Computer Science and Software Engineering); Izhak Haviv (Baker IDI Heart and Diabetes Institute, Baker IDI Heart and Diabetes Institute); Adam Kowalczyk (National ICT Australia, Victoria Research Laboratory);

Short Abstract: Many different microarray experiments are publicly available today. It is natural to ask whether different experiments for the same phenotypic conditions can be combined using meta-analysis in order to increase the overall sample size. However, some genes are not measured in all experiments, so in traditional meta-analysis they either cannot be included or their statistical significance cannot be appropriately estimated. Nonetheless, these genes, which we refer to as incomplete genes, may also be informative and useful.
We propose a meta-analysis framework, called "Incomplete Gene Meta-analysis", which can include incomplete genes by imputing the significance of missing replicates and computing a meta-score for every gene across all datasets. Using two groups of experiments, we demonstrate that incomplete genes are worth including and that our method appropriately estimates their significance. We first apply Incomplete Gene Meta-analysis and several comparable methods to five breast cancer datasets with an identical set of probes. We simulate incomplete genes by randomly removing a subset of probes from each dataset and demonstrate that our method consistently outperforms two other methods in terms of false discovery rate. We also apply the methods to three gastric cancer datasets to discriminate between the diffuse and intestinal subtypes. The results on both the breast and gastric cancer datasets suggest that the highly ranked genes and associated GO terms produced by our method are more significant and, according to previous literature, more biologically meaningful.
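A common baseline for computing a meta-score is Stouffer's method, which sums per-dataset z-scores and divides by the square root of the number of datasets. The sketch below shows where imputation of missing replicates plugs in: the `impute` callable is a hypothetical placeholder (by default it fills a missing z-score with 0, the value expected under the null hypothesis) and does not reproduce the imputation used by Incomplete Gene Meta-analysis.

```python
import math


def stouffer(z_scores):
    """Combine independent z-scores with equal weights."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def meta_scores(per_dataset, impute=lambda gene, dataset_index: 0.0):
    """Meta-score for every gene seen in any dataset.

    per_dataset: list of dicts mapping gene -> z-score. A gene missing from a
    dataset receives a value from `impute` (a hypothetical placeholder rule),
    so every gene is scored across all datasets.
    """
    genes = set().union(*per_dataset)
    scores = {}
    for gene in genes:
        zs = [
            d[gene] if gene in d else impute(gene, k)
            for k, d in enumerate(per_dataset)
        ]
        scores[gene] = stouffer(zs)
    return scores


datasets = [{"A": 2.0, "B": 1.0}, {"A": 2.0}, {"A": 2.0, "B": 1.0}]
print(meta_scores(datasets))
```

Imputing 0 is conservative: it pulls an incomplete gene's meta-score towards the null, whereas dropping the gene would discard it entirely.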

Poster N11

Identification of miRNA promoters modulated by epigenetic signatures

Kwang Hee Lee Seoul National University

Kiejung Park (National Institute of Health, Division of Bio-Medical Informatics); Taejeong Oh (Genomictree Inc, Research Center); Sungwhan An (Genomictree Inc., Research Center);

Short Abstract: MicroRNAs (miRNAs) are ~22-nucleotide (nt) non-coding RNAs that play important roles in post-transcriptional gene regulation. Although several promoters involved in the expression of miRNA genes have been characterized in recent years, miRNA promoters regulated by epigenetic signatures such as DNA methylation are poorly understood. To elucidate the epigenetic regulation of miRNA genes, we developed Agilent-platform miRNA array chips with designed probes composed of 60-mer sequences retrieved from the 1,000 bp upstream or downstream regions (candidate promoter regions) of the 1,049 miRNA genes in Homo sapiens. We identified several significant motif sequences modulated by epigenetic changes (DNA methylation). We observed that two-thirds of these motifs are located in known promoters occupied by RNA polymerase II, and the rest are novel motifs. Moreover, RT-PCR verified that the expression of miRNA genes is affected by the modification of these motifs. We show that transcription of miRNAs is regulated by epigenetic changes as well as by traditional transcriptional regulation mechanisms.

Poster N12

Feature selection for specifying experimental conditions that activate gene transcription

Sho Ohsuga Osaka University

Shigeto Seno (Osaka University, Graduate school of Information Science and Technology); Yoichi Takenaka (Osaka University, Graduate school of Information Science and Technology); Hideo Matsuda (Osaka University, Graduate school of Information Science and Technology);

Short Abstract: Gene expression profiles are used to estimate gene regulatory networks. Because gene expression has been measured under many experimental conditions, the number of profiles in Gene Expression Omnibus is very large. However, gene transcription is activated not under all the experimental conditions in the database but only under specific ones. Selecting the applicable experimental conditions is important for two reasons. First, specifying the experimental conditions that activate gene transcription directly improves the accuracy of inferring the regulations. Second, knowing these conditions can promote understanding of gene transcriptional regulation.

To find combinations of gene transcriptional regulations and the experimental conditions under which they are active, we propose a method that uses Gini's coefficient, decision trees and support vector machines. The method consists of four steps:
First, it calculates Gini's coefficient for each experimental condition from known gene transcriptions. Next, it selects features from the experimental conditions based on Gini's coefficient. The selected features are then used to build multiple classifiers by supervised learning. Finally, the multiple classifiers are used to find gene transcriptional regulations and the applicable experimental conditions.

We used E. coli to evaluate the effectiveness of our method because it is one of the best-studied organisms with respect to gene regulation. The results indicate that the experimental conditions selected by our method can improve the accuracy of inferring gene transcriptional regulations.
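The abstract does not say which Gini measure is meant. Assuming it is the Gini impurity familiar from decision trees, an experimental condition could be scored by how much splitting genes on their expression in that condition reduces impurity with respect to known regulation labels (e.g. whether each gene is a known target of a given regulator). The median split and the function names are hypothetical choices for illustration.

```python
from collections import Counter
from statistics import median


def gini_impurity(labels):
    """1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(expression, labels):
    """Impurity decrease when genes are split at the median expression of one condition."""
    cut = median(expression)
    left = [lab for x, lab in zip(expression, labels) if x <= cut]
    right = [lab for x, lab in zip(expression, labels) if x > cut]
    n = len(labels)
    weighted = sum(len(part) / n * gini_impurity(part) for part in (left, right) if part)
    return gini_impurity(labels) - weighted


def rank_conditions(matrix, labels):
    """matrix: {condition: [expression of each gene]}; most informative first."""
    return sorted(matrix, key=lambda c: gini_gain(matrix[c], labels), reverse=True)


labels = [1, 1, 0, 0]  # 1 = known target of the regulator
matrix = {"cond1": [1, 5, 2, 6], "cond2": [5, 6, 1, 2]}
print(rank_conditions(matrix, labels))
```

The top-ranked conditions would then serve as the features for the decision tree and SVM classifiers in the later steps.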

Poster N13

A Method for Inference of Gene Regulatory Network based on Bayesian Network with Uniting of Partial Problems

Yukito Watanabe Osaka University

Shigeto Seno (Osaka University, Graduate School of Information Science and Technology); Yoichi Takenaka (Osaka University, Graduate School of Information Science and Technology); Hideo Matsuda (Osaka University, Graduate School of Information Science and Technology);

Short Abstract: Bayesian networks are among the models most robust to experimental noise for estimating gene regulatory networks.
The drawbacks of this model are that the search space grows exponentially with the number of genes and that the estimated network cannot contain cyclic structures.
We propose a Bayesian network-based method that can estimate networks with cyclic structures while reducing the search space.

In this algorithm, the search space is reduced by dividing the whole problem into partial problems, and the target network is constructed by uniting the solutions of these partial problems.
In the first step, the proposed method divides the whole problem into n(n-1)(n-2)/6 partial problems, each consisting of three genes, which reduces the number of genes whose network must be estimated at once.
Second, the network of each partial problem is estimated independently by a Bayesian network.
Finally, the networks of the partial problems are united to solve the target problem.
This algorithm reduces the search space without degrading solution accuracy, and its computation time grows as the cube of n, where n is the number of genes.
The proposed method can estimate networks that include cyclic structures, because the final network is obtained by uniting a set of smaller networks.

We verified the effectiveness of the proposed method using known gene regulatory networks with cyclic structures and their expression profiles.
The results indicate that our method can drastically reduce computation time without degrading solution accuracy.
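The decomposition and uniting steps can be sketched as follows. All C(n, 3) = n(n-1)(n-2)/6 gene triplets are enumerated, a local estimator is run on each, and the local results are united. Every ordered gene pair occurs in n - 2 triplets, so here an edge is kept when it is found in at least a fraction `min_support` of them. The voting rule is a hypothetical choice, and `estimate_local` stands in for the per-triplet Bayesian network search, which is not shown.

```python
from collections import Counter
from itertools import combinations
from math import comb


def n_partial_problems(n):
    """Number of three-gene partial problems: C(n, 3) = n(n-1)(n-2)/6."""
    return comb(n, 3)


def unite_triplet_networks(genes, estimate_local, min_support=0.5):
    """Estimate a network by uniting networks estimated on all gene triplets.

    estimate_local(triplet) returns directed edges (parent, child) among the
    three genes; it is a placeholder for the per-triplet Bayesian network.
    Each local network can be acyclic while their union still contains cycles.
    """
    n = len(genes)
    support = Counter()
    for triplet in combinations(genes, 3):
        support.update(set(estimate_local(triplet)))
    # each pair of genes co-occurs in exactly n - 2 triplets
    return {edge for edge, count in support.items() if count / (n - 2) >= min_support}


# Hypothetical example: a 4-gene cycle whose restriction to any triplet is acyclic.
cycle = {(0, 1), (1, 2), (2, 3), (3, 0)}


def oracle(triplet):
    return [e for e in cycle if e[0] in triplet and e[1] in triplet]


print(sorted(unite_triplet_networks([0, 1, 2, 3], oracle)))
```

Since the number of triplets grows as n^3 and each triplet search has constant size, the overall cost is cubic in n, matching the complexity stated above.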

Poster N14

Non-parametric probe-level estimation of differential expression

Sten Ilmjärv University of Tartu

Hendrik Luuk (Bispebjerg Hospital, Department of Clinical Biochemistry); Sven Laur (University of Tartu, Institute of Computer Science); Eero Vasar (University of Tartu, Department of Physiology); Jaak Vilo (University of Tartu, Institute of Computer Science);

Short Abstract: The new generation of Affymetrix expression arrays is designed to enable the investigation of phenomena such as alternative splicing, alternative promoter usage and alternative termination, while also providing more probe-level data to determine the overall rate of expression at particular genomic loci. We propose a non-parametric method for estimating differential gene expression on Affymetrix Exon ST and Gene ST arrays directly from probe-level data.

Briefly, probe-level differential expression is established using Wilcoxon's rank-sum test on relative rank scores of raw probe signals. Thereafter, the probes are grouped by expression profile, and the number of probes targeting each transcript in each group is counted. Finally, a hypergeometric probability is calculated for each transcript with respect to each expression profile, indicating how likely it is to observe by chance the observed number of probes with identical expression dynamics. After correction for multiple testing, the resulting significance values serve as a confidence estimate for assigning the particular expression profile to the transcript in question.

We use computer simulations with added Gaussian noise and real-time PCR experiments to demonstrate the sensitivity and reliability of this method.
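The two statistical ingredients can be sketched with the standard library: a Wilcoxon rank-sum statistic with mid-ranks for ties and a normal-approximation p-value (without a tie correction to the variance, a simplification), and the upper-tail hypergeometric probability of finding at least k of a transcript's probes in a given expression-profile group by chance. The function names and the way the pieces would be wired together are assumptions, not the authors' implementation.

```python
from math import comb, sqrt
from statistics import NormalDist


def rank_sum(group_a, group_b):
    """Wilcoxon rank-sum W for group_a and a two-sided normal-approximation p-value."""
    pooled = sorted((v, idx) for idx, v in enumerate(group_a + group_b))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):  # tied values share their mid-rank
            ranks[pooled[t][1]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(group_a), len(group_b)
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = 2 * (1 - NormalDist().cdf(abs(w - mean) / sd))
    return w, p


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k): n probes drawn from N, of which K share the expression profile."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total


# Hypothetical counts: 3 of a transcript's 3 probes fall in a profile group
# that contains 5 of 10 probes overall.
print(hypergeom_upper_tail(3, N=10, K=5, n=3))
```

The resulting per-transcript probabilities would then be corrected for multiple testing, as described above.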

Poster N15

Prediction of specific versus non-specific RBP-RNA interaction

Eleonora Kulberkyte Technical University of Denmark

Christopher Workman (Technical University of Denmark, Department of Systems Biology);

Short Abstract: Recent studies have shown that RBPs (RNA-binding proteins) mediate the post-transcriptional activity of RNA transcripts. In cellular quality-control mechanisms such as NMD (nonsense-mediated decay), RBPs recognize mRNAs carrying PTCs (premature translation-termination codons). However, our current understanding of specific versus non-specific binding of RBPs to their target RNAs is very limited.

To date, extensive high-throughput screening of putative protein-RNA interactions has been performed both in vivo and in vitro to reveal abundant protein-RNA interactions. Experimental data derived from methods such as RIP-ChIP (microarray profiling of RNAs associated with immunopurified RNA-binding proteins), CLIP (isolation of cross-linked RNA segments and cDNA sequencing) and HITS-CLIP/PAR-CLIP (high-throughput sequencing of RNAs isolated by cross-linking immunoprecipitation) provide a comprehensive basis for revealing the determinants of specific versus non-specific protein-RNA interaction.

Currently available protein-RNA co-crystal structures, experimental high-throughput protein-RNA interaction data and computational machine learning methods make it possible to analyze and predict the mechanisms that mediate the specificity of RBP-RNA interactions.

Poster N16

cn.FARMS: a latent variable model to detect copy number variations in microarray data with a low false discovery rate

Djork-Arné Clevert Johannes Kepler University Linz

Andreas Mitterecker (Johannes Kepler University Linz, Institute of Bioinformatics); Andreas Mayr (Johannes Kepler University Linz, Institute of Bioinformatics); Günther Klambauer (Johannes Kepler University Linz, Institute of Bioinformatics); Marianne Tuefferd (Johnson & Johnson, Pharmaceutical Research & Development); An De Bondt (Johnson & Johnson, Pharmaceutical Research & Development); Willem Talloen (Johnson & Johnson, Pharmaceutical Research & Development); Hinrich Göhlmann (Johnson & Johnson, Pharmaceutical Research & Development); Sepp Hochreiter (Johannes Kepler University Linz, Institute of Bioinformatics);

Short Abstract: Cost-effective oligonucleotide genotyping arrays like the Affymetrix SNP 6.0 are still the predominant technique for measuring DNA copy number variations (CNVs). However, CNV detection methods for microarrays overestimate both the number and the size of CNV regions and, consequently, suffer from a high false discovery rate (FDR). A high FDR means that many CNVs are wrongly detected and therefore not associated with the disease, yet correction for multiple testing takes them into account and thereby decreases the study's discovery power. To control the FDR, we propose a probabilistic latent variable model, 'cn.FARMS', which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples, from which the posterior can deviate only given strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. The software cn.FARMS is publicly available as an R package at http://www.bioinf.jku.at/software/cnfarms/cnfarms.html

Poster N17

Identification of Interconnected Markers for T-cell Acute Lymphoblastic Leukemia

Emine Guven Maiorov Koc University

Ozlem Keskin (Koc University, Chemical and Biological Engineering); Attila Gursoy (Koc University, Computer Engineering);

Short Abstract: T-cell acute lymphoblastic leukemia (T-ALL) is a very complex disease resulting from the proliferation of immature T cells arrested at different stages of differentiation. The molecular mechanisms and genes that cause T-ALL remain largely undefined. In this study, we aimed to find biomarkers that distinguish individuals with T-ALL from non-leukemic/healthy individuals, to discover markers that are not differentially expressed themselves but interconnect highly differentially expressed (DE) genes, and to suggest new pathways involved in the causes of T-ALL. To do so, we integrated gene expression data from T-ALL and healthy samples with a human protein-protein interaction network to discover diagnostic biomarkers as subnetworks rather than individual genes. Using this network-based approach, we identified 19 significant subnetworks containing 109 genes. The classification/prediction accuracies of the subnetworks are considerably high, up to 98%, meaning that the best subnetworks correctly classify 98% of the samples. Some genes in the subnetworks were already known to be associated with T-ALL, but others were not. The subnetworks were rich in transcription factors, whose ectopic activation is known to be one of the causes of T-ALL. After experimental validation, these novel genes may help investigators diagnose individuals with T-ALL.

Poster N18

COLOMBOS: Access Port for Cross-Platform Bacterial Expression Compendia

Kristof Engelen KULeuven

Qiang Fu (KULeuven, Department of Microbial and Molecular Systems (M2S)); Pieter Meysman (KULeuven, Department of Microbial and Molecular Systems (M2S)); Aminael Sánchez-Rodríguez (KULeuven, Department of Microbial and Molecular Systems (M2S)); Riet De Smet (KULeuven, Department of Microbial and Molecular Systems (M2S)); Karen Lemmens (KULeuven, Department of Microbial and Molecular Systems (M2S)); Ana Carolina Fierro (KULeuven, Department of Microbial and Molecular Systems (M2S)); Kathleen Marchal (KULeuven, Department of Microbial and Molecular Systems (M2S));

Short Abstract: Microarrays are the main technology for large-scale transcriptional gene expression profiling, but the large bodies of data available in public databases are not directly usable because of their heterogeneity. Several initiatives attempt to bundle these data into expression compendia, but such resources for bacterial organisms are scarce and limited to integrating experiments from the same platform or to indirectly integrating per-experiment analysis results.

We have constructed comprehensive organism-specific cross-platform expression compendia for three bacterial model organisms (Escherichia coli, Bacillus subtilis, and Salmonella enterica serovar Typhimurium) together with an access portal, dubbed COLOMBOS, that not only provides easy access to the compendia, but also includes a suite of tools for exploring, analyzing, and visualizing the data within these compendia. It is freely available at http://bioi.biw.kuleuven.be/colombos. The compendia are unique in directly combining expression information from different microarray platforms and experiments, and we illustrate the potential benefits of this direct integration with a case study: extending the known regulon of the Fur transcription factor of E. coli. The compendia also incorporate extensive annotations for both genes and experimental conditions; these heterogeneous data are functionally integrated in the COLOMBOS analysis tools to interactively browse and query the compendia not only for specific genes or experiments, but also for metabolic pathways, transcriptional regulation mechanisms, experimental conditions, biological processes, etc.

Poster N19

COMODO: an adaptive coclustering strategy to identify conserved coexpression modules between organisms

Ana Carolina Fierro Katholieke Universiteit Leuven

Peyman Zarrineh (Katholieke Universiteit Leuven, Department of Electrical Engineering); Aminael Sánchez-Rodríguez (Katholieke Universiteit Leuven, Department of Microbial and Molecular Systems); Kristof Engelen (Katholieke Universiteit Leuven, Department of Microbial and Molecular Systems); Kathleen Marchal (Katholieke Universiteit Leuven, Department of Microbial and Molecular Systems);

Short Abstract: Increasingly large-scale expression compendia for different species are becoming available. By exploiting the modularity of the coexpression network, these compendia can be used to identify biological processes whose expression behavior is conserved across species. However, comparing module networks across species is not trivial. There is no fixed definition of a biologically meaningful module, and changing the distance threshold that defines the degree of coexpression gives rise to different modules. As a result, many different, partially overlapping conserved module pairs exist when comparing modules across species, and deciding which pair is most relevant is hard. We therefore developed a method, conserved modules across organisms (COMODO), that uses an objective selection criterion to identify conserved expression modules between two species. The method takes as input microarray data and a gene homology map, searches for the pairs of modules for which the number of shared homologs is statistically most significant relative to the size of the linked modules, and outputs these pairs of conserved modules. To demonstrate its principle, we applied COMODO to study coexpression conservation between two well-studied bacteria, Escherichia coli and Bacillus subtilis. COMODO is available at: http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Information_Zarrineh_2010/comodo/index.html.
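A minimal sketch of scoring module pairs, assuming one-to-one homologs and a hypergeometric tail probability as the significance measure (the abstract does not name the statistic COMODO uses). Modules obtained at several coexpression thresholds can all be passed in as candidates, and the pair with the smallest p value is kept. Function names are hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def module_pair_pvalue(mod_a: Set[str], mod_b: Set[str],
                       homolog: Dict[str, str]) -> float:
    """Significance of shared homologs between module A (species 1) and
    module B (species 2); `homolog` maps species-1 genes to species-2 genes."""
    N = len(homolog)                                   # genes with a homolog
    in_a = [g for g in mod_a if g in homolog]          # draws
    K = sum(1 for g in homolog if homolog[g] in mod_b) # successes
    k = sum(1 for g in in_a if homolog[g] in mod_b)    # observed overlap
    return hypergeom_sf(k, N, K, len(in_a))


def best_pair(modules_a: List[Set[str]], modules_b: List[Set[str]],
              homolog: Dict[str, str]) -> Tuple[int, int, float]:
    """Indices of the most significant module pair and its p value."""
    return min(((i, j, module_pair_pvalue(a, b, homolog))
                for i, a in enumerate(modules_a)
                for j, b in enumerate(modules_b)),
               key=lambda t: t[2])
```

Because the hypergeometric probability accounts for module sizes, a small module pair with complete homolog overlap can outrank a large pair with many but proportionally fewer shared homologs.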

Poster N20

Statistical methods for estimating microRNA expression from targets' expression data

Tauno Metsalu University of Tartu

Raivo Kolde (University of Tartu, Mathematics-informatics department); Jaak Vilo (University of Tartu, Mathematics-informatics department);

Short Abstract: MicroRNAs (miRNAs) are small endogenous RNAs whose main function is to repress the expression of target mRNAs by binding to them. MiRNA binding leads to degradation of target mRNAs, and it has been shown that in some cases miRNA activity can be predicted from target expression data. In this work we study methods to improve the quality of such predictions. The work is based on data derived from simulations, miRNA overexpression experiments, miRNA expression profiling and the HITS-CLIP procedure. We compared different target prediction and statistical algorithms. The results are mixed, since prediction availability and algorithm applicability vary across input datasets. However, TargetScan targets and several gene set enrichment analysis techniques give the most robust results. We also tested whether the methods are more sensitive when looking at the 3' ends of the mRNAs, since these are degraded first. This approach yielded consistently better results and clearly improved prediction quality.
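One simple way to turn target expression into a miRNA activity estimate is a parametric gene set score that compares the mean of the predicted targets with the mean of all genes in the same sample. This sketch assumes log-ratio input and uses a PAGE-style z-score; it is not necessarily one of the algorithms compared in the poster, the target list (for example TargetScan predictions) is supplied by the caller, and the function names are hypothetical.

```python
import math
import statistics
from typing import Dict, Iterable


def target_set_zscore(expr: Dict[str, float], targets: Iterable[str]) -> float:
    """PAGE-style z-score of the mean target value against all genes in one sample."""
    values = list(expr.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    hits = [expr[g] for g in targets if g in expr]
    if not hits or sd == 0:
        return 0.0
    return (statistics.mean(hits) - mu) / (sd / math.sqrt(len(hits)))


def activity_profile(samples: Dict[str, Dict[str, float]],
                     targets: Iterable[str]) -> Dict[str, float]:
    """Inferred activity per sample: targets below background -> higher activity."""
    targets = list(targets)
    return {name: -target_set_zscore(expr, targets) for name, expr in samples.items()}
```

Restricting `expr` to probes near the 3' ends of the mRNAs would be one way to try the 3'-end idea with the same score.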

Poster N21

eQTL weighted gene co-expression network analysis

Yask Gupta Department of Dermatology, Luebeck

Short Abstract: Over the past decade, statistical genetics has increasingly been applied to molecular phenotypes, in particular in animal models. Major data sources are combined analyses of genotype and transcriptome, often accompanied by dozens of disease phenotypes. This led to the development of the field of systems genetics.

This work presents the application of correlation-based gene network analysis to such data to identify putative gene interactions, candidate hub genes and their association with clinical parameters. With appropriate further processing, i.e. the specification of thresholds, this approach has proven very useful in many previous studies.

Expression QTL studies typically include so many samples that correlation analyses can also be performed reliably within subpopulations. When such subpopulations are defined as groups of individuals at very different quantiles of a particular phenotype, and that phenotype develops over the course of the disorder, early observable correlations can be distinguished from late ones. This may give some of the observed connections a direction, i.e. suggest a causal relationship for the development of individual phenotypes.

We present this analysis using the Spatial Preferred Attachment model and compare it with a differential analysis of subpopulations. The work builds on the R package WGCNA and is implemented as an extension to the TIQS interactive QTL system (http://eqtl.berlios.de).
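The unsigned soft-threshold adjacency used by WGCNA is a_ij = |cor(x_i, x_j)|^β, and a gene's connectivity (the sum of its adjacencies) is the usual basis for ranking hub genes. The sketch below computes this adjacency separately in the lowest and highest phenotype quantiles and takes the difference, as a hedged illustration of the subpopulation idea. The quartile cut-offs, β = 6 and all function names are assumptions, and the Spatial Preferred Attachment model is not reproduced.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

Expr = Dict[str, List[float]]  # gene -> expression values, one per individual
Pair = Tuple[str, str]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def soft_adjacency(expr: Expr, beta: int = 6) -> Dict[Pair, float]:
    """Unsigned adjacency |cor|**beta for every gene pair (g < h)."""
    genes = sorted(expr)
    return {(g, h): abs(pearson(expr[g], expr[h])) ** beta
            for i, g in enumerate(genes) for h in genes[i + 1:]}


def connectivity(adj: Dict[Pair, float]) -> Dict[str, float]:
    """Sum of adjacencies per gene; high values suggest hub genes."""
    k: Dict[str, float] = {}
    for (g, h), a in adj.items():
        k[g] = k.get(g, 0.0) + a
        k[h] = k.get(h, 0.0) + a
    return k


def quantile_groups(phenotype: Sequence[float], low: float = 0.25,
                    high: float = 0.75) -> Tuple[List[int], List[int]]:
    """Indices of individuals in the bottom and top phenotype quantiles."""
    ranked = sorted(range(len(phenotype)), key=phenotype.__getitem__)
    n = len(ranked)
    return ranked[: int(n * low)], ranked[int(n * high):]


def differential_adjacency(expr: Expr, phenotype: Sequence[float],
                           beta: int = 6) -> Dict[Pair, float]:
    """Adjacency in the high-phenotype group minus the low-phenotype group."""
    lo, hi = quantile_groups(phenotype)
    a_lo = soft_adjacency({g: [v[i] for i in lo] for g, v in expr.items()}, beta)
    a_hi = soft_adjacency({g: [v[i] for i in hi] for g, v in expr.items()}, beta)
    return {pair: a_hi[pair] - a_lo[pair] for pair in a_lo}
```

A positive difference marks a connection that is stronger among individuals with high phenotype values; under the assumption that high values correspond to later disease stages, it would be a candidate "late" correlation.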

Poster N22

Applications of Extreme Value Theory in Estimating Gene Expression Relevance

Steven Eschrich Moffitt Cancer Center

Gregory Bloom (Moffitt Cancer Center, Biomedical Informatics); Gang Han (Moffitt Cancer Center, Biostatistics); Andrew Hoerter (Moffitt Cancer Center, Biomedical Informatics); Neera Bhansali (Moffitt Cancer Center, Biomedical Informatics); Matthew Schabath (Moffitt Cancer Center, Cancer Epidemiology); David Fenstermacher (Moffitt Cancer Center, Biomedical Informatics);

Short Abstract: False discovery rate (FDR) estimation corrects the statistical significance of gene expression tests when multiple testing issues arise. The Bonferroni correction does not consider the distribution of findings, while FDR incorporates this information but does not provide comparative information on the significance of findings. We developed a comparative measure based on extreme value theory. The probability that a p value would be ranked first in the experiment is calculated from the empirical distribution function of minimum p values from a simulated null distribution. This measure provides an interpretable probability based on the idea of the “best” finding within the experiment. The technique also measures the separability of the groups being analyzed relative to this simulated distribution. Experiments were performed on a publicly available lung cancer gene expression dataset for several dichotomous outcomes, including gender and clinical stage. An extreme value distribution was constructed from the smallest p values calculated from 1000 simulated datasets. The simulated datasets consisted of uniform random values, and p values were computed with the t test. For gender, 50 probesets were significant (p<0.001, FDR q value < .1%) with 95% probability of being the smallest p value in the dataset (compared to the null distribution). 242 probesets were significant (p<0.00029, q<1.9%) with non-zero probability. The smallest p value (2.68e-111) had 100% probability under this measure, indicating, as expected, that gender has a large effect on expression. These results suggest that extreme value theory provides a method for comparing findings both within and between gene expression analyses.
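The measure can be sketched as follows, under our reading that the probability for an observed p value is the share of simulated null minima that exceed it (one minus the empirical distribution function of the minima). To stay within the standard library, the t test p value is approximated with a normal distribution, which is an assumption of this sketch rather than the poster's procedure; all names are hypothetical.

```python
import bisect
import random
import statistics
from statistics import NormalDist
from typing import List, Sequence


def approx_t_pvalue(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided p value of a Welch t statistic, approximating the t
    distribution by a standard normal (a simplification for this sketch)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return 2.0 * NormalDist().cdf(-abs(t))


def null_minimum_pvalues(n_sets: int, n_probes: int, n_a: int, n_b: int,
                         seed: int = 0) -> List[float]:
    """Smallest p value in each of `n_sets` simulated datasets of uniform noise."""
    rng = random.Random(seed)
    minima = []
    for _ in range(n_sets):
        smallest = 1.0
        for _ in range(n_probes):
            a = [rng.random() for _ in range(n_a)]
            b = [rng.random() for _ in range(n_b)]
            smallest = min(smallest, approx_t_pvalue(a, b))
        minima.append(smallest)
    return sorted(minima)


def prob_best(p: float, sorted_null_minima: List[float]) -> float:
    """1 - ECDF_min(p): share of simulated null minima that are larger than p."""
    n_at_or_below = bisect.bisect_right(sorted_null_minima, p)
    return 1.0 - n_at_or_below / len(sorted_null_minima)
```

For the lung cancer data this would correspond to `n_sets=1000` with the real number of probesets and group sizes; pure Python is slow at that scale, so a vectorized implementation would be preferable there.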

Poster N23

Improving biomarker list stability by integration of biological knowledge in the learning process

Tiziana Sanavia University of Padova

Fabio Aiolli (University of Padova, Department of Pure and Applied Mathematics); Giovanni Da San Martino (University of Padova, Department of Pure and Applied Mathematics); Andrea Bisognin (University of Padova, Department of Biology); Barbara Di Camillo (University of Padova, Department of Information Engineering);

Short Abstract: This poster is based on Proceedings Submission 120.

Motivation

The identification of robust molecular biomarker lists related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for biomarker discovery using microarray data often provide results with limited overlap. A possible approach to improve list stability is to integrate biological information from genomic databases in the learning process. This work aims to investigate the effect of the integration of different biological information in the learning process on biomarker list stability.

Methods

Biological information is codified into similarity matrices and the feature space is transformed such that the more similar two features are, the more closely they are mapped. Eight specific pair-wise gene similarity measures are considered: semantic similarities for the annotations on Gene Ontology (GO), topology-based similarity measures for protein-protein interactions and correlation for gene expression data. A more stable ad-hoc variant of the Bayes Point Machine (Herbrich et al., 2001) is used. Results are evaluated in terms of both feature ranking stability by the Canberra distance and prediction accuracy.
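The Canberra distance between two rankings of the same features sums |r_a - r_b| / (r_a + r_b) over all features, so rank changes near the top of the lists weigh more than changes further down. A basic version is sketched below; stability studies often use a normalized or top-k truncated variant, and the exact variant used here is not stated in the abstract. Function names are hypothetical.

```python
from typing import List, Sequence


def canberra_rank_distance(list_a: Sequence[str], list_b: Sequence[str]) -> float:
    """Canberra distance between two rankings (1-based ranks) of the same features."""
    rank_a = {f: i + 1 for i, f in enumerate(list_a)}
    rank_b = {f: i + 1 for i, f in enumerate(list_b)}
    if set(rank_a) != set(rank_b):
        raise ValueError("both lists must rank the same features")
    return sum(abs(rank_a[f] - rank_b[f]) / (rank_a[f] + rank_b[f]) for f in rank_a)


def mean_pairwise_canberra(lists: List[Sequence[str]]) -> float:
    """Average distance over all pairs of biomarker lists (lower = more stable)."""
    pairs = [(i, j) for i in range(len(lists)) for j in range(i + 1, len(lists))]
    return sum(canberra_rank_distance(lists[i], lists[j]) for i, j in pairs) / len(pairs)
```

For example, swapping the top two of three features gives 2/3, while swapping the bottom two gives only 0.4.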

Results

Our approach has been tested on three publicly available breast cancer datasets. All types of biological information decrease the average Canberra distance between the biomarker lists compared with the standard classification approach. In particular, matrices obtained from semantic similarity measures on GO annotations and from the normalized geodesic distance on protein-protein interactions perform best at improving list stability (the Canberra distance decreases by 45%), while maintaining almost equal prediction accuracy (variations between -4% and +1%).

Poster N24

Simulation strategies to evaluate the performance of pathway-based gene expression analysis methods

Shailesh Tripathi Queen's University Belfast

Ricardo de Matos Simoes (Queen's university, Belfast , Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences); Frank Emmert-Streib (Queen's university, Belfast , Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences);

Short Abstract: The performance of pathway-based gene expression analysis methods needs to be analyzed carefully to obtain a clear understanding of their power. A variety of strategies can be used to simulate gene expression data for studying expression changes in pathways. In our study we develop a framework that considers important factors such as pathway size, sample size, the fraction of differentially expressed genes in a pathway (detection call) and the correlation structure of the data. We generate different types of simulated expression data using a multivariate normal model and a Gaussian graphical model; the latter allows simulated data to be generated with a more realistic correlation structure. The simulation framework is used to evaluate three pathway-based methods that define different null hypotheses of pathway expression change. We study the influence of sample size for pathways of 5 to 200 genes and model differentially expressed pathways by varying the detection call ratio of each pathway. Our study shows that the power of the individual pathway-based methods is affected to different extents by sample size, pathway size, detection call and the degree of correlation in the gene expression data. The presented simulation framework may be helpful for the future development of null hypotheses that consider changes in the gene interaction structure of pathways, in order to detect pathological pathways in complex diseases.
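A hedged sketch of the multivariate normal part of such a framework, assuming an equicorrelated structure: adding a shared per-sample factor to independent noise gives every gene pair correlation rho with unit variance. The detection call sets the fraction of pathway genes shifted by `effect` in the case group. The Gaussian graphical model variant is not shown, and all names are hypothetical.

```python
import random
from typing import List, Tuple

Sample = List[float]


def simulate_pathway(n_genes: int, n_per_group: int, rho: float,
                     detection_call: float, effect: float,
                     seed: int = 0) -> Tuple[List[Sample], List[Sample], List[int]]:
    """Return (control, case, de_genes); each sample is a list of n_genes values.

    x_g = offset_g + sqrt(rho) * z + sqrt(1 - rho) * e_g, with z shared by all
    genes of a sample and e_g independent, so corr(x_g, x_h) = rho for g != h.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this construction")
    rng = random.Random(seed)
    n_de = round(detection_call * n_genes)
    shift = [effect if g < n_de else 0.0 for g in range(n_genes)]

    def draw(offsets: List[float]) -> Sample:
        z = rng.gauss(0.0, 1.0)  # factor shared by all genes of this sample
        return [off + rho ** 0.5 * z + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
                for off in offsets]

    control = [draw([0.0] * n_genes) for _ in range(n_per_group)]
    case = [draw(shift) for _ in range(n_per_group)]
    return control, case, list(range(n_de))
```

Repeating such draws over a grid of pathway sizes, sample sizes, detection calls and rho values, and counting how often each pathway method rejects its null hypothesis, gives empirical power curves of the kind the study describes.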

Poster N25

Signaling Pathway Coupling Phenomena

Michele Donato Wayne State University

Valentina Pedoia (University of Insubria, DICOM); Sorin Draghici (Wayne State University, Computer Science);

Short Abstract: Three radically different approaches are currently available to identify the signaling pathways that are significantly impacted in a given condition: enrichment analysis, functional class scoring, and impact analysis. All these approaches calculate a p-value intended to quantify the significance of the involvement of a given pathway in the condition under study. These p-values were thought to be inversely related to the likelihood of their respective pathways being involved in the given condition, and hence to be independent of one another. Here we show that pathways can affect each other's p-values in significant ways. Thus, the significance of a given pathway in a given experiment has to be interpreted in the context of the other pathways that appear to be significant. In certain circumstances, pathways previously found to be significant with some of the existing methods may not be so.
We hypothesize that this phenomenon is related to the number of genes shared between pathways. Here we present results obtained by analyzing pathways from the KEGG signaling pathways database.
We also present a factorization of the p-value of a pathway that expresses it in terms of the p-value of its overlap with another pathway. With these results, we assess the importance of the part shared by two pathways relative to the importance of their non-overlapping parts, in order to determine a cause-effect relation between the two pathways.
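For the enrichment-analysis case, a hypergeometric over-representation test illustrates how an overlap can drive significance: pathway B may appear significant only because of the differentially expressed genes it shares with pathway A. The sketch compares B's p value with that of B after removing the shared genes. It does not implement the factorization presented in the poster, and the function names are hypothetical.

```python
from math import comb
from typing import Dict, Set


def ora_pvalue(pathway: Set[str], de: Set[str], universe: Set[str]) -> float:
    """Over-representation p value P(X >= k), X ~ Hypergeometric(|U|, |DE|, |pathway|)."""
    pathway = pathway & universe
    N, K, n = len(universe), len(de & universe), len(pathway)
    k = len(pathway & de)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def overlap_report(a: Set[str], b: Set[str], de: Set[str],
                   universe: Set[str]) -> Dict[str, float]:
    """p values for A, B, B without the genes shared with A, and the shared part."""
    shared = a & b
    return {
        "p_A": ora_pvalue(a, de, universe),
        "p_B": ora_pvalue(b, de, universe),
        "p_B_without_A": ora_pvalue(b - shared, de, universe),
        "p_shared": ora_pvalue(shared, de, universe),
    }
```

If `p_B` is small but `p_B_without_A` is not, B's apparent significance comes from its overlap with A, which is the kind of coupling the poster describes.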

Poster N26

Discovering evolution, biogenesis, expression and target predictions of porcine micro-RNAs: new regulatory gene expression networks in different tissues.

Paolo Martini University of Padova

Stefano Cagnin (University of Padova, CRIBI); Gabriele Sales (University of Padova, Statistical Science); A. Gandaglia (University of Padova, Scienze Biomediche Sperimentali); F. Naso (University of Padova, Scienze Biomediche Sperimentali); Mattia Brugiolo (University of Padova, CRIBI); Cristiano De Pitta' (University of Padova, CRIBI); G. Gerosa (University of Padova, Scienze Cardiologiche); M. Spina (University of Padova, Scienze Biomediche Sperimentali); Chiara Romualdi (University of Padova, Biology); Gerolamo Lanfranchi (University of Padova, CRIBI);

Short Abstract: The pig is used as a model organism in cardiovascular research, but the knowledge gap between it and the two most widely used model organisms for human health (mouse and rat) is far from being filled.
Currently, more than 90% of the porcine genome has been sequenced, providing descriptions of 21,567 Ensembl transcripts, 51,576 UniGene clusters and only 255 miRNAs.
We developed a new computational approach to predict species-specific and conserved miRNAs, which were experimentally confirmed by a modified RNA-primed Array-based Klenow Extension (RAKE) assay and specific PCR reactions. This allowed the identification of 489 conserved and 1,178 novel pig-specific miRNAs, bringing the total of confirmed miRNAs to 1,667. The systematic analysis of pre-miRNAs showed, for the first time, that most of the primary transcripts of intergenic miRNAs are transcribed as 3.5 kb long transcripts, while a small proportion are longer (up to 10 kb).
MicroRNA expression was then measured in 14 different tissues using the RAKE technology and correlated with the expression of mRNAs from the same tissues. The identification of miRNA targets showed that miRNAs are important for maintaining normal tissue function and that miRNA isoforms are biologically meaningful. Specifically, genes involved in skeletal muscle development, insulin-dependent processes and calcium homeostasis appear to be under the control of many miRNAs in skeletal muscle, while miRNAs expressed in the myocardium seem to regulate the ubiquitin proteasome system.
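Correlating miRNA and mRNA profiles across the 14 tissues can be sketched as a simple anticorrelation filter over predicted targets, since miRNA repression is expected to lower target levels where the miRNA is abundant. The -0.5 threshold and the function names are assumptions; the abstract does not describe the exact criterion used.

```python
import statistics
from typing import Dict, Iterable, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def anticorrelated_targets(mirna_profile: Sequence[float],
                           mrna_profiles: Dict[str, Sequence[float]],
                           candidates: Iterable[str],
                           threshold: float = -0.5) -> List[str]:
    """Predicted targets whose tissue profile is negatively correlated with the
    miRNA profile, most negative first. Profiles share the same tissue order."""
    scored = {g: pearson(mirna_profile, mrna_profiles[g])
              for g in candidates if g in mrna_profiles}
    return sorted((g for g, r in scored.items() if r <= threshold), key=scored.get)
```

Candidates lacking an mRNA profile are skipped rather than treated as evidence either way.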

Poster N27

Alternative Splicing in Malignant Lymphoma

Karin Zimmermann Humboldt Universität zu Berlin

Marcel Jentsch (Humboldt Universität zu Berlin, Department of Computer Science); Dido Lenze (Charité-Universitätsmedizin Berlin , Institute of Pathology); Michael Hummel (Charité-Universitätsmedizin Berlin, Institute of Pathology); Ulf Leser (Humboldt Universität zu Berlin, Department of Computer Science);

Short Abstract: Alternative splicing (AS) is known to contribute greatly to the variety of the eukaryotic transcriptome. Although much of this variation can be attributed to the required diversity of biological processes, alternative splicing also plays an important role in many diseases, first and foremost cancer. Technologies such as RNA-seq or exon arrays aim to elucidate the degree to which aberrant isoforms are present, and thus provide a basis for studying the mechanisms that underlie, or are caused by, these perturbations.

We report a study of the role of alternative splicing in 7 different lymphoma subtypes, based on a data set of 116 exon arrays. To this end, we performed an exhaustive evaluation of different algorithms for detecting differential splicing from exon array datasets. Based on the resulting choice of algorithm, we investigated the different lymphoma subtypes according to their exon expression profiles.
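One widely used exon-array measure of differential splicing is the splicing index: each exon's log2 signal is normalized to its gene-level log2 signal, and the mean normalized intensities of two groups are subtracted. The sketch below is offered as an example of this class of methods, not as the algorithm selected in the study; the accompanying Welch t statistic and the function names are assumptions.

```python
import statistics
from typing import Dict, List, Sequence


def normalized_intensity(exon_log2: Sequence[float],
                         gene_log2: Sequence[float]) -> List[float]:
    """Per-sample exon signal relative to its gene-level signal (log2 scale)."""
    return [e - g for e, g in zip(exon_log2, gene_log2)]


def splicing_index(exon_a: Sequence[float], gene_a: Sequence[float],
                   exon_b: Sequence[float], gene_b: Sequence[float]) -> Dict[str, float]:
    """SI = mean NI(subtype A) - mean NI(subtype B), plus a Welch t statistic."""
    ni_a = normalized_intensity(exon_a, gene_a)
    ni_b = normalized_intensity(exon_b, gene_b)
    si = statistics.mean(ni_a) - statistics.mean(ni_b)
    se = (statistics.variance(ni_a) / len(ni_a)
          + statistics.variance(ni_b) / len(ni_b)) ** 0.5
    return {"si": si, "t": si / se if se > 0 else float("nan")}
```

An SI of -1 means the exon is, relative to its gene, included about half as much in subtype A as in subtype B.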

Poster N28

Pathway states as a robust way to characterize a sample

Raivo Kolde University of Tartu

Priit Adler (University of Tartu, Institute of Molecular and Cellular Biology); Jaak Vilo (University of Tartu, Institute of Computer Science);

Short Abstract: Pathways are a very useful way to conceptualize the processes in a cell. They are often used to characterize the state of a cell or changes in it, usually by looking at the enrichment of pathway genes. Although this indicates the importance of a pathway, it does not reveal what type of change happened to the pathway. Gene expression compendia containing a vast variety of biological conditions and cell types offer a unique way to characterize the behaviour of pathways as a whole. Theoretically, the genes of a pathway can take an exponential number of possible expression configurations. However, observations on actual expression data show that the number of distinct gene expression patterns for a pathway can be rather small. The aim of this work is to robustly extract these pathway "steady states" from large collections of gene expression data. Biological interpretations can be associated with these states using sample annotations. To characterize a new biological sample, we can find the corresponding steady state for every pathway and use these states as features in subsequent analyses. This type of characterization has the advantages of being easily interpretable and stable.
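The abstract does not specify how steady states are extracted. One hedged sketch: call each pathway gene in each sample as down, unchanged or up from its z-score, count the resulting patterns across the compendium, keep patterns that cover at least a minimum fraction of samples, and assign a new sample to the kept state with the smallest Hamming distance. The 1-standard-deviation cut-off, the minimum fraction and all names are assumptions.

```python
import statistics
from collections import Counter
from typing import List, Sequence, Tuple

State = Tuple[int, ...]


def _call(v: float, m: float, s: float, cutoff: float) -> int:
    """-1 / 0 / +1 call for one gene from its z-score."""
    if s == 0:
        return 0
    z = (v - m) / s
    return 1 if z > cutoff else (-1 if z < -cutoff else 0)


def pathway_states(samples: Sequence[Sequence[float]], min_fraction: float = 0.1,
                   cutoff: float = 1.0) -> Tuple[List[State], List[float], List[float]]:
    """samples: one vector per sample, restricted to the genes of one pathway."""
    genes = list(zip(*samples))
    mu = [statistics.mean(g) for g in genes]
    sd = [statistics.pstdev(g) for g in genes]
    patterns = Counter(tuple(_call(v, m, s, cutoff) for v, m, s in zip(x, mu, sd))
                       for x in samples)
    states = [p for p, count in patterns.most_common()
              if count / len(samples) >= min_fraction]
    return states, mu, sd


def assign_state(sample: Sequence[float], states: List[State], mu: List[float],
                 sd: List[float], cutoff: float = 1.0) -> State:
    """Nearest kept state (Hamming distance) for a new sample."""
    pattern = tuple(_call(v, m, s, cutoff) for v, m, s in zip(sample, mu, sd))
    return min(states, key=lambda st: sum(a != b for a, b in zip(pattern, st)))
```

The assigned state of each pathway can then serve as one discrete feature of the new sample.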

Poster N29

Flanking sequence effects on oligonucleotide hybridization

D. Andrew Carr Accelerated Technology Laboratories

Jennifer Weller (University North Carolina Charlotte, Department of Bioinformatics and Genomics); Saeed Khoshnevis (University North Carolina Charlotte, Department of Bioinformatics and Genomics); Donald Kolva (Accelerated Technology Laboratories, Bioinformatics);

Short Abstract: Microarray platforms measure both sequence content and sequence abundance by means of oligonucleotide hybridization. Computational probe design focuses on short regions of high sequence similarity, most often between 25 and 50 base pairs in length. A variety of pattern-matching algorithms specific to the probe length are used to determine the uniqueness of a target against its background, and some of these treat internal structure of the probe as a contraindicator of successful hybridization. Because the target structure is assumed to mirror that of the probe, the flanking sequences are not considered. In this study, we begin to examine the effects of total target sequence length under common hybridization assay conditions. There are two considerations: because the single-stranded backbone is flexible, loops and bubbles can still allow stable hybridization between species with less than perfect sequence homology; and the flanking sequences may loop back and occupy or obstruct the intended duplex region. Selecting a small subset of 33-mer probes from the Affymetrix SNP6.0 Array, we generated a pool of potentially cross-hybridizing sequence regions using the SeqNFind™ platform. Pools of length-variant targets centered on the potentially cross-hybridizing sequences were generated from human reference genome build 36.3 and examined computationally using OMP™ to calculate the affinity constants. To demonstrate that the modeled properties affect measurements, a small subset of these targets and probes was tested on microarrays.

Poster N30

Elimination of spatial autocorrelation in microarray data

Philippe Serhal Université de Montréal

Sébastien Lemieux (Université de Montréal)

Short Abstract: Oligonucleotide microarrays allow high-throughput gene expression profiling, usually across multiple biological samples in order to identify significant differences. Assessing biologically interesting variation is significantly hampered by several sources of obscuring, non-biological variation, such as RNA purity and amplification efficiency, hybridization efficiency and spatial nonuniformity, scanner calibration, localized spatial artefacts (e.g. smears and debris), and algorithmic processing of raw image data. Random error from these sources is not a serious problem because it can be averaged out by replication at both the probe and array levels; systematic errors, on the other hand, are not removed by replication and must be addressed by other means. Current analysis pipelines address this indirectly by transforming the data so that all arrays and all gene expression measures in an experiment become more directly comparable, inevitably discarding information in the process. Identifying, quantifying, and correcting systematic errors requires an independent source of information, which is not easily obtained. We posit that the expected mutual information shared by probes targeting a common transcript, together with the expected independence of probes grouped by some other criterion, can serve as a proxy for such a source. As a concrete example, we show how we applied this idea to quantify and correct spatial autocorrelation in a set of Affymetrix U133A arrays from GEO, and how this improves the reliability of expression measures. We then briefly formalize the general framework and propose further applications.
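To check whether spatial autocorrelation is present, one can compute each probe's residual relative to the other probes of the same transcript (here, the median of its probe set) and measure the spatial autocorrelation of these residuals on the array grid with Moran's I. Both choices are assumptions made for illustration: Moran's I is a standard statistic, not necessarily the one used in the poster, and the mutual-information framework itself is not implemented. Function names are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column) position on the array


def probe_residuals(log_int: Dict[Cell, float],
                    probeset_of: Dict[Cell, str]) -> Dict[Cell, float]:
    """Log intensity minus the median of the probe's own probe set."""
    groups: Dict[str, List[float]] = {}
    for cell, ps in probeset_of.items():
        groups.setdefault(ps, []).append(log_int[cell])
    med = {ps: statistics.median(v) for ps, v in groups.items()}
    return {cell: log_int[cell] - med[probeset_of[cell]] for cell in probeset_of}


def morans_i(values: Dict[Cell, float]) -> float:
    """Moran's I with binary rook (4-neighbour) weights:
    I = (n / W) * sum_ij w_ij d_i d_j / sum_i d_i^2, with d = value - mean."""
    mean = statistics.mean(values.values())
    dev = {c: v - mean for c, v in values.items()}
    num, w_sum = 0.0, 0
    for (r, c), d in dev.items():
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in dev:
                num += d * dev[nb]
                w_sum += 1
    den = sum(d * d for d in dev.values())
    if den == 0 or w_sum == 0:
        raise ValueError("Moran's I is undefined for constant values or no neighbours")
    return (len(dev) / w_sum) * (num / den)
```

Values near 0 indicate no spatial structure in the residuals, while clearly positive values point to spatially smooth artefacts such as gradients.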

Poster N31

A Computational Study of Carotenoid Gene Expression using Micro-array Data

Firdous Khan University of the Western Cape

Samson Muyanga (University of the Western Cape, SANBI); Alan Christoffels (University of the Western Cape, SANBI);

Short Abstract: This poster is based on Proceedings Submission for ISMB/ECCB 2011.

Carotenoids are organic compounds found in plastids of plants, fungi, algae and one form of aphid. Of the 600 known carotenoids, only beta-carotene, alpha-carotene and astaxanthin can be converted to vitamin A in the human body. Vitamin A deficiency can lead to blindness, especially in children under the age of five years, and weakens the immune system [1,2]. The aims of the project include discovering the regulatory network of carotenogenesis by querying microarray data for the carotenoid biosynthetic pathway genes and genes co-expressed with them under various sets of stimuli in wild-type and selected mutants of Arabidopsis thaliana. Previous studies have shown that growth conditions, such as light [1], play a major role in the biosynthesis of carotenoids [3,4]. A systems biology approach will be used to analyse microarray data sets derived from Arabidopsis thaliana grown under various conditions and treated with different stimuli. The expected outcome is to identify and record putative conditions that influence gene expression, and in turn to monitor increases in pro-vitamin A production in A. thaliana by identifying molecular switches that can be modulated to control the biosynthetic pathway.

Poster N32

A survey of batch effects in public gene expression data

Paul Pavlidis University of British Columbia

Raymond Lim (University of British Columbia, Psychiatry / Centre for High-Throughput Biology); Jesse Gillis (University of British Columbia, Psychiatry / Centre for High-Throughput Biology); Willie Kwok (University of British Columbia, Psychiatry / Centre for High-Throughput Biology);

Short Abstract: We present the first large-scale analysis of batch effects in publicly-available genomics data sets. We considered over 1000 microarray data sets, using scan date stamps in raw data files to group samples into batches.

We applied PCA to identify data components correlated with batch groupings. Over 400 of the data sets had strong relationships between batch and the first principal component. Many had experimental designs that confounded batch with experimental variables of interest. We then tested the effect of batches on differential expression analysis in a subset of data sets that were initially deemed “correctable” and had additional covariates that could be examined. Of these, ~75% have probes showing differential expression with respect to batch at a false discovery rate of 0.01. Removing batch effects using “ComBat” (Johnson et al., [2007], Biostatistics 8:118-127) and then reanalyzing each data set with respect to the remaining covariates yielded, on average, more differentially expressed probes than before correction. We will also report on the impact of these effects on coexpression analysis and on functional interpretation of the data.
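The PCA check can be sketched in plain Python: compute each sample's score on the first principal component by power iteration, then measure how much of that component's variance is explained by the batch labels (between-batch sum of squares over total sum of squares). The R² criterion and the function names are assumptions, since the abstract does not state how a "strong relationship" was defined, and ComBat itself is not reproduced.

```python
import random
import statistics
from typing import Dict, List, Sequence


def pc1_scores(data: List[List[float]], iters: int = 200, seed: int = 0) -> List[float]:
    """Sample scores on the first principal component.

    data: samples x genes. Genes are mean-centred, and the leading eigenvector
    of X^T X is found by power iteration."""
    n_genes = len(data[0])
    means = [statistics.mean(col) for col in zip(*data)]
    x = [[v - m for v, m in zip(row, means)] for row in data]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    for _ in range(iters):
        xv = [sum(a * b for a, b in zip(row, v)) for row in x]            # X v
        w = [sum(x[i][j] * xv[i] for i in range(len(x))) for j in range(n_genes)]  # X^T X v
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0:
            break
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in x]


def batch_r2(scores: Sequence[float], batches: Sequence[str]) -> float:
    """Fraction of PC1 variance explained by batch (between-group SS / total SS)."""
    grand = statistics.mean(scores)
    total = sum((s - grand) ** 2 for s in scores)
    groups: Dict[str, List[float]] = {}
    for s, b in zip(scores, batches):
        groups.setdefault(b, []).append(s)
    between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups.values())
    return between / total if total > 0 else 0.0
```

The sign of a principal component is arbitrary, which is why the batch association is measured with a variance ratio rather than a signed difference.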

Our findings agree with recent suggestions that many published genomics studies are adversely affected by undocumented technical artifacts. It is clear that a failure to account for these effects would dramatically affect many potential re-uses of the data. The quality control measures described above are integrated into the “Gemma” genomics data meta-analysis system (www.chibi.ubc.ca/Gemma) and work is underway to routinely include corrective measures in Gemma pipelines.

Poster N33

The EXPANDER suite for accessible analysis of microarray data

Hershel Safer Tel Aviv University

Adi Maron-Katz (Tel Aviv University, School of Computer Science); Ran Elkon (Tel Aviv University, Faculty of Medicine); Igor Ulitsky (MIT, Whitehead Institute); Chaim Linhart (Tel Aviv University, School of Computer Science); Amos Tanay (Weizmann Institute, Department of Computer Science and Applied Mathematics); Roded Sharan (Tel Aviv University, School of Computer Science); Eyal David (Tel Aviv University, School of Computer Science); Dorit Sagir (Tel Aviv University, School of Computer Science); Yosef Shiloh (Tel Aviv University, Faculty of Medicine); Ron Shamir (Tel Aviv University, School of Computer Science);

Short Abstract: As the throughput of data generation has increased, so has the complexity of analyzing the data. Relevant analysis tools must be selected, and the output of each must be transformed for use in subsequent tools. Analyzing the results of high-throughput experiments is now beyond the capabilities of the typical biologist who performs the experiment.

EXPANDER (EXpression Analyzer and DisplayER) provides an integrated environment for analyzing the results of DNA microarray experiments. It incorporates all the tools needed for a typical end-to-end analysis of gene expression profiles. Its easy-to-use interface is accessible to biologists so that they can analyze their own data. The biologist can focus on selecting the analyses rather than worrying about the technicalities of operating the tools.

EXPANDER reads the common file formats of microarray data and then manages the raw data and analytical results. It handles data preprocessing and normalization, identification of differentially expressed genes, and clustering and biclustering.

EXPANDER also incorporates advanced tools for downstream analyses: recognizing enriched GO categories (Tango), identifying enriched cis-regulatory motifs (Prima), predicting human microRNA function and activity (Fame), reconstructing functional modules in a protein interactions network (Matisse), integrated analysis using highly curated signaling pathways (Spike), and more. It supports analysis of 14 species, ranging from human to E. coli.

The EXPANDER toolset has been downloaded more than 10,000 times and has been cited in more than 200 studies that cover a wide range of applications.

This work was supported by the Trireme and Apo-Sys grants from the EU.

Poster N34

SAD: a novel approach for the analysis of microbial transcripts with tiling arrays

Silvia Bottini Novartis Vaccines and Diagnostics srl

Antonello Covacci (Novartis Vaccines and Diagnostics srl, Research Center); Claudio Donati (Novartis Vaccines and Diagnostics srl, Research Center); Alessandro Muzzi (Novartis Vaccines and Diagnostics srl, Research Center);

Short Abstract: High-density DNA tiling microarrays are a powerful tool for conducting unbiased genome-wide studies. In particular, when applied to the analysis of RNA expression signals in microbial organisms, they reveal comprehensive transcriptomic information of the target organism including all coding and non-coding portions of the genome. Current methods for determining these RNA transcription units are still underdeveloped. Existing data processing software for traditional microarrays cannot be used since the considerably larger size and the different nature of tiling array data require a new analysis strategy.
Here we present SAD, a novel approach in the framework of proximity-based heuristic methods. Before segmentation, SAD smooths the data by computing the pseudomedian of signal intensity within sliding windows. Signal areas are then determined by moving a sliding window along the chromosomal axis and measuring the evidence for the presence of transcripts with a scan statistic at each window step. The method has been applied to several experiments and gives promising results for estimating the number and location of transcribed regions in the genome.
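The two SAD steps described above can be sketched as follows. The pseudomedian (Hodges-Lehmann estimator) is the median of all pairwise Walsh averages (x_i + x_j) / 2 with i <= j, which is robust to isolated outlier probes. The scan statistic shown is a simplified assumption: the number of smoothed probes above a threshold in each window, with qualifying windows merged into regions. Window sizes, thresholds and function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def pseudomedian(values: Sequence[float]) -> float:
    """Hodges-Lehmann estimator: median of all Walsh averages (x_i + x_j)/2, i <= j."""
    n = len(values)
    return statistics.median((values[i] + values[j]) / 2
                             for i in range(n) for j in range(i, n))


def smooth(signal: Sequence[float], half_window: int = 2) -> List[float]:
    """Pseudomedian of each probe's window along the chromosome (truncated at ends)."""
    return [pseudomedian(signal[max(0, i - half_window): i + half_window + 1])
            for i in range(len(signal))]


def scan_regions(smoothed: Sequence[float], threshold: float, window: int = 5,
                 min_hits: int = 4) -> List[Tuple[int, int]]:
    """Windows with at least `min_hits` probes above `threshold`, merged into
    half-open probe index regions [start, end)."""
    regions: List[Tuple[int, int]] = []
    for start in range(max(len(smoothed) - window + 1, 0)):
        hits = sum(v > threshold for v in smoothed[start:start + window])
        if hits >= min_hits:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)   # extend the current region
            else:
                regions.append((start, end))
    return regions
```

On a signal of 10 background probes, 10 probes at a higher level and 10 background probes, the smoothed scan recovers a single region covering the elevated stretch.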

Poster N35

From hybridization theory to microarray data analysis: performance analysis

Fabrice Berger Katholieke Universiteit Leuven

Enrico Carlon (Katholieke Universiteit Leuven, Institute for Theoretical Physics);

Short Abstract: Motivation: High-density oligonucleotide array manufacturers assume that probe intensities are linearly correlated with the concentration of a target sequence (exon, transcript, gene). Molecular biologists routinely design PCR primers according to sequence composition (GC content), so that the 3' and 5' primers hybridize/melt at the same temperature, whatever the length of the oligonucleotides. Affymetrix GeneChip probes, by contrast, are 25-mer oligonucleotides regardless of sequence composition. Because the hybridization free energy depends on the probe sequence, different probes are not equally able to detect the same transcript. Focusing on Affymetrix expression arrays, Carlon E. & Heim T. showed that sequence-dependent estimates of hybridization free energies, together with probe intensities, can be used to predict target concentrations by means of the extended Langmuir isotherm.
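For orientation, the basic Langmuir isotherm relates probe intensity I to target concentration c as I = A K c / (1 + K c) + b, with K = exp(-ΔG / RT), where ΔG is the probe-target hybridization free energy, A the saturation intensity and b the background. The sketch below evaluates and inverts this simple form. The extended isotherm of Carlon and Heim adds further terms that are not reproduced here, and the 45 °C temperature, parameter values and function names are assumptions; c is in whatever units make K c dimensionless.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def binding_constant(delta_g: float, temp_k: float) -> float:
    """K = exp(-dG / RT), dG in kcal/mol (negative for a stable duplex)."""
    return math.exp(-delta_g / (R_KCAL * temp_k))


def langmuir_intensity(c: float, delta_g: float, amplitude: float,
                       temp_k: float = 318.15, background: float = 0.0) -> float:
    """Basic Langmuir isotherm I = A*K*c / (1 + K*c) + b."""
    k = binding_constant(delta_g, temp_k)
    return amplitude * k * c / (1.0 + k * c) + background


def concentration_from_intensity(intensity: float, delta_g: float, amplitude: float,
                                 temp_k: float = 318.15, background: float = 0.0) -> float:
    """Invert the isotherm: c = s / (K * (A - s)), with s = I - b."""
    s = intensity - background
    if not 0.0 <= s < amplitude:
        raise ValueError("signal must lie in [background, background + amplitude)")
    k = binding_constant(delta_g, temp_k)
    return s / (k * (amplitude - s))
```

Two probes with different ΔG give different intensities for the same c, but after inversion with their own ΔG they should yield the same concentration estimate, which is the property the approach exploits.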

Results: Latin-square and tissue datasets provided by Affymetrix were used to assess the performance of differential expression analysis when concentrations are computed from hybridization theory. With background noise ignored, the results outperform the usual intensity-based statistical preprocessing methods, including GC-RMA, when the analysis is performed with variance-stabilizing variants of Student's t-test.
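One common variance-stabilizing variant of the t-test shrinks each gene's variance toward a common prior value, as in empirical Bayes moderated t-statistics. The abstract does not say which variant was used; the sketch below assumes this form, with fixed, hypothetical hyperparameters `d0` (prior degrees of freedom) and `s0_sq` (prior variance), whereas empirical Bayes methods estimate them from all genes.

```python
import numpy as np

def moderated_t(x, y, d0=4.0, s0_sq=0.05):
    """Two-group t-statistic per gene with a shrunken (moderated) variance.

    x, y: arrays of shape (genes, replicates) for the two conditions.
    d0, s0_sq: prior degrees of freedom and prior variance (fixed here for illustration).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = x.shape[1], y.shape[1]
    d = n1 + n2 - 2  # residual degrees of freedom per gene
    # Pooled per-gene sample variance.
    s_sq = ((n1 - 1) * x.var(axis=1, ddof=1) + (n2 - 1) * y.var(axis=1, ddof=1)) / d
    # Shrink each gene's variance toward the prior value s0_sq.
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    diff = x.mean(axis=1) - y.mean(axis=1)
    return diff / np.sqrt(s_tilde_sq * (1.0 / n1 + 1.0 / n2))
```

With `d0=0` this reduces to the ordinary pooled two-sample t-statistic; with `d0 > 0`, genes whose sample variance is accidentally tiny no longer receive inflated statistics.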

Perspectives: Applying an appropriate background correction before estimating probe-level target concentrations may further improve the performance of microarray data analysis. Under hybridization theory, probe-level estimates of the target concentration should be identically distributed. A probe-level multivariate analysis will be compared with the univariate analysis of conventional probe-set-summarized expression data.

Poster N36

Using R through a Tiki web interface to implement bioinformatics pipelines

Alex Sanchez Pla Research Institute at Vall d'Hebron University Hospital

Xavier Pedro Puente (Research Institute at Vall d'Hebron University Hospital)

Short Abstract: Data analysis and visualization in the omics sciences using the free R statistical software have become very popular. However, an easy-to-use, general-purpose web interface for R scripts is still missing, despite current efforts to provide high-tech solutions to this need, such as the one from Revolution Analytics, RApache or RStudio.

An alternative approach is to use a long-standing, general-purpose wiki CMS/groupware solution such as Tiki (http://tiki.org) together with its companion PluginR (http://doc.tiki.org/pluginR), both released under free/libre open source software licenses. These can be used to create simple interfaces that let users interact safely with custom R programs located on a remote server.

We illustrate the possibilities of this approach with a real-world application used to set the parameters and guide the execution of a standard microarray analysis workflow in a core facility. This has made it possible to hide the complexity of writing long parameter files behind simple forms with many predefined values, so that users who do not wish to deal with the intricacies of the workflows can easily define and run a simple analysis. The system is also highly versatile, allowing custom code to be inserted for more complex analyses.
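The idea of replacing a long parameter file with a short form plus predefined values can be sketched as a merge of user-submitted fields over defaults, followed by validation and rendering. This is written in Python for illustration only; it is not Tiki or PluginR code, and the parameter names, allowed values and `key=value` file format are all hypothetical.

```python
# Hypothetical workflow parameters with predefined default values.
DEFAULTS = {
    "normalization": "rma",
    "fdr_cutoff": "0.05",
    "min_log_fold_change": "1",
}
# Hypothetical constraints on fields whose values come from a fixed list in the form.
ALLOWED = {"normalization": {"rma", "gcrma", "mas5"}}

def build_parameter_file(form):
    """Merge submitted form fields over the defaults and render a key=value parameter file."""
    unknown = set(form) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **form}  # user values override defaults; key order is preserved
    for key, choices in ALLOWED.items():
        if params[key] not in choices:
            raise ValueError(f"{key} must be one of {sorted(choices)}")
    return "\n".join(f"{k}={v}" for k, v in params.items()) + "\n"
```

A user who only changes the FDR cutoff submits one field, and the remaining parameters fall back to the core facility's defaults; rejecting unknown keys and out-of-list values is one way to keep such an interface "safe" in the sense described above.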

References:
De Pedro, X. and Sánchez, A. 2011. "Web 2.0 for R scripts & workflows: Tiki & PluginR". useR, The R User Conference. Coventry, UK.

Poster N37

Combining omics data and biological knowledge using multiple factor analysis

Jose Luis Mosquera Institut de Recerca Vall d'Hebron (VHIR)

Short Abstract: Over the last decade, the advent of high-throughput technologies such as DNA microarrays and next-generation sequencing has raised many challenges. One of them is how to extract as much biological knowledge as possible from the huge quantities of data generated by high-throughput experiments.

It is generally accepted that combining information collected from various sources provides a better interpretation of the underlying biological processes than data from a single source. Among the many approaches that have been suggested for this integration, Multiple Factor Analysis (MFA) has proved to be a very powerful one. MFA simultaneously combines and represents multiple data sets from different sources and allows extra information to be added to the resulting graphical outputs. The core of the method is a common structure built on Principal Component Analysis (PCA), with supplementary quantitative and qualitative variables. Biological knowledge from resources such as GO, KEGG or others can then be used to build gene sets, which are projected onto the resulting plots. However, this procedure does not take into account that many biological knowledge resources have a structure of their own.
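The core MFA step described above can be sketched in a few lines: each data block (measured on the same rows, e.g. samples) is centred and divided by its largest singular value, so that no block dominates the global PCA merely because it has more variables or larger variance, and a PCA is then run on the concatenated weighted blocks. This is a minimal sketch under stated assumptions: columns are centred but not scaled, and the projection of supplementary variables or gene sets is not implemented.

```python
import numpy as np

def mfa(blocks, n_components=2):
    """Minimal Multiple Factor Analysis: block weighting followed by a global PCA.

    blocks: list of arrays, each of shape (rows, variables_in_block), sharing the same rows.
    Returns the global row scores and the weighted blocks.
    """
    weighted = []
    for X in blocks:
        X = np.asarray(X, dtype=float)
        Xc = X - X.mean(axis=0)                      # centre each column
        s1 = np.linalg.svd(Xc, compute_uv=False)[0]  # largest singular value of the block
        weighted.append(Xc / s1)                     # block's first principal axis now has unit weight
    Z = np.hstack(weighted)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)  # global PCA via SVD
    scores = U[:, :n_components] * s[:n_components]
    return scores, weighted
```

Dividing a block by its largest singular value is equivalent to weighting its variables by the inverse of the block's first PCA eigenvalue, which is the standard MFA weighting.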
In this work, we show how to apply MFA to integrate multiple sources of biological data, such as gene expression, clinical variables, epidemiological data, histological measures and biological knowledge, while taking into account the underlying structure of gene networks. The result is a representation that is more robust and easier to interpret than those produced by previously developed approaches.

Poster N38

Set-wise differential correlation analysis reveals prognosis-related change of pathway genomic loci and expression interaction.

SungBum Cho Center for Genome Science, National Institute of Health, Korea

Short Abstract: Copy number abnormality or aberration (CNA) is a key pathophysiologic mechanism in many types of cancer. It causes cancer progression and unfavorable prognosis. CNAs can affect prognosis through changes in mRNA expression. Previous studies revealed that some CNAs were related to mRNA expression changes in cancer. In most cases, these studies investigated the relationship between single-gene or probe-wise copy number and gene expression. In this study, differential coexpression of gene sets (dCoxS) was applied to find relationships between CNAs and gene expression for gene sets defined by biological knowledge. Moreover, non-segmented CNA data were used for the analysis. Simulation results indicated that the interaction score (IS), the component of dCoxS that measures the similarity between two sets, outperformed other metrics in detecting changes in similarity. Two breast cancer arrayCGH and microarray datasets were analyzed with both single probe-wise and set-wise approaches. While the single probe-wise analysis detected no significant results, the set-wise method revealed changes between conditions in the similarity between pathway-level arrayCGH and gene expression profiles. The results of the set-wise method were consistent with existing biological knowledge of breast cancer pathophysiology. In conclusion, the set-wise method is useful for exploring significant relationships between copy number changes and gene expression profiles.
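The abstract does not spell out how the interaction score is computed, so the sketch below only illustrates the general idea of a set-wise similarity: for each feature set (e.g. the arrayCGH probes and the expression probes of one pathway), compute a distance between every pair of samples, then correlate the two sets' pairwise-distance profiles. Euclidean distance is an assumed stand-in for whatever between-sample measure dCoxS actually uses, and the function names are hypothetical; this is not dCoxS itself.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of samples. X: features x samples."""
    X = np.asarray(X, dtype=float)
    i, j = np.triu_indices(X.shape[1], k=1)  # all sample pairs with i < j
    return np.linalg.norm(X[:, i] - X[:, j], axis=0)

def set_similarity(set_a, set_b):
    """Correlation between the sample-pair distance profiles of two feature sets."""
    return float(np.corrcoef(pairwise_distances(set_a), pairwise_distances(set_b))[0, 1])

def similarity_change(cn_set, expr_set, is_group1):
    """Difference in set similarity between two sample groups (boolean mask over samples)."""
    cn_set = np.asarray(cn_set, dtype=float)
    expr_set = np.asarray(expr_set, dtype=float)
    g1 = np.asarray(is_group1, dtype=bool)
    g2 = ~g1
    return (set_similarity(cn_set[:, g1], expr_set[:, g1])
            - set_similarity(cn_set[:, g2], expr_set[:, g2]))
```

Because the two sets are compared through sample-pair profiles, they may contain different numbers of features, which is what allows copy number probes and expression probes of the same pathway to be compared directly. Each group needs at least three samples for the correlation to be defined.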

Poster N39

Possible sources of wrongly performing probes: from oligonucleotide arrays to NGS

Noura Chelbat Bioinformatik Institute, Johannes Kepler University

Sepp Hochreiter (Bioinformatik Institute, Johannes Kepler University, Bioinformatics); Ulrich Bodenhofer (Bioinformatik Institute, Johannes Kepler University, Bioinformatics);

Short Abstract: A number of factors, such as the composition, number and location of nucleotides within the probe oligomers, directly affect the binding between probes and target sequences and are undoubtedly sources of variation in probe sensitivity and performance. Here we present possible sources of poorly performing probes: probe annotation and sequence, alternative CDFs, chip manufacturing (the photolithography process and the biophysics of GeneChip technology) and the physical location of the probes on the array. These causes could be extrapolated from oligonucleotide arrays to other high-throughput technologies such as NGS.
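The first group of factors named above (composition, number and location of nucleotides) can be turned into per-probe features for such an analysis. The sketch below is a hypothetical example of that featurization, assuming Affymetrix-style 25-mer probes; which features the authors actually used is not stated in the abstract.

```python
from collections import Counter

def probe_features(seq):
    """Composition and position features of a probe sequence (e.g. an Affymetrix 25-mer)."""
    seq = seq.upper()
    counts = Counter(seq)
    gc = (counts["G"] + counts["C"]) / len(seq)  # GC fraction
    centre = seq[len(seq) // 2]                  # middle base: position 13 (1-based) of a 25-mer
    # Base-by-position one-hot encoding: (1-based position, base) -> 0 or 1.
    one_hot = {
        (pos, base): int(b == base)
        for pos, b in enumerate(seq, start=1)
        for base in "ACGT"
    }
    return {
        "counts": {b: counts[b] for b in "ACGT"},
        "gc_fraction": gc,
        "centre_base": centre,
        "one_hot": one_hot,
    }
```

The counts capture the number of each nucleotide, while the one-hot encoding keeps their location, so position-dependent effects on binding can be modelled separately from overall composition.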


Attention Poster Authors: Posters should be at most 1.30 m (130 cm) high x 0.90 m (90 cm) wide. Fasteners (Velcro/double-sided tape) will be provided on site; please DO NOT bring tape, tacks or pins.

Posters Display Schedule:

Odd Numbered posters:

Set-up timeframe: Sunday, July 17, 7:30 a.m. - 10:00 a.m.
Author poster presentations: Monday, July 18, 12:40 p.m. - 2:30 p.m.
Removal timeframe: Monday, July 18, 2:30 p.m. - 3:30 p.m.*

Even Numbered posters:

Set-up timeframe: Monday, July 18, 3:30 p.m. - 4:30 p.m.
Author poster presentations: Tuesday, July 19, 12:40 p.m. - 2:30 p.m.
Removal timeframe: Tuesday, July 19, 2:30 p.m. - 4:00 p.m.*

* Posters that are not removed by the designated time may be taken down by the organizers and discarded. Please be sure to remove your poster within the stated timeframe.

Delegate Posters Viewing Schedule

Odd Numbered posters:
On display Sunday, July 17, 10:00 a.m. through Monday, July 18, 2:30 p.m.
Author presentations will take place Monday, July 18: 12:40 p.m.-2:30 p.m.

Even Numbered posters:
On display Monday, July 18, 4:30 p.m. through Tuesday, July 19, 2:30 p.m.
Author presentations will take place Tuesday, July 19: 12:40 p.m.-2:30 p.m.

Want to print a poster in Vienna? Try these options:

Repacopy, next to the congress venue

Also at Karlsplatz, in the Ring Center, Kärntner Str. 42

If you need your poster printed on thicker material, you may also use a plotter service next to Karlsplatz: http://schiessling.at/portfolio/
