Posters

20th Annual International Conference on
Intelligent Systems for Molecular Biology

Poster numbers will be assigned May 30th.
If you cannot find your poster below, that probably means you have not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category A - 'Bioinformatics of Disease and Treatment'

A01 - iFad: an integrative factor analysis model for drug-pathway association inference

Short Abstract: Pathway-based approaches for drug discovery consider the therapeutic effects of compounds in the global physiological environment. However, for many compounds, the target pathways and mechanism of action are still unknown. In addition, rationally designed drugs may also have unexpected off-target effects. Therefore, the inference of drug-pathway associations is a crucial step to fully realize the potential of system-based pharmacological research. On the other hand, pathway activities are also reflected by the gene expression levels. We developed a new Bayesian sparse factor analysis model to jointly analyze the paired gene expression and drug sensitivity datasets measured across the same panel of samples. This model enables direct incorporation of prior knowledge regarding gene-pathway and/or drug-pathway associations to aid the discovery of new association relationships. A collapsed Gibbs sampling algorithm was implemented for inference. Satisfactory performance of the proposed factor analysis model has been achieved on both simulated datasets with various patterns, and the real datasets from the NCI-60 cell lines. Our study demonstrates that the combination of pathway analysis, gene expression and drug response is a promising approach for the prediction of drug targets. This model also provides a general statistical framework for pathway-based integrative analysis of multiple types of Omics data.

A02 - Single-cell analysis of Taxol (paclitaxel) resistance in breast cancer

Short Abstract: Breast cancer is the most common cancer in women, but drug resistance remains a widespread problem. Previous studies investigated resistance mechanisms of chemotherapies, but genomic data was gathered from cell populations. We use single-cell analysis to study Taxol (paclitaxel) resistance in breast cancer. We treated breast cancer cell lines (MDA-MB-231) with Taxol, a microtubule-targeting chemotherapy which prevents proper mitotic spindle assembly and arrests the cell in anaphase. We performed whole-cell RNA-Sequencing (RNA-Seq) on individual cells from untreated (n=6), treated (n=6), and survivor (n=5) breast cancer cell line populations via Illumina Hi-Seq 2000, then mapped these reads using Bowtie. We performed differential expression and single nucleotide polymorphism (SNP) analyses using a variety of open-source packages including BEDtools, SAMtools, HTSeq, DESeq, NOISeq, and VCFtools. Preliminary SNP analysis has shown that untreated cells have fewer SNPs in common with each other than the treated cells, indicating that the treated cells have a more chaotic genomic state. Initial differential analysis showed that, after accounting for cell-cycle differences, individual cells vary greatly in expression. We also found that, at the group level, the treated cells had lower overall coverage on average, which is consistent with the stress imposed on these cells. More concrete results will be presented at ISMB.

A03 - Tracking Antiviral Responses Following Infection With Lassa Fever Virus

Short Abstract: Lassa fever is a severe hemorrhagic fever caused by the Lassa virus. Its symptoms are often confused with those of other infections, such as influenza, which can delay proper care. There are currently no effective therapies that can be offered to individuals infected with Lassa. We were interested in determining whether analyzing the circulating immune system would allow us to identify “markers” of Lassa virus infection. To test this hypothesis, we investigated the gene expression patterns of cells that were extracted from Lassa-infected animals over the course of infection. We used Agilent microarrays to examine the gene expression levels in peripheral blood mononuclear cells (PBMCs) extracted from non-human primates at different stages of infection. PBMCs perform an important role in the immune response to infection and are easily extracted from human blood. This makes them suitable for use as a diagnostic tool, even in remote areas where necessary expertise and modern medical facilities are scarce. We used a linear regression model to estimate the significance of the fold expression changes between the reference pre-infection samples and those obtained during early, middle and late stages of infection. This gene set was further filtered to only include those genes whose expression appeared sustained during the intermediate stages of infection, discarding those genes that had either small or transitory changes in expression. From this analysis we have identified a set of genes whose expression profiles in blood cells appear to be specific to Lassa infection. We are currently carrying out experimental validation of their suitability as biomarkers of early infection.

A04 - Identification of liver specific genes using next generation sequencing technology

Short Abstract: Introduction: Next generation sequencing technology offers an entirely new perspective for clinical research and will speed up personalized medicine. In this work this new technology was used to identify a gene profile highly specific for healthy liver tissues, which could serve as a diagnostic tool to distinguish between a healthy and a diseased liver. Material and Methods: Initially, next generation sequencing data were downloaded from EMBL (ENA ERP000257). This genome-wide expression compendium originates from eleven healthy human tissue samples pooled from multiple donors. Expression levels were estimated by mapping and counting reads to single gene sequences derived from the UCSC genome browser. To make the expression levels comparable across tissues, RPKM normalization was applied. Finally, a gene was defined as liver specific if its liver RPKM was greater than or equal to 10 and its RPKM in each reference tissue was less than or equal to 2. Results: A profile of 98 genes was identified that meet the criteria for a liver-specific gene. With an RPKM of 518, the gene complement component 9 ranks at the top of the list. Investigation at the pathway and functional levels revealed significant associations with lipid metabolism, molecular transport and coagulation systems. Conclusion: Next generation sequencing identified a profile of 98 genes highly specific to healthy liver tissues, which shows significant associations with normal liver functions. This gene profile thus reflects the state of a healthy liver and, upon further validation, could serve as a robust diagnostic tool to test for normal or pathological liver behavior.
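The selection rule in A04 can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in the abstract (liver RPKM of at least 10, every reference tissue at most 2); the function names, the table layout and the toy values are assumptions and are not taken from the study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def liver_specific_genes(rpkm_table, target="liver", min_target=10.0, max_other=2.0):
    """Genes with RPKM >= min_target in the target tissue and <= max_other in all others."""
    selected = []
    for gene, by_tissue in rpkm_table.items():
        others = [value for tissue, value in by_tissue.items() if tissue != target]
        if by_tissue[target] >= min_target and all(v <= max_other for v in others):
            selected.append(gene)
    return sorted(selected)


# Hypothetical toy values for three made-up genes and three tissues.
toy_table = {
    "GENE_A": {"liver": 518.0, "kidney": 0.4, "heart": 0.1},   # passes both thresholds
    "GENE_B": {"liver": 35.0, "kidney": 12.0, "heart": 1.0},   # too high in kidney
    "GENE_C": {"liver": 9.9, "kidney": 0.0, "heart": 0.0},     # too low in liver
}
specific = liver_specific_genes(toy_table)
```

Both thresholds are keyword arguments, so the sensitivity of the resulting gene list to the cut-offs can be explored without changing the function.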

A05 - Genomic signatures characterize leukocyte infiltration in myositis muscles

Short Abstract: Leukocyte infiltration plays an important role in the pathogenesis and progression of myositis, and is highly associated with disease severity, though cell type heterogeneity within patient specimens from muscle compounds expression patterns generated from transcriptional profiling. We developed an invasion model to address this heterogeneity and further quantify the inflammatory cell infiltration in myositis. Muscle biopsies from 31 myositis patients and 5 normal healthy donors were profiled by microarray in parallel with microRNA (miRNA) expression analyses. Four primary gene signatures, including a leukocyte index, a type 1 interferon (IFN), an MHC-1 and an immunoglobulin composite, were developed to characterize myositis patients at the molecular level. Among these signatures, the leukocyte index was verified by pathological assessment of immune cell infiltration and was further utilized to evaluate expression changes of transcripts due to leukocyte infiltration in myositis muscle biopsies. The ability to distinguish different sources of altered gene expression in heterogeneous tissues increased our understanding of the complex interactions crucial to the pathogenesis of myositis. One application of the leukocyte index comparing miRNA and mRNA expression profiles revealed a complex interaction between miR-146a expression and the regulation of the type 1 IFN pathway in dermatomyositis. Collectively, the distinct miRNA and mRNA signatures identified in this study may contribute to the development of new therapeutic targets and provide utility as molecular biomarkers for characterizing inflammatory myopathies.

A06 - New Flexible Minimum-Spanning Tree (MST)-Based Non-Parametric Multivariate Tests for the Analyses of Differentially Expressed Gene Sets

Short Abstract: Motivation. The analysis of differentially expressed gene sets (pathways) has become routine in the analysis of gene expression data (microarrays and RNA-Seq). A multitude of tests is available, ranging from aggregation tests that summarize gene-level statistics to characterize a gene set, to true multivariate tests, such as Hotelling’s T² and N statistics, that account for the correlation structure between genes. However, non-parametric, distribution-free multivariate tests, which do not rely on distributional assumptions (assumptions that expression data typically do not satisfy), have never been considered in this context.

Results. Here we compare the power and Type I error rates of minimum-spanning tree (MST)-based non-parametric multivariate tests and several multivariate and aggregation tests frequently used for pathway analyses. In a simulation study we demonstrate that non-parametric multivariate tests have power that is, in many settings, comparable to that of conventional approaches, but outperform them in specific regions of the parameter space, where the setting mimics the frequent biases in real biological data sets. In addition, the MST structure allows great flexibility: the MST can be constructed using any dissimilarity measure, the data points can be ordinal or continuous, and subtree ranking allows finding subtle specific differences (e.g., in overall variance) between two groups of genes. For all tests considered we analyze their robustness and specificity on a large set of expression arrays from solid cancer types (colon, kidney, liver, lung and pancreas). We demonstrate that robustness and specificity are important test-specific characteristics.

Availability. R code is available from the first author upon request.
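To give a feel for how an MST-based two-sample test works, here is a minimal Python sketch of one classic statistic of this family, the Friedman-Rafsky multivariate runs statistic: build the MST of the pooled samples and count the edges that join samples from different groups. The A06 abstract does not say which MST statistics the authors use, so this choice, the Euclidean distance, the permutation scheme and the toy data are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mst_edges(points, dist=euclidean):
    """Prim's algorithm on the complete graph; returns n-1 edges as (i, j) index pairs."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def cross_group_edges(edges, labels):
    """Number of MST edges whose endpoints carry different group labels."""
    return sum(1 for i, j in edges if labels[i] != labels[j])


def mst_runs_test(group_a, group_b, n_perm=999, seed=0):
    """Permutation p-value; few cross-group edges suggest the two groups differ."""
    points = list(group_a) + list(group_b)
    labels = [0] * len(group_a) + [1] * len(group_b)
    edges = mst_edges(points)  # the tree depends only on the pooled points
    observed = cross_group_edges(edges, labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if cross_group_edges(edges, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two well-separated groups of five samples in two dimensions.
toy_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
toy_b = [(x + 10.0, y + 10.0) for x, y in toy_a]
n_cross, p_value = mst_runs_test(toy_a, toy_b)
```

Because the tree is built from a user-supplied `dist` function, any dissimilarity measure can be substituted, which is the flexibility the abstract points to.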

A07 - Complex-disease networks of trait-associated single-nucleotide polymorphisms unveiled by information theory

Short Abstract: Thousands of complex-disease single-nucleotide polymorphisms (SNPs) have been discovered in genome-wide association studies (GWAS). However, these intragenic SNPs have not been collectively mined to unveil the genetic architecture between and across complex clinical traits. We hypothesize that biological annotations of host genes of trait-associated SNPs may reveal the biomolecular modularity across complex disease traits and offer insights for drug repositioning. We developed a novel method to quantify trait-trait similarity anchored in Gene Ontology annotations of human proteins and information theory using trait-to-polymorphism (SNP) associations confirmed in GWAS. The results were then validated with the shortest paths of physical protein interactions between biologically similar traits. A network was constructed consisting of 280 significant intertrait similarities among 177 disease traits, which covered 1,438 well-validated disease-associated SNPs. Thirty-nine percent of intertrait connections were confirmed by curators, and two additional studies demonstrated the validity of a proportion of the remainder. On a phenotypic trait level, higher Gene Ontology similarity between proteins correlated with smaller ‘shortest distance’ in protein interaction networks of complexly inherited diseases (Spearman p<2.2x10^-16). Further, ‘cancer traits’ were similar to one another, as were ‘metabolic syndrome traits’ (Fisher’s exact test p=0.001 and 3.5x10^-7, respectively). In summary, we report an imputed disease network by information-anchored functional similarity from GWAS trait-associated SNPs. In addition, we demonstrated that small shortest path of protein interactions correlates with complex disease function. Taken together, these findings provide the framework for investigating drug targets with unbiased functional biomolecular networks rather than worn-out single gene and subjective canonical pathway approaches.

A08 - Predicting relapse prior to transplantation in chronic myeloid leukemia by integrating expert knowledge and expression data

Short Abstract: Selecting a small number of signature genes for accurate classification of samples is essential for the development of diagnostic tests. However, many genes are highly correlated in gene expression data, and hence, many possible sets of genes are potential classifiers. We present a method that uses expert knowledge and predicted functional relationships to guide our search for signature genes. We aim to select robust signature genes that are both predictive and biologically relevant to the mechanism underlying the disease of interest.

There are no established molecular predictors of transplant outcome in chronic myeloid leukemia (CML), and treatment outcomes are poor in advanced CML. Applying our integrated method to a gene expression dataset studying the progression of CML, we identified small sets of signature genes that are highly predictive of disease phases and that are more robust and stable than those obtained using expression data alone. The accuracy of our algorithm was evaluated using cross validation on the gene expression data. We subsequently profiled selected signature gene sets from ~200 independent patient samples using quantitative PCR (qPCR) and showed that our signature genes are predictive of CML phase progression in this independent test set. Lastly, because therapy response is so dependent on CML phase, we hypothesized that gene sets associated with advanced phase CML would predict relapse after allogeneic transplantation. We showed that our gene signatures of advanced phase CML are predictive of relapse even after adjustment for known risk factors associated with transplant outcomes.

A09 - Computational Workflow for Epitope Prediction in L. braziliensis for Vaccine Development against Canine Visceral Leishmaniasis

Short Abstract: Vaccine development against parasitic diseases has been a challenge for researchers for many decades. In recent years, new tools have been developed based on genomic advances, including epitope prediction by computational methods. Epitope prediction has the potential to reduce the time and cost of vaccine development, and makes it possible to test every protein of a parasite for its ability to elicit a protective immune response.
Leishmania spp. are the causative agents of visceral leishmaniasis, an important zoonosis in which dogs are implicated as a reservoir of the parasite. In this work, we developed a structural database approach to identify epitopes that could be experimentally tested for vaccine development. Predictions from nine algorithms, including MHCI, MHCII and B-cell epitope predictors, together with protein subcellular localization prediction and a protein-protein interaction network, are integrated in a relational database, which made it possible to manipulate and extract the relationships between entity classes.
Results show that 7 proteins from L. braziliensis have the characteristics of good vaccine candidates, that is: peptides with binding affinity for 19 MHCI alleles and 15 MHCII alleles (human and mouse); peptides predicted to bind immunoglobulins; proteins located on the plasma membrane or secreted; and more than 15 interactions in the network. Of these 7 proteins, one is a synthetase, one is a regulatory protein, two are ATPases, two are conserved hypothetical proteins and the last one is a monooxygenase-like protein.
We are now expressing the selected proteins for experimental tests to confirm their immunogenicity.
FINANCIAL SUPPORT: FAPEMIG, CAPES, FIOCRUZ, CNPq, UFOP

A10 - Computationally repurposing drugs for lung cancer: new candidate therapeutics from an integrative meta-analysis of cancer gene signatures and chemogenomic data

Short Abstract: Background
Using gene signatures to computationally repurpose FDA-approved drugs can accelerate the development of new therapeutics. Although existing methods for signature-based repurposing are based on the analysis of individual signatures, for many diseases dozens of gene signatures are in the public domain. We developed CMapBatch to exploit these data. CMapBatch is a computational meta-analysis pipeline that takes as input a collection of gene signatures of disease and outputs a list of drugs predicted to consistently reverse pathological gene changes back to normal across signatures. We apply CMapBatch to identify new therapeutics for lung cancer.

Results
We applied CMapBatch to a collection of 21 gene expression signatures of lung cancer. We compared CMapBatch drug candidates to those of previous methods, and found that CMapBatch produces more stable sets of drug candidates when gene signature size is varied or when different subsets of gene signatures are used as input. Our meta-analysis revealed that 247 drugs consistently reversed lung cancer gene changes across these 21 signatures. In silico validation on the NCI-60 collection showed that drug candidates inhibit growth in lung cancer cell lines. Common protein targets of drug candidates included CALM1 and PLA2G4A. We characterized these drugs’ chemical properties and drug-target network, and applied multiple criteria to rank them in terms of therapeutic promise.

Conclusions
CMapBatch can improve signature-based drug repurposing by leveraging the large number of disease signatures; we have made this method publicly available at ophid.utoronto.ca/cmapbatch. We applied CMapBatch to identify a prioritized list of candidate drugs for lung cancer.

A11 - Data Based Prediction of Tumor Diagnoses and the Role of Tumor Markers

Short Abstract: We have used machine learning approaches for identifying classification models that enable the estimation of tumor marker values on the basis of standard blood parameters; estimators for tumor markers AFP, CA-125, CA15-3, CEA, CYFRA, and PSA have been identified using genetic programming as well as support vector machines, artificial neural networks, k-nearest-neighbor classifiers, and linear regression optimized using genetic algorithms.
Based on these models as well as measured tumor markers and standard blood parameters, classifiers have been trained for predicting tumor diagnoses for breast cancer, melanoma, and respiratory system cancer; 81.50%, 74.19%, and 90.67% of the analyzed samples were correctly classified, respectively, using 5-fold cross validation.
To calculate the relevance of blood parameters, virtual tumor markers, and measured tumor markers, we analyze their role and importance in learning tumor diagnosis models with high prediction quality. As we use evolutionary algorithms for optimizing feature selection and model structures, the importance of variables can be measured by how frequently they are selected by the evolutionary processes. Evaluation-based variable impact analysis methods are also used: the deterioration of the performance of the identified models after removing the information stored in a variable defines the importance of that variable with respect to the models.
Data of 20,819 patients (stored in 48,580 measurement samples) collected at General Hospital Linz (2005-2008) have been used as basis for this research work; all machine learning methods have been implemented using HeuristicLab, an open source framework for heuristic optimization (http://dev.heuristiclab.com/).
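The evaluation-based variable impact analysis described in A11 can be illustrated with a short Python sketch: the information in one variable is destroyed by shuffling its values across samples, and the resulting increase in model error is taken as that variable's importance. The abstract does not specify how the information is removed, so shuffling, the mean squared error and all names below (`shuffle_importance`, `marker_a`, `marker_b`, `toy_model`) are illustrative assumptions, and the code does not use the HeuristicLab API.

```python
import random


def mean_squared_error(model, rows, targets):
    return sum((model(row) - t) ** 2 for row, t in zip(rows, targets)) / len(rows)


def shuffle_importance(model, rows, targets, n_repeats=20, seed=0):
    """Average increase in error after shuffling each variable across samples."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = {}
    for name in rows[0]:
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[name] for row in rows]
            rng.shuffle(column)
            # Copy every row, replacing only the shuffled variable.
            shuffled_rows = [dict(row, **{name: v}) for row, v in zip(rows, column)]
            increase += mean_squared_error(model, shuffled_rows, targets) - baseline
        importances[name] = increase / n_repeats
    return importances


def toy_model(row):
    """Hypothetical fitted model that only uses marker_a."""
    return 2.0 * row["marker_a"]


# Toy data: the target depends on marker_a only; marker_b is irrelevant.
toy_rows = [{"marker_a": float(i), "marker_b": float((i * 3) % 8)} for i in range(8)]
toy_targets = [2.0 * row["marker_a"] for row in toy_rows]
toy_importances = shuffle_importance(toy_model, toy_rows, toy_targets)
```

A variable the model ignores gets an importance of exactly zero, while a variable the model depends on gets a positive value that grows with the damage done by shuffling it.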

A12 - Discovery of a phospho-signature predicting response to dasatinib in non-small cell lung cancer

Short Abstract: Targeted drugs are less toxic than traditional chemotherapeutics; however, the proportion of patients that benefit from these drugs is often smaller. A marker that confidently predicts the patient’s response to a specific therapy would allow selection of the individual therapy most likely to benefit the patient. Measuring protein phosphorylation levels enables the monitoring of over-expression and repression of disease-specific signaling pathways. Recent advances in mass spectrometry (MS)-based proteomics allow for the quantitative analysis of phosphorylation events in a global and unbiased manner.

We sought to identify a signature of phosphorylation sites that predicts the response to dasatinib, a tyrosine kinase inhibitor, in non-small cell lung cancer (NSCLC) cell lines. Therefore, we developed a cross validation workflow that includes normal distribution-based data imputation and robust ensemble feature selection based on the Wilcoxon rank-sum test. A linear support vector machine (SVM) was used as predictor.

From the phosphoproteome profiles, we identified a signature of twelve phosphorylation sites that can accurately predict dasatinib sensitivity. The achieved prediction accuracy in cross validation was 94%. Four of the phosphorylation sites belong to integrin beta 4, a protein that mediates cell-matrix or cell-cell adhesion. We investigated the robustness of the selected predictive features, and confirmed the predictive power of the signature in an independent set of breast cancer cell lines.

We demonstrated that identifying a predictive signature from global, quantitative phosphoproteomic data is possible, and might open a new path to discovering molecular markers for response prediction.
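The feature selection step of the A12 workflow can be sketched as follows. The abstract states only that the ensemble selection is based on the Wilcoxon rank-sum test; the bootstrap-and-vote scheme, the normal approximation without tie correction, and the toy data below are assumptions. The imputation step and the linear SVM are omitted.

```python
import math
import random
from collections import Counter


def rank_sum_z(values_a, values_b):
    """Normal-approximation z score of the rank-sum statistic (midranks, no tie correction)."""
    pooled = sorted((v, g) for g, vals in enumerate((values_a, values_b)) for v in vals)
    ranks_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        ranks_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_a, n_b = len(values_a), len(values_b)
    mean = n_a * (n_a + n_b + 1) / 2.0
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    return (ranks_a - mean) / sd


def ensemble_select(samples, labels, top_k=2, n_resamples=50, seed=0):
    """Vote for the top_k features by |z| over class-stratified bootstrap resamples."""
    rng = random.Random(seed)
    idx_a = [i for i, y in enumerate(labels) if y == 0]
    idx_b = [i for i, y in enumerate(labels) if y == 1]
    votes = Counter()
    for _ in range(n_resamples):
        boot_a = [rng.choice(idx_a) for _ in idx_a]
        boot_b = [rng.choice(idx_b) for _ in idx_b]
        scores = []
        for f in range(len(samples[0])):
            z = rank_sum_z([samples[i][f] for i in boot_a],
                           [samples[i][f] for i in boot_b])
            scores.append((abs(z), f))
        scores.sort(reverse=True)
        votes.update(f for _, f in scores[:top_k])
    return [f for f, _ in votes.most_common(top_k)]


# Toy data: feature 0 separates the two classes, features 1 and 2 do not.
toy_samples = [
    [0.1, 1.0, 6.0], [0.2, 3.0, 1.0], [0.3, 5.0, 4.0],
    [0.4, 7.0, 2.0], [0.5, 9.0, 5.0], [0.6, 11.0, 3.0],
    [5.1, 2.0, 3.5], [5.2, 4.0, 1.5], [5.3, 6.0, 5.5],
    [5.4, 8.0, 2.5], [5.5, 10.0, 4.5], [5.6, 12.0, 0.5],
]
toy_labels = [0] * 6 + [1] * 6
toy_selected = ensemble_select(toy_samples, toy_labels, top_k=1)
```

In a cross validation setting, this selection would be repeated inside every training fold so that the held-out samples never influence which features are chosen.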

A13 - The Genetic Basis of Isolated Tetralogy of Fallot

Short Abstract: Tetralogy of Fallot (TOF) accounts for up to 10% of all congenital heart disease (CHD), which is the most common birth defect in humans. CHD is most likely caused by a panel of genetic variations, each affecting protein function or expression only modestly and manifesting as disease only when combined with additional genetic, epigenetic or environmental alterations. In the past, the discovery of oligo- or multigenic disorders has been less amenable to conventional genetic techniques.
To identify the genetic basis of isolated TOF we performed a multilevel study comprising targeted resequencing of over 1,000 genes and microRNAs in TOF cases, parents and controls as well as whole transcriptome (mRNA-seq) and miRNome (microRNA-seq) analysis in TOF cases and healthy unaffected individuals using next-generation sequencing techniques (87 samples). Genes were assessed according to the presence of deleterious variations and their rate of mutation in TOF subjects compared to healthy controls (200 cases).
We identified an oligogenic architecture consisting of at least 28 TOF genes which discriminate TOF cases from controls. On average, TOF subjects show a combination of novel and inherited variations in five genes. Neither microRNA sequence alterations nor differential splicing show a significant impact on the disease. In summary, isolated TOF is a genetic disorder involving multiple genes. Although subjects show a range of individual mutations, the phenotypic outcome is similar because TOF genes show common patterns of functional interactions. The computational approach developed within this study opens a new perspective for the analysis of oligo- or multigenic disorders in general.

A14 - An online game for improving human phenotype predictions

Short Abstract: An important goal for biomedical research is to produce genetic and genomic predictors for human phenotypes (e.g., disease prognosis, diagnosis, and drug response). Researchers can now quantify a large and rapidly expanding number of potential biomarkers for any given sample. In fact, a single biological sample could reasonably be described by millions of molecular variations in DNA, RNA, proteins, and metabolites. However, this breadth of measurable variables is typically accompanied by a dearth of samples, and attempts to use this kind of data to build phenotype predictors often face problems of multiple testing and overfitting.

Biological insights can be used to improve the results generated from purely data-driven statistical approaches to predictor inference, but they require extensive, high-level human labor that is often difficult to acquire. Recently, several online games have shown the potential to harness huge amounts of human effort to help solve difficult biological problems including: protein folding (Foldit), RNA structure design (EteRNA), and multiple sequence alignment (Phylo).

In this poster, we will present a novel online game that will enable players to contribute to the development of human phenotype predictors. The poster will provide details about the design of the game as well as preliminary results from a proof-of-concept experiment. In addition, the game prototype will run live during the conference allowing attendees to both contribute their knowledge and provide the developers with important feedback.

A15 - The Human Imprintome Content in RNA Genes Comprises 15 Non-Coding Macro RNAs and 14 Small or Micro RNAs

Short Abstract: Human imprinted genes comprise a relatively small number of genes (around 64) marked with their parental origin in a process termed genomic imprinting. The importance of genomic imprinting goes far beyond its critical relevance as a mechanism to balance parental resource allocation: it plays important roles in mammalian development and growth, as well as in many major human disorders including cancer. To obtain insight into the human imprintome (the great territory in which imprinted genes, DMRs and ICRs are protagonists), we have analyzed its RNA content because, compared to imprinted protein-coding genes, imprinted ncRNAs show different imprinting features and are more responsible for the mechanism of genomic imprinting. It is imprinted ncRNAs, rather than protein-coding genes, that coexist with large imprinted regions and may contribute to the evolution and regulation of genomic imprinting. Experimental designs, data and procedures used here were exclusively in silico and in agreement with open data access guidelines. We have analyzed lncRNAs toward their complete characterization within the human imprintome because they can mediate epigenetic silencing of a chromosomal domain in trans, which has important implications for dissecting their functional roles. We then provide a list of 15 imprinted lncRNAs (besides 14 small/micro RNAs; Tables and Figures), an impressive RNA content within the currently known human imprintome. Among these 15 imprinted lncRNAs are Air, HYMAI and HOTAIR, the latter of which is the first example of an RNA expressed on one chromosome influencing transcription on another chromosome.

A16 - Computational Approaches and Tools for Infectious Disease and Network-based Bioinformatics

Short Abstract: We report recent improvements to a collection of software tools that aid comparative and network-based study of pathogens and their hosts. To facilitate systems-level analysis of the host response to infection, we’re expanding the capabilities of InnateDB (www.innatedb.com), a database and analysis platform of all available human, mouse and bovine interactome data, integrated with additional manually-curated, experimentally-verified interactions involved in innate immunity (>17,000 interactions curated from >4,000 papers to date), plus tools to facilitate network or pathway-based analyses. For pathogen analysis we are expanding the capabilities of IslandViewer (www.pathogenomics.sfu.ca/islandviewer) for identification of genomic islands involved in pathogenicity and developing an OrtholugeDB for more robust comparative analyses across species. We have expanded the curated www.pseudomonas.com database to further facilitate viewing of RNA-seq, SNP, and comparative/ortholog analysis, plus incomplete genome data. We have further developed an analysis of pathogen-specific genes, to identify candidate anti-infective drug targets and drugs which we are studying further in the laboratory. By better understanding the complex interplay of factors that influence both pathogen and host during the infection process, plus identifying anti-infective drugs that disarm a pathogen (vs killing them, selecting less for antibiotic resistance), we aim to improve upon current approaches for infectious disease control.

A17 - A hidden Markov model-based algorithm for identifying tumour subtype using array CGH data

Short Abstract: The recent advancement in array CGH (aCGH) research has significantly improved tumor identification using DNA copy number data. A number of unsupervised learning methods have been proposed for clustering aCGH samples. Two of the major challenges for developing aCGH sample clustering are the high spatial correlation between aCGH markers and the low computing efficiency. A mixture hidden Markov model based algorithm was developed to address these two challenges. The hidden Markov model (HMM) was used to model the spatial correlation between aCGH markers. A fast clustering algorithm was implemented and real data analysis on glioma aCGH data has shown that it converges to the optimal cluster rapidly and the computation time is proportional to the sample size (O(n)). Simulation results showed that this HMM based clustering (HMMC) method has a substantially lower error rate than NMF clustering. The HMMC results for glioma data were significantly associated with clinical outcomes. We have developed a fast clustering algorithm to identify tumor subtypes based on DNA copy number aberrations. The performance of the proposed HMMC method has been evaluated using both simulated and real aCGH data.

A18 - Discovering the relationship between drug's side effects and target proteins based on protein-protein interactions

Short Abstract: In the drug discovery and development process, reducing drug side effects is an important issue because side effects are a crucial factor in receiving FDA approval. However, for many drugs it is still unknown which proteins are involved in the occurrence of side effects. Recently, large-scale drug-target interaction and drug-side effect databases have been constructed, promoting studies of drug side effects.
In this study, we investigate the relationship between drug side effects and target proteins. To discover the relationship, we apply a network analysis approach by integrating a drug-side effect network, a drug-target interaction network, and protein-protein interactions (PPIs). We developed perturbation scores of proteins that are based on the distance between proteins and side effect-causing drugs in the multi-level network. Evaluation is conducted using Michigan Molecular Interactions (MiMI) as a gold standard data set. To show the contribution of PPIs, we compared our proposed approach with the case in which PPIs are removed from the network. Finally, we found literature evidence in PubMed for high-scoring candidates.
Our results showed that the multi-level network is a useful approach for finding relationships between drug side effects and target proteins, and that it offers an opportunity to improve the drug discovery process.
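A distance-based perturbation score of the kind described in A18 could look like the Python sketch below. The abstract does not give the scoring formula, so the inverse-distance weighting `1 / (1 + d)`, the function names and the toy network are purely hypothetical; the sketch only shows how PPI shortest paths let proteins that are not direct drug targets receive a score.

```python
from collections import deque


def bfs_distances(adjacency, sources):
    """Shortest hop counts from any source node in an undirected graph."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def perturbation_scores(ppi, drug_targets, drug_side_effects, side_effect):
    """Hypothetical score: sum of 1 / (1 + d) over all drugs causing the side effect."""
    scores = {}
    for drug, effects in drug_side_effects.items():
        if side_effect not in effects:
            continue
        for protein, d in bfs_distances(ppi, drug_targets[drug]).items():
            scores[protein] = scores.get(protein, 0.0) + 1.0 / (1 + d)
    return scores


# Toy multi-level network with made-up drugs, proteins and side effects.
toy_ppi = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2"], "P4": []}
toy_targets = {"drugX": ["P1"], "drugY": ["P3"], "drugZ": ["P4"]}
toy_effects = {"drugX": {"nausea"}, "drugY": {"nausea"}, "drugZ": {"rash"}}
toy_scores = perturbation_scores(toy_ppi, toy_targets, toy_effects, "nausea")
```

In the toy network, P2 is not targeted by any drug but still receives a score because it lies between the targets of the two drugs that cause the side effect; with the PPI edges removed, it would receive none.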

A19 - Discovery: a resource for the rational selection of drug target proteins and leads for the malaria parasite, Plasmodium falciparum

Short Abstract: Few rational approaches have been successfully followed in the selection of promising drug target proteins in the malaria parasite. The emergence of widespread drug resistance, even against current drugs, is making the effective selection of new drug targets together with lead compounds essential and urgent, requiring optimal approaches to be put in place for this process.

The Discovery project aims to provide a publicly available informatics resource in which comprehensive information on the parasite and host proteins is stored, together with the results from relevant third-party investigations as well as results from our own high-throughput analysis. The system is developed in Java using NetBeans and Hibernate to connect to a MySQL database. The data included in the resource is intended to be as broad as possible, including protein, gene-ontology, orthology, metabolic, structural, expression and chemoinformatics information. This is combined with a data-mining interface that lets researchers select putative drug target proteins and lead compounds according to their own highly flexible criteria.

Protein information includes data from the human, mosquito and the various malaria genome projects. Additionally, data from the drug trials database relevant to malaria is associated with compounds in the Discovery database. Chemical information is from the ChEMBL database. Information includes basic annotations, motifs, domains, binding sites, structural features, orthology information, ontology terms, protein-ligand interactions and comparative genomics information. Chemical information includes protein interactions and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties.

A20 - Identification of Cancer Specific Biomarkers using the Microarray Gene Expression Profile

Short Abstract: Carcinogenesis involves complex biological processes. In these processes, the cancer cells change their gene expression patterns. Genes that show expression patterns specific to different cancers are used as cancer biomarkers. Gene expression profile meta-analysis is an effective method for analyzing gene expression datasets in order to predict cancer-specific gene expression patterns. In our research, using cancer gene expression data from TCGA and GEO, differentially expressed genes (DEGs) were selected from the individual cancer gene expression profiles by the Significance Analysis of Microarrays. From the DEGs of 8 different cancers, genes unique to each cancer were chosen. An Area Under the Curve (AUC) score was calculated for the selected genes using LibSVM classification methods. A survival test, which measures whether genes are associated with survival, retained 37 of these cancer-specific genes. Three cancers did not reveal any survival-associated genes; therefore, further analysis was performed with the remaining 5 cancers. We validated our results by comparison with cancer gene expression profiles from GEO. We conclude that these 37 genes may be used as cancer biomarkers.

A21 - Wavelet-based identification of DNA genomic aberrations from next generation sequencing data

Short Abstract: Copy number aberrations (CNAs) play important roles in cancer development, but because aberrations often span wide regions, it is hard to distinguish key driver genes from their neighboring genes. With high-resolution data sets such as single nucleotide polymorphism (SNP) microarrays, several methods have been developed to identify cancer-driving genes. For instance, our previous work, the wavelet-based identification of focal genomic aberrations (WIFA), was successfully applied to SNP microarrays from glioblastoma (GBM) and lung cancer patients. It integrated DNA aberration regions from multiple samples and detected focal aberrations consistent across multiple samples with high accuracy.

Since next generation sequencing data from several cancer data sets are now available, there is a growing chance that focal aberration regions can be detected more accurately. Therefore, we applied our WIFA approach to next generation sequencing data from GBM patients and compared the results with the regions identified from SNP microarray data. Our work shows both consistently aberrant regions and regions identified differently across the two data platforms.

A22 - A Metagenomic Study of Diet-Dependent Interaction Between Gut Microflora and Host in Infants

Short Abstract: Gut flora species and functional composition strongly affect the health and well-being of the host. With the advent of genomic-based personalized medicine, it is important to develop a synthetic approach to study the host transcriptome and the microbiome simultaneously. Early microbial colonization in infants is critically important for directing neonatal intestinal and immune development, and is especially attractive for studying the development of human-commensal interactions. Here we report the results from a study of the gut microbiome and host epithelial transcriptome of three-month-old exclusively breast-fed and formula-fed infants. Both host mRNA expression and the microbiome phylogenetic and functional profiles provided strong feature sets that distinctly classified the formula-fed from the breast-fed infants. To determine the relationship between host epithelial cell gene expression and the bacterial metagenomic-based profiles, the host transcriptome and functionally profiled microbiome data were subjected to novel multivariate statistical analyses. Gut microbiota metagenome virulence characteristics strongly differed between the formula-fed and the breast-fed infants, while concurrently immunity/mucosal related gene expression in epithelial cells differed as well. Our data provide insight into the integrated responses of the host and microbiome to dietary substrates in the early neonatal period. We demonstrate that differences in diet can affect, via gut colonization, both infant gut development and the innate immune system. Furthermore, the methodology presented in this study can be easily adapted to assess other host-commensal and host-pathogen interactions using genomic and transcriptomic data, providing a synthetic genomics-based picture of host-commensal relationships.

A23 - Prioritizing disrupted pathways inherent to congenital heart diseases through integration of cardiogenic Transcriptome and disease-centric Interactome

Short Abstract: Mammalian heart development is highly conserved and mutations or functional perturbation in its critical control genes cause a wide spectrum of congenital heart diseases (CHD), the most common type of birth defects and the leading cause of birth defect related deaths. Investigation of the gene expression profiles and their functional association networks in cardiogenesis is crucial for defining the causes of CHD. In this study, a time-course microarray experiment was designed to include 9 time-points from early embryonic stages to adult to study natural cardiogenesis using a temporally regulated mouse model. Differential expression patterns were identified and the expression profiles gauged according to disease-related genes. The gene-interaction information was obtained from Interactome database for a list of 36 CHD genes as well as for 237 disease-related genes resulting from the top 10 genes interacting with CHD genes. Functional association networks were built based on expression profiles (Transcriptome) and interaction information (Interactome) for these prioritized genes. Enriched pathways affected by CHD genes were investigated to reveal embryonic origins for the molecular mechanisms of CHD. Our results highlight disturbance of development pathways such as regulation of epithelial-to-mesenchymal transition, BMP signaling, NF-AT signaling, and Notch signaling as conserved pathways across the spectrum of congenital heart diseases. Our innovative strategy that combines embryology and time-course dependent bioinformatics provides a pipeline to integrate different types of data to capture the common disease associated networks and may lead to the application of novel pathway-guided therapeutic interventions for the management of congenital heart diseases.

A24 - Efficient assessment of the effects of sequence variation on RNA secondary structure

Short Abstract: Structural characteristics are essential for the functioning of many non-coding RNAs and cis-regulatory elements of mRNAs. Single nucleotide polymorphisms (SNPs) may disrupt these structures, interfere with their molecular function, and hence cause a phenotypic effect. RNA folding algorithms can provide detailed insights into the structural effects of SNPs. So far, global comparisons of the structural ensembles have been used. Major drawbacks of this approach are the substantial computational resources required and the limited accuracy of the folding programs when larger RNAs are considered.

In this study, we discuss several related measures of structural disruption, including the previously used Pearson correlation coefficient (Halvorsen et al., 2010). By implementing selected measures, we present a method to compare local folding patterns that can be helpful for genome-wide investigation of SNP effects on RNA structure.

We tested the efficiency of the method with known SNPs whose structural change has been studied experimentally. We performed a large-scale analysis with all human mRNA transcripts downloaded from Ensembl, and searched for the individual effect of all possible SNPs in the UTR regions. Of these, 10% were predicted to cause significant local structural change (p-value < 0.05) in the UTR regions. These significant variants overlap with 7078 (out of 264,669) common variants available in dbSNP (Sherry et al., 2001) and 78 (out of 1658) disease-associated variants available in GWASdb (Li et al., 2011). Some of the diseases related to the significant variants are coronary heart disease, Parkinson's disease, lung cancer and rheumatoid arthritis.
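As a rough illustration of a correlation-based local disruption measure, the sketch below compares per-position pairing probabilities of a wild-type and a mutant sequence in a window around the SNP. It is only a sketch under stated assumptions: the probability vectors are made-up numbers (in practice they would come from an RNA folding program), and the window size and the 1 - r score are illustrative choices, not the measures evaluated in the poster.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def local_disruption(wild_type, mutant, snp_index, flank):
    """Hypothetical local score: 1 - r over a window of +/- flank around the SNP."""
    start = max(0, snp_index - flank)
    end = min(len(wild_type), snp_index + flank + 1)
    return 1.0 - pearson(wild_type[start:end], mutant[start:end])


# Made-up per-position pairing probabilities for an 8-nt toy transcript.
wt_probs = [0.1, 0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.1]
mut_probs = [0.1, 0.9, 0.8, 0.6, 0.7, 0.2, 0.1, 0.1]

unchanged = local_disruption(wt_probs, wt_probs, snp_index=4, flank=2)
disrupted = local_disruption(wt_probs, mut_probs, snp_index=4, flank=2)
```

A score near 0 means the local pairing pattern is preserved; larger values indicate a stronger predicted change around the variant.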

A25 - Novel mutations in the glycine receptor β subunit gene (GLRB) in startle disease

Short Abstract: Startle disease is a rare, but potentially fatal, neuromotor disorder characterized by noise- or touch-induced non-epileptic seizures. Molecular genetic studies have revealed that startle disease is primarily caused by inherited mutations in the genes encoding the postsynaptic glycine receptor (GlyR) or presynaptic glycine transporter GlyT2. Although mutations in the GlyR α1 subunit gene appear to be a major cause of this disorder, there are few individuals with mutations in the GlyR β subunit gene (GLRB).

DNA sequencing of GLRB in affected individuals revealed two new missense mutations – resulting in L285R and W310C substitutions within the membrane-spanning domains. Our aim was to assess the structural and functional consequences of substitutions using molecular modelling, high-content yellow fluorescent protein Cl- sensors and patch-clamp electrophysiology. We found that the mutation L285R results in the insertion of a positively-charged side chain into the Cl- channel lumen, has no apparent effect on glycine sensitivity, but reduces assembly/cell-surface trafficking of α1β GlyRs. Patch-clamp electrophysiology reveals that Cl- influx can occur in cells expressing α1βL285R GlyRs in the absence of agonist. By contrast, the W310C substitution is predicted to interfere with hydrophobic side chain stacking, has no apparent effect on glycine sensitivity and reduces assembly/cell-surface trafficking of α1β GlyRs.

Until recently, the GlyR β subunit was assumed to play a purely structural role, mediating interactions of GlyRs with the synaptic clustering molecule gephyrin. However, our study demonstrates that the GlyR β subunit influences the correct assembly and trafficking of GlyRs and is an important determinant of ion-channel conductance.

A26 - A novel computational approach to construct a miRNA-TF regulatory network for cancer- glioblastoma as a case

Short Abstract: Recent studies revealed that patterns of microRNA (miRNA) expression in cancer tissues are different from those in normal tissues, suggesting that miRNAs play critical roles in the pathogenesis of cancer. However, little is yet known about which miRNAs play central roles in the pathology of cancer and their regulatory mechanisms. To address this issue, we utilized glioblastoma (GBM) as an example to explore the main regulation format (feed-forward loops, FFLs) consisting of miRNAs, transcription factors (TFs) and their impacting GBM-related genes, and developed a computational approach to construct a miRNA-TF regulatory network. We compiled GBM-related miRNAs, GBM-related genes, and known human TFs and then identified 1,128 3-node FFLs and 805 4-node FFLs with statistical significance. By merging these FFLs together, we constructed a GBM-specific miRNA-TF mediated regulatory network. To illustrate that the GBM-specific regulatory network is promising for identification of critical miRNA components, we examined a Notch signaling pathway subnetwork. Our topological and functional analyses of the subnetwork revealed that six miRNAs (miR-124, miR-137, miR-219-5p, miR-34a, miR-9, and miR-92b) might play important roles in GBM, including some results that are supported by previous studies. The observation of critical miRNAs in the Notch signaling pathway, with partial verification from previous studies, demonstrates that our network-based approach is promising for the identification of new and important miRNAs in GBM and, potentially, other cancers.
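The 3-node loop search can be sketched as follows. This is a minimal illustration that assumes two common loop types (a TF regulating both a miRNA and a shared target gene, and a miRNA regulating both a TF and a shared target gene). The node names and edges are hypothetical, and the statistical significance testing mentioned in the abstract is not reproduced.

```python
def three_node_ffls(tf_targets, mirna_targets):
    """Enumerate 3-node feed-forward loops from two regulation maps.

    tf_targets:    dict mapping each TF to the set of nodes it regulates
                   (genes or miRNAs).
    mirna_targets: dict mapping each miRNA to the set of nodes it represses
                   (genes or TFs).
    Returns a set of (loop_type, regulator, intermediate, gene) tuples.
    """
    regulators = set(tf_targets) | set(mirna_targets)
    loops = set()
    for tf, tf_regulated in tf_targets.items():
        for mirna, mirna_regulated in mirna_targets.items():
            # Genes targeted by both the TF and the miRNA.
            shared_genes = (tf_regulated & mirna_regulated) - regulators
            if mirna in tf_regulated:  # TF -> miRNA, both -> gene
                for gene in shared_genes:
                    loops.add(("TF-FFL", tf, mirna, gene))
            if tf in mirna_regulated:  # miRNA -| TF, both -> gene
                for gene in shared_genes:
                    loops.add(("miRNA-FFL", mirna, tf, gene))
    return loops


# Hypothetical regulation data, for illustration only.
tf_targets = {
    "TF1": {"miR-a", "GENE1", "GENE2"},
    "TF2": {"GENE2"},
}
mirna_targets = {
    "miR-a": {"GENE1"},
    "miR-b": {"TF2", "GENE2"},
}

ffls = three_node_ffls(tf_targets, mirna_targets)
```

Merging the edges of all loops found this way would give a combined regulatory network of the kind described in the abstract.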

A27 - Identification of genetic changes under selection for drug resistance in breast cancer cell lines

Short Abstract: Treatment of cancer often involves the use of chemotherapeutic agents that preferentially target tumor cells, but often leads to acquired drug resistance. Understanding this process will require the identification of reliable predictive biomarkers for each drug. Currently, we are developing a framework for systematic biomarker discovery by using a combination of gene expression and CGH arrays to keep track of consistent changes that take place during resistance acquisition in cell lines towards two anti-cancer drugs: doxorubicin and paclitaxel. By monitoring changes at two different levels (DNA and RNA) of the genome and analyzing multiple cell lines developing resistance against the same drug under identical conditions, we were able to separate relevant changes from spurious ones and thus reduce the noise of the experimental system. Doxorubicin is an anthracycline that exerts its anticancer effect through intercalation into DNA and inhibition of topoisomerase II, whereas paclitaxel stabilizes microtubules and disrupts the mitotic spindle. We use expression and copy number data from two cell lines, MDA-231 and MCF-7, that were grown in the presence of doxorubicin (n=16), paclitaxel (n=13), or vehicle control (n=3). We observed a distinct pattern of chemotherapy-induced genomic changes: small gains and losses that appear to target particular genes. The differentially expressed genes from target loci are validated on external patient cohorts. The most promising ones will be chosen for functional validation by RNAi knockdown. Successful validation will improve understanding of drug resistance mechanisms, suggest future drug targets, and enable more efficacious treatment of cancer patients.

A28 - Critical assessment of candidate gene prioritization methods

Short Abstract: Motivation: Gene prioritization aims at identifying the most promising candidate genes among a large pool of candidates, so as to maximize the yield and biological relevance of further downstream validation experiments and functional studies. During the past few years, several gene prioritization methods have been defined and some of them have been implemented and made available through freely available web tools. In this study, we aim at comparing the predictive performance of eight publicly available prioritization methods on novel data. We have performed an analysis in which 42 recently reported disease-gene associations from the literature are used to benchmark these tools before the underlying databases are updated. Our approach mimics a novel discovery, and therefore the estimation of the performance is more realistic than when benchmarking through cross-validation on retrospective data.
Results: Our benchmark indicates that although the observed performance is slightly lower than for benchmarks on retrospective data, several methods can still efficiently identify the novel disease genes. There are however marked differences, and methods that rely on more advanced data integration schemes appear more powerful.

A29 - A new computational model using multi-objective optimization to identify activated cancer signaling networks from heterogeneous genomic data

Short Abstract: This poster is based on Proceedings Submission of paper 13.

Motivation: The Cancer Genome Atlas (TCGA) has employed a variety of platforms to address the genomic abnormalities of cancer. However, it remains unclear how to make use of these multiple types of genomic abnormalities to identify pathway activation for cancer. In this article, we propose a multi-objective optimization (MOO) model to elucidate the activation of cancer signaling using these abnormalities.
Methods: We developed an integrative computational model, CSB-MOO (Cancer Signaling Bridge–Multi-Objective Optimization), that uses a new type of network element, CSBs, to identify candidate signaling networks for a specific cancer and the MOO method to address the activated signaling networks from the identified candidate signaling networks using the heterogeneous types of genomic data. For any candidate signaling sub-network, the association between its activation and one type of the genomic data can be characterized by one objective of MOO. The CSB-MOO integrates the heterogeneous genomic data by using multiple objectives.
Results: The CSB-MOO model was applied to heterogeneous genomic data for Glioblastoma (GBM) patients with and without tumor recurrence. The model identifies the activated signaling molecules that can differentiate the GBM patients with tumor recurrence from those without recurrence reliably and predict the patient survival duration significantly. Specifically, the model identifies three important mutated genes for GBM, EGFR, TP53, and PTEN, as well as additional activation of PI3K and CALM signaling uniquely for the tumor-recurrent patients, which may explain the tumor recurrence after surgery in these patients.

A30 - Pathway analysis on differentially expressed genes linked to sporadic Amyotrophic Lateral Sclerosis

Short Abstract: Amyotrophic Lateral Sclerosis (ALS) is a heterogeneous, complex motor neuron disease whose etiology is poorly understood. Peripheral blood lymphocytes (PBLs) and microglia are thought to be involved in the pathogenesis of ALS. However, it is unclear whether PBLs are directly affected by the causative agent(s) or not. PBL subsets might respond to signals from decaying cells. These signals might provide clues to the original source of the pathology and constitute the basis for diagnosis of ALS. A recent study extended the number of common, significantly modulated genes found in peripheral blood cells. Thus, we examined the molecular pathways connecting those genes, including common regulatory elements.
Despite increasing evidence that peripheral blood can yield informative biomarkers for neurological diseases, very few studies have used blood samples from ALS patients. The initial stage of this work compared microarray data of PBLs from sporadic ALS (sALS) patients to those from healthy controls to determine differentially expressed genes. Using those genes as input, we identified significant pathways, biological interactions and transcription factor networks from the GeneGo maps and MetaCore. Many of the upregulated genes are engaged in normal cell processes that are probably heightened because of increased cell turnover, activation of the immune response, apoptosis and responses to chemical stimuli in sALS. Validation through comparison with other neurodegenerative diseases and longitudinal studies should help determine whether these genes provide a good panel of biomarker candidates for determining diagnosis, rate of progression or drug response in sALS.

A31 - Network-based Survival Analysis on Ovarian Cancer

Short Abstract: This poster is based on Proceedings Submission 183. Survival analysis is commonly used to predict the time to an event of interest and identify relevant features in cancer genomics studies. Existing survival models suffer from the high dimensionality and strong dependence of genomic features, and often lead to inconsistent relevant features across independent datasets for similar studies. We investigate a network-based Cox proportional hazards model called Net-Cox, which copes with the dependence and high dimensionality by exploring the structural relations among the genomic features in a network. In this study, we focused on survival and recurrence in ovarian carcinoma, since there are no available molecular signatures that can predict these events. We applied Net-Cox to three independent ovarian cancer gene expression datasets, including the TCGA ovarian cancer dataset that only became available recently. In the analysis of the three ovarian cancer datasets, Net-Cox with network information from gene co-expression or known gene relations can significantly improve the accuracy of survival prediction over the Cox model in cross-validation on the same dataset or across the three datasets. Net-Cox also identified more consistent relevant genes across the three independent datasets; in addition, the top-ranked genes compose dense protein-protein interaction sub-networks and are enriched in known cancer pathways. Our literature survey confirmed 16 signature genes with supporting evidence. We further validated with a tumor array in an independent patient cohort from Mayo Clinic that FBN1, the gene ranked highest by Net-Cox, is a signature that could predict recurrence after 12 months of treatment. Availability: http://compbio.cs.umn.edu/Net_Cox/.

A32 - is-rINDEL: in silico regulatory INDEL discovery

Short Abstract: This poster is based on Proceedings Submission 250.

Small insertions and deletions (INDELs) account for a significant amount of variation between human individuals. Genome-wide association studies and large scale genome sequencing projects such as the International Cancer Genome Consortium and The Cancer Genome Atlas frequently identify small INDELs which are associated with disease. Many of these INDELs may be the causal variation contributing towards disease susceptibility. Few in silico tools exist which assist in determining the potential functional impact of such INDELs. To date, these tools have mainly been limited to identifying small INDELs which cause frame shift mutations. Small INDELs, however, can impact on normal genomic function in a variety of ways, including disrupting the binding of transcription factors (TFs). We have therefore developed a tool which aims to identify small INDELs which impact on the binding of a TF to DNA.

We assessed the performance of our algorithm, is-rINDEL, using two datasets: a collection of four experimentally verified regulatory INDELs, and a set of putative regulatory INDELs created by combining ChIP-seq determined differential binding of the TF NFκB and whole-genome sequencing for the same 10 human individuals. is-rINDEL successfully predicted disruption of 3 out of the 4 experimentally verified INDELs and successfully explained the differential binding of NFκB through identification of both disrupted NFκB sites and disruption of cofactors of NFκB. This shows that is-rINDEL is a valuable tool for studying the functional impact of small INDELs identified by large-scale human sequencing projects and genome-wide association studies.

A33 - A large-scale microbial whole genome sequencing pipeline for outbreak investigation

Short Abstract: Large-scale genome sequencing projects are increasingly gaining widespread popularity among researchers owing to decreased sequencing costs and improved sequencing methodologies. However, the plethora of data generated will inherently bring new challenges such as bottlenecks in computing capabilities and bioinformatics pipelines which also need to be scalable. Additionally, the overwhelming amount of data makes data analysis, interpretation and management of large-scale genomic projects difficult to achieve under time constraints.

Here we present our large-scale microbial whole genome sequencing (WGS) pipeline starting from raw sequence data acquisition to end-point comparative data analysis and interpretation in the context of water and foodborne outbreaks. The pipeline components include a modified version of GenDB for automated multiple genome annotation, GView Server for genome visualization and comparative genomic analyses, an automated and parallelized version of orthoMCL for grouping common and unique genes within a group of related strains, a customized core SNP pipeline for phylogenomics, a collection of scripts for gene content heat maps, and PHAST for phage identification. We have integrated many of our bioinformatics tools into our in-house Galaxy platform to allow biologists with minimal programming skills access to our bioinformatics pipeline with a web interface.

The use of microbial WGS and comparative analysis for outbreak investigation is in its early stages; however, as pipelines improve and researchers learn to harness the large amount of data produced in large-scale sequencing projects, microbial WGS has the potential to become the status quo in outbreak surveillance and response.

A34 - PARADIGM-SHIFT predicts the function of mutations in multiple cancers using pathway impact analysis

Short Abstract: The major mechanism by which cancer arises is through somatic mutations. This can lead to alterations in gene regulation and changes in protein structure and function. It is critical to distinguish mutations that have an important role – driver mutations – from unimportant ones – passenger mutations. Differentiating driver and passenger events is essential for understanding cancer disease mechanisms, which can help guide treatment decisions as well as identify novel targets for treatment.

We are developing a method called PARADIGM-SHIFT based on integrated pathway analysis to discriminate loss-of-function, neutral, and gain-of-function mutations. Utilizing regulatory interactions annotated for a given gene, we can detect a shift in the downstream effects of an altered gene compared to what is expected. We show that a score based on this shift is highly predictive of the presence of a mutation and that the directionality of this shift also reflects the gain- or loss-of-function.

Application of our method to a set of known driver mutations reveals a significantly strong signal for loss- and gain-of-function mutations in the surrounding network, demonstrating the sensitivity of this approach. In addition, when applied to the negative control of passenger mutations, the method predicts little pathway impact, indicating that this approach also has high specificity. Application of this approach to recurrent mutations in cancers from the TCGA project identifies several important driver mutations across these cohorts. We also highlight the novel utility of this specific approach by comparison to earlier published approaches including SIFT, PolyPhen2, and MutationAssessor.

A35 - Mercury: Next Generation Sequencing Data Analysis and Annotation Pipeline

Short Abstract: The use of whole exome capture and whole genome sequencing employing next generation sequencing requires analysis and annotation for biological significance. Increasingly, these techniques are being used in the clinical setting. Our Mercury pipeline is designed to function as a clinical and research analysis and annotation pipeline.

A36 - Statistical Analysis of Deaths due to Lung Cancer in Heavy Smokers - Exploring Cancer Resistance

Short Abstract: Lung cancer is the major cause of death all around the world and is rarely cured due to the lack of early prognosis. About 87% of lung cancer deaths are caused by smoking. However, not all heavy smokers die of cancer; some have a normal life expectancy. What advantages do these smokers have over those who develop cancer? Developing an accurate risk prediction model to estimate the probability of lung cancer onset will eventually help in its early prognosis. The analysis in this poster focuses on developing an accurate risk prediction model that can be useful in understanding the cancer risk level in heavy smokers. Here we compared 7 clinical symptoms, including dyspnea, dysphonia, dysphagia, cachexia, angina pectoris, mental depression and effects on aggravation of fibromyalgia and CFS, for a population of 100 heavy smokers against 100 healthy smokers. Further, we collected hospital records of 57 lung cancer patients who were also smokers, to examine the most frequently occurring symptom that could later have developed into cancer. The Pearson correlation coefficient was used to determine the most frequently and commonly observed symptoms in healthy smokers and cancer patients. A 23.3156 percent correlation between the symptoms of healthy smokers and cancer patients was found. Our results suggest different significance levels of symptoms that play a vital role in determining the onset of neoplasia in lung epithelial tissue. Determining the cancer risk levels in heavy smokers will further help in the effective design of clinical trials for lung cancer prevention.

A37 - Transcriptomic analyses of murine and human peripheral nerves affected by diabetes

Short Abstract: Diabetic neuropathy (DN) is the most common, debilitating and costly complication of diabetes. Despite extensive research, the mechanism of development and progression of DN is still not fully understood; therefore, no targeted therapies to cure DN are available. In order to identify and better understand pathways underlying the pathogenesis of DN, we have performed microarray gene expression profiling assays in peripheral nervous tissues (dorsal root ganglia, sciatic nerve, and sural nerve) obtained from multiple mouse models (db/db, BTBR ob/ob, high-fat diet fed, and Streptozotocin-treated) and human subjects including both types of diabetes. Our in-house bioinformatics analysis pipeline was used to identify differentially expressed genes (DEGs) and perturbed pathways for each model by comparing diabetic and non-diabetic samples, except for the human tissues, where we compared progressive and non-progressive DN. We have developed a Diabetic Neuropathy Microarray (DNM) database providing convenient web-based data exploration and summarization. Across multiple DN models, the most highly and consistently over-represented biological functions were the immune system and inflammatory responses, particularly those associated with chronic inflammation. It is still not clear which cells of the innate and adaptive immune systems are promoting neuronal damage in diabetes, and what toxic substances are produced by each of these cells at or near the neuron/axon interface. Despite this uncertainty, our comprehensive gene expression database will provide a starting point for future research to elucidate the underlying mechanisms of the pathogenesis of diabetic neuropathy and lead to an opportunity to develop new therapeutic interventions.

A38 - Predictions of protein interactions with coding and noncoding transcripts in neurodegenerative diseases

Short Abstract: What is the role of coding and non-coding transcripts in neurodegenerative diseases? Increasing evidence indicates involvement of protein-RNA associations in the etiopathogenesis of disorders, but a comprehensive view is currently lacking and we need new tools to understand the principles that regulate functional and aberrant associations of protein-RNA complexes. We recently introduced a theoretical framework to study interactions between protein and RNA molecules (Bellucci et al. Nat. Methods 8, 444-445 2011). Our algorithm, catRAPID, provides the first quantitative approach for predicting protein-RNA associations. Here we employ catRAPID to understand the basic cellular processes that are impaired in disorders such as fragile X mental retardation, amyotrophic lateral sclerosis, Creutzfeldt-Jakob, Alzheimer’s and Parkinson’s diseases. We analyse a number of mechanisms such as autogenous regulation of gene expression, pathways for the maintenance of metabolite homeostasis, non-coding mediated regulation and RNA trafficking. We find remarkable agreement between our predictions and experimental evidence provided by a number of techniques, including electrophoretic mobility shift, UV-crosslinking and immunoprecipitation assays as well as high-throughput RNA sequencing experiments. Our approach will allow processing of a large number of protein-RNA pairs relevant to neurodegenerative disease and will lead to finding previously unknown interactions.

A39 - In-silico experiment of the human CFTR sequence to identify common disease causing mutation and development of small drug therapies for cystic fibrosis.

Short Abstract: Cystic fibrosis (CF), an autosomal recessive genetic disease, is associated with mutations in the human CFTR protein sequence. Understanding the structure and components of the CFTR protein is therefore important for the development of small drug therapies for cystic fibrosis. To start the experiment, the amino acid sequence of CFTR was obtained from a database and analyzed using available software. The major regions, domains and motifs of the protein were identified, and mutations that can potentially cause cystic fibrosis were evaluated. This analysis clarifies the components of the CFTR protein that matter for the development of small drug therapies, and we present the structural components of the protein. Because the gene carries a large number of mutations, we can evaluate the individual mutations responsible for cystic fibrosis. To address a specific research question, we identified three disease-causing mutations in this gene that are known to be associated with CF. Until now, the conventional treatment of cystic fibrosis has relied on physiotherapy, antibiotics and pancreatic supplements. We also suggest gene therapy for CF: identifying the mutation responsible for the disease and then transferring a normal copy of the CFTR gene into the lungs of cystic fibrosis patients. This can be done effectively using viruses and liposomes. With the development of better viruses and liposomes, the remaining problems can be solved.

A40 - Vaccination priority for minimizing economic losses during a pandemic influenza outbreak

Short Abstract: Background: During the 2009 pandemic influenza outbreak, the vaccination priorities accepted in most countries were based mainly on the political equity standard, which disregards losses derived from economic inefficiency. Predicting the economic impact of various vaccination priorities would aid administrative decision making. In this study, we propose an algorithm that estimates economic losses through pandemic simulation and a vaccination priority that would minimize the losses.
Description: The population of the metropolitan area of Japan was classified into 14 occupational groups based on the recent census and field trip data from the People Flow Project. The economic preference of each occupational group was hypothesized to be relative to its contribution to gross domestic product (GDP). We simulated a pandemic influenza outbreak in Japan (transmission coefficient 1.4-1.6) using pandemic multi-agent simulation (MAS) technology to obtain infection ratios and numbers for the occupational groups. When the economic preferences (economic efficiency criterion) were not considered, housewives/househusbands and students aged 7 to 18, followed by university students, were the most challenged groups with a high degree of freedom; when they were considered, the service sector had the highest priority, followed by the specialist group. Our results suggest that people in the service sector should have the highest vaccination priority for minimal economic losses.
Conclusion: A simulation model that predicts infection ratios and numbers of 14 occupational groups and economic losses was proposed. Our model is useful for introducing economic factors into policy-making processes (for example, vaccination priority) under a pandemic crisis.

A41 - LRRfinder2.0: an integrative web application for the identification of conserved leucine-rich repeat motifs

Short Abstract: Leucine-rich repeat (LRR) motifs are shared by over 6000 functionally distinct proteins. Among these proteins are Toll-like receptors (TLRs), the most widely studied pattern recognition receptors (PRRs) and essential components of the innate immune system. Within TLRs, their extracellular ligand-binding domain (ECD) consists of at least 20 tandemly arranged LRRs. More than 2000 TLR sequences have been identified by high-throughput technologies, resulting in the identification of polymorphisms associated with susceptibility and resistance to infectious and autoimmune diseases. However, less than 1% of TLRs have resolved structures for the variable pathogen-recognition domain responsible for species-specific immune responses. Homology modelling is commonly applied to reduce the gap between the number of sequences and resolved structures, relying upon accurate alignments between template and subject. Identification of conserved LRR motifs provides fixpoints for building modelling alignments. Here we present LRRfinder2.0, an application for identifying highly conserved regions of LRR motifs within sequences. The underlying matrices were generated using over 9000 LRRs from TLR sequence alignments to known structures. Comparing LRRs between taxonomic groups and their position within the ECD has highlighted differences in regularity and amino acid preference. With species- and TLR-specific amino acid usage in consideration, we have implemented taxon and TLR position-specific scoring matrices (PSSMs) for LRR prediction. LRRfinder2.0 is therefore an ideal tool to identify LRRs in proteins, allowing for easy comparison within protein families and between species. Such information will improve homology model generation as well as explaining the importance of species-specific amino acids within the LRR motif.
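
For readers unfamiliar with position-specific scoring matrices, the following is a minimal sketch of how a PSSM can be built from aligned motif instances and slid along a sequence. It is not the LRRfinder2.0 implementation: the motif instances, the test sequence, the uniform background and the pseudocount are all hypothetical choices made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_motifs, pseudocount=1.0):
    """Return one dict per column mapping residue -> log2 odds vs. a uniform background."""
    length = len(aligned_motifs[0])
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for motif in aligned_motifs:
            counts[motif[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm

def best_window(sequence, pssm):
    """Slide the PSSM along the sequence and return (start, score) of the best window."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best

# Hypothetical, hand-made motif instances (not real TLR data).
toy_motifs = ["LSNLTHLDLSN", "LRNLEILDLSN", "LKSLKHLDLSY", "LTNLQELNLAN"]
pssm = build_pssm(toy_motifs)

toy_sequence = "GGAKPLSNLEHLDLSNQRTE"
start, score = best_window(toy_sequence, pssm)
print(start, round(score, 2))
```

A taxon- or position-specific variant, as described in the abstract, would amount to building one such matrix per group of training motifs and choosing the matrix that matches the query.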

A42 - Gerontome: a web server for aging-related genes and analysis pipelines

Short Abstract: Aging is a complex and challenging phenomenon that requires interdisciplinary efforts to unravel. Insight into the genes of the aging process would offer the chance to delay or avoid some of its deteriorative aspects by preventive methods. To assist basic research on aging, a comprehensive database and analysis platform for aging-related genes is required. Gerontome is an integrated database of aging-related gene information with user-friendly analysis pipelines. To construct the Gerontome database, aging-related genes and their annotation data were integrated by automated pipelines. The aging-related genes were categorized by a set of structural terms from Gene Ontology (GO). Analysis pipelines for promoter analysis and protein-ligand docking were developed. The promoter analysis pipeline allows users to investigate age-dependent regulation of gene expression. The protein-ligand docking pipeline provides information on the position and orientation of a ligand on the surface of an age-related protein. Gerontome can be accessed through a web interface for querying and browsing. The server provides comprehensive age-related gene information and analysis pipelines. Gerontome is freely available at http://gerontome.kobic.re.kr.

A43 - Transcriptome Analysis of the Aging Eye and Age-related Macular Degeneration

Short Abstract: Age-related macular degeneration (AMD), the leading cause of irreversible blindness among the elderly, is a disease of the neural retina, retinal-pigmented epithelium (RPE), and choroid tissue complex. While aging is the single universal risk factor for AMD, genetic variants have implicated altered complement regulatory activity and lipid metabolism in AMD etiology. To further investigate the molecular pathways involved in AMD, we transcriptionally profiled RPE-choroid tissues derived from the macular and extramacular regions of 31 normal and 37 AMD postmortem eye specimens. By examining gene expression changes that naturally occur with age, we found significant differences in transcript levels associated with general cell viability, mitochondrial and lysosomal activities, and lipid biosynthesis. Unexpectedly, a dramatic change in RPE-choroid gene expression was found to occur at around age sixty, when AMD symptoms may first appear. Next, by employing a variety of bioinformatic and statistical approaches, we identified higher levels of genes regulating cell-mediated immune responses in all three major AMD phenotypes [Dry AMD, choroidal neovascularization (CNV), and geographic atrophy (GA)], and validated twenty of these genes using an independent donor cohort. We also identified increased transcript levels in apoptosis and angiogenesis programs that respectively distinguish GA and CNV from earlier AMD stages. Finally, using functionally-enriched expression signatures, a detailed AMD interactome was assembled that both delineates and interconnects Dry AMD, GA, and CNV. Overall, our work demonstrates the existence of age-related changes, global biomarkers, and phenotype-specific functional networks in AMD. These results will hopefully inform the development of novel AMD pharmaceuticals and diagnostics.

A44 - Computational Discovery of Master Regulators in Basal vs. Luminal Breast Cancer.

Short Abstract: The motivation for this project is the need for more targeted and effective treatments for basal breast cancer. We developed a method for computational prediction of master regulators in a genetic pathway derived from inferred differential protein activity in basal versus luminal breast cancers. We applied a Google PageRank-like method called SPIA (Tarca et al. 2009) that searches a basal signature network of 6917 protein-coding genes to identify putative “master regulators” involved in specifying basal cell character. Like Google’s algorithm, which is designed to up-weight web pages that are highly referenced by other pages, SPIA run directly on the network would find all of the “sinks” (genes pointed to by many genes). To address this issue, we reversed all of the regulatory interactions in the network before running the algorithm. Our hypothesis is that genes that receive high SPIA scores in the reversed network will match our intuition of master control points and be good drug target candidates. We applied our method to a dataset of 15 basal and 21 luminal breast cancer cell lines from the Gray lab at OHSU (Heiser et al. 2011). We intersected these results with RNAi data from Alan Ashworth's lab in which over 700 different genes were knocked out and the growth rates of several basal and luminal cell lines were recorded. We found that many genes with high SPIA scores had associated RNAis that produced differential loss of growth in the basal compared to luminal cell lines.
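
The edge-reversal idea can be illustrated with a small sketch. Plain PageRank power iteration stands in here for the scoring method named in the abstract, and the four-gene network, the damping factor and the iteration count are hypothetical; this is not the authors' code.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain PageRank by power iteration on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node without outgoing edges spreads its rank evenly over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

def reverse_edges(edges):
    """Flip the direction of every regulatory interaction."""
    return [(dst, src) for src, dst in edges]

# Hypothetical toy network: regulator R controls A and B, which both control target T.
toy_edges = [("R", "A"), ("R", "B"), ("A", "T"), ("B", "T")]

forward = pagerank(toy_edges)
reversed_rank = pagerank(reverse_edges(toy_edges))

top_forward = max(forward, key=forward.get)
top_reversed = max(reversed_rank, key=reversed_rank.get)
print(top_forward, top_reversed)
```

On the original edges the highest score goes to the downstream sink T; after reversal it goes to the upstream regulator R, which is the behaviour the abstract relies on.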

A45 - GWASfunc – a post-GWAS analytic tool for functional interpretation of GWAS results

Short Abstract: In recent years, more than a thousand GWAS studies have been performed, associating >3000 genes to >500 diseases or traits. However, interpretation of GWAS results has been hampered because GWAS screens only for correlation between genetic variations and target traits, and the underlying gene-gene network still remains to be elucidated. Here we present GWASfunc, a web-based tool for post-GWAS data analysis, which integrates information on biological pathways and networks, especially including miRNA annotation and targets. One effective way to interpret GWAS results is to analyze the associated genes in the context of biological pathways and networks (Curtis, et al., 2005; Wang, et al., 2007). GWASfunc is designed to facilitate functional interpretation of GWAS results, allowing SNP-to-gene mapping and the calculation of a combined p-value for multiple SNPs mapped to a single gene. It is also equipped with a GSA (Gene Set Analysis) module for the identification of significantly enriched pathways and other gene lists (e.g. Gene Ontology, chromosomal band, validated/predicted miRNA targets etc.) among the hit genes from GWAS results. The hit genes can be visualized interactively using an in-house network viewer in the context of various biological networks such as physical protein-protein interaction, TF-target and miRNA-target networks. Our web service provides a simple and easy-to-use interface built on an Adobe Flex front end, and all background analysis runs on a server-side Java backend.

A46 - BS-Analyzer: A feature-rich post-processing analysis and visualization tool for bisulfite sequencing

Short Abstract: Epigenetic gene regulation is a biological process that enables stable modification of gene expression during cellular development and in response to environmental conditions. DNA methylation, a mechanism of epigenetic regulation, influences cell differentiation, cancer, and aging. By sequencing bisulfite (BS) converted DNA, we can detect the methylation states of cytosines, as methylated cytosines remain C’s while unmethylated cytosines are converted to T’s. Bisulfite aligners provide precise mapping of BS-treated reads for whole-genome DNA methylation studies and also map CpG-enriched regions through reduced representation bisulfite sequencing (RRBS); however, there is still no program for post-processing the resulting alignments. Here we propose a fully automated Bisulfite Sequence Analyzer program, BS-Analyzer, for the post-processing of BS alignments. BS-Analyzer provides coverage of reads and methylation levels for the whole genome and for specific features such as genes, inter-genic regions, exons, introns, transposons, and CpG islands. We illustrate BS-Analyzer’s capability to detect differentially methylated regions of DNA across multiple samples using mouse RRBS methylomes. We also demonstrate the versatility of BS-Analyzer by showing how it can easily process data from different genomes, such as plants and fungi, generating summary values and plots such as meta-plots and chromosomal views of global methylation levels. Technical features of the program include optional multithreading to process large genomic data efficiently, OS independence, and a user-friendly interface. We present BS-Analyzer as a robust and accessible post-processing BS sequencing tool for epigenetic investigation and analysis.
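
The read-out described above (methylated cytosines stay C, unmethylated ones become T) leads to a simple per-site estimate, sketched below. This is an illustration, not BS-Analyzer code: the function names, the tuple layout of `toy_sites` and the read counts are hypothetical.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation at a cytosine; None if uncovered."""
    total = c_reads + t_reads
    return c_reads / total if total else None

def region_level(sites, start, end):
    """Pooled methylation level over the half-open region [start, end).

    sites: list of (position, c_reads, t_reads) tuples.
    """
    c = sum(c_reads for pos, c_reads, t_reads in sites if start <= pos < end)
    t = sum(t_reads for pos, c_reads, t_reads in sites if start <= pos < end)
    return methylation_level(c, t)

# Hypothetical read counts at three cytosines.
toy_sites = [(100, 8, 2), (150, 0, 10), (300, 5, 5)]

site_level = methylation_level(8, 2)           # single cytosine
feature_level = region_level(toy_sites, 90, 200)   # e.g. one exon
empty_level = region_level(toy_sites, 1000, 2000)  # no covered cytosines
print(site_level, feature_level, empty_level)
```

Pooling reads across a feature weights each cytosine by its coverage; averaging the per-site levels instead is an equally common choice and gives different numbers when coverage is uneven.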

A47 - Rational drug repositioning guided by an integrated pharmacological network of protein, disease and drug

Short Abstract: The process of drug discovery and development is time-consuming and costly, and the probability of success is low. Therefore, there is rising interest in repositioning existing drugs for new medical indications. When successful, this process reduces the risk of failure and costs associated with de novo drug development. However, in many cases, new indications of existing drugs have been found serendipitously. Thus there is a clear need for establishment of rational methods for drug repositioning.
In this study, we have established a database called “PharmDB”, which integrates data on disease indications, drug development, associated proteins, and known interactions extracted from various established databases. To explore linkages between known drugs and diseases of interest within PharmDB, we designed the Shared Neighborhood Scoring (SNS) algorithm. To facilitate exploration of the tripartite (Drug-Protein-Disease) network, we developed a graphical data visualization program called phExplorer, which allows us to browse PharmDB data in an interactive and dynamic manner. We validated this knowledge-based tool kit by identifying a potential application of a hypertension drug, benzthiazide (TBZT), to induce lung cancer cell death.
By combining PharmDB, an integrated tripartite database, with the Shared Neighborhood Scoring (SNS) algorithm, we developed a knowledge platform to rationally identify new indications for known FDA-approved drugs, which can be customized to specific projects using manual curation. The data in PharmDB are open access and can be easily explored with phExplorer and accessed via the BioMart web service (http://www.i-pharm.org/, http://biomart.ipharm.org/).

A48 - Of Ice and Fire: Evolutionary Extremes in the Influenza Proteome

Short Abstract: Seasonal and pandemic strains of Influenza A have caused severe health and economic crises. According to the CDC, from April 2009 to April 2010 there were an estimated 61,000,000 cases of H1N1. By integrating structural bioinformatics and evolutionary genomics approaches we have revealed an intriguing phenomenon of 100% conserved regions of protein surfaces in the H1N1 proteomes across multiple strains and years. This finding contrasts with the fact that the surface residues, on average, were found less conserved than the core residues for all 10 proteins of H1N1. Furthermore the conservation was found to be associated exclusively with the intra-viral interactions, where proteins of H1N1 interact with each other or with viral RNAs. We next ask whether regions of extreme conservation as well as extreme divergence are the common evolutionary features exhibited by the proteomes of all four pandemic subtypes, including H1N1, H2N2, H3N2, and H5N1. To do so, we have developed an automated pipeline that relies on studying structures of the comparative models of influenza proteins across multiple strains. The obtained findings may be helpful in identifying new antiviral drug targets.

A49 - CLC bio’s integrated framework for identification and comparison of genomic variants in mendelian diseases

Short Abstract: The use of personal genomics for predictive medicine is becoming increasingly popular in clinical settings due to its potential for identifying optimal drug treatments for particular individuals based on their genetic background. However, the bioinformatics analysis underlying the foundation of personal genomics, including the identification of “disease relevant” genomic variants, is still a bottleneck.
CLC bio’s Workbench and Server solution has been extended to an integrated framework for comparison and functional analysis of genomic variants with the purpose of supporting clinicians and scientists in the identification of clinically relevant variations.
In this work, we show how the analysis of a publicly available data set from 11 patients with inherited hearing loss can be performed using the CLC Workbench. We will show the complete re-sequencing workflow, from mapping of the sequencing reads to the reference sequence, through to the identification of genomic variants. Furthermore, we demonstrate a process for the annotation, filtering and comparison of identified genomic variants to emphasize clinically relevant variants and interesting new findings.

A50 - Enabling the reuse of methods while unraveling the epigenetic factors of Huntington's Disease

Short Abstract: We present an e-Science approach where the results of software developers support the collaboration between bioinformaticians and biologists. We used Huntington's Disease (HD) as a case study, a neurodegenerative disorder for which no cure has been found yet. Our objective is to explore the possible role of chromatin modification in HD while exploiting collaboration between different disciplines. We developed knowledge mining Web Services that can be combined in workflows by bioinformaticians, and tools to automatically turn a workflow into a Galaxy tool or Web Application.

We present results of analysing the role of CpG islands in HD by workflows created in Taverna. Using our approach, we statistically determined the putative role of CpG islands in transcriptional dysregulation in HD. In addition, we could predict and prioritise the most likely candidate proteins that could interact with mutant HTT. We used workflows to discuss experimental design, while the Galaxy and Web Application tools were used to explore the basic functionality of the workflow taking advantage of a user-friendly work environment for our collaborators.

In conclusion, we applied Web Services, workflows, and workflow dissemination tools in order to gain insight into the role of CpG islands in HD. The simplified interfaces of Galaxy and the Web Application are useful to disseminate our work. The workflows will be reused to investigate the role of other epigenetic markers in HD and other (neurological) diseases. Finally, we participate in the Digital Library project 'Workflow Forever' to further improve the preservation and reproducibility of bioinformatics analyses.

A51 - Application of pathway descriptors to detect similarities between human diseases with little genetic overlap

Short Abstract: Identification of molecular changes shared between distinct diseases is important for understanding the underlying mechanisms and diversity of human pathologies, and for applications such as patient stratification, personalized medicine and drug repositioning. Standard computational approaches for detection of disease-to-disease similarities largely focus on gene-level information, i.e. identification of genes and variants shared by the diseases. Unfortunately, clinically similar diseases or cases of the same disease (e.g. individual patients' tumors) can be strikingly different in gene expression and genetic alteration patterns. Nevertheless, one can detect higher-order similarities between human clinical phenotypes at the level of biological pathways.

We applied a pathway-based approach to compare 1,190 human diseases using the collection of 1,200 pathways and the gene-to-disease mapping from the MetaCore™ knowledge base (Thomson Reuters). For each disease, a functional profile was generated by assigning each pathway a P-value for its enrichment with the genes linked to the disease. Dozens of disease pairs with no or negligible overlap (P > 0.01) at the gene level shared statistically significantly enriched pathways (P < 10^-10). Hierarchical clustering of the diseases based on their pathway profiles further confirmed the existence of disease groups where the overlap was detectable only at the level of pathways. These observations can be explained by convergence of distinct molecular alterations in their effect on cellular functions. The pathway-based approach can be used to dissect the diversity of human diseases and, potentially, to support drug repositioning studies.
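
The abstract does not say which statistical test produced the enrichment P-values. A common choice is the one-sided hypergeometric test, sketched below under that assumption; the function name and the toy numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(universe, pathway_size, disease_size, overlap):
    """One-sided hypergeometric P-value: probability of seeing at least `overlap`
    pathway genes among `disease_size` genes drawn from `universe` genes."""
    max_overlap = min(pathway_size, disease_size)
    total = comb(universe, disease_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, disease_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Toy example: 10 genes in total, a 3-gene pathway, 3 disease genes, all 3 shared.
p_full_overlap = enrichment_p_value(10, 3, 3, 3)
p_no_overlap = enrichment_p_value(10, 3, 3, 0)
print(p_full_overlap, p_no_overlap)
```

A disease's functional profile, in the sense used above, would then be the vector of such P-values over all pathways, and two profiles can be compared even when the underlying gene lists do not intersect.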

A52 - Integrative analysis of Lung Genomics Research Consortium (LGRC) data

Short Abstract: Chronic Obstructive Pulmonary Disease (COPD) is one of the leading causes of disability in the US. While smoking is a primary cause of COPD, up to 25% of cases occur in non-smokers, indicating a strong genetic contribution. While recent studies have shown that both genetics and epigenetics contribute to disease predisposition and progression, none has investigated the interplay between genetics, epigenetics, and environment (i.e. smoking). It has also been shown that smoking affects males and females differently, suggesting different pathologic pathways. The Lung Genomics Research Consortium (LGRC) has assayed genotypes, gene expression profiles, and epigenomes (methylomes), in addition to both smoking status and relevant clinical variables, of a large case-control cohort. These data allow us, for the first time, to integrate different data types in a meaningful way. A preliminary integrative analysis of gene expression and methylation through Fisher's method uncovered genes whose expression changed in concert with percent methylation. Gender-specific analyses revealed gene sets likely to contribute to the sexual dimorphism observed at the epidemiological level. Gene set enrichment analysis and pathway analysis further suggested specific biological pathways that are up- or down-regulated in a gender-specific manner, among which is Lipocalin, which has been shown, with limited evidence, to have an association with COPD severity. These findings help expand the list of candidate genes for COPD diagnosis and therapeutics and demonstrate the power of an integrative genomics approach in the analysis of complex diseases.
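
Fisher's method combines k independent P-values through the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below uses the closed-form chi-square tail for even degrees of freedom; pairing one expression P-value with one methylation P-value per gene is an assumption about how the method was applied, and the numbers are made up.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent P-values (each > 0) with Fisher's method."""
    k = len(p_values)
    half_statistic = -sum(math.log(p) for p in p_values)  # X / 2
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_statistic / i
        series += term
    return math.exp(-half_statistic) * series

# Hypothetical gene: expression P-value 0.05, methylation P-value 0.05.
combined = fisher_combined_p([0.05, 0.05])
single = fisher_combined_p([0.05])
print(round(combined, 4), round(single, 4))
```

Two moderately small P-values combine into a smaller one, which is why the method can surface genes that neither data type flags on its own.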

A53 - MalaCards – the integrated Human Malady Compendium

Short Abstract: We introduce MalaCards, an integrated database of human maladies and their annotations (malacards.weizmann.ac.il), modeled on the architecture and richness of the popular GeneCards human genes database, (www.genecards.org). MalaCards mines varied sources to generate a ‘card’ for each disease via: 1. Identifying sources of nomenclature/annotation, targets for disease data mining; 2. Developing algorithms for merging heterogeneous disease names, and defining unique identifiers. For example, alzheimer’s disease, ad, dementia alzheimer’s type, are merged under Alzheimer Disease, acronym AD, ID=ALZ001, with others listed as aliases (see malacards.weizmann.ac.il/card/index/ALZ001); 3. Engineering scripts to mine annotations; 4. Building MalaCards V1.01(alpha), with thousands of user-friendly ‘cards’ for all incorporated maladies, containing a variety of sections; 5. Implementing a strategy whereby detailed gene-disease relationships within GeneCards are used to create disease-specific content, leveraging the GeneCards relational database and search engine; 6. Constructing a second-tier annotator, based on GeneDecks Set Distiller, a GeneCards suite member. For example, diseases related to the key disease are computed to be those maximally associated with the set of found genes. Similarly, we obtain drugs/compounds, publications and mouse phenotypes contextually related to the disease; 7. Formulating scores for prioritizing derived annotations; 8. Initiating QA based on extensive knowledge within the Crown Human Genome Center. As our R&D continues, we plan to expand the list of annotation sources and sections, and include genetic variation details. This will be enhanced by collaborations with researchers outside of our group, and expanded by the initiation of systems biology tools, towards the goal of enabling novel biomedical discoveries.

A54 - GenePatentDB: a database server for gene-related patents

Short Abstract: This poster is based on Proceedings Submission nnn. As the number of patents containing biological sequences has rapidly increased, the number of gene-based patents has increased as well. Gene-based patents (especially those involving human genes) can be useful resources for biological researchers who plan to file gene- or biological sequence-based patent applications, and may serve as a guideline for deciding what kind of research to conduct. Here, we present genepatent, a database server for biological sequence annotation and analysis in issued patents. The main features of the genepatent database are as follows. 1) We provide a user-friendly keyword-search interface for patent sequences and the gene-patent map. 2) We built disease-patent and pathway-patent maps for human genes, using the OMIM and KEGG databases. The two maps show the human diseases and pathways garnering the most patent interest (e.g. in biotechnology and medical fields). 3) We also provide a patent sequence BLAST (psBLAST) to allow users to compare their sequences with patent sequences.

A55 - Inferring sparse multivariate models to predict disease phenotype from genotype

Short Abstract: Genome-wide association studies hold great promise for unraveling the genetic basis of complex diseases. While moderately successful, the results delivered by classical statistical methods have fallen short of their clinical goals of building disease predictive models. Several attempts have been made to use supervised learning techniques to predict disease outcome based on genotypes. These studies typically learn classifiers relying on embedding the data in a high dimensional space, or highly redundant bootstrapping, making interpretation of the results challenging. Furthermore, most of these methods pre-filter the examined variants, using statistical approaches, making them a poor alternative for the discovery of new disease associations. We use Adaboost, a large-margin classifier, to learn sparse models that predict case-control status in two independent cohorts of type I diabetes mellitus (T1D), demonstrating state-of-the-art classification performance. We suggest a simple yet powerful method for overcoming limitations of classification methods due to the linkage structure between genetic variants. We demonstrate significant overlap in boosting selected regions across the two cohorts, including 28 replicated genes which are not detected through the use of classical statistical tests. Of these genes, 13 have been previously implicated in the literature. We show how genes selected by boosting across both cohorts are enriched in T1D pathways. Finally, three pathways are found enriched using boosting on both cohorts and are not enriched when using p-value based methods. Our results suggest that through the use of large-margin classification algorithms we can discover a landscape of disease associated genes, not identified through other existing methods.
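
As a rough illustration of why boosting yields sparse models, the sketch below runs textbook AdaBoost with single-SNP decision stumps on a toy genotype matrix. It is not the authors' pipeline: the genotype coding (0/1/2 minor-allele counts), the thresholds, the number of rounds and the data are all hypothetical.

```python
import math

def train_adaboost(X, y, rounds=3):
    """X: genotype rows of 0/1/2 allele counts; y: labels in {-1, +1}.
    Returns a list of (alpha, snp_index, threshold, polarity) stumps."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for j in range(len(X[0])):
            for threshold in (0.5, 1.5):
                for polarity in (1, -1):
                    preds = [polarity if row[j] > threshold else -polarity for row in X]
                    err = sum(w for w, p, label in zip(weights, preds, y) if p != label)
                    if best is None or err < best[0]:
                        best = (err, j, threshold, polarity, preds)
        err, j, threshold, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, threshold, polarity))
        # Up-weight misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * label * p)
                   for w, p, label in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, row):
    score = sum(alpha * (pol if row[j] > thr else -pol) for alpha, j, thr, pol in model)
    return 1 if score >= 0 else -1

# Hypothetical cohort: 6 samples, 3 SNPs; only SNP 0 tracks case/control status.
toy_X = [[2, 0, 1], [1, 1, 0], [2, 2, 1], [0, 1, 1], [0, 0, 0], [0, 2, 2]]
toy_y = [1, 1, 1, -1, -1, -1]

model = train_adaboost(toy_X, toy_y)
selected_snps = sorted({j for _, j, _, _ in model})
print(selected_snps, [predict(model, row) for row in toy_X])
```

Only the SNPs that appear in some stump enter the final model, so the number of boosting rounds bounds the number of variants used; that is the sense in which such a classifier is sparse.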

A56 - Chemical-Protein Interactome and its Application in Personalized Medicine and Drug Repositioning

Short Abstract:

A57 - A regression model approach to enable cell morphology correction in high-throughput flow cytometry

Short Abstract:

A58 - MetaFlux: a new software tool to develop Flux Balance Analysis Models

Short Abstract:

A59 - Quality Assessment Using Network Representation of Genome Assemblies

Short Abstract:

A60 - A Systems Biology Approach to Identify Drug Combinations Against Intracellular Pathogens

Short Abstract:

A61 - eDating for Receptors and Ligands: Computational Prediction and Prioritization of Receptor-Ligand Pairs

Short Abstract:
