Accepted Posters

Attention Conference Presenters - please review the Speaker Information Page available here.

If you need assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category A - 'Bioinformatics of Disease and Treatment'

A01 - Linking Signaling Pathways to Transcriptional Programs in Breast Cancer

Hatice Ulku Osmanbeyoglu, MSKCC, United States

Short Abstract: Cancer cells acquire genetic and epigenetic alterations that often lead to dysregulation of oncogenic signal transduction pathways, which in turn alters downstream transcriptional programs. Numerous methods attempt to deduce aberrant signaling pathways in tumors from mRNA data alone, but these pathway analysis approaches remain qualitative and imprecise. Here, we present a statistical method to link upstream signaling to downstream transcriptional response by exploiting reverse phase protein arrays and mRNA expression arrays in The Cancer Genome Atlas breast cancer project. Formally, we use an algorithm called affinity regression to learn an interaction matrix between upstream signal transduction proteins and downstream transcription factors (TFs) that explains target gene expression. The trained model can then predict the TF activity given a tumor sample’s protein expression profile or infer the signaling protein activity given a tumor sample’s gene expression profile. Breast cancers are comprised of molecularly distinct subtypes that respond differently to pathway-targeted therapies. We trained our model on the breast cancer data set and identified subtype-specific and common TF regulators of gene expression. Finally, inferred protein activity predicted clinical outcome within the METABRIC Luminal A cohort, identifying high- and low-risk patient groups within this heterogeneous subtype.

A02 - MOLECULAR DYNAMICS ANALYSIS OF CHRONIC MYELOID LEUKEMIA GATEKEEPER MUTATION

Ambuj Kumar, University of Florida, United States

Short Abstract: ABL1 is a non-receptor protein tyrosine kinase that plays a key role in regulating several pathways associated with cell growth and survival. The ABL1 kinase domain gatekeeper mutation T315I is a major problem for patients with chronic myeloid leukemia because it induces resistance to a wide range of protein tyrosine kinase inhibitors. In this work we performed long-term molecular dynamics simulations to deduce the atomic changes that the mutation causes and that underlie the observed phenotype. The radius of gyration and solvent-accessible area results strongly suggest that the mutant structure adopts a constricted and rigid conformation compared with the native structure, which can explain the drug resistance observed in patients. Furthermore, DSSP results showed that the C-terminal region of the mutant protein adopts a turn conformation, in contrast to the alpha helices observed in the native structure. The results obtained in this work will help in designing more promising inhibitors against the drug resistance induced by the ABL1 T315I mutation.

A03 - Clonality Inference in Multiple Tumor Samples Using Phylogeny

Salem Malikic, Simon Fraser University, Canada

Short Abstract: Intra-tumor heterogeneity presents itself through the evolution of subclones during cancer progression. While recent research suggests that this clonal diversity is a key factor in therapeutic failure, the determination of subclonal architecture of human tumors remains a challenge. To address the problem of accurately determining subclonal frequencies in tumors as well as their evolutionary history, we have developed a novel combinatorial method named CITUP (Clonality Inference in Tumors Using Phylogeny). An important feature of CITUP is its ability to exploit data from multiple time-point and/or regional samples from a single patient in order to improve estimates of mutational profiles and subclonal frequencies. Using extensive simulations and real datasets comprising tumor samples from two leukemia drug-response studies, we show that CITUP can infer the evolutionary trajectory of human tumors with high accuracy.

A04 - VarElect: phenotype-based NGS variant prioritization using GeneCards and MalaCards

Marilyn Safran, the Weizmann Institute of Science, Israel

Short Abstract: Progress towards deciphering genetic causes of human diseases has been greatly accelerated by next generation sequencing (NGS) technologies. In typical analyses of whole exome sequences, the number of non-reference coding variants is ~25,000. First-tier pipelines sift these SNPs/indels by rarity, predicted protein damage, and by segregation in affected families based on inheritance modes. The resulting variant short-list often contains dozens or even hundreds of candidates. Researchers/clinicians then face the hurdle of connecting one (or just a few) variant-carrying genes to the phenotype. This last step of effective zooming-in on target variants requires judicious identification of often subtle connections between gene candidates and keywords that describe the disease/symptoms, using bioinformatics. The comprehensive data harbored within GeneCards (www.genecards.org) and MalaCards (www.malacards.org), our human gene and disease compendiums, provide an ideal platform to achieve this goal. We constructed VarElect, a new Variant Election facility that attains phenotype-dependent variant prioritization, leveraging the rich information within GeneCards/MalaCards. Users submit phenotype/disease keywords related to a sequenced individual, and a list of genes/variations (typically 100-300); VarElect analyzes the input and produces a prioritized list of scored, contextually annotated genes, inferring direct and indirect links between genes and phenotypes. Two successes involve congenital general anosmia (CGA), and unexplained congenital diarrhea.

A05 - Chimera: A Bioconductor package for secondary analysis of fusion products

Raffaele Calogero, Università di Torino, Italy

Short Abstract: Gene fusions arising from chromosomal translocations have been implicated in cancer. The discovery of novel gene fusions can lead to a better comprehension of cancer progression and development. RNA sequencing has opened many opportunities for the identification of this class of genomic alterations, leading to the discovery of novel chimeric transcripts in melanomas, breast cancers and lymphomas. Various computational approaches have been developed for the detection of chimeric transcripts. These computational methods produce outputs that do not follow a standard structure and, as far as we know, no tools are available to analyze and manipulate these outputs. Thus, we have developed chimera, a Bioconductor package, for downstream processing of data obtained by the following fusion detection tools: bellerophontes, deFuse, FusionFinder, FusionHunter, mapSplice, tophat-fusion, FusionMap, chimeraScan and STAR. The outputs are reorganized in a common data structure. Chimera implements various filters to reduce false positives (spanning/encompassing reads threshold, selecting genomic regions associated with coding genes, excluding fusions encompassing long introns, retaining fusions encompassing in-frame fused peptides, etc.). It implements a de novo validation of the fusion junction and produces a structured output that provides all the information needed for further laboratory characterization of fusion events.

A06 - STATegra EMS: An Experiment Management System for complex next-generation omics experiments.

Ana Conesa, CIPF, Spain

Short Abstract: We will present STATegraEMS, a web-based, user-friendly Experiment Management System for high-throughput sequencing and NGS-based assays. It provides research laboratories with an integrated system for simple and effective experiment annotation and tracking of analysis pipelines from raw data to ready-to-use measurements, using free, open source software technologies. STATegraEMS is available at http://bioinfo.cipf.es/stategraems/

A07 - Leveraging cross-population Gene/Sub-network Meta-Analysis to recover signal underlying ethnic difference in disease

Emile Rugamika Chimusa, University of Cape Town, South Africa

Short Abstract: Despite numerous successful GWAS, detecting variants that confer low disease risk still poses a challenge. Because GWAS test one locus at a time, they may miss disease genes with weak genetic effects or strong epistatic effects. GWAS may thus generate false-negative or inconclusive results, suggesting the need for novel methods that combine the effects of SNPs within a gene across different GWAS to increase the likelihood of fully characterizing the susceptibility gene. Because distinct populations exhibit substantial variation in genetic disease risk, genetic disease scoring statistics for a particular disease have yielded different outcomes in different populations. A post-scoring analysis that combines the effects of different SNPs within genes or pathways from multiple independent studies in a single analysis may help identify associations with small effect sizes, reveal larger effects, and provide valuable information for prioritizing the most important results. This approach is known as meta-analysis, which aims to pool genetic disease scoring statistics from different populations to increase the chances of finding associations with small effect sizes. Here, we present probabilistic and graph-based meta-analysis approaches that integrate the association signal from independent GWAS in order to deconvolute the interactions between genes underlying the pathogenesis of complex diseases. We implement the approach in ancMETA.
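As a rough illustration of the pooling idea in this abstract, the sketch below combines per-population p-values for one gene with Fisher's method. This is a standard textbook technique swapped in for illustration; it is not the ancMETA algorithm, whose details the abstract does not give. The population names and p-values are hypothetical.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has a closed form, so no external library is needed.
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    # Upper tail of chi-square(2k): exp(-x/2) * sum_{j<k} (x/2)^j / j!
    tail = sum(half_x ** j / math.factorial(j) for j in range(k))
    return math.exp(-half_x) * tail

# Hypothetical gene-level p-values for one gene in three populations.
gene_p = {"population_A": 0.04, "population_B": 0.10, "population_C": 0.03}
combined = fisher_combine(list(gene_p.values()))
print(f"combined p-value: {combined:.4g}")
```

Fisher's method assumes the studies are independent, which matches the abstract's "multiple independent studies"; weighted alternatives would be needed if study sizes differ substantially.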

A08 - Uncovering synthetic lethal interactions for colorectal cancer therapeutics via an integrated approach

Grace S. Shieh, Academia Sinica, Taiwan

Short Abstract: Two genes are said to have a synthetic lethal (SL) interaction if their simultaneous mutation leads to cell death, but each individual mutation does not. Synthetic lethality-based methods for developing cancer-specific therapeutics have been rapidly adopted due to their translational impact. Here, we present an integrated computational and experimental approach to uncovering SL pairs in colorectal cancer (CRC). Our pilot study showed that certain verified SL pairs were simultaneously differentially expressed in high percentages of cancerous tissues. Thus, we hypothesized that cancer cells depend on some of these gene pairs for survival and/or proliferation. Protein levels of ~20 selected genes were evaluated by immunohistochemistry (IHC) in 171 CRC patients, and their pairwise combinations were correlated with clinicopathological features. This resulted in 11 predicted SL pairs, including the previously verified (MSH2, POLB) and (CSNK1E, MYC). Additionally, we validated two novel SL pairs of TP53 using RNAi, a small-molecule inhibitor and a mouse model, indicating that these SL pairs can be readily translated to preclinical studies in treating TP53-mutant CRC patients. Kaplan-Meier estimates demonstrated that four IHC pairs were correlated with poor survival. Finally, multivariate Cox regression analysis showed that these four protein pairs and stage can predict CRC patient overall survival, suggesting their clinical application in decisions for adjuvant treatment. Our approach is readily accessible and applicable to other cancers.

A09 - Inferring novel relationships of drugs and genes based on phenotypic features

Jeanette Prinz, Helmholtz Zentrum Munich, Germany

Short Abstract: The molecular mechanisms that translate chemical perturbations into phenotypic effects are largely unknown. Consequently, in the context of drug safety and drug design, there is an urgent need for novel approaches to uncover these molecular relationships. Here, we exploited drug and gene phenotypic information to search for novel molecular associations of drugs. For that purpose, we annotated side effects of 1,667 marketed drugs and phenotypic traits of 5,384 genes from the Mouse Genome Informatics repository with the MedDRA ontology and measured the phenotypic linkage between drugs and genes by utilizing a semantic similarity approach. The analysis of high scoring drug-gene associations shows that the genes in these pairs bear properties of drug targets: they have a central function in protein-protein interaction networks, tend to be specifically expressed across tissues and their expression profiles correlate with those of the associated known drug targets. Furthermore, we benchmarked the predicted relationships with drug targets from the STITCH database and observed a strong enrichment of physical as well as indirect gene-drug associations. The successful experimental validation of a formerly unknown drug-gene relation suggests that our method is able to reveal novel modes of action of drugs. In summary, we demonstrate that our systems biology approach is able to detect direct as well as indirect drug-target associations providing new insights into molecular mechanisms underlying chemical perturbations. Thus, it may help to find new therapeutic applications for drugs and may improve the rational design of medicines.

A10 - MicroRNA-gene association as a prognostic biomarker in cancer exposes disease mechanisms

Rotem Ben-Hamo, Bar Ilan University, Israel

Short Abstract: The transcriptional networks that regulate gene expression, and modifications to these networks, are at the core of the cancer phenotype. MicroRNAs, a well-studied species of small non-coding RNA molecules, have been shown to have a central role in regulating gene expression as part of this transcriptional network. Further, microRNA deregulation is associated with cancer development and with tumor progression. Glioblastoma Multiforme (GBM) is the most common, aggressive and malignant primary tumor of the brain and is associated with one of the worst 5-year survival rates among all human cancers. To study the transcriptional network and its modifications in GBM, we utilized gene expression, microRNA sequencing, whole genome sequencing and clinical data from hundreds of patients from different datasets. Using these data and a novel microRNA-gene association approach that we introduce, we have identified unique microRNAs and their associated genes. Their uniqueness lies in the quantifiable association between microRNA and gene-expression levels, which we show stratifies patients into clinical subgroups with high statistical significance. Importantly, this stratification goes unobserved by other methods and is not associated with other subsets or phenotypes within the data. We demonstrate the robustness of the approach by reproducing the findings in unrelated datasets. Among the set of identified microRNA-gene associations, we closely study the example of MAF and hsa-miR-330-3p, and show how their co-behavior stratifies patients into prognosis groups and how whole genome sequences point to a specific genomic variation as a possible basis for differences between patients.

A11 - Therapeutic target database update 2014: a resource for targeted therapeutics

Chu Qin, National University of Singapore, Singapore

Short Abstract: Here we describe an update of the Therapeutic Target Database (http://bidd.nus.edu.sg/group/ttd/ttd.asp) for better serving the bench-to-clinic communities and for enabling more convenient data access, processing and exchange. Extensive efforts from the research, industry, clinical, regulatory and management communities have been collectively directed at the discovery, investigation, application, monitoring and management of targeted therapeutics. Increasing efforts have been directed at the development of stratified and personalized medicines. These efforts may be facilitated by knowledge of the efficacy targets and biomarkers of targeted therapeutics. Therefore, we added search tools that use the International Classification of Diseases ICD-10-CM and ICD-9-CM codes to retrieve target, biomarker and drug information (currently enabling the search of almost 900 targets, 1,800 biomarkers and 6,000 drugs related to 900 disease conditions). We added information on almost 1,800 biomarkers for 300 disease conditions and 200 drug scaffolds for 700 drugs. We significantly expanded the Therapeutic Target Database contents to cover >2,300 targets (388 successful and 461 clinical trial targets), 20,600 drugs (2,003 approved and 3,147 clinical trial drugs), 20,000 multitarget agents against almost 400 target-pairs and the activity data of 1,400 agents against 300 cell lines.

A12 - Toward personalized nutrition: Combining terminology sources from agriculture and biomedicine

Richard Linchangco, University of North Carolina at Charlotte, United States

Short Abstract: Advances in sequencing and the volume of available data bring us closer to the goal of personalized nutrition. Research into personalized nutrition aims to produce knowledge of plant-based nutrients and their effects on human health, in turn enabling individualized diets. Plant and human knowledge is plentiful but exists in many disparate databases. The two predominant literature resources are: Agricola for agricultural citations, and PubMed for biomedical citations. The differences in subject content and controlled vocabulary annotations in Agricola and PubMed make it impractical to integrate data between them. Integrating the controlled vocabularies for these two resources requires new techniques for merging differing terminology usage between subject domains.

To facilitate interoperability between Agricola and PubMed, a method was developed to combine the controlled vocabularies of Agricola and PubMed (the National Agricultural Library Thesaurus [NALT] and the Medical Subject Headings [MeSH], respectively). To address the differences in terminology usage, syntactic and semantic methods were employed. Matched concepts in the combined controlled vocabulary are validated based on ontology-based parent-child relationships and manual curation.

Our method produces a combined controlled vocabulary encompassing all annotations in NALT and MeSH. This work empowers further research into personalized nutrition and the ability to discover the effects of tailored diets on individualized disease treatment.

This work is part of the Plant Pathways Elucidation Project funded by: University of North Carolina General Administration, Duke Energy, General Mills Inc., and Dole Food Company

A13 - Inferring probabilistic miRNA-mRNA interaction signatures in cancers: a role-switch approach

Yue Li, University of Toronto, Canada

Short Abstract: Aberrant microRNA (miRNA) expression is implicated in tumorigenesis. The underlying mechanisms are unclear because the regulation by each miRNA of potentially hundreds of mRNAs is sample specific. We describe a novel approach to infer a Probabilistic MiRNA–mRNA Interaction Signature (‘ProMISe’) from a single pair of miRNA–mRNA expression profiles. Our model considers mRNA and miRNA competition as a probabilistic function of the expressed seeds (matches). To demonstrate ProMISe, we extensively exploited The Cancer Genome Atlas data. As a target predictor, ProMISe identifies more confident/validated targets than other methods. Importantly, ProMISe confers higher cancer diagnostic power than using expression profiles alone. Gene set enrichment analysis on averaged ProMISe uniquely revealed respective target enrichments of oncomirs miR-21 and 145 in glioblastoma and ovarian cancers. Moreover, comparing matched breast (BRCA) and thyroid (THCA) tumor/normal samples uncovered thousands of tumor-related interactions. For example, the ProMISe–BRCA network involves miR-155/183/21, which exhibit higher ProMISe coupled with coherently higher miRNA expression and lower target expression; oncomirs miR-221/222 in the ProMISe–THCA network engage with many downregulated target genes. Together, our probabilistic approach of integrating expression and sequence scores establishes a functional link between aberrant miRNA and mRNA expression, which was previously under-appreciated due to methodological differences.

A14 - Using Hidden Markov Model to Analyze Time-Series Disease Data

Wen-Yu Chung, National Kaohsiung University of Applied Sciences, Taiwan

Short Abstract: The relation between genes and complex disease is difficult to identify because of the possibly multiplex genetic structure. Although genome-wide association studies (GWAS) have produced computational methods to identify trait-related genes, little is known about the dynamic changes in gene expression and the course of disease development. We are interested in finding such associations between genes and prognosis using microarray and next-generation sequencing data. We have developed a hidden Markov model and applied it to published datasets. The case samples are the expression profiles of hepatitis-virus-infected non-tumoral and tumoral liver parts (GSE47197). Short-read expression data from normal people (GSE16921) are used as the stage zero sample.
We considered the changes from each stage to the next and constructed the states in our Markov model accordingly. In this exercise, there are three stages, but the methodology is applicable to a higher number of stages. The expression level was first categorized as low or high for each gene. Then, all eight possible situations were further grouped into five dynamic patterns. These patterns were the observations, and the hidden Markov model was used to test whether each gene is disease-related. Furthermore, we built gene co-expression networks using the same datasets and studied the changes of interactions.
Preliminary results showed that many high-scoring genes are known cancer genes. The identified genes and their connectivities will be examined together, and will provide a comprehensive understanding of the changes in gene expression and disease prognosis.
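The "eight possible situations" follow from two expression levels over three stages (2^3 = 8). A minimal sketch of that enumeration is shown below. The stage ordering in the comment is inferred from the abstract, the `transitions` helper is hypothetical, and the abstract does not say how the eight situations are grouped into five patterns, so no grouping is attempted here.

```python
from itertools import product

LEVELS = ("low", "high")
N_STAGES = 3  # presumably stage zero (normal), non-tumoral, tumoral

# Every way a gene's discretized expression can behave across the stages.
trajectories = list(product(LEVELS, repeat=N_STAGES))

def transitions(trajectory):
    """Describe each stage-to-stage change as 'up', 'down', or 'same'."""
    steps = []
    for before, after in zip(trajectory, trajectory[1:]):
        if before == after:
            steps.append("same")
        elif before == "low":
            steps.append("up")
        else:
            steps.append("down")
    return tuple(steps)

for t in trajectories:
    print(" -> ".join(t), "| changes:", transitions(t))
```

With more stages, only `N_STAGES` changes; the number of situations grows as 2 to the power of the number of stages.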

A15 - Oncogenomics analysis of Merkel cell carcinoma

Kenneth Daily, National Cancer Institute, United States

Short Abstract: Merkel cell carcinoma (MCC) is a rare skin cancer even more deadly than melanoma. MCC is one of only seven human cancers with a viral etiology. The DNA virus Merkel cell polyomavirus (MCV) is associated with tumorigenesis and is clonally integrated in the host genome in approximately 80% of MCC tumors. Our goal is to characterize the DNA and RNA of MCC tumors to identify tumorigenic factors and potential prognostic and therapeutic targets. As such, we have applied a number of high throughput analyses to obtain a high resolution view of 24 MCC fresh frozen tumor samples. Assays include RNA microarrays, RNA-Seq, array CGH, and exome sequencing supplemented with baits targeting MCV. Each of these analysis tools will help illuminate carcinogenic factors such as over- or under-expressed genes, small and large scale copy number changes and DNA rearrangements, point mutations, and identification of MCV integration sites. Most importantly, since the same tumor samples were used across multiple platforms, we are able to integrate datasets to help prioritize the thousands of potential target genes and genomic regions that are identified by individual assays; for example, a gene with a non-synonymous sequence variant can be assessed for reduced RNA expression or a copy number loss, raising the likelihood that it represents a tumor suppressor. Functional validation of targets identified from this integrated approach is being assessed using MCC cell lines, tumor microarrays, and mouse xenograft studies.

A16 - FABIA: A Probabilistic Model for Biclustering and its Application to Analyzing Big Data in Drug Design

Djork-Arné Clevert, Johannes Kepler University Linz, Austria

Short Abstract: Unsupervised bicluster analysis is a hot topic in data science and has become an invaluable tool for extracting concealed knowledge from high-dimensional data. Down the years, biclustering demonstrated its worth in many biomedical applications, e.g., to identify tightly co-expressed gene sets in cancer subgroups.

Biclustering simultaneously organizes a data matrix into subsets of rows and columns in which the entities of each row subset are similar to each other on the column subset and vice versa. This simultaneous grouping of rows (genes or chemical fingerprints) and columns (conditions or compounds) allows identifying subgroups within the conditions, e.g. in drug design where researchers want to reveal how compounds affect gene expression (the effects of compounds may only be similar on a subgroup of genes).

Standard clustering methods are not suited to tackle these kinds of problems. We therefore present a biclustering approach, called FABIA, which goes far beyond conventional clustering concepts. FABIA is a multiplicative latent variable model that extracts linear dependencies between column and row subsets by forcing both the hidden factors and their loadings to be sparse.
FABIA is a mathematically well-founded Bayesian analysis technique that allows exploring high-dimensional big data in an unsupervised manner and thereby shedding new light on the dark matter of many problems.

During the poster session, we will present:
a) the FABIA model for extracting biclusters and their ranking according to information content;
b) results from a high-throughput compound screening;
c) biclustering ChEMBL’s bioactive small molecules (16 million chemical fingerprints times 1 million compounds)
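To make the "multiplicative latent variable model" concrete, the sketch below builds a small noise-free data matrix as a sum of outer products of sparse loading vectors and sparse factor vectors, which is the generative direction of such a model. It illustrates only how sparse vectors define biclusters; it does not perform FABIA's Bayesian inference, and all values, sizes, and helper names are hypothetical.

```python
def outer(u, v):
    """Outer product of two vectors as a list of rows."""
    return [[a * b for b in v] for a in u]

def add(m1, m2):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

# Two hypothetical biclusters over 5 rows (e.g. genes) and 6 columns
# (e.g. compounds). Zeros mark rows/columns outside the bicluster.
loadings = [
    [2.0, 1.5, 0.0, 0.0, 0.0],   # bicluster 1 involves rows 0 and 1
    [0.0, 0.0, 0.0, 1.0, -2.0],  # bicluster 2 involves rows 3 and 4
]
factors = [
    [1.0, 0.0, 0.0, 2.0, 0.0, 0.0],   # bicluster 1 involves columns 0 and 3
    [0.0, 0.0, -1.0, 0.0, 0.0, 3.0],  # bicluster 2 involves columns 2 and 5
]

# Noise-free data matrix: the sum of one outer product per bicluster.
X = [[0.0] * 6 for _ in range(5)]
for lam, z in zip(loadings, factors):
    X = add(X, outer(lam, z))

def members(vector):
    """Indices with a nonzero entry, i.e. members of the bicluster."""
    return [i for i, value in enumerate(vector) if value != 0.0]

for k, (lam, z) in enumerate(zip(loadings, factors), start=1):
    print(f"bicluster {k}: rows {members(lam)}, columns {members(z)}")
```

In real data the matrix also contains noise, and the loadings and factors are unknown; sparsity in both is what lets each outer product be read as a bicluster.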

A17 - Assessment of Health Risks of Space Radiation Using Networks of Biological Processes

Yared Kidane, Universities Space Research Association / NASA Space Radiation Program Element, United States

Short Abstract: Space travel, among other factors, exposes humans to Low Doses of Ionizing Radiation (LDIR). The health impacts of LDIR are not well understood. Existing approaches that attempt to infer risks of LDIR from High Dose Ionizing Radiation (HDIR) risk models have been controversial due to a nonlinear dose-response relationship in the LDIR range.

Recent attempts involve the use of host transcriptional profiles to study commonalities and differences among host responses to LDIR and HDIR at the molecular level. In this regard, gene expression arrays provide a good platform for identifying LDIR- and HDIR-inducible genes and biological processes. However, comparisons made at the level of genes, pathways, networks, and biological functions have indicated that time-dependent responses are more apparent than dose-dependent responses. This may in part be due to the transient nature of host responses, which may result in low-level signal and concordance.

Here, we propose an alternative approach for comparing host transcriptional responses to LDIR and HDIR. Our proposed method is based on the identification of networks of co-regulated biological processes under LDIR and HDIR. Our approach rests on the premise that phenotypic differences result from the interworking/interweaving of biological processes rather than from individual genes and biological pathways.

First, we identified dysregulated biological processes under LDIR and HDIR separately. Then, we constructed a co-regulation matrix of biological processes under the two conditions. Based on this, we built an interaction network depicting co-regulation of these processes. Finally, we looked for highly connected biological processes that are commonly and differentially perturbed by LDIR and HDIR.
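The last three steps (co-regulation matrix, network, highly connected processes) can be sketched as follows. This is only a toy illustration: the process names and the 0/1 matrices are hypothetical, and the abstract does not specify how co-regulation is scored or how connectivity is measured, so plain node degree is assumed here.

```python
# Hypothetical biological processes and a symmetric 0/1 co-regulation
# matrix for each condition (1 = the two processes are co-regulated).
processes = ["DNA repair", "apoptosis", "cell cycle", "immune response"]
coreg = {
    "LDIR": [
        [0, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    "HDIR": [
        [0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 1],
        [1, 0, 1, 0],
    ],
}

def edges(matrix):
    """Set of undirected edges (i, j) with i < j from a 0/1 matrix."""
    n = len(matrix)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if matrix[i][j]}

def degrees(edge_set, n):
    """Number of edges touching each node."""
    deg = [0] * n
    for i, j in edge_set:
        deg[i] += 1
        deg[j] += 1
    return deg

ldir, hdir = edges(coreg["LDIR"]), edges(coreg["HDIR"])
common = ldir & hdir        # co-regulation seen under both doses
differential = ldir ^ hdir  # co-regulation seen under only one dose

n = len(processes)
for label, edge_set in (("common", common), ("differential", differential)):
    deg = degrees(edge_set, n)
    ranked = sorted(zip(processes, deg), key=lambda pair: -pair[1])
    print(f"{label}: {len(edge_set)} edges, degree ranking: {ranked}")
```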

A18 - Identification of somatic mutations associated with therapy resistance in acute myeloid leukemia

Samuli Eldfors, University of Helsinki, Finland

Short Abstract: Acute myeloid leukemia (AML) is treated with the standard regimen of anthracycline and cytarabine. While nearly all patients respond to induction chemotherapy, disease recurrence and drug resistance are common. Mutations driving resistance are not well defined in AML, and characterisation of such mutations would enable better prediction of treatment outcomes and development of improved treatment strategies.
To identify somatic mutations associated with disease recurrence and drug resistance we analyzed samples from six AML patients at the time of diagnosis and relapse. Exome-sequencing was performed on bone marrow samples obtained at diagnosis prior to induction chemotherapy and at relapse when the patients became resistant to therapy. Skin biopsies were used as germline controls. RNA-sequencing was performed on relapse samples to assess gene expression levels. To analyze the data we developed a computational protocol for the identification of somatic mutations and copy number alterations associated with the relapse state and that are thus candidate biomarkers of therapy resistance. We used gene expression data to prioritize candidate mutations in expressed genes.
Analysis of our AML data sets has identified several genes that accumulate mutations or are targeted by copy number alterations in relapse samples and hence are candidate drivers of therapy resistance in AML. The analysis protocol we have developed can be applied to identify mutations related to therapy resistance and disease progression in leukemias and other types of cancer, when exome-sequencing data from serial samples is available.

A19 - Dissecting Cancer Heterogeneity with a Probabilistic Genotype-Phenotype Model

DongYeon Cho, National Institutes of Health, United States

Short Abstract: One of the obstacles hindering a better understanding of cancer is its heterogeneity. However, computational approaches to model cancer heterogeneity have lagged behind. To bridge this gap, we have developed a new probabilistic approach that models individual cancer cases as mixtures of subtypes. Our approach can be seen as a meta-model that summarizes the results of a large number of alternative models. It does not assume predefined subtypes, nor does it assume that such subtypes have to be sharply defined. Instead, given a measure of phenotypic similarity between patients and a list of potential explanatory features, such as mutations, copy number variation, microRNA levels, etc., it explains phenotypic similarities with the help of these features. We applied our approach to Glioblastoma Multiforme (GBM). The resulting model, Prob_GBM, not only correctly inferred known relationships but also identified new properties underlying phenotypic similarities. The proposed probabilistic framework can be applied to model relations between similarity of gene expression and a broad spectrum of potential genetic causes.

A20 - Stage-dependent activity inference of transcription factors by integrating genomics, epigenomics and transcriptomics profiles: with application to KIRC

QI LIU, Vanderbilt University, United States

Short Abstract: Comparative analysis of expression profiles between early- and late-stage cancers has identified genes with stage-dependent expression alterations, which helps to better understand cancer progression and metastasis and to predict the clinical aggressiveness of cancer. The expression alterations can be explained by changes in genomic, epigenomic and regulatory programs. Compared with genomic and epigenomic alterations, however, changes in regulatory programs, mainly due to the activity changes of transcription factors, are hard to detect and quantify. Here we developed a statistical model to infer activity changes of transcription factors by combining the effect of genetic and epigenetic alterations on mRNA expression variation. Applied to kidney renal clear cell carcinoma (KIRC) patients, the model underscored the role of methylation as a significant contributor to stage-dependent expression alterations and identified key transcription factors as potential drivers of cancer progression.

A21 - Disease characterization using pathway profiles

Zhenjun Hu, University of Boston, United States

Short Abstract: A plausible reason for the disparity between gene signatures of a given disease is that diseases are more relevant to the perturbed cellular functions than to the individual genes. From this perspective, we developed a new approach to characterize diseases based on perturbed pathway profiles (binary vectors whose elements indicate whether a given function is perturbed). Using breast cancer subtypes (luminal A, luminal B, triple-negative, and HER2+) as an example, the approach is evaluated by the reproducibility, accuracy and resolution of the resulting pathway profiles.
The analysis is carried out using The Cancer Genome Atlas (TCGA) RNA-Seq and microarray data sets, and a microarray data set provided by The European Genome-phenome Archive (EGA). Average reproducibilities of 68% and 67% are achieved between different data sets (TCGA microarray vs. EGA microarray data) and between different technologies (TCGA microarray vs. TCGA RNA-Seq data), respectively. About 40% of the identified pathways are enriched in all subtypes. Among all the enriched pathways, about 74% are known to be associated with breast cancer or other cancers. Comparison of pathway profiles between subtypes of breast cancer, as well as other diseases, indicates that the luminal A and luminal B subtypes are closer to HER2+ than to the triple-negative subtype, and that subtypes of breast cancer are more likely to be closer to each other than to other diseases.
Our results demonstrate that pathway profiles have acceptable reproducibility, high accuracy and reasonable resolution to characterize subtypes of breast cancer and related diseases. The correlations between diseases are shown as a network at http://visant.bu.edu.
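
As a rough illustration of how two binary pathway profiles might be compared, the sketch below scores their agreement with the Jaccard index. The abstract does not say how reproducibility was computed, so the choice of Jaccard overlap, the function name and the toy profiles are all assumptions made for illustration only.

```python
def jaccard(profile_a, profile_b):
    """Jaccard overlap between two binary pathway profiles of equal length.

    Each element is 1 if the pathway is perturbed and 0 otherwise.
    NOTE: Jaccard is an assumed metric, not necessarily the one used in the poster.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same pathways")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Two profiles with no perturbed pathway at all are treated as identical.
    return both / either if either else 1.0


# Hypothetical profiles over five pathways (1 = perturbed, 0 = not perturbed).
tcga_microarray = [1, 1, 0, 1, 0]
ega_microarray = [1, 0, 0, 1, 1]
print(jaccard(tcga_microarray, ega_microarray))  # 2 shared / 4 perturbed in either = 0.5
```

The same function could be applied between subtypes or diseases to obtain the pairwise similarities that a disease network would be drawn from.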

A22 - Transcription Factor Activity Inference Reveals Mechanisms of Diseases

Yong Li, Stanford University, United States

Short Abstract: The human genome contains hundreds of transcription factors, cofactors and chromatin regulators. The activities of these regulatory elements define the transcriptional states and subsequently the normal and disease phenotypes of cells. Recent technological developments such as ChIP-seq open up the possibility of completely surveying the binding sites of all transcriptional regulators in a given cell type. ENCODE is one such effort that aimed at identifying the epigenetic makeup of the human genome, providing us with rich information on the binding landscape of over 100 unique DNA binding proteins in a limited number of cell lines. Expanding and replicating such efforts in a large number of biological samples is, however, practically challenging, due to resource limitations and the huge number of potential combinations of cell types, diseases, and other yet undefined cellular states. We suggest that the issue can be partly circumvented by computationally inferring the transcription factor activities in any sample of interest, through combining low-cost gene expression profiling of the sample with prior transcription factor binding data. We present a novel statistical method called the Likelihood and Asymmetry (LA) test for the problem. By using a set of transcription factor siRNA knockdown experiments, we show that the LA-test performs significantly better than the other test statistics we evaluated. We analyze the global properties of the transcription factor activity networks, and demonstrate the application of the method to muscle samples of a group of diseases called idiopathic inflammatory myopathies to define the common and unique transcriptional regulatory programs of the diseases.

A23 - Voting-based cancer module identification by combining topological and data-driven properties.

Hyunju Lee, GIST, Korea, Rep

Short Abstract: Recently, computational approaches integrating copy number aberrations (CNAs) and gene expression (GE) have been extensively studied to identify cancer-related genes and pathways. In this work, we integrate these two data sets with protein-protein interaction (PPI) information to find cancer-related functional modules. To integrate CNA and GE data, we first built a gene-gene relationship network from a set of seed genes by enumerating all types of pairwise correlations e.g. GE-GE, CNA-GE, and CNA-CNA over multiple patients. Next, we propose a voting-based cancer module identification algorithm by combining topological and data-driven properties (VToD algorithm) by using the gene-gene relationship network as a source of data-driven information, and PPI data as topological information. We applied the VToD algorithm to 266 glioblastoma multiforme (GBM) and 96 ovarian carcinoma (OVC) samples that have both expression and copy number measurements, and identified 22 GBM modules and 23 OVC modules. Among 22 GBM modules, 15, 12, and 20 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Among 23 OVC modules, 19, 18, and 23 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Similarly, we observed that 9 and 2 GBM modules and 15 and 18 OVC modules were enriched with cancer gene census (CGC) and specific cancer genes, respectively. Our proposed module-detection algorithm significantly outperformed other existing methods in terms of both functional and cancer gene set enrichments. This study suggests that identified modules containing both expression changes and CNAs can explain cancer-related activities with greater insights.

A24 - Rewiring of Airway Gene Expression-Based Networks by Bronchial Dysplasia and Lung Cancer

Anna Tassinari, Boston University, United States

Short Abstract: Background:
Lung cancer is the leading cause of cancer death in the US due to late diagnosis. By studying molecular alterations associated with premalignant airway lesions, we hope to discover the earliest changes in the process of lung carcinogenesis. Network-based analyses can potentially elucidate complex cellular pathway dysregulation in tobacco-induced lung cancer. Here, we utilize this approach to identify airway gene coexpression (GCE) patterns present in lung cancer and premalignancy, but absent from the normal phenotype.

Methods:
Using weighted gene coexpression network analysis (WGCNA), we built GCE networks derived from cytologically normal airway epithelial cells obtained via bronchoscopy from current and former smokers with lung cancer (n=423), dysplastic airway lesions (n=121), and neither condition (n=467). Differences in GCE between disease and normal networks were quantified by modular differential connectivity (MDC). Conservation of premalignancy and cancer modules was evaluated based on gene overlap (Fisher’s Exact Test). We focused on modules that were conserved across disease conditions, but had significant MDC compared to their undiseased counterparts. Hub genes were identified based on intra-modular connectivity.

Results:
Among 107 modules, eight cancer modules gained connectivity (MDC>1.5; FDR<0.05) and significantly overlapped (p<0.01) with premalignancy modules. These modules were associated with immune response, mitosis, and RNA elongation (pBonf<0.005). Hub genes were involved in oncogenesis, apoptosis, MAPK and Wnt signaling.

Conclusions:
Integration of differential and GCE network analyses represents a novel approach to discovering pathways associated with early lung carcinogenesis that remain indiscernible by differential expression. Future validation of key effectors may lead to discovery of novel early detection lung cancer biomarkers and chemopreventive agents.
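
The gene-overlap test mentioned in the Methods can be sketched as a one-sided Fisher's exact test, which for a 2x2 overlap table is the upper tail of the hypergeometric distribution. The function name and the toy numbers below are hypothetical; the abstract does not give module sizes or the exact test configuration.

```python
from math import comb


def overlap_pvalue(n_universe, n_a, n_b, n_overlap):
    """One-sided Fisher's exact test for the overlap of two gene sets.

    Returns P(overlap >= n_overlap) when a module of n_b genes is drawn at
    random from n_universe genes, n_a of which belong to the other module.
    """
    max_k = min(n_a, n_b)
    total = comb(n_universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total


# Toy example (hypothetical sizes): 10 genes in total, two modules of 5 genes
# each, sharing 4 genes.
print(overlap_pvalue(10, 5, 5, 4))  # 26/252, about 0.103
```

In practice the universe would be all genes in the network, and the resulting p-values would still need multiple-testing correction across module pairs.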

A25 - Network-based method for predicting breast cancer prognosis by eliminating indirect interactions

Youngmi Yoon, Gachon University, Korea, Rep

Short Abstract: Prediction of breast cancer prognosis based on genetic networks has been studied; however, most gene networks contain many indirect interactions. These interactions make it difficult to find significant prognosis-related mechanisms and to secure high accuracy in classification of prognosis. The accuracy rate tends to be low, especially when considering estrogen receptor status.
Here, we discover a prognosis-specific network that excludes indirect interactions, using gene expression data and a protein-protein interaction network, and develop a prediction model which shows high accuracy in breast cancer prognosis considering estrogen receptor status.
First, we identify prognosis-specific interactions where two genes show different expression patterns between metastatic and non-metastatic groups. Second, for each prognosis-specific gene edge, we calculate a weight that can discriminate between indirect and direct interactions using the silencing method [1], and select the edges whose weight is over the threshold. As a result, the number of genes and edges in the prognosis-specific network of the first stage drastically decreased.
Our classifier shows LOOCV accuracy of 98% for estrogen receptor negative samples (ER-) and 93% for estrogen receptor positive samples (ER+) in the Van de Vijver [2] data, and 97% for both ER- and ER+ in the Wang [3] data. Furthermore, we also showed that our network contains significant hub genes and bottleneck genes related to breast cancer prognosis.

References
[1] Barzel B, Barabasi AL. (2013) Nature Biotechnology, 31: 720–5.
[2] Van de Vijver MJ, et al. (2002) The New England Journal of Medicine, 347: 1999–2009.
[3] Wang Y, et al. (2005) Lancet, 365: 671–679.

A26 - Identification of serum biomarkers for major depressive disorder diagnostics by the integrated analysis of transcriptomic, proteomic and lipidomic profiles

Kwang Pyo Kim, Kyung Hee University, Korea, Rep

Short Abstract: Major depressive disorder (MDD) is the most common neuropsychiatric disorder, with 350 million affected people in the world. The WHO estimated that depression is the major cause of disability and of the global burden of disease worldwide. However, the disease mechanism remains unclear and there is no objective test for diagnosis. To study its pathogenesis and discover non-invasive biomarkers for diagnosis, an integrative analysis of transcriptomic, proteomic and lipidomic profiles was performed to identify deregulated biomolecules and their regulatory network in patients. For 10 first-onset, treatment-naive depressed patients and 10 healthy controls, the serum proteome and lipidome were obtained and analyzed by LC-MS/MS. The proteomic and lipidomic data were integrated with four publicly available microarray gene expression data sets, and the molecular network of depression was constructed using protein-protein and protein-lipid interactions in the MetaCore and LMPD databases. The results suggest that the perturbation of growth and neurotransmitter signaling and mitochondrial functions in the brain can regulate lipid transport and immune responses in blood, and that the biomolecules involved in these functions may be effective for the diagnosis of depression. The protein and lipid biomarker candidates were verified using multiple reaction monitoring (MRM) in 15 patients and 15 healthy controls. Then, the final set of potential serum biomarkers was selected using multivariate statistical analysis. We plan further validation in a large population.

A27 - ImmunExplorer: A Framework for NGS-based Characterization and Visualization of the Human Immune System

Susanne Schaller, University of Applied Sciences Upper Austria, Hagenberg, Austria, Austria

Short Abstract: A detailed characterization and representation of the adaptive immune system, especially an analysis based on NGS data of the key players B and T cells, is very important for gaining a better understanding of the current state of health as well as diseases of human beings.
Here we present a bioinformatics framework called ImmunExplorer (IMEX), which has been developed to perform a wide range of sequence analyses that identify the status of the immune system quickly and precisely.

The source for IMEX sequence analysis is pre-processed NGS high-throughput and deep sequencing data from human B and T cell receptors. The pre-processing of the raw data is done using the ImMunoGeneTics (IMGT) information system, which produces IMGT/HighV-QUEST output files that can be imported into IMEX.
IMEX offers the following analysis features: descriptive statistical analyses of different sequence functionality types and frequencies of all V(D)J gene elements; clonality analyses (calculated by counting clones per clonotype based on the CDR3 sequences); diversity analyses; and visual representation (histograms, heat maps, scatter plots, line charts) of various V(D)J combinations at the gene and allele level and for different sequence types. Moreover, it provides algorithms and representations of sample comparisons for cell spectrum and variety analysis.

IMEX provides in-depth insight about the immune system in general. It shall for instance be used to analyze and visualize the current state of kidney allografts, which shall enable a significant performance increase of current transplantation monitoring methods.

A28 - Analysis of schizophrenic candidate gene expression in postmortem BA22 brain sample using different quantification tools

Kuo-Chuan Huang, Beitou Branch, Tri-Service General Hospital, Taiwan

Short Abstract: Quantitative analysis of differential gene expression in complex mental disorders is important in the study of pathogenesis and disease mechanisms. NGS RNA-seq data from STG BA22 brain tissue samples are critical for exploring differential candidate gene expression in schizophrenia.
However, different quantification tools may yield different differential gene expression results. In this work, we compare Bowtie+RSEM (BR) and TopHat+Cufflinks (TC) for generating schizophrenic candidate gene expression from NGS BA22 brain tissue samples. Gene expression is quantified as RPKM in TC and as TPM in BR.
In total, 722 and 511 statistically significant candidate genes were selected from transcriptional expression by TC and BR, respectively, and compared with the schizophrenic candidate gene database (SZgene). Only 2 genes (SLC25A5 and APEX1) found by both TC and BR matched SZgene. Thirty identical genes overlapped between the two quantification tools. 4.7% of TC candidate genes and 4.3% of BR candidate genes matched SZgene.
This implies that different quantification tools yield complementary sets of schizophrenic candidate genes. Further investigation of quantification tools under heterogeneous conditions could improve the accuracy and efficiency of gene expression quantification.
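
Because the two pipelines report expression in different units, it may help to recall how RPKM and TPM relate: within one sample, TPM is RPKM rescaled so that all values sum to one million. The sketch below applies that standard conversion to hypothetical values; the gene names and numbers are invented, and it is not part of the poster's workflow.

```python
def rpkm_to_tpm(rpkm):
    """Convert per-gene RPKM values from one sample to TPM.

    TPM_i = RPKM_i / sum_j(RPKM_j) * 1e6, so TPM values always sum to 1e6.
    """
    total = sum(rpkm.values())
    if total == 0:
        raise ValueError("all RPKM values are zero; TPM is undefined")
    return {gene: value * 1e6 / total for gene, value in rpkm.items()}


# Hypothetical RPKM values for three genes in a single sample.
sample_rpkm = {"geneA": 10.0, "geneB": 30.0, "geneC": 60.0}
sample_tpm = rpkm_to_tpm(sample_rpkm)
print(sample_tpm)
```

The conversion preserves the ranking of genes within a sample, so differences between the TC and BR candidate lists are more likely to come from alignment, abundance estimation and testing than from the unit itself.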

A29 - CanSNPer: a hierarchical genotype classifier of clonal pathogens

Adrian Lärkeryd, Swedish Defence Research Agency, Sweden

Short Abstract: Advances in typing methodologies have recently reformed the field of molecular epidemiology of pathogens. The falling cost of sequencing technologies is creating a deluge of whole genome sequencing data that burdens bioinformatics resources and tool development. Sequence data will provide an excellent opportunity to extend our understanding of infectious disease when the challenge of extracting knowledge from available sequence resources is met. In particular, single nucleotide polymorphisms (SNPs) in core genomes of pathogens are recognised as the most important markers for inferring genetic relationships since they are evolutionarily stable and amenable to high-throughput detection methods. In order to take advantage of the new low-cost sequencing technology, sequence information must be presented and analysed in a way that, for example, public health experts in epidemiology can easily interpret. Here we present an efficient and user-friendly genotype classification pipeline, CanSNPer, based on an easily expandable database of predefined canonical SNPs. CanSNPer analysis can be performed directly in the browser as a Galaxy tool, eliminating the need to use the command line. By extending the CanSNPer database, users can easily incorporate this tool into the analysis of their pathogen of choice. CanSNPer, including all of its source code and documentation, is available for download, free of charge, at http://github.com/adrlar/CanSNPer.

A30 - Using exome sequence data and Random Forest analysis to identify functional mutation signatures of 5 cancer differentiation subtypes.

Russel Sutherland, King's College London,

Short Abstract: The Pan-Cancer Analysis Project aims to identify the genomic changes present in 12 different cancer types from The Cancer Genome Atlas (TCGA) set. Cancer is a morphologically and genetically highly heterogeneous disease, and as such we aimed to identify predictors of the 5 main differentiation subtypes (adenocarcinoma, squamous cell, urothelial carcinomas, leukaemia, and glioblastoma) in the Pan-Cancer Analysis Project based on differences in their patterns of functional mutation. Whole exome sequencing was performed on tumour and normal tissue samples from 1798 patients, enabling the identification of cancer-related mutations in each patient. Clinical data were also collected for all patients, including gender and age. We used a Random Forest machine learning approach to compare the 5 differentiation subtypes in a pairwise fashion. Our results show that we were able to discriminate between all 5 cancer types with accuracy above 0.8, except for the comparison between squamous cell and urothelial carcinomas. Across the ten comparisons the most discriminative protein coding genes were EGFR, PTEN and TTN.

A31 - Predictive and Experimental Approaches for Characterising Mutations in Proteins

Maria Buenavista, University of Reading,

Short Abstract: ENU mutagenesis studies in mice have demonstrated that PKD1L1 plays a key role in left-right differentiation in embryonic development via sensing of fluid flow by cilia. 3D models of the PKD1L1 PKD domain of the wild type and mutant suggest a destabilized β-structure in the second PKD domain (PKDd2).
Molecular dynamics simulations (MDS) using the top-ranked quality-assessed model of each solvated variant were performed for 600 ns. The time-evolved coordinates were studied for their root-mean-square fluctuations and secondary structures. Steered molecular dynamics (SMD), currently being carried out for each variant, aims to obtain force-extension measurements under a constant pulling speed of 0.01 nm/ps.
Using synchrotron radiation circular dichroism (SRCD), the solution structures of the protein were compared under normal scan, thermal melt and UV denaturation conditions. The structural changes in both the wild type and the mutant at each stage of denaturing conditions were calculated by secondary structure analysis.
Normal scans of both N-His-tagged variants show similar folding with maximum ellipticity recorded between 210 and 220 nm. The thermodynamic and kinetic stability yielded by SRCD values correlate well with the hypothesis that the mutant has less stable properties. The CD spectral signatures confirm the mainly beta composition of the PKDd2 domain. Likewise, CD results show structural inter-conversions of the domain providing agreement with models. Computational models and experiments differentiated the structure and behaviour of the wild type from the mutant – differences which may underpin the impaired mechanosensing ability of the mutant found in nodal cilia during early development.

A32 - Molecular Interaction Analysis Of acetylcholinesterase with Phytochemicals and Its Derivatives for the Treatment of Alzheimer’s Disease- An in-silico Approach

Jitendra Gupta, Shodhaka Life Sciences Pvt. Ltd., India

Short Abstract: Alzheimer's disease (AD) is a neurodegenerative brain disease and the most common cause of dementia. AD is a leading cause of mental impairment in aged people. Symptoms like delusions and hallucinations have been reported in a large proportion of patients with this disease. In fact, the presence of these psychotic symptoms can lead to early institutionalization. AD is an irreversible, progressive brain disease that slowly destroys memory, and the risk of AD increases with age. A major strategy for the treatment of Alzheimer's disease has focused on the relation between memory impairment and dysfunction of the acetylcholine neurotransmitter system, which results in a reduced acetylcholine level in the brain that can be raised by acetylcholinesterase inhibitors. Here, we aim to examine the binding between human recombinant acetylcholinesterase and its inhibitors to control/maintain the normal mechanism of acetylcholinesterase. Binding affinity was analyzed by docking simulation studies, performed between human recombinant acetylcholinesterase and phytochemicals using AutoDock. Phytochemicals such as solanine and chaconine, together with their analogues, were used in the present study for molecular interaction and docking analysis. Seventy-four lead compounds were selected for further analysis and then docked with acetylcholinesterase; inter- and intramolecular interactions were found at GLN250, GLU292, HIS287, and PRO290, with binding energies ranging from -10 kcal/mol to +50 kcal/mol. Recent studies suggest that phytochemicals which bind to acetylcholinesterase with minimum energy can aid the development of new drugs for the treatment of Alzheimer’s disease.

A33 - Genomic models of short-term exposure accurately predict long-term carcinogenicity and identify putative mechanisms of action

Daniel Gusenleitner, Boston University, United States

Short Abstract: Despite an overall decrease in incidence of and mortality from cancer, about 40% of Americans will be diagnosed with the disease in their lifetime, and around 20% will die of it. Current approaches to test carcinogenic chemicals adopt the 2-year rodent bioassay, which is costly and time-consuming. As a result, fewer than 2% of the chemicals on the market have actually been tested. However, evidence accumulated to date suggests that gene expression profiles from model organisms exposed to compounds reflect underlying mechanisms of action, and that these toxicogenomic models could be used in the prediction of carcinogenicity.

In this study, we used a rat-based microarray dataset from the NTP DrugMatrix Database to test the ability of toxicogenomics to model carcinogenicity. We analyzed 1,221 gene-expression profiles obtained from rats treated with 127 well-characterized compounds and built a classifier that predicts a chemical’s carcinogenic potential (AUC: 0.78), and validated it on an independent dataset consisting of 2,065 profiles from 72 compounds. Finally, we identified differentially expressed genes associated with chemical carcinogenesis, and developed novel data-driven approaches for the molecular characterization of the response to chemical stressors.

Our results validate the toxicogenomic approach to predict carcinogenicity, show that the prediction of carcinogenicity is tissue-dependent, and provide evidence that, with a larger set of compounds, we would be able to substantially improve the prediction performance. Our results also confirm and expand upon previous studies implicating DNA damage, the peroxisome proliferator-activated receptor, the AhR receptor, and regenerative pathology in the response to carcinogen exposure.

A34 - K-Map: connecting kinases with therapeutics for drug repurposing and development

Aik Choon Tan, University of Colorado Denver, United States

Short Abstract: Protein kinases represent one of the largest ‘druggable’ and well-studied families in the human genome. Protein kinases play a key role as regulators and transducers of signaling in eukaryotic cells. Kinases are relatively easy to target with small molecules and have been extensively studied at the biochemical, structural, and physiological levels. In cancer cells, some kinases are mutated and acquire oncogenic properties to drive tumorigenesis. Small molecules that inhibit these oncogenic kinases can effectively kill cancer cells, as demonstrated by the success story of imatinib. We developed K-Map, a novel and user-friendly web-based program that systematically connects a set of query kinases to kinase inhibitors based on quantitative profiles of the kinase inhibitor activities. K-Map is motivated by the ‘connectivity map’ concept, where gene expression changes could be used as the ‘universal language’ to connect biological systems, genes, and drugs. Instead of gene expression signatures, we used the kinase activity profiles as the ‘language’ for connecting kinases and small molecules in K-Map to reveal the complex interactions of kinases and inhibitors. As a proof-of-concept, we queried K-Map with a set of essential kinases identified in an EGFR-mutant lung cancer cell line. By connecting the essential kinases to compounds in K-Map, we identified and validated bosutinib as an effective compound that could inhibit proliferation and induce apoptosis in EGFR-resistant lines. In summary, we have demonstrated a proof-of-concept, bioinformatics-driven discovery roadmap for drug repurposing and development in cancer research, which could be generalized to other diseases in the era of personalized medicine.

A35 - Reconstructing Cancer Pathways and Their Mutation Order from Cross-Sectional Data

Fabio Vandin, University of Southern Denmark, Denmark

Short Abstract: Recent advances in DNA sequencing technologies have allowed the collection of all somatic mutations in large cohorts of cancer patients. One fundamental question that arises in the analysis of this data is whether there is an order, shared among patients, in which the driver mutations, important for the disease, arise. A number of computational methods have been designed to identify such mutation progression using cross-sectional mutation data from a large number of patients. These methods consider progression only at the gene level. Two recent works demonstrated advantages in reconstructing the order at the pathway (interacting gene set) level, but were restricted to a priori defined pathways. These pathways are often large, making it difficult to discover the progression in novel, smaller sets of interacting genes.

We developed a new combinatorial approach to simultaneously infer cancer pathways and their order of mutation in cancer. Our work assumes a linear order of mutations among pathways, and identifies i) the assignment of genes to cancer pathways and ii) the order of mutations in pathways that minimize the disagreement with the observed data, by leveraging on the expected exclusivity of mutations within cancer pathways. We tested our method on two recent datasets from large studies of The Cancer Genome Atlas on colorectal and glioblastoma cancers. In both cases the progression model reconstructed by our method captures most of the current knowledge of the cancer progression in these cancer types and identifies sets of genes that interact or are part of known cancer pathways.

A36 - A Network Approach for Integrative Analysis of Genomic Data in Ovarian Cancer

Xinghua Shi, University of North Carolina at Charlotte, United States

Short Abstract: Ovarian cancer accounts for 5% of cancer deaths in women, making a thorough understanding of the biological basis important for developing better treatment and diagnosis. In this study we develop a network approach that integrates serous ovarian cancer DNA methylation and gene expression data from The Cancer Genome Atlas (TCGA). We apply two methods to obtain networks by mining the associations between DNA methylation and gene expression. Specifically, we 1) perform a methylation expression quantitative trait loci (meQTL) analysis for each CpG and gene pair, and build gene networks by expanding the meQTL results under the guidance of known protein-protein interactions; 2) apply a machine learning approach that simultaneously builds gene networks using graphical models. We then build an integrated network by consolidating the networks constructed by the different methods. Our further analysis of the integrated network points to a network view of how epigenetic signatures, particularly DNA methylation, perturb gene regulation and lead to tumorigenesis and cancer progression in ovarian cancer.

A37 - An integrative computational analysis of multiple data sets and dynamical models to identify new therapeutic targets to treat obesity.

Valentina Barbieri, Icahn School of Medicine at Mount Sinai & Systems Biology Center New York, United States

Short Abstract: Currently, the most effective therapies for obesity are highly invasive procedures like gastro-intestinal surgeries. So far all the FDA approved obesity drugs have been largely unsuccessful in providing long term weight loss and can also induce serious adverse effects.
A systems biology approach could help in predicting new plausibly effective drug target combinations.
We consider obesity a systemic and complex disease that is physiologically expressed as impairment in the peripheral and central regulation of energy homeostasis and satiation. In this study we have focused on peripheral mechanisms taking into account the different biological levels wherein the multiple cross-connected regulatory interactions lie.
We developed an ODE model of the changes in plasma levels of selected orexigenic and anorexigenic digestive system hormones in response to nutrients in lean, obese and post-surgery human subjects. Through multivariate regression analysis and an SVM classifier we were able to identify the parameters that are selectively associated with lean, obese or post-surgery states. To investigate the molecular etiology of these changes, an integrative statistical analysis of genomic, transcriptomic and proteomic data sets was performed to identify the most statistically significant SNPs, mRNA expression patterns and Gene Ontology terms related to obesity. Using protein–protein interaction network analysis, these hallmarks will be linked to each of the previously identified hormone-related parameters to identify the most highly lean- or obese-correlated proteins. We anticipate that these molecules will constitute a significant set of new therapeutic combinatory targets that can be experimentally tested in vitro and in vivo.

A38 - Rcircle: an R package for integrating and visualizing multiple “-omics” data for knowledge discovery

XING Li, Mayo Clinic, United States

Short Abstract: Biomedical science has entered the big data era, and biologists have access to an overwhelming abundance of data due to the rapid advance of high-throughput sequencing and microarray technology. The tremendous volume and high dimensionality pose an unprecedented challenge for data visualization and integration in efficient data exploration and effective scientific communication. Herein, we developed an R package to integrate and visualize interactome, time-course transcriptome, disease information, and disease-affected pathways or networks to facilitate knowledge discovery. Starting with a curated list of congenital heart disease (CHD) genes, we identified the top 10 partners of each CHD gene and built a network of both disease genes and their partners. Pathway analysis was performed on the entire gene list. The R package visualizes the gene network in the inner circle, with line width representing the interaction confidence. Hub genes in the network are represented by the size of the bubbles circling the genes. Transcription profiles, disease information, and pathways are shown in the outer layers. By integrating different types of information, we discovered disease hub genes and established that EMT genes were affected throughout cardiogenesis, that members of the BMP pathway are active in early embryo development, and that the NF-AT pathway turns on at later stages. Furthermore, our tool revealed that JAG1, NOTCH1, EDN1, EP300, and SMAD2 are responsible for the crosstalk among those affected pathways. The application of our package to CHD analysis indicates that our strategy goes beyond visualization of information: it highlights patterns, prioritizes vital candidates, and facilitates scientific discovery.

A39 - Identifying dysregulated metabolic pathways and miRNA gene targets associated with gastric cancer and effect of microbial environment on tumor growth

Amrita Kar, Boston University, United States

Short Abstract: Cancer metabolism is very complex and has long been associated with aerobic glycolysis, i.e. the Warburg effect. However, recent studies have shown that the metabolic signatures of cancer cells are not only responses to this effect but also result from oncogene-directed metabolic reprogramming. Understanding this dysregulation could help us find therapeutic drugs or new treatment modalities for cancer. In this project we first aim to understand the changes in cancer metabolism by looking at various up- or down-regulated metabolic pathways. For this we examine irregularities in gene expression and in the expression of putative miRNA targets that could affect these pathways. Second, we look at tumor growth in relation to its environment. By doing so we’ve identified an ensemble of differentially expressed metabolic pathways in gastric cancer cells that fulfil not only the catabolic but also the anabolic requirements of the cancer cell. Apart from these hallmark changes we’ve also seen the emergence of new pathway players that can contribute to cancer cell survival. There is also a significant overlap between signaling and metabolic pathways that can give us insight into how a cancer cell hijacks homeostasis. In another aspect of tumor growth, we see enrichment of H. pylori signaling in the gene expression of patients who have no incidence of the infection. This bacterium is known to make gastric cancer more virulent and metastatic. Understanding such bacterial associations with cancer can help to elucidate how the microbial/internal environment around the tumor affects its growth.

A40 - Building a High-performance Pipeline for Analysis and Management of Whole Exome Sequencing Data

Riyue Bao, University of Chicago, United States

Short Abstract: Whole exome sequencing (WES) has facilitated discovery of inherited and novel genetic alterations associated with diseases at low cost and with high efficiency and reliability. However, previous reports showed low concordance among various variant callers. We developed a WES pipeline consisting of raw read quality control (QC) and preprocessing, read alignment and postprocessing, multi-sample variant calling and annotation, and variant filtration. The pipeline can also detect LOH and CNVs in paired tumor-normal samples. We implemented three aligners (BWA, Bowtie2, Novoalign) and four variant callers (GATK, SAMtools, Atlas2, Freebayes) to generate twelve sets of calls. The joined list of variants is ranked through majority voting and further filtered to remove common variants in 1000 Genomes and ESP6500. The final list of variants is prioritized based on deleteriousness prediction, conservation, Combined Annotation–Dependent Depletion (CADD), and gene-level network and pathway analyses. An internal database (VariantDB) and a web server were built for storage, fast retrieval and comparison of the results. We evaluated the performance of our pipeline on published benchmark SNP and InDel calls and simulated datasets. It demonstrated high sensitivity (99.04% for SNPs, 87.28% for InDels), specificity (99.99%) and precision (99.10%). The WES pipeline has been implemented on the institutional high-performance computing (HPC) server and Amazon Cloud. With 200-300 cores, 10 human WES samples can be analyzed within 16-24 hours. As one of our core data analysis platforms, the WES pipeline has been used to confidently identify candidate and/or novel genetic mutations in studies of rare Mendelian disorders, disease treatment, and cancer predisposition.
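
The ranking step described above (majority voting over the call sets, followed by removal of common variants) could be sketched roughly as follows. This is only an illustration under stated assumptions: the variant key format, the function names `majority_vote` and `remove_common`, and the strict-majority default threshold are hypothetical and are not taken from the pipeline itself.

```python
from collections import Counter


def majority_vote(call_sets, min_votes=None):
    """Rank variants by how many call sets report them.

    call_sets: list of sets of variant keys, e.g. (chrom, pos, ref, alt).
    min_votes: keep variants seen in at least this many call sets;
               defaults to a strict majority (an assumption, not the
               pipeline's documented threshold).
    """
    if min_votes is None:
        min_votes = len(call_sets) // 2 + 1
    votes = Counter()
    for calls in call_sets:
        votes.update(set(calls))  # each call set votes once per variant
    kept = [(variant, n) for variant, n in votes.items() if n >= min_votes]
    # Most-supported variants first; ties broken by variant key for stable output.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def remove_common(ranked, common_variants):
    """Drop variants present in a set of common population variants."""
    return [(variant, n) for variant, n in ranked if variant not in common_variants]
```

With twelve aligner/caller combinations, `call_sets` would hold twelve sets, and `common_variants` would stand in for variants drawn from resources such as 1000 Genomes or ESP6500.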

A41 - Characterization of AML tumor genomes and transcriptomes for development of effective, targeted cancer treatments

Jessica Pilsworth, Genome Sciences Centre, Canada

Short Abstract: Over 80% of acute myeloid leukemias (AML) contain genomic rearrangements that involve genes related to hematopoietic lineage development. The resulting chimeric transcripts are potential drivers in tumorigenesis and therefore represent ideal diagnostic and therapeutic targets. We performed de novo genome assemblies using ABySS and transcriptome assemblies using Trans-ABySS of tumor samples (and matched normal, when available) from cancer patients diagnosed with AML. We report on the development of a bioinformatics pipeline that identifies mutations accurately and rapidly. For leukemogenesis to occur, two types of mutations are required: 1) a mutation that improves hematopoietic cells’ ability to proliferate (e.g. internal tandem duplication events in the FLT3 gene); and 2) a mutation that prevents the cells from differentiating (e.g. fusion events between the genes PML and RARA, and partial tandem duplication events in the MLL gene). Our analysis pipeline consistently identifies both classes of mutations, including the above three example events, which are all clinically relevant markers for AML treatment. Clinicians conventionally use standard treatment regimes based on a general understanding of a disease, and move to alternative treatments if patients do not respond. Clinical genomics holds great promise to change that paradigm, where each cancer patient will first be evaluated at the genomic level. These results will inform and enable clinicians to develop treatment plans on a per-patient basis. The reported work offers a bioinformatics pipeline that contributes to this new vision of cancer care.

A42 - The Cure: Making a game of gene selection for breast cancer survival prediction

Benjamin Good, The Scripps Research Institute, United States

Short Abstract: Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility and biological interpretability. Methods that take advantage of structured prior knowledge show promise in helping to define better signatures but most knowledge remains unstructured.

Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes previously unheard of. Here, we developed and evaluated a game called The Cure on the task of gene selection for breast cancer survival prediction. Our central hypothesis was that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from game players. We envisioned capturing knowledge both from the players’ prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game.

Between its launch in Sept. 2012 and Sept. 2013, The Cure attracted more than 1,000 registered players who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data clearly demonstrated the accumulation of relevant expert knowledge. In terms of predictive accuracy, these gene sets provided comparable performance to gene sets generated using other methods including those used in commercial tests. The Cure is available at http://genegames.org/cure/

A43 - Pan-cancer analysis of alternative splicing events reveals novel tumor biomarkers shared by different tumor types.

Daryanaz Dargahi, Canada’s Michael Smith Genome Sciences Center, Canada

Short Abstract: More than 90% of human genes have been shown to undergo alternative splicing, and many of the resulting isoforms are specifically associated with cancer. The high throughput and single-nucleotide resolution of RNA sequencing offer an unprecedented opportunity to examine the structure of the cancer transcriptome. Identification of novel and cancer-associated alternative splicing (AS) events may help in better understanding cancer pathogenesis and in finding novel cancer-associated biomarkers and therapeutic targets. We obtained RNA sequencing data from more than 3,750 patients with 14 different cancer types, generated by The Cancer Genome Atlas (TCGA) Consortium. Using Trans-ABySS, a de novo short-read transcriptome assembly method, we identified recurrent AS events (i.e. present in at least 10% of samples in a tumor type) that are cancer-associated compared to a large compendium of normal transcriptomes gathered from the Illumina BodyMap 2.0, TCGA, and an in-house database. These AS events include skipped exons, alternative 5’ donor sites, alternative 3’ acceptor sites, novel exons, and retained introns. In total, we identified 267 putative recurrent tumor-associated AS events that occur in more than one tumor type. Using PCR, we validated 5 (out of 267) AS events in an orthogonal panel of tumor tissues and cell lines; we are currently in the process of validating the rest. Most strikingly, we found two highly frequent cancer-associated variants of matriptase (ST14 gene) that are shared across epithelial-derived tumors. Discovery of such cancer-specific AS events provides a foundation for improving our understanding of cancer pathogenesis and for opening new avenues in therapeutic development.
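
The recurrence criteria stated above (an event present in at least 10% of samples of a tumor type, absent from normal transcriptomes, and recurrent in more than one tumor type) can be illustrated with a small filter. This is a sketch under assumptions: the input layout, the event identifiers and the function name `recurrent_events` are hypothetical, and the assembly and event-calling steps performed with Trans-ABySS are not reproduced here.

```python
from collections import defaultdict


def recurrent_events(samples, seen_in_normals=frozenset(),
                     min_fraction=0.10, min_tumor_types=2):
    """Find events recurrent in several tumor types.

    samples: list of (tumor_type, set_of_event_ids), one entry per sample.
    seen_in_normals: event ids observed in normal transcriptomes (excluded).
    Returns {event_id: sorted list of tumor types where it is recurrent}.
    """
    n_samples = defaultdict(int)                          # samples per tumor type
    n_with_event = defaultdict(lambda: defaultdict(int))  # type -> event -> count
    for tumor_type, events in samples:
        n_samples[tumor_type] += 1
        for event in set(events):
            if event not in seen_in_normals:
                n_with_event[tumor_type][event] += 1

    recurrent_in = defaultdict(list)
    for tumor_type, counts in n_with_event.items():
        for event, count in counts.items():
            if count / n_samples[tumor_type] >= min_fraction:
                recurrent_in[event].append(tumor_type)

    return {event: sorted(types) for event, types in recurrent_in.items()
            if len(types) >= min_tumor_types}
```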

A44 - Drugging the Undruggable : Systematic Identification of Drug Candidates Which Specifically Inhibit Targeted Protein-Protein Interactions

Revonda Mehovic, Ensemble Therapeutics, United States

Short Abstract: Protein-Protein Interactions (PPIs) represent some of the most attractive targets for drug discovery due to their critical role in multiple diseases. Yet many of these targets have been resistant to drugging by conventional small molecule approaches. Ensemble has been able to overcome this barrier by employing combinatorially synthesized macrocycles, known as Ensemblins, against these targets. These Ensemblins are generated using DNA-Programmed Chemistry (DPC), a proprietary technology that both enables Ensemble to control the large-scale synthesis of these compounds and provides a de facto barcoding strategy.

Under screening conditions, a target of interest is exposed to hundreds of thousands of these Ensemblins at a time. By taking advantage of the DNA barcode, we are able to exploit Next Generation Sequencing (NGS) both to determine a given Ensemblin’s target selectivity and to estimate its efficacy. To be considered as an early drug candidate, a given Ensemblin must meet four criteria: 1. It must be significantly enriched within the screen. 2. It must not be significantly enriched in a mock (i.e. control) screen. 3. It must be target specific. 4. It must be structurally related to other significantly enriched compounds. To identify Ensemblins which meet these criteria, we have developed a series of bioinformatics tools which allow us to quickly discriminate between strong and weak candidates.

A45 - Gene coexpression analysis in the frontal cortex of bipolar patients and controls

Paul Pavlidis, University of British Columbia, Canada

Short Abstract: Several large-scale gene expression analyses have addressed the question of genes differentially expressed in bipolar disorder. While the list of candidate genes is continuously growing, the reproducibility of the results at the gene level is relatively low. Systemic approaches evaluating expression data based on enrichment analyses appear to provide more robust results but are limited to our current biological knowledge. Such approaches are biased towards more studied processes and genes and may neglect important but less examined connections.
Coexpression analyses have the potential to uncover previously unknown functional connections between proteins, and while noisy, are not as biased by prior knowledge. We performed coexpression analyses of eight gene-expression datasets from brains of ~200 bipolar patients and controls. We concentrated on a subset of genes, each of which exhibited an extreme expression value in at least several diseased individuals. The coexpression analysis of these genes was performed in the control samples. Since by our definition the expression of these genes in controls is more homogeneous, we assume that data from these samples represents the baseline relationships among the transcripts.
Our results show that the studied genes, while not consistently showing differential expression among subjects, are highly coexpressed in the human brain, suggesting existence of yet unknown functional relationships between the genes' products. Given the polygenic and multifactorial nature of bipolar disorder, we believe that our approach to gene expression analysis may provide an important insight into the pathophysiology of bipolar disorder and other complex disorders.

A46 - Network-Augmented Genomic Analysis (NAGA) applied to Cystic Fibrosis studies

Salvatore Loguercio, The Scripps Research Institute, United States

Short Abstract: Cystic fibrosis (CF) is an early onset disease characterized by a defect in the apical chloride channel, CF transmembrane conductance regulator (CFTR). The most common disease causing mutation is a 3 base pair deletion resulting in loss of Phe 508 (F508del), which leads to misfolding, endoplasmic reticulum (ER) retention and efficient ER-associated degradation of the protein.
To elucidate the molecular networks influencing the folding and function of CFTR in CF, we recently screened CFBE41o- cells containing F508del-CFTR against a siRNA library of 2500 targets known to be involved in protein homeostasis. In parallel, we generated a high confidence CFTR interactome of F508del-CFTR in the same cell line.
Given a list of high-scoring siRNA hits for CFTR rescue of function, and a set of CFTR binding proteins, we sought to connect these datasets through an integrated protein-protein interaction network, and to use shortest path analysis to uncover the minimal network structure consistent with both the CFTR interactome and the siRNA data. The goal of this approach is to prioritize proteins connecting CFTR with siRNA hits that may act as central “hubs” in cellular processes required for CFTR functional rescue. For each protein in the subgraph, the method computes the number of distinct siRNA hits that use the protein on their shortest path to CFTR. To filter out nonspecific protein hubs, this computation is repeated using a random selection of hits from the original siRNA library. The analysis identified several novel candidates for CFTR rescue of function that could be validated through targeted siRNA screens.
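
The hub-counting idea described above can be sketched with a breadth-first search on an unweighted interaction graph. Several simplifications are assumed here and are not taken from the abstract: the network is an undirected adjacency dictionary, only one shortest path per hit is followed (the one found first by the search), and the random control is summarized as an empirical p-value. The function names are hypothetical.

```python
import random
from collections import Counter, deque


def bfs_parents(graph, source):
    """Breadth-first search; maps each reachable node to its parent on one shortest path."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return parents


def hub_counts(graph, source, hits):
    """Per intermediate protein, count distinct hits whose shortest path to source uses it."""
    parents = bfs_parents(graph, source)
    counts = Counter()
    for hit in set(hits):
        if hit not in parents:
            continue  # hit is not connected to the source
        node = parents[hit]
        while node is not None and node != source:
            counts[node] += 1
            node = parents[node]
    return counts


def empirical_p(graph, source, hits, library, protein, n_random=1000, seed=0):
    """Fraction of random hit lists giving `protein` a count at least as high as observed."""
    rng = random.Random(seed)
    observed = hub_counts(graph, source, hits)[protein]
    k = len(set(hits))
    at_least = 0
    for _ in range(n_random):
        sample = rng.sample(sorted(library), k)
        if hub_counts(graph, source, sample)[protein] >= observed:
            at_least += 1
    return (at_least + 1) / (n_random + 1)
```

In this sketch `source` would be CFTR, `hits` the high-scoring siRNA targets, and `library` the full set of screened targets. A protein with a high count and a small empirical p-value would be a candidate specific hub.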

A48 - Human Tumor Model-based Databank Shared Resources

Aik-Choon Tan, University of Colorado Denver, United States

Short Abstract: The traditional approach of developing new cancer therapies based on cancer cell lines has a remarkably high failure rate. Approximately 95% of potential treatments that are effective in these contexts fail to show a similar level of effect in patients. This discrepancy is attributable to the heterogeneous environment present in human tumors, which is in stark contrast to the “pure” tumor cell lines. An emerging approach termed Patient-Derived Tumor Xenografts (PDTX) is to transplant human tumors into immune-deficient mice for growth and development over a few generations. The new approach has proven more effective in understanding human tumors as it maintains the tumor heterogeneity and microenvironment. We are developing a Patient-Derived Tumor Xenografts Cancer Genomics Portal (PDTX-CanGen) to facilitate drug development. The PDTX-CanGen Portal will incorporate genomics data (RNA-Seq, Exome-Seq, etc.) from the PDTX models and will support online queries, visualization and analysis of gene expression and alterations in genotype and cell signaling pathways. The interactive web interface will help researchers dissect genomics data to identify molecular signatures that could be targeted for treatments in the PDTX cancer model. Our portal will not only support user-customized data analysis but will also enable integration with data from public sources such as TCGA and the Cancer Cell Line Encyclopedia (CCLE). Whenever available, the returned query results will be linked to clinical information on cancer patients’ survival and responsiveness to treatments. The portal will therefore serve the general objective of helping researchers develop effective cancer treatments.

A49 - Identification of circular RNAs (circRNAs) and their potential regulation in Breast Cancer Subtypes

ASHA NAIR, Mayo Clinic, United States

Short Abstract: Recent RNA-Seq analyses have revealed the importance of circRNAs in several diseases. CircRNAs are stable, nonlinear, 3’ to 5’ fused RNA transcripts that show evolutionary conservation across species and lack the potential to be translated into proteins. There have been intriguing discoveries highlighting the functional impact of circRNAs on gene expression. These circRNAs may provide opportunities to treat diseases such as cancer. We processed 885 Breast Cancer (BRCA) RNA-Seq samples from The Cancer Genome Atlas (TCGA) using our circRNA workflow. Our initial analysis consists of 141 Triple-Negative (TN) BRCA samples (128 tumors and 13 tumor-normal pairs). In the 13 tumor-normal pairs, we have identified 47 somatic circRNAs that are expressed (10 or more reads). Of the 47 candidates identified, 11 circRNAs overlap with each other and 36 correspond to unique chromosomal loci. Chromosome 6 seems to be enriched with circRNAs compared to others. The circRNAs were then further annotated using the UCSC RefSeq gene definition. The 36 unique circRNA loci correspond to 2,220 genes. We found 903 cancer-related tumor markers, 129 microRNAs, 8 non-coding RNAs, 279 genes associated with inflammatory response, 93 genes related to connective tissue disorders and 39 genes linked to antimicrobial response. We are currently analyzing somatic circRNAs from Luminal and HER2-positive BRCA subtypes. We plan to identify somatic circRNAs that are common and unique across BRCA subtypes. We will further explore known pathways as well as microRNA binding hotspots of these circRNAs to understand their regulatory potential in BRCA.

A50 - Gene expression analyses of RNA-Sequencing data across multiple cancers to identify Basal-like cancer subtypes

Kevin Thompson, Mayo Clinic, United States

Short Abstract: Introduction:
Basal-like breast cancer is identified using expression profiling techniques, such as the PAM50 intrinsic signature model. Basal-like breast cancer is a subgroup of breast cancer associated with poor prognosis, defects in homologous recombination, sensitivity to platinum and PARP inhibitors, and resistance to standard chemotherapy. Recently, the PAM50 model was applied to 5 non-breast cancers, and identified basal-like populations. Characterizing basal-like core gene signatures could offer novel therapies for multiple populations of cancer patients.

Methods:
We obtained RNA-Seq gene expression data from The Cancer Genome Atlas data portal: 488 lung adenocarcinoma, 483 lung squamous, 262 ovarian, 584 breast, 497 head and neck, and 386 colon samples. The PAM50 intrinsic gene signature was used to elucidate basal-like cancers, which were substantial in head and neck (96%), squamous lung (88%) and ovarian (73%) cancers, in addition to basal-like breast cancer (20%). Consensus clustering and cluster validation analyses were performed on each cancer cohort using the most variably expressed genes.

Results and Future Work:
We have identified 1,214 basal-like cancers, representing 45% of the cancers, confirming that the intrinsic PAM50 gene signature elucidates basal-like cancers in non-breast cancer samples. The basal-like ovarian cancer cohort consisted of 2 clusters, while the other basal-like tumors had 3 clusters each. Correlations between cluster centroid prediction models remain to be established, and core gene signatures remain to be elucidated. These studies may identify non-breast cancers in which regimens with known efficacy in basal breast cancer could be tested.

A51 - Extreme conservation and divergence of functional regions in Influenza A pandemic subtypes: A peaceful journey or a wild ride?

Dmitry Korkin, University of Missouri, United States

Short Abstract: Pandemic subtypes of Influenza A have posed serious health and economic stress worldwide. In 2009, the H1N1 subtype resulted in significant economic loss and 61,000,000 infections. Recently, interest has focused on subtypes H3N2, H5N1, and H7N9. Because no universal vaccine or treatment exists, these common viruses still pose a serious risk. It is imperative to understand the evolutionary dynamics of these pandemic viruses. This is a challenging task due to influenza’s ability to reassort its genome.

Here, we study the evolutionary patterns of different influenza subtypes by determining highly conserved and diverse regions using structural bioinformatics, unsupervised learning, and Metropolis Monte Carlo simulation. Our automated computational pipeline employs an integrated measure of residue diversity at each position taking into account entropy, substitution similarity, and percentage of gaps. To determine surface regions of extreme divergence we develop a Metropolis Monte Carlo approach that samples structural patches surrounding the most diverse residues on the surfaces of influenza proteins. To determine surface regions of extreme sequence conservation we employ a clustering procedure that is based on the surface distance between residues. This information is used to obtain and compare an evolution-specific interaction network for influenza subtypes which incorporates both intra-viral and host-pathogen interactions. Finally, we develop an efficient computational method to comprehensively identify patterns of extreme similarity between functional regions of the proteins obtained from temporally distant influenza strains. Our findings may provide further insights into designing new targets for universal antivirals and explaining the age-related immunity to current and future influenza pandemics.
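
Two of the ingredients of the per-position diversity measure mentioned above, entropy and percentage of gaps, can be computed from an alignment as sketched below. This is a minimal illustration, not the authors' integrated measure: the substitution-similarity term is omitted, and the way the two terms are combined (`gap_weight`) is a hypothetical choice made only for this example.

```python
import math
from collections import Counter


def column_stats(column, gap="-"):
    """Return (Shannon entropy in bits over non-gap residues, fraction of gaps)."""
    residues = [c for c in column if c != gap]
    gap_fraction = 1 - len(residues) / len(column)
    if not residues:
        return 0.0, gap_fraction
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy, gap_fraction


def alignment_diversity(sequences, gap_weight=1.0):
    """Per-position score: entropy + gap_weight * gap fraction (illustrative only)."""
    scores = []
    for column in zip(*sequences):  # iterate over alignment columns
        entropy, gap_fraction = column_stats(column)
        scores.append(entropy + gap_weight * gap_fraction)
    return scores
```

Positions with the highest scores would be candidate seeds for the divergent surface patches, and positions with scores near zero would be candidates for the conserved regions.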

A52 - High-Performance Bayesian Statistical Model Checking for Behavioral Exploration of Stochastic Computational Epidemiological Models

Raj Dutta, University of Central Florida, United States

Short Abstract: Since the days of Dr. John Snow’s analysis of cholera in 1854, our understanding of the mechanisms underlying epidemics has reached a turning point where we are able to build detailed computational models that capture the emergence, spread and containment of infectious diseases. However, different computational models often provide different predictive outcomes for the same scenario, thereby making them unreliable for use by government agencies.
We have been developing high-performance Bayesian statistical model checking methods for validating the outcome of parameterized probabilistic complex computational models employed in epidemiology against massive data sets and expert insights.

Traditionally, expectations on computational models have often been described using temporal logics that are capable of capturing the notion of tense and modalities in natural languages. In our work, we have extended the correctness specification of epidemiological models to probabilistic spatio-temporal logics that can capture both the ideas of tense and space, as well as the notions of modalities and multiple outcomes.

By embedding the specifications against which the models have been verified into the modeling framework itself, we prevent misuse of an epidemiological model in scenarios that do not conform to those where its behavior has been verified. We believe that high-performance Bayesian statistical model checking provides an extreme-scale solution to the problem of validating epidemiological models against ground truth. Our sustained efforts in verifying deployed computational models will provide high-assurance epidemiological models and increase their adoption by practitioners in daily decision-making.
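
The general idea behind Bayesian statistical model checking can be illustrated as follows: run a stochastic model many times, record whether each run satisfies a property, and compute the posterior probability that the property holds with probability at least some threshold. The sketch below is a generic textbook-style formulation and not the authors' high-performance method. It assumes a uniform Beta(1, 1) prior, a fixed number of runs, and a hypothetical interface in which `simulate(rng)` returns one trace and `holds(trace)` evaluates the property; spatio-temporal logic specifications are not modeled.

```python
import math
import random


def beta_tail(a, b, theta, steps=10000):
    """P(p >= theta) for p ~ Beta(a, b), by midpoint-rule integration of the density."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    width = (1.0 - theta) / steps
    total = 0.0
    for i in range(steps):
        x = theta + (i + 0.5) * width  # midpoint lies strictly inside (0, 1)
        total += math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))
    return total * width


def check_property(simulate, holds, theta, n_runs=200, seed=0):
    """Posterior probability that the model satisfies `holds` with probability >= theta."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n_runs) if holds(simulate(rng)))
    # Beta(1, 1) prior updated with the observed successes and failures.
    return beta_tail(1 + successes, 1 + n_runs - successes, theta)
```

For example, `simulate` could return the peak number of infections in one stochastic run and `holds` could test whether that peak stays below a capacity limit. A posterior close to 1 supports the specification, and a posterior close to 0 refutes it.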

A53 - Identification of Significant Features by the Global Mean Rank Test

Martin Klammer, Evotec (Munich) GmbH, Germany

Short Abstract: With the introduction of omics-technologies such as transcriptomics or proteomics, numerous methods for the reliable identification of significantly regulated features (genes, proteins, etc.) have been developed. Experimental practice requires these tests to successfully deal with conditions such as small numbers of replicates, missing values, non-normally distributed expression levels, and non-identical distributions between features. With the MeanRank test we aimed at developing a test which is robust against these constraints, while favorably scaling with the number of replicates.

MeanRank is a global one-sample test, which is based on the mean ranks across replicates, and internally estimates and controls the false discovery rate. Furthermore, missing data is accounted for without the need of imputation.
In extensive simulations comparing MeanRank to other frequently used methods, we found that it performs well with small and large numbers of replicates, feature dependent variance between replicates, and variable regulation across features on simulation data and a recent two-color microarray spike-in dataset.
MeanRank outperformed the other global methods applied in this study. Compared to the popular Significance Analysis of Microarrays and Linear Models for Microarray methods, MeanRank performed better or similar in our simulated datasets, and proved to be more robust against changes in preprocessing when applied to the spike-in data.
The best performing tests, MeanRank and Significance Analysis of Microarrays, were then used to identify significant changes in the phosphoproteomes of cancer cells induced by the kinase inhibitors erlotinib and 3-MB-PP1 in two independently published mass spectrometry-based studies.

A54 - Simultaneous Identification of Multiple Driver Pathways in Cancer

Benjamin Raphael, Brown University, United States

Short Abstract: An important challenge in cancer genome sequencing is to distinguish the small subset of somatic driver mutations that cause cancer from the multitude of random passenger mutations in a tumor. Since patients with the same cancer type typically have different collections of mutations, single-gene tests of recurrence are insufficient for this task. We present Multi-Dendrix, an algorithm to identify combinations of mutations with combinatorial properties consistent with cancer pathways. Multi-Dendrix does not use prior knowledge of pathways, and finds multiple sets of mutations simultaneously since driver mutations target multiple pathways in a patient. We applied Multi-Dendrix to glioblastoma and breast cancer data from The Cancer Genome Atlas. In both cancers, Multi-Dendrix identified gene sets overlapping major signaling pathways -- including Rb, PI(3)K, and p53 -- that were manually annotated in the TCGA publications, as well as novel gene sets that include transcription factors and regulators.
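
One way to score a gene set for the kind of combinatorial pattern mentioned above is a weight that rewards covering many patients and penalizes genes that are mutated in the same patients. The sketch below uses W(M) = 2·|patients covered by M| − Σ over genes of |patients mutated in that gene|, which equals the coverage when the genes are perfectly mutually exclusive. Whether this matches the exact scoring and optimization used by Multi-Dendrix is an assumption; the exhaustive search is only suitable for toy inputs, and the function names are hypothetical.

```python
from itertools import combinations


def weight(mutations, gene_set):
    """Coverage-minus-overlap weight of a gene set.

    mutations: dict mapping gene -> set of patient ids with a mutation in it.
    """
    covered = set()
    total = 0
    for gene in gene_set:
        patients = mutations.get(gene, set())
        covered |= patients
        total += len(patients)
    return 2 * len(covered) - total


def best_gene_set(mutations, size):
    """Exhaustively find the gene set of a given size with the highest weight."""
    genes = sorted(mutations)
    return max(combinations(genes, size), key=lambda gs: weight(mutations, gs))
```

Finding several such sets at once, as the abstract describes, would mean maximizing the sum of weights over multiple disjoint gene sets rather than one set at a time.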

A55 - Screening hormone metabolism pathway genes for variants that might cause early-onset breast cancer

Khalid Mahmood, University of Melbourne, Australia

Short Abstract: Breast cancer is the most common malignant disease in Australian women. It is known that hormones play an important role in breast cancer aetiology. Genetic epidemiological studies have found associations between SNPs in the regions of hormone metabolism genes and breast cancer risk. To try to determine causal variants, we have performed targeted DNA sequencing of ~10Mb in and around selected hormone metabolism genes for 150 patients (aged < 40) from the Australian Breast Cancer Family Study. Bioinformatics analysis was performed using GATK best practices and variants were identified using VarScan, UnifiedGenotyper and HaplotypeCaller. It was reassuring to observe high concordance between the three algorithms. Variants were filtered based on quality measures and 1KGP variants and grouped into coding and non-coding categories. Coding variants were annotated using SnpEff and functional effects were predicted using PolyPhen. Ranking the non-coding variants, which make up approximately 99% of the total, is not trivial, so we used an ensemble of resources to rank them, including annotations from ENCODE, sequence conservation scores (e.g. GERP++), and scores from CADD and FunSeq. As a result, we have identified a few hundred variants with putative deleterious effect. Work towards identifying causal variants is ongoing using select UK10K whole genomes as a reference group, taking into account family cancer histories of case-carriers and variant status of relatives from whom a blood sample has been obtained. We believe the analysis pipeline we used to rank all variants and calculate multi-sample statistics has potential wider use in the bioinformatics community, and it will be shared publicly.

A56 - OncoCNV: a multi-factor data normalization method for the detection of copy number aberrations in amplicon sequencing data

Valentina Boeva, Institut Curie, France

Short Abstract: High-throughput sequencing (HTS), although widely used in biomedical research, is increasingly making its way into clinics. Here, it is helping to identify and designate personalized treatment for cancer patients based on the information on ‘actionable’ mutations, i.e. mutations influencing cell sensitivity to a particular targeted therapy. Since only a limited number of genes are informative for treatment, expensive genome-wide profiling would not be required. Thus, clinics often employ a less costly amplicon sequencing method that focuses on a selection of actionable genes. However, although amplicon sequencing allows us to reliably detect point mutations, so far we have been unable to assess DNA copy number aberrations, essential for detecting the involvement of some oncogenes. As a result, HTS is often complemented in clinics by other techniques, in particular microarrays (such as array CGH or SNP).

Here, we show how amplicon sequencing data can be effectively exploited in clinics in a way that would mean the use of microarrays would become unnecessary. We present ONCOCNV (http://oncocnv.curie.fr/), the method which is, to the best of our knowledge, the first to allow for the accurate detection of gene copy number aberrations based solely on amplicon sequencing data. ONCOCNV includes a multi-factor normalization and annotation technique enabling the detection of large copy number changes from amplicon sequencing data. We validated our approach on high and low amplicon density datasets and demonstrated that ONCOCNV can achieve a precision comparable to that of array CGH techniques in detecting copy number aberrations.

A57 - Gene expression deconvolution of public cancer datasets

Neta Zuckerman, Stanford / City of Hope, United States

Short Abstract: Publicly available databases such as the Gene Expression Omnibus (GEO) are replete with gene expression microarrays of large patient pools from unique experimental conditions never to be repeated. However, the majority of these datasets are whole tissue samples that contain mixtures of different cell types. For example, the thousands of profiled breast tumor tissue datasets deposited in GEO contain not only tumor cells but also infiltrating immune cells and additional microenvironmental cells. Individual cell type profiles and/or any information regarding the identity or proportions of the constituent cell types in the tissue rarely exist. We developed a novel approach to blindly estimate the identity, relative proportions per sample and separated gene expression profiles of the cell types that constitute mixed cell tissues. The only a priori information needed is an initial estimate of the cell types in the tissue analyzed and general reference signatures of these cell types, which may be easily obtained from public databases such as GEO. We have applied our method to perform a comprehensive analysis of whole tissue samples from publicly available datasets of different cancer types, mainly breast cancer. Re-examination of these datasets and analysis of the individual cell types holds great promise for discovering new phenomena otherwise not detected in the mixed tissue samples of individual experiments.

A58 - Object Oriented Data Analysis (OODA) Approaches to Microbiome Data

Perry Haaland, BD Technologies, United States

Short Abstract: This work focuses on the microbiome of the lower lung based on sequencing of DNA from bronchoalveolar lavage (BAL) samples. This is relevant to lower respiratory infections (LRI) and to ventilator associated pneumonia (VAP) as it occurs in the hospital intensive care unit.

We analyze published data consisting of taxonomic identifications and relative abundances for BAL samples from ICU patients. With the goal of characterizing similarities and differences among four diagnosis groups, we employ object oriented data analysis (OODA) as a statistical framework.

The goal of OODA is to carefully define and appropriately combine the possible data objects. First, we consider relative abundances as data objects and consider species richness measures using standard multivariate analysis and visualization techniques. Second, we construct a support tree that captures phylogenetic relationships. We consider relative abundances along with genetic distances in the context of tree structure as data objects. Third, we consider patient-specific subtrees as data objects. We propose and implement a similarity measure for tree objects. The measure integrates within-sample branch distances into leaf abundance differentials.

Our results suggest distributional differences in diversity among diagnosis groups. For example, certain taxa are dominant in the Principal Component Analysis directions, and these taxa are over-represented in subjects with low species richness. Taking tree structure into account suggests interesting clusters of patients that cut across diagnosis groups.

We believe that OODA provides a feature-rich data analysis strategy in support of our long-term goal to better characterize LRI and to provide guidance in selecting antimicrobial therapies.

A59 - Extensive signature-based characterization of cancer transcriptomes reveals novel links between known cancer subtypes and biological pathways

Eike Staub, Merck Serono, Germany

Short Abstract: Multi-gene expression signatures have been widely applied to characterize different molecular subtypes in various cancers. These subtypes often differ in prognosis, reflect a tumor’s cell-of-origin and require adapted therapeutic approaches. As an alternative to subtype assignments based on only small sets of signatures, we strive for comprehensive portraits of tumor expression phenotypes using large collections of signatures. In doing so, we aim to reduce the chance of missing important subclasses and to better capture the combinatorics of transcriptional phenotypes in cancer.
We present a computational framework for expression signature assessment and scoring that we apply to cancer subtype recognition. Using simulated and real data, we compare several approaches for the identification of coherent gene signatures. When screening large collections of gene signatures, only the signatures with the best properties are used to characterize single tumors in a data set.
We use our framework to compare multiple breast cancer and ovarian cancer expression data sets in the light of a larger set of further cancer-related datasets. Our approach enables us to highlight some new findings about common disease characteristics of breast and ovarian cancer.

A60 - A systematic approach to a transcriptome analysis to asthma sputum inflammatory phenotypes

Saeedeh Maleki-Dizaji, Sheffield University

Short Abstract: Asthma is a heterogeneous disease that can be split into different phenotypes. One such partition is based on sputum inflammatory cell counts, which categorise asthma into four groups: eosinophilic, neutrophilic, mixed granulocytic and pauci-granulocytic. These groups have been explored in a number of different settings in asthma but have surprisingly been under-represented in sputum transcriptomics. The aim of this work is to determine differences and similarities across sputum inflammatory groups in terms of clinical outcomes, gene expression, gene ontologies and gene pathways. The Taverna Workbench was used to construct complex data analysis pipelines. These workflows can then be shared and executed repeatedly over different sets of data, giving all users access to complex analysis methods. The re-usable nature of workflows allows the automation and standardisation of methods, hence reducing the time required for analyses. Infection ontologies and pathways were associated with both neutrophilic and mixed granulocytic groups. The neutrophilic sputum group alone is associated with genes that are involved in epithelial damage, apoptosis and cell proliferation. Two pathways associated with gene expression in the neutrophil vs control comparison were IgE-related asthma and IgG phagocytosis, suggesting that the neutrophil phenotype in some patients could relate to a typical acute asthma attack rather than a consistent phenotype through time. The mixed granulocytic phenotype could be a separate phenotype from the neutrophilic group or, as this study is only cross-sectional, the mixed granulocytic phenotype and neutrophilic phenotype could be a stage i

A61 - CRAVAT 3.0: informatics tools for high-throughput analysis of cancer mutations

Rachel Karchin, Johns Hopkins University, United States

Short Abstract: CRAVAT provides high-throughput services for researchers to annotate and prioritize small-scale mutations and genes discovered in the exomes of normal tissues and cancers. The services are publicly available and free for non-commercial use through both a graphical web interface and a RESTful web services API.

Mutations can be submitted in either a basic genomic format, in which the chromosomal location and nucleotide changes are specified, or in transcript format, with a specified RefSeq, CCDS or Ensembl transcript and protein sequence changes. For both submission formats, we provide mutation- and gene-level annotations and bioinformatics scores, to enable extraction of biologically-relevant results from the output of a sequencing study.

New features in CRAVAT 3.0 include mapping of a mutation to all relevant transcripts in RefSeq, CCDS, and Ensembl; identification of mutation consequence types and bioinformatics scoring of mutations with respect to all transcripts; mapping of mutation consequence types onto Sequence Ontology terms; tissue-specific allele frequencies of indels in COSMIC; allele frequencies of indels in ESP6500; and mappability warnings for mutations that occur in genomic regions prone to sequencing and alignment errors.

After substantial performance enhancements, the service can currently process a submission of a million mutations within 24 hours. Our user base spans North America, South America, Europe, East and South Asia, and Russia, and includes academic centers, government labs, and biotechnology companies. In the past year, CRAVAT processed over 3400 jobs from over 450 unique users, with analysis provided for over 265 million mutations.

A62 - DINIES: A web-based application for predicting drug-target interaction networks

Yoshihiro Yamanishi, Kyushu University, Japan

Short Abstract: The identification of drug-target interactions, which are defined as interactions between drugs (or drug candidate compounds) and target proteins (or target candidate proteins), is an important part of genomic drug discovery.
In this poster, we present DINIES (Drug-target Interaction Network Inference Engine based on Supervised analysis: http://www.genome.jp/tools/dinies/), a web server to predict unknown drug-target interaction networks from various types of biological data (e.g., chemical structures, drug side-effects, amino acid sequences, and protein domains) in the framework of supervised network inference. The prediction is performed with state-of-the-art machine learning methods in chemogenomics and pharmacogenomics, assuming that similar compounds (similar not only in chemical structure, but also in side-effect profiles etc.) are likely to interact with similar proteins. The method is suitable for predicting potential off-targets of marketed drugs with known targets and potential target profiles of new drug candidate compounds without known targets. The server is compatible with the KEGG database by sharing the same identifiers, which enables integrative analyses with useful components in KEGG such as biological pathways, functional hierarchy, human diseases, and drug classification.
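The similarity assumption stated above can be illustrated with a deliberately simple nearest-neighbour baseline. This is not the supervised network inference that DINIES performs, and it does not call the DINIES server; the feature sets, drug names and target names are hypothetical placeholders.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of binary features."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_targets(query_features, known_drugs):
    """Score candidate targets for a query compound.

    known_drugs maps a drug name to (feature set, set of known targets).
    Each target receives the similarity of the most similar known drug
    that binds it, so similar compounds suggest similar targets.
    """
    scores = {}
    for features, targets in known_drugs.values():
        similarity = tanimoto(query_features, features)
        for target in targets:
            scores[target] = max(scores.get(target, 0.0), similarity)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical substructure or side-effect features and targets.
known = {
    "drugA": ({"f1", "f2", "f3"}, {"T1"}),
    "drugB": ({"f4"}, {"T2"}),
}
ranking = predict_targets({"f1", "f2"}, known)
```

The same function works whether the features are chemical substructures or side-effect terms, which mirrors the abstract's point that similarity need not be structural.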

A63 - Systematically inferring regulatory mechanisms from disease-associated loci

Lucas Ward, Massachusetts Institute of Technology, United States

Short Abstract: The unprecedented breadth of reference chromatin state maps from the Roadmap Epigenomics Project provides the opportunity to dissect disease regulatory mechanisms, both globally and at individual loci.

We developed a new regulatory annotation enrichment test that takes into account uncertainty in causality on each haplotype and varying LD block length, and applied it to the NHGRI GWAS catalog, which we systematically reannotated using a controlled set of reference ethnicities and 1000 Genomes Phase 1 data imputation. Systematic application of our new method using 127 reference maps of enhancer regions across diverse tissues and cell types revealed remarkably tissue-specific enrichments for many traits: of 96 non-coding breast cancer loci, 45 overlap enhancers in mammary epithelial cells, a 2.0-fold enrichment over other GWAS loci (p = 10^-5), and HDL cholesterol SNPs were 3.5-fold enriched in liver enhancers (p = 10^-5). The enriched tissues often revealed disease-relevant organ systems that were not previously implicated: schizophrenia loci and brain germinal matrix enhancers, and vitiligo loci and CD19+ B cell enhancers. Some enrichments only became visible when considering modules of coordinated enhancer activity, rather than individual tissues in isolation: cognitive performance loci and brain enhancers, and hematological phenotypes and mesenchymal stem cells.

We then applied a motif analysis to annotate individual variants within enhancer regions as mechanistic drivers of the observed associations. We find that 1342 noncoding disease-associated haplotypes from the GWAS catalog disrupt driver motif instances, providing hypotheses for validation. We have incorporated these regulatory annotations in our online tool HaploReg.

A64 - An effective text query tool for prioritizing protein structures via their known biological functions with an application to rank drugs relevant to disease

William McLaughlin, The Commonwealth Medical College, United States

Short Abstract: We describe improvements of an online tool for prioritizing proteins relevant to a text search. KB-Rank, http://protein.tcmedc.org/KB-Rank, allows a user to input a text query, such as a description of a phenotype, and retrieve a list of protein structures that are ranked according to their estimated relevance to the query. One aspect of the tool is that the ranking is entirely based on the functional attributes of the protein structures. Attributes used for ranking include those assigned either at the level of entire protein structures or at the level of specific residues within the structures. Proteins thereby serve as effective common denominators for the integration of the varied biomedical information. An emergent utility of the tool is that it allows for ranking the biological functions relevant to the query, where the rank score of a function is the average rank score of the protein structures with that function. An important application is the ability to rank drugs relevant to a disease query.

Financial support was provided in part by the NIGMS [grant number 5U01 GM093324-02].
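The function-ranking rule in the KB-Rank abstract (a function's rank score is the average rank score of the protein structures annotated with it) can be sketched as follows. The structure identifiers, scores and function labels are hypothetical, and treating a higher score as more relevant is an assumption.

```python
from collections import defaultdict


def rank_functions(structure_scores, structure_functions):
    """Rank functions by the mean rank score of their annotated structures.

    structure_scores    -- dict: structure ID -> rank score for the query
    structure_functions -- dict: structure ID -> iterable of function labels
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for structure, score in structure_scores.items():
        for function in structure_functions.get(structure, ()):
            totals[function] += score
            counts[function] += 1
    averages = {function: totals[function] / counts[function] for function in totals}
    # Assumption: a higher score means more relevant to the query.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical structures, scores and annotations.
scores = {"S1": 0.9, "S2": 0.5, "S3": 0.1}
functions = {"S1": ["kinase"], "S2": ["kinase", "transport"], "S3": ["transport"]}
ranked = rank_functions(scores, functions)
```

The same averaging could be applied to drugs instead of functions if each structure were annotated with the drugs known to bind it.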

A65 - The Sweden Canceromics Analysis Network – Breast (SCAN-B) Initiative and Information Management Solution Using BioArray Software Environment (BASE)

Johan Vallon-Christersson, Clinical Sciences, Sweden

Short Abstract: Current clinical practice for evaluation of breast cancer is imperfect and additional criteria are required to better personalize treatment. The Sweden Cancerome Analysis Network - Breast (SCAN-B) initiative is a prospective study of breast cancer using genomic technologies with the aim of real-time clinical implementation of diagnostic, prognostic, and predictive tests. One initial objective is gene expression profiling of tumors by next-generation RNA-sequencing on the Illumina platform. Since 2010, more than 4900 patients have enrolled, representing 85% of eligible patients in the catchment area. Inclusion of patients, collection of biomaterial – including blood and primary tumor – and sample processing take place asynchronously and at separate sites. Information on patients and biomaterials must be tracked together with laboratory processing steps and sequence analysis of derivatives such as extracted RNA and DNA. To this end we use BioArray Software Environment (BASE), an application that includes support for analysis and a laboratory information management system. BASE was originally created for microarray experimentation but has expanded to handle next-generation sequencing and is highly customizable to accommodate local needs. Through extensions, highly specialized applications can be added, and we have created solutions for laboratory information collection and data analysis. Extensions handle asynchronous registration of patients and biomaterials, automate information flow and collection from laboratory equipment, and validate relations between existing entries within the system. The solution provides traceability from patient through inclusion, sampling of specimen, sample processing and quality control measurements, sequencing, and subsequent analysis, all the way to end-data reports.

A66 - Conserved PU.1 and ETS transcriptional networks control the progression of Alzheimer’s disease

Andreas Pfenning, Massachusetts Institute of Technology, United States

Short Abstract: A growing number of loci are being implicated in Alzheimer’s disease (AD) genome-wide association studies. Many of the highest scoring SNPs fall in non-coding regulatory regions, making their interpretation a challenge. To identify the transcriptional networks that control AD predisposition and progression, we combined epigenome-wide maps of an inducible mouse model of neurodegeneration with epigenome-wide maps of human brain tissue from AD cases and controls. We found a strong enrichment of AD-associated SNPs in conserved immune enhancers that increase in H3K27 acetylation activity during the progression of neurodegeneration in mouse. Although the genome sequence of these enhancer regions is often poorly conserved between mouse and human, there was a strong enrichment for PU.1 and ETS binding sites in both species. To test for the relevance of these transcription factors in the progression of AD in human, we used methylation profiles from the prefrontal cortex of 438 Alzheimer’s patients and 285 controls. We found the genes near both PU.1 and ETS transcription factor binding site motifs were enriched for probes differentially methylated in AD cases relative to controls. Our results reveal the importance of PU.1 and ETS transcription factors in mediating the Alzheimer’s disease predisposition and progression. Furthermore, the strong conservation of the implicated transcriptional network in mediating neurodegeneration between mouse and human allows us to manipulate and explore these pathways in vivo.

A67 - Human disease network crossing large-scale clinical records and underlying genetic profiles

Atul Butte, Stanford University School of Medicine, United States

Short Abstract: Geneticists and epidemiologists have often observed that certain disorders co-occur in individual patients significantly more frequently than expected, suggesting that there are conserved disease-related functional modules, for example at the level of gene expression and protein-protein interactions (PPIs).

Based on a modification of Google’s PageRank algorithm and large-scale clinical records from 121K patients, we propose a directed network model consisting of 1,320 nodes of disease onsets and 22,134 directional edges of subsequent disease outbreaks (φ-correlation coefficient > 0.0). Through pairwise analysis of clinical records, we integrated disease-associated mRNA expression using PPI network frames to unravel the underlying genetic architecture of the human diseasome. Networks of disease-associated expression overlapped significantly with each other, and the overlap of disease-related gene networks was associated with disease comorbidity. Interestingly, our disease-network analysis integrating clinical records identified functional modules for disease comorbidities, with directions from primary disease onsets to subsequent diseases. In addition, we analyzed the etiologies of diseases by integrating clinical records with disease-associated genetic signatures, including gene expression and PPIs.

In our analysis, we successfully visualized a human disease network that describes disease onset patterns and associated genetic features, including transcriptional signatures and PPIs.
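The φ-correlation coefficient used as the edge criterion in this abstract is the standard association measure for a 2×2 co-occurrence table. The sketch below computes it from patient counts; the argument names and the example counts are illustrative assumptions, and the direction of an edge (which disease came first) would have to be derived separately from onset dates.

```python
import math


def phi_coefficient(n_both, n_a_only, n_b_only, n_neither):
    """Phi correlation between two diseases from a 2x2 patient table."""
    with_a = n_both + n_a_only
    without_a = n_b_only + n_neither
    with_b = n_both + n_b_only
    without_b = n_a_only + n_neither
    denominator = math.sqrt(with_a * without_a * with_b * without_b)
    if denominator == 0:
        # One of the margins is empty, so the association is undefined.
        return 0.0
    return (n_both * n_neither - n_a_only * n_b_only) / denominator


def keep_edge(n_both, n_a_only, n_b_only, n_neither, threshold=0.0):
    """Keep a disease pair only if it co-occurs more often than expected."""
    return phi_coefficient(n_both, n_a_only, n_b_only, n_neither) > threshold


# Hypothetical counts for one disease pair.
phi = phi_coefficient(n_both=10, n_a_only=0, n_b_only=0, n_neither=10)
```

A value of 0 corresponds to independent occurrence, so the threshold of 0.0 quoted in the abstract keeps every pair with any positive association.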

A68 - Discovery of new inhibitors of Fatty Acid Amide Hydrolase (FAAH): Study leading to design of new pain and CNS disorder drugs

Asif Naqvi, BioDiscovery Group, India

Short Abstract: Fatty acid amide hydrolase (FAAH) is an integral membrane enzyme that hydrolyzes the endocannabinoid anandamide and related amidated signaling lipids. Genetic or pharmacological inactivation of FAAH produces analgesic, anti-inflammatory, anxiolytic, and antidepressant phenotypes without showing the undesirable side effects of direct cannabinoid receptor agonists, indicating that FAAH may be a promising therapeutic target.
The study highlights the development of new FAAH inhibitors. 3000 molecules selected for structural similarity to PF750 (an inhibitor of FAAH) were docked using the Lamarckian Genetic Algorithm to identify potent FAAH inhibitors on the basis of calculated ligand-protein pairwise interaction energies. The grid maps representing the protein were calculated using AutoGrid, and the grid size was set to 60 × 60 × 60 points with a grid spacing of 0.375 Å. Docking was carried out with a standard docking protocol: a population size of 150 randomly placed individuals, a maximum of 2.5 × 10^7 energy evaluations, a mutation rate of 0.02, a crossover rate of 0.80 and an elitism value of 1. Fifteen independent docking runs were carried out for each ligand, and results were clustered according to a 1.0 Å RMSD criterion. Further in vitro and in vivo study is required for the future design of new derivatives with higher potency and specificity.
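To make the scale of this docking protocol concrete, the sketch below collects the quoted settings and derives two quantities from them: the approximate edge length of the search box and the upper bound on energy evaluations. The dictionary keys are illustrative names rather than AutoDock keywords, and the energy-evaluation limit is read as 2.5 × 10^7.

```python
# Settings as quoted in the abstract; key names are illustrative only.
docking_setup = {
    "grid_points_per_axis": 60,
    "grid_spacing_angstrom": 0.375,
    "population_size": 150,
    "max_energy_evaluations": 2.5e7,
    "mutation_rate": 0.02,
    "crossover_rate": 0.80,
    "elitism": 1,
    "runs_per_ligand": 15,
    "ligands": 3000,
}


def box_edge_angstrom(setup):
    """Approximate edge length of the cubic search box in angstroms."""
    return setup["grid_points_per_axis"] * setup["grid_spacing_angstrom"]


def max_evaluations(setup):
    """Upper bound on energy evaluations per ligand and for the whole screen."""
    per_ligand = setup["runs_per_ligand"] * setup["max_energy_evaluations"]
    return per_ligand, per_ligand * setup["ligands"]


edge = box_edge_angstrom(docking_setup)
per_ligand, whole_screen = max_evaluations(docking_setup)
```

Under these assumptions the box is roughly 22.5 Å on a side, and the screen is bounded by 3.75 × 10^8 evaluations per ligand, or about 1.1 × 10^12 in total if every run uses its full budget.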

A69 - Use of biomedical literature for biomarker discovery for metformin response and validation in hypothesis free GWAS studies

Ashfaq Ali, Lund University, Sweden

Short Abstract: Here, we aim to exploit candidate gene-metformin interactions identified from the biomedical literature to study the genetic basis of metformin response in hypothesis-free GWAS studies using a Genotype Based Recall (GBR) approach. We used an online biomedical literature-mining tool, FABLE (http://fable.chop.edu/), to identify original research papers that mention “genes” and “metformin”. This search identified 1236 genes connected with metformin in 5963 unique articles. We selected 282 genes with at least 5 research papers each for KEGG pathway and protein-protein interaction (PPI) network analysis using Cytoscape software. Our pathway analysis suggests significant enrichment of genes in insulin response and drug metabolism pathways, and the PPI network revealed several centrally located proteins. We then assessed, using simulations, the statistical power advantage of selecting participants on the basis of discordant levels of a genetic risk score (GRS) composed of variants that modify treatment effects. We demonstrated that this approach requires one half to one third of the sample size (for 80% power) compared with a conventional clinical trial design for testing gene-metformin interactions in human populations. We are in the process of functionally annotating the genes by text mining the full-text articles and identifying non-synonymous and regulatory SNPs in the genes selected from the pathway and network analyses. We will then use the population-level genetic data from the goDARTS cohort (n=~2000) to construct individual-level genetic risk scores and apply the GBR approach to analyze metformin response in type 2 diabetic patients.

A70 - Legacy Microarray Data in the RNA-Seq Era – A Biomarker Investigation

Zhenqiang Su, National Center for Toxicological Research of US FDA, United States

Short Abstract: We systematically evaluated the transferability of predictive models and gene signatures between microarray and RNA-Seq using a large clinical data set. We demonstrated that predictive models and gene signatures between microarray and RNA-Seq are mutually transferable. The results suggest that existing microarray data can be synergistically used with RNA-Seq data.

A71 - A novel signal detection algorithm for drug effects using objective clinical measurements in Electronic Health Records

Tomasz Adamusiak, Medical College of Wisconsin, United States

Short Abstract: Prevention of adverse effects of pharmaceutical products is an important topic in patient safety. Electronic Health Records contain substantial data on prescribed drug products and clinical measurements. Clinical measurements, in contrast to other categorical data in the EHR, represent an objective assessment of patient state. We developed a novel algorithm for mining associations between drug therapeutic effects and clinical measurements (laboratory tests and vital signs). The study population consisted of 8 229 patients treated at The Froedtert & The Medical College of Wisconsin academic medical center between the years 2004 and 2013. 5 947 unique drug products were tested together with 431 unique clinical measurements. The analysis was further expanded to include 8 041 drug classes using RxNorm drug terminology. We used a self-control time series design to compare a pre-index baseline measurement with post-index measurements within a 24-hour window for each of the patients and 663 859 drug-measurement pairs. Associations were tested for significance using the Wilcoxon signed-rank test. 24 184 associations were found to be significant at a Bonferroni-corrected p-value. We validated the approach using known drug effects, such as warfarin and increased INR (p heart rate (p
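The testing step named in this abstract (a Wilcoxon signed-rank test on paired pre-index and post-index measurements, judged against a Bonferroni-corrected threshold) can be sketched as below. This is a minimal version using the normal approximation without tie or continuity corrections, not the authors' implementation; the base significance level of 0.05 and the toy measurements are assumptions.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test via the normal approximation.

    Returns (sum of ranks of positive differences, approximate p-value).
    Pairs with zero difference are dropped; tied magnitudes share an
    average rank.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(rank for rank, diff in zip(ranks, diffs) if diff > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Toy data: 20 hypothetical patients whose measurement rises after the drug.
baseline = [1.0] * 20
follow_up = [1.0 + i for i in range(1, 21)]
w_plus, p_value = wilcoxon_signed_rank(baseline, follow_up)
threshold = bonferroni_threshold(0.05, 663859)  # alpha = 0.05 is assumed
```

With 663 859 drug-measurement pairs the corrected threshold is on the order of 10^-8, so even this perfectly consistent 20-patient example would not count as significant; only associations backed by many patients can pass.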

A72 - Computational Breath Analysis

Anne-Christin Hauschild, Max Planck Institute for Informatics, Germany

Short Abstract: Volatile organic compounds are metabolites emitted by all living cells and tissues. We seek to non-invasively "sniff" biomarker molecules that are predictive of the biomedical fate of individual patients. This holds promise for moving the therapeutic window to earlier stages of disease progression. While portable devices for breathomics measurement exist, we face the traditional barrier of biomarker research: a lack of robustness hinders translation to the world outside laboratories. To move from biomarker discovery to validation, from separability to predictability, we have developed several bioinformatics methods for computational breath analysis, which have the potential to redefine non-invasive biomedical decision making through rapid and cheap matching of decisive medical patterns in exhaled air. We aim to provide a supplementary diagnostic tool complementing classic urine, blood and tissue samples. The presentation will review the state of the art, highlight existing challenges and introduce new data mining methods for identifying breathomics biomarkers.
