Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


Posters

Preparing your Poster - Information and Poster Size
Poster Schedule
Print your poster in Chicago
Poster Categories

View Posters By Category

Session A: (July 7 and July 8)
Session B: (July 9 and July 10)
A-180: Gaussian Mixture Modeling Identifies Bimodal Oncogenic miRNA Across Cancers
COSI: TransMed
  • Laura Moody, University of Illinois at Urbana Champaign, United States
  • Yuan-Xiang Pan, University of Illinois at Urbana Champaign, United States
  • Hong Chen, University of Illinois at Urbana Champaign, United States

Short Abstract: Tumors are highly heterogeneous, such that specific oncogenes may be expressed in only a subset of patients. Such bimodal genes can be defined as those having two modes of expression within the same population. Given the large-scale regulatory role of microRNA (miRNA), we hypothesize that bimodal miRNAs may be important biomarkers of tumorigenesis. Gaussian mixture modeling was applied to miRNA-seq data from 9 types of cancer. The underlying distribution of each miRNA was compared to non-cancerous tissue samples, and a bimodality index was calculated. In head and neck, liver, lung, and stomach cancers, we identified bimodal co-expression of miR-105-1, miR-105-2, and miR-767. Patients with high expression of this miRNA module had poor survival prognoses. The computationally predicted targets of the 3 miRNAs were then analyzed for bimodality, and functional annotation clustering revealed enrichment for transcriptional regulation. miR-105 and miR-767 were overexpressed in vitro, and putative target genes, including ATF3, MINOS1, PER2, SERTAD2, and SUMO1, were downregulated. Next, we will test whether miR-105 and miR-767 inhibitors are a viable cancer therapy by measuring proliferation in the cell model. Collectively, we show that bimodally expressed miRNAs can be used to predict cancer prognosis and to design personalized treatments.
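
The abstract does not say exactly how the mixture model or the bimodality index were computed. The sketch below is a minimal illustration under stated assumptions: it fits a two-component, equal-variance Gaussian mixture to one miRNA's expression values by expectation-maximization and reports one common form of the bimodality index, BI = sqrt(p(1 - p)) * |mu1 - mu2| / sigma. The function names, the initialization, and the iteration count are illustrative choices, not the authors' code.

```python
import math
from statistics import mean, pstdev

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_two_gaussians(xs, n_iter=100):
    """EM for a 1-D two-component Gaussian mixture with a shared standard deviation.

    Returns (p, mu1, mu2, sigma), where p is the weight of component 2.
    """
    xs = sorted(xs)
    n = len(xs)
    # Initialise the two components from the lower and upper halves of the data.
    mu1, mu2 = mean(xs[: n // 2]), mean(xs[n // 2 :])
    sigma = pstdev(xs) or 1.0
    p = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 2.
        resp = []
        for x in xs:
            a = (1 - p) * normal_pdf(x, mu1, sigma)
            b = p * normal_pdf(x, mu2, sigma)
            resp.append(b / (a + b) if a + b > 0 else 0.5)
        # M-step: update the mixing weight, both means and the pooled variance.
        n2 = sum(resp)
        n1 = n - n2
        p = n2 / n
        mu1 = sum((1 - r) * x for r, x in zip(resp, xs)) / n1
        mu2 = sum(r * x for r, x in zip(resp, xs)) / n2
        var = sum((1 - r) * (x - mu1) ** 2 + r * (x - mu2) ** 2
                  for r, x in zip(resp, xs)) / n
        sigma = math.sqrt(max(var, 1e-12))
    return p, mu1, mu2, sigma

def bimodality_index(xs):
    """BI = sqrt(p * (1 - p)) * |mu1 - mu2| / sigma for an equal-variance two-component fit."""
    p, mu1, mu2, sigma = fit_two_gaussians(xs)
    return math.sqrt(p * (1 - p)) * abs(mu1 - mu2) / sigma
```

Larger BI values indicate better-separated, more balanced modes; the cutoff used in the study is not stated.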

A-270: Deciphering genetic diseases using WGS regulatory elements and ncRNAs
COSI: TransMed
  • Marilyn Safran, Weizmann Institute of Science, Israel
  • Ruth Barshir, Weizmann Institute of Science, Israel
  • Naomi Rosen, Weizmann Institute of Science, Israel
  • Michal Twik, Weizmann Institute of Science, Israel
  • Tsippi Iny Stein, Weizmann Institute of Science, Israel
  • Simon Fishilevich, Weizmann Institute of Science, Israel
  • Doron Lancet, Weizmann Institute of Science, Israel

Short Abstract: The avalanche of variants residing in genomic non-coding “dark matter”, made available by whole genome sequencing (WGS), provides a challenging disease interpretation opportunity. The popular GeneCards Suite KnowledgeBase, encompassing ~150K annotated coding and non-coding genes in GeneCards (PMID:27322403) and ~20,000 annotated diseases in MalaCards (PMID:27899610), provides qualitatively and quantitatively rich entities and relationships for the Suite’s NGS tools. VarElect (PMID:27357693), the phenotype interpreter, accepts lists of genes and phenotypes as input and computes prioritized direct (keyword-based) and indirect (inferred from gene-to-gene associations) gene/disease connections. TGex, our end-to-end NGS solution, is a VCF-to-report clinical analyzer that incorporates VarElect’s algorithms. WGS contributes three classes of functional genomic elements to variant analyses: promoters, enhancers, and ncRNAs, all central to tissue-related gene expression and implicated in many diseases. Together they amount to >20% of the new DNA territories. We’ve augmented GeneHancer (PMID:28605766), a novel regulatory element database with ~250,000 enhancers and promoters. Information is amalgamated from ENCODE, Ensembl, FANTOM5, VISTA, dbSUPER and, newly, UCNEbase and EPDnew. In parallel, ~100,000 unified ncRNAs are consolidated from 21 general and specialized databases (PMID:23172862 and work in progress). The Suite’s WGS disease interpretation platform provides a comprehensive route to the clinical significance of coding and non-coding single nucleotide and structural genomic variations, often elucidating unsolved cases.
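
GeneHancer's actual integration procedure (its scoring and gene-association steps) is not described in this abstract. The sketch below only illustrates the basic idea of amalgamating regulatory elements from several sources, under the assumption of simple overlap-based merging of genomic intervals: overlapping elements on the same chromosome collapse into one unified element that remembers every supporting source. The `Element` record and function name are hypothetical.

```python
from typing import NamedTuple

class Element(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive (half-open interval)
    source: str  # e.g. "ENCODE", "FANTOM5"

def merge_elements(elements):
    """Merge overlapping elements per chromosome and record every supporting source."""
    merged = []  # list of (chrom, start, end, set_of_sources)
    for el in sorted(elements, key=lambda e: (e.chrom, e.start, e.end)):
        if merged and merged[-1][0] == el.chrom and el.start < merged[-1][2]:
            # Overlaps the current unified element: extend it and add the source.
            chrom, start, end, sources = merged[-1]
            merged[-1] = (chrom, start, max(end, el.end), sources | {el.source})
        else:
            merged.append((el.chrom, el.start, el.end, {el.source}))
    return merged
```

Counting how many independent sources support each merged element is one plausible basis for a confidence score, though that is an assumption rather than GeneHancer's documented scheme.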

A-272: PROMIS-Med: Precise and Reproducible OMICS-Data Management and Integrative System for Precision Medicine
COSI: TransMed
  • Zeeshan Ahmed, University of Connecticut Health Center, United States
  • Bruce T. Liang, University of Connecticut Health Center, United States

Short Abstract: To improve the quality and transition of healthcare, robust big data management platforms are necessary to analyze heterogeneous genomics and healthcare data of high volume, velocity, variety and veracity. Healthcare data include information about patient lifestyle, medical history, visits to the practice, wet lab and imaging tests, diagnoses, medications, surgical procedures, consulted providers and genomic profile. Adequate analytic access to healthcare and genomics data has the potential to revolutionize medicine by improving our understanding of biological mechanisms and by modelling complex biological interactions through integrating and analyzing knowledge in a holistic manner. To effectively meet the goals of implementing a system for precision medicine, significant efforts are required from experts in various disciplines, located within one or multiple organizational units. One of the major challenges is to establish an efficient and secure workflow that connects all units to streamline transparent data flow, quality inspection, processing, analysis and sharing. Here we present PROMIS-Med, a new, user-friendly, HIPAA-compliant precision medicine platform for managing, analyzing and visualizing complex, large-scale healthcare and genomics data. PROMIS-Med currently manages healthcare data of over 800,000 patients and supports integrative processing and analysis of various kinds of genomics data.

A-274: Association of Cell-Free DNA Tumor Fraction and Somatic Copy Number Alterations With Survival in Metastatic Triple-Negative Breast Cancer
COSI: TransMed
  • Daniel Stover, Ohio State University Comprehensive Cancer Center, United States
  • Heather Parsons, Harvard University, United States
  • Samuel Freeman, Harvard University, United States
  • William Barry, Harvard University, United States
  • Hao Guo, Harvard University, United States
  • Atish Choudhury, Harvard University, United States
  • Greg Gydush, Harvard University, United States
  • Sarah Reed, Harvard University, United States
  • Justin Rhoades, Harvard University, United States
  • Denisse Rotem, Harvard University, United States
  • Melissa Hughes, Harvard University, United States
  • Ann Partridge, Harvard University, United States
  • Nikhil Wagle, Harvard University, United States
  • Ian Krop, Harvard University, United States
  • Gad Getz, Harvard University, United States
  • Todd Golub, Harvard University, United States
  • J. Christopher Love, Massachusetts Institute of Technology, United States
  • Eric Winer, Harvard University, United States
  • Nancy Lin, Harvard University, United States
  • Gavin Ha, Harvard University, United States
  • Viktor Adalsteinsson, Harvard University, United States

Short Abstract: Cell-free DNA (cfDNA) offers the potential for minimally invasive genome-wide profiling of tumor alterations without tumor biopsy and may be associated with patient prognosis. Triple-negative breast cancer (TNBC) is characterized by few mutations but extensive somatic copy number alterations (SCNAs), yet little is known regarding SCNAs in metastatic TNBC. We sought to evaluate SCNAs in metastatic TNBC exclusively via cfDNA and to determine whether cfDNA tumor fraction is associated with overall survival in metastatic TNBC. We identified 164 patients with biopsy-proven metastatic TNBC and performed low-coverage genome-wide sequencing of cfDNA from plasma. Without prior knowledge of tumor mutations, we determined the tumor fraction of cfDNA for 96.3% of patients and SCNAs for 63.9% of patients. Copy number profiles and percent genome altered were remarkably similar between metastatic and primary TNBCs. Certain SCNAs were more frequent in metastatic TNBCs relative to paired primary tumors and to primary TNBCs in the publicly available data sets The Cancer Genome Atlas and METABRIC. A prespecified cfDNA tumor fraction threshold of ≥ 10% was associated with significantly worse metastatic survival (median, 6.4 vs. 15.9 months) and remained significant independent of clinicopathologic factors.
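
To make the survival comparison concrete, here is a hedged sketch of stratifying patients at the prespecified 10% tumor-fraction threshold and computing the Kaplan-Meier median survival of each group. It uses plain Python and toy inputs; the published analysis also adjusted for clinicopathologic factors, which this sketch does not attempt, and the function names are illustrative.

```python
def km_median(times, events):
    """Kaplan-Meier median survival time.

    events[i] is 1 if the death was observed and 0 if the patient was censored.
    Returns the smallest time at which S(t) <= 0.5, or None if it is never reached.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every patient whose death or censoring occurs at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            if surv <= 0.5 + 1e-12:  # small tolerance for floating-point products
                return t
        at_risk -= removed
    return None

def median_survival_by_tumor_fraction(patients, threshold=0.10):
    """patients: iterable of (tumor_fraction, months, event). Returns (high, low) medians."""
    high = [(m, e) for tf, m, e in patients if tf >= threshold]
    low = [(m, e) for tf, m, e in patients if tf < threshold]
    return km_median(*zip(*high)), km_median(*zip(*low))
```

A formal comparison between the two groups would add a log-rank test or a Cox model; the sketch only reports the medians.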

A-276: Variant annotation pipeline for the identification of splice site SNVs
COSI: TransMed
  • Susanne K Kirchen, Luxembourg Centre for Systems Biomedicine, Luxembourg
  • Dheeraj R Bobbili, Luxembourg Centre for Systems Biomedicine, Luxembourg
  • Patrick May, Luxembourg Centre for Systems Biomedicine, Luxembourg

Short Abstract: The genome of an individual person contains up to 3.5 million variants, which can potentially contribute to disease. Very often, little attention is paid to SNVs within splicing regions, which might alter splicing patterns and thus produce aberrant proteins. Although a variety of publications have shown that splice site SNVs are associated with human disease, no efficient tool has yet been developed to identify splice site disruptors in large variant datasets; existing splice-site prediction tools are mainly designed to process small numbers of sites. Hence, we aim to implement an effective and accurate annotation pipeline that identifies deleterious synonymous splice site variants in very large datasets. We integrate the MaxEntScan tool into our pipeline, as well as dbscSNV_ADA and dbscSNV_RF scores, to predict the deleteriousness of SNVs within splicing regions. The pipeline is trained using variant sets from the HGMD, ClinVar and gnomAD databases. Furthermore, we applied machine learning using a combination of MaxEntScan and other genomic scores, such as CADD13, DANN, FATHMM, DPSI and Eigen. Finally, we applied our pipeline to a large dataset from Parkinson’s disease cohort studies to identify novel links between splice-site mutations and disease.
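
The abstract combines MaxEntScan and dbscSNV scores but does not give its decision rule. The filter below is a hypothetical illustration only: a variant is flagged when either dbscSNV score reaches a cutoff, or when the MaxEntScan score of the affected site drops by a given fraction. Both thresholds, the `SpliceVariant` record, and the function name are assumptions, and the trained machine-learning step is not reproduced.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpliceVariant:
    variant_id: str
    mes_ref: float               # MaxEntScan score of the reference splice site
    mes_alt: float               # MaxEntScan score with the alternate allele
    ada: Optional[float] = None  # dbscSNV_ADA score, if available
    rf: Optional[float] = None   # dbscSNV_RF score, if available

def is_candidate_disruptor(v, score_cutoff=0.6, min_relative_mes_drop=0.15):
    """Flag a splice-region SNV if a dbscSNV score or the MaxEntScan drop is high.

    Both cutoffs are illustrative defaults, not values taken from the pipeline.
    """
    if any(s is not None and s >= score_cutoff for s in (v.ada, v.rf)):
        return True
    if v.mes_ref > 0:
        # Relative loss of splice-site strength caused by the alternate allele.
        drop = (v.mes_ref - v.mes_alt) / v.mes_ref
        return drop >= min_relative_mes_drop
    return False
```

Applied to millions of variants, a rule like this is a single linear pass, which is the scalability point the abstract makes about existing per-site tools.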

A-278: Pan-cancer identification of RNAs with biomarker potential in plasma
COSI: TransMed
  • Celine Everaert, Ghent University, Belgium
  • Kimberly Verniers, Ghent University, Belgium
  • Nurten Yigit, Ghent University, Belgium
  • Glenn Vergauwen, Laboratory of Experimental Cancer Research, Cancer Research Institute Ghent, Ghent University, Ghent, Belgium
  • An Hendrix, Laboratory of Experimental Cancer Research, Cancer Research Institute Ghent, Ghent University, Ghent, Belgium
  • Olivier Thas, Ghent University, Belgium
  • Hannelore Denys, Ghent University, Belgium
  • Philippe Tummers, Ghent University, Belgium
  • Jo Vandesompele, Ghent University, Belgium
  • Pieter Mestdagh, Ghent University, Belgium

Short Abstract: In cancer, liquid biopsies are an extremely attractive resource for the discovery of novel biomarkers. Numerous studies have demonstrated the presence of RNA species in liquid biopsies. The expression of RNA molecules can be highly cell-type specific, suggesting they may serve as circulating biomarkers. To identify cancer-type-specific RNAs, we reprocessed small RNA and polyA+ RNA-sequencing data from the TCGA and TARGET databases, resulting in the annotation and quantification of miRNAs, isomiRs, sn(o)RNAs, tRNAs, mRNAs and lncRNAs in almost 12,000 patient samples covering 40 cancer types. A novel expression specificity score was calculated, identifying more than 6,000 genes with a cancer-type-specific expression pattern. Whereas cancer-type-specific lncRNAs and mRNAs were identified in almost all cancer types, cancer-type-specific miRNAs were only found in a subset. To evaluate the potential of these cancer-type-specific RNAs as non-invasive biomarkers, we collected plasma samples from metastatic patients representing 34 cancer types. We prepared RNA from plasma and extracellular vesicles and analysed it using small RNA sequencing to quantify circulating miRNA expression levels. Quantification of mRNAs and lncRNAs in plasma-derived RNA is currently ongoing. Our preliminary findings indicate that a subset of cancer-type-specific RNAs is detectable in plasma and may be used as biomarkers for diagnosis or disease monitoring.
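
The abstract does not define its novel expression specificity score. As a stand-in only, the sketch below computes the widely used tau index, which is 0 when a gene is expressed evenly across cancer types and 1 when it is expressed in a single type; it should not be read as the authors' score.

```python
def tau_specificity(expression):
    """Tau specificity index for one gene.

    expression: non-negative mean expression of the gene in each cancer type
    (tau is often computed on log-transformed values; that choice is left to the caller).
    Returns 0.0 for perfectly uniform expression and 1.0 for expression in one type only.
    """
    n = len(expression)
    peak = max(expression)
    if n < 2 or peak == 0:
        return 0.0
    return sum(1 - x / peak for x in expression) / (n - 1)
```

For example, a gene with mean expression 0, 5 and 10 across three cancer types has tau = (1 + 0.5 + 0) / 2 = 0.75.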

A-280: Population-level distribution and putative immunogenicity of cancer neoepitopes
COSI: TransMed
  • Mary Wood, Oregon Health and Science University, United States
  • Mayur Paralkar, Carnegie Mellon University, United States
  • Mihir Paralkar, Carnegie Mellon University, United States
  • Austin Nguyen, Oregon State University, United States
  • Adam Struck, Oregon Health and Science University, United States
  • Kyle Ellrott, Oregon Health and Science University, United States
  • Abhinav Nellore, Oregon Health and Science University, United States
  • Reid Thompson, Oregon Health and Science University, United States

Short Abstract: Tumor neoantigens are drivers of cancer immunotherapy response; however, neoantigen prediction tools produce many candidates that require further prioritization for research/clinical applications. We investigated four peptide novelty metrics that help refine predicted neoantigenicity: paired MHC binding affinity difference, paired peptide sequence similarity, homologous peptide sequence similarity, and microbial peptide sequence similarity. We applied these metrics to neoepitopes predicted from somatic mutations in The Cancer Genome Atlas (TCGA), as well as to a group of peptides with neoepitope-specific immune response data. Only 19.9% of predicted neoepitopes across TCGA displayed novel MHC binding based on our criteria. Peptide sequence similarity was high between paired tumor-normal epitopes, but some neoepitopes were more similar to other human peptides, or to bacterial or viral peptides, than their paired normal counterparts. Applied to peptides with neoepitope-specific immune response data, a linear model incorporating a neoepitope’s binding affinity, paired MHC binding affinity difference, and sequence similarity to its closest viral peptide was able to predict immunogenicity with an AUROC of 0.66. These novelty criteria emphasize biologically meaningful neoepitopes, demonstrating that neoepitopes should be considered within the context of putatively co-occurring peptides, with potential implications for the development of personalized vaccines for cancer treatment.
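
The AUROC of 0.66 summarizes how well the linear model's score ranks immunogenic peptides above non-immunogenic ones. Since the model's coefficients are not given, the sketch below only shows how an AUROC can be computed from any such score, using the Mann-Whitney formulation: the probability that a randomly chosen positive outscores a randomly chosen negative, counting ties as one half.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for immunogenic peptides, 0 otherwise. Tied scores count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is fine for a few hundred peptides; a rank-based version scales better for larger sets.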

A-282: Predicting Substance Abuse Status from Lab Tests and Vitals from the Mount Sinai EHR with Machine Learning
COSI: TransMed
  • Randall Ellis, Icahn School of Medicine at Mount Sinai, United States
  • Zichen Wang, Icahn School of Medicine at Mount Sinai, United States
  • Avi Ma'Ayan, Icahn School of Medicine at Mount Sinai, United States

Short Abstract: The opioid crisis in the United States currently accounts for ~100 deaths per day due to overdose. The effectiveness of opioids in reducing pain, together with the drug-seeking behavior of opioid addicts, leads physicians in the United States to write over 200 million opioid prescriptions every year. To better understand the profile of opioid-seeking patients, advanced computational methods that mine electronic health records (EHR) can be employed. EHR systems contain information on medical procedures, lab tests, vital signs, prescriptions, and other data for millions of patients. For this project, we trained a machine learning model to classify patients by their likelihood of substance dependence using EHR from ~3,200 patients diagnosed with substance dependence (ICD-9 code 304.*), along with an equal number of control patients with no history of substance dependence (no ICD-9 code 304.* or 305.*), matched for age, race, and gender. The model achieves a prediction accuracy of ~86%, and analysis of the model uncovers associations between basic clinical factors and substance dependence. The predictive model may help identify opioid-seeking patients who report other symptoms in the emergency room (ER), so that those patients can be treated more appropriately.
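
The control group was matched on age, race, and gender, but the matching procedure is not specified. Below is a minimal sketch of one possible approach, assumed for illustration: exact matching on race and gender within 5-year age bins (the bin width is an assumption), drawing each control at most once. The record layout and function name are hypothetical.

```python
from collections import defaultdict

def match_controls(cases, controls, age_bin=5):
    """1:1 match each case to an unused control with the same race, gender and age bin.

    cases / controls: lists of dicts with "id", "age", "race" and "gender" keys.
    Returns a list of (case_id, control_id) pairs; cases with no available match are skipped.
    """
    pool = defaultdict(list)
    for c in controls:
        pool[(c["race"], c["gender"], c["age"] // age_bin)].append(c["id"])
    pairs = []
    for case in cases:
        key = (case["race"], case["gender"], case["age"] // age_bin)
        if pool[key]:
            # pop() removes the control so it cannot be matched twice.
            pairs.append((case["id"], pool[key].pop(0)))
    return pairs
```

Fixed bins can separate ages that are close but straddle a bin edge; nearest-age matching within a tolerance is a common alternative.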

A-284: TCGA Lung Cancer Analysis Pipeline
COSI: TransMed
  • Talip Zengin, Mugla University, Turkey
  • Tugba Onal-Suzek, Mugla Sitki Kocman University, Turkey

Short Abstract: Cancer cells contain thousands of mutated genes, copy number differences and differentially expressed genes. The progression of cancer differs from patient to patient. Identifying the key proteins and pathways in an individual patient’s molecular profile has become important for personalized medicine. In the first step of our proposed pipeline, gene mutations, gene expression profiles, copy number variations and clinical data of lung adenocarcinoma (LUAD) patients are downloaded from TCGA. Significant genomic variations are determined using the R packages MADGIC and GAIA. Using the R package DESeq2, the most active differentially expressed genes are determined for the patients (n = 55) for whom adjacent normal tissue RNA-seq expression levels are available. The most active pathways are determined by the Cytoscape jActiveModules app based on expression levels. For significant genomic variations and gene expression levels, MDS plots and Kaplan-Meier survival analyses of the patients are produced. The goals of our project are to 1) computationally identify the most significant genes whose mutation and expression profiles correlate with patient survival time, 2) verify the significance of our results against those of a recent study conducted on the TCGA LUAD dataset (Deng Z. min et al, 2017), and 3) provide an open-source automated pipeline.
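
DESeq2 normalizes for sequencing depth with median-of-ratios size factors before fitting its negative binomial models. The pipeline calls DESeq2 from R; the Python sketch below reproduces only that normalization step, for illustration, and is not a substitute for the package.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of genes, each a list of raw counts across the same samples.
    Genes with a zero count in any sample are excluded, since their geometric mean is zero.
    """
    usable = [g for g in counts if all(c > 0 for c in g)]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        ratios = []
        for g in usable:
            # Geometric mean of the gene across samples acts as a pseudo-reference sample.
            geo_mean = math.exp(sum(math.log(c) for c in g) / len(g))
            ratios.append(g[j] / geo_mean)
        factors.append(median(ratios))
    return factors
```

Dividing each sample's counts by its size factor puts samples on a comparable scale; a sample sequenced twice as deeply gets a factor twice as large.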

A-286: A side effect prediction method that uses drug target expression information and unsupervised representation learning
COSI: TransMed
  • Magdalena Zwierzyna, Benevolent AI/University College London, United Kingdom
  • Mark Davies, Benevolent AI, United Kingdom
  • Chris Finan, University College London, United Kingdom
  • Aaron Sim, Benevolent AI, United Kingdom
  • Aroon Hingorani, University College London, United Kingdom

Short Abstract: We present a machine learning framework that aims to predict drug side effects arising from a variety of biological mechanisms, including the interactions of the drug molecule with targets expressed in unintended tissues. The approach integrates data from several public resources into a large knowledge graph summarising molecular and phenotypic information on 1,035 approved drugs. It incorporates information on tissue expression of drug targets as well as links between tissues and side effects based on normalised literature co-occurrence. The graph is used as input for node2vec, a neural network embedding method that automatically derives feature vectors for each node based on the topology of its neighbourhood. The learnt feature vectors are then used to train a classifier predicting the existence of a relationship for each drug-side effect pair. The integrated model can predict infrequent side effects with an AUC of 0.95 and is validated on two external datasets, including adverse events detected in clinical trials of not-yet-approved drugs. Our approach demonstrates the utility of automated unsupervised network representation learning. In addition, considering tissue information enhances prediction of drug side effects and may lead to new strategies for identifying novel drug repurposing opportunities.
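
node2vec derives node features by running biased second-order random walks over the graph and then training a skip-gram model on the resulting node sequences. The sketch below shows only the walk step, for an unweighted, undirected graph given as an adjacency dictionary; the embedding and classifier stages are not shown, and the function name is illustrative.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One node2vec-style second-order random walk on an undirected graph.

    adj maps each node to the set of its neighbours. p controls the chance of
    stepping straight back; q trades off inward (BFS-like) vs outward (DFS-like) moves.
    """
    if rng is None:
        rng = random.Random(0)
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(adj[current])
        if not neighbours:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == previous:
                weights.append(1.0 / p)   # return to the previous node
            elif nxt in adj[previous]:
                weights.append(1.0)       # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)   # move further away from the previous node
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk
```

On a toy graph linking a drug to a target, tissues, and side effects (hypothetical node names such as "drugA", "TGT1", "liver", "hepatotoxicity"), choosing q < 1 favours walks that travel outward from the drug toward tissue-linked side effects.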

A-288: Comprehensive molecular comparison of murine and human hepatocellular carcinoma
COSI: TransMed
  • Michelle Dow, University of California San Diego, United States
  • Rachel Marty, University of California San Diego, United States
  • Brian Tsui, University of California San Diego, United States
  • Ludmil Alexandrov, University of California San Diego, United States
  • Hayato Nakagawa, University of California San Diego, United States
  • Koji Taniguchi, University of California San Diego, United States
  • Ekihiro Seki, University of California San Diego, United States
  • Michael Karin, University of California San Diego, United States
  • Joan Font-Burgada, Fox Chase Cancer Center, United States
  • Hannah Carter, University of California San Diego, United States

Short Abstract: Hepatocellular carcinoma (HCC) has a poor prognosis and is the second leading cause of cancer mortality worldwide. The heterogeneous nature of HCC has resulted in the development of multiple murine models to investigate the underlying tumor biology and potential therapies. However, it is unclear to what extent different mouse models of HCC recapitulate human disease at the molecular level. We compared the genomic and transcriptomic profiles of 56 tumors from 4 mouse models to those of 987 HCC patients with diverse etiologies. Analyzing somatic single nucleotide variants from mouse tumors, we identified known mutational signatures and one novel mouse-specific signature. Among diverse non-synonymous mutations affecting established oncogenes and tumor suppressors, we observed orthologous amino acid changes in CTNNB1, a known HCC driver, in mice exposed to streptozotocin (STAM), and near-universal (~90%) BRAF V600E in mice exposed to N-nitrosodiethylamine (DEN). Transcriptomic analysis revealed high correlations between STAM samples and human tumors characterized by high proliferation, high tumor grade, and poor prognosis. We also found evidence that the mouse immune system shapes the somatic mutational landscape of murine tumors. Overall, we identified two mouse models that demonstrated molecular characteristics similar to human tumors and may therefore provide more representative models for studying HCC oncogenesis.

A-290: Natural Product Target Network to Expand Treatment Options in Cancer
COSI: TransMed
  • Steven R Chamberlin, Oregon Health and Science University, United States
  • Aurora Blucher, Oregon Health and Science University, United States
  • Gabrielle Choonoo, Oregon Health and Science University, United States
  • Molly Kulesz-Martin, Oregon Health and Science University, United States
  • Shannon McWeeney, Oregon Health and Science University, United States

Short Abstract: Here we assess the coverage and characteristics of a natural product (NP) target network to explore the potential of NPs in the cancer therapeutic space. NPs, or compounds from living sources, may help address the challenge of cancer drug resistance and may also be synergistic with some cancer drugs. In recent work, our group developed an evidence-based framework for approved cancer drug-target interactions, which we termed the Cancer Targetome (Blucher et al., 2017). Using publicly available databases, we developed a similar framework for the known biological targets of compounds isolated from plant-sourced NPs. Critical cancer pathways were then identified from the Reactome knowledgebase using over-representation analysis with a set of pan-cancer driver genes. Both cancer drug and NP targets were mapped to protein targets within these pathways to assess independent and combined coverage. Considering target interactions with the strongest binding values, adding NPs increased coverage of targets in these pathways by 60%. In addition, mapping these two classes of targets into a biological network revealed statistically similar characteristics. This work indicates that natural products may substantially increase the therapeutic target space when considered jointly with cancer drugs and may assist in identifying novel therapeutic combination strategies.
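
Coverage here can be read as the fraction of pathway proteins hit by at least one compound. The sketch below computes coverage for cancer drugs alone and for drugs plus NPs, and the relative increase between them. Whether the reported 60% is a relative or an absolute increase is not stated, so the relative form is an assumption; the gene sets used to exercise it are toy inputs, not the study's data.

```python
def pathway_coverage(pathway_targets, *target_sets):
    """Fraction of pathway proteins hit by the union of the given compound target sets."""
    pathway = set(pathway_targets)
    hit = set().union(*target_sets) & pathway
    return len(hit) / len(pathway)

def relative_increase(base, combined):
    """Relative gain in coverage, e.g. 0.5 -> 0.7 is a 40% increase."""
    return (combined - base) / base
```

For instance, if drugs cover 5 of 10 pathway proteins and NPs add 2 new ones, coverage rises from 0.5 to 0.7, a 40% relative increase.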

A-292: Gradient Boosting Classifier Accurately Predicts Antibody Resistance of Clinical HIV-1 Isolates
COSI: TransMed
  • Reda Rawi, National Institute of Allergy and Infectious Diseases, National Institutes of Health, United States
  • Chen-Hsiang Shen, National Institute of Allergy and Infectious Diseases, National Institutes of Health, United States
  • Raghvendra Mall, Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
  • S. Katie Farney, National Institute of Allergy and Infectious Diseases, National Institutes of Health, United States
  • Jing Zhou, National Institutes of Health, United States
  • Andrea Shiakolas, National Institute of Allergy and Infectious Diseases, National Institutes of Health, United States
  • Tae-Wook Chun, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, United States
  • Rebecca Lynch, George Washington University, United States
  • John Mascola, National Institutes of Health, United States
  • Peter Kwong, National Institutes of Health, United States
  • Gwo-Yu Chuang, National Institutes of Health, United States

Short Abstract: Broadly neutralizing antibodies (bNAbs) targeting the HIV-1 envelope glycoprotein (Env) have promising utility in the prevention and treatment of HIV-1 infection, with several undergoing clinical trials. Due to the high diversity and mutation rate of HIV-1, viruses may be resistant to a particular bNAb, or administration of a bNAb may lead to viral escape. Until now, resistant viral strains have been identified by a three-step procedure: sequencing of Env, generation of pseudoviruses, and in vitro neutralization assays, of which the latter two steps are expensive and time-consuming. Here we developed sequence-based machine learning (ML) classifiers that predict resistance of HIV-1 to 32 bNAbs with an average accuracy of 87%. Using gradient boosting machine, a tree-based ML method, we were able to interpret the learnt biological features that distinguish resistance from sensitivity for all 32 antibodies. Notably, we correctly identified 100% of the resistant and 88% of the sensitive strains from patients enrolled in the VRC601 phase 1 clinical trial. Further, we predicted resistance to bNAbs VRC01, 3BNC117, 10-1074, and PGT121 with a mean accuracy of 91% for 262 clinical viral strains. The availability of in silico antibody resistance predictors will facilitate informed decisions on antibody usage in clinical settings.
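
The abstract describes the classifiers as sequence-based but does not specify the features. One plausible encoding, assumed here for illustration, is to one-hot encode each position of an Env alignment; this also makes learnt feature importances traceable to specific residues at specific positions. A gradient boosting model (for example scikit-learn's `GradientBoostingClassifier`) could then be trained on these vectors; that step is not shown, and the helper names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 standard residues plus a gap symbol

def one_hot_alignment(sequence):
    """Flatten an aligned Env sequence into a binary vector, one 21-wide block per position.

    Residues outside AMINO_ACIDS (e.g. "X") produce an all-zero block.
    """
    features = []
    for residue in sequence.upper():
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features

def feature_name(index):
    """Map a feature index back to its alignment position (1-based) and residue."""
    position, letter = divmod(index, len(AMINO_ACIDS))
    return f"pos{position + 1}_{AMINO_ACIDS[letter]}"
```

With this layout, a high importance for feature "pos2_C" would point to a cysteine at alignment position 2, which is how tree-based importances can be read back as biological features.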

A-294: Cancer eQTL profiles can be recovered from bulk tumor gene expression data by modeling tumor purity
COSI: TransMed
  • Paul Geeleher, University of Chicago, United States
  • Cathal Seoighe, National University of Ireland, Galway, Ireland
  • R. Stephanie Huang, University of Minnesota, United States

Short Abstract: Expression quantitative trait loci (eQTLs) have been mapped in most tumor types. These studies measured gene expression in tumors and identified associations between these gene expression levels and common inherited genetic variants (e.g. SNPs) profiled in the same patients. These results have been widely applied. For example, the majority of inherited cancer risk variants implicated by GWAS lie in non-coding, likely regulatory regions of the genome, so eQTLs identified from tumors are typically interrogated to identify the genes regulated by these variants, facilitating rational functional follow-up studies. However, bulk tumor gene expression data reflect both cancer cells and tumor-infiltrating normal cells; thus, tumor eQTLs could arise from cancer cells, normal cells, or both. We have developed an approach that, by modeling tumor purity, can identify high-confidence cancer eQTLs from bulk tumor gene expression data, which represent a mixture of cell types. We investigated the eQTL profiles of cancer risk variants identified by breast cancer GWAS. Only about one-third of the breast cancer risk variants identified as eQTLs in an uncorrected analysis of bulk tumor expression could be confidently attributed to cancer cells, with the remaining variants showing evidence of an effect in cells of the tumor microenvironment. Our approach will be critical for understanding how inherited polymorphisms influence cancer risk, development, and treatment.
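
The abstract does not give the model. One simple way to let an eQTL effect depend on tumor purity, assumed here purely for illustration, is a regression with a genotype-by-purity interaction: the genotype effect is beta_g + beta_gp * purity, so evaluating it at purity 1 approximates the cancer-cell effect and at purity 0 the effect in non-cancer cells. The sketch fits this by ordinary least squares in plain Python; the function names are hypothetical and this is not claimed to be the authors' exact formulation.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations and Gaussian elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= factor * A[col][j]
            c[r] -= factor * c[col]
    # Back-substitution on the upper-triangular system.
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        beta[i] = (c[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta

def purity_interaction_eqtl(genotype, purity, expression):
    """Fit expression ~ 1 + purity + genotype + genotype:purity.

    genotype: allele dosages (0, 1, 2); purity: tumor purity in [0, 1].
    Returns the genotype effect extrapolated to purity 0 and to purity 1.
    """
    X = [[1.0, p, g, g * p] for g, p in zip(genotype, purity)]
    _, _, beta_g, beta_gp = ols(X, expression)
    return {"normal_effect": beta_g, "cancer_effect": beta_g + beta_gp}
```

Real data would need an estimate of purity per sample and a test of significance for each coefficient; the sketch returns point estimates only.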

A-296: Utilizing Temporal Genomic Data For Transcriptomic Profiling
COSI: TransMed
  • Guenter Tusch, Grand Valley State University, United States
  • Shahrzad Eslamian, Grand Valley State University, United States

Short Abstract: Genomic data are increasingly becoming part of clinical practice. Novel tools and methods are required to transform information from increasingly voluminous genomic databases into actionable data for health care. In the era of precision medicine, the development of high-throughput technologies and electronic health records has resulted in a paradigm shift in healthcare. However, handling temporal data remains a challenge. Temporal models have been proposed for electronic health records, but not for genomic data. Temporal genomic data frequently come from stimulus-response studies. A typical query on such data searches for temporal effects or time patterns in gene sets. One frequently employed model for temporal data in healthcare is temporal abstraction, which converts expression values into an interval-based qualitative representation expressing the amount of change over time. The challenge is to find domain-specific mappings to create those representations. We explore the feasibility of modeling change by statistical significance. We propose to use empirical Bayes methods for DNA microarray data to determine differences between consecutive time points, and to compare across platforms by comparing p-values. For count data, we use voom transformations, allowing RNA-seq data to be analyzed in the same way. We demonstrate this approach in the framework of our SPOT software.
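
A minimal sketch of modeling change by statistical significance, assuming the p-values and log fold changes for each pair of consecutive time points have already been obtained (for example from limma's empirical Bayes moderation, with voom for RNA-seq counts). Each transition becomes increasing, decreasing, or steady, and adjacent transitions in the same state are merged into intervals. The significance threshold, state names, and function name are assumptions, not SPOT's actual mapping.

```python
def abstract_trend(time_points, log_fold_changes, p_values, alpha=0.05):
    """Convert consecutive-time-point comparisons into qualitative intervals.

    log_fold_changes[i] and p_values[i] compare time_points[i] with time_points[i + 1].
    Returns merged (start, end, state) intervals, state in {"increasing", "decreasing", "steady"}.
    """
    intervals = []
    for i, (lfc, p) in enumerate(zip(log_fold_changes, p_values)):
        if p < alpha:
            state = "increasing" if lfc > 0 else "decreasing"
        else:
            state = "steady"  # change not statistically significant
        start, end = time_points[i], time_points[i + 1]
        if intervals and intervals[-1][2] == state:
            # Extend the previous interval instead of starting a new one.
            intervals[-1] = (intervals[-1][0], end, state)
        else:
            intervals.append((start, end, state))
    return intervals
```

A query such as "genes that rise and then fall" then becomes a pattern match over these interval states rather than over raw expression values.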

A-298: Activity landscapes of cancer cell lines predict drug response
COSI: TransMed
  • Martin Frejno, Technical University of Munich, Germany
  • Benjamin Ruprecht, Merck & Co., United States
  • Chen Meng, Technical University of Munich, Germany
  • Alexander Hogrebe, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
  • Jana Zecha, Technical University of Munich, Germany
  • Dominic Helm, Proteomics Core Facility, EMBL, Heidelberg, Germany
  • Thomas Oellerich, Goethe University, Germany
  • Sebastian Scheich, Goethe University, Germany
  • Hans-Michael Kvasnicka, Goethe University, Germany
  • Enken Drecoll, Technical University of Munich, Germany
  • Wilko Weichert, Technical University of Munich, Germany
  • Bernhard Kuster, Technical University of Munich, Germany

Short Abstract: In recent years, proteomic profiling of cancer cell lines combined with quantification of their response to drugs has proven useful for the identification of protein biomarkers of drug sensitivity and resistance. However, given that phosphorylation-based signaling is known to play a major role in determining drug response, phosphoproteomic profiling can provide a different angle on predicting the drug sensitivity of cancer cell lines by focusing on their activity landscapes. Here, we profiled the proteomes and phosphoproteomes of 125 cancer cell lines using label-free mass spectrometry to a depth of >10,000 proteins and >55,000 phosphorylation sites (p-sites). We applied a wide range of computational approaches, including elastic net and concordance analysis, to integrate these data with publicly available drug sensitivity measurements, identify proteomic and phosphoproteomic markers of drug response, and suggest novel kinase-substrate relationships. The results not only recapitulated known drug-gene/protein interactions but also suggested novel biomarkers predicting drug responses, which were subsequently validated in vitro, in vivo and at the patient level. These results suggest that, in combination with advanced computational methods, activity profiling of cell lines has important value in translational research.

A-300: Finding best sequential drug application based on continuous and discontinuous drug application of unspecific and specific substances in B-ALL cell lines
COSI: TransMed
  • Yvonne Saara Gladbach, Rostock University, Germany
  • Lisa-Madeleine Sklarz, Rostock University Medical Center, Germany
  • Sina Sender, Rostock University Medical Center, Germany
  • Hugo Murua Escobar, Rostock University Medical Center, Germany
  • Christian Junghanß, Rostock University Medical Center, Germany
  • Georg Fuellen, Rostock University Medical Center, Germany
  • Mohamed Hamed Fahmy, Rostock University Medical Center, Germany

Short Abstract: Background: RNA sequencing opens new opportunities in personalized medicine and offers unprecedented information about the human transcriptome, but harnessing this information with bioinformatics tools is typically a bottleneck. Little is known about the genetic associations among mRNAs, miRNAs and other noncoding RNAs in driving cancer pathogenesis, or about their potential suitability as drug targets. Results: Here we combined transcriptomic assays of mRNA responses at different time points under continuous or discontinuous drug pressure from three drug classes in leukemia-specific cell lines, in order to identify pathways underlying drug resistance and to understand the mechanisms of action of mono and combined therapeutic applications. An integrative approach was developed to analyze the mRNA sequencing data and link the differentially expressed genes to various biological ontologies and regulatory databases. Next, machine learning and network approaches will be applied to the drug transcriptomic profiles to build a sequential drug prediction model. Conclusion: Our analysis showed that discontinuous long drug application in both cell lines has an effect similar to that of continuous short drug application. On the other hand, continuous long drug application led to a marked difference in B-cell receptor signaling pathway rankings between the two BTK inhibitors.

A-302: Polygenic Scores for Metabolic Traits and Related Drug Prescriptions in the Michigan Genomics Initiative
COSI: TransMed
  • Samuel Handelman, University of Michigan, United States

Short Abstract: We tested for associations between polygenic scores derived from three sources (the Genetics of Obesity-related Liver Disease "GOLD" consortium, the UK Biobank "UKB", and the cytokine-level GWAS of Ahola-Olli et al. "AO") and obesity- or liver-related outcomes and corresponding laboratory measures in the Michigan Genomics Initiative, in which patient genotypes have been linked to the Michigan EMR. Numerous significant associations were found, especially between polygenic scores and laboratory measures. There is evidence of overfitting, especially in AO, but even in the well-powered UKB study there is no strong evidence for signals below the genome-wide significance cutoff. However, the likelihood ratio tests are not conclusive on either point. These association tests include adjustments for prescribed drugs, and we will describe technical challenges related to biased missingness of drugs and treatments as longitudinal covariates. Best practice in these studies is to interpret prescription records in a site-specific context, which poses a challenge for "one size fits all" approaches. Therefore, we propose to create and then curate instances of two classes of Knowledge Object payloads: one class of objects holding encodings of readily updateable polygenic scores, and another class holding encodings of site-specific data filters for incorporating longitudinal prescribing data.
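A polygenic score is commonly computed as a weighted sum of a person's effect-allele dosages, with GWAS effect sizes as the weights. The minimal sketch below shows that generic construction, followed by standardization within the cohort. The abstract does not say how the GOLD, UKB or AO scores were built (for example, which variants or thresholds were used). The variant IDs, weights and genotypes here are therefore hypothetical.

```python
from statistics import mean, pstdev


def polygenic_score(dosages: dict, weights: dict) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2) over variants present in both inputs."""
    shared = dosages.keys() & weights.keys()
    return sum(dosages[v] * weights[v] for v in shared)


def standardize(scores: list) -> list:
    """Convert raw scores to z-scores within a cohort so effects are per standard deviation."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical per-variant effect sizes (e.g. log odds ratios) from a GWAS.
weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
# Hypothetical genotypes for three patients; variants without a weight are ignored.
cohort = [
    {"rs0001": 2, "rs0002": 1, "rs0003": 0},
    {"rs0001": 0, "rs0002": 2, "rs0003": 1},
    {"rs0001": 1, "rs0002": 0, "rs0003": 2, "rs9999": 1},
]
raw = [polygenic_score(g, weights) for g in cohort]
z = standardize(raw)
```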

A-304: Pathway analysis of GWAS loci identifies novel drug targets and repurposing opportunities
COSI: TransMed
  • Deepali Jhamb, GlaxoSmithKline, United States
  • Michal Magid-Slav, GlaxoSmithKline, United States
  • Mark Hurle, GlaxoSmithKline, United States
  • Pankaj Agarwal, GlaxoSmithKline, United States

Short Abstract: Genome-wide association studies (GWAS) have made substantial progress in identifying susceptibility loci associated with complex traits, and there is emerging evidence that genetics-based targets lead to 28% more launched drugs. However, translating GWAS results into drug discovery remains challenging. We implemented a pathway-centric approach to analyzing GWAS loci and demonstrated that a global, large-scale analysis of 1,456 protein interaction pathways across nearly 1,600 GWAS (473 traits) identifies drug targets and translates genetic findings into therapeutic hypotheses for 182 diseases. We validated our genetic pathway-based targets by testing whether current drug targets for 97 diseases are enriched in the pathway space for the same indication. Remarkably, 30% of these diseases have significantly more targets in these pathways than expected by chance; the comparable number for GWAS alone (without pathway analysis) is zero. Overall, this study provides a large-scale pathway analysis of GWAS data and demonstrates how pathway analysis can help translate GWAS data into therapeutic hypotheses, yielding new drug discovery targets and repositioning opportunities for current drugs.
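The validation step above asks whether a disease's current drug targets fall into its GWAS-derived pathways more often than expected by chance. The abstract does not name the statistical test. A one-sided hypergeometric test is one common choice for this kind of over-representation question, and it is sketched below with hypothetical counts.

```python
from math import comb


def hypergeom_sf(n_universe: int, n_targets: int, n_pathway: int, n_overlap: int) -> float:
    """One-sided P(X >= n_overlap) for X ~ Hypergeometric.

    n_universe: genes that could be drawn (e.g. all genes in any pathway)
    n_targets:  known drug targets for the disease within that universe
    n_pathway:  genes in the disease's GWAS-implicated pathways
    n_overlap:  targets observed inside those pathways
    """
    upper = min(n_targets, n_pathway)
    tail = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_pathway - k)
        for k in range(n_overlap, upper + 1)
    )
    # Exact integer arithmetic until this final division.
    return tail / comb(n_universe, n_pathway)


# Hypothetical disease: 20,000 genes, 40 known targets, 500 pathway genes, 6 targets inside.
# About 1 target would be expected inside by chance (40 * 500 / 20,000).
p_value = hypergeom_sf(20_000, 40, 500, 6)
```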

A-306: Rule-based integration of heterogeneous multi-omics data to determine the prognostic markers for relapse in Acute Lymphoblastic Leukemia
COSI: TransMed
  • Aleksandra Gruca, EU COST Action CHARME CA15110; Silesian University of Technology, Poland
  • Roman Jaksik, Institute of Automated Control, Silesian University of Technology, Poland
  • Marek Sikora, Institute of Informatics, Silesian University of Technology, Poland; Institute of Innovative Technologies EMAG, Poland

Short Abstract: Public multi-omics repositories allow researchers to extract data for integration and analysis in order to discover the molecular bases of diseases and develop effective treatments. However, the diversity of biological systems, technological limits, the large number of biological variables and the relatively small number of biological samples make the integration and analysis of multi-omics datasets a challenging task. In this work we integrate data from the TARGET Acute Lymphoblastic Leukemia (ALL) Phase 2 project. In particular, we combine clinical data, gene expression, DNA methylation and copy number variation to find new markers correlated with poor clinical outcome and early bone marrow relapse. From the cohort of 792 patients, we identified 80 patients for whom all required multi-omics experimental data are available in the TARGET database. In this group, 36 patients suffered an ALL relapse. For the analysis we designed a rule-based framework. The main advantage of a rule-based approach is that the resulting rule set is not only useful as a classifier but also provides a natural means of understanding which features (and their combinations) influence the outcome. The presented rule-based workflow selects the most important features discriminating between relapse and relapse-free patients, providing prognostic multi-markers for relapse in ALL.
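The interpretability argument can be made concrete by reading and scoring a single IF-THEN rule. The sketch below scores a rule by coverage (the fraction of patients it fires on) and precision (the fraction of those patients whose relapse status it predicts correctly). It does not reproduce the authors' rule induction algorithm, which the abstract does not specify. The feature names (`gene_a_expr`, `cn_gain_b`) and patient records are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, object]


@dataclass
class Rule:
    description: str
    condition: Callable[[Patient], bool]
    predicts_relapse: bool = True


def evaluate_rule(rule: Rule, patients: List[Patient]) -> Tuple[float, float]:
    """Return (coverage, precision) of a rule on labelled patients."""
    covered = [p for p in patients if rule.condition(p)]
    if not covered:
        return 0.0, 0.0
    correct = sum(1 for p in covered if p["relapse"] == rule.predicts_relapse)
    return len(covered) / len(patients), correct / len(covered)


# Hypothetical rule combining an expression feature and a copy-number feature.
rule = Rule(
    "IF gene_a_expr > 5 AND cn_gain_b THEN relapse",
    lambda p: p["gene_a_expr"] > 5 and p["cn_gain_b"],
)
patients = [
    {"gene_a_expr": 6.0, "cn_gain_b": True, "relapse": True},
    {"gene_a_expr": 7.0, "cn_gain_b": True, "relapse": False},
    {"gene_a_expr": 2.0, "cn_gain_b": True, "relapse": True},
    {"gene_a_expr": 8.0, "cn_gain_b": False, "relapse": False},
    {"gene_a_expr": 9.0, "cn_gain_b": True, "relapse": True},
]
coverage, precision = evaluate_rule(rule, patients)
```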

A-308: Integrative analysis of pharmacogenomics in major cancer cell line databases using CellMinerCDB
COSI: TransMed
  • Vinodh Rajapakse, National Cancer Institute, United States
  • Augustin Luna, Dana-Farber Cancer Institute, United States
  • Fathi Elloumi, National Cancer Institute, United States
  • Sudhir Varma, National Cancer Institute, United States
  • Margot Sunshine, National Cancer Institute, United States
  • Chris Sander, Harvard University, United States
  • William Reinhold, National Cancer Institute, United States
  • Yves Pommier, National Cancer Institute, United States

Short Abstract: Cancer cell line panels are widely used for evaluating drug response across diverse tissue types. A growing set of molecular profiling data complements measurements of chemosensitivity, providing novel avenues for response determinant discovery and clinical translation. Accessing and inter-relating data from different sources is essential for evaluating such determinants, but remains challenging. To enable wider access to cell line pharmacogenomic data, we have developed CellMinerCDB (CellMiner Cross-Database, discover.nci.nih.gov/cellminercdb), a web application integrating data from several widely studied cancer cell line panels, including the NCI-60 (NIH), GDSC (Sanger/MGH), and CCLE/CTRP (Broad). Altogether, our database spans over 1,300 distinct cell lines, 400 clinically relevant cancer drugs, 20,000 experimental compounds, and molecular profiling data such as gene/protein expression, DNA copy number, methylation, and mutational status. Cell line and tested drug overlaps allow cross-database validation of genomic and drug data, and CellMinerCDB simplifies this by transparently matching differently named entities between sources. Data exploration is enhanced by annotations that restrict analysis to particular tissue types, and pathway- and process-based gene annotations allow biological interpretation of response-predictive features. Descriptions of data availability, retrieval, and web-application functionality will be presented, including informative examples of data integration and translational results.
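The abstract notes that CellMinerCDB matches differently named entities between sources, but it does not describe how. The sketch below is a deliberately naive illustration of the idea for cell line names: normalize case and punctuation, then index one source by the normalized key. The normalization rule is an assumption made for illustration. Real panels also need curated synonym tables, because naive normalization can both miss true matches and merge distinct lines.

```python
import re
from typing import Dict, Iterable, List


def normalize_cell_line(name: str) -> str:
    """Uppercase and strip non-alphanumeric characters, e.g. 'NCI-H460' -> 'NCIH460'."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


def match_cell_lines(source_a: Iterable[str], source_b: Iterable[str]) -> Dict[str, List[str]]:
    """Map each name in source_a to the names in source_b sharing its normalized key."""
    index: Dict[str, List[str]] = {}
    for name in source_b:
        index.setdefault(normalize_cell_line(name), []).append(name)
    return {name: index.get(normalize_cell_line(name), []) for name in source_a}


matches = match_cell_lines(["MCF-7", "NCI-H460", "A549"], ["MCF7", "NCIH460", "HCT 116"])
```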

A-310: Modeling of undefined off-target effects of drugs by computational systems biology methods and in vivo validation in hepatocellular carcinoma
COSI: TransMed
  • Nurcan Tuncbag, Middle East Technical University, Turkey
  • Esra Sinoplu, Middle East Technical University, Turkey
  • Rengul Atalay, METU, Turkey

Short Abstract: Hepatocellular carcinoma (HCC) is the fifth most common and the second deadliest cancer in the world. HCC is highly resistant to conventional chemotherapies, and targeted agents extend patients' lives by only a few months. In this context, it is important to identify drugs that have not yet been used in HCC treatment. Off-target effects of drugs on liver cancer-specific protein networks can be investigated using systems biology approaches. In this work, the pathways reported to be active in HCC were assembled into a network of 801 nodes and 3,896 edges. DrugBank small-molecule inhibitors with at least one target protein were integrated, and shortest cycles were extracted. Using in silico perturbation (attack) strategies on the drugs' target proteins and their interactions, drug effectiveness was calculated as changes in the efficiency of the signaling network, in the number of feedback cycles, and in network functionality, and drugs were ranked accordingly. The same attack strategy was applied to identify and rank drug combinations. Brigatinib, Regorafenib, Sunitinib, Thalidomide, Pranlukast, Lenvatinib, Chloroquine, Pseudoephedrine, and Amrinone were selected for in vivo validation at the transcription level of 770 cancer genes in chemosensitive Huh7 and chemoresistant Mahlavu cells.
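The ranking step above scores each drug by how much perturbing its targets degrades the HCC signaling network. The sketch below illustrates this with global efficiency: the mean inverse shortest-path length over ordered node pairs, where unreachable pairs contribute zero. It models a drug as removing its target nodes. The abstract does not give the exact efficiency definition or attack strategy, and the tiny network and drug-target sets are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, List[str]]  # every node must appear as a key (successor lists may be empty)


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Unweighted shortest-path lengths from source along directed edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(graph: Graph) -> float:
    """Average of 1/d(s, t) over ordered pairs s != t; unreachable pairs add 0."""
    n = len(graph)
    if n < 2:
        return 0.0
    total = 0.0
    for s in graph:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                total += 1.0 / d
    return total / (n * (n - 1))


def remove_nodes(graph: Graph, targets: Iterable[str]) -> Graph:
    """Simulate a node attack: delete target proteins and all of their edges."""
    drop = set(targets)
    return {u: [v for v in nbrs if v not in drop] for u, nbrs in graph.items() if u not in drop}


def rank_drugs(graph: Graph, drug_targets: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank drugs by the drop in global efficiency caused by removing their targets."""
    base = global_efficiency(graph)
    drops = {d: base - global_efficiency(remove_nodes(graph, t)) for d, t in drug_targets.items()}
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical three-protein cascade and two hypothetical drugs.
network = {"A": ["B"], "B": ["C"], "C": []}
ranking = rank_drugs(network, {"drug_x": {"B"}, "drug_y": {"C"}})
```

The score is normalized by the number of remaining nodes. As a result, removing a peripheral node (`drug_y` here) can even increase efficiency, and this sketch makes no attempt to correct for that.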

A-312: Using precision pharmacovigilance to detect and evaluate antiepileptic drug associated adverse reactions in pediatric patients
COSI: TransMed
  • Nicholas Giangreco, Columbia University, United States
  • Nicholas Tatonetti, Columbia University, United States

Short Abstract: Adverse drug reactions (ADRs) are common, unwanted side effects of drug treatment that cause over 2 million injuries, hospitalizations, and deaths across the United States each year. The detection, assessment, understanding, and prevention of ADRs in patients is a priority in pharmacovigilance research. However, most approaches treat identified adverse reactions as independent, which misses important but sparse side effects and provides minimal evaluation of related or predominant side effects. This leads to low ADR detection for less well represented populations, such as pediatric patients (<18 years old), ultimately widening gaps in our knowledge of ADRs that disproportionately affect these populations. Here, we present a personalized pharmacovigilance approach for detecting ADRs that disproportionately affect pediatric patients. Using the Adverse Event Open Learning through Universal Standardization (AEOLUS) database, which contains more than 8 million ADR reports, and the SNOMED-CT clinical concept hierarchy, we provide an approach to detecting ADR relationships that is more interpretable, generalizable, and statistically powerful. We show that building the SNOMED-CT hierarchy using the ‘is_a’ relationship increases the statistical power for detecting ADRs in pediatric subpopulations. We present a case study in which we find adverse reactions from antiepileptic drug use that disproportionately affect pediatric patients.
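Rolling rare, specific reaction terms up the SNOMED-CT ‘is_a’ hierarchy pools their reports and so raises statistical power. The sketch below illustrates that effect with the proportional reporting ratio (PRR), a standard disproportionality statistic. The abstract does not state which statistic the authors use. It also does not say how pediatric reports are compared with others; the same counts could simply be restricted to pediatric reports. Concept names here are hypothetical placeholders, not real SNOMED-CT identifiers.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Report = Tuple[FrozenSet[str], FrozenSet[str]]  # (drugs, reaction concepts)


def ancestors(concept: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """The concept itself plus every ancestor reachable via 'is_a' parent links."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def prr(reports: List[Report], drug: str, concept: str, is_a: Dict[str, List[str]]) -> float:
    """PRR = [a / (a + b)] / [c / (c + d)], where a report counts as mentioning
    `concept` if any of its reactions is the concept or one of its descendants."""
    a = b = c = d = 0
    for drugs, events in reports:
        has_event = any(concept in ancestors(e, is_a) for e in events)
        if drug in drugs:
            if has_event:
                a += 1
            else:
                b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    if a + b == 0 or c == 0:
        return float("nan")  # ratio undefined without exposed reports or comparator cases
    return (a / (a + b)) / (c / (c + d))


# Hypothetical hierarchy and reports.
is_a = {"focal_seizure": ["seizure"], "tonic_clonic_seizure": ["seizure"]}
reports = [
    (frozenset({"drug_a"}), frozenset({"focal_seizure"})),
    (frozenset({"drug_a"}), frozenset({"tonic_clonic_seizure"})),
    (frozenset({"drug_a"}), frozenset({"rash"})),
    (frozenset({"drug_b"}), frozenset({"focal_seizure"})),
    (frozenset({"drug_b"}), frozenset({"rash"})),
    (frozenset({"drug_b"}), frozenset({"rash"})),
    (frozenset({"drug_b"}), frozenset({"nausea"})),
]
specific = prr(reports, "drug_a", "focal_seizure", is_a)
pooled = prr(reports, "drug_a", "seizure", is_a)
```

In this toy data, pooling the two seizure subtypes doubles the exposed case count `a` and raises the PRR from 4/3 to 8/3.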

A-314: Phenotype-based characterization of drug-disease relationships
COSI: TransMed
  • Suryanarayana Yaddanapudi, Cincinnati Children’s Hospital Medical Center, United States
  • Jaswanth Yella, Cincinnati Children’s Hospital Medical Center, United States
  • Nishal Pattan, Cincinnati Children’s Hospital Medical Center, United States
  • Anil Goud Jegga, Cincinnati Children’s Hospital Medical Center, United States

Short Abstract: Joint analysis of drug-induced adverse events (AEs) and disease phenotypes can potentially identify novel drug-drug, disease-disease, and drug-disease relationships. The latter can potentially be used to identify drug-disease contraindications or drug repositioning candidates. To test this hypothesis, we extracted drug AEs from the ADReCS database and disease-phenotype associations from the Monarch Initiative. We mapped the AEs and phenotypes to UMLS (Unified Medical Language System) concepts to enable direct comparison of diseases and drugs based on their phenotypic similarity. We ranked drug-disease associations using the cosine similarity of term frequency–inverse document frequency (TF-IDF) weighted profiles. We evaluated the performance of our approach with two data sets: (a) known drug mechanisms of action (MoA) and (b) known drug–disease contraindications. Our phenotype-based computed drug-drug relationships suggest that drugs with the same MoA or chemical class tend to have high phenotypic similarity. For phenotype-based drug-disease associations, we surprisingly noticed that several known indications often showed high phenotypic similarity. This may suggest potential drug-induced aggravation of the primary disease condition for which the implicated drug is prescribed. Our methods are freely available as a web application (https://phenorx.research.cchmc.org/).
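The ranking step above can be sketched as follows, assuming each drug and disease is represented as a bag of UMLS concepts: AEs for drugs, phenotypes for diseases. The abstract names TF-IDF and cosine similarity but not the exact weighting variant. This sketch uses relative term frequency and a plain log(N/df) inverse document frequency. The concept identifiers are hypothetical placeholders.

```python
import math
from collections import Counter
from typing import Dict, List

Profile = Dict[str, float]


def tfidf_profiles(docs: Dict[str, List[str]]) -> Dict[str, Profile]:
    """Weight each concept by relative term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for terms in docs.values() for term in set(terms))
    profiles = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        profiles[name] = {
            t: (c / len(terms)) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()
        }
    return profiles


def cosine(u: Profile, v: Profile) -> float:
    """Cosine similarity of two sparse weight vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical drug AE profile and disease phenotype profiles.
docs = {
    "drug_x": ["CUI_A", "CUI_B", "CUI_C"],
    "disease_y": ["CUI_A", "CUI_B", "CUI_D"],
    "disease_z": ["CUI_E", "CUI_F"],
}
profiles = tfidf_profiles(docs)
ranked = sorted(
    (name for name in profiles if name.startswith("disease_")),
    key=lambda name: cosine(profiles["drug_x"], profiles[name]),
    reverse=True,
)
```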

A-316: RGD: Human data and tools for translational medicine
COSI: TransMed
  • Jennifer R. Smith, Medical College of Wisconsin, United States
  • Stanley Laulederkind, Medical College of Wisconsin, United States
  • G. Thomas Hayman, Medical College of Wisconsin, United States
  • Shur-Jen Wang, Medical College of Wisconsin, United States
  • Matthew Hoffman, Medical College of Wisconsin, United States
  • Elizabeth Bolton, Medical College of Wisconsin, United States
  • Vatsal Mehra, Medical College of Wisconsin, United States
  • Jyothi Thota, Medical College of Wisconsin, United States
  • Monika Tutaj, Medical College of Wisconsin, United States
  • Marek Tutaj, Medical College of Wisconsin, United States
  • Jeffrey De Pons, Medical College of Wisconsin, United States
  • Melinda Dwinell, Medical College of Wisconsin, United States
  • Mary Shimoyama, Medical College of Wisconsin, United States

Short Abstract: RGD (https://rgd.mcw.edu) is a multi-species platform ideally suited for translational research. RGD was designed to let researchers easily access a large corpus of human data and move from human data to data for disease models and back again. As such, RGD has established a rich core set of human data and integrated it with data for six other species used as models of human disease. The complete human gene set in RGD has been enhanced with associated annotations for disease, phenotype, pathways, Gene Ontology (GO) and gene-drug interactions. Additionally, RGD has imported the ClinVar variant set from NCBI and linked these variants and their phenotype/disease annotations to the corresponding genes. Both validated and predicted miRNA targets have been incorporated from miRGate. To facilitate use of these data, RGD has also developed a suite of innovative tools for data discovery and analysis. These include the OLGA Object List Generator and Analysis tool, the Gene Annotator tool for exploring functional annotations for a list of genes, interactive pathway diagrams, InterViewer for visualizing protein-protein interactions, Variant Visualizer for investigating ClinVar variants, JBrowse genome browsers, and RGD's disease portals, which present consolidated data for twelve disease categories.

A-318: Building and Using a Gen3 Data Commons for Translational Medicine
COSI: TransMed
  • Christopher Meyer, University of Chicago, United States
  • Xiangyan Kuang, University of Chicago, United States
  • Yilin Xu, University of Chicago, United States
  • Francisco Ortuno, University of Chicago, United States
  • Zac Flamig, University of Chicago, United States
  • Christina Yung, University of Chicago, United States
  • Robert Grossman, University of Chicago, United States

Short Abstract: The data commons paradigm aims to accelerate scientific discovery by facilitating cross-project analyses through harmonization of ingested data curated from a variety of sources. The Gen3 software stack is a suite of open-source software for hosting data commons on a secure, scalable platform for applications. Gen3 includes five main services: authentication and authorization, GraphQL-based search, curation of submissions against a metadata dictionary, mapping of data GUIDs to locations, and an interactive website. Building and using a Gen3 Data Commons requires harmonizing datasets by creating a standardized data dictionary of variable names and using this dictionary for data ingestion and co-analysis. Because all Gen3 Data Commons share a common infrastructure, open-source tools and apps can be developed for analyses that span different datasets, even across different commons. We describe the Gen3 software components in detail and discuss the steps required to create data dictionaries. We then demonstrate our cloud-based workspace for data analysis and visualization, which supports Python and R Jupyter notebooks, R Shiny applications, and Docker/CWL analysis pipelines. These tools represent real use cases, as they interoperate with existing Gen3 Data Commons, including the Brain Commons, BloodPAC Commons, Environmental Data Commons, and the NIAID Data Hub.

A-320: Integration and prediction of data on evolutionary drug interactions in antibiotic resistant strains of Escherichia coli
COSI: TransMed
  • Alexander Flohr, University of Saarland, Germany
  • Volhard Helms, University of Saarland, Germany
  • Daria Gaidar, University of Saarland, Germany

Short Abstract: Widespread use of antibiotics in clinics and livestock increases the probability that drug resistance develops in bacteria. Bacterial resistance not only renders applied antimicrobial drugs and therapies ineffective, it can also alter the effect of other therapeutic compounds. The latter is referred to as evolutionary drug-drug interactions, cross-resistance or collateral sensitivity. Several data sets describing this phenomenon were recently and independently reported based on laboratory evolution experiments in which Escherichia coli was placed under antibiotic stress. Comparative and integrative analysis of these data sets appears to be lacking so far. From cross-treatment studies on evolutionary drug-drug interactions, we compiled a comprehensive collection of how 1,014 pairs of antibiotics affect E. coli and discuss implications for the protocol design of rational drug treatment. Further, we numerically extended the integrated compendium by 38% to 1,406 drug pairs by critically evaluating the performance of 6 different data imputation methods, including baseline prediction, a Hidden-Markov-like model, a latent factor model, and group factor analysis (GFA). With the GFA method, and using background information on the structural similarity of drug pairs in the form of functional connectivity fingerprints of radius 4, we reached an RMSE below 0.5 and an accuracy of over 85% in imputing the effect of unseen drug combinations.
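Of the imputation methods listed, baseline prediction is the simplest to state. The abstract gives no formula, so a common matrix-completion form is assumed here. It predicts a missing entry as the global mean plus a row bias and a column bias estimated from the observed entries. RMSE, the error metric the abstract reports, is included as well. Rows and columns stand for hypothetical antibiotics: the one a strain evolved under and the one it is then tested with. All values are made up.

```python
import math
from statistics import mean
from typing import Callable, Dict, List, Tuple

Cell = Tuple[str, str]


def fit_baseline(observed: Dict[Cell, float]) -> Callable[[str, str], float]:
    """Fit mu + row_bias[r] + col_bias[c] from observed (row, col) -> value entries."""
    mu = mean(observed.values())
    row_res: Dict[str, List[float]] = {}
    for (r, _), v in observed.items():
        row_res.setdefault(r, []).append(v - mu)
    row_bias = {r: mean(vals) for r, vals in row_res.items()}
    col_res: Dict[str, List[float]] = {}
    for (r, c), v in observed.items():
        col_res.setdefault(c, []).append(v - mu - row_bias[r])
    col_bias = {c: mean(vals) for c, vals in col_res.items()}
    # Unseen rows or columns fall back to a zero bias.
    return lambda r, c: mu + row_bias.get(r, 0.0) + col_bias.get(c, 0.0)


def rmse(predicted: List[float], truth: List[float]) -> float:
    """Root-mean-square error between paired predictions and observations."""
    return math.sqrt(mean((p - t) ** 2 for p, t in zip(predicted, truth)))


observed = {
    ("evolved_1", "test_1"): 1.0, ("evolved_1", "test_2"): 3.0, ("evolved_1", "test_3"): 0.0,
    ("evolved_2", "test_1"): 3.0, ("evolved_2", "test_2"): 5.0,
}
predict = fit_baseline(observed)
estimate = predict("evolved_2", "test_3")  # the missing cell
```

Estimating column biases after row biases is a simple one-pass approximation. Regularized or alternating fits are common refinements.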

A-322: An integrative model for predicting the toxicity of target proteins to human tissues
COSI: TransMed
  • Yun Hao, Columbia University, United States
  • Phyllis Thangaraj, Columbia University, United States
  • Nicholas Tatonetti, Columbia University, United States

Short Abstract: Drug toxicity is a leading cause of hospital adverse events and injuries to patients, affecting over two million hospital stays annually. Severe toxicity can cause 100,000 deaths per year. A majority of toxic events are caused by the interaction between target proteins and human tissues. While some events have been well studied, there is no systematic method to investigate the cellular mechanisms and predict toxicity in general. To address this, we introduced an integrative model that combines multi-omic features to predict the toxicity of target proteins in 10 human body systems and 45 tissues. By incorporating novel features such as pharmacological pathways and interactions with the cellular regulatory network, we improved overall performance by 23% and achieved a median AUROC of 0.70. We then applied our models to predict the toxicity of 4,968 proteins in the human druggable genome, and will further validate the results using clinical trials data from AACT. By showing that our toxicity score distinguishes clinical trials that failed for toxicity reasons from those that succeeded, we aim to demonstrate the promising application of our results in drug development and pharmacoepidemiology, as well as in deciphering the mechanism of toxicity in chemical biology.
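The performance figure above is a median AUROC. The sketch below computes AUROC directly as the probability that a randomly chosen positive is scored above a randomly chosen negative, counting ties as one half. Here a positive is a protein labelled toxic to a tissue. The scores and labels are hypothetical, and the quadratic pairwise loop is meant for clarity rather than for large datasets.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toxicity scores for four proteins; 1 = labelled toxic.
example = auroc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```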

A-324: Functional Deconvolution and Integrative Genomic Strategies for the Identification of Druggable Dependencies in Pediatric Tumors
COSI: TransMed
  • Gabriela Alexe, Dana-Farber Cancer Institute, Boston University, United States
  • Liying Chen, Harvard University, United States
  • Neekesh V. Dharia, Dana-Farber Cancer Institute, Broad Institute, United States
  • Linda Ross, Harvard University, United States
  • Amanda Balboni Iniguez, Dana-Farber Cancer Institute, Broad Institute, United States
  • Amy Saur Conway, Harvard University, United States
  • Emily Jue Wang, Harvard University, United States
  • Veronica Veschi, National Cancer Institute, United States
  • Norris Lam, National Cancer Institute, United States
  • Jun Qi, Harvard University, United States
  • W. Clay Gustafson, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, United States
  • Nicole Nasholm, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, United States
  • Francisca Vazquez, Broad Institute of MIT and Harvard, United States
  • Barbara Weir, Broad Institute of MIT and Harvard, United States
  • Glenn S. Cowley, Broad Institute of MIT and Harvard, United States
  • Robin M. Meyers, Harvard University, United States
  • Aviad Tsherniak, Broad Institute of MIT and Harvard, United States
  • David E. Root, Harvard University, United States
  • James E. Bradner, Dana-Farber Cancer Institute, Novartis Institutes for BioMedical Research, United States
  • Todd R. Golub, Harvard University, United States
  • Charles W. M. Roberts, Dana-Farber Cancer Institute, Broad Institute, United States
  • William C. Hahn, Dana-Farber Cancer Institute, Broad Institute, United States
  • William A. Weiss, Helen Diller Family Comprehensive Cancer Center, Brain Tumor Research Center, University of California, San Francisco, United States
  • Carol J. Thiele, National Cancer Institute, United States
  • Kimberly Stegmaier, Dana-Farber Cancer Institute, Broad Institute, United States

Short Abstract: Pediatric cancers are often driven by master transcription factors that are difficult to drug, such as MYCN, which is frequently amplified in neuroblastoma. Genome-wide CRISPR-Cas9 dependency screening in large compendia of tumor lineages offers a powerful therapeutic avenue. A significant computational difficulty in identifying druggable lineage dependencies is that dependencies are usually shared among tumors from different lineages, and they are described by a combinatorial essentiality effect of genes involved in several functional mechanisms. Here we show that these challenges can be addressed through data deconvolution and integrative genomic strategies. We applied Independent Component Analysis to a genome-scale CRISPR-Cas9 screen of MYCN-amplified neuroblastoma cell lines and found a preferential dependency on the polycomb repressive complex 2 genes EZH2, EED and SUZ12. Transcriptomic and epigenetic analysis of neuroblastoma cell line and tumor data revealed that MYCN upregulates EZH2, leading to inactivation of a tumor suppressor program, and that EZH2 represses neuronal differentiation. Further experiments focused on EZH2 as a druggable target because several small-molecule EZH2 inhibitors are already in clinical trials. In summary, our study highlights data deconvolution and integrative genomic strategies as effective exploratory methods for the identification of novel druggable dependencies and supports testing EZH2 inhibition in patients with MYCN-amplified neuroblastoma.

A-326: Public cancer genome data mining to inform design of genetic analysis solutions for oncology research
COSI: TransMed
  • Vinay Mittal, Thermo Fisher Scientific, United States
  • Nick Khazanov, Thermo Fisher Scientific, United States
  • Christopher Taylor, Thermo Fisher Scientific, United States
  • Fernando Farfan, Thermo Fisher Scientific, United States
  • Dinesh Cyanam, Thermo Fisher Scientific, United States
  • Paul Williams, Thermo Fisher Scientific, United States
  • Nikki Bonevich, Thermo Fisher Scientific, United States
  • Santhoshi Bandla, Thermo Fisher Scientific, United States
  • Michael Hogan, Thermo Fisher Scientific, United States
  • Seth Sadis, Thermo Fisher Scientific, United States

Short Abstract: Targeted genomic profiling assays allow simultaneous assessment of anywhere from one to hundreds of cancer-related genes. Effective targeted assays focus on somatic mutations in genes with established relevance in cancer. The patterns of somatic mutations in cancer genes are highly characteristic and non-random. When designing oncology research assays, cancer genomic databases provide evidence for characterizing somatic mutation patterns and help define significantly mutated regions. However, a robust process to collect, catalogue, and analyze the data is required to maintain the value of derived knowledge as datasets grow. We present here a streamlined process for standardizing and analyzing publicly available cancer genomic data. The supervised process used recent data from COSMIC, and a comprehensive quality control process validated each build, ensuring data integrity. The most recent build sourced COSMIC v83, which contained over 16,000 whole-exome and 181,000 targeted sequencing samples across 22 standardized cancer types. Our process defined ~5,000 recurrent hotspot mutations across and within specific cancer types and identified 709 candidate cancer genes significantly enriched (q < 0.001; frequency > 10%) in either hotspot mutations or deleterious mutations. The implemented genomic data standardization process supports a sustainable approach to the development of oncology-focused genetic analysis solutions, including cancer research gene panels.
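The gene-level filter above combines a q-value threshold with a mutation-frequency threshold. The abstract does not say which enrichment test produced the p-values or how they were adjusted. The sketch below assumes Benjamini–Hochberg adjustment, a common way to obtain q-values. It then applies the stated cutoffs (q < 0.001, frequency > 10%) to hypothetical per-gene p-values and mutation frequencies.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def select_genes(genes: Sequence[str], pvalues: Sequence[float], frequencies: Sequence[float],
                 q_cut: float = 0.001, freq_cut: float = 0.10) -> List[str]:
    """Keep genes passing both the q-value and the mutation-frequency thresholds."""
    q = bh_qvalues(pvalues)
    return [g for g, qi, f in zip(genes, q, frequencies) if qi < q_cut and f > freq_cut]


# Hypothetical per-gene enrichment p-values and mutation frequencies.
genes = ["GENE1", "GENE2", "GENE3", "GENE4"]
pvals = [1e-6, 2e-5, 0.04, 1e-7]
freqs = [0.25, 0.05, 0.30, 0.12]
selected = select_genes(genes, pvals, freqs)
```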

A-328: A Targetome-Pathway Perspective on Drug Response for Targeted Therapies in Acute Myeloid Leukemia
COSI: TransMed
  • Aurora Blucher, Oregon Health and Science University, United States
  • Daniel Bottomly, Oregon Health & Science University, United States
  • Erik Segerdell, Oregon Health & Science University, United States
  • Beth Wilmot, Oregon Health & Science University, United States
  • Steve Kurtz, Oregon Health & Science University, United States
  • Cristina Tognon, Oregon Health & Science University, United States
  • Brian Druker, Oregon Health & Science University, United States
  • Jeffrey Tyner, Oregon Health & Science University, United States
  • Guanming Wu, OHSU - Reactome, United States
  • Shannon McWeeney, Oregon Health and Science University, United States

Short Abstract: Combination therapies offer the potential to target multiple genetic aberrations at once, to tackle tumor subclonal populations or to overcome resistance mechanisms. We provide a computational framework for assessing single agents individually and in combination, together with their interactions with patient genetic alterations. Investigating the association between aberrant pathways and drug response in de novo acute myeloid leukemia patient samples from the Beat AML Consortium reveals many significant pathway-level associations with drug sensitivity or resistance. We note that these are driven by mutations in a spectrum of genes within each pathway and are therefore potentially missed when considering only single-gene interactions with drug response. To further understand how these intrinsic mutational perturbations result in drug sensitivity or resistance, we used a probabilistic graphical modeling framework to model pathway impact. Complementary to this, we model extrinsic drug perturbations on these pathways using quantitative drug-target binding information from the Cancer Targetome to model downstream impact. We will discuss our progress toward a unified framework of intrinsic and extrinsic perturbation modeling for rigorous in silico hypothesis generation and testing, to inform future drug combination screening recommendations.

A-330: Genetic algorithm for the search of cancer subtypes with clinical significance according to their gene expression patterns
COSI: TransMed
  • Martin Eduardo Guerrero-Gimenez, Instituto de Medicina y Biologia Experimental de Cuyo (IMBECU-CONICET), Argentina
  • Carlos Catania, Facultad de Ingenieria - Universidad Nacional de Cuyo, Argentina
  • Juan Manuel Fernandez Muñoz, Instituto de Medicina y Biologia Experimental de Cuyo (IMBECU-CONICET), Argentina
  • Daniel Ramon Ciocca, Instituto de Medicina y Biologia Experimental de Cuyo (IMBECU-CONICET), Argentina
  • Felipe Carlos Martin Zoppino, Instituto de Medicina y Biologia Experimental de Cuyo (IMBECU-CONICET), Argentina

Short Abstract: Motivation: Clustering analysis has long been used to find underlying structure in omics data such as gene expression profiles. These data are typically high-dimensional, and clustering has been used successfully to find co-expressed genes in samples that share similar molecular and clinical characteristics. Nevertheless, clustering results are highly dependent on the features used and the number of clusters considered, and the partition obtained does not guarantee clinically relevant findings. Methods: We propose a multi-objective optimization algorithm for disease subtype discovery based on a non-dominated sorting genetic algorithm. Our framework combines the advantages of clustering algorithms for grouping heterogeneous omics data with the search properties of genetic algorithms for feature selection and determination of the optimal number of clusters. The goal is to find features that maximize the survival difference between subtypes while keeping cluster consistency high. Results: Two breast cancer datasets were each divided into training and test sets to evaluate our model. In both cases our method identified clinically relevant subgroups in the training sets (log-rank test p = 0 and 0.0004). The features obtained were used to build nearest-centroid classifiers, which yielded significant survival differences between groups in the test sets (log-rank test p = 1.22E-15 and 0.028).
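The non-dominated sorting at the core of the method can be sketched as below. Each candidate solution (a feature subset plus a cluster number) is scored on two objectives to maximize: a hypothetical survival-separation score and a hypothetical cluster-consistency score. Candidates are then grouped into successive Pareto fronts, following the fast non-dominated sort used in NSGA-II. Crowding distance, selection, crossover and mutation are omitted, and the objective values are made up.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is at least as good as b on every objective and better on one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(points: Sequence[Objectives]) -> List[List[int]]:
    """Group solution indices into Pareto fronts, best front first."""
    n = len(points)
    dominated_sets: List[List[int]] = [[] for _ in range(n)]  # solutions that i dominates
    domination_count = [0] * n  # number of solutions that dominate i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_sets[i].append(j)
            elif dominates(points[j], points[i]):
                domination_count[i] += 1
    fronts = [[i for i in range(n) if domination_count[i] == 0]]
    while fronts[-1]:
        next_front = []
        for i in fronts[-1]:
            for j in dominated_sets[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical candidates: (survival separation score, cluster consistency score).
candidates = [(3.0, 0.2), (2.0, 0.5), (1.0, 0.1), (2.5, 0.1), (1.0, 0.6)]
fronts = non_dominated_fronts(candidates)
```

Candidates 0, 1 and 4 trade survival separation against consistency, so none of them beats another outright. A genetic algorithm would keep this first front preferentially when forming the next generation.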

A-332: Systematic modeling of pan-cancer drug response.
COSI: TransMed
  • Mehreen Ali, Institute of Molecular Medicine Finland, Finland
  • Tero Aittokallio, Institute for Molecular Medicine Finland FIMM, Finland
  • Suleiman Ali Khan, Institute of Molecular Medicine Finland FIMM, Finland

Short Abstract: Pan-cancer modeling approaches yield an integrated picture of the commonalities among tumor types, whereas tissue-specific studies yield insights based solely on a single tumor type. In this study, we systematically model these two extremes of the comparison spectrum for drug sensitivity prediction in cancer cell lines. We used publicly available human cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) projects, in pan-cancer and cancer-type-specific settings, to determine which of the two extremes yields better predictive power. We compared several linear regression methods, and ridge regression performed well overall. For CCLE, pan-cancer predictions outperformed cancer-type-specific results, since the pan-cancer setting captures an additional drug effect across tissue types along with the confounding effect of sample size. GDSC results, however, showed improved performance of targeted drugs in the cancer-type-specific setting for certain tissue types: predictive accuracy for EGFR, ERK, MEK and AKT inhibitors was significantly boosted for GDSC breast cancer cell lines in the cancer-type-specific setting. Our results show that the pan-cancer approach overshadows tissue-specific signals, and we therefore advocate integrating both modeling approaches for improved drug response prediction accuracy.
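Ridge regression, which performed well overall in this comparison, has the closed-form solution β = (XᵀX + λI)⁻¹Xᵀy. The dependency-free sketch below assumes centered features and no intercept; in a drug-response setting, rows of X would be cell lines, columns molecular features, and y a sensitivity measure for one drug. The function names and toy matrix are illustrative assumptions, not the study's pipeline.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(X: Matrix, y: List[float], lam: float) -> List[float]:
    """Closed-form ridge coefficients (X^T X + lam * I)^-1 X^T y, no intercept."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy feature matrix
y = [2.0, 3.0, 5.0]                       # toy responses
print(ridge_fit(X, y, lam=0.0))  # ordinary least squares fit, about [2, 3]
print(ridge_fit(X, y, lam=1.0))  # coefficients shrunk toward zero
```

The penalty λ trades variance for bias, which matters most when there are far more features than cell lines, as in the smaller cancer-type-specific subsets.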

A-334: Performance comparison of three DNA extraction kits on human whole-exome data from formalin-fixed paraffin-embedded normal and tumor samples
COSI: TransMed
  • Eric Bonnet, CNRGH, France
  • Marie-Laure Moutet, CNRGH, France
  • Céline Baulard, CNRGH, France
  • Delphine Bacq-Daian, CNRGH, France
  • Florian Sandron, CNRGH, France
  • Lilia Mesrob, CNRGH, France
  • Bertrand Fin, CNRGH, France
  • Marc Delépine, CNRGH, France
  • Marie-Ange Palomares, CNRGH, France
  • Claire Jubin, CNRGH, France
  • Hélène Blanché, CEPH, France
  • Vincent Meyer, CNRGH, France
  • Anne Boland, CNRGH, France
  • Robert Olaso, CNRGH, France
  • Jean-François Deleuze, CNRGH, France

Short Abstract: Next-generation sequencing studies can be used to detect various DNA alterations, such as single-nucleotide variants, on a large scale. Formalin-fixed paraffin-embedded (FFPE) tissues are one of the most abundant sources of clinical specimens, but this method of preparation is known to degrade DNA. Here, we generated 42 whole-exome sequencing data sets from matched fresh-frozen (FF)/FFPE pairs. The samples comprise human normal and tumor tissues from two organs (liver and colon). Coverage analysis shows that FFPE samples have poorer coverage metrics than FF samples, but coverage quality remains globally above the usual thresholds. We detect limited but significant variations in coverage among the three extraction kits. Variant analysis shows a high rate of concordant calls between matched FF/FFPE pairs. We detect a limited but significant variation in the number of variants between FF and FFPE samples for the three FFPE DNA extraction kits. Taken together, our results confirm the potential of FFPE samples for clinical genomic studies, but also indicate that an FFPE DNA extraction kit should be chosen only after careful testing and analysis.
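The abstract does not define its concordance metric, so the sketch below assumes one simple option: treat each sample's calls as a set of (chromosome, position, ref, alt) tuples and report shared, FF-only, and FFPE-only counts plus the Jaccard index. The variant tuples are hypothetical.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def concordance(ff: Set[Variant], ffpe: Set[Variant]) -> Dict[str, float]:
    """Compare call sets from a matched FF/FFPE pair (assumed exact-match definition)."""
    shared = ff & ffpe
    union = ff | ffpe
    return {
        "shared": len(shared),
        "ff_only": len(ff - ffpe),
        "ffpe_only": len(ffpe - ff),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


ff_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
ffpe_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "C", "T")}
print(concordance(ff_calls, ffpe_calls))
# {'shared': 2, 'ff_only': 1, 'ffpe_only': 1, 'jaccard': 0.5}
```

Exact matching on all four fields is the strictest choice; a real comparison might also restrict both sets to regions covered adequately in both samples before counting.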

A-336: Gene Expression in Orbital Inflammatory Syndrome
COSI: TransMed
  • Shreya Wadhwa, Loyola University Chicago, United States
  • Vinay Aakalu, University of Illinois at Chicago, United States

Short Abstract: Inflammatory dacryoadenitis (ID) is an important cause of morbidity and may be associated with inflammatory disorders [e.g. sarcoidosis, IgG4-related disease (IgG4RD)]. Advances in the understanding of non-coding RNAs [microRNAs (miRNAs), small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs)] have revealed their importance in transcriptional regulation and disease pathogenesis. In a pilot study, we investigated the role of non-coding RNA levels in ID samples [sarcoidosis, idiopathic chronic dacryoadenitis (ICD), IgG4RD and sclerosing dacryoadenitis (SD)].

A-338: Identification of New Targets Regulating Platinum Resistance in Pancreatic Cancer using CRISPR Dropout Screens
COSI: TransMed
  • Yan Zhou, Fox Chase Cancer Center, United States
  • Vera Skripova, Kazan Federal University, Russia
  • Ilya Serebriiskii, Fox Chase Cancer Center, United States
  • Ramziya Kiyamova, Kazan Federal University, Russia
  • Igor Astsaturov, Fox Chase Cancer Center, United States

Short Abstract: Pancreatic cancer is an aggressive cancer with a very poor prognosis, for which chemotherapy remains the mainstay of treatment. However, only a small subset of pancreatic cancers respond well to chemotherapy. Better knowledge of the molecular mechanisms contributing to drug resistance is imperative to improve patient prognosis. To identify genes modulating the impact of platinum, we performed a genome-wide CRISPR screen in the pancreatic cancer cell line MIA PaCa-2, as well as in a peritoneal carcinomatosis model in SCID mice. We ranked genes according to their sensitizing impact and, using edgeR with an FDR < 0.2 cutoff, identified genes whose deletion modified the impact of oxaliplatin or cisplatin, as well as genes shared across both treatments. Genes involved in the DNA damage response, cell cycle regulation, and DNA repair are significantly enriched among the hits. In addition, some hits are known platinum sensitizers, which validates our systematic functional screening approach. We are currently validating these hits in additional pancreatic cell lines and in the in vivo system. Our study yields important mechanistic insights into platinum resistance in pancreatic cancer cells and allows us to nominate new treatment targets for pancreatic carcinomas.
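An FDR cutoff like the one above is applied to multiple-testing-adjusted p-values; edgeR's `topTags` uses the Benjamini-Hochberg method by default. The sketch below reimplements that adjustment and the FDR < 0.2 filter in plain Python, assuming per-gene p-values are already available. The gene names and p-values are hypothetical placeholders.

```python
from typing import Dict, List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)  # enforce monotonicity
        adjusted[i] = running_min
    return adjusted


def hits(gene_p: Dict[str, float], fdr: float = 0.2) -> List[str]:
    """Genes whose BH-adjusted p-value falls below the FDR cutoff."""
    genes = list(gene_p)
    adj = bh_adjust([gene_p[g] for g in genes])
    return [g for g, q in zip(genes, adj) if q < fdr]


pvals = {"GENE_A": 0.001, "GENE_B": 0.04, "GENE_C": 0.06, "GENE_D": 0.5, "GENE_E": 0.9}
print(hits(pvals))  # ['GENE_A', 'GENE_B', 'GENE_C']
```

A relatively permissive cutoff such as 0.2 suits a discovery screen, where candidate hits are followed up with validation experiments rather than reported as final.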

A-340: A data driven approach for drug target identification in oncology and immunology
COSI: TransMed
  • Maria Lourdes Rosano Gonzalez, Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, Switzerland
  • Chia-Huey Ooi, Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, Switzerland
  • Klas Hatje, Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, Switzerland

Short Abstract: We developed a computational approach for drug target identification in oncology and immunology that mines gene expression data. The idea is to focus on specific cell types that play a pathogenic role in cancer as well as in autoimmune diseases. First, differential gene expression analysis is applied to transcriptomics datasets relevant to the diseases and cell types of interest. Then, the differentially expressed genes are used to derive a list of candidate targets for that cell type. These candidate genes are tested for association with disease-driving pathways in reference expression data from various cancer types and immune-related diseases. A strong association increases confidence that a candidate gene plays a pathogenic role in the disease. Several other criteria are also taken into account, including the expression of candidate genes across relevant normal tissues and gene ontology annotations. As a proof of concept, we applied the method to identify targets in tumor-associated fibroblasts and in immune-activated fibroblasts in rheumatoid arthritis (RA). We focus on this cell type because of its common pathogenic role in both cancer and RA, but the concept is generally applicable to other tissues, cell types, and diseases.
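The filtering steps above can be sketched as a single prioritization function. This is a hypothetical illustration, not the authors' pipeline: the record fields, the use of a correlation with a pathway score as the "association", and all thresholds are assumptions chosen only to show the order of the steps (differential expression, pathway association, normal-tissue expression).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    gene: str
    log2_fc: float         # differential expression in the cell type of interest
    padj: float            # adjusted p-value from the differential expression analysis
    pathway_corr: float    # assumed: correlation with a disease-driving pathway score
    max_normal_tpm: float  # assumed: highest expression across relevant normal tissues


def prioritize(cands: List[Candidate], min_fc: float = 1.0, max_padj: float = 0.05,
               min_corr: float = 0.5, max_normal: float = 10.0) -> List[str]:
    """Keep up-regulated, pathway-associated genes with low normal-tissue expression,
    ranked by strength of pathway association (hypothetical thresholds)."""
    keep = [c for c in cands
            if c.log2_fc >= min_fc and c.padj <= max_padj       # step 1: DE filter
            and c.pathway_corr >= min_corr                      # step 2: pathway association
            and c.max_normal_tpm <= max_normal]                 # step 3: normal-tissue filter
    return [c.gene for c in sorted(keep, key=lambda c: c.pathway_corr, reverse=True)]


cands = [
    Candidate("G1", 2.0, 0.01, 0.7, 5.0),
    Candidate("G2", 1.5, 0.001, 0.9, 3.0),
    Candidate("G3", 0.5, 0.01, 0.9, 1.0),   # fails the fold-change filter
    Candidate("G4", 2.5, 0.01, 0.8, 50.0),  # too highly expressed in normal tissue
]
print(prioritize(cands))  # ['G2', 'G1']
```

Keeping each criterion as a separate, named threshold makes it easy to reuse the same logic for other cell types and diseases, in line with the generality claimed above.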

A-342: Exploring co-evolution between individuals' social networks and physical activities in NetHealth data
COSI: TransMed
  • Shikang Liu, University of Notre Dame, United States
  • David Hachen, University of Notre Dame, United States
  • Omar Lizardo, University of Notre Dame, United States
  • Christian Poellabauer, University of Notre Dame, United States
  • Aaron Striegel, University of Notre Dame, United States
  • Tijana Milenkovic, University of Notre Dame, United States

Short Abstract: Understanding the relationship between individuals' social networks and health could help reduce the incidence of unhealthy behaviors and, by complementing biological -omics data analyses, help improve patient-specific precision medicine. To this end, we analyze individuals' mobile-sensor-based longitudinal social network (SMS) data, Fitbit-based longitudinal health-related behavioral (physical activity) data, and trait (personality, depression, and anxiety) data, all collected by the NetHealth study. We examine trait differences between individuals whose social network positions (centralities) or Fitbit physical activities change over time and time-stable individuals, as well as between individuals whose centralities and physical activities co-evolve (correlate over time) and those with no co-evolution relationship. We find that individuals whose centralities change over time do not differ in any trait from time-stable individuals. However, among the centrality-changing individuals, those whose physical activities also change over time are more introverted than the time-stable individuals. Moreover, individuals whose centralities and physical activities both change over time and whose centralities co-evolve with physical activities are more anxious than individuals who are time-stable and have no co-evolution relationship. Hence, our study reveals several links between individuals' social network structure, health-related behaviors, and other traits.
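"Co-evolve" is defined above as correlating over time. The abstract does not say which centrality or correlation measure was used, so this sketch assumes normalized degree centrality computed on weekly SMS snapshot networks and a Pearson correlation with weekly step counts. The people, edges, and step counts are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Set, Tuple


def degree_centrality(edges: Set[Tuple[str, str]], nodes: Set[str]) -> Dict[str, float]:
    """Normalized degree centrality of one undirected snapshot (each edge listed once)."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(nodes)
    if n < 2:
        return {v: 0.0 for v in nodes}
    return {v: d / (n - 1) for v, d in deg.items()}


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


nodes = {"p1", "p2", "p3", "p4"}
weekly_edges = [
    {("p1", "p2")},
    {("p1", "p2"), ("p1", "p3")},
    {("p1", "p2"), ("p1", "p3"), ("p1", "p4")},
]
centrality_p1 = [degree_centrality(e, nodes)["p1"] for e in weekly_edges]
steps_p1 = [5000.0, 7000.0, 9000.0]
print(pearson(centrality_p1, steps_p1))  # close to 1.0: strong co-evolution
```

Grouping individuals by whether this per-person correlation is significant would then allow trait comparisons of the kind described above.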

A-344: Quantifying the Influence of Environmental Quality on Mental Disorders
COSI: TransMed
  • Atif Ali Khan, University of Chicago, United States
  • Hannah Landecker, The University of California at Los Angeles, United States
  • Patrick Sullivan, Karolinska Institute, Sweden, and The University of North Carolina at Chapel Hill, United States
  • Andrey Rzhetsky, University of Chicago, United States

Short Abstract: This study presents findings from a population-level computational investigation of disease-environment relationships. We examined a health insurance claims database of over 150 million unique US individuals to obtain the frequency of clinical appearances of six mental conditions: bipolar disorder, schizophrenia, personality disorder, Parkinson’s disease, epilepsy, and major depressive disorder. The dataset included time-stamped patient treatment episodes from 2003 to 2013, with individual patient diagnoses defined by the corresponding ICD-9 codes. We used mixed-effects regression, modeling disease counts per age and sex group as a function of environmental exposures measured at the U.S. county level. The environmental factors included U.S. county-level air, land, water, built-environment, and weather quality, along with sociodemographic characteristics. The computational investigation suggests that (a) the spatial prevalence of the six mental conditions differs remarkably across the U.S., (b) air and land pollution are important environmental predictors of the frequency of clinical appearance of bipolar disorder and major depression, and (c) psychiatric disorders should be an important component of efforts to quantify and understand the effects of chronic pollutant exposure. These links point to potential systemic effects of air and land pollution that might affect the brain.
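The abstract does not specify the link function or random-effects structure, so the following is only one common form such a count model could take, stated as an assumption: a Poisson log-linear mixed model with a population offset and a county-level random intercept. Here $Y_{asc}$ is the count of diagnosed patients in age group $a$, sex $s$, and county $c$; $N_{asc}$ is the corresponding number of insured individuals; $\alpha_a$ and $\gamma_s$ are fixed age and sex effects; and $E_{kc}$ is the $k$-th environmental or sociodemographic factor in county $c$.

```latex
\log \mathbb{E}[Y_{asc}] = \log N_{asc} + \beta_0 + \alpha_a + \gamma_s
  + \sum_{k=1}^{K} \beta_k E_{kc} + u_c,
\qquad u_c \sim \mathcal{N}(0, \sigma_u^2)
```

Under this form, $\exp(\beta_k)$ is the multiplicative change in expected diagnosis rate per unit increase in factor $k$, and the random intercept $u_c$ absorbs county-level variation not explained by the measured exposures.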

A-346: Modeling High-Throughput Screening Data for Drug Repurposing
COSI: TransMed
  • Srilatha Sakamuru, National Center for Advancing Translational Sciences (NCATS) and George Mason University (GMU), United States
  • Menghang Xia, National Center for Advancing Translational Sciences (NCATS), United States
  • Ruili Huang, National Center for Advancing Translational Sciences (NCATS), United States
  • Iosif Vaisman, George Mason University, United States

Short Abstract: The NCATS Pharmaceutical Collection (NPC) consists of approximately 3,000 approved and investigational small-molecule drugs for clinical use in humans or animals. To date, we have concentration-response data available from high-throughput screening of the NPC libraries against a large panel of cell-based assays. Computational models were built using these in vitro activity data, along with structural features of the compounds, to predict potential targets and/or therapeutic indications of drugs across a broad array of human diseases. Four machine learning classification algorithms (Naïve Bayes, Random Forest, Support Vector Machines, and Extreme Gradient Boosting) were used to generate predictive models from these datasets for three targets: cytochrome P450 3A4 (CYP3A4), estrogen receptor 1 (ESR1), and adrenoceptor alpha 1A (ADRA1A). The models were validated using an internal test dataset. The results showed improved predictive performance for models built on the combined in vitro activity and structural feature datasets. Thus, the proposed studies will help discover novel therapeutic uses of approved drugs for repurposing.
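Naïve Bayes is the simplest of the four classifiers listed above. The sketch below is a minimal Bernoulli variant, assuming that each compound is described by binarized assay activity calls (active/inactive) concatenated with binary structural fingerprint bits, mirroring the "combined" feature set. The class, training vectors, and labels are hypothetical, not the NCATS models.

```python
from math import log
from typing import Dict, List, Sequence


class BernoulliNB:
    """Bernoulli naive Bayes with Laplace smoothing for binary feature vectors."""

    def fit(self, X: List[Sequence[int]], y: List[int], alpha: float = 1.0) -> "BernoulliNB":
        self.classes = sorted(set(y))
        p = len(X[0])
        self.log_prior: Dict[int, float] = {}
        self.theta: Dict[int, List[float]] = {}  # P(feature_j = 1 | class)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = log(len(rows) / len(X))
            self.theta[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                             for j in range(p)]
        return self

    def predict(self, x: Sequence[int]) -> int:
        def score(c: int) -> float:
            # log prior plus log likelihood of each observed bit
            return self.log_prior[c] + sum(
                log(t) if xi else log(1.0 - t) for xi, t in zip(x, self.theta[c]))
        return max(self.classes, key=score)


# Each row: 2 hypothetical assay activity calls + 2 hypothetical fingerprint bits.
X = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0],
     [0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
y = [1, 1, 1, 0, 0, 0]  # 1 = active on the target, 0 = inactive
model = BernoulliNB().fit(X, y)
print(model.predict([1, 1, 1, 0]))  # 1
```

Because activity calls and fingerprint bits share one vector, the classifier can draw on both sources of evidence at once, which is the intuition behind the improved performance of the combined datasets.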

A-421: Identifying Crohn's disease signal from variome analysis
COSI: TransMed
  • Yanran Wang, Rutgers University, United States
  • Yuri Astrakhan, United States
  • Britt-Sabina Petersen, Christian-Albrechts-University of Kiel, Germany
  • Stefan Schreiber, Christian-Albrechts-University of Kiel, Germany
  • Andre Franke, Christian-Albrechts-University of Kiel, Germany
  • Yana Bromberg, Rutgers University, United States

Short Abstract: Background: After years of research, the cause of Crohn’s disease (CD) remains unknown. Accurate diagnosis, however, can help in managing the disease and preventing its onset. Whole exome sequencing (WES) provides a new way of evaluating CD predisposition and can help identify new disease genes and pathways. Method and Results: We developed AVA,Dx (Analysis of Variation for Association with Disease), a machine learning-based method that uses WES data alone to highlight CD genes and predict individual CD status. AVA,Dx first predicts changes in gene function due to individual-specific genetic variation. It then maps the resulting gene-function vectors to individual CD status. In testing, AVA,Dx differentiated three quarters of the CD patients from healthy controls with 71% precision. Importantly, we were able to account for batch effects, enabling accurate prediction of individual CD status for individuals from a separately sequenced cohort. Furthermore, some of the genes our method selected as relevant to CD had not been identified previously, but were significantly enriched in known CD pathways. Conclusions: AVA,Dx highlights new CD genes and pathways and accurately predicts CD status. AVA,Dx techniques may also help improve our understanding of other complex diseases in the future.
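Reading "three quarters of the CD patients" as a recall (sensitivity) of 0.75 alongside the reported 71% precision, the two can be combined into an F1 score, their harmonic mean. The sketch below defines these metrics from confusion-matrix counts; the counts in the test case are hypothetical, and the recall interpretation is an assumption about the abstract's wording.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Using the reported precision (0.71) and the assumed recall (0.75):
print(round(f1_from(0.71, 0.75), 2))  # 0.73
```

The harmonic mean penalizes imbalance between the two metrics, so an F1 near 0.73 indicates that neither missed patients nor false alarms dominate the classifier's errors.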