Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide



Accepted Posters

If you need assistance, please contact submissions@iscb.org and provide your poster title or submission ID.


Track: Translational Medicine (TransMed)

Session B-230: Correlation analysis between tumor heterogeneity and genomic features in colorectal cancer
COSI: TransMed
  • Bo Young Oh, Ewha Womans University School of Medicine, South Korea
  • Je-Gun Joung, Samsung Medical Center, South Korea

Short Abstract: Colorectal cancer (CRC) is one of the most commonly diagnosed cancers worldwide and a leading cause of cancer-related death. Tumor heterogeneity plays a critical role in the progression of many cancer types and is a major obstacle to precision cancer therapy. In this study, we investigate whether tumor heterogeneity is associated with genomic features and clinical outcome in CRC by whole-exome sequencing of primary and hepatic metastatic tumors from CRC patients. We found that tumor heterogeneity was highly variable across CRC samples and that a high degree of tumor heterogeneity in primary CRC was associated with lower progression-free survival. Highly heterogeneous primary CRC was correlated with a higher rate of liver metastasis. Recurrent somatic mutations in major CRC-associated genes were frequently detected in highly heterogeneous CRC. Network-based analysis revealed that major genomic and transcriptomic features may contribute to the degree of tumor heterogeneity. Our results provide insight into the mechanism of metastasis in CRC and a basis for the development of novel therapeutic agents.
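The abstract does not state how heterogeneity was quantified from the exome data. One widely used exome-based measure is the mutant-allele tumor heterogeneity (MATH) score: 100 * MAD / median of the variant allele fractions (VAFs) of somatic mutations, with the median absolute deviation scaled by 1.4826. The sketch below assumes that measure purely for illustration; the function name is hypothetical and this is not necessarily the authors' method.

```python
import numpy as np

def math_score(vafs):
    """MATH score (assumed measure): 100 * scaled MAD / median of somatic-mutation VAFs."""
    vafs = np.asarray(vafs, dtype=float)
    med = np.median(vafs)
    # 1.4826 makes the MAD a consistent estimator of the standard deviation under normality
    mad = 1.4826 * np.median(np.abs(vafs - med))
    return 100.0 * mad / med

# Higher scores mean a wider spread of allele fractions, i.e. more heterogeneity.
print(round(math_score([0.1, 0.2, 0.3, 0.4, 0.5]), 2))
```

Comparing such a score between a patient's primary and hepatic metastatic tumors, or splitting patients at its median before a survival analysis, is one way the associations described above could be examined.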

Session B-232: A Longitudinal Transcriptome Analysis Identifies Novel Gene Expression Signatures for Body Mass Index in Monocytes
COSI: TransMed
  • Christian Müller, Department of General and Interventional Cardiology, University Heart Center Hamburg, Germany
  • Maria F Hughes, UKCRC Centre of Excellence for Public Health Northern Ireland, Queen's University Belfast, Ireland
  • Francisco Ojeda, Department of General and Interventional Cardiology, University Heart Center Hamburg, Germany
  • Daniela Börnigen, Department of General and Interventional Cardiology, University Heart Center Hamburg, Germany
  • Karl J Lackner, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Mainz, Germany
  • Philipp S Wild, Center for Thrombosis and Hemostasis, Mainz, Germany
  • David A Trégouët, Sorbonne Universités, UPMC, INSERM, UMR_S 1166, ICAN Institute for Cardiometabolism and Nutrition, Paris, France
  • Stefan Blankenberg, Department of General and Interventional Cardiology, University Heart Center Hamburg, Germany
  • Tanja Zeller, Department of General and Interventional Cardiology, University Heart Center Hamburg, Germany

Short Abstract: Background and study aims: More than 600 million obese people worldwide are at higher risk of developing cardiometabolic disorders. However, the underlying molecular mechanisms are not fully understood. We aimed to identify longitudinal changes in the gene expression signatures of circulating monocytes in relation to weight gain or loss during a five-year follow-up in a large population of healthy subjects. Methods: mRNA was isolated from monocytes of 1,092 subjects of the Gutenberg Health Study at the baseline study visit (BL) and at the five-year follow-up (FU). Whole-transcriptome gene expression was measured with Illumina HT12 BeadChip microarrays, and technical variance was reduced using quantile normalization and ComBat. In addition, probabilistic estimation of expression residuals (PEER) was used to estimate and correct for hidden factors in measured gene expression. For each subject and each gene, the mRNA.delta was calculated (mRNA.FU - mRNA.BL). Weighted gene coexpression network analysis (WGCNA) was applied to the gene expression data to identify sub-networks (or modules) of coregulated genes based on their mRNA changes over time (mRNA.delta). The first principal component of the expression of the genes in each identified module (the module eigengene) was calculated and tested for association with longitudinal body mass index (BMI) changes. Associations between module eigengenes and BMI changes that met a Bonferroni-corrected significance threshold of 0.05 were considered significant. Pathway and gene-set enrichment analyses were performed using LYNX (http://lynx.ci.uchicago.edu/) and Fisher’s exact test, respectively. Results: From the set of 12,170 genes found to be expressed in circulating monocytes, we identified 152 modules of genes co-regulated over time, 6 of which were significantly associated with longitudinal BMI changes. In the LYNX analysis, one module was significantly enriched with genes involved in nucleosome assembly (p = 1.9×10^-9). Furthermore, one module was ~6-fold enriched (p = 2.5×10^-5) with a signature separating classical CD14++CD16- from non-classical CD14+CD16++ monocytes, indicating a shift in monocyte composition during obesity development. Conclusion: Cluster analysis of mRNA changes over time identified a few networks of genes reflecting molecular alterations during weight gain or loss. One of these networks was highly enriched with signatures of specific monocyte subtypes, suggesting a shift of monocyte sub-populations towards non-classical monocytes. The latter finding is consistent with current studies reporting a shift in monocyte composition during the development of obesity and cardiovascular disease.
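The per-module test described above (mRNA.delta per subject and gene, module eigengene as the first principal component, association with the BMI change, Bonferroni threshold) can be sketched as follows. This is a minimal illustration under stated assumptions: the WGCNA module detection step is not shown (module gene indices are given as input), the association is a Pearson correlation with a Fisher z normal approximation for the p-value, and all function names are hypothetical.

```python
import math
import numpy as np

def module_eigengene(delta_module):
    """First principal component (module eigengene) of a subjects x genes matrix.
    The sign of an SVD component is arbitrary, so only |r| is interpreted below."""
    centered = delta_module - delta_module.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, 0] * s[0]

def correlation_p_value(r, n):
    """Two-sided p-value for a Pearson r via the Fisher z normal approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def associate_modules(expr_bl, expr_fu, modules, bmi_bl, bmi_fu, alpha=0.05):
    """expr_*: subjects x genes; modules: dict name -> list of gene column indices."""
    delta = expr_fu - expr_bl              # mRNA.delta = mRNA.FU - mRNA.BL
    bmi_delta = bmi_fu - bmi_bl
    threshold = alpha / len(modules)       # Bonferroni correction over modules
    results = {}
    for name, genes in modules.items():
        me = module_eigengene(delta[:, genes])
        r = float(np.corrcoef(me, bmi_delta)[0, 1])
        p = correlation_p_value(r, len(bmi_delta))
        results[name] = {"r": r, "p": p, "significant": p < threshold}
    return results
```

With 152 modules, as in the study, the Bonferroni threshold becomes 0.05 / 152, roughly 3.3×10^-4.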

Session B-234: Logic model for connecting somatic mutations to kinome activities
COSI: TransMed
  • Yusuke Matsui, Graduate School of Medicine, Nagoya University, Japan
  • Hideko Kawakubo, Graduate School of Medicine, Nagoya University, Japan
  • Teppei Shimamura, Graduate School of Medicine, Nagoya University, Japan

Short Abstract: Large-scale integrated cancer genome and proteome characterization efforts, including The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), have opened unprecedented opportunities for a comprehensive understanding of cancer biology. An important challenge is organizing our knowledge of how genomic events drive the proteome and phosphoproteome to shape phenotypic characteristics. Protein kinases, which are often therapeutic targets and whose activities can be estimated from proteome and phosphoproteome data, regulate key processes such as cellular proliferation, survival and migration; hence, when dysregulated, they are well poised to contribute to various hallmarks of cancer. In particular, phenotypic characterization at the intersection of the genome and kinome could have significant implications for developing clinical strategies and identifying potential therapeutic targets. We developed a statistical approach that connects somatic mutations to kinase activities in combination with clinical outcomes. We employ a logic regression framework in which the response is survival time and the predictors are binary indicators of somatic mutations and binarized kinase activities. The model estimates the effects of logical combinations (AND, OR, NOT) of mutations and activated kinases in explaining patient survival times. We applied the method to a proteogenomic dataset of 77 breast cancer patients from TCGA with clinical information and will show the effectiveness and interpretability of the approach.
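Logic regression, as typically implemented, searches over logic trees (for example with simulated annealing) and plugs the resulting Boolean terms into a regression model such as a proportional hazards model. The sketch below is a deliberately simplified stand-in, not the authors' model: it exhaustively scores every pair of binary predictors, each optionally negated, joined by AND or OR, using the log-rank chi-square statistic between patients for whom the term is true and those for whom it is false. Predictor names in the usage example are hypothetical.

```python
import itertools
import numpy as np

def logrank_chi2(time, event, group):
    """Log-rank chi-square statistic (1 df) comparing `group` vs. the remaining patients."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):          # distinct event times
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        died = (time == t) & event
        d, d1 = died.sum(), (died & group).sum()
        o_minus_e += d1 - d * n1 / n          # observed minus expected events in group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0

def best_two_literal_term(X, names, time, event):
    """Score every (possibly negated) pair of binary predictors joined by AND/OR."""
    X = np.asarray(X, dtype=bool)
    literals = []
    for j, name in enumerate(names):
        literals.append((name, X[:, j]))
        literals.append((f"NOT {name}", ~X[:, j]))
    best_term, best_score = None, -1.0
    for (a_name, a), (b_name, b) in itertools.combinations(literals, 2):
        for op, values in (("AND", a & b), ("OR", a | b)):
            if values.all() or not values.any():
                continue                      # a constant term cannot split patients
            score = logrank_chi2(time, event, values)
            if score > best_score:
                best_term, best_score = f"({a_name} {op} {b_name})", score
    return best_term, best_score

# Usage: columns are binary mutation calls and binarized kinase activities.
rng = np.random.default_rng(1)
mut, kin, other = (rng.random((3, 60)) < 0.5)
time = np.where(mut & kin, 5.0, 20.0) + rng.random(60)
print(best_two_literal_term(np.column_stack([mut, kin, other]),
                            ["TP53_mut", "AKT1_active", "CDH1_mut"], time, np.ones(60, bool)))
```

Exhaustive search is only feasible for small terms; larger logic trees are why stochastic search is normally used.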

Session B-236: An equivalence testing approach for the integrative analysis of multiple gene lists
COSI: TransMed
  • Alex Sánchez-Pla, Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Spain
  • Miquel Salicrú, Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Spain
  • Jordi Ocaña, Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Spain

Short Abstract: The analysis of gene lists (or lists of other features such as proteins or microRNAs) has been a very active field since the appearance of omics data analysis. Its main application has been the development of many distinct but related methods and tools for biological significance analysis (Over-Representation Analysis and Gene Set Enrichment Analysis being the best known (3)), but also the development of gene set and signature databases that have been applied, for example, for classification or prognostic purposes (1, 2). Curiously, one topic that has received much less attention than obtaining or applying gene lists is their comparison, not in terms of gene ranking but in terms of their biological meaning. Comparing gene lists is a highly relevant problem, especially in the post-genomic era, when multiple datasets of the same or different types are available for study. It can be used, for example, in data fusion problems to decide whether some datasets may be merged, in data mining of gene signatures to decide whether certain gene sets are equivalent, or in a meta-analysis context where several studies are compared through the gene lists they have produced. Although a few methods have been developed for comparing gene lists, the goProfiles approach (4, 5) is, to our knowledge, one of the few in use that relies on the functional representation of the gene lists (alternative methods based on comparing ranked gene lists serve a different purpose and are not considered here). Very briefly, this approach projects lists of genes onto predefined levels or slices of the Gene Ontology in such a way (called “expanded functional profiles”) that a multinomial model can be used for estimation and testing.
In many analyses, the main interest is in establishing (not rejecting) the similarity between two lists. Therefore, instead of the classical hypothesis-testing setting, designed to reject a null hypothesis of equality, we derived an equivalence method that uses a distance-based approach and the confidence interval inclusion principle. Equivalence is declared if a one-sided confidence interval for the distance between two profiles lies below a pre-established equivalence limit. In this talk, we show how the equivalence test developed for one pair of lists can be naturally extended to establish the equivalence of any number of gene lists through an iterative, bottom-up procedure that orders the lists from most to least equivalent while adjusting for multiple testing. The applicability of the method will be demonstrated in two typical situations. First, it will be applied to the comparison of two groups of gene lists, one made up of cancer-related gene lists (http://www.bushmanlab.org/links/genelists) and the other of pathogenesis-based transcript sets (http://atagc.med.ualberta.ca/Research/GeneLists/Pages/default.aspx). The comparison not only shows that the lists tend to be more similar according to their origin but also helps classify the lists of each group by their similarity. Second, an example from an unpublished cancer study, in which more than 30 gene lists were derived from individual comparisons, will show how this approach can be used to cluster the resulting lists, suggesting ways to simplify the subsequent analysis of all comparisons. The methods developed will be available in the next Bioconductor release and in the devel branch of the goProfiles package.
[1] Beck, Andrew H., Nicholas W. Knoblauch, Marco M. Hefti, et al. PLOS Computational Biology 9(1): e1002875.
[2] Birrer, Michael J., Markus Riester, Wei Wei, et al. Journal of Clinical Oncology 32(15_suppl): 5531.
[3] Khatri, Purvesh, Marina Sirota, and Atul J. Butte. PLOS Computational Biology 8(2): e1002375.
[4] Salicrú, Miquel, Jordi Ocaña, and Alex Sánchez-Pla. 2011. BMC Bioinformatics 12: 401.
[5] Sánchez-Pla, Alex, Jordi Ocaña, and Miquel Salicrú. goProfiles. http://bioconductor.org/packages/goProfiles
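The equivalence decision for one pair of gene lists (declare equivalence when a one-sided upper confidence bound for the distance between profiles falls below a pre-set limit) can be sketched as follows. Assumptions: each list is already summarized as counts over the same GO categories, the distance is the squared Euclidean distance between relative-frequency profiles, and the upper bound comes from a percentile bootstrap. The published goProfiles method derives its bound differently, so the bootstrap here is a swapped-in, illustrative choice, and the function name is hypothetical.

```python
import numpy as np

def equivalence_test(counts1, counts2, limit, alpha=0.05, n_boot=2000, seed=0):
    """Declare two functional profiles equivalent if a one-sided (1 - alpha)
    upper confidence bound for their squared Euclidean distance is below `limit`."""
    rng = np.random.default_rng(seed)
    c1, c2 = np.asarray(counts1), np.asarray(counts2)
    n1, n2 = int(c1.sum()), int(c2.sum())
    p1, p2 = c1 / n1, c2 / n2                  # relative frequencies per GO category
    d_obs = float(np.sum((p1 - p2) ** 2))
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each list's category counts from its own multinomial profile
        q1 = rng.multinomial(n1, p1) / n1
        q2 = rng.multinomial(n2, p2) / n2
        boot[b] = np.sum((q1 - q2) ** 2)
    upper = float(np.quantile(boot, 1 - alpha))
    return d_obs, upper, upper < limit

print(equivalence_test([500, 300, 200], [480, 320, 200], limit=0.05))
```

Extending this to many lists, as the abstract describes, would repeat the pairwise test and adjust the resulting decisions for multiple testing.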

Session B-238: Cell-Specific Computational Modeling of the PIM Pathway in Acute Myeloid Leukemia
COSI: TransMed
  • Dana Silverbush, Tel Aviv University, Israel
  • Shaun Grosskurth, AstraZeneca, USA
  • Dennis Wang, AstraZeneca, United Kingdom

Short Abstract: Personalized therapy is a major goal of modern oncology, as patient responses vary greatly even within a histologically defined cancer subtype. This is especially true in acute myeloid leukemia (AML), which exhibits striking molecular heterogeneity. When calibrated to cell-specific data, executable network models can reveal subtle differences in signaling that help explain differences in drug response. Furthermore, they can suggest drug combinations to increase efficacy and combat acquired resistance. Here, we experimentally measured dynamic proteomic changes and phenotypic responses in diverse AML cell lines treated with a pan-PIM kinase inhibitor and an fms-related tyrosine kinase 3 (FLT3) inhibitor, as single agents and in combination. We constructed cell-specific executable models of the signaling axis, connecting genetic aberrations in FLT3, tyrosine kinase 2 (TYK2), platelet-derived growth factor receptor alpha (PDGFRA), and fibroblast growth factor receptor 1 (FGFR1) to cell proliferation and apoptosis via the PIM and PI3K kinases. The models capture key differences in signaling, which enabled them to accurately predict the unique proteomic changes and phenotypic responses of each cell line. Using the cell-specific models, we also tailored combination therapies to individual cell lines and successfully validated their efficacy experimentally. Specifically, we showed that cells mildly responsive to PIM inhibition exhibited increased sensitivity when PIM inhibition was combined with PIK3CA inhibition. We also used the models to infer the origin of resistance to PIM inhibition engineered through prolonged drug treatment of MOLM16 cells, and experimentally validated our prediction that this resistance can be overcome with AKT1/2 inhibition.
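To make the idea of an executable signaling model concrete, here is a minimal synchronous Boolean network sketch. The topology is a hypothetical toy (FLT3 activates PI3K and PIM, PI3K activates AKT, and proliferation requires PIM or AKT), not the authors' cell-specific models; inhibitors are simulated by clamping nodes OFF. It reproduces the qualitative pattern in the abstract: PIM inhibition alone is bypassed through PI3K/AKT, while adding PI3K inhibition blocks proliferation.

```python
# Hypothetical toy topology, for illustration only.
RULES = {
    "FLT3": lambda s: s["FLT3"],          # mutant receptor: input node that keeps its initial state
    "PI3K": lambda s: s["FLT3"],
    "AKT": lambda s: s["PI3K"],
    "PIM": lambda s: s["FLT3"],
    "Proliferation": lambda s: s["PIM"] or s["AKT"],
}

def simulate(initial, inhibited=frozenset(), max_steps=20):
    """Synchronous Boolean updates until a fixed point; inhibited nodes are clamped OFF."""
    state = {node: (False if node in inhibited else value) for node, value in initial.items()}
    for _ in range(max_steps):
        new = {node: (False if node in inhibited else bool(rule(state)))
               for node, rule in RULES.items()}
        if new == state:                  # fixed point reached
            break
        state = new
    return state

start = {"FLT3": True, "PI3K": False, "AKT": False, "PIM": False, "Proliferation": False}
for drugs in [set(), {"PIM"}, {"PIM", "PI3K"}]:
    print(sorted(drugs), simulate(start, drugs)["Proliferation"])
```

Calibrating such a model to a specific cell line would mean adjusting its rules or input states to match that line's measured signaling, which is beyond this sketch.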

Session B-240: A large-scale compound-compound network based on bioactivity profile similarity
COSI: TransMed
  • Haeseung Lee, Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, South Korea
  • Wankyu Kim, Ewha Research Center for Systems Biology (ERCSB), Ewha Womans University, South Korea

Short Abstract: More than two million bioassay datasets for drug screening have recently become publicly available in PubChem (https://pubchem.ncbi.nlm.nih.gov), providing a wealth of information on the bioactivity profiles of millions of compounds. The bioactivity profile of a drug represents diverse layers of bioactivity that are collectively responsible for its mode of action and phenotypic outcome. Here, we constructed a large-scale compound network (CNET) based on bioactivity profile similarities among >370K compounds. We evaluated the utility of CNET for virtual screening and for the prediction of target profiles. In both applications, we took a "guilt-by-association" approach, in which two neighboring compounds in the network are expected to share common targets. In virtual screening, known ligands were predicted with high accuracy (mean AUC > 0.9) for >80% of the ~1,000 targets tested. In the prediction of target profiles for ~100K compounds, the top-ranking predicted target was among the known targets for >80% of the compounds tested. This suggests that compounds sharing common targets are densely connected in our network. We also discuss potential applications of this network in identifying the targets of uncharacterized compounds and the off-targets of approved drugs.
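The guilt-by-association step can be sketched on a toy network. Assumptions not taken from the abstract: bioactivity profiles are binary (active/inactive per assay), similarity is the Tanimoto coefficient, edges connect compounds whose similarity reaches a threshold, and a query compound's targets are ranked by the summed similarity of neighbors known to hit each target. Function names and target labels are hypothetical.

```python
import numpy as np
from collections import Counter

def tanimoto_matrix(profiles):
    """Pairwise Tanimoto similarity of binary (active/inactive) bioassay profiles."""
    P = np.asarray(profiles, dtype=float)
    inter = P @ P.T                           # assays where both compounds are active
    sizes = P.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    with np.errstate(invalid="ignore", divide="ignore"):
        sim = np.where(union > 0, inter / union, 0.0)
    np.fill_diagonal(sim, 0.0)                # no self-edges
    return sim

def predict_targets(sim, known_targets, query, threshold=0.5, top=3):
    """Rank targets for `query` by summed similarity of network neighbours that hit them."""
    votes = Counter()
    for j in np.flatnonzero(sim[query] >= threshold):
        for target in known_targets.get(int(j), ()):
            votes[target] += sim[query, j]
    return [t for t, _ in votes.most_common(top)]

profiles = [[1, 1, 1, 0, 0, 0],
            [1, 1, 1, 1, 0, 0],
            [0, 0, 0, 1, 1, 1],
            [0, 0, 1, 1, 1, 1]]
sim = tanimoto_matrix(profiles)
print(predict_targets(sim, {1: {"EGFR"}, 2: {"HDAC1"}}, query=0))
```

At the scale in the abstract (>370K compounds), the dense all-pairs matrix would be replaced by a sparse or nearest-neighbor construction.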

Session B-242: Mutational Landscape of the Transcriptome in Patients with Hematological Malignancies
COSI: TransMed
  • Fiorella Schischlik, CeMM, Research Center for Molecular Medicine, Austria

Session B-244: The Origins and Consequences of Tumour Hypoxia
COSI: TransMed
  • Vinayak Bhandari, University of Toronto & Ontario Institute for Cancer Research, Canada
  • Shadrielle Espiritu, Ontario Institute for Cancer Research, Canada
  • Lydia Liu, Ontario Institute for Cancer Research, Canada
  • Emilie Lalonde, University of Toronto & Ontario Institute for Cancer Research, Canada
  • Takafumi Yamaguchi, Ontario Institute for Cancer Research, Canada
  • Lawrence Heisler, Ontario Institute for Cancer Research, Canada
  • Julie Livingstone, Ontario Institute for Cancer Research, Canada
  • Vincent Huang, Ontario Institute for Cancer Research, Canada
  • Yu-Jia Shiah, Ontario Institute for Cancer Research, Canada
  • Veronica Sabelnykova, Ontario Institute for Cancer Research, Canada
  • Fouad Yousif, Ontario Institute for Cancer Research, Canada
  • Melvin Chua, Duke-NUS Graduate Medical School, Singapore
  • Michael Fraser, Ontario Institute for Cancer Research, Canada
  • Theodorus van der Kwast, University of Toronto, Canada
  • Paul Boutros, University of Toronto & Ontario Institute for Cancer Research, Canada
  • Robert Bristow, University of Toronto & Princess Margaret Hospital, Canada

Short Abstract: Introduction: Localised prostate cancers are classified into risk groups using clinical measurements such as grade and stage to inform treatment decisions. However, these groupings are imprecise: ~30% of intermediate-risk patients relapse despite precision image-guided radiotherapy or radical prostatectomy. One reason for this variability in treatment response is the underlying cellular and molecular heterogeneity of tumours. Prostate tumour cells exist within a microenvironment characterized by oxygen gradients, and prostate tumours with low oxygen levels (hypoxia) have poor clinical outcomes.

Methods and Results: To understand the correlates of hypoxia in cancer, we conducted a pan-cancer analysis of copy number alterations (CNAs) and single nucleotide variants (SNVs) across 18 cancer types. We measured hypoxia using multiple mRNA-based signatures and discovered numerous CNAs and SNVs enriched or depleted in hypoxic tumours, highlighting the role of hypoxia in shaping the genomic landscape of multiple tumour types. Next, we examined 548 patients with localised prostate cancer and statistically assessed the association of hypoxia with CNAs, SNVs, genomic rearrangements, focal genomic events (e.g., kataegis, chromothripsis), telomere length, clinical indices (e.g., grade, stage) and subclonal architecture. Tumour hypoxia is associated with specific CNAs and SNVs in prostate cancer driver genes. To translate these findings into a biomarker for prostate cancer precision medicine, we integrated tumour microenvironmental data with genomic and pathological information to stratify patients into distinct prognostic groups.

Impact: These data suggest that the aggressiveness of cancers is driven by the interplay of the tumour microenvironment and its genomic mutational profile.
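The two analysis steps above (score hypoxia from an mRNA signature, then test whether a CNA is enriched in hypoxic tumours) can be sketched as follows. Assumptions: each sample gets +1 for every signature gene above that gene's cohort median and -1 otherwise; samples above the median score are called hypoxic; enrichment is a one-sided Fisher's exact test on the 2×2 table. Gene names and function names are hypothetical, and the signatures actually used may be scored differently.

```python
import numpy as np
from math import comb

def signature_scores(expr, gene_names, signature):
    """Per-sample score: +1 for each signature gene above its cohort median, -1 otherwise.
    expr is genes x samples."""
    rows = [gene_names.index(g) for g in signature]
    sub = np.asarray(expr, dtype=float)[rows, :]
    med = np.median(sub, axis=1, keepdims=True)
    return np.where(sub > med, 1, -1).sum(axis=0)

def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value that cell a (hypoxic with CNA) is larger than expected."""
    n1, n2, k = a + b, c + d, a + c
    total = comb(n1 + n2, k)
    return sum(comb(n1, x) * comb(n2, k - x) for x in range(a, min(n1, k) + 1)) / total

def cna_enrichment(scores, cna_present):
    hypoxic = scores > np.median(scores)
    cna = np.asarray(cna_present, dtype=bool)
    a, b = int(np.sum(hypoxic & cna)), int(np.sum(hypoxic & ~cna))
    c, d = int(np.sum(~hypoxic & cna)), int(np.sum(~hypoxic & ~cna))
    return (a, b, c, d), fisher_greater(a, b, c, d)

genes = ["G1", "G2", "G3"]
expr = [[5, 6, 7, 1, 2, 3]] * 3
scores = signature_scores(expr, genes, genes)
print(scores, cna_enrichment(scores, [1, 1, 1, 0, 0, 1]))
```

In a genome-wide scan, one such test per CNA would then need a multiple-testing correction.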

Session B-246: Assessing spatiotemporal heterogeneity in cancer patients using alignment-free methods
COSI: TransMed
  • Aideen Roddy, Queen's University Belfast, United Kingdom
  • Anna Jurek, Queen's University, United Kingdom
  • Chris O'Neill, Queen's University Belfast, United Kingdom
  • Alexey Stupnikov, Queen's University Belfast, United Kingdom
  • Paul O'Reilly, Queen's University Belfast, United Kingdom
  • Peter Bankhead, Queen's University Belfast, United Kingdom
  • Philip Dunne, Queen's University Belfast, United Kingdom
  • David Gonzalez de Castro, Queen's University Belfast, United Kingdom
  • Kevin Prise, Queen's University Belfast, United Kingdom
  • Manuel Salto-Tellez, Queen's University Belfast, United Kingdom
  • Darragh McArt, Queen's University Belfast, United Kingdom

Short Abstract: Cancer research increasingly needs measures of intra-tumoural heterogeneity and of spatial and temporal evolution. Current approaches involve aligning and assembling sequencing data against a reference human genome. This is not only time-consuming and computationally intensive but can also lead to information loss, owing both to the subjective design of alignment algorithms and to the aneuploidy and heterogeneity present in tumour samples. Alignment-free sequence comparison is a methodology that has been widely applied to protein sequence comparison and to assessing evolutionary relationships between organisms. Here, we present an alignment-free method of building phylogenetic trees for assessing heterogeneity and therapeutic vulnerabilities in cancer cohorts. Reads are segmented into k-mers using a ‘sliding window’ approach to build a frequency distribution of the features present in each sample, and a dissimilarity measure is then used to compare sequences. Thus far, we have applied our algorithm to three datasets: a simulated dataset and two datasets of longitudinal patient samples from glioma and renal cancer. For these patients, somatic mutation-based phylogenetic trees are also available for direct comparison with current measures of evolution. In two of five cancer patients (one glioma and one renal), our trees produced the same branching pattern as the original trees, suggesting that they may be representative of tumour evolution. In the remaining three cases, our trees differed slightly from the original trees, suggesting that our approach may have captured a more aberrant landscape.
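The pipeline described above (sliding-window k-mer frequencies per sample, a dissimilarity between samples, then a tree) can be sketched in a few lines. Assumptions not stated in the abstract: the dissimilarity is the Euclidean distance between relative k-mer frequencies, and the tree is built by UPGMA (average linkage), written out without branch lengths. Sample names and function names are hypothetical.

```python
import math
from collections import Counter

def kmer_profile(reads, k):
    """Relative k-mer frequencies from a sliding window of width k over every read."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

def euclidean(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))

def upgma(labels, dmat):
    """Average-linkage (UPGMA) clustering; returns a Newick string without branch lengths."""
    clusters = [(lab, 1) for lab in labels]           # (newick, number of leaves)
    d = [row[:] for row in dmat]
    while len(clusters) > 1:
        n = len(clusters)
        i, j = min(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: d[ij[0]][ij[1]])
        (ti, si), (tj, sj) = clusters[i], clusters[j]
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the merged cluster is the size-weighted average of its parts
        new_row = [(si * d[i][k] + sj * d[j][k]) / (si + sj) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        clusters = [clusters[k] for k in keep] + [(f"({ti},{tj})", si + sj)]
    return clusters[0][0] + ";"

def alignment_free_tree(samples, k=3):
    """samples: dict name -> list of reads."""
    labels = list(samples)
    profiles = [kmer_profile(samples[s], k) for s in labels]
    dmat = [[euclidean(a, b) for b in profiles] for a in profiles]
    return upgma(labels, dmat)

print(alignment_free_tree({"primary": ["ACGTACGTACGT"], "relapse": ["ACGTACGTACGA"],
                           "other": ["TTTTGGGGCCCC"]}))
```

Real reads would use larger k and far more data; the structure of the computation stays the same.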

Session B-248: In Silico Drug Combination Discovery for Personalized Cancer Therapy
COSI: TransMed
  • Minji Jeon, Korea University, South Korea
  • Sunkyu Kim, Korea University, South Korea
  • Sungjoon Park, Korea University, South Korea
  • Heewon Lee, Korea University, South Korea
  • Wonjin Yoon, Korea University, South Korea
  • Jaewoo Kang, Korea University, South Korea

Short Abstract: Combination therapy, considered an alternative to single-drug therapy, can potentially reduce resistance and toxicity and show synergistic efficacy. Drug combination therapies are widely used in the clinic for hypertension, asthma, and AIDS, and they have also been proposed for the treatment of cancer. However, selecting and experimentally evaluating effective combinations is difficult, not only because the number of possible cancer drug combinations is extremely large but also because their effectiveness varies with the genetic variation of cancer patients. A computational approach that prioritizes the best drug combinations given a cancer patient's genetic information is therefore needed to reduce the search space. We propose an in silico method for personalized drug combination discovery that predicts the synergy of 583 drug combinations across 31 cancer cell lines. The main difficulty was the high dimensionality of the feature space: genomic features such as mutations and expression levels of more than 10,000 genes had to be reduced to a much smaller set. To address this, we reduced the feature dimension by using only the mutations or expression levels of genes in cancer-related pathways. Because many features remain even after this reduction, we aim to find the best combination of feature layers. We have 77 feature layers, such as gene mutations in the p53 pathway or expression levels of genes in the RTK signaling pathway. However, testing all possible combinations of feature layers would require considerable computing resources, so we constructed a high-throughput computing pipeline using HTCondor. The pipeline greedily selects feature layers that improve prediction performance, using Support Vector Regression (SVR) as the prediction model. We analyze the selection results and interpret which selected feature layers are important for predicting the synergism of drug combinations.
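The greedy feature-layer selection with an SVR model can be sketched with scikit-learn. Assumptions: each layer is a samples × features matrix, candidate layer sets are scored by mean cross-validated R², and selection stops when adding a layer no longer improves the score. This runs serially, whereas the pipeline described above distributes the candidate evaluations with HTCondor; layer names and the function name are hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

def greedy_layer_selection(layers, y, max_layers=3, cv=5):
    """Forward selection over feature layers (dict name -> samples x features array)."""
    selected, best = [], -np.inf
    while len(selected) < max_layers:
        scores = {}
        for name, X_layer in layers.items():
            if name in selected:
                continue
            # Each candidate is the already-selected layers plus one new layer
            X = np.hstack([layers[s] for s in selected] + [X_layer])
            scores[name] = cross_val_score(SVR(), X, y, cv=cv, scoring="r2").mean()
        if not scores:
            break
        name, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:
            break                                 # no layer improves the CV score
        selected.append(name)
        best = score
    return selected, best

rng = np.random.default_rng(0)
informative = rng.normal(size=(80, 2))
layers = {"p53_pathway_mut": informative,
          "rtk_pathway_expr": rng.normal(size=(80, 5))}
y = informative[:, 0] + 0.5 * informative[:, 1] + 0.1 * rng.normal(size=80)
print(greedy_layer_selection(layers, y))
```

Each inner evaluation is independent, which is what makes the search easy to farm out as separate jobs.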

Session B-250: CLIA-certified gene panel-based machine learning method to predict sensitivity of anticancer drugs
COSI: TransMed
  • Sunho Park, Lerner Research Institute, Cleveland Clinic Foundation, United States
  • Jeon Lee, UT Southwestern Medical Center, United States
  • Yunpeng Gao, UT Southwestern Medical Center, United States
  • Elizabeth Mcmillan, UT Southwestern Medical Center, United States
  • Zechen Chong, UT MD Anderson Cancer Center, United States
  • Ken Chen, UT MD Anderson Cancer Center, United States
  • Jae-Ho Cheong, Yonsei University College of Medicine, South Korea
  • John Minna, UT Southwestern Medical Center, United States
  • Michael White, UT Southwestern Medical Center, United States
  • Tae Hyun Hwang, Cleveland Clinic Foundation, United States

Short Abstract: CLIA-certified genetic panel testing offers the potential to identify individualized treatments that target specific genetic alterations. However, molecularly guided therapy is available only for the minority of cancer patients who carry such alterations for targeted drugs (e.g., ~15% of lung adenocarcinomas). Treatment options for the majority of cancer patients without such alterations remain limited. It is therefore critical to build models that use genetic panel results to predict drug sensitivity or resistance for individualized treatment stratification. To tackle this challenge, we developed a novel machine learning approach called Robust Bayesian Matrix Factorization (RBMF) that integrates genetic information on a large panel of cancer cell lines with large-scale drug/chemical compound screening profiles on the same cell lines to (a) discover genetic variation-based predictive biomarkers and (b) use them to predict drug response. The RBMF method leverages information across multiple related drug/chemical compound screening profiles with similar mechanisms of action or targets, as well as across samples with similar genetic variant profiles. In experiments with our institutional drug/chemical screening profiles (unpublished data) and SNVs present in a commercially available genetic panel such as FoundationOne, the RBMF method showed better prediction performance than state-of-the-art methods. Taken together, our results demonstrate the clinical utility of genetic panels for predicting drug response in cancer patients. Furthermore, the novel mutation-drug associations discovered by the RBMF method could provide unprecedented opportunities to develop clinical assays as predictive biomarkers, which could individualize treatment based on the genetic information of cancer patients.
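The core idea (tie each cell line's latent factors to its panel genotype so that a new sample with only panel results can be scored) can be illustrated with a plain regularized matrix factorization. This is explicitly not RBMF: it has no Bayesian inference and no robust noise model. It fits response ≈ (X W) Vᵀ on the observed entries by gradient descent, where X holds binary panel SNV calls and R is the cell line × drug response matrix. Function names and hyperparameters are hypothetical.

```python
import numpy as np

def fit_panel_factorization(X, R, mask, rank=3, lam=0.1, lr=0.1, epochs=500, seed=0):
    """Fit R ~ (X @ W) @ V.T on observed entries (mask == 1) by gradient descent.
    X: cell lines x panel features; R: cell lines x drugs; returns W, V and the loss history."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.normal(size=(X.shape[1], rank))   # panel features -> latent factors
    V = 0.1 * rng.normal(size=(R.shape[1], rank))   # drug latent factors
    m = mask.sum()
    losses = []
    for _ in range(epochs):
        U = X @ W                                   # cell-line factors driven by genotype
        E = mask * (U @ V.T - R)                    # residuals on observed entries only
        losses.append(0.5 * np.sum(E ** 2) / m + 0.5 * lam * (np.sum(W ** 2) + np.sum(V ** 2)))
        grad_W = X.T @ (E @ V) / m + lam * W
        grad_V = E.T @ U / m + lam * V
        W -= lr * grad_W
        V -= lr * grad_V
    return W, V, losses

def predict_response(x_panel, W, V):
    """Predicted drug responses for samples described only by their panel features."""
    return (x_panel @ W) @ V.T

rng = np.random.default_rng(1)
X = (rng.random((20, 8)) < 0.3).astype(float)
R = X @ rng.normal(size=(8, 2)) @ rng.normal(size=(2, 6))
mask = (rng.random(R.shape) < 0.8).astype(float)
W, V, losses = fit_panel_factorization(X, R, mask)
print(losses[0], losses[-1])
```

Because W maps genotype to latent factors, the learned weights can also be inspected for candidate mutation-drug associations, loosely mirroring aim (a).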

Session B-252: A comparative study on immune cell signature for Korean lung cancer patients
COSI: TransMed
  • Yelin Son, Ewha Womans University, South Korea
  • Sanghyuk Lee, Ewha Womans University, South Korea

Short Abstract: Lung cancer is the prime target of recent cancer immunotherapy, and the immune cell signature is an important predictor of immunotherapy response in lung cancer. Patient prognosis has been shown to be closely correlated with the abundance of T cells and B cells. MCP-counter (Microenvironment Cell Populations-counter) is a tool that estimates the abundance of eight immune cell types and two stromal cell types from transcriptome sequencing data based on deconvolution of the expression signature of each cell type. Here, we analyzed RNA-seq data from 197 Korean lung cancer patients and 506 TCGA LUAD patients. The abundance of each cell type was examined for associations with various patient characteristics, including race, smoking status, cancer subtype, and many clinical parameters. Our results suggest that patient-specific immune cell abundances deduced from RNA-seq data could be useful for predicting the prognosis of lung cancer patients and the effectiveness of immunotherapy.
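A minimal sketch of marker-based abundance scoring in the spirit of MCP-counter: each cell population is summarized per sample by the mean log-scale expression of a set of marker genes. The marker sets below are small illustrative placeholders, not MCP-counter's actual curated markers, and the function name is hypothetical.

```python
import numpy as np

def marker_scores(log_expr, genes, markers):
    """log_expr: genes x samples (log-scale); markers: dict cell type -> marker gene list.
    Returns a per-sample abundance score for each cell type (mean of measured markers)."""
    idx = {g: i for i, g in enumerate(genes)}
    out = {}
    for cell_type, marker_genes in markers.items():
        rows = [idx[g] for g in marker_genes if g in idx]   # skip unmeasured markers
        out[cell_type] = (log_expr[rows].mean(axis=0) if rows
                          else np.full(log_expr.shape[1], np.nan))
    return out

genes = ["CD3E", "CD3D", "CD19", "MS4A1"]
log_expr = np.array([[2.0, 4.0], [4.0, 6.0], [1.0, 1.0], [3.0, 5.0]])
print(marker_scores(log_expr, genes, {"T cells": ["CD3E", "CD3D"], "B lineage": ["CD19", "MS4A1"]}))
```

Scores like these are comparable across samples for the same population, which is what association tests with clinical variables such as smoking status or subtype require; they are not absolute cell fractions.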

Session B-254: Unraveling the Collective Diagnostic Power Behind the Features in the Autism Diagnostic Observation Schedule
COSI: TransMed
  • Harkirat Bhullar, University of Saskatchewan, Canada
  • Kimberly MacKay, University of Saskatchewan, Canada
  • Katie Ovens, University of Saskatchewan, Canada
  • Anthony Kusalik, University of Saskatchewan, Canada

Short Abstract: Background: Autism is a group of heterogeneous disorders defined by deficits in social interaction and communication. Diagnosis typically depends on the results of a behavioural examination called the Autism Diagnostic Observation Schedule (ADOS). Unfortunately, administering the ADOS exam is time-consuming and requires a significant amount of expert involvement, leading to delays in diagnosis and in access to early intervention programs. The diagnostic power of each feature in the ADOS exam is currently unknown. Our hypothesis is that certain features could be removed from the exam without a significant reduction in diagnostic accuracy, sensitivity or specificity. Objective: Determine the smallest subset of predictive features in ADOS module-1 (an exam variant for patients with minimal verbal skills). Methodology: ADOS module-1 datasets were acquired from the Autism Genetic Resource Exchange and the National Database for Autism Research. The datasets contained 2572 samples with the following labels: autism (1763), autism spectrum (513), and non-autism (296). The datasets were used as input to four different cost-sensitive classifiers in Weka (functional trees, LADTree, logistic model trees, and PART). For each classifier, 10-fold cross-validation was performed, and the number of predictive features, accuracy, sensitivity, and specificity were recorded. Results & Conclusion: Each classifier reduced the number of ADOS features required for autism diagnosis. The LADTree classifier achieved the largest reduction, using only 10 of 29 ADOS module-1 features (96.8% accuracy, 96.9% sensitivity, and 95.9% specificity). Overall, these results are a step towards a more efficient behavioural exam for autism diagnosis.
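Two pieces of the evaluation above can be made concrete: the minimum-expected-cost decision rule that underlies cost-sensitive classification, and the accuracy, sensitivity and specificity computed from predictions. Assumptions for illustration: a binary task (class 1 = autism or autism spectrum, class 0 = non-autism) and a made-up cost matrix; the study used Weka's classifiers rather than this code, and the function names are hypothetical.

```python
import numpy as np

def min_expected_cost(probs, cost):
    """Pick, per sample, the class with the lowest expected misclassification cost.
    cost[i][j] = cost of predicting class j when the true class is i."""
    expected = np.asarray(probs) @ np.asarray(cost)   # (n_samples, n_classes)
    return expected.argmin(axis=1)

def diagnostic_metrics(y_true, y_pred, positive=1):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    return {"accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

# Missing an autism case (true 1, predicted 0) is assumed to cost 5x a false alarm,
# so a sample with only a 30% autism probability is still labelled autism (class 1).
print(min_expected_cost([[0.7, 0.3]], [[0, 1], [5, 0]]))
print(diagnostic_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Weighting errors this way is one reason cost-sensitive learners suit imbalanced data such as this cohort, where non-autism samples are the minority.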

Session B-256: Computational proteogenomic identification of translated fusions and micro structural variations in cancer
COSI: TransMed
  • Yen Yi Lin, School of Computing Science, Simon Fraser University, Canada
  • Alex Gawronski, School of Computing Science, Simon Fraser University, Canada
  • Faraz Hach, Vancouver Prostate Centre; School of Computing Science, Simon Fraser University, Canada
  • Sujun Li, School of Informatics and Computing, Indiana University, Bloomington, United States
  • Ibrahim Numanagic, School of Computing Science, Simon Fraser University, Canada
  • Iman Sarrafi, School of Computing Science, Simon Fraser University, Canada
  • Swati Mishra, Department of Surgery, Indiana University, School of Medicine, United States
  • Andrew McPherson, School of Computing Science, Simon Fraser University, Canada
  • Colin Collins, Vancouver Prostate Centre, Canada
  • Milan Radovich, Department of Surgery, Indiana University, School of Medicine, United States
  • Haixu Tang, School of Informatics and Computing, Indiana University, Bloomington, United States
  • Cenk Sahinalp, School of Informatics and Computing, Indiana University, Bloomington, United States

Short Abstract: Rapid advances in high-throughput sequencing (HTS) and mass spectrometry (MS) technologies have enabled the acquisition of genomic, transcriptomic and proteomic data from the same patient. Notably, The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC) provide proteogenomic characterization of colorectal and breast cancer patients, but they focus primarily on single amino acid variants and novel isoforms and do not include genomic aberrations frequently observed in specific cancer types, such as fusions, inversions, and duplications. When these aberrations occur in coding regions, they can generate novel proteins with a high impact on cancer phenotypes. Here we introduce a computational framework that integratively analyzes all three types of omics data to obtain a complete molecular profile for a patient, especially translated fusions and micro structural variants (microSVs). Our framework includes MiStrVar, an algorithm to identify microSVs from WGS data. Coupled with the fusion detection tool deFuse, our framework first detects structurally aberrant transcripts in cancer samples. Given the obtained breakpoints, our pipeline identifies all relevant peptides spanning the breakpoint junctions and matches them with unique proteomic signatures in the respective proteomics data sets. We applied our framework to 105 TCGA/CPTAC breast cancer patients with both NGS and proteomics data and detected 244 novel peptides from 432 candidate aberrations. Many of these aberrations are private and have not been reported before. In addition, the genes most significantly enriched among these translated aberrations are cancer-related genes.
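The step "identify all relevant peptides spanning the breakpoint junctions" can be sketched for a single fusion protein. Assumptions: the fused amino acid sequence and the junction position (index of the first residue contributed by the 3' partner) are already known; digestion follows the common trypsin rule (cleave after K or R, but not before P) with no missed cleavages; very short peptides are dropped. Function names are hypothetical, and the sketch does not cover the matching against spectra.

```python
def tryptic_peptides(seq):
    """In-silico trypsin digest: cleave after K/R unless followed by P.
    Returns (start_index, peptide) pairs with no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            peptides.append((start, seq[start:i + 1]))
            start = i + 1
    if start < len(seq):
        peptides.append((start, seq[start:]))
    return peptides

def junction_peptides(fusion_seq, junction, min_len=6):
    """Peptides containing residues from both sides of the fusion junction."""
    return [p for s, p in tryptic_peptides(fusion_seq)
            if s < junction < s + len(p) and len(p) >= min_len]

five_prime, three_prime = "MAGSTLKEEVAG", "TLLPDWSKAPR"
print(junction_peptides(five_prime + three_prime, junction=len(five_prime)))
```

Peptides found this way are unique to the aberrant protein, which is what makes a spectral match to them evidence that the fusion is translated.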

Session B-258: Phrase Mining and Machine Learning in Textual Data to Uncover Distinct Protein Patterns in Cardiovascular Disease
COSI: TransMed
  • Vincent Kyi, HeartBD2K Center at UCLA, United States
  • David Liem, HeartBD2K Center at UCLA, United States
  • Yu Shi, KnowEnG Center, United States
  • Fangbo Tao, KnowEnG Center, United States
  • Jiawei Han, KnowEnG Center, United States
  • Peipei Ping, HeartBD2K Center at UCLA, United States

Short Abstract: Over the past decades, a large body of textual data on cardiovascular disease (CVD) has accumulated, but the capacity of free-text analytical tools has not kept pace, and much of this information is poorly analyzed or neglected altogether. To address these big data science challenges, we developed a text-mining and machine learning algorithm that dissects textual data on CVD and identifies protein patterns in datasets, to uncover mechanistic insights and complement predictive analytics. We applied a novel phrase mining workflow, Context-aware Semantic Online Analytical Processing (CaseOLAP), to recognize patterns in six CVD datasets defined by MeSH terms: cerebrovascular accidents (CVA), cardiomyopathies (CM), ischemic heart diseases (IHD), arrhythmias (ARR), valvular heart disease (VHD) and congenital heart disease (CHD). We analyzed the patterns of 8,325 heart-related proteins across 1.1 million PubMed publications (1995-2016). Of these 8,325 proteins, only a subset had a high CaseOLAP score, indicating high relevance to CVD, with the largest scores in IHD, CM and CVA. We identified six high-scoring protein clusters, each unique to one CVD group. A principal component analysis (PCA) indicated that IHD, CVA and CM showed distinct protein scoring patterns, while CHD, VHD and ARR clustered together and were more similar. We identified 10 protein clusters shared between two or more CVD groups, with key biological functions in inflammation, contractility, blood coagulation, hemodynamic regulation, cytoskeletal organization and neurotransmission. Inflammatory proteins appeared to be relevant in all CVDs, while proteins involved in neurotransmission and memory processing were relevant in CVA, ARR, VHD and CHD.

Session B-260: HD Proteome Base: A Novel Data Repository for Proteomics of a Huntington’s Disease Mouse Model
COSI: TransMed
  • Phillip Ihmor, Evotec GmbH, Germany
  • Christoph Schaab, Evotec GmbH, Germany
  • Andreas Tebbe, Evotec GmbH, Germany
  • Manuela Machatti, Evotec GmbH, Germany
  • Martin Klammer, Evotec GmbH, Germany
  • Tao Xu, Evotec GmbH, Germany
  • Jim Rosinski, CHDI Foundation Inc., United States
  • Ignacio Munoz-Sanjuan, CHDI Foundation Inc., United States
  • Thomas Vogt, CHDI Foundation Inc., United States
  • Daniel Lavery, CHDI Foundation Inc., United States
  • Jeffrey S. Aaronson, CHDI Foundation Inc., United States

Short Abstract: Background: Huntington’s disease (HD) is a hereditary neurodegenerative disease caused by an abnormal expansion of a glutamine stretch (polyQ) in the Huntingtin protein (HTT). Extensive research efforts aim to understand the molecular mechanisms altered in the presence of aberrant HTT. To this end, we compared the proteomes of wild-type and heterozygous Huntingtin knock-in mice with increasing CAG repeat lengths in a number of brain regions and peripheral tissues at three ages. Methods: A large-scale proteomic analysis was conducted on brain regions (striatum, cortex, cerebellum, and hippocampus) and peripheral tissues (liver, muscle, and heart) from wild-type and heterozygous Huntingtin knock-in mice with increasing CAG repeat lengths, aged 2, 6, and 10 months. A user-friendly web interface was created for storing and querying the results. Results: The analysis of more than 1,200 tissue samples, with on average 8,000 quantified proteins, constitutes one of the largest global quantitative proteomics studies published so far. More importantly, it enables a systematic analysis of pathways and interaction networks at the protein level and the identification of novel target candidates, and it provides a comprehensive resource for training systems biology models. Here, we describe a novel data repository, HD Proteome Base, which stores and displays the quantitative proteome profiles collected in this large-scale study. The repository will be publicly accessible through a web portal that allows researchers to query proteins and visualize their expression across the CAG repeat length series, across brain regions and across peripheral tissues.
Conclusions: Our novel, publicly available HD Proteome Base allows researchers worldwide to access and use this comprehensive resource and thereby contribute to research efforts toward a better understanding of huntingtin biology.

Session B-262: Gene expression of follicular thyroid adenomas and carcinomas
COSI: TransMed
  • Aleksandra Pfeifer, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Bartosz Wojtas, Laboratory of Molecular Neurobiology, Neurobiology Center, Nencki Institute of Experimental Biology, Poland
  • Malgorzata Oczko-Wojciechowska, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Jolanta Krajewska, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Agnieszka Czarniecka, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Aleksandra Kukulska, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Markus Eszlinger, Department of Oncology & Arnie Charbonneau Cancer Institute, Cumming School of Medicine, University of Calgary, Canada
  • Thomas Musholt, Department of General, Visceral, and Transplantation Surgery, University Medical Center of the Johannes Gutenberg University, Germany
  • Tomasz Stokowy, Department of Clinical Science, University of Bergen, Norway
  • Michal Swierniak, Genomic Medicine, Department of General, Transplant, and Liver Surgery, Medical University of Warsaw, Poland
  • Ewa Stobiecka, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Ewa Chmielik, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Dagmara Rusinek, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Tomasz Tyszkiewicz, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Monika Halczok, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Steffen Hauptmann, Department of Pathology, Martin Luther University Halle-Wittenberg, Germany
  • Dariusz Lange, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Michal Jarzab, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland
  • Ralf Paschke, Division of Endocrinology, Departments of Medicine, Biochemistry & Molecular Biology, and Oncology, and Arnie Charbonneau Cancer Institute, Cumming School of Medicine, University of Calgary, Canada
  • Barbara Jarzab, Maria Sklodowska-Curie Institute – Oncology Center, Gliwice Branch, Poland

Short Abstract: Follicular thyroid adenomas (FTA) and follicular thyroid carcinomas (FTC) are two types of thyroid tumors that are difficult to distinguish histopathologically. Diagnoses, even by experienced pathologists, are often equivocal. There is therefore a need for molecular classifiers that may aid the differential diagnosis of these tumors. The aim of our study was to select genes differentially expressed between FTC and FTA and to build a gene expression classifier that distinguishes FTC from FTA. We performed two distinct analyses of high-throughput gene expression data. First, we performed a microarray experiment on 52 thyroid tumors and selected genes differentially expressed between FTC and FTA (Student’s t-test). Second, we performed a meta-analysis of 14 papers in which follicular thyroid tumors were analyzed by high-throughput gene expression methods, and identified 57 genes reported as differentially expressed in two or more of these papers. From both analyses we selected 18 genes for validation. We performed validation in an independent dataset of 71 FFPE thyroid tumors and positively validated 7 genes (Student’s t-test, FDR-adjusted p-value < 0.05). Further, we built a diagonal linear discriminant analysis (DLDA) classifier based on the expression of 4 genes (CPQ, PLVAP, TFF3, ACVRL1). In the FFPE dataset, the classifier distinguishes FTC from FTA with an accuracy of 78%, a sensitivity of 76% and a specificity of 80%. In conclusion, we identified and positively validated 7 genes differentially expressed between FTC and FTA. We also created a 4-gene classifier that achieves 78% accuracy in FTC/FTA classification and may potentially support histopathological diagnosis. Acknowledgment: This study was supported by the Polish National Center of Research and Development MILESTONE project - molecular diagnostics and imaging in individualized therapy for breast, thyroid and prostate cancer (grant No. STRATEGMED2/267398/4/NCBR/2015).
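A DLDA classifier of the kind named above can be sketched in a few lines. This is a minimal illustration assuming equal class priors and pooled within-class variances per feature (the textbook DLDA rule); the function names and toy feature values are hypothetical and do not reproduce the authors' model or data. The metric helper shows how accuracy, sensitivity and specificity relate to the confusion counts.

```python
from statistics import mean


def fit_dlda(X, y):
    """Fit diagonal LDA: class means plus pooled within-class variance per feature."""
    classes = sorted(set(y))
    n_features = len(X[0])
    means = {
        c: [mean(row[j] for row, label in zip(X, y) if label == c) for j in range(n_features)]
        for c in classes
    }
    variances = []
    for j in range(n_features):
        ss = sum((row[j] - means[label][j]) ** 2 for row, label in zip(X, y))
        variances.append(ss / (len(X) - len(classes)))
    return classes, means, variances


def predict_dlda(model, x):
    """Assign x to the class with the smallest variance-scaled squared distance (equal priors)."""
    classes, means, variances = model

    def distance(c):
        return sum((x[j] - means[c][j]) ** 2 / variances[j] for j in range(len(x)))

    return min(classes, key=distance)


def sensitivity_specificity_accuracy(y_true, y_pred, positive):
    """Compute metrics, treating `positive` (e.g. "FTC") as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


# Invented two-feature expression values, not real CPQ/PLVAP/TFF3/ACVRL1 data.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.7], [2.8, 0.3]]
y = ["FTA", "FTA", "FTA", "FTC", "FTC", "FTC"]
model = fit_dlda(X, y)
print(predict_dlda(model, [2.9, 0.6]))  # FTC
```

Ignoring feature covariances keeps the number of estimated parameters small, which is one common argument for DLDA when samples are few.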

Session B-264: Genomic instability phenotypes in multidimensional genomic cancer studies
COSI: TransMed
  • Brittany Lasseigne, HudsonAlpha Institute for Biotechnology, United States
  • Sara Cooper, HudsonAlpha Institute for Biotechnology, United States
  • Greg Cooper, HudsonAlpha Institute for Biotechnology, United States
  • Richard Myers, HudsonAlpha Institute for Biotechnology, United States

Short Abstract: The ability to infer molecular signatures from various genomic data sets can help prioritize future experiments and provide biological insight into disease-associated pathways and mechanisms. In particular, chromosomal instability (CIN, altered chromosome number and structure) and the CpG island methylator phenotype (CIMP, widespread altered promoter methylation) represent distinct cancer etiologies and chemotherapy responses. These genomic instabilities carry overlapping information across data types, because gross alterations in one feature set result in consistent changes in others (e.g. CIN is linked to widespread DNA hypomethylation and characteristic gene expression changes). In this proof-of-concept study in kidney cancer (n=291), we compared several definitions of CIN (total number of breakpoints, number of genes with significant CNV, percent of bases with CNV, or total functional aneuploidy) and of DNA methylation instability (the raw number of methylated CpGs, the percent of CpGs methylated in CpG islands, shores or non-islands, and the density of methylated CpGs relative to non-methylated CpGs), characterizing their relationship to each other and to clinical phenotypes such as patient survival. Our results suggest that these metrics capture distinct biological properties to a considerable extent. For example, we show that CIN better predicts long-term surviving patients (p=0.0559), while DNAm instability better predicts non-surviving patients (p=0.00017). Additionally, we predict both CIN (ROC AUC=0.866) and DNAm instability (ROC AUC=0.8817) from gene expression data. Our findings will facilitate the prioritization of experiments in future studies, improve the interpretation of these instability signatures for both basic biology and clinical use, and allow their inference from each of the major genomic data types.
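Two of the CIN definitions listed above (total number of breakpoints and percent of bases with CNV) can be computed directly from copy-number segments. The sketch below assumes non-overlapping segments from a standard segmentation step and a hypothetical |log2 ratio| cutoff of 0.2 for calling a segment altered; the abstract does not give the thresholds the authors used.

```python
from collections import defaultdict


def cin_metrics(segments, threshold=0.2):
    """Two simple CIN summaries from copy-number segments.

    segments: iterable of (chrom, start, end, log2_ratio), half-open [start, end),
    assumed non-overlapping (typical segmentation output).
    threshold: |log2 ratio| above which a segment counts as altered (assumed value).
    Returns (number of breakpoints, fraction of segmented bases that are altered).
    """
    segments_per_chrom = defaultdict(int)
    altered_bases = 0
    total_bases = 0
    for chrom, start, end, log2_ratio in segments:
        segments_per_chrom[chrom] += 1
        length = end - start
        total_bases += length
        if abs(log2_ratio) > threshold:
            altered_bases += length
    # A chromosome split into n segments contributes n - 1 internal breakpoints.
    breakpoints = sum(n - 1 for n in segments_per_chrom.values())
    return breakpoints, altered_bases / total_bases


# Toy segments (positions in bases), not real tumour data.
segs = [("1", 0, 100, 0.0), ("1", 100, 150, 0.5), ("1", 150, 300, 0.0), ("2", 0, 200, -0.4)]
print(cin_metrics(segs))  # (2, 0.5)
```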

Session B-266: Application of marker gene expression profiles for estimation of relative cellular abundance in human and mouse brain tissue
COSI: TransMed
  • Lilah Toker, University of British Columbia, Canada
  • Ogan Mancarci, UBC, Canada
  • Beryl Zhuang, UBC, Canada
  • Shreejoy Tripathy, UBC, Canada
  • Paul Pavlidis, University of British Columbia, Canada

Short Abstract: Neuropsychiatric and neurodevelopmental disorders are characterized by genetic heterogeneity. High-throughput methods, such as microarrays and RNA-seq, are often used to study the underlying biological mechanisms and pathways. A major challenge in interpreting these studies is understanding the biological impact of the identified genes. Researchers often struggle with questions such as: which cells express the affected genes? Are the observed changes cell type specific? Furthermore, it is unclear what part of the transcriptional pattern is driven by changes in cell type densities (e.g., cell death or neuroinflammation, often reported in subjects with neuropsychiatric disorders). Currently, these questions are difficult to answer because cellular expression patterns are largely unknown for the majority of genes in the brain. Moreover, cell type specific changes are likely to be overlooked if the cell type is rare in the examined tissue. To address these issues, we created NeuroExpresso, a database of gene expression data from 36 major brain cell types. The database contains high-throughput expression data from both neuronal and glial cell types spanning all major brain regions. We used the expression data to identify reliable cellular marker genes, and used these marker genes to infer changes in specific cell populations from bulk tissue data for various neuropsychiatric and neurodevelopmental disorders, including bipolar disorder, schizophrenia, autism, Alzheimer’s and Parkinson’s disease. Altogether, we show that marker gene profiles can be used to gain a better understanding of brain-related disorders and to facilitate the analysis and interpretation of bulk tissue data.
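The idea of inferring relative cell-type abundance from marker genes in bulk data can be illustrated with a deliberately simple summary: the per-sample average of z-scored marker gene expression. This is an assumption-laden stand-in (the published marker gene profile method may use a different summary, such as a principal-component-based one), and the expression values below are invented.

```python
from statistics import mean, pstdev


def marker_gene_profile(expression, markers):
    """Per-sample relative abundance estimate for one cell type.

    expression: dict mapping gene -> list of bulk-tissue values (one per sample).
    markers: marker genes for the cell type; genes absent from `expression`
    or with zero variance across samples are skipped.
    """
    zscores = []
    for gene in markers:
        values = expression.get(gene)
        if not values:
            continue
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue
        # Standardize so that highly expressed markers do not dominate the average.
        zscores.append([(v - mu) / sd for v in values])
    if not zscores:
        raise ValueError("no usable marker genes")
    n_samples = len(zscores[0])
    return [mean(z[i] for z in zscores) for i in range(n_samples)]


# Invented values for three bulk samples; GFAP and AQP4 serve as astrocyte markers.
expr = {"GFAP": [1, 2, 3], "AQP4": [2, 4, 6], "SNAP25": [5, 5, 6]}
profile = marker_gene_profile(expr, ["GFAP", "AQP4"])
```

The result is a relative score (higher means more of the cell type's signal), not an absolute cell count, which matches the "relative cellular abundance" framing of the poster.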

Session B-268: Clinical relevance of TCF/LEF transcription factors from the viewpoint of public bioinformatical resources
COSI: TransMed
  • Dušan Hrčkulák, Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Vídeňská 1083, 14220 Prague 4, Czech Republic
  • Michal Kolář, Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Vídeňská 1083, 14220 Prague 4, Czech Republic
  • Hynek Strnad, Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Vídeňská 1083, 14220 Prague 4, Czech Republic
  • Vladimír Kořínek, Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Vídeňská 1083, 14220 Prague 4, Czech Republic

Short Abstract: T-cell factor/lymphoid enhancer-binding factor (TCF/LEF) proteins (TCFs) from the High Mobility Group (HMG) box family act as the main downstream effectors of the Wnt signaling pathway. The mammalian TCF/LEF family comprises four nuclear factors designated TCF7, LEF1, TCF7L1, and TCF7L2. The proteins share common structural features and are often expressed in overlapping patterns, implying redundancy. Such redundancy was indeed observed in gene targeting studies; however, individual family members also exhibit unique features that are not recapitulated by the related proteins. In this viewpoint, we summarize current knowledge about the specific features of individual TCFs, namely structure-function studies, posttranslational modifications, interacting partners, and phenotypes obtained upon gene targeting in the mouse. To do so, we employ several publicly available databases and bioinformatic approaches to evaluate the expression patterns and the production of gene-specific isoforms of the TCF/LEF family members in human cells and tissues. The samples included mRNAs isolated from healthy adult tissues, cancer and leukemia cell lines, and colorectal carcinoma (CRC) specimens. Multiple TCF/LEF isoforms, generated either by alternative promoter usage or by mRNA splicing, were found in various healthy tissues, cells and CRCs. Strikingly, an increased abundance of TCF7 and LEF1 isoforms encoding dominant-negative proteins was observed in cancer specimens.

Session B-270: Investigation of meningioma samples by FTICR mass spectrometry
COSI: TransMed
  • Konstantin Bocharov, Moscow Institute of Physics and Technology, Institute of Energy Problems of Chemical Physics of RAS, Russia
  • Anatoly Sorokin, Moscow Institute of Physics and Technology, Russia
  • Vsevolod Shurhkhay, Moscow Institute of Physics and Technology, N. N. Burdenko Scientific Research Neurosurgery Institute, Russia
  • Igor Popov, Moscow Institute of Physics and Technology, Institute of Energy Problems of Chemical Physics of RAS, Russia
  • Evgeny Zhvansky, Moscow Institute of Physics and Technology, Russia
  • Nikita Levin, Moscow Institute of Physics and Technology, Russia
  • Alexander Potapov, N. N. Burdenko Scientific Research Neurosurgery Institute, Russia
  • Evgeny Nikolaev, Moscow Institute of Physics and Technology, Institute of Energy Problems of Chemical Physics of RAS, Skolkovo Institute of Science and Technology, Russia

Short Abstract: Rapid and reliable identification of tumor tissue during surgery is a challenging problem, especially in neurosurgery, where the tissue is formless and precise determination of the tumor/normal tissue border is vital. We recently presented a tissue profiling method that combines electrospray ionization with liquid extraction of phospholipids directly from tumor tissue, yielding a mass spectrometric profile in less than a minute [1]. For rapid tumor type identification, we proposed a method based on searching a reference database containing mass spectra of samples characterized by histopathological methods. To validate the tissue profiling results, we used high-resolution mass spectrometry to identify the features of meningioma molecular profiles that distinguish them from healthy brain tissue. Tumor samples were collected from brain tumor tissue dissected during neurosurgical operations at the N. N. Burdenko Scientific Research Neurosurgery Institute. Unmodified brain tissue samples were collected at the N. N. Burdenko Scientific Research Neurosurgery Institute during surgery of patients with drug-resistant epilepsy. A new spray-from-tissue ambient ion source [1], in which liquid extraction is immediately followed by ionization, was used to profile the tumor samples. For classification of tumor samples by lipid profile, we created a database containing results from more than 100 brain tissue samples. High-resolution mass spectrometry data were obtained on a Thermo LTQ FT Ultra mass spectrometer. To identify potential biomarkers of meningioma, the molecular profiles of 28 meningioma samples were compared with those of nine unmodified brain samples. A number of features distinguishing tumor from healthy brain samples were identified and compared with the literature. The study was supported by Russian Scientific Foundation grant #16-15-10431.
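A reference-database search for tumor type identification could, for example, rank stored spectra by cosine similarity to the query spectrum. The sketch below assumes fixed-width m/z binning and cosine similarity; the abstract does not specify the similarity measure, binning, or preprocessing, and the peak lists are invented.

```python
import math
from collections import defaultdict


def binned(spectrum, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    vector = defaultdict(float)
    for mz, intensity in spectrum:
        vector[int(mz // bin_width)] += intensity
    return vector


def cosine(a, b):
    """Cosine similarity of two sparse bin -> intensity vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, reference_db, bin_width=1.0):
    """Return (similarity, label) of the most similar reference spectrum.

    reference_db: list of (label, [(mz, intensity), ...]) from histologically
    characterized samples.
    """
    q = binned(query, bin_width)
    return max((cosine(q, binned(spectrum, bin_width)), label) for label, spectrum in reference_db)


# Invented peak lists, for illustration only.
refs = [
    ("meningioma", [(760.6, 100.0), (782.6, 50.0)]),
    ("normal brain", [(885.5, 100.0), (760.6, 10.0)]),
]
score, label = best_match([(760.58, 90.0), (782.59, 45.0)], refs)
```

Cosine similarity ignores overall intensity scale, which is convenient when spray-from-tissue signal strength varies between samples.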

Session B-272: Knowledge extraction using gene networks for cancer precision medicines
COSI: TransMed
  • Aurélie Martin, IPSEN Innovation, France
  • Laurent Naudin, IPSEN Innovation, France
  • Sébastien Tourlet, IPSEN Innovation, France

Short Abstract: Multi-scale networks have become a key approach for understanding systems of interacting genes/proteins, connecting diverse phenomena ranging from the molecular scale to clinical descriptors. They are central to the discovery of potential therapeutic targets as well as to the identification of candidate biomarkers for patient stratification. This database is the result of a methodological exercise to automate a pipeline running from the cleaning of raw data to the analysis of biological meaning by graph theory on a multi-scale biological network. Furthermore, we fed the platform with curated heterogeneous datasets, pre-clinical and clinical, including molecular and phenotypic information. We identified about 6000/7000 co-regulated genes, split into: driver genes (OMIM® database), gene co-expression network (GEO, ArrayExpress), gene biological functions (Gene Ontology), protein interactions (GeneGo MetaCore, HPRD, IntAct, DIP, REACTOME) and cellular pathways. In conclusion, this multi-scale network was used to predict drug rankings in cancer precision medicine.

Session B-274: Predictive ranking of drug efficiency in cancer precision medicine using topological features and association rules in context dependent gene networks
COSI: TransMed
  • Aurélie Martin, IPSEN Innovation, France
  • Frédéric Parmentier, Ariana Pharma, France
  • Sébastien Tourlet, IPSEN Innovation, France
  • Laurent Naudin, IPSEN Innovation, France

Short Abstract: The number of antineoplastic and immunomodulatory agents approved by the FDA and other agencies has risen greatly during the last decade. Although this shows a positive trend for therapeutic innovation in oncology, it is also a sign of the immaturity of the field. The number of commercially available anti-cancer drugs is approaching 500, and yet treatment success remains mixed: low efficacy, tumor escape from treatment and adverse events remain issues that need to be tackled systematically to improve the success rate of cancer treatments. To address these issues, we developed a framework to rank anticancer drugs in a context-dependent manner: using lists of drug targets and relevant cancer drivers, we propose an integrated method that ranks drugs based on multi-layer gene networks and association rule mining. Our method relies on two complementary scores to assess the likelihood that a drug is effective in a given context. We show that this method ranks drugs coherently in different contexts, either for a primary tumor or for a recurring tumor that is escaping treatment.
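Association rule mining scores rules of the form A → C using support, confidence and lift. The sketch below computes these standard measures; how the authors build their transactions is not described, so here each transaction is hypothetically the set of genes altered in one tumour sample, and the gene sets are invented.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    transactions: list of sets. Hypothetically, each set holds the genes
    altered in one tumour sample; the abstract does not specify the encoding.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= t for t in transactions)         # transactions containing A
    n_c = sum(c <= t for t in transactions)         # transactions containing C
    n_ac = sum((a | c) <= t for t in transactions)  # transactions containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift > 1 means A and C co-occur more often than expected if independent.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


tx = [{"EGFR", "KRAS"}, {"EGFR", "KRAS", "TP53"}, {"EGFR"}, {"TP53"}]
support, confidence, lift = rule_metrics(tx, ["EGFR"], ["KRAS"])
```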

Session B-276: Selection of stable biomarker signature for prediction of metabolic phenotypes
COSI: TransMed
  • Jelena Čuklina, ETH Zurich, Switzerland
  • Yibo Wu, ETH Zurich, Switzerland
  • Evan Williams, ETH Zurich, Switzerland
  • María Rodríguez-Martínez, IBM Zurich Research Laboratory, Switzerland
  • Ruedi Aebersold, ETH Zurich, Switzerland

Short Abstract: In biomarker research, the goal is to construct an accurate prediction rule from a small number of predictors, so that it is practical in the clinical setting. Such a rule is typically used to determine the presence of disease/malignancy, to stratify patients into subtypes or to select an optimal treatment option. Advances in high-throughput data generation have dramatically expanded the search space for biomarker discovery, making the selection of an optimal biomarker signature from large and noisy datasets challenging. In this project, we aim to benchmark the procedure of biomarker signature selection, using measurements of 35 murine strains from the BXD genetic reference strain panel [1]. Animals of each strain were fed either a high-fat or a chow diet [2], yielding 70 samples in total. We use 2100 liver proteins measured with SWATH-MS to predict seven metabolism-related continuous phenotypic traits: body weight, fat mass, lean mass, blood glucose level, insulin level, body temperature during the cold test, and respiration volume. Some of these traits are strongly linked to liver biochemistry (fat, glucose), and some are, to our knowledge, independent of it (e.g. body temperature). We then select a biomarker signature of a few proteins to predict each phenotype, using several feature selection approaches. The ability of the model to generalize indicates whether a model derived from the experiment can ultimately be translated to the clinical setting. For each trait, we select the best features multiple times using cross-validation. If the signatures from the different samplings are similar (i.e. the signature is stable), the joint signature can be generalized to an independent population. Using various metrics, we estimate the stability of biomarker signatures for each feature selection approach. We show that the stability of a signature depends heavily on the relevance of the molecular profile to the phenotype it aims to predict.
For our dataset, the biomarker signature derived from the liver proteome is stable for fat mass and glucose level, which are traits related to liver metabolism, and highly unstable for body temperature. Thus, stability can be used to assess whether a biomarker signature is in principle identifiable from a given dataset.
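The stability idea can be sketched as follows: select a signature on each cross-validation training split, then average the pairwise Jaccard similarity of the selected feature sets (1.0 means the same signature every time). Ranking features by absolute Pearson correlation is only a placeholder for the "several feature selection approaches", and Jaccard is one possible choice among the "various metrics"; neither is claimed to be the authors' exact procedure.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def top_k_features(X, y, k):
    """Placeholder selector: indices of the k features with the largest |r| to y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return set(sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k])


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def signature_stability(X, y, k=5, n_folds=5, seed=0):
    """Select a signature on each CV training split; return the mean pairwise
    Jaccard similarity of the signatures together with the signatures."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    signatures = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        signatures.append(top_k_features([X[i] for i in train], [y[i] for i in train], k))
    stability = mean(jaccard(a, b) for a, b in combinations(signatures, 2))
    return stability, signatures
```

With a trait that truly depends on a few measured features, the same features are picked in every fold and the score approaches 1; for a trait unrelated to the profile, the selected sets vary and the score drops, mirroring the fat-mass versus body-temperature contrast described above.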

Session B-278: Deciphering aggressive CpG Island Methylator Phenotype in adrenocortical carcinoma
COSI: TransMed
  • Gwenneg Kerdivel, INSERM, France
  • Jérôme Bertherat, INSERM, France
  • Guillaume Assié, INSERM, France
  • Valentina Boeva, INSERM, France

Short Abstract: Adrenocortical carcinoma (ACC) is an aggressive cancer of the adrenal gland. Recently, a subtype of ACC characterized by a CpG island methylator phenotype (CIMP) has been associated with an especially poor prognosis. No drivers of CIMP in ACC have been identified. First, using methylation array data and gene expression datasets of human ACCs from TCGA, we aimed to better understand the mechanisms that induce CIMP in ACC. Analysis of RNA-Seq data showed that high CIMP is associated with increased expression of DNMT1 and DNMT3A. The expression levels of these genes were negatively correlated with the overall survival of patients and positively correlated with mitotic rates. Second, we investigated the processes involved in the higher aggressiveness of CIMP ACCs. Integration of methylation and expression data identified sets of genes that differ in methylation status and expression level between low- and high-CIMP tumors. Pathway analysis revealed a significant enrichment of immune system genes among the genes repressed by methylation. We therefore investigated the abundance of several tumor-infiltrating immune cell populations using the MCP-counter tool and observed a decrease in the abundance of most of the populations studied in high-CIMP tumors. We further showed that genes overexpressed in high-CIMP tumors are involved in the Wnt signaling pathway. This pathway is involved in proliferative and invasive processes in many cancers, but also in the immune response. Altogether, our results suggest that targeting DNA methylation or the Wnt pathway could be a promising therapeutic strategy for patients with high-CIMP ACC.

Session B-280: Molecular Tumor Board Zurich - Comprehensive Molecular Cancer Diagnostics for the Clinic
COSI: TransMed
  • Linda Grob, ETH Zurich, Switzerland
  • Anja Irmisch, University of Zurich Hospital, Switzerland
  • Mitchell P Levesque, University of Zurich Hospital, Switzerland
  • Franziska Singer, ETH Zurich, Switzerland
  • Daniel Stekhoven, ETH Zurich, Switzerland
  • Nora Toussaint, ETH Zurich, Switzerland

Short Abstract: High-throughput genomics has changed the way biomedical research is performed. The transition from directed testing of a few specific targets to analyzing comprehensive high-throughput data offers tremendous possibilities, particularly for the diagnosis of patients with rare diseases, for tumors lacking known targetable mutations, and for patients for whom routine diagnostic and treatment paradigms have failed. Despite this great potential, the use of high-throughput techniques to expand standard diagnostics is not well established in the clinic. Establishing high-throughput molecular diagnostics for clinical use requires specific protocols that account for stringent quality control, privacy issues, and thorough process documentation. To this end, we, a group of bioinformaticians, statisticians, and cancer biologists, have collaborated to develop a workflow for the molecular profiling of matched tumor and normal samples to improve clinical decision support. To gain a more comprehensive understanding of the tumor, we have recently begun to include transcriptomic data in the analysis as well. Using publicly available transcriptome data as a reference will allow us to assess over- and underexpression of genes of interest and to complement genome measurements with gene regulatory data. In addition to identifying somatic variants, expression changes, and gene fusions, our workflow links the detected alterations to possible treatment options. The analysis results are summarized in a concise, clearly structured clinical report designed to form the basis for discussions in a clinical molecular tumor board. Here, we showcase the workflow on dermatology samples.

Session B-282: Patient Similarity and Outcome Stratification in Intensive Care
COSI: TransMed
  • Annelaura Bach Nielsen, NNF Center for Protein Research, University of Copenhagen, Denmark
  • Mette K. Beck, NNF Center for Protein Research, University of Copenhagen, Denmark
  • Hans-Christian Thorsen-Meyer, Department of Intensive Care, Rigshospitalet, University of Copenhagen, Denmark
  • Anders Perner, Department of Intensive Care, Rigshospitalet, University of Copenhagen, Denmark
  • Pope L. Moseley, Departments of Medicine and Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, United States
  • Søren Brunak, NNF Center for Protein Research, University of Copenhagen, Denmark

Short Abstract: Background: The most critically ill patients in our health care system are treated in the intensive care unit (ICU). Here, monitoring devices gather a wide range of physiological information every second, which, together with clinical, administrative and laboratory information, makes the ICU the most data-intensive ward in any hospital. In addition to the data intensity and disease severity, patients display highly heterogeneous disease backgrounds, which altogether make the ICU an extremely complex clinical environment. Aim: The complexity, the heterogeneity and the lack of effective interventions, together with the massive amount of data, represent an opportunity to use data mining techniques for better patient stratification and care delivery in the ICU. Method: Based on routinely collected information in the electronic health records (EHRs) of 11,163 intensive care patients, fine-grained patient phenotypes will be characterized, and patient stratification approaches will be applied to identify subgroups of patients with different treatment outcomes. Initially, we are developing a data-driven framework to handle the many EHR data types as well as the different time scales from which the data originate. This massive amount of information will in turn be integrated into new patient similarity measures and outcome stratification models. Perspectives: Correlations between benefits or risks of treatment and specific patient subgroups discovered by these models could potentially be implemented in decision support systems, where knowledge about the successes and failures of past patients can be used to evaluate treatment options for a similar, newly admitted patient.
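The decision-support idea of comparing a new patient with similar past patients can be illustrated with a generic nearest-neighbour baseline: z-score each feature, find the k closest past patients by Euclidean distance, and report their outcome rate. The features, outcomes and function names are hypothetical, and this baseline is not the patient similarity measure the project is developing.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """z-score each column so features on different scales contribute comparably."""
    columns = list(zip(*rows))
    # A constant column gets sd 1.0 to avoid division by zero.
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]
    scaled = [[(v - mu) / sd for v, (mu, sd) in zip(row, stats)] for row in rows]
    return scaled, stats


def similar_patients(past, outcomes, new_patient, k=3):
    """Indices of the k past patients closest to new_patient (Euclidean distance
    on z-scored features) and the mean outcome among them."""
    scaled, stats = standardize(past)
    query = [(v - mu) / sd for v, (mu, sd) in zip(new_patient, stats)]
    nearest = sorted(range(len(past)), key=lambda i: math.dist(scaled[i], query))[:k]
    return nearest, mean(outcomes[i] for i in nearest)


# Hypothetical features: age, heart rate, lactate; outcome 1 = adverse event.
past = [[60, 80, 1.0], [62, 85, 1.1], [80, 120, 3.5], [78, 118, 3.2], [45, 70, 0.9]]
outcomes = [0, 0, 1, 1, 0]
nearest, outcome_rate = similar_patients(past, outcomes, [79, 119, 3.4], k=2)
```

Real EHR data would add missing values, categorical codes and time series at different scales, which is exactly what the framework described above is meant to handle.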

Session B-284: The druggable immune system: drug repositioning in immune transcriptome.
COSI: TransMed
  • Kevin Troule, Spanish National Cancer Research Centre (CNIO), Spain
  • Héctor Tejero, Spanish National Cancer Research Centre (CNIO), Spain
  • Javier Perales-Patón, Spanish National Cancer Research Centre (CNIO), Spain
  • Fatima Al-Shahrour, Spanish National Cancer Research Centre (CNIO), Spain
  • Gonzalo Gómez-López, Spanish National Cancer Research Centre (CNIO), Spain

Short Abstract: INTRODUCTION: Traditional drug development is time-consuming and expensive. Drug repositioning approaches have been proposed as a potential alternative. We have applied SATIE, a novel drug repositioning methodology, to predict drugs capable of boosting or repressing the immune responses associated with several cell subpopulations. METHODS: We applied SATIE to 4,872 immunologic gene expression signatures (1). SATIE predicts single and sequential drug treatments using data from the L1000, CCLE, GDSC2.0 and CTRP projects, comprising more than 5,000 compounds and ~4 million drug-drug interactions. The analyses were performed at two levels: i) predictions of drugs that can either promote or revert a given immune signature (D1) and ii) predictions of drug sensitivity or resistance in immune cell populations (D2). RESULTS: We obtained at least one significant signature-drug prediction for 23% of the signatures in the D1 analysis and for 73% in D2. In total, we obtained 29,026 (D1) and 13,323 (D2) immune gene expression signature-drug interactions, corresponding to 1,405 and 613 drugs for D1 and D2, respectively (FDR < 0.01). We manually validated our predictions against the scientific literature. For example, our approach predicts that mTOR inhibitors (e.g. rapamycin) enhance CD4+CD25+ Treg cells (2), while PI3K inhibitors (e.g. wortmannin) suppress Treg activity (3). Similarly, SATIE identifies mercaptopurine as a suppressor of NK cells (4). CONCLUSIONS: We have built an extensive, prioritized catalogue of significant drug predictions for the largest public collection of immune gene expression signatures currently available. Our approach can be extended to predict drug treatments in immunological studies.
1) Godec J, Tan Y, Liberzon A, Tamayo P, Bhattacharya S, Butte A, Mesirov JP, Haining WN. Compendium of Immune Signatures Identifies Conserved and Species-Specific Biology in Response to Inflammation. Immunity 2016;44(1):194-206.
2) Scotta C, Esposito M, Fazekasova H, Fanelli G, Edozie FC, Ali N, et al. Differential effects of rapamycin and retinoic acid on expansion, stability and suppressive qualities of human CD4+CD25+FOXP3+ T regulatory cell subpopulations. Haematologica 2013;98:1291–9.
3) Rasha A. Selective Inhibition of Regulatory T Cells by Targeting the PI3K–Akt Pathway. Cancer Immunology Research 1-10.
4) Yusung S, et al. NK cells are biologic and biochemical targets of 6-mercaptopurine in Crohn's disease patients. Clin Immunol 2017;175:82-90. doi: 10.1016/j.clim.2016.12.004.
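The abstract reports signature-drug interactions at FDR < 0.01 but does not say how the false discovery rate was controlled. As an illustration only, assuming the standard Benjamini-Hochberg step-up procedure (not confirmed as SATIE's method), adjusted p-values for a batch of signature-drug tests could be computed like this; the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five signature-drug pairs.
pvals = [0.0001, 0.004, 0.03, 0.2, 0.0005]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.01]
```

With these made-up inputs, pairs 0, 1 and 4 pass the 0.01 threshold; with millions of drug-drug interactions the same logic applies, only over a much longer list.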

Session B-286: Computational identification of drug opportunities for the treatment of pancreatic cancer driven by transcriptome dissection
COSI: TransMed
  • Javier Perales-Patón, Spanish National Cancer Research Centre (CNIO), Spain
  • Héctor Tejero, Spanish National Cancer Research Centre (CNIO), Spain
  • Gonzalo Gómez-López, Spanish National Cancer Research Centre (CNIO), Spain
  • Alfonso Valencia, BSC, Spain
  • Fatima Al-Shahrour, Spanish National Cancer Research Centre (CNIO), Spain

Short Abstract: Pancreatic cancer is a leading cause of cancer death among solid malignancies. Its lethality stems from late diagnosis and a lack of therapeutic strategies. Despite great efforts in cancer research, state-of-the-art treatments for pancreatic cancer have not substantially improved over the past 30 years. There is therefore an urgent need to find new therapies against pancreatic cancer. To identify new therapeutic options, we have applied computational drug repositioning methodologies to gene expression profiles from patient samples of pancreatic tumours and normal tissue. The transcriptome of the disease was characterized using a large collection of data sets from public repositories. We then carried out three different strategies: 1) identification of drugs whose signatures might revert the phenotype of malignant cells towards that of normal cells, 2) identification of drugs that might lead to cancer cell death in specific tumour molecular classes, and 3) prioritization of essential druggable genes governing cellular programs involved in cancer. For the first two approaches, we used a compendium of drug response signatures derived from drug perturbation experiments in cancer cell lines from public pharmacogenomics resources (L1000 LINCS Collection, GDSC, CCLE, CTRP). For the third approach, we identified gene co-expression modules and used custom cancer-related annotations to prioritize important genes. The selected drug candidates will be tested in preclinical trials using patient-derived xenograft models.
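Strategy 1, finding drugs whose signatures might revert the malignant phenotype, is often scored by how negatively a drug's expression signature correlates with the disease signature. The sketch below assumes a Spearman rank correlation over shared genes; the abstract does not name the scoring statistic, and the gene names, drug names and fold changes are hypothetical.

```python
def average_ranks(values):
    """Rank values (1 = smallest); tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def rank_reversal_candidates(disease_sig, drug_sigs, min_genes=3):
    """Score drugs by correlation with the disease signature on shared genes.

    The most negative score is the strongest candidate for reverting the
    tumour phenotype towards normal tissue."""
    scores = {}
    for drug, sig in drug_sigs.items():
        genes = sorted(set(disease_sig) & set(sig))
        if len(genes) < min_genes:
            continue  # too few shared genes for a meaningful correlation
        scores[drug] = spearman([disease_sig[g] for g in genes],
                                [sig[g] for g in genes])
    return sorted(scores.items(), key=lambda kv: kv[1])


# Hypothetical tumour-vs-normal log fold changes and drug signatures.
disease = {"GENE_A": 3.1, "GENE_B": 2.5, "GENE_C": 1.2, "GENE_D": -4.0, "GENE_E": -3.2}
drugs = {
    "drug_A": {"GENE_A": -2.0, "GENE_B": -1.5, "GENE_C": -0.3, "GENE_D": 2.2, "GENE_E": 1.8},
    "drug_B": {"GENE_A": 1.0, "GENE_B": 0.8, "GENE_C": 0.1, "GENE_D": -1.1, "GENE_E": -0.9},
}
ranking = rank_reversal_candidates(disease, drugs)
```

Here drug_A, whose signature mirrors the disease signature, ranks first with a correlation close to -1.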

Session B-288: PanDrugs: Prioritizing drug treatment in cancer patients according to individual genomic data
COSI: TransMed
  • Elena Piñeiro-Yáñez, Spanish National Cancer Research Centre (CNIO), Spain
  • Miguel Reboiro-Jato, Universidad de Vigo, Spain
  • Javier Perales-Patón, Spanish National Cancer Research Centre (CNIO), Spain
  • Kevin Troule, Spanish National Cancer Research Centre (CNIO), Spain
  • Jose Manuel Rodriguez, Spanish National Cancer Research Centre (CNIO), Spain
  • Hector Tejero, Spanish National Cancer Research Centre (CNIO), Spain
  • Takeshi Shimamura, Loyola University, United States
  • Julian Carretero, Universidad de Valencia, Spain
  • Gonzalo Gomez-Lopez, Spanish National Cancer Research Centre (CNIO), Spain
  • Daniel Glez-Peña, Universidad de Vigo, Spain
  • Manuel Hidalgo, Beth Israel Deaconess Medical Center, United States
  • Fatima Al-Shahrour, Spanish National Cancer Research Centre (CNIO), Spain

Short Abstract: The paradigm of personalized medicine is the identification of the appropriate drug for the right patient using genomic data. This personalized treatment approach is particularly relevant in oncology, where anticancer drugs are effective in only a small subset of patients and where the success of a treatment depends on each individual's molecular profile, which can a priori be considered very heterogeneous. We have developed a computational approach (PanDrugs, http://www.pandrugs.org) based on the analysis and integration of genomic data (mutations, copy number variations or gene expression levels) with functional data (protein essentiality and tumor vulnerability) and pharmacological data (approval status and drug indications). The relevance and actionability of the alterations in a patient are established by an integrated annotation system that can take as input either a list of genes or a list of variants in VCF format. The relevance and actionability of the alterations are summarized in a drug-gene score that allows the prioritization of a therapeutic strategy to guide clinical decisions, using both straightforward drug-target associations and a target pathway context. We have tested this approach using publicly available data (ICGC and TCGA cancer genome projects) and tumor genomic data from patients analyzed at our institution as part of the CNIO Personalized Medicine initiative.
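The abstract does not give the formula behind the drug-gene score. Purely to show how gene-level relevance and drug-level actionability could be folded into one ranking, here is a sketch with a hypothetical scoring rule. APPROVAL_WEIGHT, the 0.5 discount for pathway-context associations and the Candidate fields are all assumptions, not PanDrugs' actual scheme.

```python
from dataclasses import dataclass

# Hypothetical weights for drug approval status (assumption, not PanDrugs').
APPROVAL_WEIGHT = {"approved": 1.0, "clinical_trials": 0.6, "experimental": 0.2}


@dataclass
class Candidate:
    drug: str
    gene: str
    gene_relevance: float  # assumed 0..1, e.g. from essentiality/vulnerability evidence
    approval: str          # key into APPROVAL_WEIGHT
    direct_target: bool    # True: direct drug-target link; False: pathway context only


def drug_gene_score(c: Candidate) -> float:
    """Hypothetical score: gene relevance x drug actionability,
    halved when the association comes only from pathway context."""
    context = 1.0 if c.direct_target else 0.5
    return c.gene_relevance * APPROVAL_WEIGHT[c.approval] * context


def prioritize(candidates):
    """Sort candidates from highest to lowest hypothetical score."""
    return sorted(candidates, key=drug_gene_score, reverse=True)


ranked = prioritize([
    Candidate("drug_X", "GENE1", 0.9, "approved", True),
    Candidate("drug_Y", "GENE2", 0.95, "experimental", True),
    Candidate("drug_Z", "GENE1", 0.9, "approved", False),
])
```

A multiplicative rule like this lets an approved drug against a weaker gene outrank an experimental compound against a stronger one; an additive rule would trade these off differently.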

Session B-290: Transcriptome based classifier for cutaneous T-cell lymphoma and dermatitis
COSI: TransMed
  • Richa Batra, Institute of Computational Biology, Helmholtz Zentrum Munich, Germany
  • Nikola Mueller, Institute of Computational Biology, Helmholtz Zentrum Munich, Germany
  • Fabian Theis, Institute of Computational Biology, Helmholtz Zentrum Munich, Germany
  • Kilian Eyerich, Department of Dermatology and Allergy, Technical University of Munich, Germany
  • Sophie Roenneberg, Department of Dermatology and Allergy, Technical University of Munich, Germany
  • Manja Jargosch, Department of Dermatology and Allergy, Technical University of Munich, Germany

Short Abstract: Mycosis fungoides (MF) is the most common type of cutaneous T-cell lymphoma (CTCL). The early morphological presentations of MF are similar to those of common inflammatory skin diseases such as atopic dermatitis (AD). Clinical diagnosis is commonly performed by an expert's evaluation of the affected skin, so the phenotypic similarities with AD render diagnosis challenging. Here, we examined the two diseases at the molecular level by studying their transcriptomes using RNA-Seq. Over 500 genes were differentially expressed in MF lesions compared with non-lesional samples; of these, ~100 genes were also differentially regulated in eczema. We further explored the cellular processes that highlight the molecular similarities and differences between the two conditions. To distinguish them, ~30 genes were shortlisted based on the difference in log fold change (delta log fold change) and on variability. Using these genes, we built a classifier that distinguishes MF from AD at the molecular level.
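The two steps described above, shortlisting genes by delta log fold change and variability and then classifying samples, can be sketched as follows. The abstract does not specify how variability is thresholded or which classifier was used; the max_var cut-off and the nearest-centroid classifier are assumptions, and all gene names and values are hypothetical.

```python
def shortlist_genes(logfc_mf, logfc_ad, variability, n=30, max_var=1.0):
    """Keep genes with variability <= max_var (assumed cut-off), then rank
    them by |logFC_MF - logFC_AD| and return the top n."""
    genes = [g for g in logfc_mf
             if g in logfc_ad and variability.get(g, float("inf")) <= max_var]
    genes.sort(key=lambda g: abs(logfc_mf[g] - logfc_ad[g]), reverse=True)
    return genes[:n]


def fit_centroids(samples, labels, genes):
    """Mean expression of each shortlisted gene per class (stand-in classifier)."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, lab in zip(samples, labels) if lab == label]
        centroids[label] = {g: sum(r[g] for r in rows) / len(rows) for g in genes}
    return centroids


def predict(sample, centroids, genes):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((sample[g] - c[g]) ** 2 for g in genes)
    return min(centroids, key=lambda label: dist(centroids[label]))


# Hypothetical lesional-vs-non-lesional log fold changes per disease.
logfc_mf = {"G1": 3.0, "G2": 0.5, "G3": -2.0, "G4": 1.0}
logfc_ad = {"G1": 0.2, "G2": 0.4, "G3": 1.5, "G4": 1.1}
variability = {"G1": 0.3, "G2": 0.2, "G3": 0.5, "G4": 2.0}
genes = shortlist_genes(logfc_mf, logfc_ad, variability, n=2)

train = [{"G1": 5, "G3": 1}, {"G1": 6, "G3": 0}, {"G1": 1, "G3": 5}, {"G1": 2, "G3": 6}]
labels = ["MF", "MF", "AD", "AD"]
centroids = fit_centroids(train, labels, genes)
label = predict({"G1": 5.2, "G3": 1.1}, centroids, genes)
```

G4 is dropped for high variability, G3 and G1 are kept as the most discordant genes, and the new sample is assigned to MF.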

Session B-292: A novel primer design approach for the amplification of immunoglobulins encoding broadly neutralizing antibodies against human immunodeficiency virus type 1
COSI: TransMed
  • Matthias Döring, Max Planck Institute for Informatics, Germany
  • Christoph Kreer, University Hospital of Cologne, Center for Molecular Medicine Cologne, Germany
  • Nathalie Lehnen, University Hospital of Cologne, Center for Molecular Medicine Cologne, Germany
  • Nico Pfeifer, Department of Computer Science, University of Tübingen, Germany
  • Florian Klein, University Hospital of Cologne, Center for Molecular Medicine Cologne, Germany

Short Abstract: Background The discovery of highly potent broadly neutralizing antibodies (bnAbs) has initiated promising prevention and treatment strategies for individuals infected with human immunodeficiency virus type 1 (HIV-1). Investigating the development of bnAbs is challenging because of the high diversity of the antibody repertoire and the high level of somatic mutations found in bnAbs targeting HIV-1. Therefore, the amplification of cDNA from B cells using multiplex polymerase chain reaction (mPCR) critically depends on the primers used, which should simultaneously target various immunological gene segments with and without somatic hypermutations. To systematically design high-quality primers, we implemented a novel primer design framework and used it to generate primers targeting the three human immunoglobulin loci IGH, IGK, and IGL. Materials and Methods We developed a primer design procedure consisting of three stages. In the first stage, an initial set of degenerate primers is constructed using multiple sequence alignments of target region substrings, hierarchical clustering based on sequence similarities, and consensus formation through the resulting phylogenetic tree. Second, the initial primer set is filtered according to constraints on up to thirteen physicochemical properties (e.g. GC clamp, secondary structures, and dimerization) to obtain a reduced set of high-quality primer candidates. Third, an instance of the NP-complete set cover problem is solved to obtain a minimal set of primers maximizing the coverage of the template sequences. To solve the optimization problem, we implemented an integer linear program formulation as well as a greedy approximation. To design primers for human antibody cDNA, we retrieved the variable regions of the functional genes from IGH (n = 147), IGK (n = 62), and IGL (n = 35) from the IMGT database.
We determined the quality of the designed primer sets by comparing their physicochemical properties with those of existing primer sets for IGH (n = 28), IGK (n = 28), and IGL (n = 14), which were retrieved from the IMGT database and the literature. For each locus, we tallied the frequencies of fulfilled and failed constraints for both the newly designed primer sets and the reference sets and performed a right-tailed Fisher’s exact test on the resulting contingency table. Results We used integer linear programs to design three primer sets that bind with at most one mismatch to the leaders of IGHV (n = 14), IGKV (n = 7), and IGLV (n = 8). Each of the primer sets achieved 100% in silico coverage and fulfilled significantly more physicochemical constraints than the reference primer sets gathered from the literature (Bonferroni-corrected p-values of 3.2e-22, 7.7e-12, and 7.5e-14). Conclusions We have developed openPrimeR, an R package and Shiny application for designing, evaluating, and comparing primer sets for mPCR. Using the new tool, we constructed primer sets that may help advance the investigation of bnAbs against HIV-1 by enabling researchers to identify previously undetectable antibody variants. openPrimeR includes a large number of curated primer sets, which may prove a valuable resource for immunologists seeking suitable primer sets for specific mPCR applications. We are currently validating the designed primer sets by performing mPCR on germline and hypermutated immunoglobulins. Our tool is general and could be applied to design primers for many other sequence regions, such as genomic regions of HIV or HCV.
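The third design stage selects a small primer set that covers as many templates as possible, an instance of set cover. The abstract mentions a greedy approximation alongside the integer linear program; the sketch below is a generic greedy set-cover heuristic, not openPrimeR's code. The coverage map (primer to templates it binds, e.g. with at most one mismatch) and all primer and template IDs are hypothetical.

```python
def greedy_primer_cover(coverage, templates):
    """Greedy set-cover approximation for primer selection.

    coverage: primer ID -> set of template IDs that the primer binds.
    templates: all template IDs that should be amplified.
    Repeatedly picks the primer covering the most still-uncovered templates.
    Returns (chosen primers in pick order, templates left uncovered)."""
    uncovered = set(templates)
    chosen = []
    while uncovered and coverage:
        best = max(coverage, key=lambda p: len(coverage[p] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break  # no remaining primer binds any uncovered template
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


coverage = {
    "P1": {"T1", "T2", "T3"},
    "P2": {"T3", "T4"},
    "P3": {"T4", "T5"},
    "P4": {"T1", "T5"},
}
primers, missed = greedy_primer_cover(coverage, ["T1", "T2", "T3", "T4", "T5"])
```

The greedy rule is fast and is guaranteed to stay within a logarithmic factor of the optimal set size, whereas an integer linear program can return a provably minimal set at higher computational cost, which matches the two options named in the abstract.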

Session B-294: The database for mining the brain tumor biomarkers from mass-spectrometry data
COSI: TransMed
  • Anatoly Sorokin, Moscow Institute of Physics and Technology, Russia
  • Vsevolod Shurhkhay, N. N. Burdenko Scientific Research Neurosurgery Institute, Russia
  • Igor Popov, Moscow Institute of Physics and Technology, Russia
  • Evgeny Zhvansky, Moscow Institute of Physics and Technology, Russia
  • Nikita Levin, Moscow Institute of Physics and Technology, Russia
  • Alexander Potapov, N. N. Burdenko Scientific Research Neurosurgery Institute, Russia
  • Evgeny Nikolaev, Skolkovo Institute of Science and Technology, Russia

Short Abstract: The detection of brain tumors by mass spectrometry profiling requires the identification of biomarkers for each tumor type. The first step in biomarker mining is the collection of tumor samples and the organisation of a database for automated analysis of biomarker candidates. Tumor tissues were dissected during neurosurgery, and each sample was characterized by a histologist. Mass spectra for each sample were measured with the continuous flow needle electrospray ionization method [1] on an ICR-based high-resolution mass spectrometer, the LTQ FT Ultra (Thermo Finnigan), with a resolution of 56,000 at m/z 800. The mass spectra were filtered for noise, and the isotopic structure was removed. Information about tumor type, tumor localization, sample position within the tumor, and medication used prior to surgery was collected and stored in a NoSQL database together with the spectra. Over 300 samples of different tumor tissues and surrounding tissues were collected from over 250 patients in the study. All of the samples were provided by the Burdenko Neurosurgical Institute. The collected spectra were filtered and stored in the column-oriented database MonetDB. The user interface allows visualisation, comparison and transformation (PCA, ICA, etc.) of the spectra. Integration with R/Bioconductor makes the whole range of statistical analysis and machine learning techniques readily available from the API as well as from the user interface. To demonstrate the performance of the infrastructure, 9 classifiers were trained with different settings on 20 astrocytoma and 9 unmodified brain samples. Among the classifiers tested, PLS-DA was the most sensitive and Random Ferns the most specific. This work provides an infrastructure suitable for identifying biomarkers of various types of brain tumor. The study was supported by Russian Scientific Foundation grant #16-15-10431.
References
[1] Kononikhin A., Zhvansky E., Shurkhay V., Popov I., Bormotov D., Kostyukevich Y., Karchugina S., Indeykina M., Bugrova A., Starodubtseva N., Potapov A., Nikolaev E. “A novel direct spray-from-tissue ionization method for mass spectrometric analysis of human brain tumors.” Analytical and Bioanalytical Chemistry 407, 7797 (2015).
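The classifier comparison rests on sensitivity (fraction of tumor samples called tumor) and specificity (fraction of unmodified brain samples called unmodified). A minimal sketch of both measures; the class sizes mirror the 20 astrocytoma and 9 unmodified samples above, but the predictions are invented for illustration and are not the study's results.

```python
def sensitivity_specificity(y_true, y_pred, positive="astrocytoma"):
    """Return (sensitivity, specificity) treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical predictions: 18/20 tumors and 8/9 normal samples called correctly.
y_true = ["astrocytoma"] * 20 + ["normal"] * 9
y_pred = ["astrocytoma"] * 18 + ["normal"] * 2 + ["normal"] * 8 + ["astrocytoma"]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

With only 29 samples, each misclassification shifts these figures by several percentage points, so comparisons between classifiers on such a set are best read as a demonstration of the infrastructure rather than as a benchmark.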

Session B-296: SurvClip: to predict Ovarian Cancer prognosis through survival modules
COSI: TransMed
  • Paolo Martini, University of Padova, Italy
  • Enrica Calura, University of Padova, Italy
  • Chiara Romualdi, University of Padova, Italy

Short Abstract: In recent decades, the field of molecular medicine has grown rapidly. Thanks to large efforts such as TCGA, vast amounts of data are available to be mined. The goal is to devise tools for the diagnosis, prognosis or treatment of diseases. The identification of prognostic markers using survival analysis on gene expression is one of the most useful techniques. In some cases, however, this is not sufficient, because a single gene cannot fully predict patients’ outcome. We devised a new tool called survClip that performs survival analysis using the topology of biological pathways. The method is based on data reduction techniques combined with graphical models. By integrating the survival prediction power of gene expression with pathway topology, the tool identifies small portions of pathways, called survival modules, that can predict patients’ survival. Together with classical gene-by-gene analysis, this information supports the formulation of mechanistic hypotheses of pathogenesis. This could mean not only improved prognosis through more specific therapy but also the discovery of pathological mechanisms that can be used to devise new treatments. We tested survClip on both real and simulated data, with encouraging results. We focused on ovarian cancer, where we are currently trying to integrate different kinds of data, such as DNA methylation and somatic mutations, to improve the predictions of the survival modules.
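Once a survival module has been reduced to a single score per patient, its prognostic value can be checked by comparing survival between patients with low and high scores. The abstract does not state which survival test survClip uses; the sketch below assumes a median split and a two-group log-rank test, and the patient data are hypothetical.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = event observed, 0 = censored;
    groups: 0 or 1 (e.g. low vs high module score). Returns (chi2, p_value)."""
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for tt, _, g in data if tt >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # group-1 subjects still at risk at time t
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected group-1 events
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical patients split at the median of a survival-module score.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
chi2, p = logrank_test(times, events, groups)
```

In this toy example every high-score patient has an event before any low-score patient, giving chi2 of about 9.7 and p below 0.01. A Cox model on the continuous score would be a natural alternative when a cut-off is hard to justify.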

Session B-298: DisGeNET, a centralized repository of the genetic basis of human diseases
COSI: TransMed
  • Janet Piñero, Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM, Spain
  • Àlex Bravo Serrano, Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM, Spain
  • Alba Gutierrez, Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM, Spain
  • Emilio Centeno, Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM, Spain
  • Jordi Deu-Pons, Institute for Research in Biomedicine (IRB Barcelona), Spain
  • Javier García-García, Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM, Spain
  • Núria Queralt, Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, United States
  • Ferran Sanz, Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM, Spain
  • Laura I. Furlong, Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM, Spain

Short Abstract: DisGeNET (http://www.disgenet.org/) is one of the largest available collections of genes and variants associated with human diseases. DisGeNET integrates data from several specialized resources on gene and variant-disease associations with information extracted by automatically mining Medline abstracts. Here, we present DisGeNET 5.0, containing more than 500,000 gene-disease associations and over 135,000 variant-disease associations, between >17,000 genes, >80,000 variants and >20,000 diseases, abnormal phenotypes and traits. This information is annotated using community-based standards, including a variety of disease vocabularies, and is enriched and expanded by linking it to other key resources covering the chemical and omics spaces. DisGeNET 5.0 also includes several improvements: a) more gene-disease associations, b) a new curated source, the PsyGeNET database, c) an improved text-mining pipeline to extract gene-disease and variant-disease associations from publications, d) non-coding variants associated with disease, e) disease-phenotype associations, and f) gene-phenotype associations from the Human Phenotype Ontology. With its fifth release, DisGeNET is an established and mature resource that is increasingly used to investigate the genetic basis of human diseases and to support drug discovery projects.

Session B-300: An Analysis of Pre-diabetes Using an Artificial Neural Network in a Regional Cohort of the Korean Population
COSI: TransMed
  • Dong Gyu Lee, National Research Institute of Health, South Korea
  • Sang Cheol Kim, National Research Institute of Health, South Korea
  • Seong Beom Cho, National Research Institute of Health, South Korea

Short Abstract: Pre-diabetes, comprising impaired fasting glucose (IFG) and impaired glucose tolerance (IGT), is a risk factor for the development of type 2 diabetes mellitus (T2DM). Although many studies have predicted future T2DM from baseline clinical data, clinical studies investigating the future development of T2DM among individuals with IFG and IGT are relatively scarce. We established an artificial neural network (ANN) to predict the development of T2DM among pre-diabetic patients with IFG and IGT, using ten-year follow-up data from a regional cohort of the Korean population. The data comprise 3,520 pre-diabetic patients with four follow-ups, at 4, 6, 8, and 10 years. Input variables are gender, age, body mass index (BMI), hypertension, family history, hemoglobin A1C, and zero-hour, one-hour, and two-hour glucose. To model the ANN, we used a density-based clustering method to divide the whole dataset into six clusters of highly similar samples and to remove noisy data. Then, an individual model was built for each of the six clusters using a multi-layer feed-forward artificial neural network. The ANN models showed more effective classification performance on statistical measures such as accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve at every follow-up. In terms of the relative importance of input variables, BMI, gender and age, as well as glucose-related risk factors, were the most significant variables throughout the follow-up period. For the future diagnosis of various diseases, cluster-based ANN modeling provides an efficient predictive model that detects complex, non-linear associations between multiple risk factors and diseases to support medical decisions.
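The preprocessing step groups similar patients and discards noisy records before one ANN is trained per cluster. The abstract names only "a density-based clustering method"; the sketch assumes DBSCAN, where points labelled -1 are noise. The points, eps and min_pts are hypothetical, and in practice the clinical variables would need scaling first.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point: cluster id >= 0, or -1 for noise.

    min_pts counts the point itself among its eps-neighbours."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(j_nbrs)
    return labels


# Hypothetical 2D feature vectors: two dense groups and one outlier.
points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
labels = dbscan(points, eps=0.5, min_pts=3)
```

Each non-negative label would then define the training set of its own feed-forward network, while the -1 records are dropped as noise.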

Session B-302: A shiny app to facilitate the analysis of tumor aneuploidy and clonality using SNP array data
COSI: TransMed
  • Lachlan McIntosh, WEHI, Australia
  • Christophe Lefèvre, WEHI, Australia

Short Abstract: During cancer progression, the accumulation of mutations and genome rearrangements in tumor cells may lead to intra-tumor heterogeneity (ITH) with the emergence of cancer subclones. It has been argued that subclones may exhibit different drug sensitivities and that the clonal architecture of tumors may be an important biomarker of tumor evolution, drug resistance and treatment efficacy. To allow the characterisation of tumor heterogeneity from SNP microarray data, which usually requires visual inspection, we present a Shiny application for the interactive representation and analysis of segmented copy number alterations in single tumor samples. The software is based on grid plots, which are 2D representations of allelic copy number frequency estimates obtained from SNP array data pre-processed by normalisation and segmentation tools such as the AROMA and CBS packages. The software implements methods for bias correction in grid plots, the identification and clustering of subclonal genomic fragments, and the inference of intra-tumor heterogeneity. The application may help in understanding how ITH relates to drug resistance and prognosis (that is, the survival of patients diagnosed with cancer), and will facilitate the analysis of large datasets annotated with information on treatment and clinical outcome.
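One way to picture a grid plot: each segment is placed at its estimated (minor, major) allele copy numbers, clonal segments fall near integer grid points, and segments stranded between grid points hint at subclonal mixtures. The sketch below assumes purity-corrected total copy number and B-allele frequency (BAF) per segment and a hypothetical distance tolerance; it does not reproduce the application's bias correction or clustering.

```python
import math


def allele_specific_cn(total_cn, baf):
    """Split a segment's total copy number into (minor, major) allele copy
    numbers using its BAF, mirrored into 0..0.5."""
    mirrored = min(baf, 1.0 - baf)
    minor = total_cn * mirrored
    return minor, total_cn - minor


def classify_segment(total_cn, baf, tol=0.15):
    """Label a segment 'clonal' if it lies within `tol` (hypothetical threshold)
    of an integer (minor, major) grid point, otherwise 'subclonal'."""
    minor, major = allele_specific_cn(total_cn, baf)
    nearest = (round(minor), round(major))
    dist = math.hypot(minor - nearest[0], major - nearest[1])
    return ("clonal" if dist <= tol else "subclonal"), nearest, round(dist, 3)


# Hypothetical segments: (total copy number, BAF).
normal_diploid = classify_segment(2.0, 0.5)   # lies on grid point (1, 1)
clonal_gain = classify_segment(3.0, 0.333)    # close to (1, 2)
mixed = classify_segment(2.5, 0.4)            # halfway between grid points
```

A segment at (1, 1.5) is what a 50:50 mixture of cells with one and two copies of the major allele would produce, which is why off-grid positions are treated as evidence of subclones.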

Session B-304: Identifying Gene Biomarkers of Breast Cancer Survivability from Time-Series Data
COSI: TransMed
  • Naveen Mangalakumar, University of Windsor, Canada
  • Abed Alkhateeb, University of Windsor, Canada
  • Huy Quang Pham, University of Windsor, Canada
  • Luis Rueda, University of Windsor, Canada
  • Alioune Ngom, School of Computer Science, University of Windsor, Canada

Short Abstract: Studying gene expression across various time intervals of breast cancer (BC) survival may provide insights into recovery from the disease. In this study, we use a hierarchical clustering method to separate dissimilar groups of gene time-series profiles, namely those with the greatest distances from the rest of the profiles across different time intervals. The isolated outliers can be used as potential biomarkers of BC survivability. We used the METABRIC/2016 dataset. We partition the time axis (time points) into bins of six months, from 0-6 up to 49-55 month intervals, and, for each gene, we average its expression level over all patients who fall in a survival bin. The gene expression values across those time points are interpolated with cubic splines to create a trend profile for each gene. After universally aligning the profiles to minimize the vertical area between each pair of profiles, we cluster them using hierarchical clustering based on the minimized vertical distances. An appropriate number of clusters was chosen based on the Profile Alignment and Agglomerative Clustering (PAAC) Index as well as visual inspection of the clusters. With 46 clusters, we identified 24 genes that can be considered potential BC survivability biomarkers. Of these, PSAP, CD81, and EEF1A1 have previously been reported in the literature to play a role in BC. PSAP, in particular, has been associated with BC recurrence and may have a functional role in therapy resistance. Our study suggests that the combination of proper clustering, distance function and cluster validation index is a suitable model for identifying genes as informative biomarkers of BC survivability.
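The binning and alignment steps can be sketched on the discrete bin means. Assumptions: bins are half-open [0, 6), [6, 12), ... (the abstract's exact boundaries, 0-6 to 49-55, may differ), the cubic-spline step is omitted, and "vertical area" is measured as a squared difference. Under that squared measure, subtracting each profile's mean gives every pair exactly the vertical shift that minimizes their distance, which is one reading of "universally aligning". Data are hypothetical.

```python
def bin_average(samples, bin_width=6, n_bins=9):
    """Average one gene's expression within survival-time bins.

    samples: (survival_month, expression) pairs across patients.
    Returns one mean per bin; empty bins are None and must be filled
    (e.g. interpolated) before alignment."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for month, value in samples:
        b = int(month // bin_width)
        if 0 <= b < n_bins:
            sums[b] += value
            counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def align(profile):
    """Shift a gap-free profile vertically to zero mean."""
    mean = sum(profile) / len(profile)
    return [v - mean for v in profile]


def aligned_distance(p, q):
    """Squared distance between two profiles after vertical alignment."""
    return sum((a - b) ** 2 for a, b in zip(align(p), align(q)))


profile = bin_average([(1, 2.0), (3, 4.0), (7, 5.0)], n_bins=3)
same_shape = aligned_distance([1, 2, 3], [11, 12, 13])
different_shape = aligned_distance([0, 1, 0], [0, 0, 0])
```

Two genes with the same trend but different baseline expression end up at distance 0, so the subsequent hierarchical clustering groups profiles by shape rather than by level.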

Session B-306: A data-driven computational approach to evaluate drug synergy and combinations via integrative omics analysis
COSI: TransMed
  • Yvonne Saara Gladbach, Rostock University, Germany
  • Lisa-Madeleine Sklarz, Rostock University, Germany
  • Mathias Ernst, Rostock University, Germany
  • Sina Sender, Rostock University, Germany
  • Hugo Murua Escobar, Rostock University, Germany
  • Christian Junghanß, Rostock University, Germany
  • Georg Fuellen, Rostock University, Germany
  • Mohamed Hamed, Rostock University, Germany

Short Abstract: Background The development of next-generation sequencing has opened new opportunities in personalized medicine, such as predicting the best drug combinations with fewer side effects based on patients' omics profiles. RNA sequencing offers unprecedented information about the human transcriptome, but harnessing this information with bioinformatics tools is typically a bottleneck. In addition, miRNA sequencing studies have identified hundreds of dysregulated miRNAs, which have recently been found to be deeply involved in disease etiology. Little is known about the genetic associations between mRNAs, miRNAs and other noncoding RNAs in driving cancer pathogenesis, or about their potential suitability as drug targets. Results Here we combined different transcriptomic assays (mRNAs and miRNAs) of individual and combined drug responses in order to understand the pathways underlying drug resistance, to identify the best drug combination treatment, and to understand the mechanism of action of individual and combined therapy in Acute Lymphoblastic Leukemia (ALL) cell lines. An integrative approach was applied to mRNA and miRNA sequencing data from two B-ALL cell lines treated with three different drug classes. Since it is infeasible to assay all possible drug combinations, machine learning approaches will be applied to the heterogeneous transcriptomic profiles to build a drug combination prediction model that can later be validated by wet-lab experiments. Conclusion Our analysis shows that mono-application of drugs in both cell lines has a minor influence on the proliferation and metabolism of tumor cells. In contrast, combined treatment led to a remarkable enhancement of the inhibitory effects caused by the individual drugs.
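Whether a combination's effect exceeds what the single drugs would produce independently is commonly quantified against a reference model. The abstract does not name one; the sketch assumes Bliss independence, with effects expressed as fractional inhibition between 0 and 1. The inhibition values are hypothetical.

```python
def bliss_excess(effect_a, effect_b, effect_ab):
    """Observed combination effect minus the Bliss-independence expectation.

    Effects are fractional inhibitions in [0, 1]. A positive value suggests
    synergy; a negative value suggests antagonism."""
    expected = effect_a + effect_b - effect_a * effect_b
    return effect_ab - expected


# Hypothetical: weak single-agent effects, strong combined effect.
excess = bliss_excess(0.10, 0.15, 0.60)
```

Here the expected combined inhibition is 0.235, so the observed 0.60 gives an excess of 0.365. Loewe additivity is the main alternative reference model when dose-response curves are available.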

Session B-308: Computational discovery of novel immune checkpoints
COSI: TransMed
  • Itamar Borukhov, Compugen Ltd., Israel
  • Amit Novik, Compugen Ltd., Israel
  • Amir Toporik, Compugen Ltd., Israel
  • Ofer Levy, Compugen Ltd., Israel
  • Sergey Nemzer, Compugen Ltd., Israel
  • Gady Cojocaru, Compugen Ltd., Israel
  • Yossef Kliger, Compugen Ltd., Israel
  • Assaf Wool, Compugen Ltd., Israel
  • Tomer Zekharya, Compugen Ltd., Israel
  • Yair Benita, Compugen Ltd., Israel
  • Zurit Levine, Compugen Ltd., Israel

Short Abstract: Antibody blockade of the CTLA4 and PD-1 immune checkpoints has emerged as an effective treatment modality for cancer. However, the majority of patients do not achieve sustained long-term benefit, suggesting a need to target additional immune checkpoints. To identify additional B7/CD28 immune checkpoint targets, we developed a unique compendium of computational algorithms that identified multiple novel targets, including TIGIT in 2008, which was an unknown protein at the time of discovery [Proc Natl Acad Sci USA 106, 17858 (2009)], and PVRIG, which we recently disclosed. Since their initial discovery, these targets have been functionally validated, and anti-tumor activity has been demonstrated with antibodies that target them. In this presentation, we will describe the computational algorithms that led to the discovery of these novel immune checkpoints. These algorithms combine two complementary aspects: (i) prediction of endogenous immune checkpoint function and (ii) prediction of immuno-modulatory activity in cancer. Immune checkpoint function was predicted based on gene structure similarity to B7/CD28 family members that is reminiscent of an ancient common evolutionary origin. A gene structure alignment tool was developed to identify functional homologs of B7/CD28 genes even in the absence of sequence similarity. Next, the expression profiles of these candidates were modeled and compared to the profiles of known immune checkpoints in normal and cancer tissues. We will review the details of the discovery of TIGIT and PVRIG, which were among the immune checkpoints predicted in this process. Our approach demonstrates the power of computational biology to translate genomic knowledge into rational and reliable drug target discovery.
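The expression step compares each candidate's profile with those of known immune checkpoints. The abstract does not describe the model; as a simple stand-in, the sketch ranks candidates by their best Pearson correlation with any reference profile across the same ordered list of tissues. Candidate names, reference names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_candidates(candidates, known_checkpoints):
    """Score each candidate by its highest correlation with any known
    checkpoint profile (same tissue order); return best first."""
    scores = {
        name: max(pearson(profile, ref) for ref in known_checkpoints.values())
        for name, profile in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical expression across five tissues (e.g. immune-rich to immune-poor).
known = {"checkpoint_ref": [9, 8, 7, 1, 0.5]}
candidates = {"cand_A": [8, 9, 6, 1.2, 0.4], "cand_B": [1, 1, 2, 9, 8]}
ranking = rank_candidates(candidates, known)
```

Correlation captures only the shape of expression across tissues; the gene-structure similarity described above is a separate, complementary filter that this sketch does not attempt.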

Session B-310: A machine learning pipeline for the identification of predictive biomarkers and its application to knee osteoarthritis
COSI: TransMed
  • Nicola Lazzarini, Newcastle University, United Kingdom
  • Jaume Bacardit, Newcastle University, United Kingdom

Short Abstract: Machine learning methods have proven powerful in identifying disease-driving factors thanks to their ability to recognize complex patterns within data. We propose an analytics pipeline based on machine learning to generate, validate and interpret simple predictive models. At its core, the pipeline uses an updated version of the RGIFE heuristic [1] for the identification of small, highly predictive panels of biomarkers, using Random Forest as the base classifier. The models generated by RGIFE are thoroughly characterised to assess and visualise the importance of each candidate biomarker and its association with the disease at hand. To assess our pipeline, we used longitudinal data from a clinical cohort on knee osteoarthritis (OA), currently among the leading contributors to global disability. Using five different definitions of knee OA, RGIFE identified small models that predict (AUC > 0.7) the 30-month incidence of the disease. Our analysis revealed the importance of integrating data from multiple sources, as the selected biomarkers included clinical variables, food and pain questionnaires, biochemical markers and imaging-based information. The latter were found to be particularly important, and we hope their use will be re-evaluated in primary care settings, especially when treating subjects at risk of developing knee OA in the future. Finally, our pipeline confirmed the relevance of known biochemical markers for knee OA. Overall, we show the potential of applying machine learning to generate and interpret predictive models for biomedicine, in this instance to tackle the prediction of knee OA incidence. [1] Swan et al., BMC Genomics 2015, 16(Suppl 1):S2
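RGIFE belongs to the family of importance-guided backward elimination heuristics. The sketch below is a simplified member of that family, not RGIFE itself. It assumes two caller-supplied functions: importance (in the pipeline, e.g. Random Forest feature importances) and evaluate (e.g. cross-validated AUC). The block_frac and tolerance parameters and the toy data are hypothetical.

```python
from typing import Callable, Dict, List


def guided_elimination(features: List[str],
                       importance: Callable[[List[str]], Dict[str, float]],
                       evaluate: Callable[[List[str]], float],
                       block_frac: float = 0.25,
                       tolerance: float = 0.0) -> List[str]:
    """Simplified importance-guided backward elimination.

    Drops the least important block of features; keeps the removal if the
    score falls by no more than `tolerance`, otherwise halves the block.
    Stops when even a single feature cannot be removed."""
    current = list(features)
    best = evaluate(current)
    block = max(1, int(len(current) * block_frac))
    while len(current) > 1:
        imp = importance(current)
        to_drop = set(sorted(current, key=lambda f: imp[f])[:block])
        trial = [f for f in current if f not in to_drop]
        score = evaluate(trial) if trial else float("-inf")
        if score >= best - tolerance:
            current, best = trial, max(best, score)
            block = max(1, int(len(current) * block_frac))
        elif block > 1:
            block //= 2
        else:
            break
    return current


# Toy problem: only f1 and f2 carry signal.
static_imp = {"f1": 0.9, "f2": 0.8, "f3": 0.6, "f4": 0.5,
              "f5": 0.4, "f6": 0.3, "f7": 0.2, "f8": 0.1}
panel = guided_elimination(
    [f"f{i}" for i in range(1, 9)],
    importance=lambda fs: {f: static_imp[f] for f in fs},
    evaluate=lambda fs: len({"f1", "f2"} & set(fs)) / 2,
)
```

Removing features in blocks keeps the number of model refits low on wide datasets, while halving the block after a failed removal lets the search fine-tune near the end.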

Session B-312: DrugTargetInspector: An assistance tool for cancer treatment stratification - reloaded
COSI: TransMed
  • Lara Schneider, Saarland University, Germany
  • Daniel Stöckel, Saarland University, Germany
  • Tim Kehl, Saarland University, Germany
  • Andreas Gerasch, Eberhard Karls University, Tübingen, Germany
  • Michael Kaufmann, Eberhard Karls University, Tübingen, Germany
  • Oliver Kohlbacher, Eberhard Karls University, Tübingen & Max Planck Institute for Developmental Biology, Germany
  • Andreas Keller, Saarland University, Germany
  • Hans-Peter Lenhof, Saarland University, Germany

Short Abstract: Treatment of cancer is difficult, mainly due to its intrinsic heterogeneity, which is based on tumors’ unstable genetic structures. Under selective pressure, e.g. treatment with a certain drug or regimen, resistant subpopulations might emerge that can then cause relapse. To minimize this risk, optimal (combination) therapies have to be determined on the basis of an in-depth characterization of the tumor’s genetic and phenotypic makeup, a process also known as stratified medicine or precision medicine. In this setting, various '-omics' datasets of the tumor under consideration, e.g. transcriptomic or (epi-)genomic datasets, have to be analyzed and visualized in an integrative manner. To this end, we present DrugTargetInspector (DTI), an interactive assistance tool for treatment stratification. DTI analyzes genomic, transcriptomic and proteomic datasets and provides information on deregulated drug targets, enriched biological pathways and deregulated subnetworks, as well as mutations and their potential effects on putative drug targets and genes of interest. DTI’s integrative approach allows physicians to identify recommended treatment options as well as putative alternatives that might otherwise be neglected. DTI can be freely accessed at https://dti.bioinf.uni-sb.de.
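Flagging deregulated drug targets requires comparing a tumor's expression with a reference. The abstract does not say how DTI scores deregulation; as an illustration only, the sketch uses a z-score of the patient's value against reference samples with a hypothetical cut-off of |z| >= 2. Target names and values are hypothetical.

```python
import statistics


def deregulated_targets(tumor_expr, reference, targets, z_cut=2.0):
    """Return {target: z} for drug targets whose tumor expression deviates
    from the reference samples by at least z_cut standard deviations.

    tumor_expr: gene -> expression in the patient's tumor.
    reference: gene -> list of expression values in reference samples."""
    hits = {}
    for gene in targets:
        ref = reference.get(gene, [])
        if gene not in tumor_expr or len(ref) < 2:
            continue  # need a tumor value and at least two reference values
        sd = statistics.stdev(ref)
        if sd == 0:
            continue  # z-score undefined for constant reference expression
        z = (tumor_expr[gene] - statistics.mean(ref)) / sd
        if abs(z) >= z_cut:
            hits[gene] = round(z, 2)
    return hits


reference = {"T1": [1.0, 2.0, 3.0], "T2": [1.0, 2.0, 3.0], "T3": [4.0, 5.0, 6.0]}
tumor = {"T1": 5.0, "T2": 2.5}
hits = deregulated_targets(tumor, reference, ["T1", "T2", "T3"])
```

T1 is flagged as strongly up-regulated, T2 falls within normal variation, and T3 is skipped because the patient has no measurement for it.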

Session B-314: Metabolomics analyses in cardiovascular diseases
COSI: TransMed
  • Daniela Börnigen, University Heart Center Hamburg-Eppendorf, Germany
  • Francisco Ojeda, University Heart Center Hamburg-Eppendorf, Germany
  • Christian Müller, University Heart Center Hamburg-Eppendorf, Germany
  • Tina Haase, University Heart Center Hamburg-Eppendorf, Germany
  • Julia Krause, University Heart Center Hamburg-Eppendorf, Germany
  • Manuel Kratzke, Biocrates Life Sciences AG, Austria
  • Judith Wahrheit, BIOCRATES Life Sciences AG, Austria
  • Diana Lindner, University Heart Center Hamburg-Eppendorf, Germany
  • Dirk Westermann, University Heart Center Hamburg-Eppendorf, Germany
  • Renate Schnabel, University Heart Center Hamburg-Eppendorf, Germany
  • Stefan Blankenberg, University Heart Center Hamburg-Eppendorf, Germany
  • Tanja Zeller, University Heart Center Hamburg-Eppendorf, Germany

Short Abstract: Cardiovascular disease (CVD) morbidity poses a major public health burden. CVD has a complex and heterogeneous etiology involving environmental and genetic risk factors. To gain a deeper understanding of CVD, there have been efforts to identify risk factors at the molecular level. Besides genomic studies, metabolomics studies have provided valuable insights. Still, little is known about systematic changes in metabolite levels in cardiovascular phenotypes. Here, we studied the metabolome in CVD (myocardial infarction (MI) and atrial fibrillation (AF)) in mouse models and in human case cohorts. All metabolite concentrations were measured in a standardized way, then processed, normalized, and analyzed. This allows us to translate findings from the mouse models to humans and to analyze them subsequently in disease-specific case cohorts. First, we investigated the metabolome in a mouse model of MI (n=112), assessing differences in metabolites between MI mice and control mice at different time points. ANOVA with post-hoc tests revealed multiple metabolite profiles that differed between time points and between MI and control mice; most of these metabolites were lipids. Next, we studied the human metabolome and its association with AF in a cohort of patients with prevalent AF (n=158) and in a large-scale population-based case cohort including subjects with incident AF (n=13,228). This identified one metabolite with a high hazard ratio for developing AF. Our studies revealed changes in the metabolome of patients with CVD. In future analyses, we will investigate more CVD phenotypes in additional disease- and population-based cohorts to gain more insight into the metabolome and the potential role of metabolites as biomarkers.
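The abstract does not state the exact ANOVA design or which post-hoc test was used. The sketch below assumes a one-way design in which each group holds one metabolite's concentrations at a hypothetical time point or condition. Under that assumption, the F statistic compares between-group variance with within-group variance. The function name is hypothetical. `scipy.stats.f_oneway` returns the same statistic together with a p-value.

```python
from __future__ import annotations

from statistics import fmean


def one_way_anova_f(groups: list[list[float]]) -> float:
    """One-way ANOVA F statistic for one metabolite measured in several groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within
```

In a full analysis, this would be computed for each metabolite, followed by multiple-testing correction and post-hoc pairwise comparisons for the metabolites that pass.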

Session B-316: The connectivity of cancer patients in multi-omics similarity networks is clinically relevant
COSI: TransMed
  • Francisco Azuaje, Luxembourg Institute of Health (LIH), Luxembourg
  • Leon-Charles Tranchevent, LIH, Luxembourg
  • Sang-Yoon Kim, LIH, Luxembourg
  • Sabrina Fritah, LIH, Luxembourg
  • Ann-Christin Hau, LIH, Luxembourg
  • Jagath C. Rajapakse, Nanyang Technological University (NTU), Singapore
  • Simone P. Niclou, LIH, Luxembourg

Short Abstract: The analysis of patient similarity networks (PSNs) offers opportunities for monitoring and treating cancers in a more personalized, tumor-specific manner. PSNs encode multi-dimensional similarity relationships among patients based on diverse types of “omics” data. Here, we hypothesize that the position that patient-associated nodes occupy in such networks is clinically relevant. Based on an adaptation of a technique that detects statistically highly connected network nodes, we show that patients corresponding to core and peripheral nodes in multi-omics networks display different survival times in breast and brain cancer cohorts. We also demonstrate that standard prognostic indicators for these cancers cannot recover the survival groups identified by our method. We illustrate an extension of this method to the integrated analysis of multi-source networks. We further investigated the relevance of our multi-omics network models through topological data analysis (TDA). Using Ayasdi’s TDA platform, we verified that our node connectivity scores are useful for distinguishing between patients with longer and shorter survival. Our network-driven analysis strategy provides the basis for novel, clinically meaningful stratifications of cancer patients.
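The abstract does not describe the statistical technique the authors adapted to detect highly connected nodes. As a simple stand-in, the sketch below scores each patient by node strength, i.e. the sum of its edge weights in a similarity network. It then labels patients as core or peripheral by z-score. The threshold, the labels and the function name are hypothetical. For several omics layers, one option would be to average the per-layer similarity matrices first, but that is also an assumption.

```python
from __future__ import annotations

from statistics import fmean, pstdev


def connectivity_groups(sim: list[list[float]], threshold: float = 1.0) -> dict[int, str]:
    """Label patients as 'core', 'peripheral' or 'intermediate' by node strength.

    `sim` is a symmetric patient-by-patient similarity matrix; the diagonal is ignored.
    """
    n = len(sim)
    # Node strength: total similarity of a patient to all other patients.
    strength = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    mu, sd = fmean(strength), pstdev(strength)
    labels: dict[int, str] = {}
    for i, s in enumerate(strength):
        z = (s - mu) / sd if sd > 0 else 0.0
        if z >= threshold:
            labels[i] = "core"
        elif z <= -threshold:
            labels[i] = "peripheral"
        else:
            labels[i] = "intermediate"
    return labels
```

The resulting groups could then be compared with standard survival analysis, such as Kaplan-Meier curves and log-rank tests. The sketch does not include that step.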

Session B-318: Identification of genomic predictors for treatment response to cancer immunotherapy
COSI: TransMed
  • Yu-Chao Wang, Institute of Biomedical Informatics, National Yang-Ming University, Taiwan
  • Guan-Yi Lyu, Institute of Biomedical Informatics, National Yang-Ming University, Taiwan
  • Yi-Chen Yeh, Department of Pathology and Laboratory Medicine, Taipei Veterans General Hospital, Taiwan

Short Abstract: Immunotherapy is one of the new therapeutic approaches for cancer. The most widely used immunotherapy drugs are immune checkpoint inhibitors, such as anti-PD-1 and anti-CTLA-4 antibodies. Although the efficacy of these drugs has been demonstrated, some patients still do not respond to them. Previous studies have shown that mutation load (the total number of nonsynonymous mutations) might be a useful predictive biomarker of treatment response. However, determining the mutation load of a tumor requires whole-exome sequencing, which is neither cost- nor time-effective as a standard clinical test. Therefore, this study focuses on lung adenocarcinoma. Its aim is to construct a model that estimates mutation load from a small set of genes and can serve as a genomic predictor of response to immunotherapy. A computational framework was developed using somatic mutation data downloaded from The Cancer Genome Atlas (TCGA) database. The constructed estimation model consisted of 24 genes and was applied to two independent datasets to test its estimation performance. The R² values between predicted and actual mutation load were 0.772 and 0.657, respectively. In addition, the predicted mutation load was used to classify patients as having durable clinical benefit or no durable benefit, with 85% sensitivity, 93% specificity, and 89% accuracy. Based on the constructed model, a customized targeted-sequencing panel of the selected genes could be designed to replace whole-exome sequencing in standard clinical examinations.
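The abstract does not give the form of the 24-gene estimation model. The sketch below assumes a simple linear relation between the number of nonsynonymous mutations observed in the panel genes and the exome-wide mutation load. It shows how such a model could be fitted and evaluated with R², computed here as the coefficient of determination (the abstract does not say which definition was used). It also shows how sensitivity, specificity and accuracy follow from a cutoff on the predicted load. All function names and the cutoff are hypothetical.

```python
from __future__ import annotations

from statistics import fmean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit y = intercept + slope * x (panel count -> exome load)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_a = fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def benefit_metrics(predicted_load: list[float], has_benefit: list[bool], cutoff: float) -> tuple[float, float, float]:
    """Sensitivity, specificity and accuracy when load >= cutoff predicts durable benefit."""
    tp = fp = tn = fn = 0
    for load, benefit in zip(predicted_load, has_benefit):
        predicted = load >= cutoff
        if predicted and benefit:
            tp += 1
        elif predicted:
            fp += 1
        elif benefit:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / (tp + fp + tn + fn)
```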

Session B-320: Proteomic data analysis of pancreatic cancer patient sera revealed how chemotherapy shapes the immunogenicity of developing tumors
COSI: TransMed
  • Laura Follia, Università degli Studi di Torino, Italy
  • Giorgia Mandili, Università degli Studi di Torino, Italy
  • Giulio Ferrero, Università degli Studi di Torino, Italy
  • Francesca Cordero, Università degli Studi di Torino, Italy
  • Marco Beccuti, Università degli Studi di Torino, Italy
  • Hiroyuki Katayama, MD Anderson Cancer Center, United States
  • Hong Wang, MD Anderson Cancer Center, United States
  • Amin A. Momin, MD Anderson Cancer Center, United States
  • Michela Capello, MD Anderson Cancer Center, United States
  • Samir M. Hanash, MD Anderson Cancer Center, United States
  • Francesco Novelli, Department of Molecular Biotechnologies and Health Sciences, University of Turin, Turin 10126, Italy

Short Abstract: Pancreatic ductal adenocarcinoma (PDAC) is an aggressive malignancy, and surgery remains the only potentially curative option. Unfortunately, the lack of early diagnostic markers means that the disease is usually diagnosed at an inoperable stage. Besides surgery, the chemotherapeutic drug gemcitabine is the most common PDAC treatment, but its efficacy remains poor. Chemotherapy (CT) can enhance tumor immunogenicity. Despite advances in understanding PDAC-immune system interactions, no immunotherapies are currently approved for treating PDAC. In this study, we analyzed sera from 37 patients with advanced PDAC, collected before and after gemcitabine treatment, using Q-TOF and LTQ-Orbitrap mass spectrometers. The aim was to identify the immune-modulating effects of CT on free-circulating proteins and on IgG, IgM, and IgA immune complexes. We developed a new R computational pipeline to preprocess, normalize, and analyze the resulting proteomic data. Our pipeline consists of data characterization, batch correction, differential antigen recognition analysis and functional enrichment. Using this approach, we identified the most abundant circulating immune complexes for each of the immunoglobulin classes analyzed. As expected, these complexes were enriched in complement and coagulation processes. Differential antigen recognition analysis between pre- and post-CT samples revealed a set of antigens whose recognition is altered by CT. Among them, we identified proteins involved in the humoral immune response and predicted to be localized in extracellular vesicles. Our analysis revealed an extensive alteration of antigen recognition upon CT treatment. This will be investigated further to understand the relationship between CT and its capacity to elicit an immune response against tumor cells.

Session B-322: Finding Biomarkers Associated with Prostate Cancer Gleason Stages using Next Generation Sequencing and Machine Learning Techniques
COSI: TransMed
  • Aram Karkar, University of Western Ontario - Schulich School of Medicine, Canada
  • Osama Hamzeh, University of Windsor, Canada
  • Abed Alkhateeb, University of Windsor, Canada
  • Luis Rueda, University of Windsor, Canada

Short Abstract: Prostate cancer is one of the most common cancers in men, accounting for up to 21 percent of all cancer cases. The Gleason classification system is one of the established systems for grading the aggressiveness of prostate cancer, and it can predict the prognosis and clinical course of the disease. Identifying transcripts associated with different stages of prostate cancer could play a significant role in early detection and treatment of the disease. RNA-Seq uses next-generation sequencing (NGS) technology to provide much deeper insight into the transcripts that are active in prostate cancer cells. This work presents a machine learning model used to analyze 104 RNA-Seq samples from prostate cancer patients with different Gleason stages. The data set was preprocessed to assemble 41,971 transcripts for each sample. The prediction model is built on a hybrid feature selection approach and a one-versus-the-rest multi-class classification method. The feature selection approach combines information gain as a filter with the minimum redundancy maximum relevance (mRMR) wrapper, applied in a two-stage process. Feature selection reduced the number of transcripts to a handful of meaningful ones. Afterwards, a Naïve Bayes classifier was applied, yielding a prediction accuracy of almost 100%. In total, the one-versus-the-rest steps identified 26 transcripts that were differentially expressed among five different Gleason groups. Of these, three transcripts classify Gleason score 3+3=6 versus the rest, 15 transcripts classify Gleason score 3+4=7 versus the rest, and seven transcripts separate Gleason score 4+3=7 versus the rest. One transcript was associated with Gleason score 8+. Many of the identified transcripts, such as PIAS3, UBE2V2, and EPB41L1, have been found to be associated with prostate cancer as well as other types of cancer.
Further analysis and wet-lab experiments on these transcripts will help to find biomarkers for the diagnosis, treatment, and prognosis of the disease.
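The information gain filter used in the first feature selection stage can be sketched as follows. The sketch assumes that expression values have already been discretized, for example into hypothetical "hi"/"lo" bins. It also assumes that labels are encoded one-versus-the-rest (e.g. one Gleason group versus "rest"). The mRMR stage and the Naïve Bayes classifier are not shown, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from math import log2


def entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values: list[str], labels: list[str]) -> float:
    """Reduction in label entropy after splitting samples by a discretized feature."""
    n = len(labels)
    groups: dict[str, list[str]] = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(features: dict[str, list[str]], labels: list[str], k: int) -> list[str]:
    """Names of the k transcripts with the highest information gain."""
    scores = {name: information_gain(values, labels) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```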

Session B-324: megSAP - a Medical Genetics Sequence Analysis Pipeline
COSI: TransMed
  • Marc Sturm, Institute of Medical Genetics and Applied Genomics, University Hospital Tübingen, Germany
  • Christopher Schroeder, Institute of Medical Genetics and Applied Genomics, University Hospital Tübingen, Germany
  • Tobias Haack, Institute of Medical Genetics and Applied Genomics, University Hospital Tübingen, Germany

Short Abstract: Today, NGS is widely used in clinical diagnostics and translational research to identify disease-causing variants. Several commercial software suites for short-read NGS data analysis are available, but they are usually costly and not easy to automate in a high-throughput setting. We have therefore developed megSAP, a free-to-use, open-source data analysis pipeline tailored towards research and diagnostics in medical genetics. megSAP offers a complete NGS data analysis pipeline: adapter trimming, mapping, duplicate removal if applicable, indel realignment, variant calling, variant normalization and variant annotation. This is complemented by quality control at several levels (raw reads, mapped reads and variants). megSAP is based entirely on open-source tools that are free for commercial use, which rules out popular tools such as GATK and Annovar. It also integrates free-to-use databases such as 1000 Genomes, ExAC, Kaviar and ClinVar for variant annotation. Optionally, commercial databases that are important for diagnostics (OMIM, HGMD and COSMIC) can be used if a license is available. Thanks to the comprehensive annotation, the variant lists (produced in VCF and TSV format) can easily be filtered to identify disease variants. megSAP is updated regularly (both the tools and the annotation databases). Each release is validated using the GiaB NA12878 gold-standard dataset, inter-laboratory comparisons and EMQN test schemes. Currently, megSAP can readily be used to analyze single-sample NGS data from whole-genome sequencing, whole-exome sequencing and panel sequencing (both shotgun and amplicon-based data). Several other applications (RNA-Seq, tumor-normal pairs, trios, and molecular barcodes) are already implemented, and the corresponding documentation will be added shortly. To facilitate installation and thereby improve usability, we are working on a first containerized release of megSAP using Docker.
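The abstract notes that megSAP's annotated TSV output can be filtered to identify candidate disease variants. The following is a minimal sketch of such a filter. The column names `population_af` and `impact`, the 1% allele-frequency cutoff and the impact classes are assumptions for illustration. They are not megSAP's actual output format.

```python
from __future__ import annotations

import csv
import io


def filter_variants(
    tsv_text: str,
    max_af: float = 0.01,
    impacts: frozenset[str] = frozenset({"HIGH", "MODERATE"}),
) -> list[dict[str, str]]:
    """Keep rare variants with a relevant predicted impact from a tab-separated table.

    Assumes hypothetical columns `population_af` (blank or '.' if unobserved) and `impact`.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        af_field = (row.get("population_af") or "").strip()
        # Variants absent from population databases are treated as AF = 0.
        af = float(af_field) if af_field not in ("", ".") else 0.0
        if af <= max_af and row.get("impact") in impacts:
            kept.append(row)
    return kept
```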

Session B-326: Towards multiple kernel principal component analysis for integrative analysis of tumor samples
COSI: TransMed
  • Nora K. Speicher, Max Planck Institute for Informatics, Germany
  • Nico Pfeifer, Department of Computer Science, University of Tübingen, Germany

Short Abstract: Personalized treatment of patients based on tissue-specific cancer subtypes has strongly increased the efficacy of the chosen therapies. Even though the amount of data measured for cancer patients has increased in recent years, most cancer subtypes are still diagnosed based on individual data sources (e.g. gene expression data). We propose an unsupervised data integration method based on kernel principal component analysis. Principal component analysis is one of the most widely used techniques in data analysis. Unfortunately, the straightforward multiple-kernel extension of this method ends up using only one of the input matrices, which defeats the goal of gaining information from all data sources. Therefore, we present a scoring function to determine the impact of each input matrix. The approach enables visualization of the integrated data and subsequent clustering for cancer subtype identification. We apply the presented methodology to five different cancer data sets from The Cancer Genome Atlas and demonstrate its advantages in terms of results (i.e., survival analysis) and usability (i.e., choice of parameters).
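The abstract does not give the scoring function used to weight the input matrices. The sketch below only illustrates the surrounding mechanics of multiple kernel PCA under stated assumptions. The kernels are combined as a weighted sum with user-supplied weights, and the combined kernel is double-centered. The first principal component scores are then obtained by power iteration. The sketch assumes symmetric positive semidefinite kernels and non-negative weights, and all function names are hypothetical.

```python
from __future__ import annotations

from math import sqrt


def combine_kernels(kernels: list[list[list[float]]], weights: list[float]) -> list[list[float]]:
    """Weighted sum of n x n kernel matrices (weights assumed non-negative)."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


def center_kernel(k: list[list[float]]) -> list[list[float]]:
    """Double-center a symmetric kernel: Kc = H K H with H = I - 11^T / n."""
    n = len(k)
    means = [sum(row) / n for row in k]  # row means equal column means for symmetric K
    grand = sum(means) / n
    return [[k[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]


def first_component_scores(kc: list[list[float]], iterations: int = 500, tol: float = 1e-12) -> list[float]:
    """Scores of each sample on the first kernel principal component (power iteration)."""
    n = len(kc)

    def matvec(v: list[float]) -> list[float]:
        return [sum(kc[i][j] * v[j] for j in range(n)) for i in range(n)]

    def normalize(v: list[float]) -> list[float]:
        norm = sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else v

    # Deterministic, non-constant start: the all-ones vector lies in the null space of Kc.
    v = normalize([float(i + 1) for i in range(n)])
    for _ in range(iterations):
        w = normalize(matvec(v))
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    eigenvalue = sum(a * b for a, b in zip(v, matvec(v)))  # Rayleigh quotient
    # For training samples, the projection onto the component is sqrt(lambda) * v.
    return [sqrt(max(eigenvalue, 0.0)) * x for x in v]
```

The sign of the component is arbitrary, as in any PCA. The abstract's point is that the choice of `weights` determines how much each data source contributes; this sketch leaves that choice to the caller.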

Session B-328: mTCTScan: a comprehensive platform for annotation and prioritization of mutations affecting drug sensitivity in cancers
COSI: TransMed
  • Mulin Li, Tianjin Medical University, China
  • Hongcheng Yao, University of Hong Kong, China
  • Stephanie Jeanette Prinz, Mayo Clinic, United States
  • Dandan Huang, Tianjin Medical University, United States
  • Huanhuan Liu, Tianjin Medical University, China
  • Hang Xu, University of Hong Kong, China
  • Yiming Qin, University of Hong Kong, China
  • Weiyi Xia, University of Hong Kong, China
  • Panwen Wang, Mayo Clinic, United States
  • Bin Yan, University of Hong Kong, China
  • Nhan Tran, Mayo Clinic, United States
  • Jean-Pierre Kocher, Mayo Clinic, United States
  • Pak Chung Sham, University of Hong Kong, China
  • Junwen Wang, Mayo Clinic, United States

Short Abstract: Cancer therapies have progressed rapidly in recent years, and a number of novel small-molecule kinase inhibitors and monoclonal antibodies are now widely used to treat various types of human cancer. During cancer treatment, mutations can have important effects on drug sensitivity. However, the relationship between tumor genomic profiles and the effectiveness of cancer drugs remains elusive. We introduce the mTCTScan web server (http://jjwanglab.org/mTCTScan), which can systematically analyze mutations affecting cancer drug sensitivity based on individual genomic profiles. The platform was developed by leveraging the latest knowledge on associations between mutations and cancer drug sensitivity, together with the results of large-scale chemical screening in human cancer cell lines. Using an evidence-based scoring scheme that integrates current evidence, mTCTScan prioritizes mutations according to their associations with cancer drugs and preclinical compounds. It can also show related drugs/compounds with a sensitivity classification that considers the context of the entire genomic profile. In addition, mTCTScan incorporates comprehensive filtering functions and cancer-related annotations to better interpret mutation effects and their association with cancer drugs. This platform will greatly benefit both researchers and clinicians investigating the mechanisms of mutation-dependent drug response, which will have a significant impact on cancer precision medicine.

Session B-330: A computational framework for systems pathology of prostate cancer
COSI: TransMed
  • Malamati Koletou, USZ / IBM / ETH, Switzerland
  • Maria Gabrani, IBM, Switzerland
  • Tiannan Guo, ETH, Switzerland
  • Qing Zhong, USZ, Switzerland
  • Ulrich Wagner, USZ, Switzerland
  • Rudolf Aebersold, ETH, Switzerland
  • Peter Wild, USZ, Switzerland
  • Maria Rodriguez Martinez, IBM, Switzerland

Short Abstract: Introduction. Personalized medicine relies heavily on the analysis of patient data, including but not limited to genomic datasets, which are becoming increasingly available. With this in mind, prostate cancer can serve as a case study for an interdisciplinary project that combines hypothesis-driven diagnostic strategies with data-driven estimations from a novel computational framework. Prostate cancer is the second most frequent cancer type in men, but an accurate prognosis of the patient’s survival is not always possible. This is mainly due to the lack of biomarkers that are prognostic of a more aggressive phenotype. In this project, we search for prostate cancer-specific genomic alterations and study how they could improve the stratification of prostate cancer into two classes, significant and insignificant disease. Methods. The aim of this project is to develop a computational framework for identifying informative genetic variants in prostate cancer. This addresses the urgent clinical need to stratify primary prostate cancer tissues into two classes: aggressive and insignificant disease. The computational framework will help to classify patients into discriminative groups and generate the associated genotypic profiles. The new framework will employ machine learning techniques, mainly focusing on pattern detection. A very promising method is dictionary learning with sparse coding, an efficient tool that has been used in image processing. Briefly, it can identify genomic alterations that contribute substantially to the variation of complex traits without relying on exhaustive search. It is therefore computationally efficient and can be applied to smaller patient cohorts. Additionally, the framework will be used to integrate different types of genomic alterations and to infer patterns across different -omics datasets. Results.
Here we plan to show how the method can be applied to the TCGA Prostate Adenocarcinoma datasets and present an overview of potential applications for patient stratification. In the future, we plan to validate our findings in an independent cohort, the Zurich Prostate Cancer Outcome Cohort study. Conclusions. Our computational framework offers a novel perspective on analysing genomic data from a relatively small number of samples and on integrating multiple omics datasets. This is an interdisciplinary PhD project funded by SystemsX.ch. It aims to contribute to capacity building in Systems Biology at the methodological and algorithmic level and also at the clinical level, supporting precise, predictive and personalized medicine (3P-medicine).
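Dictionary learning alternates between learning a dictionary of atoms and computing a sparse code for each sample. The sketch below covers only the sparse coding half and assumes a fixed dictionary `D` whose columns are atoms. It uses the iterative shrinkage-thresholding algorithm (ISTA) to minimize (1/2)||x - Da||^2 + lam ||a||_1. For the step size, the squared Frobenius norm of `D` serves as a conservative upper bound on the Lipschitz constant. The names are hypothetical. For practical use, scikit-learn provides `DictionaryLearning` and `SparseCoder`.

```python
from __future__ import annotations


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |z| (shrinks z towards zero by t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ista_sparse_code(D: list[list[float]], x: list[float], lam: float, iterations: int = 1000) -> list[float]:
    """Sparse code a of signal x for a fixed dictionary D (m x k, columns are atoms)."""
    m, k = len(D), len(D[0])
    # Step 1/L with L = ||D||_F^2, an upper bound on the largest eigenvalue of D^T D.
    step = 1.0 / sum(D[i][j] ** 2 for i in range(m) for j in range(k))
    a = [0.0] * k
    for _ in range(iterations):
        residual = [x[i] - sum(D[i][j] * a[j] for j in range(k)) for i in range(m)]
        grad = [-sum(D[i][j] * residual[i] for i in range(m)) for j in range(k)]  # D^T (Da - x)
        a = [soft_threshold(a[j] - step * grad[j], step * lam) for j in range(k)]
    return a
```

Larger values of `lam` drive more coefficients to exactly zero. This is what makes each sample's representation interpretable as a small set of contributing patterns.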

Session B-332: Cell reprogramming for regenerative medicine
COSI: TransMed
  • Owen Rackham, Duke-NUS, Singapore
  • Jose Polo, Monash University, Australia
  • Julian Gough, MRC Laboratory of Molecular Biology, United Kingdom

Short Abstract: Transdifferentiation, the process of converting one cell type into another without going through a pluripotent state, holds great promise for regenerative medicine. The identification of key transcription factors for reprogramming is currently limited by the cost of exhaustively testing plausible sets of factors experimentally, an approach that is inefficient and unscalable. Here we present a predictive system (Mogrify) that combines gene expression data with regulatory network information to predict the reprogramming factors necessary to induce cell conversion. We have applied Mogrify to 173 human cell types and 134 tissues, defining an atlas of cellular reprogramming. Mogrify correctly predicts the transcription factors used in known transdifferentiations. Furthermore, we validated two new transdifferentiations predicted by Mogrify. We provide a practical and efficient mechanism for systematically implementing novel cell conversions, facilitating the generalization of reprogramming of human cells. Predictions are made available to help rapidly advance the field of cell conversion. http://mogrify.net

Session B-334: Integrated genomic investigation of spatial and temporal heterogeneity in high grade serous ovarian cancer
COSI: TransMed
  • Luca Beltrame, Mario Negri Research Institute, Italy
  • Ilaria Craparotta, IRCCS Istituto di Ricerche Farmacologiche "Mario Negri", Italy
  • Lara Paracchini, IRCCS Istituto di Ricerche Farmacologiche "Mario Negri", Italy
  • Sara Ballabio, IRCCS Istituto di Ricerche Farmacologiche "Mario Negri", Italy
  • Laura Mannarino, IRCCS Istituto di Ricerche Farmacologiche "Mario Negri", Italy
  • Mariagrazia Pezzotta, Dept. of Anatomo-pathology, Manzoni Hospital, Lecco, Italy
  • Federica Villa, Dept. of Oncology, Manzoni Hospital, Lecco, Italy
  • Chiara Romualdi, Department of Biology, University of Padova, Italy
  • Maurizio D'Incalci, IRCCS Istituto di Ricerche Farmacologiche "Mario Negri", Italy
  • Sergio Marchini, IRCCS Istituto di Ricerche Farmacologiche "Mario Negri", Italy

Short Abstract: Introduction: High Grade Serous Epithelial Ovarian Cancer (HGS-EOC) is characterized by massive spread to multiple synchronous implantation sites, which exhibit a high level of both inter- and intra-patient mutational heterogeneity. An additional level of heterogeneity arises after treatment, when the tumor relapses under the selective pressure of chemotherapy. Therapeutic strategies are usually based on a biopsy taken at diagnosis, which may not be representative of the disease. To identify potential biomarkers across both spatial and temporal heterogeneity, we analyzed whole-exome sequencing and array CGH data from two cohorts of patients for whom multiple biopsies were available. Materials and methods: Exome sequencing data were processed through a readily available, tested pipeline as previously described (Beltrame et al., 2015). Recurrent regions of copy number alteration in spatial and temporal biopsies were identified with GISTIC. Aberrant regions common to spatial and temporal biopsies were then verified in an independent dataset (TCGA). Results: Using this approach, we showed that HGS-EOC is characterized by a high level of spatial heterogeneity at the mutational level, present both within and between patients. At the copy number level, spatial biopsies show a high rate of recurrent aberrations in specific genomic regions. One amplified region, located on chromosome 8 (8q24.3), was identified in both temporal and spatial biopsies and confirmed as amplified in TCGA. Conclusions: Our results suggest that our bioinformatics approach is suitable for investigating biomarkers that overcome spatial and temporal heterogeneity.

Session B-336: Computational deconvolution of mixed signals in tumor microenvironment using blind source deconvolution approach
COSI: TransMed
  • Urszula Czerwinska, Institut Curie, France
  • Emmanuel Barillot, Institut CURIE, France
  • Vassili Soumelis, Institut Curie, France
  • Andrei Zinovyev, Institut Curie, France

Short Abstract: Biological systems are characterized by high complexity at multiple levels of organization. This is the case for the tumor microenvironment, which includes distinct cell types that critically impact tumor development and response to treatment. The transcriptomic profile of the microenvironment is a complex mixture that can be described by the linear model AX = B. Here B is the transcriptome matrix of a biological sample, X contains the mixing proportions, and A is the matrix of gene expression in each cell type. We propose to apply an unsupervised method that decomposes the mixture into independent sources based solely on the data structure, without any prior knowledge. We apply Independent Component Analysis (ICA) (Hyvärinen, Karhunen, & Oja, 2001) to solve this blind source separation problem. Results: Applied to a breast carcinoma dataset, our method identified sources linked to three groups of immune cell types: (1) T cells, (2) DCs/macrophages, and (3) monocytes/macrophages/eosinophils/neutrophils. On TCGA datasets, we analyzed the reproducibility of our method and possible solutions for building a tool that robustly estimates immune cell type abundance in tumor transcriptomes. In addition, we test our tool on a synthetic dataset based on single-cell expression data (Tirosh et al., 2016) and FACS (fluorescence-activated cell sorting) data of tumor samples. This provides a unique benchmark for this and previously developed methods. Conclusions: Using an unsupervised machine learning algorithm, we estimate the presence of different cell types in the transcriptomes of tumor samples. We aim to characterize the degree of immune infiltration from cancer transcriptomic datasets, which remain the most abundant source of molecular profiling in cancer biology. If successful, the results will be used in cancer diagnosis and therapy, especially immunotherapies.
The approach can also serve as the basis of a mechanistic mathematical model and of a predictive model of patients’ survival and therapy choice.
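The dimensions of the linear mixing model can be made explicit as follows. The sketch assumes G genes, k cell types and S samples; this notation is introduced here rather than taken from the abstract, and S = 1 for a single sample.

```latex
B = A X, \qquad
B \in \mathbb{R}^{G \times S},\;
A \in \mathbb{R}^{G \times k},\;
X \in \mathbb{R}^{k \times S},
\qquad
B_{gs} = \sum_{c=1}^{k} A_{gc}\, X_{cs}
```

Under this model, each observed expression value is a proportion-weighted sum of cell-type-specific expression values. ICA estimates both factors from B alone, relying on its assumption that the underlying sources are statistically independent.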

Session B-338: Modelling cellular evolution during lymph node colonization in melanoma patients
COSI: TransMed
  • Martin Hoffmann, Fraunhofer ITEM, Division of Personalized Tumor Therapy, Regensburg, Germany
  • Sebastian Scheitler, Chair of Experimental Medicine and Therapy Research, University of Regensburg, Germany; Division Personalized Tumor Therapy, Fraunhofer ITEM, Regensburg, Germany
  • Isabelle Hodak, Chair of Experimental Medicine and Therapy Research, University of Regensburg, Germany
  • Anja Ulmer, Department of Dermatology, University of Tübingen, Germany
  • Christoph A. Klein, Chair of Experimental Medicine and Therapy Research, University of Regensburg, Germany

Short Abstract: The dynamics of early cancer spread and evolution are key to the development of adjuvant therapies. We therefore investigated the similarity between primary tumor samples and disseminated cancer cells (DCCs), and the spread of DCCs to the sentinel lymph node. We found that lymphatic dissemination occurs shortly after dermal invasion of the primary lesion. On arrival in the lymph node, DCCs lack typical driver changes, including BRAF mutations and gained or lost regions harboring genes such as MET or CDKN2A. These changes are acquired within the lymph node during colony formation. We identify the origin of different genomic alterations and propose a model of metastatic melanoma spread, evolution and colonization that will inform direct monitoring of adjuvant therapy targets.

Session B-340: Understanding human knockouts
COSI: TransMed
  • Suganthi Balasubramanian, Regeneron Genetics Center, Regeneron Pharmaceuticals, United States
  • Yao Fu, Bina Technologies, Part of Roche Sequencing, United States
  • Mayur Pawashe, Yale University, United States
  • Patrick McGillivray, Yale University, United States
  • Mike Jin, Yale University, United States
  • Jeremy Liu, Yale University, United States
  • Konrad  Karczewski, Massachusetts General Hospital, United States
  • Daniel MacArthur, Massachusetts General Hospital, United States
  • Mark Gerstein, Yale University, United States

Short Abstract: One of the most notable findings from personal genomics studies is that all individuals harbor loss-of-function (LoF) variants in some of their genes. Protective LoF variants associated with beneficial traits have been discovered, and they can help identify valuable drug targets. Both factors have fueled increased interest in putative LoF (pLoF) variants. For example, nonsense variants in PCSK9 are associated with low LDL levels. This prompted the active pursuit of PCSK9 inhibition as a potential therapy for hypercholesterolemia and led to the development of two drugs recently approved by the FDA. Other examples include the following:
  • nonsense and splice mutations in APOC3 associated with low levels of circulating triglycerides;
  • a nonsense mutation in SLC30A8 resulting in an approximately 65% reduction in the risk of type II diabetes;
  • two splice variants in LPA in the Finnish population that protect against coronary heart disease;
  • two LoF-producing splice variants and a nonsense mutation in HAL associated with increased blood histidine levels and reduced risk of coronary heart disease.
Given the translational potential of knockouts, several large-scale efforts aim to identify human knockouts. However, even though premature stop variants often lead to loss of function and are thus deleterious, predicting the functional impact of premature stop codons is not straightforward. Aberrant transcripts containing premature stop codons are typically removed by nonsense-mediated decay (NMD), an mRNA surveillance mechanism. However, large-scale expression analysis has shown that NMD events predicted for premature stop variants are frequently not supported by RNA-Seq analyses. Truncating mutations can also give rise to functional protein products in some cases.
Further, when a variant affects only some isoforms of a gene, its impact on gene function is difficult to infer. This requires knowing which isoforms are expressed in the tissue of interest and how their expression levels affect gene function. Finally, loss of function of a gene might not have any impact on the fitness of the organism. Here, we present ALoFT (Annotation of Loss-of-Function Transcripts, aloft.gersteinlab.org), a method to annotate and predict the disease-causing potential of LoF variants. We built a random-forest classifier to distinguish between different classes of LoF variants. The training dataset consists of mutations that cause disease through a dominant mode of inheritance, mutations that cause disease through a recessive mode of inheritance, and neutral mutations. Using data from Mendelian disease-gene discovery projects, we show that ALoFT can distinguish between LoF variants that are deleterious as heterozygotes and those that cause disease only in the homozygous state. When applied to de novo putative LoF variants in autism-affected families, ALoFT distinguishes between deleterious variants in patients and benign variants in unaffected siblings. Finally, an analysis of somatic variants in more than 6,500 cancer exomes shows that putative LoF variants predicted to be deleterious by ALoFT are enriched in known driver genes. We extend this analysis to large-scale published knockout datasets and provide a way to start prioritizing this important class of variants to understand their impact on phenotype.
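A widely used heuristic for predicting NMD, and one reason such predictions can fail, is the so-called 50 to 55 nucleotide rule. A premature stop codon located more than about 50 nt upstream of the last exon-exon junction is expected to trigger NMD. A stop codon in the last exon or close to the final junction is expected to escape it. The sketch below encodes this rule. It is an illustration only, not ALoFT's feature set, and the function name, the coordinate convention and the default threshold are assumptions.

```python
from __future__ import annotations


def predicts_nmd(exon_lengths: list[int], ptc_position: int, threshold: int = 50) -> bool:
    """Apply the ~50-nt rule to a premature termination codon (PTC).

    `exon_lengths` lists exon lengths (nt) of the spliced transcript in 5'->3' order;
    `ptc_position` is the 1-based transcript coordinate of the PTC's first base.
    """
    if len(exon_lengths) < 2:
        return False  # single-exon transcript: no downstream exon junction
    # Transcript coordinate of the last nucleotide before the final exon-exon junction.
    last_junction = sum(exon_lengths[:-1])
    # NMD is predicted when the PTC lies more than `threshold` nt upstream of that junction.
    return last_junction - ptc_position > threshold
```

Variants for which this rule predicts escape from NMD are the ones that are more likely to yield truncated but potentially functional proteins. This is one reason why annotation beyond "stop gained" is needed.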

Session B-342: Integrative Bioinformatics Analysis of Proteins Associated with the Cardiorenal Syndrome in Type 2 Diabetes Mellitus
COSI: TransMed
  • Suchada Sukontanon, Burapha University, Thailand
  • Nuttinee Teerakulkittipong, Burapha University, Thailand

Short Abstract: Cardiorenal syndrome (CRS) is defined as a complex pathophysiological disorder of the heart and kidneys in which acute or chronic dysfunction in one organ may induce acute or chronic dysfunction in the other. Various studies have shown that renal dysfunction increases the risk of cardiovascular events either three-fold or seventeen-fold in patients with type 2 diabetes mellitus (T2D). Protein interactions have been associated at the molecular level with kidney disease combined with cardiovascular disease, or with kidney disease combined with T2D, and protein-protein interaction networks involved in renal function decline in T2D or in cardiorenal syndrome have been described. However, cardiorenal syndrome in T2D has not been studied. This research aims to construct target protein interaction networks for renal function decline in T2D and for cardiorenal syndrome from the STITCH, STRING and PSICQUIC databases in Cytoscape, and to simplify and analyze their network behavior, biological processes (with BiNGO) and pathways (with PANTHER, Protein Analysis Through Evolutionary Relationships). We found three proteins involved in all three disease combinations: nuclear factor NF-kappa-B p105 subunit (NFκB1), interleukin-1 beta (IL1B) and matrix metalloproteinase 2 (MMP2). NFκB1 and IL1B are associated with the inflammation mediated by chemokine and cytokine signaling pathway, an important cardiorenal connector and a hallmark of kidney and heart diseases. MMP2 is related to diverse functions such as remodeling of the vasculature, angiogenesis, tissue repair, tumor invasion, inflammation, atherosclerotic plaque rupture and the AGE/RAGE signaling pathway in diabetic complications. 
AGE/RAGE signaling has been implicated in oxidative stress associated with diabetes-mediated vascular calcification through activation of Nox-1, TGF-β-mediated fibrosis, nuclear factor κB (NFκB) and ERK1/2 pathways, and decreased expression of SOD-1. The biological processes identified are cellular response to interleukin-1 and I-kappaB kinase/NF-kappaB signaling. Therefore, all three biomarkers are associated with inflammation and may lead to new targeted therapeutics for cardiorenal syndrome in patients with type 2 diabetes.
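
Identifying proteins shared by the three disease-combination networks amounts to intersecting their node sets. A minimal sketch, assuming each network has been exported (for example from Cytoscape) as a list of protein-protein edges; the network keys and the protein names `P1` to `P6` are hypothetical placeholders, not data from this study.

```python
def shared_proteins(networks):
    """Return the proteins present in every network.

    networks: dict mapping a network name to a list of (protein_a, protein_b) edges.
    """
    # Collect the node set of each network from its edge list.
    node_sets = [{protein for edge in edges for protein in edge} for edges in networks.values()]
    # A protein involved in all combinations must appear in every node set.
    return set.intersection(*node_sets) if node_sets else set()


# Hypothetical toy networks (placeholder protein names, not study data).
networks = {
    "network_a": [("P1", "P2"), ("P2", "P3")],
    "network_b": [("P1", "P4"), ("P2", "P5")],
    "network_c": [("P2", "P1"), ("P6", "P1")],
}
print(sorted(shared_proteins(networks)))  # ['P1', 'P2']
```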

Session B-344: A workflow for the integrative transcriptomic description of molecular pathology and the suggestion of inhibitory compounds exemplified by Parkinson’s disease
COSI: TransMed
  • Mohamed Hamed, Rostock university medical center, Germany
  • Yvonne Gladbach, Rostock university medical center, Germany
  • Sarah Fischer, Rostock university medical center, Germany
  • Mathias Ernst, Rostock university medical center, Germany
  • Stephan Struckmann, Rostock university medical center, Germany
  • Steffen Moeller, Rostock university medical center, Germany
  • Georg Fuellen, Rostock university medical center, Germany

Short Abstract: Motivation: The volume of molecular observations on diseases in public databases is increasing at an accelerating rate. A bottleneck is their computational integration into a coherent description from which researchers may derive new, well-founded hypotheses. Integrating data from multiple submissions is essential for rare diseases (because of low sample numbers) and common diseases alike (to cover genetic complexity and characterize environmental influences). More recently, the need has also arisen to integrate data from different technologies (genetics, coding and regulatory RNA, proteomics). Many bioinformatics service groups around the globe address these challenges. The Common Workflow Language (CWL) allows a formal, executable backbone for the analysis to be described, and further facilitates the community’s exchange of optional features that adjust for recurring peculiarities. The analysis of data describing Parkinson’s disease (PD) is challenging, and the etiology of PD remains partly uncharacterized. A model of its molecular pathophysiology may identify biomarkers for early diagnosis and prognosis, and may support decision making for more individualized therapy. Results: We present a comprehensive workflow that integrates gene expression profiles, miRNA signatures and publicly available regulatory databases to specify a partial model of the molecular pathophysiology of PD. Six genetic driver elements (2 genes and 4 miRNAs) and several functional network modules that could trigger PD were identified. Functional modules were assessed for their statistical significance and cellular functional homogeneity. We also predicted compounds that may inhibit the observed transcriptional disturbances.

Session B-346: Classification of Paediatric Inflammatory Bowel Disease using Machine Learning
COSI: TransMed
  • Enrico Mossotto, University of Southampton, United Kingdom
  • James Ashton, University of Southampton, United Kingdom
  • Tracy Coelho, University of Southampton, United Kingdom
  • Mark Beattie, Southampton Children’s Hospital, United Kingdom
  • Benjamin MacArthur, University of Southampton, United Kingdom
  • Sarah Ennis, University of Southampton, United Kingdom

Short Abstract: Paediatric inflammatory bowel disease (PIBD), comprising Crohn’s disease (CD), ulcerative colitis (UC) and inflammatory bowel disease unclassified (IBDU), is a complex and multifactorial condition leading to inflammation within the gastrointestinal (GI) tract, and its incidence is increasing. Correct diagnosis and classification of disease affect treatment, prognostication and outcome. It has been established that information derived from histology reports, compared with the endoscopy findings used in the Paris classification of IBD subtypes, can give rise to discordant diagnostic subgroups. This study uses machine learning (ML) to classify and compare disease using both endoscopic and histological data. Endoscopy and histology data for 287 children diagnosed with IBD were systematically recorded. The cohort comprises children diagnosed aged <18 years with Crohn’s disease (n=178), ulcerative colitis (n=80) and inflammatory bowel disease unclassified (n=29). Data from patients with UC or CD were used to develop (n=106), then train and test (n=104) and validate (n=48) an ML model to classify disease subtype. Principal component analysis (PCA), multidimensional scaling (MDS) and hierarchical clustering were used as unsupervised algorithms to observe data structure. A supervised model for CD/UC classification was built using a support vector machine (SVM) with a linear kernel. The SVM penalty parameter and the subset of most important features were selected to maximise classification accuracy on a development-dedicated data subset. Recursive feature elimination coupled with 5-fold cross-validation was used to rank and select maximally informative endoscopic and histological features. PCA and MDS revealed overlap between CD and UC, with broad clustering but no clear subtype delineation, reflecting the clinical complexity of distinguishing IBD subtypes. 
Hierarchical clustering identified four novel patient subgroups characterised by differing colonic involvement. Three supervised classification models were developed using i) endoscopic data only, ii) histological data only and iii) combined endoscopic and histological data, yielding classification accuracies of 71.0%, 76.9% and 82.7%, respectively. The optimal combined model was tested on a statistically independent cohort of 48 additional PIBD patients from the same clinic and accurately classified 83.3% of them. Twenty-nine IBDU patients were analysed using the combined model, seventeen of whom were assigned a subtype diagnosis with a posterior probability >80%. This study employs mathematical modelling of endoscopic and histological data to potentially assist diagnostic accuracy. Additionally, the model based on endoscopic data alone classifies less accurately than the combined model. These findings add further weight to the need to consider revising the contemporary Paris classification, which is based on endoscopic findings only, and provide a blueprint for machine learning use with clinical data.
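
In recursive feature elimination with 5-fold cross-validation, each candidate feature subset is scored by fitting the classifier on four folds and evaluating it on the held-out fold, rotating through all five. The stdlib-only sketch below shows just the fold-splitting and accuracy steps; the unstratified shuffle, the seed and the helper names `kfold_indices` and `accuracy` are assumptions, and the SVM fitting itself is omitted. As a sanity check on the reported figures, 83.3% of the 48 validation patients corresponds to 40 correctly classified patients.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # unstratified shuffle (assumption)
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


folds = list(kfold_indices(104, k=5))
print([len(test) for _, test in folds])  # [21, 21, 21, 21, 20]
```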

Session B-348: Systems human genome and metagenome analysis on circulating proteins in a population cohort
COSI: TransMed
  • Daria Zhernakova, University of Groningen, University Medical Center Groningen, Department of Genetics, Netherlands
  • Alexander Kurilshikov, University of Groningen, University Medical Center Groningen, Department of Genetics, Netherlands
  • Biljana Atanasovska, University of Groningen, University Medical Center Groningen, Department of Genetics, Netherlands
  • Trang Le, University of Groningen, University Medical Center Groningen, Department of Genetics, Netherlands
  • Marc Jan Bonder, University of Groningen, University Medical Center Groningen, Department of Genetics, Netherlands
  • Serena Sanna, University of Groningen, University Medical Center Groningen, Department of Genetics, Netherlands
  • Rudolf Boer, University of Groningen, Department of Cardiology, Groningen, Netherlands
  • Folkert Kuipers, University of Groningen, Department of Pediatrics, Groningen, Netherlands
  • Lude Franke, University of Groningen, University Medical Center Groningen, Department of Genetics, Netherlands
  • Cisca Wijmenga, University of Groningen, University Medical Center Groningen, Department of Genetics, Netherlands
  • Alexandra Zhernakova, University of Groningen, University Medical Center Groningen, Department of Genetics, Netherlands
  • Jingyuan Fu, University of Groningen, University Medical Center Groningen, Department of Genetics, Netherlands

Short Abstract: Both genetics and the microbiome are known to be crucial factors in determining an individual’s susceptibility to complex diseases, including immune diseases, cancers and cardiovascular diseases (CVD). However, the contribution of these factors to inter-individual variation in intermediate molecular phenotypes, for instance circulating proteins, in the general population is largely unknown. These circulating proteins are often measured as biomarkers and hold promise for early disease diagnosis and for monitoring therapeutics. Understanding the impact of genetics and the microbiome on circulating proteins can provide a better understanding of the underlying disease etiology. We measured serum levels of 92 CVD-relevant proteins in 1,294 individuals from a general Dutch population cohort (LifeLines-DEEP) for whom we also have data on the human genome and on “the second human genome”: the metagenome. For each protein, we performed a genome-wide association analysis with 8 million SNPs and a metagenome-wide association analysis with 340 bacterial species and 702 functional pathways determined by metagenomic sequencing. At FDR 0.05, we identified 72 proteins that were genetically controlled and 51 proteins associated with the gut microbiome. Serum levels of 37 proteins were affected by both genetics and the microbiome. C-C motif chemokine 15 (CCL15), for example, is a liver-derived chemokine involved in immunoregulatory and inflammatory processes. In addition to its strong genetic regulation (association with rs854626 at P=2.5×10^-136), an elevated serum level of CCL15 was also associated with a higher bacterial capacity for fatty acid biosynthesis (P=5.3×10^-4). We further confirmed the causal effect of fatty acids on CCL15 production by stimulating hepatocytes (HepG2) with free fatty acids, observing a 40% increase in CCL15 expression 24 hours after stimulation. Fourteen proteins were more affected by the gut microbiome than by genetics. 
For instance, the adipose-derived cytokine PAI-1 is strongly associated with obesity, and its elevation is also a risk factor for atherosclerosis. While we did not detect significant associations with genetics, serum levels of PAI-1 were associated not only with a lower richness of bacterial species (P=9.7×10^-4) but also with 138 bacterial functional pathways, in particular those of bacterial energy metabolism. Using 92 CVD-related circulating proteins, we demonstrate that the serum proteome is affected by both genetics and the gut microbiome. Our data suggest that both the human genome and the metagenome should be taken into account when using circulating proteins as potential biomarkers for disease monitoring or as therapeutic targets for personalized medicine.
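
The abstract reports associations at FDR 0.05 without naming the correction procedure. The Benjamini-Hochberg step-up procedure is a common choice, so the sketch below shows it under that assumption; `benjamini_hochberg` is an illustrative helper, not the authors' code, and the p-values are made up.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    cutoff_rank = 0
    # Step-up: find the largest rank r with p_(r) <= r / m * alpha.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff_rank])


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]  # hypothetical
print(benjamini_hochberg(pvals))  # [0, 1]
```

In practice, `statsmodels.stats.multitest.multipletests(pvals, alpha=0.05, method="fdr_bh")` performs the same procedure.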

Session B-350: eDiVA: Exome sequencing pipeline to identify clinically relevant mutations in disease studies
COSI: TransMed
  • Mattia Bosio, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
  • Oliver Drechsel, Berlin institute of health, Germany
  • Rubayte Rahman, NIK Netherlands cancer center, Spain
  • Stephan Ossowski, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain

Short Abstract: Next-generation sequencing has boosted causal variant identification for patients affected by rare Mendelian diseases, helping to link genetic variation with phenotype. Multiple tools exist to call, annotate and prioritize genetic variants, but with a current success rate of 25-40%, identification of causal mutations remains an issue. We developed eDiVA, a method to identify gene mutations causing rare Mendelian diseases in small families. eDiVA uses exome or whole-genome sequencing (WES, WGS) data from parent-child trios or families, and focuses on optimized functional annotation, pathogenicity estimation and causal variant prioritization. eDiVA calls variants using best practices and custom in-house filtering to reduce false positives and increase the reliability of the results. Variants are then annotated with functional, omics and disease-specific information from sources such as CADD, SIFT, PhyloP, Condel, M-CAP, dbSNP, 1000GP, EVS, ExAC, ClinVar and OMIM to provide contextual information for each variant. eDiVA integrates these annotations into a machine-learning variant pathogenicity classifier (eDiVA-score), a random forest trained on more than 150,000 variants from ClinVar and ExAC. eDiVA-score is then combined with domain knowledge to prioritize variants according to specific disease models (autosomal dominant, recessive, de novo, X-linked and compound heterozygous). A short list of candidate causal variants is reported in Excel format as output. We compared eDiVA-score with M-CAP, CADD, Eigen-score and REVEL on thousands of semi-synthetic cases, generated by adding pathogenic variants from ClinVar to WES data from a trio sequenced by 1000GP. Furthermore, we benchmarked eDiVA’s prioritization algorithm against PhenoDB and Phen-Gen on thousands of semi-synthetic cases not previously used for training or testing. 
eDiVA consistently outperformed existing methods in terms of ROC, precision and recall of pathogenic variants, identifying more pathogenic variants than competing tools and placing them at the very top of the reported list of candidates. We also validated eDiVA by studying 35 families affected by rare congenital diseases such as spinocerebellar ataxia, myasthenia and immunodeficiency. We identified candidate causal variants in more than 50% of cases, while providing a prioritized list of fewer than 30 candidates in 90% of cases. In summary, eDiVA, available at www.ediva.crg.eu, identifies causal variants for rare genetic diseases with high sensitivity and specificity using WES or WGS data.
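
eDiVA's exact prioritization rules are not given here, but the disease models it names can be illustrated with simple genotype checks on a parent-child trio. The sketch below is hypothetical: genotypes are alternate-allele counts (0, 1 or 2), a hemizygous X allele in a male child is coded as 2, the dominant model assumes the carrier parent is affected, and a real pipeline would also weigh genotype quality, phasing and the pathogenicity score.

```python
def variant_models(child, mother, father, on_x=False, child_male=False):
    """Return the disease models one trio variant is compatible with (hypothetical helper).

    Genotypes are alternate-allele counts: 0, 1 or 2. A hemizygous X-linked
    allele in a male child is coded as 2 (a simplifying assumption).
    """
    models = set()
    if on_x and child_male:
        # A son's X chromosome comes from his mother, so the father is ignored.
        if child == 2 and mother == 1:
            models.add("x_linked")
        elif child == 2 and mother == 0:
            models.add("de_novo")
        return models
    if child == 1 and mother == 0 and father == 0:
        models.add("de_novo")
    if child == 2 and mother == 1 and father == 1:
        models.add("recessive")
    if child == 1 and (mother > 0) != (father > 0):
        # Inherited from exactly one parent, assumed to be the affected one.
        models.add("dominant")
    return models


def compound_heterozygous(gene_variants):
    """True if a gene carries a heterozygous variant from each parent.

    gene_variants: list of (child, mother, father) genotypes within one gene.
    """
    maternal = any(c == 1 and m == 1 and f == 0 for c, m, f in gene_variants)
    paternal = any(c == 1 and m == 0 and f == 1 for c, m, f in gene_variants)
    return maternal and paternal


print(variant_models(1, 0, 0), compound_heterozygous([(1, 1, 0), (1, 0, 1)]))
```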

Session B-352: Prediction of anti-cancer drug activity by kernelized multi-task learning
COSI: TransMed
  • Mehmet Tan, TOBB University of Economics and Technology, Turkey

Short Abstract: Chemotherapy and targeted therapy are two of the main treatment options for many types of cancer. Due to the heterogeneous nature of cancer, the success of therapeutic agents differs among patients. Determining the chemotherapeutic response of malignant cells is therefore essential for establishing personalized treatment protocols and designing new drugs. With recent technological advances in producing large amounts of pharmacogenomic data, in silico methods have become important tools to achieve this aim. Data produced using cancer cell lines provide a test bed for machine learning algorithms that try to predict the response of cancer cells to different agents. The potential use of these algorithms in drug discovery/repositioning and personalized treatment motivated us to work on predicting drug response by exploiting recent pharmacogenomic databases. We aim to improve the prediction of drug response in cancer cell lines. We propose a method that employs multi-task learning, to improve learning through transfer, and kernels, to extract non-linear relationships for predicting drug response. We report methodological comparison results as well as the performance of the proposed algorithm on each single drug. The results show that the proposed method is a strong candidate for predicting the drug response of cancer cell lines in silico for pre-clinical studies.
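
The abstract does not spell out its kernelized multi-task learner. One standard construction, shown here only as an illustrative stand-in, treats each drug as a task and uses kernel ridge regression with a product kernel: the similarity of two (cell line, drug) pairs is an RBF kernel on cell-line features multiplied by a task-similarity value, so data from related drugs inform each other's predictions. The helper names, the task-similarity matrix `task_sim` and the hyperparameters `lam` and `gamma` are assumptions; a real implementation would use NumPy or a dedicated library rather than this plain-Python solver.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def multitask_kernel(samples, task_sim, gamma=1.0):
    """Gram matrix of the product kernel; samples are (features, task_index) pairs."""
    return [[task_sim[s][t] * rbf(x, y, gamma) for (y, t) in samples] for (x, s) in samples]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(samples, y, task_sim, lam=0.1, gamma=1.0):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    k = multitask_kernel(samples, task_sim, gamma)
    for i in range(len(k)):
        k[i][i] += lam
    return solve(k, y)


def predict(alpha, samples, query, task_sim, gamma=1.0):
    """Predict the response for query = (features, task_index)."""
    x, t = query
    return sum(a * task_sim[s][t] * rbf(xs, x, gamma) for a, (xs, s) in zip(alpha, samples))


# Hypothetical toy data: two drugs (tasks 0 and 1), one feature per cell line.
samples = [([0.0], 0), ([1.0], 0), ([0.0], 1), ([1.0], 1)]
responses = [1.0, 2.0, -1.0, 0.5]
task_sim = [[1.0, 0.5], [0.5, 1.0]]
alpha = fit(samples, responses, task_sim, lam=1e-6)
print(predict(alpha, samples, ([1.0], 0), task_sim))  # close to the training response 2.0
```

The off-diagonal entry of `task_sim` controls how much one drug's data borrows from the other; setting it to 0 reduces the model to independent single-task regressions.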

Session B-354: The Utility of the Connectivity Map for HIV Cure Strategies
COSI: TransMed
  • Jeongmin Woo, University of Southampton, United Kingdom
  • Yawwani Gunawardana, University of Southampton, United Kingdom
  • Adam Spivak, University of Utah, United States
  • Laura Martins, University of Utah, United States
  • Jeff Murry, Gilead Sciences, United States
  • Richard Barnard, Merck Research Laboratories, United States
  • Bonnie Howell, Merck Research Laboratories, United States
  • Nadejda Beliakova-Bethell, University of California San Diego, United States
  • Cory White, University of Southampton, United Kingdom
  • Celsa Spina, University of California San Diego, United States
  • Douglas Richman, University of California San Diego, United States
  • Nancie Archin, University of North Carolina, United States
  • David Margolis, University of North Carolina, United States
  • Daria Hazuda, Merck Research Laboratories, United States
  • Romas Geleziunas, Gilead Sciences, United States
  • Alberto Bosque, The George Washington University, United States
  • Vicente Planelles, University of Utah, United States
  • Christopher Woelk, University of Southampton, United Kingdom

Short Abstract: Background: Latency reversing agents (LRAs) are used in shock-and-kill approaches to facilitate an HIV cure. The primary objective of this study was to determine the utility of in silico approaches for annotating known LRAs with their mechanism of action (MOA) and for identifying novel LRAs. Methods: An LRA Annotation Pipeline was constructed using the Connectivity Map (CMAP) and the Chemical Entities of Biological Interest (ChEBI) database, and an LRA Identification Pipeline was developed to reposition drugs in the CMAP as LRAs. The LRA Annotation Pipeline used gene set enrichment analysis to identify the top 10 drug compounds in the CMAP with transcriptomic signatures similar to that of the query LRA. ChEBI drug ontology terms that were significantly over-represented (by permutation and hypergeometric testing) among these 10 compounds were then assigned to the query LRA to delineate its MOA. The LRA Identification Pipeline used transcriptomic signatures dysregulated during latency in four primary CD4 T cell models of HIV latency to identify drug compounds in the CMAP that could reverse these signatures and thus act as putative LRAs. Results: As a test case for the LRA Annotation Pipeline, vorinostat was correctly identified as a histone deacetylase inhibitor using only its transcriptomic signature. Furthermore, GS-46, an LRA with unknown MOA, was predicted to share mechanisms with proteasome inhibitors and to contain chemical structures related to tetrazoles and catechins. The LRA Identification Pipeline predicted a large number of novel LRAs. MG-132, a proteasome inhibitor that was the top-ranked drug compound for two HIV latency models, had previously been confirmed as an LRA. Conclusions: The CMAP is a valuable resource in the search for an HIV cure and will have great utility for identifying novel LRAs and characterizing LRAs with unknown MOA in the future.
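
The over-representation step can be illustrated with a one-sided hypergeometric test: given N annotated CMAP compounds of which K carry a ChEBI term, how surprising is it that k of the top 10 hits carry it? The sketch below uses only `math.comb`; the function name and the example counts are hypothetical, and the permutation test the authors also used is not shown.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: annotated compounds (N); successes: compounds with the term (K);
    draws: compounds selected, e.g. the top 10 hits (n); k: selected compounds with the term.
    """
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)


# Hypothetical counts: 1,000 compounds, 20 with the term, 3 of them in the top 10.
p = hypergeom_sf(3, population=1000, successes=20, draws=10)
print(f"{p:.2e}")
```

With SciPy, the same tail probability is `scipy.stats.hypergeom.sf(k - 1, population, successes, draws)`.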

Session B-356: Let-7a, MiR-34a and MiR-199a/b: Key Player Micro-RNAs Revealing Novel Insights into Immune System Modulation and Cancer Hallmarks for Hepatocellular Carcinoma
COSI: TransMed
  • Bangly Soliman, Department of Biochemistry, Faculty of Science, Ain Shams University, Cairo, Egypt
  • Mahmoud Elhefnawi, Informatics and Systems Department, Division of Engineering Research, National Research Centre, Cairo & Science and Technology Research Centre, American University in Cairo, New Cairo, Egypt

Short Abstract: Background: Interest in let-7a, miR-34a and miR-199a/b is growing as more insights into their roles as master regulators of cellular processes emerge. In particular, these three microRNAs possess tumor suppressor activity that makes them potential new anti-cancer agents for different tumors, including hepatocellular carcinoma (HCC). Here we use bioinformatics to reveal the functionality of these candidate miRNAs through target prediction and functional analysis. Methods: In silico analysis was performed using recent and innovative servers (miRror Suite, DAVID, miRGator v3.0, GeneTrail and others) to identify the combinatorial and individual target genes of let-7a, miR-34a and miR-199a/b and to further explore the role of these targets in HCC progression. The study focused on miRNA targets from liver-cancer-specific expression datasets. Results: Eighty-seven common target mRNAs (p-value ≤0.05) were predicted to be regulated by the three-miRNA set using the miRror 2.0 target prediction tool. In addition, functional enrichment analysis of these miRNA targets with DAVID functional annotation (Kyoto Encyclopedia of Genes and Genomes [KEGG], BioCarta, Gene Ontology) and REACTOME revealed two major immune-related pathways, eight pathways linked to HCC hallmarks and two pathways that mediate interconnected processes between the immune system and HCC hallmarks. Moreover, a protein-protein interaction network for the predicted common targets of the miRNAs was obtained from the STRING protein interaction database. 
The individual analysis of target genes and pathways for the three miRNAs of interest using the miRGator v3.0 server and GeneTrail revealed some novel predicted target oncogenes, such as DDX19B, YAP1 and SOX4 (which we validated experimentally), and a variety of new regulated pathways of the immune system and hepatocarcinogenesis, such as the insulin signaling pathway, toll-like receptor signaling pathway, natural killer cell mediated cytotoxicity and adipocytokine signaling pathway. Conclusions: New links and interactions of let-7a, miR-34a and miR-199a/b with immune system pathways and components, and with major HCC hallmarks such as cellular response to stress, focal adhesion and ECM-receptor interactions, were identified. These findings support central mechanisms for the roles of these miRNAs as tumor suppressors and as efficient cancer therapeutics.

Session B-358: Identification of Associations between Genotypes and Longitudinal Phenotypes via Temporally-constrained Group Sparse Canonical Correlation Analysis
COSI: TransMed
  • Li Shen, Indiana University, USA
  • Xiaoke Hao, Nanjing University of Aeronautics and Astronautics, China
  • Chanxiu Li, Nanjing University of Aeronautics and Astronautics, China
  • Jingwen Yan, Indiana University, USA
  • Xiaohui Yao, Indiana University, USA
  • Shannon Risacher, Indiana University, USA
  • Andrew Saykin, Indiana University, USA
  • Daoqiang Zhang, Nanjing University of Aeronautics and Astronautics, China

Short Abstract: Motivation: Neuroimaging genetics studies the relationships between genetic variants (i.e., single nucleotide polymorphisms (SNPs)) and brain imaging data to reveal associations from genotypes to phenotypes. So far, most existing machine learning approaches detect associations between genetic variants and brain imaging data at a single time point. However, those associations are based on static phenotypes and ignore the temporal dynamics of phenotypic changes. Phenotypes across multiple time points may exhibit temporal patterns that can facilitate the understanding of the degenerative process. In this paper, we propose a novel temporally-constrained group sparse canonical correlation analysis (TGSCCA) framework to identify genetic associations with longitudinal phenotypic markers. Results: The proposed TGSCCA method captures temporal changes in the brain from longitudinal phenotypes by incorporating a fused penalty, which requires the differences between the canonical weight vectors of adjacent time points to be small. A new, efficient optimization algorithm is designed to solve the objective function. Furthermore, we demonstrate the effectiveness of our algorithm on both synthetic and real data (the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort, including progressive mild cognitive impairment (pMCI), stable MCI (sMCI) and normal control (NC) participants). Compared with conventional SCCA, our proposed method achieves strong associations and discovers phenotypic biomarkers across multiple time points to guide the interpretation of disease progression.
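
The fused penalty described above can be written as a sum over adjacent time points of the distance between canonical weight vectors, i.e. sum over t of ||u_(t+1) - u_t||. The sketch below assumes the unsquared L2 norm; the abstract does not state which norm TGSCCA uses, and the group-sparsity term and the optimization algorithm are not shown. The function name and weights are hypothetical.

```python
import math


def fused_penalty(weights):
    """Sum of L2 distances between canonical weight vectors at adjacent time points.

    weights: list of weight vectors, one per time point, in temporal order.
    """
    return sum(math.dist(u, v) for u, v in zip(weights, weights[1:]))


# Hypothetical weights for three time points: only the first transition changes.
print(fused_penalty([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]))  # 5.0
```

A larger weight on this term in the objective pushes the weight vectors of consecutive visits toward each other, which is how the temporal smoothness constraint enters the model.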

Session B-362: In Silico Prediction of Novel Therapeutic Targets Using Gene – Disease Association Data
COSI: TransMed
  • Enrico Ferrero, GSK (GlaxoSmithKline), United Kingdom
  • Ian Dunham, EMBL-EBI European Bioinformatics Institute, United Kingdom
  • Philippe Sanseau, GlaxoSmithKline, United Kingdom

Short Abstract: Background: Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. The Open Targets platform aims to bridge the gap between targets and diseases by collecting different types of evidence that can link the two entities, such as genetic or expression data. Results: Here, we explore whether gene-disease association data from the Open Targets platform are sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market. We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on a subset of the data and evaluate their performance using a nested cross-validation strategy. On an independent test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76. Finally, we use the model to make predictions on more than 15,000 genes and validate our predictions by mining the scientific literature for proposed therapeutic targets. Conclusions: These results confirm that data linking genes and diseases can be used effectively to formulate or strengthen hypotheses in the target discovery process, and show that this type of evidence contains enough information to accurately predict novel therapeutic targets.
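
The reported AUC of 0.76 can be read as the probability that a randomly chosen known target is scored above a randomly chosen non-target. The stdlib sketch below computes AUC in exactly that pairwise (Mann-Whitney) form; the function name and toy scores are illustrative, and it is not the authors' evaluation code.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    labels are 1 (target) or 0 (non-target); tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four genes, two of them known targets.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is quadratic; for predictions on many thousands of genes a rank-based formula or `sklearn.metrics.roc_auc_score` is preferable.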

Session B-364: The Cancer Targetome: A Critical Step Towards Evidence-Based Precision Oncology
COSI: TransMed
  • Aurora Blucher, Oregon Health & Science University, United States
  • Gabrielle Choonoo, Oregon Health & Science University, United States
  • Molly Kulesz-Martin, Oregon Health & Science University, United States
  • Guanming Wu, Oregon Health & Science University, United States
  • Shannon McWeeney, Oregon Health & Science University, United States

Short Abstract: A core tenet of precision oncology is the rational selection of pharmaceutical therapies to interact with patient-specific biological targets of interest, but it is currently difficult for researchers to obtain consistent and well-supported target information for pharmaceutical drugs. To address this gap we have aggregated drug-target interaction and bioactivity information for FDA-approved antineoplastic drugs across four publicly available resources to create the Cancer Targetome. Our work offers a novel contribution due to both the inclusion of putative target interactions encompassing multiple targets for each antineoplastic drug and the introduction of a framework for categorizing the supporting evidence behind each drug-target interaction. While our findings reinforce the notion of sparsity of publicly available drug-target interaction space, we find that a subset of protein kinase inhibitors has extensive public bioactivity information. Data quality assessment of experimental binding data highlights the uncertainty of primary target annotation and allows for incorporation of putative secondary or off-targets. The level of discordance among experimental binding data types has implications for prioritization pipelines within industry portfolios and for precision oncology research. Critical unmet needs in drug-target interaction curation and annotation efforts include metadata on cell lines, cancer status, and/or target knockdown. We provide use cases for the drugs imatinib and vandetanib to demonstrate the utility of this resource. Complementary to the Cancer Targetome, we also introduce the concept of light and dark pathways, which reflects whether or not a biochemical pathway is currently potentially targetable by FDA-approved antineoplastic drugs. 
Mapping all drug targets with a reported binding affinity of less than 100 nM to Reactome pathways reveals that fewer than 40% of pathways are light, i.e., potentially targetable by FDA-approved antineoplastic drugs. The remaining dark pathways represent open areas for future cancer therapeutics development. The Cancer Targetome resource provides researchers with access to clearly evidenced drug-target interaction data in a manner that facilitates informed decision making within the highly contextual nature of drug and target prioritization in precision oncology. We also highlight the use of this resource as a foundation for our modeling efforts to predict drug response.
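
The light/dark labelling described above reduces to a set test per pathway. A minimal sketch, assuming one reported affinity (in nM) per drug-target pair and pathways given as gene sets; the gene and pathway names are hypothetical placeholders, and the Reactome mapping step itself is not shown.

```python
def label_pathways(pathway_genes, target_affinities, cutoff_nm=100.0):
    """Label each pathway 'light' if it contains a target bound below cutoff_nm, else 'dark'.

    pathway_genes: dict of pathway name -> set of genes.
    target_affinities: iterable of (gene, affinity_nM) pairs for approved drugs.
    """
    potent_targets = {gene for gene, affinity in target_affinities if affinity < cutoff_nm}
    return {name: "light" if genes & potent_targets else "dark"
            for name, genes in pathway_genes.items()}


# Hypothetical pathways and affinities (placeholder names, not Targetome data).
pathways = {"pathway_1": {"G1", "G2"}, "pathway_2": {"G3"}, "pathway_3": {"G4"}}
affinities = [("G1", 5.0), ("G3", 250.0)]
labels = label_pathways(pathways, affinities)
light_fraction = sum(v == "light" for v in labels.values()) / len(labels)
print(labels, light_fraction)
```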

Session B-366: Prostate cancer heterogeneity revealed by spatial transcriptomics
COSI: TransMed
  • Stefanie Friedrich, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Sweden
  • Christoph Ogris, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Sweden
  • Erik Sonnhammer, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Sweden
  • Niklas Schultz, Science for Life Laboratory, Department of Genetics, Microbiology and Toxicology, Arrhenius Laboratory, Stockholm University, Sweden
  • Emelie Berglund, Science for Life Laboratory, Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), Sweden
  • Jonas Masskola, Science for Life Laboratory, Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), Sweden
  • Firas Al-Ubaidi, Department of Urology, Central Hospital, Sweden
  • Maja Marklund, Science for Life Laboratory, Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), Sweden
  • Joakim Lundeberg, Science for Life Laboratory, Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), Sweden
  • Thomas Helleday, Science for Life Laboratory, Division of Translational Medicine and Chemical Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Sweden

Short Abstract: 1. Introduction: The project aims to understand the extent of intratumour heterogeneity with respect to gene expression, pathways, and mutational subclones, to link tumour heterogeneity to efficient cancer treatment, and to reveal underlying patterns that affect cancer therapy.
2. Spatial transcriptomics: Twelve tissue samples were taken from one cancerous human prostate and analysed with Spatial Transcriptomics (ST). ST is a novel approach to visualise and quantitatively analyse the transcriptome with spatial resolution, that is, in regions of 10-100 cells, in the context of an entire biopsy (Ståhl et al., 2016). We applied ST to assess gene expression with spatial resolution in individual cancerous tissues in order to identify phenotypic diversity in cancerous regions. We revealed expression profiles for prostatic intraepithelial neoplasia (PIN) and cancerous areas, but also for stroma and immune cells within the tissue. Areas with different Gleason scores show distinctive gene expression signatures. Using the expression profiles identified in areas annotated by a pathologist, we discovered additional areas with abnormal gene expression profiles.
3. Copy number variations: Analysis of bulk-sequenced DNA revealed different types of mutations in the tissue samples, in particular large genetic deletions, which we can link to the size of the tumour area. Additionally, we built a similarity tree based on clear amplifications and deletions, which has four distinct clusters; each cluster contains one cancerous sample. This tree reveals that the deletion and amplification profile is unique for each cancerous tissue sample. Moreover, samples belonging to one cluster are physically close to each other.
4. Gene fusion: Based on bulk-sequenced RNA data, we detected SLC45A3-ELK4, the third most common gene fusion in prostate cancer, in two different variants. With the ST data, we can locate the gene fusion close to PIN, cancerous, and suspected cancerous areas within the tissue for six of the 12 samples.
5. Clonal evolution: We performed DNA and RNA-seq analysis for mutation calling and copy number calling for each of the twelve tissue sections. We identified 37 highly reliable somatic mutations and copy number alterations in tumour suppressor genes, potential tumour suppressor genes, and oncogenes; affected genes include, for example, MXI1 (a tumour suppressor), PHOX2B (cancer predisposition), and U2AF1 (an oncogene). Based on these 37 somatic mutations, we determined clonal heterogeneity. We infer multiclonal tumourigenesis, with different mixtures of clones in the cancerous samples.
References
Ståhl, P. L., Salmén, F., Vickovic, S., Lundmark, A., Navarro, J. F., Magnusson, J., ... & Mollbrink, A. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science, 353(6294), 78-82.
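A copy-number similarity tree like the one described in section 3 can be approximated by agglomerative clustering of per-sample CNV profiles. This is a minimal sketch, not the authors' pipeline. It assumes each sample is summarised as a hypothetical vector of segment states (-1 deletion, 0 neutral, +1 amplification), uses the fraction of discordant segments as the distance, and merges clusters by single linkage until the requested number of clusters remains.

```python
from itertools import combinations
from typing import Dict, List, Set

def cnv_distance(a: List[int], b: List[int]) -> float:
    """Fraction of segments whose CNV state (-1, 0, +1) differs between two samples."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def single_linkage(profiles: Dict[str, List[int]], n_clusters: int) -> List[Set[str]]:
    """Repeatedly merge the two clusters with the smallest minimum pairwise distance."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters: List[Set[str]] = [{name} for name in profiles]
    while len(clusters) > n_clusters:
        best = None  # (distance, index i, index j) of the closest pair so far
        for i, j in combinations(range(len(clusters)), 2):
            d = min(cnv_distance(profiles[a], profiles[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps index i valid
        del clusters[j]
    return clusters

# Hypothetical segment-level CNV calls for four tissue sections
profiles = {
    "s1": [-1, 0, 0, 1],
    "s2": [-1, 0, 0, 0],
    "s3": [0, 1, 1, 0],
    "s4": [0, 1, 1, 1],
}
print(single_linkage(profiles, 2))
```

Single linkage is only one choice; average or complete linkage would give different trees when profiles are noisy.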

Session B-368: Classification and analysis of a large collection of in vivo assay descriptions
COSI: TransMed
  • Magdalena Zwierzyna, BenevolentAI/University College London, United Kingdom
  • John Overington, Medicines Discovery Catapult, United Kingdom

Short Abstract: Testing potential drug treatments in animal models is a crucial part of preclinical drug discovery. Yet high failure rates for new therapies in the clinic demonstrate a growing need for a better understanding of the relevance and role of animal model research. In this study, we use text mining methods and machine learning models to systematically analyze a large collection of drug discovery assay descriptions in rats and mice. First, we parse the assay descriptions and mine them for information about animal experiments: genetic strains, experimental treatments, and phenotypic readouts used in the assays. To automatically organize the extracted information, we construct a semantic space of assay descriptions using a neural network language model, Word2Vec, and show that related animal models and phenotypic terms tend to cluster together in the constructed semantic space. In addition, we show that random forest classifiers trained with features generated by Word2Vec predict the class of drugs tested in different assays with an accuracy of 0.89. Finally, we combine information mined from text with structured annotations stored in the ChEMBL database to investigate patterns in the usage of different animal models across a range of experiments, drug classes, and disease areas. Our results demonstrate that text mining and machine learning have the potential to contribute to the ongoing debate on the interpretation and reproducibility of animal studies by enabling access to, and the integration and large-scale analysis of, in vivo drug screening data.
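Feeding Word2Vec output into a random forest requires turning each variable-length assay description into a fixed-length vector. The abstract does not say how this was done. A common choice, assumed here, is to average the vectors of the words in a description. The toy 3-dimensional embeddings below are hypothetical stand-ins for a trained Word2Vec model, and cosine similarity illustrates how related terms end up close together in the semantic space.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]

def average_embedding(tokens: List[str], emb: Dict[str, Vector]) -> Optional[Vector]:
    """Mean of the embeddings of in-vocabulary tokens; None if no token is known."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical toy embeddings standing in for a trained Word2Vec model
emb: Dict[str, Vector] = {
    "hypertension": [0.9, 0.1, 0.0],
    "blood": [0.8, 0.2, 0.1],
    "pressure": [0.85, 0.15, 0.05],
    "tumor": [0.0, 0.9, 0.3],
    "xenograft": [0.1, 0.95, 0.2],
}

doc = "reduction of blood pressure in hypertensive rats".split()
features = average_embedding(doc, emb)  # fixed-length input for a classifier
print(features)
print(cosine(emb["tumor"], emb["xenograft"]), cosine(emb["tumor"], emb["blood"]))
```

Vectors built this way could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`; out-of-vocabulary words (here "hypertensive" and "rats") are simply ignored.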

Session B-370: Using Large Scale Health Claims Data for Comorbidity Prediction of Epilepsy Patients
COSI: TransMed
  • Thomas Gerlach, UCB Biosciences GmbH, Germany
  • Holger Fröhlich, University of Bonn, UCB Biosciences GmbH, Germany

Short Abstract: Epilepsy is a complex brain disorder characterized by recurrent seizures. Epilepsy patients often suffer from a variety of severe physical and psychological comorbidities. While general comorbidity prevalences and incidences can be estimated from epidemiological data, such an approach does not take into account that actual patient-specific risks can depend on various individual factors, including medication. This motivates the development of a machine learning approach for predicting the risk of future comorbidities in epilepsy patients. In this work, we use Big Data from electronic health claims records (>5 million observations), which provide a time-resolved view of an individual's disease and medication history. We enrich these data with information from several databases (DisGeNet, TTD, KEGG, DrugBank, SIDER, …) to capture putative biological effects of observed diseases and applied medications, and extract >20,000 features from ~30,000 epilepsy patients. We investigate and compare different machine learning approaches, such as Random Survival Forests and deep learning techniques, for predicting future comorbidity occurrence after the first epilepsy diagnosis. Furthermore, we assess the influence of anti-epileptic drugs (AEDs) on predicted comorbidity risks, which may help to optimize epilepsy treatment in the future. Altogether, we see this ongoing project as a step towards better personalized treatment of epilepsy patients.
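Predicting comorbidities that occur after the first epilepsy diagnosis implies that features should come only from records dated before that diagnosis, while outcomes come from later records. The sketch below illustrates this split under assumed, hypothetical inputs: a list of (patient, date, code) claims and an index date per patient. The codes are placeholders, and this is not the authors' feature pipeline.

```python
from datetime import date
from typing import Dict, List, Set, Tuple

Claim = Tuple[str, date, str]  # hypothetical: (patient id, claim date, diagnosis or drug code)

def split_history(claims: List[Claim], index_dates: Dict[str, date]
                  ) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Split each patient's codes into pre-index features and post-index outcomes.

    Claims dated exactly on the index date are ignored in this sketch.
    """
    features: Dict[str, Set[str]] = {p: set() for p in index_dates}
    outcomes: Dict[str, Set[str]] = {p: set() for p in index_dates}
    for patient, when, code in claims:
        if patient not in index_dates:
            continue  # not part of the epilepsy cohort
        if when < index_dates[patient]:
            features[patient].add(code)   # history usable for prediction
        elif when > index_dates[patient]:
            outcomes[patient].add(code)   # candidate future comorbidities
    return features, outcomes

def to_binary_matrix(features: Dict[str, Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """One-hot encode code presence; columns and patient rows are sorted."""
    columns = sorted(set().union(*features.values()))
    rows = [[int(c in features[p]) for c in columns] for p in sorted(features)]
    return columns, rows

claims: List[Claim] = [
    ("p1", date(2015, 1, 1), "drug_X"),
    ("p1", date(2016, 6, 1), "depression"),
    ("p2", date(2014, 3, 1), "migraine"),
]
index_dates = {"p1": date(2016, 1, 1), "p2": date(2015, 1, 1)}
feats, outs = split_history(claims, index_dates)
print(to_binary_matrix(feats), outs)
```

Keeping the split explicit guards against leakage, where codes recorded after diagnosis would otherwise inflate apparent predictive performance.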

Session B-372: Reliably Detecting Clinically Important Variants Requires Both Combined Variant Calls and Optimized Filtering Strategies
COSI: TransMed
  • Matt Field, James Cook University, Australia
  • Dan Andrews, Australian National University, Australia
  • Chris Goodnow, Garvan Institute of Medical Research, Australia

Short Abstract: A diversity of tools is available for identifying variants from genome sequence data. Given the current complexity of incorporating external software into a genome analysis infrastructure, there is a tendency to rely on the results of a single tool alone. However, the quality of the output variant calls is highly variable, depending on factors such as sequence library quality as well as the choice of short-read aligner, variant caller, and variant caller filtering strategy. Here we present a two-part study. First, we use the high-quality ‘Genome in a Bottle’ reference set to demonstrate the significant impact that the choice of aligner, variant caller, and variant caller filtering strategy has on overall variant call quality, and further show how certain variant callers outperform others with increased sample contamination, an important consideration when analyzing sequenced cancer samples. This analysis confirms previous work showing that combining the variant calls of multiple tools yields the best-quality variant set: the intersection of all variant calls gives the best specificity, and the union gives the best sensitivity. Second, we analyze a melanoma cell line derived from a control lymphocyte sample to determine whether software choices affect the detection of clinically important melanoma risk-factor variants, finding that only one of three such variants is unanimously detected under all conditions. Finally, we describe a cogent strategy for implementing a clinical variant detection pipeline, one that requires careful software selection, optimized variant caller filtering, and combined variant calls in order to effectively minimize false negative variants. While implementing such features increases complexity and computation, the results offer indisputable improvements in data quality.
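The intersection/union trade-off stated above can be made concrete with set operations on variant calls. This sketch uses hypothetical call sets from three callers, compared against a truth set such as a Genome in a Bottle benchmark. Precision (the share of calls that are true) stands in as the false-positive measure, because true negatives are ill-defined for variant calls.

```python
from typing import List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)

def combine(callsets: List[Set[Variant]], mode: str) -> Set[Variant]:
    """Combine per-caller variant sets by intersection or union."""
    if mode == "intersection":
        return set.intersection(*callsets)
    if mode == "union":
        return set.union(*callsets)
    raise ValueError(f"unknown mode: {mode}")

def precision_recall(calls: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Precision (share of calls that are true) and recall (share of truth recovered)."""
    tp = len(calls & truth)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical truth set and calls from three callers
v1, v2 = ("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")
v3, v4 = ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")
truth = {v1, v2, v3, v4}
callsets = [
    {v1, v2, ("chr3", 10, "A", "T")},      # caller A: one false positive
    {v1, v2, v3},                          # caller B
    {v1, v3, v4, ("chr3", 20, "G", "C")},  # caller C: one false positive
]

for mode in ("intersection", "union"):
    print(mode, precision_recall(combine(callsets, mode), truth))
```

Here the intersection keeps only the variant all callers agree on (perfect precision, low recall), while the union recovers every true variant at the cost of both false positives.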

Session B-374: Gene signature during infancy predicts seroconversion in children at risk for type-1 diabetes and uncovers drugs with clinical application
COSI: TransMed
  • Ahmed Mehdi, The University of Queensland Diamantina Institute, Translational Research Institute, Australia
  • Anette Ziegler, Technical University of Munich, School of Medicine, Germany
  • Ezio Bonifacio, Technical University of Dresden, Regenerative Therapies for Diabetes, Germany
  • Emma Hamilton-Williams, The University of Queensland Diamantina Institute, Translational Research Institute, Australia
  • Kim-Anh Le Cao, The University of Queensland Diamantina Institute, Translational Research Institute, Australia
  • Mark Harris, The University of Queensland Diamantina Institute, Translational Research Institute, Australia
  • Ranjeny Thomas, The University of Queensland Diamantina Institute, Translational Research Institute, Australia

Short Abstract: Autoimmune-mediated destruction of pancreatic islet beta cells results in type 1 diabetes mellitus (T1D). Serum antibodies indicate islet autoimmunity, with seroconversion often occurring in early childhood. However, more than 90% of children with increased genetic risk remain islet-antibody negative, and there is currently no test to identify which children are likely to seroconvert or when seroconversion will occur. We present a novel data integration methodology to integrate two longitudinal clinical and gene expression cohorts of T1D at-risk children. We analysed all currently available longitudinal peripheral blood gene expression profiles from 167 high-risk children to uncover differentially expressed (DE) genes predicting seroconversion. After correcting for the timing of seroconversion, a linear model identified 28 DE genes in seroconverting children relative to antibody-negative children. Nineteen of these were linked through the ubiquitin-proteasome pathway. Near-birth expression of CLDN1, ADCY9, ZNF135, SPRY1 and GDPD5 stratified non-seroconverters with specificity >94%. Using logistic regression, we identified sets of DE genes predicting early or late seroconversion with an area under the receiver operating characteristic curve (AUC) of 0.74 and 0.84, respectively. These genes contribute to dendritic cell-T cell interaction. We then used these genes to mine existing databases and identified 43 drugs that could delay or reverse T1D autoimmunity. Of these drugs, 14 overlapped with drugs predicted to induce a "non-T1D progressor" expression profile. A greater understanding of the factors determining the risk and timing of seroconversion will not only provide novel mechanistic insights but also permit subject stratification for future primary prevention trials.
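The AUC values reported above summarise how well a predictor's scores rank seroconverters above non-seroconverters. A minimal sketch of computing AUC through its rank-statistic interpretation follows: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting one half. The scores are hypothetical, e.g. fitted probabilities from a logistic regression.

```python
from typing import Sequence

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of seroconversion and observed outcomes (1 = seroconverted)
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))
```

The pairwise loop is quadratic and is meant for clarity; for large cohorts, a rank-based (Mann-Whitney U) formula gives the same value faster.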

Session B-376: BioThings Explorer: Utilizing JSON-LD for Linking Biological APIs to Facilitate Knowledge Discovery
COSI: TransMed
  • Jiwen Xin, The Scripps Research Institute, United States
  • Cyrus Afrasiabi, The Scripps Research Institute, United States
  • Sebastien Lelong, The Scripps Research Institute, United States
  • Chunlei Wu, The Scripps Research Institute, United States

Short Abstract: RESTful APIs have been widely used to distribute biological data programmatically. Many popular biological APIs, such as MyGene.info, MyVariant.info, DrugBank, Reactome, WikiPathways and Ensembl, adopt JSON as their primary data format. These disparate API resources feature diverse types of biological entities, e.g. variants, genes, proteins, pathways, drugs, symptoms, and diseases. Integrating these API resources would greatly facilitate scientific domains such as translational medicine, where multiple types of biological entities are involved, often from different resources. To integrate these API resources, we have designed a workflow pattern using semantic web technologies. This workflow pattern uses JSON-LD, a W3C standard for representing Linked Data. In our proposal, each API specifies a JSON-LD context file, which provides a Uniform Resource Identifier (URI) mapping for each input/output type. In addition, we created an API registry system that collects API metadata, such as query syntax and input/output types, allowing API calls to be generated automatically. Using this workflow, we are able to link different API resources through the input/output types they share. For example, MyGene.info adopts Entrez Gene ID as its input type, which is also one of the output types of MyVariant.info. Thus, data in these two APIs can be linked through Entrez Gene ID. Following this workflow, we have developed a Python package (https://github.com/biothings/biothings_explorer_web) as well as a web visualization interface named ‘BioThings Explorer’ (http://biothings.io/explorer/) using Cytoscape.js. These tools empower users to explore, in a visually organized manner, the relationships between different biological entities across the vast amount of biological data provided by various API resources.
For example, users could easily explore all biological pathways in which a rare Mendelian disease candidate gene is involved, and then find all genes as well as chemical compounds which could regulate these biological pathways (IPython Notebook Demo: https://goo.gl/sx34T2), thus providing potential treatment options.
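The linking idea can be illustrated with a toy registry. Each API declares the URIs of the identifier types it accepts and returns (the role a JSON-LD context plays), and two APIs are connected whenever one's output URI equals another's input URI. The registry entries, the third API, and the identifiers.org-style URIs below are hypothetical simplifications, not the actual BioThings Explorer registry format; a real JSON-LD context maps JSON keys to URIs through an `@context` object.

```python
from typing import Dict, List, Tuple

Registry = Dict[str, Dict[str, List[str]]]

# Hypothetical registry: API name -> input and output identifier-type URIs
REGISTRY: Registry = {
    "MyVariant.info": {
        "inputs": ["http://identifiers.org/dbsnp/"],
        "outputs": ["http://identifiers.org/ncbigene/"],      # Entrez Gene ID
    },
    "MyGene.info": {
        "inputs": ["http://identifiers.org/ncbigene/"],
        "outputs": ["http://identifiers.org/reactome/"],
    },
    "PathwayAPI": {  # hypothetical third API
        "inputs": ["http://identifiers.org/reactome/"],
        "outputs": ["http://identifiers.org/chebi/"],
    },
}

def links(registry: Registry) -> List[Tuple[str, str, str]]:
    """Return (source API, target API, shared URI) for every output->input match."""
    edges = []
    for src, src_meta in registry.items():
        for dst, dst_meta in registry.items():
            if src == dst:
                continue
            for uri in src_meta["outputs"]:
                if uri in dst_meta["inputs"]:
                    edges.append((src, dst, uri))
    return edges

def paths(registry: Registry, start: str, end: str) -> List[List[str]]:
    """Enumerate simple API chains from start to end by depth-first search."""
    graph: Dict[str, List[str]] = {}
    for src, dst, _uri in links(registry):
        graph.setdefault(src, []).append(dst)
    found: List[List[str]] = []

    def dfs(node: str, trail: List[str]) -> None:
        if node == end:
            found.append(trail)
            return
        for nxt in graph.get(node, []):
            if nxt not in trail:  # avoid cycles
                dfs(nxt, trail + [nxt])

    dfs(start, [start])
    return found

print(links(REGISTRY))
print(paths(REGISTRY, "MyVariant.info", "PathwayAPI"))
```

Because links are derived purely from shared URIs, adding a new API to the registry automatically connects it to every existing API that produces or consumes the same identifier types.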

Session B-378: Global proteomics profiling improves drug sensitivity prediction: results from a multi-omics, pan-cancer modeling approach.
COSI: TransMed
  • Suleiman Ali Khan, Institute of Molecular Medicine, Helsinki University, Finland
  • Krister Wennerberg, Institute of Molecular Medicine, Helsinki University, Finland
  • Mehreen Ali, Institute of Molecular Medicine, Helsinki University, Finland
  • Tero Aittokallio, Institute of Molecular Medicine, Helsinki University, Finland

Short Abstract: Motivation: Proteomics profiling is increasingly being used for molecular stratification of cancer patients and cell-line panels. However, a systematic assessment of the predictive power of large-scale proteomic technologies for various drug classes and cancer types is currently lacking. To that end, we carried out the first pan-cancer, multi-omics comparative analysis of the relative performance of two proteomic technologies, targeted reverse phase protein array (RPPA) and global mass spectrometry (MS), in terms of their accuracy in predicting the sensitivity of cancer cells to both cytotoxic chemotherapeutics and molecularly targeted anticancer compounds. Results: Our results in two cell-line panels demonstrate how MS profiling improves drug response predictions beyond those of RPPA or the other omics profiles alone. However, frequently missing MS values complicate its use in predictive modeling and required filtering, such as focusing on completely measured proteins or known oncoproteins, to obtain maximal predictive performance. Strikingly, the two proteomic profiles provided complementary predictive signals for both the cytotoxic and the targeted compounds. Further, information about the cellular abundance of primary target proteins was found to be critical for predicting the response to targeted compounds, although non-target features also contributed significantly to the predictive power. The clinical relevance of the selected protein markers was confirmed in cancer patient data. These results provide novel insights into the relative performance and optimal use of the widely applied proteomic technologies MS and RPPA, which should prove useful in translational applications, such as defining the best combination of omics technologies and markers for understanding and predicting drug sensitivities in cancer patients.
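The filtering step mentioned above, keeping only completely measured proteins or known oncoproteins, can be sketched as follows. The data layout (protein mapped to per-cell-line values, with None for a missing measurement) and the oncoprotein list are hypothetical; the abstract does not give the authors' exact filtering criteria.

```python
from typing import Dict, List, Optional, Set

# Hypothetical layout: protein -> abundance per cell line (None = missing MS value)
Matrix = Dict[str, List[Optional[float]]]

def complete_proteins(data: Matrix) -> Set[str]:
    """Proteins measured in every cell line."""
    return {p for p, values in data.items() if all(v is not None for v in values)}

def filter_features(data: Matrix, oncoproteins: Set[str], mode: str = "complete") -> Matrix:
    """Keep completely measured proteins ('complete') or known oncoproteins ('onco')."""
    if mode == "complete":
        keep = complete_proteins(data)
    elif mode == "onco":
        keep = set(data) & oncoproteins  # may still contain missing values
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {p: v for p, v in data.items() if p in keep}

data: Matrix = {
    "EGFR": [5.1, 4.8, None],
    "ERBB2": [3.2, 3.0, 2.9],
    "ACTB": [9.0, 9.1, 8.9],
    "PRKX": [None, None, 1.2],
}
oncoproteins = {"EGFR", "ERBB2"}  # hypothetical curated list
print(sorted(filter_features(data, oncoproteins)))
print(sorted(filter_features(data, oncoproteins, "onco")))
```

The two modes trade off differently: the complete-case filter avoids imputation entirely, while the oncoprotein filter keeps biologically prioritised features that may still need missing-value handling downstream.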

Session B-380: TOPSPIN: a novel algorithm to predict treatment specific survival in cancer
COSI: TransMed
  • Joske Ubels, UMC Utrecht, Netherlands
  • Erik van Beers, SkylineDx, Netherlands
  • Annemiek Broijl, Erasmus MC, Netherlands
  • Pieter Sonneveld, Erasmus MC Cancer Institute, Netherlands
  • Martin van Vliet, SkylineDx, Netherlands
  • Jeroen de Ridder, UMC Utrecht, Netherlands

Short Abstract: Motivation: Cancer treatments can have heterogeneous response rates and are often associated with serious side effects. Treatment response may well be improved by selecting the right treatment for the right patient. This requires the discovery of predictive markers, i.e. markers that are able to predict whether a patient will benefit from treatment. The discovery of such markers, for example gene expression signatures, remains a major challenge. Here we propose TOPSPIN (Treatment Outcome Prediction using Similarity between PatIeNts), a new computational method to discover gene expression signatures capable of identifying a subgroup of patients more likely to benefit from a specific treatment as compared to another treatment. TOPSPIN relies on the idea that potential signature gene sets can be identified by searching for patients with similar gene expression profiles who exhibit large survival differences when receiving different treatments. Results: We demonstrate the utility of TOPSPIN in a multiple myeloma dataset, in which patients enrolled in a phase III clinical trial either did or did not receive the proteasome inhibitor bortezomib. Here we successfully select gene sets that identify a subset of patients in which we find a significant hazard ratio between the bortezomib group and the group that received conventional therapy. The classifier trained with these gene sets validates on independent test data and outperforms nearest mean, random forest and support vector machine classifiers.

Session B-382: A deep learning approach for uncovering lung cancer immunome patterns
COSI: TransMed
  • Moritz Hess, IMBEI University Medical Center Mainz, Germany
  • Stefan Lenz, IMBEI University Medical Center Mainz, Germany
  • Harald Binder, IMBEI University Medical Center Mainz, Germany

Short Abstract: Tumor immune cell infiltration is a well-known factor related to the survival of cancer patients. This has led to deconvolution approaches that can quantify immune cell proportions for each individual. What is missing is an approach for modeling joint patterns of different immune cell types. Results: We adapt a deep learning approach, deep Boltzmann machines (DBMs), for modeling immune cell gene expression patterns in lung adenocarcinoma. Specifically, we use a partially partitioned training approach to deal with the relatively large number of genes. We also propose a sampling-based approach that smooths the original data according to a trained DBM and can be used for visualization and clustering. The identified clusters can subsequently be judged with respect to their association with clinical characteristics, such as tumor stage, providing an external criterion for selecting the DBM network architecture and tuning parameters for training. We show that the hidden nodes of the trained networks can be linked not only to clinical characteristics but also to specific genes, which are the visible nodes of the network. We find that hidden nodes linked to tumor stage and survival represent the expression of T-cell and mast cell genes, among others, probably reflecting specific immune cell infiltration patterns. Thus, DBMs, trained and selected by the proposed approach, might provide a useful tool for extracting immune cell gene expression patterns. In the case of lung adenocarcinoma, these patterns are linked to survival as well as other patient characteristics, which could be useful for uncovering the underlying biology.

Session B-384: Inferring clonal composition from multiple tumor biopsies
COSI: TransMed
  • Pavel Sumazin, Baylor College of Medicine, United States
  • Matteo Manica, ETH, Switzerland
  • Roland Mathis, IBM, Switzerland
  • María Rodríguez Martínez, IBM, Switzerland
  • Peter Wild, University Hospital Zurich, Switzerland

Short Abstract: Motivation. Knowledge about the clonal evolution of each tumor can inform driver-alteration discovery by pointing out initiating genetic events as well as events that contribute to the selective advantage of proliferative, and potentially drug-resistant, tumor subclones. A necessary building block for reconstructing clonal evolution from tumor profiles is the estimation of the cellular composition of each tumor subclone (cellularity), and these estimates, in turn, are based on estimates of the relative abundance (frequency) of subclone-specific genetic alterations in tumor biopsies. Estimating the frequency of genetic alterations is complicated by the high genomic instability that characterizes many tumor types. Results. Analysis of our mutation-centric model for genomic instability suggests that the copy number variations (CNVs) commonly observed in tumor profiles can dramatically alter mutation-frequency estimates and, consequently, the reconstruction of tumor phylogenies. We argue that detailed accounting for CNVs, based on profiles of multiple biopsies of each tumor, is required to accurately estimate mutation frequencies. To help resolve this problem, we propose an optimization algorithm, Chimæra (clonality inference from mutations across biopsies), that accounts for the effects of CNVs in multiple same-tumor biopsies to estimate both mutation frequencies and the copy numbers of mutated alleles. We show that mutation-frequency estimates by Chimæra are consistently more accurate in unstable genomes. When studying profiles of multiple biopsies of a high-risk prostate tumor, we show that Chimæra inferences allow us to reconstruct its clonal evolution. Data availability. Sequencing data are deposited in ENA project PRJEB19193 and source code in GitHub project Chimaera.
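The central point above, that copy number changes distort mutation-frequency estimates, follows from a standard relationship between the variant allele fraction (VAF), tumour purity rho, total tumour copy number C, the number of mutated copies m, and the fraction phi of tumour cells carrying the mutation: VAF = rho * phi * m / (rho * C + 2 * (1 - rho)). This is a simplified textbook model, assuming a diploid normal admixture and a shared copy number C across all tumour cells; it is not Chimæra's multi-biopsy optimisation. The sketch inverts the model for phi and compares the result with the naive diploid estimate.

```python
def cell_fraction(vaf: float, purity: float, total_cn: int, mutated_cn: int) -> float:
    """Invert VAF = purity*phi*m / (purity*C + 2*(1 - purity)) for phi.

    Simplified model: diploid normal cells, all tumour cells at copy number C,
    m mutated copies per mutation-bearing cell.
    """
    denominator = purity * total_cn + 2.0 * (1.0 - purity)
    return vaf * denominator / (purity * mutated_cn)

def naive_fraction(vaf: float, purity: float) -> float:
    """Estimate that ignores CNVs (assumes C = 2 and m = 1)."""
    return cell_fraction(vaf, purity, total_cn=2, mutated_cn=1)

# Same observed VAF in three hypothetical copy-number contexts (pure tumour sample)
vaf, purity = 0.25, 1.0
print(naive_fraction(vaf, purity))       # diploid assumption -> 0.5
print(cell_fraction(vaf, purity, 4, 1))  # one mutated copy in an amplified region -> 1.0
print(cell_fraction(vaf, purity, 1, 1))  # mutation on the remaining allele after a deletion -> 0.25
```

The same VAF of 0.25 can therefore indicate a clonal mutation, a subclonal one present in half the cells, or one in a quarter of the cells, depending on copy number. This is why the authors argue CNVs must be modelled explicitly.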

Session B-386: Molecular signatures that can be transferred across different omics platforms
COSI: TransMed
  • Michael Altenbuchinger, University of Regensburg, Germany
  • Rainer Spang, University of Regensburg, Germany
  • Philipp Schwarzfischer, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
  • Thorsten Rehberg, Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
  • Jörg Reinders, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
  • Christian W. Kohler, Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
  • Wolfram Gronwald, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
  • Julia Richter, Department of Pathology, Hematopathology Section and Lymph Node Registry, University Hospital Schleswig-Holstein, Campus Kiel/Christian-Albrecht University, Kiel, Germany
  • Monika Szczepanowski, Department of Pathology, Hematopathology Section and Lymph Node Registry, University Hospital Schleswig-Holstein, Campus Kiel/Christian-Albrecht University, Kiel, Germany
  • Neus Masqué-Soler, Department of Pathology, Hematopathology Section and Lymph Node Registry, University Hospital Schleswig-Holstein, Campus Kiel/Christian-Albrecht University, Kiel, Germany
  • Wolfram Klapper, Department of Pathology, Hematopathology Section and Lymph Node Registry, University Hospital Schleswig-Holstein, Campus Kiel/Christian-Albrecht University, Kiel, Germany
  • Peter J. Oefner, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany

Short Abstract: Motivation: Molecular signatures for treatment recommendations are well researched. Still, it is challenging to apply them to data generated by different protocols or technical platforms. Results: We analyzed paired data for the same tumors (Burkitt lymphoma, diffuse large B-cell lymphoma) and features that had been generated by different experimental protocols and analytical platforms, including the nanoString nCounter and Affymetrix GeneChip transcriptomics platforms as well as the SWATH and SRM proteomics platforms. A statistical model that assumes independent sample and feature effects accounted for 69% to 94% of technical variability. We analyzed how variability propagates through linear signatures, possibly affecting predictions and treatment recommendations. Linear signatures with feature weights adding up to zero were substantially more robust than unbalanced signatures. They yielded consistent predictions across data from different platforms, for both transcriptomics and proteomics data. Their predictions were similarly stable across data from fresh-frozen and matching formalin-fixed paraffin-embedded human tumor tissue. Availability: The R package “zeroSum” can be downloaded from https://github.com/rehbergT/zeroSum. Complete data and the R code necessary to reproduce all our results are available from the authors upon request.
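Zero-sum weights help for a simple reason. If a platform or sample effect shifts every log-scale feature of a sample by the same constant c, a linear score changes by c times the sum of the weights, and that change vanishes when the weights sum to zero. The numerical check below uses hypothetical weights and log-expression values. It illustrates only the invariance property, not the fitting procedure of the zeroSum R package.

```python
from typing import List

def score(weights: List[float], log_values: List[float], intercept: float = 0.0) -> float:
    """Linear signature score on log-scale features."""
    return intercept + sum(w * x for w, x in zip(weights, log_values))

def shifted(log_values: List[float], c: float) -> List[float]:
    """Simulate a sample-specific additive effect on the log scale (a multiplicative one on raw values)."""
    return [x + c for x in log_values]

sample = [2.0, 5.5, 3.25, 4.0]        # hypothetical log2 measurements
balanced = [1.0, -0.5, 0.75, -1.25]   # weights sum to 0
unbalanced = [1.0, 0.5, 0.75, -1.25]  # weights sum to 1.0

for name, w in [("balanced", balanced), ("unbalanced", unbalanced)]:
    delta = score(w, shifted(sample, 1.5)) - score(w, sample)
    print(name, delta)  # change equals 1.5 * sum(w)
```

The balanced signature returns the same score after the shift, while the unbalanced one moves by 1.5. This mirrors the robustness difference reported in the abstract.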

Session B-388: Prediction of HIV-1 sensitivity to broadly neutralizing antibodies shows a trend towards resistance over time
COSI: TransMed
  • Anna Hake, Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Germany
  • Nico Pfeifer, Department of Computer Science, University of Tübingen, Germany

Short Abstract: Treatment with broadly neutralizing antibodies (bNAbs) has proven effective against HIV-1 infections in humanized mice, non-human primates, and humans. Due to the high mutation rate of HIV-1, testing the patient's viral strains for resistance to the bNAbs remains necessary. So far, bNAb resistance can only be tested in expensive and time-consuming neutralization experiments. We introduce well-performing computational models that predict the neutralization response of HIV-1 to bNAbs given only the envelope sequence of the virus. Using non-linear support vector machines based on a string kernel, the models learned the important binding sites even of bNAbs with more complex epitopes, namely the CD4 binding site-targeting bNAbs, thereby demonstrating the biological relevance of the models. To increase the interpretability of the models, we additionally provide a new kind of motif logo for each query sequence, visualizing the residues of the test sequence that most influenced the prediction outcome. Moreover, we predicted the neutralization sensitivity of around 34,000 HIV-1 samples from different time points to a broad range of bNAbs, enabling the first analysis of HIV resistance to bNAbs on a global scale. For many of the bNAbs, the analysis showed a trend towards antibody resistance over time, which had previously been observed only for a small, non-representative subset of the global HIV-1 population.
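The abstract does not specify which string kernel was used. As an illustration only, the k-spectrum kernel below, one of the simplest string kernels, scores two envelope sequences by the inner product of their k-mer count vectors. It is a valid (positive semi-definite) kernel, so its Gram matrix could be passed to an SVM that accepts precomputed kernels. The sequence fragments are hypothetical.

```python
from collections import Counter
from typing import List

def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of two sequences."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(count * cb[kmer] for kmer, count in ca.items())  # missing k-mers count 0

def gram_matrix(seqs: List[str], k: int = 3) -> List[List[int]]:
    """Pairwise kernel matrix over a list of sequences."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]

# Hypothetical envelope protein fragments
envelopes = ["CTRPNNNTRKSIHIGPGRAFYTTGEI", "CTRPNNNTRKGIHIGPGRTFYATGDI"]
print(gram_matrix(envelopes, k=3))
```

With scikit-learn, for example, such a matrix can be used with `SVC(kernel="precomputed")`; richer string kernels that allow mismatches or weight positions would follow the same pattern.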

Session B-503: Glioblastoma epigenetic landscape is controlled by E2F1/E2F4 transcription factors
COSI: TransMed
  • Bartosz Wojtas, Nencki Institute of Experimental Biology, Warsaw, Poland
  • Marta Maleszewska, Nencki Institute of Experimental Biology, Warsaw, Poland
  • Bartlomiej Gielniewski, Nencki Institute of Experimental Biology, Warsaw, Poland
  • Jakub Mieczkowski, Nencki Institute of Experimental Biology, Warsaw, Poland
  • Michal Dabrowski, Nencki Institute of Experimental Biology, Warsaw, Poland
  • Sam Mondal, Nencki Institute of Experimental Biology, Warsaw, Poland
  • Janusz Siedlecki, The Maria Skłodowska Curie Memorial Cancer Centre and Institute of Oncology, Warsaw, Poland
  • Mateusz Bujko, The Maria Skłodowska Curie Memorial Cancer Centre and Institute of Oncology, Warsaw, Poland
  • Pawel Naumann, The Maria Skłodowska Curie Memorial Cancer Centre and Institute of Oncology, Warsaw, Poland
  • Wieslawa Grajkowska, The Children’s Memorial Health Institute, Warsaw, Poland
  • Katarzyna Kotulska, The Children’s Memorial Health Institute, Warsaw, Poland
  • Bozena Kaminska, Nencki Institute of Experimental Biology, Warsaw, Poland

Short Abstract: Glioblastomas (GBM) are considered to be among the most difficult human malignancies to treat due to frequent dysfunctions of tumor suppressors and oncogenes. Recent studies characterizing different GBM subgroups indicate that, besides genetic alterations, epigenetic modifications could be involved in tumor development and progression. There is growing evidence of genetic alterations in genes coding for epigenetic enzymes, or in their expression patterns, in gliomas. We hypothesized that epigenetic dysfunctions could be the primary cause of gliomagenesis.
Using TaqMan quantitative real-time PCR, we evaluated the expression of major epigenetic enzymes and chromatin modifiers in 28 high-grade gliomas (WHO grade IV tumors), 23 benign juvenile gliomas (juvenile pilocytic astrocytomas, JPAs, WHO grade I) and 7 normal human brain samples. We found a profound and global downregulation of most epigenetic enzymes and modifiers in GBM samples compared to normal brains and JPAs. To find a common denominator that could explain the coordinated downregulation of epigenetic enzymes/modifiers, we employed the PWMEnrich tool for DNA motif scanning and enrichment analysis. We discovered high-affinity motifs for a few transcription factors (including E2F1/E2F4) within the epigenetic enzyme/modifier promoters. Independently, we searched for correlations between the expression of selected transcription factors and that of epigenetic enzymes/modifiers. We found a strong correlation between E2F1/E2F4 transcription factor expression and the expression of epigenetic enzymes/modifiers. We conclude that the E2F1/E2F4 transcription factors predominantly control epigenetic enzymes/modifiers and shape the epigenetic landscape in gliomas and, subsequently, tumor pathobiology.

Funding: Supported by a National Science Centre grant 2013/09/B/NZ3/01402 (MM)
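Motif scanning of promoters, as done in this work with the R/Bioconductor package PWMEnrich, rests on scoring each sequence window against a position weight matrix. The Python sketch below shows only the underlying log-odds scoring against a uniform background. The 4-position matrix is a made-up toy motif, not the E2F1/E2F4 PWM, and PWMEnrich's enrichment statistics are not reproduced.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"

def log_odds(pfm: List[Dict[str, float]], background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base probabilities into log2 odds against a uniform background."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in pfm]

def best_hit(seq: str, pwm: List[Dict[str, float]]) -> Tuple[int, float]:
    """Return (start, score) of the highest-scoring window on the forward strand.

    Assumes an uppercase sequence containing only A, C, G and T.
    """
    width = len(pwm)
    best = (-1, float("-inf"))
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        s = sum(pwm[i][base] for i, base in enumerate(window))
        if s > best[1]:
            best = (start, s)
    return best

# Hypothetical toy motif preferring "GCGC" (probabilities per position)
toy_pfm = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
pwm = log_odds(toy_pfm)
print(best_hit("ATTTGCGCAT", pwm))
```

A full scan would also score the reverse complement and compare scores against a background distribution; enrichment tools aggregate such scores across many promoters.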

Session B-505: From patients to molecules and back again: HLA-dependent varicella-zoster susceptibility
COSI: TransMed
  • Pieter Meysman, University of Antwerp, Belgium
  • Nicolas De Neuter, Belgium
  • Esther Bartholomeus, Belgium
  • Kris Laukens, Belgium
  • Benson Ogunjimi, Belgium

Short Abstract: The varicella-zoster virus (VZV) can cause two distinct diseases: chickenpox (varicella) and shingles (herpes zoster). Varicella is a common disease in young children, while herpes zoster is more frequent in older individuals. The human leukocyte antigen (HLA) genes are responsible for presenting viral peptides to the adaptive immune system and are highly variable between individuals.

In this study, we developed a computational interaction model to study the affinity of VZV peptides for different HLA variants. Using public patient data, we had already shown that HLA variants associated with a higher risk of herpes zoster complications have a lower relative affinity for varicella-zoster peptides than variants associated with a lower risk. To validate these previous findings, we collaborated with the Antwerp University Hospital to characterize the T-cell population and HLA types of a group of 50 volunteers with a history of herpes zoster. Using the established computational interaction model, we assigned a VZV Ranked Affinity Score (RAS) to each individual in the cohort based on his or her HLA type. The RAS values revealed that individuals with a history of herpes zoster had a lower affinity for critical VZV proteins than expected from the background population. This difference in RAS values was most noticeable in the youngest participants in our study, for whom known herpes zoster-associated factors, such as advanced age and immunosuppressant use, are absent.

