TransMed: Translational Medical Informatics

COSI Track Presentations

Schedule subject to change
Monday, July 22nd
10:20 AM-11:10 AM
Five ways computational biologists can accelerate medicine
  • Isaac Kohane

In the broadest sense it is uncontroversial that computational biology is helping advance biomedicine because knowledge-processing is at its core. I will argue that this is an unnecessarily limiting definition that both slows progress and prevents individual computational biologists from maximizing their personal impact. Among the approaches with which computational biologists can overcome this artificial limitation:
1. Short but intensive apprenticeships in the biomedical application domain that is your passion.
2. Overcome the sociological barrier between the intellectually parallel computational challenges of clinical medicine and molecular biology. Lead by example in crossing that chasm.
3. Identification within biomedicine of subtasks that most researchers and clinicians do not realize can benefit from massive improvement through computational methods.
4. Pitch in, in individual cases where standard medicine has failed, and help patients directly in their care (in ways MDs are unable to).
5. Become a leader, not only by asking the important questions but also by obtaining the funding and teams to answer them.

I will illustrate these points with prismatic (and real) examples drawn from the domains of discovery, rare diseases, clinical care and health policy.

11:10 AM-11:20 AM
Prioritising cancer therapeutic targets through CRISPR-Cas9 screens and multi-omics data integration
  • Fiona Behan, Wellcome Sanger Institute, United Kingdom
  • Francesco Iorio, Wellcome Sanger Institute, United Kingdom
  • Gabriele Picco, Wellcome Sanger Institute, United Kingdom
  • Kosuke Yusa, Wellcome Sanger Institute, United Kingdom
  • Mathew Garnett, Wellcome Sanger Institute, United Kingdom

Abstract in the attached pdf.

11:20 AM-11:40 AM
Proceedings Presentation: Predicting drug-induced transcriptome responses of a wide range of human cell lines by a novel tensor-train decomposition algorithm
  • Michio Iwata, Kyushu Institute of Technology, Japan
  • Yoshihiro Yamanishi, Kyushu Institute of Technology, Japan

Genome-wide identification of the transcriptomic responses of human cell lines to drug treatments is a challenging issue in medical and pharmaceutical research. However, drug-induced gene expression profiles are largely unknown and unobserved for all combinations of drugs and human cell lines, which is a serious obstacle in practical applications. Here, we developed a novel computational method to predict unknown parts of drug-induced gene expression profiles for various human cell lines and predict new drug therapeutic indications for a wide range of diseases. We proposed a tensor-train weighted optimization (TT-WOPT) algorithm to predict the potential values for unknown parts in tensor-structured gene expression data. Our results revealed that the proposed TT-WOPT algorithm can accurately reconstruct drug-induced gene expression data for a range of human cell lines in the Library of Integrated Network-based Cellular Signatures. The results also revealed that in comparison with the use of original gene expression profiles, the use of imputed gene expression profiles improved the accuracy of drug repositioning. We also performed a comprehensive prediction of drug indications for diseases with gene expression profiles, which suggested many potential drug indications that were not predicted by previous approaches.

11:40 AM-12:00 PM
Proceedings Presentation: Identifying and ranking potential driver genes of Alzheimer's Disease using multi-view evidence aggregation
  • Sumit Mukherjee, Sage Bionetworks, United States
  • Thanneer Malai Perumal, Sage Bionetworks, United States
  • Kenneth Daily, Sage Bionetworks, United States
  • Solveig Sieberts, Sage Bionetworks, United States
  • Larsson Omberg, Sage Bionetworks, United States
  • Christoph Preuss, The Jackson Laboratory, United States
  • Gregory Carter, The Jackson Laboratory, United States
  • Lara Mangravite, Sage Bionetworks, United States
  • Benjamin Logsdon, Sage Bionetworks, United States

Motivation: Late onset Alzheimer’s disease (LOAD) is currently a disease with no known effective treatment options. To address this, there has been a recent surge in the generation of multi-modality data (Hodes and Buckholtz, 2016; Mueller et al., 2005) to understand the biology of the disease and the potential drivers that causally regulate it. However, most analytic studies using these datasets focus on uni-modal analysis of the data. Here we propose a data-driven approach that integrates multiple data types and analytic outcomes to aggregate evidence supporting the hypothesis that a gene is a genetic driver of the disease. The main algorithmic contributions of our paper are: i) a general machine learning framework to learn the key characteristics of a few known driver genes from multiple feature sets and identify other potential driver genes that have similar feature representations, and ii) a flexible ranking scheme with the ability to integrate external validation in the form of Genome Wide Association Study (GWAS) summary statistics.
While we currently focus on demonstrating the effectiveness of the approach using different analytic outcomes from RNA-Seq studies, this method is easily generalizable to other data modalities and analysis types.

Results: We demonstrate the utility of our machine learning algorithm on two benchmark multi-view datasets, where it significantly outperforms the baseline approaches in predicting missing labels. We then use the algorithm to predict and rank potential drivers of Alzheimer’s disease. We show that our ranked genes are significantly enriched for SNPs associated with Alzheimer’s disease and are enriched in pathways that have been previously associated with the disease.

12:00 PM-12:20 PM
Proceedings Presentation: Drug repositioning based on bounded nuclear norm regularization
  • Mengyun Yang, Central South University, China
  • Huimin Luo, Central South University, China
  • Yaohang Li, Old Dominion University, United States
  • Jianxin Wang, Central South University, China

Motivation: Computational drug repositioning is a cost-effective strategy to identify novel indications for existing drugs. Drug repositioning is often modeled as a recommendation system problem. Taking advantage of the known drug-disease associations, the objective of the recommendation system is to identify new treatments by filling out the unknown entries in the drug-disease association matrix, which is known as matrix completion. Underpinned by the fact that common molecular pathways contribute to many different diseases, the recommendation system assumes that the underlying latent factors determining drug-disease associations are highly correlated. In other words, the drug-disease matrix to be completed is low-rank. Accordingly, matrix completion algorithms efficiently constructing low-rank drug-disease matrix approximations consistent with known associations can be of immense help in discovering the novel drug-disease associations.
Results: In this article, we propose to use a Bounded Nuclear Norm Regularization (BNNR) method to complete the drug-disease matrix under the low-rank assumption. Instead of strictly fitting the known elements, BNNR is designed to tolerate the noisy drug-drug and disease-disease similarities by incorporating a regularization term to balance the approximation error and the rank properties. Moreover, additional constraints are incorporated into BNNR to ensure that all predicted matrix entry values are within a specified interval. BNNR is carried out on an adjacency matrix of a heterogeneous drug-disease network, which integrates the drug-drug, drug-disease, and disease-disease networks. It not only makes full use of available drugs, diseases, and their association information, but is also capable of handling the cold-start problem naturally. Our computational results show that BNNR yields higher drug-disease association prediction accuracy than the current state-of-the-art methods. The most significant gain is in prediction precision, measured as the fraction of the positive predictions that are truly positive, which is particularly useful in drug design practice. Case studies also confirm the accuracy and reliability of BNNR.
Availability: The code of BNNR is freely available at https://github.com/BioinformaticsCSU/BNNR
Contact: jxwang@mail.csu.edu.cn

12:20 PM-12:40 PM
Proceedings Presentation: Enhancing the Drug Discovery Process: Bayesian Inference for the Analysis and Comparison of Dose-Response Experiments
  • Caroline Labelle, University of Montreal, Canada
  • Anne Marinier, University of Montreal, Canada
  • Sébastien Lemieux, University of Montreal, Canada

Motivation: The efficacy of a chemical compound is often tested through dose-response experiments from which efficacy metrics, such as the IC50, can be derived. The Marquardt-Levenberg algorithm (non-linear regression) is commonly used to compute estimates of these metrics. Such analyses are, however, limited and can lead to biased conclusions. The approach does not evaluate the certainty (or uncertainty) of the estimates, nor does it allow for the statistical comparison of two datasets. To compensate for these shortcomings, intuition plays an important role in the interpretation of results and the formulation of conclusions. Here we propose a Bayesian inference methodology for the analysis and comparison of dose-response experiments.

Results: Our results demonstrate the gain in informativeness of our Bayesian approach in comparison to the commonly used Marquardt-Levenberg algorithm. It is capable of characterizing the noise of a dataset while inferring distributions of probable values for the efficacy metrics. It can also evaluate the difference between the metrics of two datasets and compute the probability that one value is greater than the other. The conclusions that can be drawn from such analyses are more precise.

Availability: We implemented a simple web interface that allows the users to analyze a single dose-response dataset, as well as to statistically compare the metrics of two datasets.
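
As a rough illustration of the comparison described above (a posterior over an efficacy metric, and the probability that one compound's value exceeds another's), the sketch below uses a grid approximation over log10(IC50). It is not the authors' implementation: the function names, the fixed Hill slope of 1, the flat prior, the Gaussian noise level `sigma` and the simulated responses are all assumptions made for the example.

```python
import math

def hill(dose, log_ic50, slope=1.0):
    """Fraction of response remaining at a dose (1 = no effect, 0 = full effect)."""
    return 1.0 / (1.0 + (dose / 10.0 ** log_ic50) ** slope)

def posterior_log_ic50(doses, responses, grid, sigma=0.05):
    """Grid posterior over log10(IC50), assuming a flat prior and Gaussian noise."""
    log_post = []
    for g in grid:
        sse = sum((r - hill(d, g)) ** 2 for d, r in zip(doses, responses))
        log_post.append(-sse / (2.0 * sigma ** 2))
    peak = max(log_post)  # subtract the peak before exponentiating for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def prob_greater(post_a, post_b):
    """P(IC50_a > IC50_b) for two independent posteriors on the same grid."""
    prob, cum_b = 0.0, 0.0
    for pa, pb in zip(post_a, post_b):
        prob += pa * cum_b  # posterior mass of b strictly below this grid point
        cum_b += pb
    return prob

# Hypothetical, noise-free simulated experiments for two compounds.
grid = [-3.0 + 0.01 * i for i in range(601)]  # log10(IC50) from -3 to 3
doses = [0.01, 0.1, 1.0, 10.0, 100.0]
resp_a = [hill(d, 0.0) for d in doses]   # compound A, true IC50 = 1
resp_b = [hill(d, -1.0) for d in doses]  # compound B, true IC50 = 0.1
post_a = posterior_log_ic50(doses, resp_a, grid)
post_b = posterior_log_ic50(doses, resp_b, grid)
p_a_gt_b = prob_greater(post_a, post_b)
```

Unlike a single point estimate from non-linear regression, the two posteriors can be compared directly, so a statement such as "compound A is less potent than compound B" comes with a probability attached.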

2:00 PM-2:40 PM
Transforming electronic data from clinical routine practice into a viable data source for biological modeling: culture, standardization and technologies
  • Charles Mayo, University of Michigan, United States

The potential for annotated clinical data from routine practice to provide vital prognostic and outcomes information, to be united with biological data, and to support models that guide decision frameworks is compelling. Making this change requires collaborative work along several fronts, including shifting clinical practice culture, filling gaps in standardization and ontologies, implementing scalable technologies that work in clinical environments, and changing ideas about the uses of observational data. This presentation will review efforts from several groups in the US and in Europe that are advancing clinics toward this computational translational science threshold.

2:40 PM-3:00 PM
Proceedings Presentation: Representation Transfer for Differentially Private Drug Sensitivity Prediction
  • Teppo Niinimäki, Aalto University, Finland
  • Mikko Heikkilä, University of Helsinki, Finland
  • Antti Honkela, University of Helsinki, Finland
  • Samuel Kaski, Aalto University, Finland

Motivation: Human genomic datasets often contain sensitive information that limits use and sharing of the data. In particular, simple anonymisation strategies fail to provide a sufficient level of protection for genomic data, because the data are inherently identifiable. Differentially private machine learning can help by guaranteeing that the published results do not leak too much information about any individual data point. Recent research has reached promising results on differentially private drug sensitivity prediction using gene expression data. Differentially private learning with genomic data is challenging because it is more difficult to guarantee privacy in high dimensions. Dimensionality reduction can help, but if the dimension reduction mapping is learned from the data, then it needs to be differentially private too, which can carry a significant privacy cost. Furthermore, the selection of any hyperparameters (such as the target dimensionality) also needs to avoid leaking private information.

Results: We study an approach that uses a large public dataset of similar type to learn a compact representation for differentially private learning. We compare three representation learning methods: variational autoencoders, PCA and random projection. We solve two machine learning tasks on gene expression of cancer cell lines: cancer type classification, and drug sensitivity prediction. The experiments demonstrate significant benefit from all representation learning methods with variational autoencoders providing the most accurate predictions most often. Our results significantly improve over previous state-of-the-art in accuracy of differentially private drug sensitivity prediction.

Availability: Code used in the experiments is available at https://github.com/DPBayes/dp-representation-transfer

3:00 PM-3:20 PM
Proceedings Presentation: Deep Learning with Multimodal Representation for Pancancer Prognosis Prediction
  • Anika Cheerla, Stanford University, United States
  • Olivier Gevaert, Stanford University, United States

Motivation: Estimating the future course of patients with cancer lesions is invaluable to physicians; however, current clinical methods fail to effectively use the vast amount of multimodal data that is available for cancer patients. To tackle this problem, we constructed a multi-modal neural network based model to predict the survival of patients for 20 different cancer types using clinical data, mRNA expression data, microRNA expression data and histopathology whole slide images (WSIs). We developed an unsupervised encoder to compress these four data modalities into a single feature vector for each patient, handling missing data through a resilient, multimodal dropout method. Encoding methods were tailored to each data type - using deep highway networks to extract features from clinical and genomic data, and convolutional neural networks to extract features from WSIs.
Results: We used pancancer data to train these feature encodings and predict single cancer and pancancer overall survival, achieving a C-index of 0.78 overall. This work shows that it is possible to build a pancancer model for prognosis that also predicts prognosis in single cancer sites. Furthermore, our model handles multiple data modalities, efficiently analyzes WSIs, and represents patient multi-modal data flexibly into an unsupervised, informative representation. We thus present a powerful automated tool to accurately determine prognosis, a key step towards personalized treatment for cancer patients.
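
For readers unfamiliar with the C-index reported above, the following minimal sketch shows how a concordance index can be computed for right-censored survival data. It follows the standard Harrell formulation and is not taken from the authors' code; the function name and the four toy patients are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs that are ranked correctly.

    times  -- observed follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk score (higher means shorter expected survival)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy cohort: the third patient is censored at time 12.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of patients by risk.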

3:20 PM-3:30 PM
Rare Disease Gene Prioritization Using MEDLINE Derived Association Network
  • Rajgopal Srinivasan, TCS Research, India
  • Naveen Sivadasan, TCS Research, India
  • Aditya Rao, TCS Research, India
  • Thomas Joseph, TCS Research, India
  • Vg Saipradeep, TCS Research, India
  • Sujatha Kotte, TCS Research, India

Rare disease gene prioritization approaches rely on high quality curated resources containing disease, gene and phenotype annotations. However, effectiveness of such approaches is constrained by the limited recall and high curation cost of annotated data.

We develop a tool PRIORI-T for rare disease gene prioritization that takes an input set of phenotypes describing a clinical case. PRIORI-T makes use of rare disease correlation pairs extracted from MEDLINE involving human rare diseases, phenotypes and genes. Further, the correlation pairs are augmented using novel associations inferred using the information propagation algorithm GCAS (Graph Convolution-based Association Scoring) and an association network is constructed. The gene prioritization performance of PRIORI-T was validated using the phenotype descriptions of 230 real-world rare disease clinical cases collated from recent publications.

PRIORI-T captured over 97% of the curated associations having at least one phenotype and gene association in the curated resources Orphanet and HPO. For the clinical cases, the causal genes were captured within Top-50 and Top-300 for more than 40% and 72% of the cases respectively. PRIORI-T outperformed other competing approaches for gene prioritization that rely primarily on curated resources. Combining PRIORI-T with variant prioritization tools could further improve the accuracy of identifying causal genes.

3:30 PM-3:40 PM
Improving the discriminatory power of combinatorial antigen recognition in T cell therapies for cancer
  • Ruth Dannenfelser, Princeton University, United States
  • Olga Troyanskaya, Princeton University, United States
  • Benjamin Vandersluis, Center for Computational Biology, Flatiron Institute, United States
  • Sarah Levinson, University of California, San Francisco, United States
  • Gregory Allen, University of California, San Francisco, United States
  • Wendell Lim, University of California, San Francisco, United States

Advancements in cell engineering have given rise to the next generation of immune therapies whereby T cells can be engineered to recognize and target cancer cells. While studies have shown great promise in eradicating tumor cells, off-target tissue toxicities have made these treatments unsafe for the clinic, as antigens used for tumor recognition are also present on normal cells. Fortunately, recent developments in cell engineering have now made it possible to design T cells that respond to multiple antigens with Boolean (AND and NOT) logic. Here we leveraged this new technology and performed a comprehensive computational screen to find pairs of antigens that together will target tumor cells while avoiding tissue toxicity. More specifically, we searched the space of tumor and normal tissue expression data for all possible combinations of approximately 3,300 cell surface markers for 20 tumor types, using a cluster-based approach to prioritize pairs that maximize precision and recall and finding safer antigens that are more likely to overcome tumor heterogeneity. Furthermore, we analyzed the space of these candidate pairs and found that, for most cancer types, combinations of 2-3 antigens are likely to be sufficient for discrimination, and we identified putative secondary antigens for current clinical CAR T antigen targets.
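
The idea of scoring Boolean antigen gates by precision and recall can be sketched as follows. This is a deliberately simplified illustration rather than the cluster-based method of the abstract: it assumes expression has already been binarized per sample, it scores pairs exhaustively, and it combines precision and recall into F1. The function names and the antigens X, Y and Z are hypothetical.

```python
from itertools import permutations

def gate_scores(tumor, normal, a, b, gate):
    """Precision and recall of a two-antigen gate at flagging tumor samples.

    tumor, normal -- lists of dicts mapping antigen name -> True/False (expressed)
    gate          -- "AND" (a and b) or "AND_NOT" (a and not b)
    """
    def fires(sample):
        if gate == "AND":
            return sample[a] and sample[b]
        return sample[a] and not sample[b]

    tp = sum(1 for s in tumor if fires(s))   # tumor samples the gate targets
    fp = sum(1 for s in normal if fires(s))  # normal samples hit by mistake
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(tumor) if tumor else 0.0
    return precision, recall

def best_gate(tumor, normal, antigens):
    """Score every ordered antigen pair under both gates; return the best by F1."""
    best = None
    for a, b in permutations(antigens, 2):
        for gate in ("AND", "AND_NOT"):
            p, r = gate_scores(tumor, normal, a, b, gate)
            f1 = 2 * p * r / (p + r) if p + r else 0.0
            if best is None or f1 > best[0]:
                best = (f1, a, b, gate)
    return best

# Hypothetical binarized expression for three tumor and three normal samples.
tumor = [
    {"X": True, "Y": True, "Z": False},
    {"X": True, "Y": True, "Z": False},
    {"X": True, "Y": False, "Z": False},
]
normal = [
    {"X": True, "Y": False, "Z": True},
    {"X": False, "Y": True, "Z": False},
    {"X": True, "Y": False, "Z": True},
]
result = best_gate(tumor, normal, ["X", "Y", "Z"])
```

In this toy data antigen X alone would also hit two normal samples, whereas the gate "X AND NOT Z" separates tumor from normal, which is the kind of gain in discrimination the abstract describes.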

3:40 PM-3:50 PM
Harnessing genetic interactions to advance precision cancer medicine
  • Joo Sang Lee, Cancer Data Science Lab, NCI/NIH, United States
  • Avinash Das, Department of Biostatistics and Computational Biology, Harvard School of Public Health, United States
  • Eytan Ruppin, Cancer Data Science Lab, NCI/NIH, United States

Precision cancer medicine approaches are typically focused on searching for ‘actionable’ mutations in cancer genes, aiming at their therapeutic targeting. However, identifying novel genetic interactions between cancer genes may open new drug treatment opportunities. We studied two fundamental types of genetic interactions: the well-known synthetic lethal interactions, describing the relationship between two genes whose combined inactivation is lethal to the cell; and the less-known synthetic rescue (SR) interactions, where a change in the activity of one gene is lethal to the cell but an alteration of its SR partner gene rescues cell viability. We shall describe a new approach for the data-driven identification of these genetic interactions by directly mining patients’ tumor data. Applying it to analyze The Cancer Genome Atlas (TCGA) data, we have identified the first pan-cancer genetic interaction networks shared across many types of cancer, which we then validated via existing and new experimental in vitro and in vivo screens. We find that: (a) synthetic lethal interactions offer an exciting avenue for personalized selective anticancer treatments, enabling the prediction of patients’ drug response and providing new selective drug target candidates, and (b) targeting synthetic rescue genes can mitigate resistance to primary cancer therapy, including both targeted therapy and immunotherapy.

3:50 PM-4:00 PM
Combining Machine Learning with Single-cell Analysis for Individualized Precision Medicine
  • Benedict Anchang, Stanford University, United States
  • Loukia Karacosta, Stanford University, United States
  • Sylvia Plevritis, Stanford University, United States

Cancer cells interact with their microenvironment during tumor progression, changing their phenotypic states. This challenges the field of precision medicine, which is currently not optimized for the individual patient. We now have the ability to obtain highly resolved molecular phenotypes from individual cells from patient samples that can be used to define cell states and study cellular responses to drugs. We present two network-based computational frameworks, referred to as STAMP and DRUGNEM, with the potential to precisely determine the dynamic state of a disease and to individualize combination therapy, respectively, for a given patient, with applications in lung cancer and leukemia. STAMP combines mass cytometry time-series data with machine learning to predict the states of tumor cells from 5 lung cancer patients using a reference Epithelial Mesenchymal Transition (EMT) map trained with a neural network. DRUGNEM is used to individualize therapy for 30 ALL patients. Instead of trying to identify a mutation in the DNA and then find a drug that addresses that mutation, DRUGNEM isolates single cells from the patient and tests those cells against a set of drugs to see which drug combinations are effective against the tumor, optimizing early intracellular responses using nested effects models.

4:40 PM-5:20 PM
Harnessing Big, Multidisciplinary Data to Inform Cancer Medicine

5:20 PM-5:30 PM
Using genetic similarities among ageing-related diseases to understand and intervene ageing
  • Handan Melike Donertas, EMBL-EBI, United Kingdom
  • Matias Fuentealba Valenzuela, Institute of Healthy Aging (UCL), United Kingdom
  • Linda Partridge, 1) Institute of Healthy Aging (UCL) 2) Max Planck Institute for Biology of Aging, United Kingdom
  • Janet Thornton, EMBL-EBI, United Kingdom

Ageing is the major risk factor for many diseases. With the rise in life expectancy, the overall burden of ageing-related diseases is increasing. The molecular link between ageing and age-related diseases, however, has not been explored in a systematic manner. In this study, we test whether diseases with similar age of onset share a genetic component that is also implicated in ageing. We perform GWAS on UK Biobank data, which includes genomic, medical and lifestyle measures for almost 500k participants. Our preliminary analysis comparing more than 100 diseases based on their age-of-onset profiles suggests that late-life diseases do share a genetic component that is not prevalent in other diseases. Moreover, these results cannot be explained only by disease categories (e.g. cardiovascular, endocrine) or comorbidities. In order to explore the link between ageing and these diseases, we are now combining our results with publicly available datasets for ageing, such as age-series gene expression profiles and lifespan assays using model organisms. Identifying a shared ageing-related mechanism among multiple diseases offers an opportunity to target or even prevent multiple pathologies with a limited number of drugs and to decrease the effect of polypharmacy on the elderly while retaining the benefits.

5:30 PM-5:40 PM
Clustering multivariate longitudinal clinical patient data using variational deep embedding with recurrence
  • Johann de Jong, UCB Biosciences, Germany
  • Holger Froehlich, University of Bonn / UCB Biosciences, Germany

In the literature, the problem of clustering multivariate short time series is still largely unaddressed. However, multivariate short time series are common in clinical data, when multivariate patient measurements are taken over time. The clustering (stratification) of such clinical data is additionally complicated by the typically high degree of missingness.
For this purpose, we developed variational deep embedding with recurrence (VaDER). VaDER extends variational deep embedding (VaDE), a clustering algorithm built on variational autoencoder principles. VaDER enables the analysis of multivariate short time series with many missing values, by (1) incorporating long short term memory networks (LSTMs) into VaDE's architecture, and (2) defining an architecture and loss function that directly deal with missing values by implicit imputation and loss re-weighting.
We technically validated VaDER by accurately recovering clusters from noisy simulated data with known ground truth clustering. We then used VaDER to successfully stratify (1) Alzheimer's disease patients and (2) Parkinson's disease patients into subgroups characterized by clinically divergent disease progression profiles. Additional analyses demonstrated that these clinical differences reflected significant underlying biological differences.
We believe our results show that VaDER can be of great value for future efforts in patient stratification, and multivariate short time series clustering in general.
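
One simple way to keep missing values from driving a reconstruction loss is to mask them out and renormalize by the number of observed entries. The sketch below illustrates only that general idea; it is not VaDER's actual loss function, and the function name and toy values are hypothetical.

```python
def masked_mse(predicted, observed, mask):
    """Mean squared error computed over observed entries only.

    predicted, observed -- nested lists indexed [time][variable]
    mask                -- same shape; 1 where a value was measured, 0 where missing
    """
    total, count = 0.0, 0
    for p_row, o_row, m_row in zip(predicted, observed, mask):
        for p, o, m in zip(p_row, o_row, m_row):
            if m:  # missing entries contribute neither error nor weight
                total += (p - o) ** 2
                count += 1
    return total / count if count else 0.0

# Hypothetical patient with two visits and two variables; one value is missing
# and holds a placeholder of 0.0 in the observed data.
predicted = [[1.0, 2.0], [3.0, 4.0]]
observed = [[1.0, 0.0], [2.0, 4.0]]
mask = [[1, 0], [1, 1]]
loss = masked_mse(predicted, observed, mask)
```

Dividing by the count of observed entries, rather than by the full size of the series, keeps patients with many missing measurements from being systematically down-weighted.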

5:40 PM-5:50 PM
From question to publication in five days - how OHDSI is changing medical evidence generation through open science
  • Kees van Bochove, The Hyve, Netherlands

Population genetics and genomics is an emerging topic for the application of machine learning methods in healthcare and biomedical sciences. Currently, several large genomics initiatives, such as Genomics England, UK Biobank, the All of Us Project, and Europe's 1 Million Genomes Initiative, are in the process of making both clinical and genomics data available from large numbers of patients to benefit biomedical research. However, a key challenge in these initiatives is the standardization of the clinical and outcomes data in such a way that machine learning methods can be effectively trained to discover useful medical and scientific insights. In this talk, we will look at the application of open common data and evidence models such as OMOP, FHIR, GA4GH and RADAR-BASE, and in particular zoom in on the OMOP CDM and the open source OHDSI ATLAS tooling, as used in, among others, the IMI EHDEN project.

Please check the webinar video as an alternative reference for how the OMOP CDM and the ATLAS open source tooling have a real impact on biomedical open science.

5:50 PM-6:00 PM
CDx / NGS & Regulation: 5 perspectives from the Pistoia Alliance
  • Dominic Clark

Companion Diagnostics (CDx) are essential to the practice of Precision Medicine. Next Generation Sequencing (NGS) is an increasingly important tool in CDx development. However, for a CDx to be deployed, many different biopharma industry sectors need to collaborate. This paper outlines some of the challenges and opportunities perceived by the bio-pharmaceutical industry, the Europe Molecular Quality Network, a regulatory agency, a notified body and a CDx service provider.