Posters - Schedules

Monday, July 11 and Tuesday, July 12 between 12:30 PM CDT and 2:30 PM CDT
Wednesday, July 13 between 12:30 PM CDT and 2:30 PM CDT
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 11 between 7:30 AM CDT and 10:00 AM CDT
Session A Posters dismantle:
Tuesday, July 12 at 6:00 PM CDT
Session B Poster Set-up and Dismantle
Session B Posters set up:
Wednesday, July 13 between 7:30 AM CDT and 10:00 AM CDT
Session B Posters dismantle:
Thursday, July 14 at 2:00 PM CDT
Virtual: A Quantum Machine Learning-Based Framework for Early Cancer Detection Through Transcriptome Profiles
COSI: TransMed
  • Rushank Goyal, All India Institute of Medical Sciences, India


Presentation Overview:

Cancer is a broad term for diseases characterized by uncontrollable and abnormal cell growth. In this research, a novel three-step framework based on quantum machine learning is developed using transcriptome data to identify key cancer biomarkers and combine them into mathematical expressions that can predict the presence of cancer with high accuracy using the expression levels of five or fewer genes. Instead of relying on traditional black-box machine learning, my framework uses a recently developed technology called the quantum lattice to produce transparent and explainable models. For each dataset, after initial filtering through XGBoost and statistical significance testing to identify differentially expressed genes, the quantum lattice was trained for 10 epochs using the Akaike Information Criterion as its loss function. The framework was trained and tested on ten datasets. Median accuracies, sensitivities and specificities of 91%, 92.5% and 87.5%, respectively, were obtained. Overall, the models show greater accuracies than previous research while also using far fewer genes for predictions. In all, 38 biomarkers were identified, of which 17 are novel, including 4 lncRNAs. The results obtained can be applied in practical settings for efficient early cancer detection and can provide insights into associations between certain genes and types of cancer.
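To make the role of the Akaike Information Criterion more concrete, here is a minimal sketch of how AIC trades goodness of fit against model size for a binary classifier. It is not the author's implementation: the Bernoulli likelihood, the toy labels and probabilities, and the parameter counts are all assumptions chosen for illustration.

```python
import math

def bernoulli_log_likelihood(y_true, p_pred, eps=1e-12):
    """Log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical toy data: 1 = cancer sample, 0 = control sample.
labels = [1, 0, 1, 1, 0]
probs = [0.9, 0.2, 0.8, 0.7, 0.1]

# Two hypothetical expressions with the same fit but different sizes.
log_lik = bernoulli_log_likelihood(labels, probs)
aic_small = aic(log_lik, n_params=2)  # compact expression
aic_large = aic(log_lik, n_params=5)  # larger expression, same predictions
```

With identical predictions, the expression with fewer parameters receives the lower AIC, which is why an AIC-based objective tends to favour short, interpretable models.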

Virtual: AI Based Clustering and Characterization of Parkinson’s Disease Trajectories
COSI: TransMed
  • Colin Birkenbihl, Fraunhofer SCAI, Germany
  • Ashar Ahmad, UCB Pharma, Belgium
  • Nathalie J Massat, UCB Pharma, Belgium
  • Tamara Raschka, Fraunhofer SCAI, Germany
  • Andreja Avbersek, Regeneron Inc., United States
  • Patrick Downey, UCB Pharma, Belgium
  • Martin Armstrong, UCB Pharma, Belgium
  • Holger Froehlich, Fraunhofer SCAI, Germany


Presentation Overview:

Parkinson’s disease (PD) causes a wide variety of symptoms. These, as well as the general progression rate of the disease, are highly variable across patients. This can impede the design of disease-modifying trials. Stratifying PD patients into more homogeneous progression subtypes may help to solve that problem, provide a better understanding of the disease, and raise the success rates of clinical trials.
Therefore, we use the previously published AI algorithm VaDER to cluster multivariate longitudinal progression data of de novo PD patients from the PPMI study. A combination of six different clinical outcome scores is used to identify progression subtypes. Three subtypes were identified: ‘fast’, ‘moderate’ and ‘slow’ progressing. Further analysis showed that patients in the individual clusters differ in dopaminergic cell loss and in their response to symptomatic treatment, while no differences were observed in the start of symptomatic treatment. Furthermore, significant associations were identified between the patients’ progression profiles and, for example, clinical measurements at study baseline.
Overall, the results contribute to a better understanding of the disease and make it possible to stratify patients into sub-groups based on their likely progression, opening the opportunity to design better clinical trials and to identify novel drug targets.

Virtual: An alternative architecture for Data Integration in Translational Research
COSI: TransMed
  • Soumyabrata Ghosh, Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
  • Kavita Rege, Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
  • Xinhui Wang, Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
  • Irina-Afrodita Balaur, Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
  • Rajesh Rawal, Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg
  • Venkata P. Satagopam, Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg


Presentation Overview:

Translational research involves the integration and interpretation of a large spectrum of data types, from individual-centric clinical data to sample-centric omics data, and this spectrum has been expanding quickly to incorporate newer technologies and experimental methods in order to address evolving health challenges. In general, biomedical data are stored in heterogeneous data silos with inconsistent identifiers, complex architectures, and various query protocols. This data segregation considerably increases the effort of both data discovery and analysis. To address this challenge, we propose an alternative architecture for data integration based on the MonetDB column-centric database and existing data engineering components. The proposed architecture is designed to be extendable, data-type agnostic, and interoperable yet decoupled from the data-science or bioinformatics pipelines. The architecture is SQL-compliant, extendable to support big data files, and can be integrated with other database systems. The downstream analysis pipeline accesses the data by calling an Application Programming Interface (API) of the database. The Proof-of-Concept (PoC) implementation is promising, supporting up to 100,000 columns of translational data. Here, we will present our PoC architecture and implementation and discuss both the advantages and the limitations of the current approach.

Virtual: An interpretable multimodal machine learning framework effectively predicts status of distal sensorimotor polyneuropathy and reveals important pathological molecular signatures
COSI: TransMed
  • Annette Peters, Institute of Epidemiology, Helmholtz Center Munich, Germany
  • Michael P Menden, Institute of Computational Biology, Helmholtz Center Munich, Germany
  • Christian Herder, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Germany
  • Dan Ziegler, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Germany
  • Harald Grallert, Institute of Epidemiology, Helmholtz Center Munich, Germany
  • Michael Roden, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Germany
  • Wolfgang Rathmann, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Germany
  • Gidon Boenhof, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Germany
  • Karsten Suhre, Department of Physiology and Biophysics, Weill Cornell Medicine - Qatar, Qatar
  • Phong Bh Nguyen, Institute of Computational Biology, Helmholtz Center Munich, Germany
  • Christian Gieger, Research Unit of Molecular Epidemiology, Helmholtz Center Munich, Germany
  • Melanie Waldenberger, Research Unit of Molecular Epidemiology, Helmholtz Center Munich, Germany
  • Gabi Kastenmueller, Institute of Computational Biology, Helmholtz Center Munich, Germany
  • Jerzy Adamski, Institute of Experimental Genetics, Helmholtz Center Munich, Germany
  • Barbara Thorand, Institute of Epidemiology, Helmholtz Center Munich, Germany
  • Holger Prokisch, Institute of Neurogenomics, Helmholtz Center Munich, Germany
  • Haifa Maalmi, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Germany
  • Daniel Garger, Institute of Computational Biology, Helmholtz Center Munich, Germany


Presentation Overview:

Complex metabolic diseases are usually influenced by many intrinsic and extrinsic factors, so effective multi-omics integration strategies are required to predict and study the disease pathology. Nevertheless, these methods often have to deal with multimodal data that are high-dimensional, noisy, and dominated by cumulative low-effect signals. Here we present an interpretable machine learning framework to integrate multimodal data for prediction and biomarker detection of diseases. This framework selects biologically relevant features through view-specific gene set enrichment analysis and then integrates all modalities with a robust forward feature selection to predict disease phenotypes and prioritise important biomarkers. We applied this method to effectively predict the presence and incidence of distal sensorimotor polyneuropathy (DSPN), a common complication of diabetes, in the KORA F4/FF4 study and further revealed important pathological signatures of incident DSPN. We believe that our method shows strong utility in precision medicine as a diagnostic, prognostic and biomarker detection tool for complex systemic diseases.

Virtual: Artificial intelligence-driven clustering, characterization, and prediction of Alzheimer’s disease progression
COSI: TransMed
  • Colin Birkenbihl, Fraunhofer Institute for Scientific Computing and Algorithms SCAI, Germany
  • Johann De Jong, UCB Pharma, Germany
  • Ilya Yalchik, Fraunhofer Institute for Scientific Computing and Algorithms SCAI, Germany
  • Daniel Domingo-Fernandez, Fraunhofer Institute for Scientific Computing and Algorithms SCAI, Germany
  • Holger Froehlich, Fraunhofer Institute for Scientific Computing and Algorithms SCAI, Germany


Presentation Overview:

Alzheimer’s disease (AD) is a progressive disease that manifests across multiple symptoms impairing cognition and activities of daily living. AD patients show great heterogeneity in their progression patterns, with respect to both progression speed and severity of symptoms. In this work, we used a deep learning method to cluster the multivariate disease trajectories of 322 AD patients from the ADNI study along cognitive and functional scores. We identified three distinct clusters that separate AD patients into ‘slow’, ‘intermediate’, and ‘fast’ progression individuals. Using a machine learning model, we were able to predict the progression cluster of a patient from cross-sectional data at study baseline with an average area under the receiver operating characteristic curve (AUC) of 0.71. Our model performed especially well when identifying slow progressing individuals (0.78 AUC). To elucidate what biological mechanisms could drive the differences in disease progression patterns, we mapped patients' genetic data to biological pathways and thereby identified genes from those pathways that could represent potential target candidates for pharmaceutical interventions. We further repeated the clustering for 2257 patients from the NACC database and found similar patterns of brain atrophy and performance in clinical assessments across ADNI and NACC for their respective equivalent progression clusters.
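As a reading aid for the reported AUC values, the following sketch computes a per-cluster AUC in a one-vs-rest fashion and then averages the results. The abstract does not say how the average was formed, so the one-vs-rest scheme, the function names, and the toy scores are assumptions for illustration only.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(cluster_labels, class_scores):
    """Compute one AUC per cluster by treating it as the positive class."""
    aucs = {}
    for cluster, scores in class_scores.items():
        binary = [1 if c == cluster else 0 for c in cluster_labels]
        aucs[cluster] = roc_auc(binary, scores)
    return aucs

# Hypothetical cluster assignments and predicted scores for six patients.
clusters = ["slow", "slow", "intermediate", "intermediate", "fast", "fast"]
scores = {
    "slow": [0.8, 0.6, 0.3, 0.7, 0.1, 0.2],
    "intermediate": [0.1, 0.3, 0.5, 0.2, 0.4, 0.3],
    "fast": [0.1, 0.1, 0.2, 0.1, 0.5, 0.5],
}

aucs = one_vs_rest_auc(clusters, scores)
mean_auc = sum(aucs.values()) / len(aucs)
```

Reporting the per-cluster values alongside the mean, as the abstract does for the slow progressors, shows which progression group the model separates best.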

Virtual: Computational deconvolution of transcriptomics data reveals immune cell landscape of inflammatory infiltrates in giant cell arteritis
COSI: TransMed
  • Michal Zulcinski, University of Leeds, United Kingdom
  • Gary Reynolds, Newcastle University, United Kingdom
  • Lubna Shafi, University of Leeds, United Kingdom
  • Arundhati Chakrabarty, University of Leeds, United Kingdom
  • Mark M. Iles, University of Leeds, United Kingdom
  • Ann W. Morgan, University of Leeds, United Kingdom


Presentation Overview:

The molecular mechanisms underlying inflammatory infiltrates in giant cell arteritis (GCA), the most common form of vasculitis in people over 50 years old, remain largely unexplained. A better understanding of the molecular events and cellular players driving different inflammatory phenotypes is needed to improve molecular stratification in GCA management and to discover alternative treatment options, which are currently limited to high doses of glucocorticoids that carry a substantial risk of side effects in the majority of patients. Our study integrates bulk and single-cell transcriptomes generated from temporal artery biopsies of GCA-positive patients, along with various clinical and histological variables, to infer sample-specific cell-type proportions and to reveal cell-type-specific expression profiles associated with distinct histological patterns of arterial inflammation. Statistical testing for clinical and histological features was performed using the non-parametric Mann-Whitney-Wilcoxon test to avoid making parametric assumptions, and the False Discovery Rate was used to account for multiple testing. The findings reveal a previously unreported landscape of cell population abundance levels in GCA biopsies and provide novel insights into cell-type-specific expression profiles of both transcripts already known to be involved in GCA pathogenesis and novel molecular signatures that might have potential for therapeutic targeting.
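The multiple-testing step can be sketched as follows. The abstract only states that the False Discovery Rate was used; the Benjamini-Hochberg procedure shown here is one common way to control it and is an assumption, as are the toy p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values, e.g. from per-cell-type Mann-Whitney-Wilcoxon tests.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)

# Features whose adjusted p-value falls below a chosen FDR threshold.
significant = [i for i, q in enumerate(adj_p) if q < 0.05]
```

In this toy example only the first test survives a 5% FDR threshold, even though three raw p-values are below 0.05.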

Virtual: Cross-Tissue Drug Signature Predictions for Drug Repurposing
COSI: TransMed
  • Panagiotis Chrysinas, State University of New York at Buffalo, United States
  • Rudiyanto Gunawan, State University of New York at Buffalo, United States


Presentation Overview:

Drug repurposing represents an economically attractive avenue for finding treatments for diseases, and for rare diseases, it might be the only avenue. Data-driven approaches can help in narrowing down the list of potential therapeutic compounds for repurposing, enabled by the generation of large datasets of drug response such as the Connectivity Map (CMap), which contains 1.5 million molecular signatures for 71 different cell lines and 20,000 compounds. However, the data are generated predominantly using immortalized cancer cell lines, and thus the signatures may be cancer-specific. In this work, we developed strategies for mapping transcriptome signatures from reference cell lines to a target cell line of interest without requiring any drug signature data from the target cell line. The strategies involve the combination of predictor and corrector algorithms. Briefly, the predictor uses simple averaging to produce the uncorrected drug signatures for the target cell line from the reference cell lines, while the corrector maps the initial signatures to the gene expression space of the target cell line using either Principal Component Analysis or an Autoencoder. We applied our method to the CMap dataset and demonstrated its superiority over other imputation methods.

Virtual: Decoding tumour microenvironment heterogeneity using graph convolutional networks and multiplexed imaging
COSI: TransMed
  • Muhammed Khawatmi, University of Oxford, United Kingdom
  • Enric Dominigo, University of Oxford, United Kingdom
  • Fiona Ginty, University of Oxford, United Kingdom
  • Heba Sailem, University of Oxford, United Kingdom


Presentation Overview:

Determining the contribution of the tumour microenvironment (TME) to tumour progression and resistance has proven a complex challenge due to its heterogeneity. Multiplexed imaging provides an unprecedented opportunity for studying the interaction between cancer cells and the TME. We utilised a multiplexed tissue imaging dataset of 746 colorectal tumours from different stages. Each tumour section was stained with 60 markers simultaneously to visualise immune and stromal cells as well as key cancer signalling pathways. By implementing image analysis and segmentation approaches, around 3000 cells were quantified per tumour, resulting in data on ~3 million single cells. We performed compartmentalised image analysis to determine signalling activities in cancer, stromal, and immune cells. We developed a graph convolutional network (GCN) and visualisation approach to determine cellular subpopulations associated with patient survival. Using our approach, we found that signalling of the mTOR pathway can have heterogeneous activation patterns in different TME compartments, which correlate with different patient outcomes. We further validated our observations using transcriptional data from TCGA. Our findings can have a significant impact on the design of mTOR-based therapies and future clinical trials. This demonstrates the utility of GCNs in determining clinically relevant signatures and biomarkers from heterogeneous single-cell imaging data.
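For readers unfamiliar with graph convolution, the sketch below shows a single propagation step on a tiny cell graph. The abstract does not specify the GCN variant, so the symmetric normalisation with self-loops, the function names, and the three-cell toy graph are assumptions and not the authors' architecture.

```python
import math

def normalized_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(normalized_adjacency @ features @ weights)."""
    propagated = matmul(normalized_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in transformed]

# Hypothetical graph: three neighbouring cells in a row (0 - 1 - 2),
# each described by two marker intensities.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
marker_features = [[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 0.0]]
identity_weights = [[1.0, 0.0],
                    [0.0, 1.0]]

cell_embeddings = gcn_layer(adjacency, marker_features, identity_weights)
```

After one layer, each cell's representation mixes its own marker profile with those of its spatial neighbours, which is what lets such a model capture microenvironment context.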

Virtual: Deep learning-based prognosis prediction among preeclamptic pregnancies using electronic health record data
COSI: TransMed
  • Xiaotong Yang, University of Michigan, United States
  • Hailey Ballard, University of Florida, United States
  • Aditya Mahadevan, University of Florida, United States
  • Ke Xu, University of Florida, United States
  • David Garmire, University of Michigan, United States
  • Elizabeth Langen, University of Michigan, United States
  • Dominick Lemas, University of Florida, United States
  • Lana Garmire, University of Michigan, United States


Presentation Overview:

Background
Preeclampsia (PE) is one of the leading factors in maternal and perinatal mortality and morbidity worldwide. Optimizing the timing of delivery among patients with PE is essential to minimize the risk of severe maternal and neonatal morbidities.
Methods
In this study, we constructed a series of deep learning-based models to predict the prognosis, that is, the time to delivery, using electronic health record (EHR) data. We identified 1533 preeclamptic pregnancies delivered at a single academic medical center from 2015 to 2021. Using the Cox-nnet v2 algorithm, we built a baseline model with baseline features, a full model that additionally includes lab testing results and vital signs at the time of diagnosis, and an early-onset preeclampsia (EOPE) sub-model. We validated the models with 1177 preeclamptic pregnancies from a second academic institution.
Results
The baseline model, the full model and the EOPE sub-model achieved average C-indices of 0.73, 0.78 and 0.73 on the testing sets respectively, all higher than the Cox-PH counterparts.
Conclusion
The time-to-delivery prediction models provide clinicians with valuable tools to quantify the risk of preterm delivery. Implementing these actionable models in clinical care may significantly improve the management of patients with PE.
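The C-indices reported in the Results can be interpreted with a small sketch of Harrell's concordance index. This is a generic illustration and not the Cox-nnet v2 code; the function name, the optional censoring flags, and the toy times and risk scores are assumptions.

```python
def concordance_index(times, risk_scores, events=None):
    """Harrell's C: share of comparable pairs in which the subject with the
    shorter observed time also has the higher predicted risk."""
    n = len(times)
    if events is None:
        events = [1] * n  # assume every delivery time was observed
    concordant = 0.0
    comparable = 0
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had the event strictly before j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical days from diagnosis to delivery and predicted risk scores.
days_to_delivery = [5, 10, 20, 30]
predicted_risk = [0.9, 0.4, 0.6, 0.1]

c_index = concordance_index(days_to_delivery, predicted_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported values between 0.73 and 0.78 indicate that most patient pairs are ranked correctly.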

Virtual: Detection of germline exon copy number variants in targeted NGS panels for hereditary endocrine tumors and Lynch syndrome
COSI: TransMed
  • Aleksandra Pfeifer, Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice, Poland
  • Marta Cieślicka, Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice, Poland
  • Agnieszka Pawlaczek, Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice, Poland
  • Dorota Kula, Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice, Poland
  • Jadwiga Żebracka-Gala, Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice, Poland
  • Małgorzata Kowalska, Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice, Poland
  • Anna Fiszer-Kierzkowska, Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice, Poland
  • Artur Zajkowicz, Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice, Poland
  • Tomasz Tyszkiewicz, Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice, Poland
  • Patrycja Tudrej, Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice, Poland
  • Małgorzata Oczko-Wojciechowska, Department of Clinical and Molecular Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice, Poland


Presentation Overview:

The aim of the study was to detect germline exon copy number variants (CNVs) in targeted NGS panels, for Lynch syndrome and hereditary endocrine tumors.

NGS libraries were prepared using targeted enrichment methods and sequenced on the Illumina MiniSeq platform. In total, 177 samples were analyzed. Nine of them were positive for CNVs according to earlier MLPA analysis. The bioinformatics analysis was performed using ExomeDepth. Additionally, an in-house base-resolution copy number visualization tool based on Schenkel LC et al. was used.

We detected 88 CNVs among all the samples. In the positive samples, we successfully detected 5 deletions encompassing at least one exon. However, we did not detect the partial exon deletion present in three positive samples. Fifty-seven (65%) of the CNVs were in the PMS2 gene, which contains a region homologous to the PMS2CL gene; this homology may cause false-positive findings. We also detected additional, highly significant deletions of the whole MEN1 gene in two samples that had not previously been analyzed with MLPA. This deletion is known to cause multiple endocrine neoplasia type 1 (MEN1) syndrome.

Our in-house tool helped to visually inspect the detected deletions at base-level resolution. Further analyses are needed to confirm the detected CNVs with MLPA and to apply the method to more samples.

Virtual: Discovery of Inhibitors for Serine-Glycine One-Carbon Pathway Enzyme SHMT2 by Deep Learning and Transfer Learning
COSI: TransMed
  • Alperen Dalkiran, Middle East Technical University, Turkey
  • Robert Hamanaka, University of Chicago, United States
  • Maria Martin, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Gokhan Mutlu, University of Chicago, United States
  • Volkan Atalay, Middle East Technical University, Turkey
  • Rengul Cetin-Atalay, University of Chicago, United States


Presentation Overview:

Targeting the key enzyme serine hydroxymethyltransferase 2 (SHMT2) in the serine-glycine one-carbon pathway is an important approach with significant therapeutic potential for idiopathic pulmonary fibrosis (IPF) and other fibrotic diseases. SHMT2 is required for the production of collagen, the overproduction of which leads to loss of lung function in IPF. Our objective is to find drug candidate compounds for SHMT2 by virtual screening that uses deep neural networks and transfer learning. No study has yet addressed drug-target interaction prediction and virtual screening for SHMT2, owing to the limited number of active compounds against this enzyme. On the other hand, deep learning-based virtual screening methods require a large amount of training data. Therefore, we explore transfer learning along with deep learning (Lee et al., 2019; Tan et al., 2018; Yosinski et al., 2014).

Metabolic reprogramming at the cellular level is a mechanism that is involved in the pathogenesis of various diseases associated with abnormal cell differentiation and proliferation. Cells rewire metabolic pathways to fulfill bioenergetic needs and to produce biomolecules necessary for disease-causing bioactivities. Metabolic reprogramming has been recognized as one of the hallmarks of cancer for more than a decade. Recently other chronic pathological conditions associated with differentiation, proliferation, and excessive biomolecule production have been linked to the altered metabolic activities of cells. Interstitial lung diseases (ILDs) are characterized by chronic pulmonary inflammation and fibrosis. The most common type of ILD is idiopathic pulmonary fibrosis (IPF) (Federer & Martinez, 2018). IPF is a lethal malignant condition with a mean survival of 3.8 years. Like cancer, metabolic reprogramming is a mechanism involved in IPF pathology. Fibrosis is characterized by the overgrowth, hardening, and scarring of various tissues due to excessive remodeling and accumulation of the ECM (extracellular matrix) proteins, particularly collagen fibers (Hamanaka & Mutlu, 2021). Collagens are highly enriched in glycine residues and the serine-glycine-one-carbon pathway produces the glycine required for collagen synthesis. Hence targeting the critical enzyme SHMT2 of the serine-glycine-one-carbon pathway is an important approach with significant outcomes for IPF therapeutics.
Since there is a limited number of active compounds against these enzymes, we apply transfer learning to the deep neural network-based drug-target interaction prediction method. There are three modes of transfer learning as indicated below. In all of the modes, a source deep neural network is initially trained (pre-trained) on a source task with plenty of instances. Once a pre-trained neural network (source network) is obtained, one of the following modes is applied for the target task in order to be able to obtain the final target network. These modes are for (re)training but with a target task dataset of limited size. A whole network can be divided into three parts of layers: lower layers where low-level features are extracted, upper layers where the high-level features are formed from low-level features, and the final classifier layers where the prediction is performed based on the high-level features.
Mode 1: Use of the pre-trained source model as is (full fine-tuning)
In full fine-tuning, the whole trained source deep neural network is further re-trained for a few epochs, that is, fine-tuned with the target task dataset of limited size. The source network provides an initial configuration for the target network (instead of starting from a random configuration). No layers are frozen in this mode, and fine-tuning is applied to all of the weights.
Mode 2: Fine-tuning with freezing layers
In Mode 2, the bottom layers of the source network are frozen; that is, these layers are not updated during training (backpropagation) of the target network. Frozen layers correspond to the layers where lower-level features are extracted. In this case, the upper layers, where high-level features (properties) are extracted, and the final classifier layers are re-trained with samples from the target task dataset. An alternative is to freeze all of the feature extraction layers (lower layers and upper layers of the network) and to re-train only the final classifier layers.
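The difference between full fine-tuning and fine-tuning with frozen layers can be illustrated with a framework-free toy update step. This is only a sketch of the idea: the layer names, weights, gradients, and learning rate are hypothetical placeholders, and a real implementation would use a deep learning library.

```python
def sgd_step(layers, grads, lr, frozen=()):
    """Apply one gradient-descent update, leaving frozen layers untouched."""
    updated = {}
    for name, weights in layers.items():
        if name in frozen:
            updated[name] = list(weights)  # copied as is, no update
        else:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated

# Hypothetical pre-trained source network, split into the three groups of
# layers described above, with made-up gradients from the target task.
source_network = {
    "lower": [0.5, -0.2],
    "upper": [0.1, 0.3],
    "classifier": [0.7],
}
target_grads = {
    "lower": [0.1, 0.1],
    "upper": [0.2, -0.2],
    "classifier": [0.5],
}

# Full fine-tuning: every layer is updated.
full_finetune = sgd_step(source_network, target_grads, lr=0.1)

# Fine-tuning with frozen lower layers: only upper and classifier layers move.
frozen_lower = sgd_step(source_network, target_grads, lr=0.1,
                        frozen=("lower",))

# Alternative: freeze all feature extraction layers, re-train the classifier.
classifier_only = sgd_step(source_network, target_grads, lr=0.1,
                           frozen=("lower", "upper"))
```

Freezing more layers leaves fewer weights to fit, which can help when the target dataset contains only a handful of active compounds.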
The following steps are pursued.
1) Extract the compound bio-activity data of SHMT2 from ChEMBL
2) Construct feed-forward neural network (FNN) prediction models using transfer learning
a) source model: FNN model trained for proteins (enzymes) from EC 2.1 family
b) target models: re-trained source model for SHMT2 with three modes
3) Virtual screening: find compounds similar to active compounds of the target protein and screen them via trained target FNN models (obtained in Step 2).
We use a ligand-based binary classifier as the drug-target interaction prediction method. The binary classifier is a feed-forward neural network (FNN) with 2 hidden layers in addition to an input layer of 1024 neurons and an output layer of 1 neuron. After hyper-parameter selection, the numbers of neurons in the first and second hidden layers are 4096 and 1024, respectively. SHMT2 is an enzyme with EC:2.1.2.1. We selected the EC 2.1 family to form the source dataset. Extraction of bioactivity data from ChEMBL for the source enzymes in the EC 2.1 family resulted in 732 active and 1870 inactive data points at a pChEMBL threshold of 7.0. 5-fold cross-validation is employed with the generated source datasets. In 5-fold cross-validation, ⅙ of the source dataset is used for testing, and the remaining ⅚ are used for training and validation. The target dataset is also extracted from ChEMBL for the target enzyme SHMT2 at a pChEMBL threshold of 6.0, and it consists of 6 active and 179 inactive bioactivity data points. For the target network model, the performance is measured with the “leave-one-out” method since the number of bioactivity data points is very low. We used the Matthews correlation coefficient (MCC) as the evaluation metric. The average MCC for the source network model is 0.713, while it is 0.325 and 0.267 for the target network models of Mode 2 and Mode 1, respectively.
Once we have obtained ligand-based binary classifiers as the drug-target interaction prediction models, we can screen compounds against SHMT2. All compounds in DrugBank are screened with the trained target models, and the consensus of two models (corresponding to two modes) is taken into consideration. Finally, 4 approved drugs and 8 experimental drugs emerged as candidates.
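To show why the Matthews correlation coefficient is a natural choice for such an imbalanced dataset, here is a minimal sketch that computes MCC and plain accuracy side by side. The class sizes mirror the 6 active and 179 inactive data points mentioned above, but the predictions themselves are invented for illustration and do not reproduce the reported results.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC from binary labels; returns 0.0 when the denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Imbalanced toy labels: 6 actives followed by 179 inactives.
y_true = [1] * 6 + [0] * 179

# Hypothetical predictions: 3 of 6 actives found, 5 inactives misclassified.
y_pred = [1, 1, 1, 0, 0, 0] + [1] * 5 + [0] * 174

mcc = matthews_corrcoef(y_true, y_pred)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

In this toy case the accuracy is high because inactives dominate, while the MCC stays modest because half of the actives are missed; that contrast is the reason to prefer MCC here.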

We will explore these candidates with our in silico tools, iBioProVis (Donmez et al., 2020) and CROssBAR (Dogan et al., 2021), and then perform in vitro experiments.

References
- Doğan, T., et al., 2021. “CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations”, Nucleic Acids Research, 49:e96.
- Donmez, A., et al., 2020. “iBioProVis: interactive visualization and analysis of compound bioactivity space”, Bioinformatics, 36, 4227–4230.
- Federer, D.J. & Martinez, F.J., 2018. “Idiopathic pulmonary fibrosis”, N Engl J Med, 378, 1811–1823.
- Hamanaka, R.B. & Mutlu, G.M., 2021. “The role of metabolic reprogramming and de novo amino acid synthesis in collagen protein production by myofibroblasts: implications for organ fibrosis and cancer”, Amino Acids, 53(12), 1851–1862.
- Lee, M., et al., 2019. “Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery”, J. Cheminform., 11, 46.
- Tan, C., et al., 2018. “A survey on deep transfer learning”, ICANN 2018. Lecture Notes in Computer Science, vol 11141. Springer, Cham.
- Yosinski, J., et al., 2014. “How transferable are features in deep neural networks?”, NIPS'14:2, 3320–3328.

Virtual: Epigenetic alterations in T-cell prolymphocytic leukemia
COSI: TransMed
  • Huihuang Yan, Mayo Clinic, United States
  • Shulan Tian, Mayo Clinic, United States
  • Henan Zhang, Mayo Clinic, United States
  • Wei Ding, Mayo Clinic, United States


Presentation Overview:

T-cell prolymphocytic leukemia (T-PLL) is a rare disease representing ~2% of the mature lymphocytic leukemias in adults. It has a rapid clinical course and responds poorly to chemotherapy and immunotherapy. Whole-genome and whole-exome sequencing have identified a prevalence of structural variants, most notably inv(14)(q11q32), t(14;14)(q11;q32) and t(X;14)(q28;q11). Recurrent somatic mutations were also identified in genes encoding chromatin regulators and those associated with the JAK-STAT signaling pathway. Nevertheless, the extent of global epigenetic changes has not been investigated. We hypothesize that epigenetic alterations may play a key role in T-PLL pathogenesis. To systematically test this hypothesis, we mapped gene regulatory regions in T-PLL patients and healthy individuals using chromatin immunoprecipitation and sequencing (ChIP-seq) for H3K4me1, H3K4me3, and H3K27ac. The data were analyzed using the HiChIP bioinformatics pipeline. We identified a major loss of active enhancers in T-PLL, which was coupled with the down-regulation of nearby genes. These genes are enriched in the immune system, adaptive immune response and interferon gamma signaling pathways. Most importantly, we revealed a gain of super-enhancers targeting oncogenes such as TCL1A and MYC, which are known to play key roles in T-PLL. Together, our analysis highlights the roles of epigenetic alterations in T-PLL pathogenesis.

Virtual: Integrating multiple transcriptome-based methods to repurpose drugs for infectious diseases
COSI: TransMed
  • Kewalin Samart, Michigan State University, United States
  • Amy Tonielli, Michigan State University, United States
  • Arjun Krishnan, Michigan State University, United States
  • Janani Ravi, Michigan State University, United States


Presentation Overview:

Transcriptome-based drug repositioning works by finding drugs that reverse the gene expression pattern observed in the disease (i.e., drugs showing negative disease-drug ‘connectivity’). Over the past decade, many methods have been developed to improve the accuracy and robustness of quantifying disease-drug reversal relationships (DOI:10.1093/bib/bbab161). While they have been applied to complex diseases such as cancer, they are not widely used for infectious diseases (InfD), a leading cause of fatalities. Further, the recent rise of antibiotic resistance warrants new avenues for treatment. To address this critical need, we have developed a computational workflow to integrate connectivity-based methods to repurpose FDA-approved drugs for InfD. We construct InfD expression profiles from public datasets, and drug candidates are prioritized by integrating multiple connectivity metrics to reverse these disease signatures. Our approach includes methods to address heterogeneous datasets (experimental platforms, conditions, infection stages, and cell/tissue types) using (i) individual and aggregated disease signatures, (ii) baseline comparisons to identify appropriate cell lines, and (iii) gene/pathway-level comparisons. We are currently applying these methods to Mycobacterium tuberculosis infection data to identify (i) top-rated drug candidates/families and (ii) genes/pathways underlying drug-InfD pairs for experimental validation.
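
The reversal idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow: the abstract does not say which connectivity metrics are integrated, so the use of Spearman correlation as the connectivity score, the function names, and the toy gene and drug labels are all assumptions. A strongly negative score means the drug signature runs opposite to the disease signature.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def prioritize(disease_sig, drug_sigs):
    """Rank drugs from most negative (strongest reversal) to most positive."""
    genes = sorted(disease_sig)
    disease = [disease_sig[g] for g in genes]
    scores = {
        drug: spearman(disease, [sig[g] for g in genes])
        for drug, sig in drug_sigs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])


# Hypothetical log fold changes (disease vs. control, drug vs. vehicle).
disease_signature = {"G1": 2.0, "G2": 1.0, "G3": -1.0, "G4": -2.0}
drug_signatures = {
    "drug_a": {"G1": -1.5, "G2": -0.5, "G3": 0.7, "G4": 1.9},
    "drug_b": {"G1": 1.0, "G2": 0.8, "G3": -0.2, "G4": -1.1},
}
ranked = prioritize(disease_signature, drug_signatures)
```

In this toy input, `drug_a` reverses the disease signature and is listed first, while `drug_b` mimics it and is listed last.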

Virtual: Meta-analysis of preclinical pharmacogenomics studies discovers gene expression biomarkers predictive of drug response in the clinic
COSI: TransMed
  • Petr Smirnov, University of Toronto, Canada
  • Nikta Feizi, Princess Margaret Cancer Centre, University Health Network, Canada
  • Sisira Kadambat Nair, Princess Margaret Cancer Centre, University Health Network, Canada
  • Gangesh Beri, Princess Margaret Cancer Centre, University Health Network, Canada
  • Prasanna K Jagannathan, Princess Margaret Cancer Centre, University Health Network, Canada
  • Trevor J Pugh, Princess Margaret Cancer Centre, University Health Network, Canada
  • Benjamin Haibe-Kains, Princess Margaret Cancer Centre, University Health Network, Canada


Presentation Overview:

The availability of datasets combining drug screening and molecular profiling of cancer cell lines enables the discovery of predictive biomarkers of drug response. Previous studies have raised concerns about inconsistent drug sensitivity measurements between studies. To overcome these concerns, we apply statistical meta-analysis models to discard study-specific technical noise while preserving biologically relevant signals. Integrating data from 7 pharmacogenomic studies, we evaluated gene expression biomarkers across 11 tissue types and 70 compounds. From these data, we find 4338 significant associations across 8 tissues and 34 different drugs. Approximately 50% of biomarkers identified in individual studies were not significant in meta-analysis, highlighting the need for meta-analysis even in a preclinical context. To show the clinical impact of these putative biomarkers, we focus on neoadjuvant paclitaxel treatment in breast cancer, where large clinical datasets are available. Of 3 markers found in vitro for this indication, we show that one, expression of the ODC1 gene, achieves an AUROC of 0.73 and 0.71 for predicting response in two independent clinical breast cancer cohorts, a result competitive with multivariate models. Finally, we detail an integration with cBioPortal to bring our candidate biomarkers to the fingertips of the wider cancer research community.
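
To make the pooling step concrete, here is a minimal Python sketch of one common meta-analysis model: a fixed-effect, inverse-variance weighted estimate of a gene-drug association across studies. The abstract does not specify which meta-analysis models were applied, so the fixed-effect choice, the function name, and the example numbers are assumptions for illustration only.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect sizes with inverse-variance weights.

    Returns the pooled effect, its standard error, and a two-sided
    p-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p_value


# Hypothetical effect sizes for one gene-drug pair in three studies.
pooled, pooled_se, p_value = fixed_effect_meta(
    effects=[0.42, 0.10, 0.35],
    std_errors=[0.15, 0.20, 0.12],
)
```

Studies with smaller standard errors receive larger weights, so a noisy association seen in a single study can lose significance once the other studies are pooled in.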

Virtual: MLGL-MP: A Multi-Label Graph Learning Framework Enhanced by Pathway Interdependence for Metabolic Pathway Prediction
COSI: TransMed
  • Bing-Xue Du, School of Life Sciences, Northwestern Polytechnical University, China
  • Peng-Cheng Zhao, School of Life Sciences, Northwestern Polytechnical University, China
  • Bei Zhu, School of Life Sciences, Northwestern Polytechnical University, China
  • Siu-Ming Yiu, Department of Computer Science, The University of Hong Kong, Hong Kong, China
  • Arnold K Nyamabo, School of Computer Science, Northwestern Polytechnical University, China
  • Hui Yu, School of Computer Science, Northwestern Polytechnical University, China
  • Jian-Yu Shi, School of Life Sciences, Northwestern Polytechnical University, China


Presentation Overview:

Motivation: During lead compound optimization, it is crucial to identify the pathways in which a drug-like compound is metabolized. Recently, machine learning-based methods have made encouraging progress in predicting potential metabolic pathways for drug-like compounds. However, they neglect the knowledge that metabolic pathways depend on each other. Moreover, they cannot adequately explain why compounds participate in specific pathways.
Results: To address these issues, we propose a novel multi-label graph learning framework of metabolic pathway prediction boosted by pathway inter-dependence, called MLGL-MP, which contains a compound encoder, a pathway encoder, and a multi-label predictor. The compound encoder learns compound embedding representations by graph neural networks (GNNs). After constructing a pathway dependence graph by re-trained word embeddings and pathway co-occurrences, the pathway encoder learns pathway embeddings by graph convolutional networks (GCNs). Moreover, after adapting the compound embedding space into the pathway embedding space, the multi-label predictor measures the proximity of two spaces to discriminate which pathways a compound participates in. The comparison with state-of-the-art methods on KEGG pathways demonstrates the superiority of our MLGL-MP. Also, the ablation studies reveal how its three components contribute to the model, including the pathway dependence, the adapter between compound embeddings and pathway embeddings, as well as the pre-training strategy. Furthermore, a case study illustrates the interpretability of MLGL-MP by indicating crucial substructures in a compound, which are significantly associated with the attending metabolic pathways. It’s anticipated that this work can boost metabolic pathway predictions in drug discovery.
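
The final prediction step described above (adapt the compound embedding into the pathway space, then measure proximity to each pathway embedding) can be sketched as follows. This is only an illustration under stated assumptions: the linear adapter, the dot product as proximity measure, the sigmoid threshold, and all names and numbers are hypothetical, and in MLGL-MP the embeddings come from trained GNN and GCN encoders rather than hand-written vectors.

```python
import math


def adapt(compound_vec, adapter):
    """Map a compound embedding into the pathway space (linear map, assumed)."""
    return [sum(w * x for w, x in zip(row, compound_vec)) for row in adapter]


def predict_pathways(compound_vec, adapter, pathway_embeddings, threshold=0.5):
    """Score every pathway independently, as in multi-label classification."""
    z = adapt(compound_vec, adapter)
    probs = {}
    for name, emb in pathway_embeddings.items():
        score = sum(a * b for a, b in zip(z, emb))  # dot-product proximity
        probs[name] = 1.0 / (1.0 + math.exp(-score))  # sigmoid
    predicted = {name for name, p in probs.items() if p >= threshold}
    return predicted, probs


# Hypothetical 3-d compound embedding, 2x3 adapter, 2-d pathway embeddings.
compound = [1.0, 0.0, 2.0]
adapter = [[1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0]]
pathways = {"pathway_a": [1.0, 1.0], "pathway_b": [-1.0, -1.0]}
predicted, probs = predict_pathways(compound, adapter, pathways)
```

Because each pathway gets its own probability, a compound can be assigned to several pathways at once, which is what distinguishes multi-label from multi-class prediction.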

Virtual: Patient-specific and disease-related determinants for cardiovascular disease risk stratification in the APPLE (Atherosclerosis Prevention in Paediatric Lupus Erythematosus) clinical trial cohort
COSI: TransMed
  • Junjie Peng, University College London, United Kingdom
  • George Robinson, University College London, United Kingdom
  • Stacy Ardoin, Nationwide Children's Hospital, United States
  • Laura Schanberg, Duke University School of Medicine, United States
  • Elizabeth Jury, University College London, United Kingdom
  • Coziana Ciurtin, University College London Hospital, United Kingdom


Presentation Overview:

Background
The risk of developing cardiovascular disease (CVD) through atherosclerosis in juvenile-onset systemic lupus erythematosus (JSLE) patients is significantly increased. This study aimed to stratify and characterize JSLE patients at elevated CVD-risk using patient/disease-related factors and metabolomic data from patients recruited to the APPLE clinical trial (designed to assess atherosclerosis development).
Methods
Unsupervised hierarchical clustering was performed to stratify patients by arterial intima-media thickness (IMT) measurements at baseline (N=151) and carotid (c)IMT progression over 36 months (placebo arm only, N=60). Baseline metabolomic profiles (~250 serum metabolites) were compared between clusters using conventional statistics, logistic regression, sparse PLS-DA and random forest classifier.
Results
Baseline IMT stratification identified 3 clusters with high, intermediate, and low baseline IMT measurements and progression trajectories over 36 months, each having distinct racial/BMI/household education/income characteristics. Analysis of cIMT progression over 36 months identified 2 patient groups (high and low progression). Unique metabolomic profiles differentiated high and low cIMT progression groups with good discriminatory ability (0.81 AUC in ROC analysis for the top 6 metabolites). cIMT progression in the placebo group correlated positively with disease activity, damage, serum complement and BMI.
Conclusion
Complex analysis of IMT patterns and progression identified novel determinants that could guide CVD-risk stratification research in JSLE.

Virtual: PIVOT: a machine learning approach to identify personalised driver genes using multi-omic data
COSI: TransMed
  • Malvika Sudhakar, Indian Institute of Technology Madras, India
  • Raghunathan Rengaswamy, Indian Institute of Technology Madras, India
  • Karthik Raman, Indian Institute of Technology Madras, India


Presentation Overview:

The field of personalised medicine is expanding, which ushers in the need for personalised driver gene identification. While cohort-based tools identify frequently mutated driver genes, they fail to identify rare genes. Currently available personalised driver identification tools use unsupervised methods to map omic data onto a network to identify driver genes. Our method, PIVOT, is the first supervised machine learning model to classify genes as tumour suppressor gene (TSG), oncogene (OG) or neutral, thus assigning the functional impact of the driver gene in the patient. In this study, we develop models on different omic data by training on validated driver events. Given the lack of any gold standard, we identify the best among four data labelling strategies based on classification metrics. Our models trained on multi-omic data performed best, achieving an accuracy of ≥0.99 for BRCA, LUAD and COAD datasets. Our predictions reveal commonly altered genes and new driver genes such as PRKCA and PSMD4 in multiple individuals. We also label rare driver events occurring in as few as one sample. Interestingly, we identify genes such as JAK1 with dual roles within the same cancer type. Overall, PIVOT labels personalised and rare driver genes as TSGs and OGs.

Virtual: Random survival forest for stratification of Crohn’s disease patients by stricturing endotype
COSI: TransMed
  • Imogen Stafford, University of Southampton, United Kingdom
  • Guo Cheng, University of Southampton, United Kingdom
  • Enrico Mossotto, University of Southampton, United Kingdom
  • Sarah Ennis, University of Southampton, United Kingdom


Presentation Overview:

Crohn’s disease (CD) is a subtype of the chronic condition inflammatory bowel disease (IBD), with genetics contributing to its aetiology. Some patients develop a stricturing, or narrowing, of the gastrointestinal tract. This complication can occur at any point in the disease course. We aimed to stratify CD patients by stricturing endotype risk using machine learning and survival analysis. IBD patients were whole exome sequenced, and their data joint-called. The per-gene, per-individual pathogenicity score GenePy was generated. Two feature selection methods were applied to gene panels curated for IBD: principal component analysis (PCA) and gene ranking by Cox proportional hazards model (CPHM) C-index. Bayesian optimisation tuned random survival forest (RSF) hyperparameters. Models were assessed by the test set (25% of dataset) C-index. 363 CD patients were included in modelling. Best results were obtained using the KEGG NOD-signalling gene panel. Using CPHM ranking, the RSF model C-index was 0.51 on the test set, and with PCA the C-index was 0.58. Better performance using PCA was likely due to the sparsity of the input data. This, combined with a relatively small sample size, made stratification challenging. Larger ‘omics datasets alongside deep phenotyping data are imperative for continued progress towards personalised medicine.
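
For readers unfamiliar with the C-index used to assess the models above, the following Python sketch computes Harrell's concordance index from follow-up times, event indicators, and predicted risk scores. It is a plain reference implementation written for illustration (quadratic in the number of patients, and pairs with tied times are skipped); the function name and the example data are assumptions, not taken from the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index.

    A pair (i, j) is comparable when patient i had the event and did so
    strictly before patient j's event or censoring time. The pair is
    concordant when the model gives patient i the higher risk score.
    Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (years), event flags (1 = stricture observed),
# and model risk scores for four patients.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
c_perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
c_random = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported test-set values of 0.51 and 0.58 in context.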

Virtual: SLIDE-VIP - a novel computational framework predicts synthetic lethal interactions between key regulators of DNA Damage Response and chromatin modifiers
COSI: TransMed
  • Magda Markowska, University of Warsaw, Medical University of Warsaw, Poland
  • Magdalena Budzińska-Zaniewska, University of Warsaw, Ardigen S.A., Poland
  • Anna Coenen-Stass, Merck KGaA, Germany
  • Ewa Kizling, University of Warsaw, Poland
  • Krzysztof Kolmus, Ardigen S.A., Poland
  • Krzysztof Koras, University of Warsaw, Poland
  • Eike Staub, eike.staub@merckgroup.com, Germany
  • Ewa Szczurek, University of Warsaw, Poland


Presentation Overview:

Discovering synthetic lethal (SL) gene pairs is an important step in developing new targeted cancer therapies: existing cancer-related inactivation of one gene of an SL pair renders tumor cells susceptible to a drug targeting its SL partner, while leaving the remaining cells viable. Single statistical tests for SL rarely return reliable results because of imperfections and noise in biological data. We devised the SLIDE-VIP (Synthetic LethalIty Integrated Discovery Engine - Verified In Patients) framework for SL interaction discovery, combining eight different statistical tests and leveraging multi-omics data from four different sources: gene knock-out cell line screens (DepMap.org), cancer patient data (TCGA), drug screens (PRISM and GDSC) and genomic pathways (REACTOME, KEGG and PID). We carefully assess the quality of the outcomes for over 220 thousand pairs, rank the results with combined and adjusted p-values from cell line tests and choose potential SL pairs that are best supported by evidence from other types of data. Our approach rediscovers known SL pairs and reveals novel ones. The results provide new biomarker hypotheses for further validation and suggest that cancers with a high mutation rate in chromatin modifying genes may be efficiently targeted by DNA damage response inhibitors (DDRi).
All the results and their visualization are easily available at http://85.128.56.82:3838/
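
The ranking by "combined and adjusted p-values" mentioned in the abstract can be illustrated with a short Python sketch. The abstract does not name the combination or adjustment procedures, so Fisher's method and the Benjamini-Hochberg correction are used here purely as assumed stand-ins, and the function names and example values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values (each > 0) with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose survival function has a closed form
    for even degrees of freedom.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from several tests for three candidate gene pairs.
per_pair_tests = {
    "pair_1": [0.01, 0.03, 0.20],
    "pair_2": [0.40, 0.55, 0.70],
    "pair_3": [0.002, 0.04, 0.01],
}
combined = {pair: fisher_combine(ps) for pair, ps in per_pair_tests.items()}
names = sorted(combined)
adjusted = dict(zip(names, benjamini_hochberg([combined[n] for n in names])))
```

Combining several tests per pair before adjusting for the number of pairs is one way to reduce the influence of any single noisy test, which is the concern the abstract raises about single statistical tests.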

Virtual: SNP rs5744934 CC GENOTYPE PROTECTS ASTHMATIC CHILDREN ON CORTICOSTEROID TREATMENT AGAINST HPAS
COSI: TransMed
  • Wisdom Akurugu, University of Cape Town, South Africa
  • Carel van Heerden, Stellenbosch University, South Africa
  • Alvera Vorster, Stellenbosch University, South Africa
  • Nicola Mulder, University of Cape Town, South Africa
  • Ekkehard Zöllner, Paediatric Endocrine Unit, Department of Pediatrics and Child Health, Stellenbosch University, South Africa


Presentation Overview:

Corticosteroids are part of therapy for asthma management but can lead to hypothalamic-pituitary-adrenal suppression (HPAS). There is no single biochemical parameter for assessing HPAS yet. This study aimed to identify single nucleotide polymorphisms (SNPs) as markers for HPAS.

Data from 34 suppressed and 55 non-suppressed asthmatic children on inhaled corticosteroids were analysed. HPAS was diagnosed based on 3 variables: post-metyrapone adrenocorticotropic hormone (PMACTH) < 106 pg/ml, 11-deoxycortisol (11DOC) < 208 nmol/l and 11DOC+C < 400 nmol/l. Whole-exome sequencing was performed, and SNPs were called with the Torrent Variant Caller and GATK algorithms. Associations were tested with the PLINK analysis toolkit using candidate genes. Logistic regression was performed with covariates (age, sex, BMI, BMI z-scores and drug dose) and genetic models were assessed. Significant SNPs were prioritised with the functional mapping and annotation (FUMA) tool.

Analysing for association using only PMACTH, 11 significant SNPs were identified in the genomic locus chr12:133178417-133333059. The variant allele of SNP rs5744934 (T>C) was inversely associated with HPAS (OR=0.20, CI=0.06-0.65, p-value=4.51×10⁻³) defined by all 3 variables. It was also linearly associated with PMACTH, and high PMACTH class. Its effect was recessive and independent of covariates. No homozygous variant PMACTH sample was suppressed.

The variant CC genotype of rs5744934 is associated with HPAS protection.

Virtual: Survival Multi-Modal Neural Ordinary Differential Equations for Mortality Prediction of Patients with Severe Lung Disease
COSI: TransMed
  • Thomas Linden, University of Bonn; Fraunhofer Institute for Algorithms and Scientific Computing, Germany
  • Cindy Ku, University of Bonn; Fraunhofer Institute for Algorithms and Scientific Computing, Germany
  • Philipp Wendland, University of Applied Sciences Koblenz; Fraunhofer Institute for Algorithms and Scientific Computing, Germany
  • Holger Fröhlich, University of Bonn; Fraunhofer Institute for Algorithms and Scientific Computing, Germany


Presentation Overview:

The ongoing pandemic situation has demonstrated that medical decisions taken in intensive care units (ICU) are often time critical. To provide optimal care and resource allocation it is necessary to identify patients with high mortality risk as early as possible. However, development of corresponding machine learning models is challenging due to a) a mix of longitudinal and static data; b) differences in time intervals between measured outcomes; and c) frequently occurring missing values. In this work we introduce a novel machine learning method for mortality risk predictions based on ICU data of patients with severe lung disease taken from the MIMIC-III dataset. Our Survival Multi-Modal Neural Ordinary Differential Equation (SMNODE) model is a hybrid mechanistic / neural network-based approach, which handles a mix of longitudinal and static data, implicitly accounts for missing values and deals with right-censored clinical outcomes, such as survival. Comparison of SMNODEs against several competing methods demonstrated a good prediction performance (C-index ~0.75) for mortality prediction of pneumonia and mechanically ventilated patients. Using recent developments from the field of Explainable AI, we show which measurements might be most critical to watch within a clinical routine setting.

Virtual: Transformer-Based Risk Models for COVID-19 Disease Progression
COSI: TransMed
  • Manuel Lentzen, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Germany
  • Sai Pavan Kumar Veeranki, Steiermärkische Krankenanstaltengesellschaft m.b.H., Austria
  • Thomas Linden, University of Bonn, Fraunhofer Institute for Algorithms and Scientific Computing, Germany
  • Diether Kramer, Steiermärkische Krankenanstaltengesellschaft m.b.H. (KAGes), Austria
  • Werner Leodolter, Steiermärkische Krankenanstaltengesellschaft m.b.H. (KAGes), Austria
  • Holger Froehlich, Fraunhofer Institute for Algorithms and Scientific Computing, Germany


Presentation Overview:

Situations like the COVID-19 outbreak pose a tremendous challenge for healthcare systems since they can quickly collapse under the burden of the crisis. Personal risk models can help in these situations by accurately predicting disease progression, allowing for better public health measures in the future. Deep learning models, particularly transformer-based models, are promising for such applications, as they have recently been applied to structured EHR data for disease risk prediction. Therefore, we extended the recently published Med-BERT approach by adding information about medications, age, residency, and gender. Following pre-training on 3.8 million US patients in the IBM Explorys Therapeutic dataset, we used a subgroup of 101,046 COVID-19 patients to develop risk models for predicting acute respiratory manifestations and hospitalization. Comparisons with Random Forest and XGBoost revealed that transformer-based models predicted COVID-19 disease progression more accurately, with AUCs around 80% for both endpoints. Subsequently, we employed the integrated gradients approach combined with Bayesian networks to explain model predictions. Finally, we applied transfer learning to Austrian healthcare data and demonstrated the potential of transfer learning while also emphasizing practical limitations. In conclusion, this work illustrates the potential of transformer-based models for developing predictive models in precision medicine based on real-world data.

Virtual: Using Metabolomic and Transcriptomic profiles to Predict Development of Anti-IFNβ Antibodies in people with Relapsing Remitting Multiple Sclerosis
COSI: TransMed
  • Leda Coelewij, University College London, United Kingdom
  • Kirsty Waddington, University College London, United Kingdom
  • Marsilio Adriani, University College London, United Kingdom
  • Petra Nytrova, Charles University in Prague, Czechia
  • Eva Kubala Hardova, Charles University in Prague, Czechia
  • Anna Fogdell-Hahn, Karolinska Institutet, Karolinska University Hospital, Sweden
  • Pierre Dönnes, SciCross, Sweden
  • Rachel Farrell, University College London, United Kingdom
  • Inés Pineda-Torra, University College London, United Kingdom
  • Elizabeth Jury, University College London, United Kingdom


Presentation Overview:

Anti-drug antibodies (ADA) reduce treatment efficacy in multiple sclerosis (MS) patients. Predisposing factors for ADA are poorly characterised, creating an unmet need for biomarkers to predict ADA and subsequent treatment failure. A multiomic approach was used to predict ADA development in interferon-beta (IFNβ) treated MS patients.
Serum was collected from MS patients, through the ABIRISK consortium, before and 3 months after IFNβ treatment. ADA status was determined after 12 months on IFNβ. Supervised machine learning, including logistic regression and sPLS-DA, was used to predict ADA status using metabolomic data (n=82), while transcriptomic profiles were examined in matched patients (n=11).
Before treatment, machine learning models classified patients with an accuracy ≈84%, while lipid metabolism gene counts clustered patients by ADA status, suggesting disrupted lipid metabolism amongst ADA+ patients. Strong correlations between these gene counts and metabolite concentrations were observed. Furthermore, ADA+ patients had distinct responses to IFNβ at month 3, showing differentially regulated metabolites and a down regulation of interferon signalling genes previously associated with response to IFNβ. This could predict ADA early in the treatment course.
Metabolomics and transcriptomics are promising tools for prediction of ADA in IFNβ treated MS patients and could provide novel insight into mechanisms of immunogenicity.

Virtual: Using Ultrasound and Deep Learning for Respiratory Flow Quantification
COSI: TransMed
  • Lin Qi, University of Wisconsin-Madison, United States
  • Quinton Guerrero, University of Wisconsin-Madison, United States
  • Humberto Rosas, University of Wisconsin-Madison, United States
  • Guelay Bilen-Rosas, University of Wisconsin School of Medicine and Public Health, United States
  • Irene Ong, University of Wisconsin-Madison, United States


Presentation Overview:

Delayed recognition of respiratory deterioration in patients receiving sedative medication can lead to respiratory failure and delayed initiation of life-saving interventions, which can increase morbidity and mortality. Early recognition of respiratory deterioration is impeded by several factors: current monitoring devices (e.g., pulse oximetry) measure secondary downstream physiological changes only after respiratory deterioration has occurred, and the respiratory depressive effect of sedative medication is unpredictable and compounded by the patient’s co-morbidities. Here, we report the first use of ultrasound signaling at the air-tissue interface of the respiratory tract as a non-invasive monitoring method of airflow quantification during respiration. We aim to predict respiratory airflow from ultrasound data using bidirectional Long Short-Term Memory (BiLSTM) based models. Our current model, trained with data from simultaneous ultrasound and spirometer measurements of respiration, shows that Pulsed-wave Doppler signaling and B-mode motion measurements can quantify dynamic respiratory airflow velocities and respiration rate with a mean squared error of 0.04 compared to the spirometer’s airflow readings. This result indicates that ultrasound signaling coupled with a deep learning model has the potential to be utilized as a non-invasive, continuous respiratory monitor providing real-time, objective respiratory parameters for early detection of respiratory failure.

N-001: Predictive Model for Endometriosis with Clinical, Lifestyle and Genetic Information
COSI: TransMed
  • Michal Linial, The Hebrew University of Jerusalem, Israel
  • Ido Blass, The Hebrew University of Jerusalem, Israel
  • Nadav Rappoport, Ben-Gurion University of the Negev, Israel
  • Tali Sahar, McGill University Health Centre, Montreal, Canada
  • Adi Shribman, The Academic College of Tel Aviv-Yaffo, Israel


Presentation Overview:

Endometriosis is a disorder in which endometrial tissues are implanted outside of the uterus. Endometriosis affects 5–10% of all women of reproductive age yet is under-diagnosed. This research aims to develop an endometriosis prediction model using multiple inputs from the UK Biobank (UKBB). The data were split into women with a diagnosis of endometriosis (5,924; ICD-10: N80) and the rest (142,576). Over 1000 variables were used, including personal information regarding female health, lifestyle, self-reported data, genetic variants, and medical history prior to the endometriosis diagnosis. An endometriosis prediction model was developed using machine learning (ML) algorithms. CatBoost's gradient boosting methods produced the best prediction for the data-combined model, with an area under the ROC curve (ROC-AUC) of 0.78. We found that prior to being diagnosed with endometriosis, women had significantly more ICD-10 diagnoses than the average unaffected woman. Irritable bowel syndrome (IBS) and the length of the menstrual cycle were among the most informative variables ranked by SHAP values. Despite the restrictions of missing data and noisy medical input, we conclude that the UKBB's large population-based retrospective data is useful for the development of predictive models. The informative features extracted from the model may improve the clinical utility of endometriosis diagnosis.
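
As a reminder of what the reported ROC-AUC measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control. The function name and the example labels and scores are assumptions for illustration; with a case/control imbalance like the one above, this pairwise definition is unaffected by class proportions.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case score and a control score count as half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical labels (1 = diagnosed) and model scores for four women.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Here three of the four case-control pairs are ordered correctly, so the AUC is 0.75.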

N-002: Identification of chronic kidney disease mechanisms through integration of transcriptomics, proteomics and metabolomics data in the C-PROBE cohort
COSI: TransMed
  • Fadhl Alakwaa, University of Michigan, United States
  • Vivek Das, Novo Nordisk Research Center, United States
  • Damian Fermin, University of Michigan, United States
  • Viji Nair, University of Michigan, United States
  • Sean Eddy, University of Michigan, United States
  • Tim Slidel, AstraZeneca, United States
  • David Lopez, Gilead Pharmaceuticals, United States
  • Dermot Reilly, Johnson & Johnson, United States
  • Wenjun Ju, University of Michigan, United States
  • Eugene Myshkin, Johnson & Johnson, United States
  • Yu Chen, Lilly and Company, United States
  • Anil Karihaloo, Novo Nordisk Research Center, United States
  • Asim Dey, Lilly and Company, United States
  • Matthias Kretzler, University of Michigan, United States


Presentation Overview:

Motivation:
Multi-dimensional -omics data integration offers a tremendous opportunity to understand the complex pathophysiology of kidney disease.

Methods:
Transcriptomic, urine and plasma proteomic, and targeted urine metabolomic profiles of 37 chronic kidney disease (CKD) patients at baseline from the Clinical Phenotyping and Resource Biobank Core (C-PROBE) cohort, with a composite endpoint defined as end-stage kidney disease or a 40% decline in eGFR over 10 years, were generated and integrated using MOFA+ and DIABLO. Top-ranked features from both models were validated in an independent group of 96 patients from C-PROBE.

Results:
MOFA+ factor 2 explained the variability in all four omics data types, especially urine proteomics, and showed significant association (p<0.05) with the composite endpoint. DIABLO identified 105 analytes from all four omics platforms. Both methods identified unique as well as shared features and pathways, such as complement and coagulation cascades. The shared urine proteins showed significant association (p<0.05) with the composite endpoint and were validated in an independent sample in a survival model adjusted for sex, age, eGFR, and UACR.

Conclusion:
Despite the small cohort size, specific urine proteins that were significantly associated with CKD progression were identified. These progression-associated non-invasive urinary biomarkers may be useful for patient stratification in CKD.

N-003: Data-driven patient stratification and drug target discovery by detecting paired itemsets from medical information and omics data
COSI: TransMed
  • Sadao Kurohashi, Graduate School of Informatics, Kyoto University, Japan
  • Naonori Ueda, RIKEN Center for Advanced Intelligence Project, Japan
  • Atsushi Kumanogoh, Osaka University Graduate School of Medicine, Japan
  • Kenji Mizuguchi, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Yasushi Matsumura, Osaka National Hospital, Japan
  • Toshihiro Takeda, Osaka University Graduate School of Medicine, Japan
  • Chioko Nagao, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Takeshi Fujiwara, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Yosui Nojima, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Chihiro Higuchi, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Yi-An Chen, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Shoko Wakamiya, Graduate School of Science and Technology, Nara Institute of Science and Technology (NAIST), Japan
  • Eiji Aramaki, Graduate School of Science and Technology, Nara Institute of Science and Technology (NAIST), Japan
  • Shuntaro Yada, Graduate School of Science and Technology, Nara Institute of Science and Technology (NAIST), Japan
  • Ribeka Tanaka, Graduate School of Informatics, Kyoto University, Japan
  • Fei Cheng, Graduate School of Informatics, Kyoto University, Japan
  • Yayoi Natsume-Kitatani, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Takeshi Tomonaga, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Satoshi Muraoka, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Ryohei Narumi, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Jun Adachi, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Saori Amiya, Osaka University Graduate School of Medicine, Japan
  • Takatoshi Enomoto, Osaka University Graduate School of Medicine, Japan
  • Yuichi Adachi, Osaka University Graduate School of Medicine, Japan
  • Yoshimi Noda, Osaka University Graduate School of Medicine, Japan
  • Yuya Shirai, Osaka University Graduate School of Medicine, Japan
  • Takayuki Shiroyama, Osaka University Graduate School of Medicine, Japan
  • Kohtaro Miyake, Osaka University Graduate School of Medicine, Japan
  • Haruhiko Hirata, Osaka University Graduate School of Medicine, Japan
  • Masataka Kuroda, National Institutes of Biomedical Innovation, Health and Nutrition, Japan
  • Yoshito Takeda, Osaka University Graduate School of Medicine, Japan
  • Mari N. Itoh, National Institutes of Biomedical Innovation, Health and Nutrition, Japan


Presentation Overview:

One reason for the high failure rate in Phase II clinical trials is that drug efficacy confirmed in experimental animals is not observed in humans. Therefore, we aimed to search for drug targets based on human information from the early stage of drug development.

With this background, we developed a novel algorithm, subset-binding, for detecting patient stratification rules by linking medical information and omics data. Subset-binding is based on the fuzzy association rule mining technique, which takes paired matrices (e.g. a medical information matrix and an omics data matrix) as input data. After detecting frequent itemsets (groups of co-occurring items) in each matrix independently, subset-binding links the co-occurring frequent itemsets between the two input matrices.
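
The two-step idea described above (find frequent itemsets in each matrix, then link itemsets that co-occur in the same patients) can be sketched in Python as follows. This is a deliberately simplified illustration, not the subset-binding algorithm itself: it uses crisp (yes/no) items instead of fuzzy memberships, brute-force enumeration instead of an optimised miner, and a single support threshold. All function names, item labels, and thresholds are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Map each frequent itemset to the set of patient indices containing it."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            patients = frozenset(
                i for i, t in enumerate(transactions) if set(combo) <= t
            )
            if len(patients) / n >= min_support:
                result[combo] = patients
    return result


def bind_itemsets(clinical, omics, min_support, max_size=2):
    """Link frequent clinical and omics itemsets that co-occur in patients.

    Row i of both inputs must describe the same patient.
    """
    n = len(clinical)
    freq_clinical = frequent_itemsets(clinical, min_support, max_size)
    freq_omics = frequent_itemsets(omics, min_support, max_size)
    links = []
    for c_items, c_patients in freq_clinical.items():
        for o_items, o_patients in freq_omics.items():
            joint_support = len(c_patients & o_patients) / n
            if joint_support >= min_support:
                links.append((c_items, o_items, joint_support))
    return sorted(links, key=lambda link: -link[2])


# Hypothetical paired data for four patients (one set of items per patient).
clinical = [{"fibrosis", "cough"}, {"fibrosis", "cough"}, {"fibrosis"}, {"cough"}]
omics = [{"P1_high"}, {"P1_high", "P2_high"}, {"P1_high"}, {"P2_high"}]
links = bind_itemsets(clinical, omics, min_support=0.5)
```

In this toy input the strongest link pairs the clinical item `fibrosis` with the omics item `P1_high`, which co-occur in three of four patients; each link can be read as a candidate stratification rule.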

In this study, we collected medical information and proteomic data from patients with interstitial pneumonia, including idiopathic pulmonary fibrosis (IPF). Subset-binding analysis of collected medical information and proteome data revealed several proteins that can be linked to the characteristics of IPF, whose relevance was experimentally validated. This study demonstrated that data-driven linkage by subset-binding between phenotype-level data and biomolecule-level data of diseases can be used for drug target discovery through patient stratification.
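The linking step described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' implementation: items are crisp (present or absent) rather than fuzzy, frequent itemsets are found by brute force, and the function names, thresholds, and toy patient data are all hypothetical.

```python
from itertools import combinations

def frequent_itemsets(rows, min_support, max_size=2):
    """Map each frequent itemset to the set of patient indices that contain it."""
    items = sorted({item for row in rows for item in row})
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            patients = {i for i, row in enumerate(rows) if set(combo) <= row}
            if len(patients) / len(rows) >= min_support:
                result[frozenset(combo)] = patients
    return result

def link_itemsets(clinical_rows, omics_rows, min_support, min_joint_support):
    """Pair itemsets from the two matrices that co-occur in enough patients."""
    clinical = frequent_itemsets(clinical_rows, min_support)
    omics = frequent_itemsets(omics_rows, min_support)
    n = len(clinical_rows)
    links = []
    for c_set, c_patients in clinical.items():
        for o_set, o_patients in omics.items():
            joint = len(c_patients & o_patients) / n
            if joint >= min_joint_support:
                links.append((c_set, o_set, joint))
    return links

# Hypothetical toy data: one row per patient, same patient order in both lists.
clinical_rows = [{"fibrosis", "low_FVC"}, {"fibrosis", "low_FVC"}, {"fibrosis"}, {"smoker"}]
omics_rows = [{"protA_high"}, {"protA_high", "protB_high"}, {"protB_high"}, {"protB_high"}]

links = link_itemsets(clinical_rows, omics_rows, min_support=0.5, min_joint_support=0.5)
for c_set, o_set, joint in links:
    print(sorted(c_set), "<->", sorted(o_set), f"joint support={joint:.2f}")
```

Each printed link is a candidate stratification rule: a clinical pattern and a molecular pattern shared by the same subset of patients.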

N-004: Investigating immunity genes with associated phenotypes among high-risk COVID-19 population
COSI: TransMed
  • Zeeshan Ahmed, Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, United States
  • Eduard Renart, Rutgers Institute for Health, Health Care Policy and Aging Research, United States
  • Saman Zeeshan, Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, United States


Presentation Overview:

We performed a global analysis of genes targeting major components of the immune system to identify variants likely to be involved in COVID-19 predisposition. We performed gene-variant analysis on samples collected from diverse populations. In our analysis, ACE2, TMPRSS4, TMPRSS2, SLC6A20, and FYCO1 were found to have functional implications, and TMPRSS4 may have a role in the clinical manifestation of COVID-19 severity.

N-005: GVViZ: A physician-friendly platform enabling interactive gene-disease data annotation, expression analysis, and visualization for translational research
COSI: TransMed
  • Zeeshan Ahmed, Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, United States
  • Eduard Renart, Rutgers Institute for Health, Health Care Policy and Aging Research, United States
  • Saman Zeeshan, Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, United States


Presentation Overview:

We introduce GVViZ, a new, robust, and user-friendly platform for RNA-seq-driven gene-disease data annotation and expression analysis with dynamic heat map visualization. With successful deployment in clinical settings, GVViZ will enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data. GVViZ can assess genotype-phenotype associations among multiple complex diseases to find novel highly expressed genes. We have evaluated its clinical impact for different chronic diseases including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders.

N-006: Integration of Clinicopathological And Genomic Features To Predict The Risk Stratification of TCGA Lung Adenocarcinoma And Lung Squamous Cell Carcinoma Patients
COSI: TransMed
  • Mehmet Cihan Sakman, Mugla Sitki Kocman University, Turkey
  • Talip Zengin, Mugla Sitki Kocman University, Turkey
  • Tuğba Önal-Süzek, Mugla Sitki Kocman University, Turkey


Presentation Overview:

Background
Lung adenocarcinoma (LUAD) and Lung Squamous Cell Carcinoma (LUSC) patients’ risk group is currently determined via staging in the clinic. Although targeted therapies are emerging, the full molecular characterization of a patient’s tumor that an efficient targeted personalized treatment strategy requires remains infeasible. Here, we report a machine-learning-based risk stratification for LUAD and LUSC patients, using a few molecular targets and clinicopathological features.

Methods
We assessed five prognostic prediction models to rank the importance of 1026 patients’ clinicopathological and somatically mutated gene features.

Findings
Among the five learning models, the highest accuracy was achieved by logistic regression integrating the top 10 somatically mutated genes and the clinical features: 94% for LUAD and 91% for LUSC. Feature importance ranking revealed new prognostic genes such as KEAP1 for LUAD and CSMD3 for LUSC and new clinicopathological factors such as the site of resection.

Conclusions
Using our proposed feature set, clinicians and patients can assess a patient’s risk group from a few molecular and clinical parameters. In addition, we are implementing a web interface for clinicians to assess the risk stratification of individuals in a clinical setting.

N-007: ICU Survival Prediction Incorporating Test-Time Augmentation to Improve the Accuracy of Ensemble-Based Models
COSI: TransMed
  • Seffi Cohen, Ben-Gurion University of the Negev, Israel
  • Nurit Cohen-Inger, BeyondMinds, Israel
  • Noa Dagan, Clalit Research Institute, Israel
  • Dan Ofer, Hebrew University of Jerusalem, Israel
  • Lior Rokach, Ben-Gurion University of the Negev, Israel


Presentation Overview:

Severity evaluation is crucial in clinical settings for assessing patients' prognosis. Severity calculators are used to evaluate survival chances and to optimize patient treatments and resources, notably in Intensive Care Units (ICUs). In this work, we present a novel method for applying Test Time Augmentation (TTA) to tabular data. We used TTA along with an ensemble of 42 models to achieve superior performance on the MIT Global Open Source Severity of Illness Score (GOSSIS) initiative dataset of 131,051 ICU visits and outcomes. This method achieved an AUC of 0.915 on the private test set (19,669 admissions) and won first place at Stanford's WiDS Datathon 2020 challenge on Kaggle, while the widely used Acute Physiology and Chronic Health Evaluation (APACHE) IV model achieved an AUC of 0.868. In addition to improving predictions of patient risk, our method also reduces “unfair” bias.
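The abstract does not describe how test-time augmentation is applied to tabular rows, so the following is only a generic sketch of the idea: perturb each test row several times, predict on every copy, and average the predictions. The Gaussian-noise augmentation, the `toy_model` function, and all parameter values are assumptions made for illustration.

```python
import math
import random
from statistics import mean

def toy_model(row):
    """Hypothetical stand-in for a trained model: returns a risk in (0, 1)."""
    weights = [0.8, -0.5, 0.3]
    z = sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))

def predict_with_tta(predict, row, n_aug=20, noise_scale=0.05, seed=0):
    """Average the prediction on the original row and on n_aug noisy copies."""
    rng = random.Random(seed)
    preds = [predict(row)]
    for _ in range(n_aug):
        # Assumed augmentation: Gaussian noise proportional to each feature's magnitude.
        noisy = [x + rng.gauss(0.0, noise_scale * abs(x)) for x in row]
        preds.append(predict(noisy))
    return mean(preds)

patient = [1.2, 0.4, 2.0]  # hypothetical, already-scaled feature values
print(predict_with_tta(toy_model, patient))
```

With an ensemble, the same averaging could be applied to each member before the members are combined.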

N-008: 1H-NMR metabolomics-based models to impute common clinical variables and endpoints in epidemiological studies
COSI: TransMed
  • Daniele Bizzarri, Leiden University Medical Center, Netherlands
  • Marcel Reinders, TU Delft, Netherlands
  • Marian Beekman, Leiden University Medical Center, Netherlands
  • Anna Niehues, Radboud University Medical Centre, Netherlands
  • Peter-Bram Hoen, Radboud University Medical Centre, Netherlands
  • Eline Slagboom, Leiden University Medical Center, Netherlands
  • Erik van den Akker, Leiden University Medical Center, Netherlands


Presentation Overview:

The 1H-NMR metabolomics platform is rapidly gaining popularity in epidemiological research, as it provides a reproducible and cost-effective assessment of the blood metabolome. We will illustrate how we used 1H-NMR metabolomics data of a commercial platform to successfully predict 19 out of 20 routinely assessed clinical variables using a logistic ElasticNET. We will detail how these models were trained and evaluated within the 26 biobanks participating in BBMRI-nl (~26,000 samples). We will continue by showing that these surrogates can be used to impute missing phenotypic information in external cohorts. Moreover, we will demonstrate that these metabolic surrogates can be used as substitutes for partially or completely unobserved confounders in association studies (Metabolome- or Transcriptome-Wide Association studies) and show that the metabolic surrogates themselves can be used as novel biomarkers, by presenting significant associations with incident all-cause mortality in the elderly population. Finally, we will present our new R-shiny tool (MiMIR), which can compute new and previously published multivariate metabolomics models in other cohorts with 1H-NMR metabolomics, calibrate their predicted values using Platt’s method, and compare the uploaded Nightingale metabolomics quantifications to the metabolites’ distributions observed in BBMRI-nl.
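Platt's method, mentioned above for calibration, fits a sigmoid that maps raw model scores to probabilities. The sketch below is a minimal illustration in Python rather than the MiMIR implementation (which is an R tool): it fits the two sigmoid parameters by plain gradient descent on the log loss, uses unsmoothed 0/1 labels, and the toy scores and labels are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = sum((sigmoid(a * s + b) - y) * s for s, y in zip(scores, labels)) / n
        grad_b = sum((sigmoid(a * s + b) - y) for s, y in zip(scores, labels)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def calibrate(score, a, b):
    """Map a raw score to a calibrated probability."""
    return sigmoid(a * score + b)

# Hypothetical uncalibrated scores with observed binary outcomes.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
print(calibrate(-2.0, a, b), calibrate(2.0, a, b))
```

Platt's original formulation also smooths the 0/1 targets slightly to reduce overfitting; that refinement is omitted here for brevity.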

N-009: Predicting and characterizing a cancer dependency map of tumors with deep learning
COSI: TransMed
  • Yu-Chiao Chiu, UPMC Hillman Cancer Center, University of Pittsburgh, United States
  • Yufei Huang, UPMC Hillman Cancer Center, University of Pittsburgh, United States
  • Yidong Chen, Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, United States


Presentation Overview:

Genome-wide loss-of-function screens have revealed genes essential for cancer cell proliferation, called cancer dependencies. It remains challenging to link cancer dependencies to the molecular compositions of cancer cells, or to unscreened cell lines and further to tumors. Here we present DeepDEP, a deep learning model that predicts cancer dependencies using integrative genomic profiles. It employs a unique unsupervised pretraining that captures unlabeled tumor genomic representations to improve the learning of cancer dependencies. We demonstrated DeepDEP’s improvement over conventional machine learning methods and validated the performance with three independent datasets. By systematic model interpretations, we extended the current dependency maps with functional characterizations of dependencies and a proof-of-concept in silico assay of synthetic essentiality. We applied DeepDEP to pan-cancer tumor genomics and built the first pan-cancer synthetic dependency map of 8,000 tumors with clinical relevance. In summary, DeepDEP is a novel tool for investigating cancer dependency with rapidly growing genomic resources. This study was published in Science Advances in August 2021 (doi: 10.1126/sciadv.abh1275).

N-010: An in-silico approach to lead compound identification for Alzheimer’s Disease
COSI: TransMed
  • Smitha Sunil Kumaran Nair, Middle East College, Oman
  • Beema Shafreen Rajamohamed, Dr. Umayal Ramanathan College for Women, India
  • Saqar Said Nasser Al Maskari, Middle East College, Oman
  • Nallusamy Sivakumar, Sultan Qaboos University, Oman
  • Kiran Gopakumar Rajalekshmi, Middle East College, Oman
  • Adhraa Al Mawaali, Ministry of Health, Oman


Presentation Overview:

An in-silico study of multi-therapeutic agents against multiple therapeutic targets was carried out through docking to explore potential lead compounds for Alzheimer’s Disease (AD) clinical trials. Virtual screening was performed with four US FDA-approved control drugs (Donepezil, Galantamine, Rivastigmine, and Tacrine) for the mild, moderate, and severe stages of AD treatment. The panel of compounds identified through virtual screening was subjected to absorption, distribution, metabolism, excretion, and toxicity (ADMET) and pharmacokinetics (PK) evaluation. Compounds with good ADMET and PK scores were investigated further by molecular docking against the four therapeutic targets involved in AD. Ligands showing the highest binding affinity against the cholinesterases (AChE, BuChE), the NMDA receptor, and β-amyloid peptide (Aβ) were identified. The compounds Quinazolidinone analogue, 2b, Isoquinoline-pyridine, 1, Benzylmorphine and Coelenteramide were observed to be the best lead candidates, with the fewest side effects and better efficacy.

N-011: Use of machine learning to classify high-risk variants of uncertain significance in lamin A/C cardiac disease
COSI: TransMed
  • David Gordon, Institute for Genomic Medicine at Nationwide Children's Hospital, United States
  • Jeffrey Bennett, Center for Cardiovascular Research and Heart Center, Nationwide Children’s Hospital, United States
  • Uddalak Majumdar, Center for Cardiovascular Research and Heart Center, Nationwide Children’s Hospital, United States
  • Patrick Lawrence, Institute for Genomic Medicine at Nationwide Children's Hospital, United States
  • Adrianna Matos-Nieves, Center for Cardiovascular Research and Heart Center, Nationwide Children’s Hospital, United States
  • Katherine Myers, Center for Cardiovascular Research and Heart Center, Nationwide Children’s Hospital, United States
  • Anna Kamp, Center for Cardiovascular Research and Heart Center, Nationwide Children’s Hospital, United States
  • Julie Leonard, Center for Injury Research and Policy, Abigail Wexner Research Institute, Nationwide Children’s Hospital, United States
  • Kim McBride, Center for Cardiovascular Research and Heart Center, Nationwide Children’s Hospital, United States
  • Peter White, Institute for Genomic Medicine at Nationwide Children's Hospital, United States
  • Vidu Garg, Center for Cardiovascular Research and Heart Center, Nationwide Children’s Hospital, United States


Presentation Overview:

Variation in lamin A/C (LMNA) results in a spectrum of clinical disease, including arrhythmias and cardiomyopathy. Known benign variation is rare, and current in silico predictions have limited utility in driving ACMG classification of LMNA missense variants. Our study of a family with inherited conduction system disease revealed a novel segregating missense variant, p.Asp136Glu, initially reported as a VUS by a commercial testing company. Additional familial analysis and in vitro testing enabled classification of the variant as likely pathogenic per ACMG guidelines. However, extended familial analysis is not always feasible, leaving clinicians with little genetic guidance beyond the presence of a missense variant. This prompted the development of an ML algorithm to aid clinical interpretation of LMNA missense variants. While insufficient known benign variation exists to create an ML classifier, unsupervised clustering of previously observed variants in gnomAD and ClinVar using UMAP and K-means identified three clusters with significantly different proportions of reported pathogenic/likely pathogenic variants (38.8%, 15.0%, and 6.1%). We anticipate that these findings can be translated to clinical use by guiding the treatment of patients with a VUS present in a cluster enriched for pathogenicity and may prove useful in other genes where classification is difficult.

N-012: Learning a single-cell map of Acute Myeloid Leukaemia with auto-encoders
COSI: TransMed
  • Alice Driessen, IBM Research, Switzerland
  • Susanne Unger, University of Zurich, Switzerland
  • An-Phi Nguyen, IBM Research, Switzerland
  • Marianna Rapsomaniki, IBM Research, Switzerland
  • Burkhard Becher, University of Zürich, Switzerland
  • Maria Rodriguez Martinez, IBM, Zurich Research Laboratory, Switzerland


Presentation Overview:

Acute myeloid leukaemia (AML) is a haematological cancer in the bone marrow, with accumulation and expansion of immature cells of the myeloid lineage. Standard treatment of AML is chemotherapy, which does not achieve durable remission in most patients. Personalised medicine, including immunotherapies, has the potential to target chemotherapy-resistant cells and achieve long-term remission. Identifying suitable targets for AML therapy is hampered by the heterogeneity and complex clonal composition of the cancer, as well as its complex evolution as the disease progresses. We aim to build a single-cell cytometry AML map to identify malignant cells and place them along the developmental trajectory using data from 20 patients and three time points over the course of the disease. We train a variational auto-encoder structure on healthy cells, which learns cellular reconstruction as well as cell type classification. The latent space of the auto-encoder provides a meaningful representation of the healthy bone marrow cells to which we can map new cells. We use the trajectory assignment to segment patients into groups as well as to study the time evolution of the disease in terms of the distribution of malignant cells across the myeloid lineage.

N-013: Identification and validation of functional co-expression modules in frontotemporal dementia
COSI: TransMed
  • Juami van Gils, Vrije Universiteit Amsterdam, Netherlands
  • Claire Bridel, Amsterdam UMC, Netherlands
  • Suzanne Miedema, Vrije Universiteit Amsterdam, Netherlands
  • Jeroen Hoozemans, Amsterdam UMC, Netherlands
  • Yolande Pijnenburg, Amsterdam UMC, Netherlands
  • Augustus Smit, Vrije Universiteit Amsterdam, Netherlands
  • Annemieke Rozemüller, Amsterdam UMC, Netherlands
  • Sanne Abeln, Vrije Universiteit Amsterdam, Netherlands
  • Charlotte Teunissen, Amsterdam UMC, Netherlands


Presentation Overview:

Frontotemporal dementia (FTD) is characterized pathologically by neurodegeneration predominating in the frontal and temporal lobes (frontotemporal lobar degeneration, FTLD). Two major non-overlapping pathological subtypes of FTLD have been described: FTLD-tau and FTLD-TDP. To identify proteomic changes in the frontal lobe cortex associated with FTLD-tau and/or FTLD-TDP, and to uncover dysregulated biological processes in the two most common pathological variants of FTLD, we used quantitative label-free LC-MS and developed a workflow that combines existing clustering and network-based tools to identify coexpressed protein groups, together with a new method to validate these modules. Current methods (e.g. GSEA) often require functional annotations to identify functional groups, which biases them towards pre-existing knowledge. We combined weighted co-expression network analysis on the protein expression levels with Hierarchical HotNet to identify functional groups of coexpressed proteins without pre-emptively requiring functional annotation of the proteins. We confirmed the identified networks using our novel validation method on two validation data sets, originating from medial frontal and temporal cortex samples of individuals with FTLD-tau and NHC. Taking these samples of FTD subtypes as a use case, we show that our workflow can extract meaningful biological insight from protein expression data.

N-014: PRState: Incorporating Genetic Ancestry in Prostate Cancer Risk Scores for African American Men
COSI: TransMed
  • Meghana Pagadala, UCSD, United States
  • Joshua Linscott, Maine Medical Center, United States
  • James Talwar, UCSD, United States
  • Tyler Seibert, UCSD, United States
  • Brent Rose, UCSD, United States
  • Julie Lynch, VA Salt Lake City Healthcare System, United States
  • Matthew Panizzon, UCSD, United States
  • Richard Hauger, UCSD, United States
  • Moritz Hansen, Maine Medical Center, United States
  • Jesse Sammon, Maine Medical Center, United States
  • Matthew Hayn, Maine Medical Center, United States
  • Karim Kader, UCSD, United States
  • Hannah Carter, UCSD, United States
  • Stephen Ryan, Maine Medical Center, United States


Presentation Overview:

Prostate cancer (PrCa) is one of the most genetically driven solid cancers, with heritability estimates as high as 57%. African American men are at an increased risk of PrCa; however, current risk prediction models are based on European ancestry groups and may not be broadly applicable. In this study, we define an African ancestry group of 4,533 individuals to develop an African ancestry-specific PrCa polygenic risk score (PRState). We identified risk loci on chromosomes 3, 8, and 11 in the African ancestry group GWAS and constructed a polygenic risk score (PRS) from 10 African ancestry-specific PrCa risk SNPs, achieving an AUC of 0.61 [0.60-0.63] alone and 0.65 [0.64-0.67] when combined with age and family history. Performance dropped significantly when using ancestry-mismatched PRS models but remained comparable when using trans-ancestry models. Importantly, we validated the PRState score in the Million Veteran Program, demonstrating improved prediction of PrCa and metastatic PrCa in African American individuals. This study underscores the need for inclusion of individuals of African ancestry in gene variant discovery to optimize PRS.
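For readers unfamiliar with the term, a polygenic risk score is conventionally a weighted sum of risk-allele dosages. The sketch below shows that standard arithmetic only; the SNP identifiers, effect weights, and genotype are hypothetical and are not the PRState SNPs or weights.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2) over the scored SNPs.

    Assumption: a SNP missing from `dosages` is treated as dosage 0.
    """
    return sum(weights[snp] * dosages.get(snp, 0) for snp in weights)

# Hypothetical per-SNP effect weights (for example, GWAS log odds ratios).
weights = {"snp_1": 0.2, "snp_2": 0.5, "snp_3": -0.1}

# Hypothetical genotype of one individual, as risk-allele counts.
person = {"snp_1": 2, "snp_2": 1, "snp_3": 0}

score = polygenic_risk_score(person, weights)
print(score)
```

Combining such a score with covariates such as age and family history would typically be done in a separate regression model, which is not shown here.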

N-015: Tree-based integration and visualization of single-cell data from multiple patients at an early stage of AML treatment
COSI: TransMed
  • Raul Mendez-Giraldez, National Institute of Environmental Health Sciences (NIEHS/NIH), United States
  • Benedicte Tislevoll, University of Bergen, Norway
  • Monica Hellesøy, Haukeland University Hospital, Norway
  • Björn Tore Gjertsen, University of Bergen, Norway
  • Nello Blaser, University of Bergen, Norway
  • Benedict Anchang, National Institute of Environmental Health Sciences (NIEHS/NIH), National Cancer Institute (NCI/NIH), United States


Presentation Overview:

The prognosis of most patients with acute myeloid leukemia (AML) is generally poor due to frequent relapse, which is thought to arise from the persistence of leukemia-initiating stem cells following treatment. Early changes suggest a potential establishment of chemotherapy resistance that can mediate long-term survival. The simultaneous analysis of millions of cells by cytometry by time-of-flight (CyTOF) allows the study of the phenotypic complexity of AML cells.
CyTOF data corresponding to 21 surface and 15 intracellular markers from 32 AML patients was generated, collecting peripheral blood samples at 4 and 24 hours after the start of standard “7+3” induction therapy. Peripheral blood and bone marrow from healthy donors were also analyzed.
Our recently developed Dynamic Spanning Forest Mixtures (DSFMix) algorithm leverages decision trees to select markers that account for variations in multimodality, skewness, and time. It uses these markers to connect all the cells into a large tree, which is then split into a forest using hierarchical tree clustering plus automated dynamic branch cutting.
DSFMix captures the appearance of cancer stem cells and myeloid phenotype over time, upon treatment, and can scale up to analyze the combined cells of multiple patients across multiple conditions, which makes it unique for clinical applications.

N-016: Knowledge-guided deep learning models of drug toxicity improve interpretation
COSI: TransMed
  • Yun Hao, University of Pennsylvania, United States
  • Joseph Romano, University of Pennsylvania, United States
  • Jason Moore, Cedars-Sinai Medical Center, United States


Presentation Overview:

A major reason for attrition in drug development is the lack of understanding of cellular mechanisms governing drug toxicity. The black-box nature of conventional classification models has limited their utility in identifying toxicity pathways. Here we developed DTox (Deep learning for Toxicology), an interpretation framework for knowledge-guided neural networks, which can predict compound response to toxicity assays and infer toxicity pathways of individual compounds. Our framework enables systematic explanation of model predictions, which is missing in the previous development of knowledge-guided neural networks. We demonstrate that DTox can achieve the same predictive performance as conventional models with a significant improvement in interpretability. For instance, DTox is able to rediscover mechanisms of transcription activation by three nuclear receptors and differentiate distinctive mechanisms leading to HepG2 cytotoxicity. DTox can also recapitulate cellular activities induced by aromatase inhibitors and pregnane X receptor agonists, implying its potential to generate testable hypotheses for further investigation. Enrichment analysis based on DTox results links hepatic adverse events to known cellular mechanisms of drug-induced liver injury. Virtual screening by DTox reveals that compounds with predicted cytotoxicity are at higher risk for hepatic adverse events. In summary, DTox provides a framework for deciphering cellular mechanisms of toxicity in silico.

N-017: Direction-aware data fusion techniques for multi-omics pathway enrichment analysis and biomarker discovery
COSI: TransMed
  • Mykhaylo Slobodyanyuk, University of Toronto, Canada
  • Jüri Reimand, University of Toronto, Canada


Presentation Overview:

Different omics techniques allow us to characterise the genetic, transcriptomic, epigenomic and proteomic landscapes of cells and tissues, and better understand their perturbations in disease. However, joint analyses of different omics datasets for a holistic understanding of cell function present a computational challenge. We recently developed ActivePathways, an integrative pathway enrichment analysis method that uses data fusion to merge signals from multiple omics datasets, prioritizes genes and pathways through p-value merging, and evaluates their contribution from individual input datasets. Here we extend this computational framework to account for directional activities of genes and proteins across the input omics datasets. For example, fold-change in protein expression would be expected to associate positively with mRNA change of the corresponding gene, while DNA methylation change of the gene promoter would be expected to associate negatively. We extend our method to encode such directional interactions and penalize genes and proteins where such assumptions are violated. We demonstrate the approach by integrating cancer RNA-seq, DNA methylation, and proteomics datasets in the CPTAC and TCGA projects, in which we uncover novel candidate biomarkers and pathways that have been previously overlooked in the analysis of individual datasets.
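The idea of penalizing genes whose omics signals conflict with the expected direction can be sketched as a direction-aware variant of Fisher's p-value merging. This is an illustrative assumption, not necessarily the formulation used in ActivePathways: each log p-value is signed by whether the observed direction matches the expected one, so conflicting evidence cancels, and the chi-square reference distribution is borrowed from the undirected Fisher method as an approximation. All function names and inputs are hypothetical.

```python
import math

def chi2_sf_even_df(x, k):
    """Survival function of a chi-square with 2*k degrees of freedom (closed form)."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

def directional_fisher(p_values, observed_dirs, expected_dirs):
    """Merge p-values so that evidence agreeing with expectations adds up.

    observed_dirs: +1 or -1, the sign of the change seen in each dataset.
    expected_dirs: +1 or -1, the assumed relationship among the datasets.
    """
    signed = sum(
        o * e * -math.log(p)
        for p, o, e in zip(p_values, observed_dirs, expected_dirs)
    )
    stat = 2.0 * abs(signed)
    return chi2_sf_even_df(stat, len(p_values))

# One gene, two datasets: mRNA change and promoter methylation change.
# Assumed constraint: methylation associates negatively with expression.
expected = [+1, -1]

agree = directional_fisher([0.01, 0.01], [+1, -1], expected)     # mRNA up, methylation down
conflict = directional_fisher([0.01, 0.01], [+1, +1], expected)  # both up: violates constraint
print(agree, conflict)
```

In this toy case the consistent gene keeps a small merged p-value, while the conflicting gene is penalized all the way to a merged p-value of 1.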

N-018: Functional characterization of disease embeddings learned from massive EHR data
COSI: TransMed
  • Andrej Bugrim, Silver Beach Analytics, Inc., United States


Presentation Overview:

The recently developed cui2vec resource provides embeddings of approximately 109 thousand biomedical terms learned from the analysis of over 60 million Electronic Health Records and full-text papers. We investigate whether these data could be used to identify functional relations among diseases and to predict novel gene-disease associations. We combine disease embeddings with gene-disease associations obtained from the DisGeNET database and generate joint gene-disease embeddings for 3568 diseases and 8686 genes. We identify groups of similar diseases and show that they are well aligned with disease categories from MeSH. We also identify the molecular pathways most characteristic of each disease group and show that they are highly relevant to known disease mechanisms. Finally, we use similarity between gene and disease embeddings to predict novel gene-disease associations and demonstrate that this method can generate highly accurate predictions. Importantly, our analysis can be used to suggest mechanisms for conditions that have not been functionally understood. In this respect, it can identify potential markers and drug targets for poorly characterized orphan and rare diseases based on the similarity of their embeddings with genes and better characterized conditions. It can also reveal unexpected novel connections among diseases and between diseases and molecular pathways.
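Ranking candidate genes by the similarity of their embeddings to a disease embedding can be sketched as follows. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the three-dimensional vectors and gene names are hypothetical toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_genes(disease_vec, gene_vecs, known=()):
    """Rank genes not already associated with the disease, most similar first."""
    scores = {
        gene: cosine(disease_vec, vec)
        for gene, vec in gene_vecs.items()
        if gene not in known
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical joint embedding space shared by diseases and genes.
disease = [1.0, 0.0, 1.0]
genes = {
    "GENE_A": [0.9, 0.1, 1.1],
    "GENE_B": [-1.0, 0.5, 0.0],
    "GENE_C": [1.0, 0.0, 1.0],  # already a known association, so excluded
}

ranked = rank_genes(disease, genes, known={"GENE_C"})
print(ranked)
```

The top-ranked genes that are not yet linked to the disease would be the candidate novel associations.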

N-019: Human genetic diversity alters therapeutic gene editing off-target outcomes
COSI: TransMed
  • Samuele Cancellieri, University of Verona, Italy
  • Jing Zeng, Boston Children's Hospital, United States
  • Linda Lin, Boston Children's Hospital, United States
  • Manuel Tognon, University of Verona, Italy
  • My Anh Nguyen, Boston Children's Hospital, United States
  • Jiecong Lin, Massachusetts General Hospital, United States
  • Nicola Bombieri, University of Verona, Italy
  • Stacy Maitland, University of Massachusetts Medical School, United States
  • Marioara-Felicia Ciuculescu, Boston Children’s Hospital, United States
  • Varun Katta, St. Jude Children's Research Hospital, United States
  • Shengdar Tsai, St. Jude Children's Research Hospital, United States
  • Myriam Armant, Boston Children’s Hospital, United States
  • Scot Wolfe, University of Massachusetts Medical School, United States
  • Rosalba Giugno, University of Verona, Italy
  • Daniel Bauer, Boston Children's Hospital, United States
  • Luca Pinello, Massachusetts General Hospital, United States


Presentation Overview:

CRISPR gene editing holds great promise to modify somatic genomes to ameliorate disease. In silico prediction of homologous sites coupled with biochemical evaluation of possible genomic off-targets may predict genotoxicity risk of individual gene editing reagents. However, standard computational and biochemical methods focus on reference genomes and do not consider the impact of genetic diversity on off-target potential. Here we developed a web application called CRISPRme that explicitly integrates human genetic variants to nominate and prioritize off-target sites at scale. The method considers both single-nucleotide variants (SNVs) and indels, accounts for bona fide haplotypes and is suitable for personal genome analyses. We tested CRISPRme with a gRNA targeting the BCL11A erythroid enhancer that has shown therapeutic promise in clinical trials for sickle cell disease (SCD) and β-thalassemia. We find that the top candidate off-target is produced by a non-reference allele common in African-ancestry populations. We validate that SpCas9 generates indels and chr2 pericentric inversions in a strictly allele-specific manner in edited hematopoietic stem/progenitor cells. CRISPRme highlights alternative allele-specific off-target editing as a prevalent risk of gRNAs considered for therapeutic gene editing. Our report illustrates how population and private genetic variants should be considered as modifiers of genome editing outcomes.

N-020: Genome-Derived Diagnosis: Deep Learning Model for Tumor Type Prediction using MSK-IMPACT data
COSI: TransMed
  • Madison Darmofal, Memorial Sloan Kettering, Weill Cornell Graduate School, United States
  • Quaid Morris, Sloan Kettering Institute, United States
  • Michael Berger, Memorial Sloan Kettering, United States


Presentation Overview:

Knowledge of a patient’s tumor type is essential for guiding clinical treatment decisions in cancer, but histologic-based diagnosis remains challenging. Genomic alterations are highly indicative of tumor type and can be used to build classifiers that predict diagnoses, but most genomic-based classification methods use WGS data, which is not feasible for widespread clinical implementation at present. MSK-IMPACT is an FDA-approved clinical sequencing fixed-panel assay that reports genomic alterations including mutations, indels and copy number alterations across 468 cancer-associated genes, and has sequenced over 65,000 Memorial Sloan Kettering patients to date. We use genomic features from this large dataset to develop Deep Genome-Derived-Diagnoses (GDD-NN): a deep-ensemble tumor type classifier. GDD-NN achieves 78.6% accuracy across 40 common cancer types, outperforming similar models. For MSK-IMPACT patients with rarer cancers, we implement out-of-distribution (OOD) detection using ensemble-based features, which classifies OOD samples (AUC = 0.94) without explicitly training on them. For patients where non-genomic information might inform predictions, we implement a prediction-specific adaptive prior and report improved accuracy after adjusting predictions given sample biopsy site. Overall, integrating GDD-NN into the well-established MSK-IMPACT pipeline will enable clinically relevant tumor type predictions that can guide treatment decisions in real time at an institutional level.
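One common way to derive an out-of-distribution signal from an ensemble is to average the members' class probabilities and measure the entropy of the result, since members tend to disagree on unfamiliar inputs. The sketch below illustrates that general idea only; the abstract does not specify which ensemble-based features GDD-NN uses, and the probabilities and threshold here are hypothetical.

```python
import math

def mean_probs(ensemble_probs):
    """Average class probabilities across ensemble members."""
    n = len(ensemble_probs)
    return [sum(col) / n for col in zip(*ensemble_probs)]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def is_ood(ensemble_probs, threshold):
    """Flag a sample when the averaged prediction is too uncertain."""
    return entropy(mean_probs(ensemble_probs)) > threshold

# Three hypothetical ensemble members, three tumor-type classes.
in_distribution = [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.92, 0.04, 0.04]]
unfamiliar = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

print(is_ood(in_distribution, threshold=0.8), is_ood(unfamiliar, threshold=0.8))
```

In practice the threshold would be tuned on held-out data, and several such uncertainty features could be fed to a separate OOD classifier.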

N-021: A Computer Program to Identify Time-series Gene Expression Datasets in PubMed
COSI: TransMed
  • Hoang Le Tran, Grand Valley State University, United States
  • Varsha Penumalee, Troy High School, United States
  • Guenter Tusch, Grand Valley State University, United States


Presentation Overview:

Purpose: Transcriptome meta-analysis can help to identify temporal patterns in publicly available data; however, temporal studies need to be indexed appropriately to avoid too many false positives. We conducted a case study to identify possible solutions.

Procedures: In a previous study, we started with key regulatory pathways and manually searched in PubMed to identify papers with those pathways. We used a query refinement method developed by our group for time-series expression profiles and randomly selected 40% of the papers out of 1226 original research results based on MeSH terms, abstracts, supplementary tables, and the articles themselves, which resulted in 68 distinct papers. 92% of these papers could be identified with abstracts and tables alone. We created a prototype automating this process by using R/Bioconductor packages to identify articles by MeSH terms and the pubtator function to extract genes from abstracts and tables to match with KEGG pathways.

Results: Our computer program identified a similar percentage of the 68 papers as the manual process.

Impact: This shows that text mining of abstracts and supplementary tables, combined with MeSH term searches, can help identify genes, signaling pathways and biomarkers, as well as novel pathways and genes, with sufficient accuracy.
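
The matching step in the Procedures paragraph (genes extracted from abstracts and tables, compared against pathway gene sets) could look roughly like the sketch below. The authors' prototype uses R/Bioconductor; this Python version is only an assumed analogue, and the function name, the toy pathway sets and the example gene list are hypothetical stand-ins for real KEGG and PubTator output.

```python
def rank_pathways(extracted_genes, pathways):
    """Rank pathways by the fraction of their genes mentioned in an article."""
    found = {g.upper() for g in extracted_genes}
    scores = []
    for name, genes in pathways.items():
        members = {g.upper() for g in genes}
        overlap = found & members
        scores.append((name, len(overlap) / len(members), sorted(overlap)))
    # Highest coverage first.
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical gene sets; real ones would come from KEGG.
pathways = {
    "toy_mapk": ["MAPK1", "MAPK3", "RAF1", "KRAS"],
    "toy_p53": ["TP53", "MDM2", "CDKN1A"],
}

# Hypothetical genes extracted from one abstract and its tables.
article_genes = ["tp53", "MDM2", "KRAS"]
ranking = rank_pathways(article_genes, pathways)
```

Gene symbols are upper-cased before comparison because extracted mentions are often inconsistently capitalised; a real pipeline would also need synonym and species resolution.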

N-022: A network-based approach to identify expression modules underlying rejection in pediatric liver transplantation
COSI: TransMed
  • Mylarappa Ningappa, University of Pittsburgh, United States
  • Syed A Rahman, University of Pittsburgh, United States
  • Brandon Higgs, University of Pittsburgh, United States
  • Chethan S Ashokkumar, University of Pittsburgh, United States
  • Nidhi Sahni, MD Anderson Cancer Center, United States
  • Rakesh Sindhi, University of Pittsburgh, United States
  • Jishnu Das, University of Pittsburgh, United States


Presentation Overview: Show

Selecting the right immunosuppressant to ensure rejection-free outcomes poses unique challenges in pediatric liver transplant (LT) recipients. A molecular predictor can comprehensively address these challenges. Currently, there are no well-validated blood-based biomarkers for pediatric LT recipients either pre- or post-LT. Here, we discover and validate separate pre- and post-LT transcriptomic signatures of rejection. Using an integrative machine learning approach, we combine transcriptomic data with the reference high-quality human protein interactome to identify network module signatures that underlie rejection. Unlike gene signatures, our approach is inherently multivariate, is more robust to replication, and captures the structure of the underlying network, encapsulating additive effects. We also identify, in a patient-specific manner, signatures that can be targeted by current anti-rejection drugs and by other drugs that can be repurposed. Overall, our approach can enable personalized adjustment of drug regimens for the dominant targetable pathways pre- and post-LT in children.

N-023: Age- and sex-specific gene signatures and networks
COSI: TransMed
  • Kayla Johnson, Michigan State University, United States
  • Arjun Krishnan, Michigan State University, United States


Presentation Overview: Show

Age and sex are historically understudied factors in biomedical studies. However, many complex traits and diseases vary in their incidence and presentation in people of different ages and sexes, illustrating the need for computational frameworks that can aid scientists in systematically bridging gaps in understanding age- and sex-specific biological context. Addressing this need is complicated by the fact that the vast majority of publicly available gene expression profiles do not have age and sex labels. Therefore, we first curated ~30,000 samples associated with age and sex information in order to train models to predict these variables from gene expression values. By interpreting the weights of these predictive models, we also derive gene signatures characteristic of different age groups in males and females. These models will be expanded to predict age and sex labels for a large set of publicly available data, which will then be selectively integrated to build genome-scale interaction networks that are specific to multiple age and sex groups. The specificity of the gene signatures and networks is tested on their ability to recapitulate gene associations from prior knowledge. Together, these will provide powerful tools to aid scientists in studying age- and sex-specific health and disease processes.

N-024: A network-based approach for isolating the chronic inflammation gene signatures underlying complex diseases towards finding new treatment opportunities
COSI: TransMed
  • Stephanie Hickey, Michigan State University, United States
  • Alexander McKim, Michigan State University, United States
  • Christopher Mancuso, Michigan State University, United States
  • Arjun Krishnan, Michigan State University, United States


Presentation Overview: Show

Complex diseases are associated with a wide range of cellular, physiological, and clinical phenotypes. To advance our understanding of disease mechanisms and our ability to treat these diseases, it is critical to delineate the molecular basis and therapeutic avenues of specific disease phenotypes, especially those that are associated with multiple diseases. Inflammatory processes constitute one such prominent phenotype, being involved in a wide range of health problems including ischemic heart disease, stroke, cancer, diabetes mellitus, chronic kidney disease, non-alcoholic fatty liver disease, and autoimmune and neurodegenerative conditions. While hundreds of genes might play a role in the etiology of each of these diseases, isolating the genes involved in the specific phenotype (e.g. inflammation “component”) could help us understand the genes and pathways underlying this phenotype across diseases and predict potential drugs to target the phenotype. Here, we present a computational approach that integrates gene interaction networks, disease-/trait-gene associations, and drug-target information to accomplish this goal. We apply this approach to isolate gene signatures of complex diseases that correspond to chronic inflammation and prioritize drugs to reveal new therapeutic opportunities.

N-025: Aurora – Convolutional Neural Network for Breast Cancer Diagnosis Prediction
COSI: TransMed
  • Luiz Santos, Universidade Estadual do Sudoeste da Bahia, Brazil
  • Wagner Soares, Universidade Estadual do Sudoeste da Bahia, Brazil


Presentation Overview: Show

The use of artificial intelligence (AI) as a tool of “Health 4.0” has been changing how clinical data are monitored in health care. Improved diagnosis of chronic non-communicable diseases (NCDs), such as cancer, can minimize socioeconomic losses, with a direct impact on public and private health systems. The predictive convolutional neural network we created was able to classify breast tumors as malignant or benign. The neural architecture was based on backpropagation and on ReLU and sigmoid activation functions. We analyzed breast aspirate data from 569 patients, using 75% of the sample for training and 25% for testing. The RMSprop algorithm was implemented. The data were extracted from the public dataset “Breast Cancer Wisconsin (Diagnostic) Data Set”, from the University of Wisconsin. After multiparametric analysis and network calibration over 20 training generations, the system achieved a 98% success rate in classifying mammary tumors as benign or malignant for the inputs received, with no significant improvement in accuracy when the number of generations was increased beyond this threshold. AI has become an extremely valuable tool in the field of biomedical sciences, leading to improvements in the accuracy of medical diagnosis and in quality of life for patients.
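
To make the RMSprop update mentioned above concrete, here is a minimal sketch that fits a single sigmoid unit with full-batch RMSprop on a binary cross-entropy loss. It is deliberately much simpler than the network in the abstract (no convolutional or ReLU layers), and the toy data, learning rate, decay constant and function names are assumptions rather than the authors' settings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_rmsprop(X, y, epochs=20, lr=0.05, decay=0.9, eps=1e-8):
    """Fit a single sigmoid unit with full-batch RMSprop on binary cross-entropy."""
    n_features = len(X[0])
    params = [0.0] * (n_features + 1)   # weights followed by the bias
    sq_avg = [0.0] * (n_features + 1)   # running average of squared gradients
    for _ in range(epochs):
        grads = [0.0] * (n_features + 1)
        for row, target in zip(X, y):
            z = sum(w * x for w, x in zip(params, row)) + params[-1]
            err = sigmoid(z) - target   # gradient of the loss with respect to z
            for j, x in enumerate(row):
                grads[j] += err * x / len(X)
            grads[-1] += err / len(X)
        for j, g in enumerate(grads):
            # RMSprop: scale each step by the root of the running squared gradient.
            sq_avg[j] = decay * sq_avg[j] + (1 - decay) * g * g
            params[j] -= lr * g / (math.sqrt(sq_avg[j]) + eps)
    return params

def predict(params, row):
    z = sum(w * x for w, x in zip(params, row)) + params[-1]
    return 1 if sigmoid(z) >= 0.5 else 0

# Toy, linearly separable data (hypothetical; not the Wisconsin dataset).
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
params = train_rmsprop(X, y, epochs=200)
accuracy = sum(predict(params, r) == t for r, t in zip(X, y)) / len(X)
```

Dividing by the running root-mean-square gradient gives each parameter its own effective step size, which is the property that distinguishes RMSprop from plain gradient descent.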

N-026: A Mobile Application For Early Prediction of Skeletal Class III Malocclusion from Profile Photos Integrating Deep Learning and Machine Learning Models
COSI: TransMed
  • Selahattin Aksoy, Mugla Sıtkı Koçman University, Turkey
  • Banu Kılıç, Bezmialem Vakıf University, Turkey
  • Tuğba Önal-Süzek, Mugla Sitki Kocman University, Turkey


Presentation Overview: Show

Among skeletal deformities, early detection of Class III malocclusion is unique, as the best treatment period is the pre-adolescent growth period. Class III malocclusion is complicated to treat with braces, frequently requiring surgical intervention after the pubertal growth spurt. Delayed recognition of Class III malocclusion yields significant functional, aesthetic, and psychological concerns.

In this study, we implemented a mobile application following a comparative analysis of three approaches to predicting Class III malocclusion from profile images: deep learning, machine learning, and a rule-based algorithm. For this analysis, we collected a novel profile image dataset of 606 orthodontic patients from the Bezmialem Vakıf University Orthodontics Department. The first novelty of our study is that a mobile application integrating a machine learning model for automated diagnosis of Class III malocclusion from profile images has not been investigated before. Second, our model has the potential to be extended to other ethnicities and skeletal syndromes without the need for any expert or additional apparatus. In validation, the initial model achieved 77% accuracy in 10-fold cross-validation, 79% accuracy on the test data, and 77% accuracy on the training data.

As there is no machine learning-based orthodontics mobile application on the market worldwide, our implementation is the first mobile application for the early diagnosis of craniofacial abnormalities.
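
The 10-fold cross-validation reported above can be sketched as follows. This is a generic illustration, not the authors' evaluation code: the helper names, the fixed random seed and the majority-class stand-in model are assumptions, and only the cohort size of 606 is taken from the abstract.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_validate(fit, predict, X, y, k=10):
    """Return the mean held-out accuracy over k folds."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Hypothetical stand-in model: always predict the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model

# 606 matches the cohort size in the abstract; the split itself is illustrative.
splits = list(k_fold_indices(606, k=10))
```

Any classifier exposing a fit and a predict function can be passed to `cross_validate` in place of the majority-class baseline.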

N-027: Joint genetics and transcriptomics based classification of acute myeloid leukaemia patients
COSI: TransMed
  • Jeppe Severens, Leiden University Medical Center (LUMC)/Technical University Delft (TU Delft), Netherlands
  • Onur Karakaslar, Leiden University Medical Center (LUMC)/Technical University Delft (TU Delft), Netherlands
  • Elena Sanchez-Lopez, Leiden University Medical Center (LUMC), Netherlands
  • Hendrik Veelken, Leiden University Medical Center (LUMC), Netherlands
  • Marcel Reinders, Leiden University Medical Center (LUMC)/Technical University Delft (TU Delft), Netherlands
  • Marieke Griffioen, Leiden University Medical Center (LUMC), Netherlands
  • Erik van den Akker, Leiden University Medical Center (LUMC)/Technical University Delft (TU Delft), Netherlands


Presentation Overview: Show

Introduction: Acute myeloid leukemia (AML) can be classified using recurrent genetic abnormalities (RGAs). RGAs are only identified in half of the AML patients, hindering clinical decision-making. Literature has shown the merits of transcriptomic data for the subclassification of AML. We thus use gene-expression to predict established RGAs and to further stratify AML.

Method: We collected RNAseq data from five studies (n = 1337) to build prediction models for RGAs. In parallel, we performed unsupervised analysis using batch-effect removal and clustering. Identified patient clusters were correlated with aberrations, survival, and drug-resistance data.

Results: The best performing model had an accuracy of 0.93 on the test set, while mispredictions could be explained by aberrations showing similar gene-expression. Clustering yielded 15 clusters, of which several were not defined by a known RGA. For the CEBPAdm enriched cluster, we found samples without a CEBPA mutation to have the same favorable survival typical for CEBPAdm. For the NPM1 subtype, four clusters were found, showing differences in mutations, maturational arrest, and in-vitro drug resistance.

Discussion: We show that gene-expression can be used to predict RGAs. Unsupervised analysis of gene-expression data showed options to refine and discover new subtypes and to improve clinical decision-making.

N-028: Drug Repurposing Identification and Prioritization for Polycystic Kidney Disease by Gene Signature Reversion
COSI: TransMed
  • T.C. Howton, UAB, United States
  • Brittany Lasseigne, UAB, United States
  • Elizabeth Wilk, UAB, United States
  • Michal Mrug, UAB, United States
  • Bradley Yoder, UAB, United States


Presentation Overview: Show

Autosomal dominant polycystic kidney disease (ADPKD), characterized by renal cyst expansion, also manifests across multiple systems and is one of the most prevalent monogenic human diseases, impacting approximately 0.04-0.1% of people. However, it currently has only one FDA-approved treatment, Tolvaptan, a vasopressin V2-receptor antagonist that is very costly and causes severe adverse effects, including liver toxicity. Therefore, drug repurposing provides a promising opportunity for helping patients by significantly decreasing the time and price involved in traditional drug discovery. The vast majority of ADPKD is caused by a variant in PKD1 or PKD2. As gene expression profiles from ADPKD patients and pre-clinical models have significantly altered transcriptomic signatures, this presents an opportunity to compare the PKD disease-specific signature against FDA-approved drug signatures to identify candidates that may reverse the cellular phenotypes. Here, we applied signature reversion methods to determine drug repurposing candidates for ADPKD using publicly available Pkd2 mouse data and detected inversely related molecular signatures from the LINCS treatment perturbation database. Drug candidates were further prioritized by annotating drugs by mechanism of action (MOA), FDA status, targets, and side effects, and by functional enrichment analysis.
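
The idea of signature reversion can be illustrated with a small sketch that scores each drug by how strongly its expression changes oppose the disease signature. Published methods typically rely on rank-based enrichment statistics; the Pearson correlation used here is a swapped-in simplification, and the gene labels, fold-change values and drug names are placeholders rather than data from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def reversal_score(disease_sig, drug_sig):
    """Correlate log fold changes over shared genes; negative values suggest reversal."""
    genes = sorted(set(disease_sig) & set(drug_sig))
    return pearson([disease_sig[g] for g in genes], [drug_sig[g] for g in genes])

# Hypothetical log2 fold changes (gene names are placeholders).
disease = {"G1": 2.0, "G2": 1.5, "G3": -1.0, "G4": -2.5}
drugs = {
    "drug_A": {"G1": -1.8, "G2": -1.2, "G3": 0.9, "G4": 2.0},   # opposes the disease
    "drug_B": {"G1": 1.0, "G2": 0.8, "G3": -0.5, "G4": -1.1},   # mimics the disease
}

# Most negative score first: the strongest candidate for reversing the signature.
ranked = sorted(drugs, key=lambda d: reversal_score(disease, drugs[d]))
```

Sorting in ascending order places the drug whose signature most strongly opposes the disease signature at the top of the candidate list.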

N-029: Perturbed Transcriptomic Analyses Identify Chemo-immunotherapy Synergisms to Shift Anti-PD1 Resistance in Cancer
COSI: TransMed
  • Yue Wang, University of Pittsburgh, United States
  • Dhamotharan Pattarayan, University of Pittsburgh, United States
  • Min Zhang, University of Pittsburgh, United States
  • Da Yang, University of Pittsburgh, United States


Presentation Overview: Show

Immune checkpoint blockade (ICB) has prompted a revolution in cancer treatment, but its low response rate and high resistance remain a problem. Here, we report a novel algorithm to reliably predict chemo-ICB synergism for overcoming ICB resistance, termed Perturbed Transcriptome-based Synergism Prediction for ICB-Chemotherapy Combinations (PerTSynIC). Through a clinical response-guided feature selection procedure, we established that treatment-induced gene expression changes (TECs) are among the major determinative phenotypes for anti-PD1 response in melanoma. By integrating one million perturbed transcriptomes of cancer cell lines treated with ten thousand genetic and pharmacological inhibitors from high-throughput screening studies, PerTSynIC identified chemo-/targeted agents that can induce TEC shifting between anti-PD1 non-responders and responders. These agents include MEKi, HDACi and CDKi, whose synergism with ICBs has been reported in clinical practice. PerTSynIC characterized 23 top synergy target genes whose genetic and pharmacological inhibition share a consistent TEC shift ability in melanoma. Among these genes, PAK4 and its pharmacological inhibitors were identified. An in vitro assay validated that treatment of the melanoma cell line MEL526 with PAK inhibitors can induce significant dose-dependent activation of antigen processing/presentation and type II interferon signaling. Our study provides a reliable prediction method for chemo-ICB synergism, which will help cancer patients better cope with immunotherapy resistance.