Posters - Schedules
Poster presentations at ISMB/ECCB 2021 will be presented virtually. Authors will pre-record their poster talk (5-7
minutes) and will upload it to the virtual conference platform site along with a PDF of their poster beginning July 19
and no later than July 23. All registered conference participants will have access to the posters and presentations
on the conference platform until October 31, 2021. There are Q&A opportunities through a chat
function and poster presenters can schedule small group discussions with up to 15 delegates during the conference.
Information on preparing your poster and poster talk is available at:
https://www.iscb.org/ismbeccb2021-general/presenterinfo#posters
Ideally authors should be available for interactive chat during the times noted below:
Session A: Sunday, July 25 between 15:20 - 16:20 UTC
Session B: Monday, July 26 between 15:20 - 16:20 UTC
Session C: Tuesday, July 27 between 15:20 - 16:20 UTC
Session D: Wednesday, July 28 between 15:20 - 16:20 UTC
Session E: Thursday, July 29 between 15:20 - 16:20 UTC
Short Abstract: Bladder cancer (BC) is one of the most common cancer types worldwide, with both high incidence (i.e. 550,000 new cases in 2018) and mortality (i.e. 200,000 deaths in 2018) rates. The molecular heterogeneity of the disease, despite resulting in similar clinical manifestation, frequently limits the response to therapies, thus complicating clinical management. Drug repurposing has been established as a promising approach that simultaneously accelerates drug development and reduces its cost. Here, we present the ongoing development of a data-driven, molecular-based framework for drug repurposing in BC. The approach builds upon defining molecular signatures of BC through integration of multi-omic/parametric data (proteomics, transcriptomics, clinical data) and BC-associated features extracted from the literature and specialized resources (e.g. DisGeNET, Reactome). Integrated BC molecular signatures will be used to define potential drug candidates for repositioning in BC that are able to reverse (specific parts of) the disease signature. The most promising candidates will be shortlisted for further in vitro and in vivo experimentation through assessment of novelty and retrieval of existing information about drugs (e.g. safety, toxicity, FDA approval) from publicly available databases. Finally, we believe our framework could serve as an example for future data-driven drug repurposing explorations of other human diseases or conditions.
Short Abstract: Cancer development is driven by the accumulation of somatic mutations. Modern sequencing technologies have the ability to generate large-scale mutational datasets from tumor samples. From these datasets we can identify patterns of co-occurring mutations, also known as mutational signatures, which are produced by exposure to carcinogens or aberrant endogenous processes. We developed an R/shiny Graphical User Interface (GUI) for the musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) package to provide a framework for discovery and analysis of mutational signatures. This user-friendly interface streamlines the musicatk pipeline by providing a step-by-step workflow that includes preprocessing, deconvolution, and downstream exploratory analysis. The application can import multiple file formats including mutation annotation format (MAF) and variant call format (VCF). A wrapper for the TCGAbiolinks package is also available to allow for the import of TCGA data. Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) algorithms are provided for predicting signatures. Modules for downstream analysis include clustering of tumors into subgroups, comparing discovered signatures to those in the COSMIC dataset, identifying differentially exposed signatures, and plotting sample-exposure heatmaps. The musicatk platform can facilitate carcinogenesis research and establish probabilistic models to predict tumor etiology in patients.
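As a rough illustration of the kind of preprocessing that precedes signature discovery, the sketch below tallies single-base substitutions into the six pyrimidine-centred classes commonly used in mutational-signature work. It is not musicatk code (musicatk is an R package); the helper name `count_substitutions` and the input tuples are hypothetical.

```python
# Complement map used to express every substitution relative to a
# pyrimidine (C or T) reference base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SBS_CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def count_substitutions(variants):
    """Count substitutions per class from (ref, alt) pairs (hypothetical helper)."""
    counts = {cls: 0 for cls in SBS_CLASSES}
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, ambiguous bases, and records where ref equals alt.
        if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
            continue
        # A purine reference (A or G) is reported on the opposite strand.
        if ref in ("A", "G"):
            ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        counts[f"{ref}>{alt}"] += 1
    return counts


# Made-up variants: G>A folds into C>T, A>G into T>C, and the deletion is skipped.
example = [("C", "T"), ("G", "A"), ("A", "G"), ("T", "G"), ("AT", "A")]
print(count_substitutions(example))
```

A per-sample table of such counts is the type of matrix that LDA or NMF would then decompose into signatures and exposures.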
Short Abstract: In silico assessment of drug toxicity is becoming a critical step in drug development. Existing models are limited by low accuracy and lack of interpretability. Further, they often fail to explain cellular mechanisms underlying structure-toxicity associations. We addressed these limitations by incorporating the target profile as an intermediate connecting structure to toxicity. To accommodate the high-dimensional feature space, we developed a pipeline that can identify a subset of predictive features. We implemented our pipeline to study 569 targets and 815 adverse events. The features identified by our pipeline comprise less than ten percent of the original feature space; nevertheless, they accurately predicted binding outcomes for 377 targets and toxicity outcomes for 36 adverse events. We demonstrated that predictive targets tend to be differentially expressed in the tissue of toxicity. We rediscovered key cellular functions associated with cardiotoxicity from the predictive targets, as well as markers of skin and liver diseases. We found evidence supporting diagnostic/therapeutic applications of some predictive targets in hepatotoxicity and nephrotoxicity. Our findings highlighted the critical role of predictive targets in cellular mechanisms leading to toxicity. In general, our study improved the interpretability of toxicity prediction without sacrificing accuracy. Our novel pipeline may benefit future studies of high-dimensional datasets.
Short Abstract: Genomic predictors of sensitivity to radiation therapy have yet to translate to clinical practice. Several models based on gene expression have been trained on small sample, in vitro dose-response data, and some have shown predictive capability in clinical cohorts. This work aimed to investigate a selection of published transcriptome-based radiosensitivity predictors on a larger in vitro dataset.
Two cancer cell line datasets (NCI60 [n=60] and Cancer Cell Line Encyclopaedia (CCLE) [n=522]) with available in vitro radiosensitivity measurements were used to test 7 published radiosensitivity signatures. Models were tested using reported parameters or principal component regression. To benchmark results, random signatures of varying sizes were produced by sampling from all genes available in the dataset along with an intercept only model.
No published model tested outperformed the 2 standard deviation limit for mean error of randomly sampled gene signatures of a similar size, and some had higher errors than an intercept only model. Poor performance of signatures suggests a need for model improvement which may be aided by greater sample size, improved modelling methods, incorporation of multiomics and external validations. Further assessment of radiosensitivity signatures in clinical cohorts using suitable null hypotheses and adjustment for confounding is needed.
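The benchmarking idea above can be sketched as follows. This is one plausible reading of the "2 standard deviation limit", assuming a signature counts as better than random only if its error falls below the mean error of random signatures minus two standard deviations. The function names and the numbers are hypothetical and are not taken from the study.

```python
import statistics


def random_signature_limit(null_errors, n_sd=2.0):
    """Return mean - n_sd * SD of the errors obtained with random signatures."""
    mean = statistics.mean(null_errors)
    sd = statistics.stdev(null_errors)  # sample standard deviation
    return mean - n_sd * sd


def outperforms_random(model_error, null_errors, n_sd=2.0):
    """True if the model error is lower than the random-signature limit."""
    return model_error < random_signature_limit(null_errors, n_sd)


# Made-up errors of random gene signatures of a similar size.
null_errors = [0.9, 1.0, 1.1]
print(random_signature_limit(null_errors))
print(outperforms_random(0.95, null_errors))
print(outperforms_random(0.75, null_errors))
```

In practice the null distribution would come from many random gene sets evaluated with the same modelling procedure as the published signature.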
Short Abstract: Glioma is a type of brain cancer which manifests within the glial cells and has dismal survivability and grave impact on the patients' quality of life. The life expectancy of glioblastoma (the most aggressive subtype of glioma) patients remains a few months, despite multimodal treatments that include surgery, radiation, and chemotherapy. Raman spectroscopy is a non-destructive chemical analysis technique that can be used to identify detailed molecular fingerprints of the sample. It has recently been used successfully in optimizing brain tumor surgeries through detection of tumor barriers and in deep learning classification of tumors, demonstrating its promise to characterize key aspects of tumor tissues. Our hypothesis is that Raman spectra can be used to separate tumor regions from non-tumor regions (for example, blood or necrotic cells). We used Raman spectroscopy to analyze glioma tumor samples extracted from 45 patient tumors. We analyzed the spectra using agglomerative clustering, a form of unsupervised machine learning. We found that, for three representative results, the majority cluster closely matches the tumor spots characterized by the frequency criterion. The average accuracy over all samples was 90.3%, the average precision was 99.6% and the average recall was 90.2%.
Short Abstract: Recent work has shown that high tumor mutation burden (TMB-H) could result in an increased number of neoepitopes from somatic mutations expressed by a patient’s tumor cells, which can be recognized and targeted by neighboring tumor-infiltrating lymphocytes (TILs). A deeper understanding of the spatial heterogeneity and organization of tumor cells and their neighboring immune infiltrates could provide new insights into the biology of tumor progression and treatment response, including immunotherapy. We developed and applied computational approaches using digital whole slide images (WSIs) to investigate the spatial heterogeneity and organization of regions harboring TMB-H tumor cells and TILs within tumors, and their impact in prognostic and predictive utility. In experiments using WSIs from The Cancer Genome Atlas bladder cancer (BLCA), our findings show that WSI-based approaches can reliably predict patient-level TMB status, delineate spatial TMB heterogeneity and identify co-organization with TILs. TMB-H patients with low spatial heterogeneity enriched with high TILs showed improved overall survival. Furthermore, we evaluated our models using BLCA patients treated with immunotherapy from the real-world clinical setting. Our results indicate both prognostic and predictive roles for image-based TMB and TILs.
Short Abstract: Clinical cohort study data often build the foundation for data-driven Alzheimer’s disease (AD) progression modeling. The employment of specific inclusion and exclusion criteria forms the distribution from which study participants are recruited and subsequently introduces a bias into the collected data. Therefore, it remains unclear whether patterns found in one dataset generalize beyond the discovery cohort and are reproducible in independent cohorts.
We used multi-state models (MSM) to mine AD progression patterns from six distinct cohort datasets. We trained a conceptually identical MSM on each dataset and compared the resulting progression signals. Furthermore, we propose a novel technique to cluster cohort datasets based on the similarity of their progression.
Our study revealed significant differences in progression signals across cohorts. Investigation of the fitted models elucidated that they learned significantly different, cohort-specific parameters which bias their predictions and can impede model generalization.
Our results emphasize the need for external validation of data-driven results. Given the heterogeneity of AD cohort data, building a single progression model that serves all predictive purposes and is applicable to the entire population seems inconceivable. Instead, to eventually support clinical decision making, subpopulation-specific models that embrace the individual characteristics of a stratified target group seem more promising.
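To make the cohort comparison concrete, the sketch below uses a deliberately simplified, discrete-time stand-in for a multi-state model: it estimates transition probabilities between diagnostic stages from per-participant visit sequences and compares two cohorts by the mean absolute difference of their transition matrices. The stage labels, the distance measure, and the toy sequences are all assumptions made for illustration and are not the authors' model or clustering technique.

```python
STATES = ("CN", "MCI", "AD")  # hypothetical stage labels


def transition_matrix(sequences):
    """Estimate row-normalised transition probabilities from state sequences."""
    counts = {s: {t: 0 for t in STATES} for s in STATES}
    for seq in sequences:
        # Count each consecutive pair of visits as one observed transition.
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    matrix = {}
    for s in STATES:
        total = sum(counts[s].values())
        # Rows with no observed transitions are left as zeros.
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in STATES}
    return matrix


def matrix_distance(a, b):
    """Mean absolute difference between two transition matrices."""
    diffs = [abs(a[s][t] - b[s][t]) for s in STATES for t in STATES]
    return sum(diffs) / len(diffs)


# Toy visit sequences for two hypothetical cohorts.
cohort_a = [["CN", "CN", "MCI"], ["MCI", "AD", "AD"]]
cohort_b = [["CN", "CN", "CN"], ["MCI", "MCI", "AD"]]
print(matrix_distance(transition_matrix(cohort_a), transition_matrix(cohort_b)))
```

Pairwise distances of this kind could feed a standard clustering routine, which is one simple way to group cohorts by how similarly their participants progress.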
Short Abstract: In Alzheimer’s Disease (AD) the use of digital technologies has gained a lot of attention, because it may help to diagnose the disease in a pre-symptomatic stage. However, before any use in clinical routine, digital measures (DMs) need to be evaluated carefully by assessing their relationship to established clinical scores and understanding their diagnostic benefit. Along these lines, the IMI project RADAR-AD (www.radar-ad.org) evaluates a smartphone-based virtual reality game panel that can help to assess cognitive impairment. In our work we applied the Variational Autoencoder Modular Bayesian Network (VAMBN) [1] to the virtual reality game data and analysed connections between DMs and cognitive assessments (e.g. Mini Mental State Examination). Based on our model we then predicted DMs within the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort. This resulted in a network that allowed us to disentangle and quantify the relationship between DMs, established clinical scores, brain volumes as well as molecular mechanisms. Therefore, DMs may have the potential to act as a vital measure in the diagnosis of AD in a pre-symptomatic stage.
[1] Gootjes-Dreesbach L, Sood M…..Fröhlich H (2020) Variational Autoencoder Modular Bayesian Networks for Simulation of Heterogeneous Clinical Study Data. Front. Big Data 3:16. doi: 10.3389/fdata.2020.00016
Short Abstract: Data protection laws force hospitals to create data silos, making it difficult to apply machine learning and artificial intelligence methods across distributed datasets. The Personal Health Train is a paradigm proposed within the GO-FAIR initiative to utilize these methods and improve personalized medicine by enabling the learning of more robust models in a distributed setting. Our deployment-ready and open-source Personal Health Train architecture (Figure 1) enables the execution of arbitrarily complex analysis pipelines with a strong focus on security. Without transferring the data to a central analysis site, container technologies allow the user to run a wide range of algorithms iteratively among participating hospitals. Our framework is dynamically extensible to accommodate the particular needs of researchers and hospitals. After deployment of a station at a hospital, no further installation steps are required. A hospital never gives up control over its data and can independently decide to join in an analysis. We demonstrate our framework's capabilities for raw genomic analysis, including homomorphically encrypted count queries and deep neural networks applied to image data (Figures 2, 3).
Short Abstract: Genome-driven precision medicine seeks to improve diagnosis and pair the right patient with the right drug at the right time. However, understanding and interpreting the cellular impact of molecular variation for both purposes remains a major challenge. The UAB Center for Precision Animal Modeling (C-PAM) aims to efficiently analyze and model such disease-associated variants through the development and application of computational approaches and by developing preclinical models. As a component of this Center, the C-PAM Bioinformatics Section (BIS) has integrated computational biology and data science methods to generate an analytical suite that supports the review, prioritization, interpretation, and selection of variants for model organism studies. We have shown that our BIS deep learning-based methods automate variant annotation, classification, and prioritization and compare favorably to expert geneticist interpretation. Additionally, we are developing and applying ensemble machine learning methods with rank-based prioritization of novel targets and repurposed drugs to enhance downstream therapeutic testing. Here we describe our platform and methodology, present findings from proof of principle studies, and discuss diagnoses and drug prioritization for C-PAM cases through rare, misdiagnosed, or undiagnosed disease program cases at UAB and collaborating institutions.
Short Abstract: Oropharyngeal squamous cell carcinoma (OPSCC) and oral squamous cell carcinoma (OSCC) cases are rising worldwide, specifically among young individuals in the South Asian region, with risk factors including use of tobacco, gutka, alcohol and human papilloma virus (HPV) infections. HPV, specifically the subtype HPV16, is an established risk factor in the initiation and progression of OPSCC, compared to OSCC. The study aimed to carry out a comparative analysis of protein-protein interactions (PPI) between HPV and human proteins, particularly those encoded by key cancer genes in OPSCC and OSCC. We constructed an HPV-human PPI network using interaction data from BioGRID. Next, we identified cancer genes for OPSCC and OSCC using data retrieved from key databases (NIH GDC, COSMIC and cBioPortal) and conducted network analysis along with enrichment analysis of gene ontology annotations and KEGG Pathway labels. Six of the human proteins identified (PML, JAK1, CDK4, RB1, ZBTB16 and MYC) were unique to HPV interactions in OPSCC. Two unique proteins (SMAD4, TRAF3) interacted with HPV proteins in OSCC. Furthermore, three proteins (CREBBP, EP300, TP53) had common HPV-human protein interactions between the two cancers. Our next step is to experimentally validate the putative role of these proteins in the initiation and/or progression of OPSCC.
Short Abstract: Head and neck cancer is the sixth leading cause of cancer across the globe and is prevalent in South Asian countries. Prediction of pathological stages of cancer can play a pivotal role in early diagnosis and personalized medicine. This project ventures into the prediction of different stages of head and neck squamous cell carcinoma (HNSCC) using prioritized DNA methylation patterns. DNA methylation profiles for each HNSCC stage (I-IV) were used to analyze 485,577 methylation CpG sites and prioritize them on the basis of the predictive power using a wrapper-based feature selection method, along with different classification models. We identified 68 methylation sites which predicted the pathological stage of HNSCC samples with 90.62% accuracy using a Random Forest classifier. We set out to construct a PPI network for the proteins encoded by the 67 genes associated with these sites to study its network topology and also undertook enrichment analysis of nodes in their immediate neighborhood for GO and KEGG Pathway annotations, which revealed their role in cancer-related pathways, cell differentiation, signal transduction, metabolic and biosynthetic processes. With information on the predictive power of each of the 67 genes in each HNSCC stage, we unveil a dynamic stage-course network for HNSCC.
Short Abstract: Pediatric Acute Myeloid Leukemia (AML) with a FLT3 internal tandem duplication mutation (FLT3-ITD) is a challenging disease due to poor outcomes in many patients. The 4-year progression free survival is still only 31%. Current biomarkers are insufficient to predict why certain patients with FLT3-ITD AML relapse and others do not. The development of prognostic biomarkers in FLT3-ITD pediatric AML may help improve the outcomes and management of these patients. In order to develop new biomarkers and identify therapeutic targets, we performed single cell RNA sequencing on a panel of 37 diagnostic samples and 18 paired diagnostic/relapse samples from FLT3-ITD patients that did not and did relapse respectively. Using this RNA sequencing dataset comprised of over 250k single cells, we first investigated if the frequency of specific clusters of AML cells may help predict patient outcome. We found several clusters that were significantly different between patients that did and did not relapse. Using these clusters, we deconvoluted publicly available RNAseq data to build prediction models using scRNAseq clusters. We found these prediction models to be highly specific as a prognostic biomarker for pediatric AML (p << 0.01) and were better than models trained using genes from the publicly available RNAseq data.
Short Abstract: Predicting the sensitivity of tumor cells to specific anti-cancer therapy is a task of paramount importance for precision medicine. Several research groups have approached this task in the past decade by integrating multi-omics data with machine learning. Deep learning has achieved high-level performance compared to other methods. The relative half-maximal inhibitory concentration has been most commonly used to predict drug response in the literature. Here, we target other drug response metrics like Maximal Effect and Area Under the Curve, which are more informative in distinguishing between effective and ineffective drugs. Integrating large-scale multi-omics data from different sources is especially challenging due to varying experimental conditions resulting in significant inconsistencies. We addressed this problem by homogenizing data generated from various experiments on cancer cell lines and integrated them to be modeled by deep neural networks. We evaluated single cancer – single drug feed-forward neural networks using the gene expression data and achieved a correlation coefficient of 91%. We will also include mutation and copy number variation data and assess model performance. The final aim is to apply neural networks to rare disseminated tumor cells via transfer learning.
Short Abstract: Background
Rare cancers are diagnosed in less than 6 out of 100,000 cancer patients per year. Their molecular characteristics and treatment options are still not well defined. ARCAGEN aims to explore clinical and molecular information of patients with rare cancers across Europe.
Methods
Tumor samples and clinical data were collected and successfully screened for 77 patients with rare cancers: 41 sarcomas, 9 yolk sac tumors, 14 rare head and neck cancers, and 13 thymomas. Molecular analysis was performed using the FoundationOne Heme assay for sarcomas and the FoundationOne CDx assay for other histologies. Findings were compared to the Foundation Medicine dataset for common cancers.
Results
Most patients had some genomic alterations (89%), mostly in genes that regulate the cell cycle (TP53, RB1, CDKN2A/B, MDM2), as well as in the RAS/RAF family. Direct actionable mutations for which there is a treatment approved in Europe within the patient’s tumor type were detected in 4 cases (4.7%), whereas such actionable mutations were reported in 58% of samples from common cancers. Moreover, fewer cases with no treatment recommendation were present in common cancers than in the ARCAGEN retrospective cohort (9% vs 51.8%). This highlights the need for new studies that focus on molecular analysis, biomarker discovery and treatment in rare cancers.
Short Abstract: When patients with an underlying autoimmune condition such as juvenile idiopathic arthritis or lupus report life-threatening symptoms, physicians need to quickly determine whether these symptoms are caused by an acute infection or a complication of their autoimmune condition. As immunosuppressive drugs are harmful to someone undergoing an infection, accurate and timely diagnosis is critical. In recent years, host-response-based diagnostics have shown promise in accurately and non-invasively diagnosing a number of infectious and autoimmune diseases.
Here, we collected and curated blood transcriptome profiles of 14,587 patients from 42 countries across 122 independent datasets and grouped them into infectious, autoimmune, and healthy control categories. Using a novel statistical framework, we created two gene signatures from these data: one to differentiate patients with autoimmune or infectious diseases from healthy individuals and another to differentiate between patients with autoimmune and infectious diseases. Both signatures achieve an area under the receiver operating characteristic curve (AUROC) of >0.87 on completely independent datasets. Because our training and testing data included heterogeneity across many factors, these gene signatures can be utilized in diverse clinical populations. Furthermore, these signatures can aid physicians across a broad range of clinical scenarios, where existing diagnostics are invasive, expensive, or non-specific.
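For readers less familiar with the AUROC figure quoted above, the following minimal sketch computes it directly from its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name, scores, and labels are illustrative assumptions and unrelated to the study's data or statistical framework.

```python
def auroc(scores, labels):
    """Pairwise (Mann-Whitney style) AUROC; labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up signature scores: label 1 could stand for infection, 0 for autoimmune flare.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auroc(scores, labels))
```

The quadratic pairwise loop is fine for small examples; for large cohorts a rank-based implementation would be preferable.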
Short Abstract: T-cell prolymphocytic leukemia (T-PLL) is a rare cancer with poor overall survival. T-PLL diagnosis criteria consider the presence of clonal prolymphocytic T-cells in combination with increased white blood cell counts and complex chromosomal aberrations. The proto-oncogene TCL1A and tumor suppressor ATM are putative drivers of T-PLL development. Despite an improved understanding of complex molecular alterations, little is known about the existence of T-PLL subtypes. We performed an analysis of gene expression profiles of 70 T-PLL patients by hierarchical clustering, revealing three robust T-PLL subgroups. These subgroups did not show strong significant differences in survival, but patients of the subgroup that co-clustered with normal references had the worst chances. Further analyses revealed several similarities of the subgroups at the level of individual genes, signaling and metabolic pathways and alterations of gene regulatory networks, but each subgroup also had its specific molecular characteristics. These differences were mainly reflected at the gene expression level, whereas gene copy number profiles of the subgroups were much more similar to each other except for a few characteristic differences. In particular, major regulators identified by our network approach could potentially contribute to future developments of improved T-PLL stratification systems and design of targeted treatment strategies.
Short Abstract: Investigating disease-causing and highly expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely analysis of high-throughput next-generation sequencing data is still a challenge for non-computational biologists and geneticists. Here, we present GVViZ, a robust, user-friendly, cross-platform bioinformatics and database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and disease phenotypes with greater visibility and customization. Experts can use GVViZ to visualize and interpret transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation.
Short Abstract: Oral squamous cell carcinoma (OSCC) is a subset of head and neck cancer (HNSCC), the seventh most common cancer worldwide, and accounts for more than ninety percent of oral malignancies. Early detection of OSCC is essential for effective treatment and reducing the mortality rate. However, the gold standard method of microscopy-based histopathological investigation is often challenging, time-consuming and relies on human expertise. Automated analysis of oral biopsy images can aid histopathologists in performing a rapid and arguably more accurate diagnosis of OSCC. In this study, we present deep learning (DL) based automated classification of 290 normal and 934 cancerous oral histopathological images published by Tabassum et al (Data in Brief, 2020). We utilized a transfer learning approach by adapting three pre-trained DL models to OSCC detection. VGG16, InceptionV3, and ResNet50 were fine-tuned individually and then used in concatenation as feature extractors. The concatenated model outperformed the individual models and achieved 96.66% accuracy (95.16% precision, 98.33% recall, and 95.00% specificity) compared to 89.16% (VGG16), 94.16% (InceptionV3) and 90.83% (ResNet50). These results demonstrate that the concatenated model can effectively replace the use of a single DL architecture.
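The four performance figures quoted above all derive from a single confusion matrix, as the short sketch below shows. The abstract does not report the test-set counts, so the counts used here (a hypothetical held-out set of 60 cancerous and 60 normal images) are an assumption chosen only because they give values close to the reported percentages.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),      # fraction of predicted cancers that are cancers
        "recall": tp / (tp + fn),         # fraction of cancers that are detected
        "specificity": tn / (tn + fp),    # fraction of normals correctly called normal
    }


# Hypothetical counts, NOT taken from the abstract.
metrics = classification_metrics(tp=59, fp=3, tn=57, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting the underlying counts alongside percentages makes it easier for readers to judge how sensitive such figures are to a handful of misclassified images.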
Short Abstract: Oral squamous cell carcinoma (OSCC) is the most common malignant epithelial neoplasm of the head and neck region in South Asian countries, with a 5-year survival rate between 20 and 50%. In this study, we perform a meta-analysis of five gene expression datasets (GSE23558, GSE25099, GSE30784, GSE37991 and TCGA-OSCC) that produced 1851 statistically significant differentially-expressed genes (DEGs) in OSCC. The DEGs were involved in key biological pathways that putatively drive the progression of OSCC. A comprehensive protein-protein interaction (PPI) network was constructed for proteins encoded by the DEGs to study the topology including hubs and top modules using Cytoscape. Next, 125 DEGs from top modules were mapped with antineoplastic agents using the L1000CDS2 server. We found 37 perturbing agents, out of which 12 FDA-approved antineoplastic agents (Teniposide, Palbociclib, Etoposide, Fedratinib, Tivozanib, Afatinib, Vemurafenib, Mitoxantrone, Idamycin, Canertinib, Dovitinib and Selumetinib) that showed interactions with over-expressed DEGs were selected for further study. The candidate antineoplastic agents are now being taken into an in vitro drug screen against a library of primary cell lines obtained from tumours of OSCC patients of local, South Asian origin.
Short Abstract: The identification of significant changes in Gene Regulatory Networks (GRNs) under different response groups can help discover novel molecular diagnostics and prognostic signatures.
In this work, we present a computational method, TraRe, which combines unsupervised learning and non-parametric testing to mechanistically understand how transcription networks are differentially regulated.
We applied TraRe to RNAseq data of metastatic Castration-Resistant Prostate Cancer (CRPC) patients from the PROMOTE clinical study (NCT 01953640) treated with abiraterone (ABI). Rewired GRNs between ABI responders and non-responders were found to be enriched in genes down-regulated in prostate cancer samples, as well as in transcription factors (TFs) involved in the androgen receptor signaling pathway and TFs associated with other cancers. Furthermore, MDX1, a TF that acts as a transcriptional repressor and is a candidate tumor suppressor, is among the top rewiring-specific TFs.
Key rewired TF-target relationships were validated in vitro via qRT-PCR. After knock-down of the top TFs, expression levels of four key genes were significantly changed between parent cell lines and ABI-resistant cell lines.
TraRe efficiently uncovers GRNs from high-throughput sequencing data, performing differential network analysis that unravels phenotype-specific regulatory disruptions.
Short Abstract: Type 1 diabetes (T1D) is an autoimmune disease characterized by the progressive loss of pancreatic beta-cells due to dysregulation of pancreatic T-cells. However, the exact mechanisms and interplay of T-cell subpopulations remain elusive.
We present a multi-omics immune profiling study to characterize and compare pancreatic T-cell populations of four healthy non-autoimmune prone (BALB/c) and non-obese diabetic (NOD) mice with islet autoimmunity at a single-cell level to discover T1D-driving mechanisms. We identified nine T-cell subtypes including a CD4+Foxp3+CD25lowHELIOS+ cluster involved in autoimmunity regulation, with downregulated Hspa8 and Lag3 in NOD, potentially explaining their dysfunctionality in T1D. We also identified an effector CD8+ cluster with a high level of clonal expansion and gene expression related to T1D according to overenrichment analysis. Overall, we observed a cell-type compositional shift in all CD4+ and CD8+ related clusters towards a significant increase in the number of cells in samples with islet autoimmunity. Specifically, we detected a dramatic increase in the CD8+ effector cluster in comparison to regulatory T-cells, pointing to a possible compositional imbalance as a relevant factor behind T1D development.
Our preliminary results provide first insights into the T-cell imbalance associated with T1D development and dysfunctional regulation of different lymphocytic cell populations in the pancreas.
Short Abstract: Pancreatic cancers (PCs) are among the most deadly solid-tumor cancers and often do not show specific symptoms. Because of late diagnosis, only 10% of the patients survive beyond 5 years. We applied a data deconvolution method based on consensus independent component analysis (ICA) to transcriptomes of 183 pancreatic tumors from TCGA. By mapping the tumors into the space defined by independent components, it is possible to disentangle the activity of various biological processes, show potential technical effects or make predictions about the abundance of specific cells. Previously we reported components specific to normal pancreas activity (secretion), stroma (neoangiogenesis), infiltrated immune and tumor cells (cell cycle, hypoxia, keratinization, etc.). Interestingly, we also observed a strong linkage between component weights and visual features of the corresponding hematoxylin/eosin-stained slides. For example, tumor tissues from samples with a strong cell cycle component also showed a high degree of pleomorphism, higher cell density and more mitoses. In contrast, samples with weak cell cycle and strong secretion-related components were almost normal histologically, with clear ductal structures and few or no mitotic cells. Therefore, ICA of transcriptomic data from PC patients recapitulates the histopathological properties. We are currently developing a method that would link deconvolved molecular profiles with histological features.
Short Abstract: Although the impact of breastfeeding is well known, the post-partum evolution of human breast milk constituents remains poorly understood. To provide new insights using cutting-edge acquisition techniques, we combine data from four different molecular families assayed in EBM and analyse them with suitable multi-omics statistical tools.
Milk samples (n=257) were collected from days 2 to 6 within the EDEN mother-child cohort (Berdi et al., 2019). Untargeted analyses of oligosaccharides (HMOs), lipids and metabolites were performed, while numerous cytokines, growth factors and antibodies were measured by targeted analysis. After correcting for a strong effect of the collection center, single-omic data analysis was performed to assess the evolution of each molecular family independently (univariate tests, PLS-DA). Multi-block approaches (multi-block PLS-DA and WGCNA) were used to highlight potential associations between different types of variables.
We found that HMOs, lipids and metabolites have stronger temporal variations than cytokines. Interestingly, multi-block methods inferred associations between families of molecules, notably between cytokines and specific metabolites.
Combining various omics approaches provided an unprecedented, exhaustive view of the biochemical composition of BM. Further association of global milk composition with maternal exposure or with infant health outcomes could lead to establishing relevant biomarkers.
Short Abstract: Sequencing of cell-free DNA in the blood of cancer patients (liquid biopsy) provides attractive opportunities for early diagnosis, assessment of treatment response, and minimally invasive disease monitoring. To unlock liquid biopsy analysis for pediatric tumors with few genetic aberrations, we introduce an integrated genetic/epigenetic analysis method and demonstrate its utility on 241 deep whole genome sequencing profiles of 95 patients with Ewing sarcoma and 31 patients with other pediatric sarcomas. Our method achieves sensitive detection and classification of circulating tumor DNA in peripheral blood independent of any genetic alterations. Moreover, we benchmark different metrics for cell-free DNA fragmentation analysis, and we introduce the LIQUORICE algorithm for detecting circulating tumor DNA based on cancer-specific chromatin signatures. Finally, we combine several fragmentation-based metrics into an integrated machine learning classifier for liquid biopsy analysis that is tailored to cancers with low mutation rates while exploiting widespread epigenetic deregulation. Clinical associations highlight the potential value of cfDNA fragmentation patterns as prognostic biomarkers in Ewing sarcoma. In summary, our study provides a comprehensive analysis of circulating tumor DNA beyond recurrent somatic mutations, and it renders the benefits of liquid biopsy more readily accessible for childhood cancers.
Short Abstract: Traditional drug discovery faces a severe efficacy crisis. Repurposing of registered drugs provides an alternative with lower costs, reduced risk, and faster clinical application. The underlying mechanisms of complex diseases are best described by disease modules. These modules represent disease-relevant pathways and contain potential drug targets which can be identified in silico with network-based methods. The data necessary for the identification of disease modules and network-based drug repurposing are scattered across independent databases. Moreover, existing studies have been limited to predictions for specific diseases or to non-translational algorithmic approaches. Hence, there is an unmet need for adaptable tools allowing biomedical researchers to employ network-based drug repurposing approaches for their specific use cases. We close this gap with NeDRex, an integrative and interactive platform for network-based drug repurposing. NeDRex integrates different data sources covering genes, proteins, drugs, drug targets, disease annotations, and their relationships, resulting in a network with 350,142 nodes and 14,127,004 edges. NeDRex allows for constructing heterogeneous biological networks, mining them for disease modules, and prioritizing drugs targeting disease mechanisms. NeDRex generalizes the approach implemented in our previous work for COVID-19 drug repurposing, CoVex (doi.org/10.1038/s41467-020-17189-2), to be applicable to other diseases.
Short Abstract: Cell proliferation, differentiation, and apoptosis are three main biological processes that, acting together, can cause tissue abnormalities leading to cancer (Evan & Vousden, 2001). This fact can be very useful in the case of metastasis, where tumor growth affects other organs: it indicates that targeting the common genes involved in these processes may help overcome metastasis (Nazarieh & Helms, 2019). However, suffering from multiple conditions is not restricted to metastasis; for example, a patient can suffer from colon cancer and diabetes simultaneously.
The need for personalized medicine arises because different types of biological factors act together to produce a specific condition, with the result that disease-specific drugs are not effective for a patient suffering from several diseases. Therefore, building a model that integrates multiple factors and weighs them proportionally becomes imperative. Here, I propose a pipeline comprising six stages to address the patient-specific condition, leading to more accurate diagnosis and treatment: image processing for disease diagnosis and model evaluation; single-cell data analysis to identify, e.g., cancer stem cells; multi-omics data integration to enhance the model, e.g. in the case of edge prediction; personalized network modelling; identification of personalized biomarkers; and identification of disease-causing mutations.
Short Abstract: Background: Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes.
Results: We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet utilizes a measure based on betweenness centrality on patient specific networks to identify the so-called outlier genes that correspond to dysregulated genes for each patient. Setting up the relationship between the mutated genes and the outliers through a bipartite graph, it employs a random-walk process on the graph, which provides the final prioritization of the mutated genes. We compare BetweenNet against state-of-the-art cancer gene prioritization methods on lung, breast, and pan-cancer datasets.
Conclusions: Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and the reference pathways enriched in BetweenNet ranked genes and those that are enriched in known cancer genes overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods.
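As a rough illustration of the final prioritization step described above, the following Python sketch runs a random walk with restart on a small bipartite graph linking mutated genes to outlier genes. It is a minimal sketch under stated assumptions, not the BetweenNet implementation: the restart probability, the uniform restart vector, the function name and the toy node labels (`M1`, `O1`, ...) are all hypothetical, and the betweenness-based selection of outlier genes is assumed to have been done beforehand.

```python
from collections import defaultdict


def random_walk_with_restart(edges, restart=0.3, iterations=100):
    """Score the nodes of an undirected graph by a random walk with restart.

    edges: iterable of (mutated_gene, outlier_gene) pairs (hypothetical input).
    Returns a dict mapping each node to its visiting probability.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    nodes = sorted(neighbours)

    # Assumption: the walker restarts uniformly at random over all nodes.
    start = {n: 1.0 / len(nodes) for n in nodes}
    prob = dict(start)

    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            # Spread the non-restart mass evenly over the neighbours of n.
            share = (1.0 - restart) * prob[n] / len(neighbours[n])
            for m in neighbours[n]:
                nxt[m] += share
        prob = nxt
    return prob


# Toy bipartite graph: mutated genes M1, M2 and outlier genes O1, O2, O3.
toy_edges = [("M1", "O1"), ("M1", "O2"), ("M1", "O3"), ("M2", "O1")]
scores = random_walk_with_restart(toy_edges)
ranking = sorted(("M1", "M2"), key=lambda g: scores[g], reverse=True)
```

In this toy graph the mutated gene connected to more outliers receives the higher score, which mirrors the intuition that mutated genes linked to many dysregulated genes are prioritized.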
Short Abstract: The key principle of recent drug repurposing methods is that an efficacious drug will reverse the disease molecular ‘signature’ with minimal side-effects. This principle was defined and popularized by the influential ‘connectivity map’ study in 2006 regarding reversal relationships between disease- and drug-induced gene expression profiles, quantified by a disease-drug ‘connectivity score.’ Over the past 15 years, several studies have proposed variations in calculating connectivity scores towards improving accuracy and robustness in light of massive growth in reference drug profiles. However, these variations have been formulated inconsistently using various notations and terminologies, even though the various scores are based on a common set of conceptual and statistical ideas. Here, we present a systematic reconciliation of multiple disease-drug similarity metrics and connectivity scores by defining them using consistent notation and terminology. In addition to providing clarity and deeper insights, this coherent definition of connectivity scores and their relationships provides a unified scheme that newer methods can adopt, enabling the computational drug-development community to compare and investigate different approaches easily. This resource will be available as a live document (jravilab.github.io/connectivity_scores) coupled with a GitHub repository (github.com/JRaviLab/connectivity_scores) to facilitate the continuous and transparent integration of newer methods.
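To make the idea of a reversal relationship concrete, the sketch below computes one simple correlation-based variant of a connectivity score: the Spearman rank correlation between a disease signature and a drug signature over their shared genes, where a value near -1 suggests that the drug reverses the disease profile. This is an illustrative assumption, not the score defined in the original connectivity map study or in the resource described above; the function names, gene labels and expression values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def connectivity_score(disease, drug):
    """Spearman correlation of two signatures over their shared genes."""
    genes = sorted(set(disease) & set(drug))
    return pearson(
        average_ranks([disease[g] for g in genes]),
        average_ranks([drug[g] for g in genes]),
    )


# Hypothetical log fold changes for four genes.
disease_sig = {"G1": 2.0, "G2": 1.0, "G3": -1.5, "G4": -0.5}
drug_sig = {"G1": -1.8, "G2": -0.7, "G3": 1.2, "G4": 0.3}
score = connectivity_score(disease_sig, drug_sig)
```

Here the drug signature ranks the genes in exactly the opposite order to the disease signature, so the score reaches its minimum, the strongest possible reversal under this variant.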
Short Abstract: Analysis by a clinical pathologist is the gold standard for preclinical histological analysis but may be difficult to obtain due to the cost and availability of their services. As an alternative we have developed a digital pathology pipeline to segment, grade, and analyze lung adenocarcinoma tumors. This convolutional neural network (CNN) was trained to classify normal lung tissue, normal airways, and the different grades (1 – 4) of lung adenocarcinoma from 36,000 224x224 pixel image patches (~6,000 patches per class) extracted from hematoxylin and eosin-stained sections collected from 4 different mouse models.
As a test of our CNN, we analyzed two mouse models to better understand the role of TAp73 in lung adenocarcinoma: KrasG12D/+ (“K”) and KrasG12D/+;TAp73fltd/fltd (“TK”). Both human raters and our CNN reported a significant increase in the tumor burden of the compound mutant “TK” mice compared to the single mutant “K” mice. The higher grading resolution provided by our CNN showed the increased tumor burden observed in the “TK” mice was due to expansion of Grade 2 regions within higher grade tumors. Future work will expand this tool into a multidimensional digital pathology pipeline that can accelerate current investigations and reveal new therapeutic targets and prognostic markers.
Short Abstract: Increasingly, medical datasets link multiple domains thereby furthering their potential to uncover new knowledge and develop our understanding of diseases. However, integration and analysis of large orthogonal datasets is challenging.
We have successfully applied several quality control techniques to a multi-modal colon cancer dataset, containing unstructured data and various encodings, including: information extraction for eleven variables from free-text, identification of three variable pairs with internal inconsistencies, and utilisation of an information theoretic approach to support ten variable merges. We also developed methods for numeric encoding of non-numeric health data, hierarchical clustering of data completeness, and record-keeping of modifications for review. Semantic relationships in medical ontologies can be used to enrich medical datasets prior to analysis. We have developed an ontology-agnostic method to identify semantic commonalities between dataset variables when mapped to ontological entities, demonstrated with SNOMED CT and the Gene Ontology. Variables are then aggregated by their commonalities and aggregations are appended to the dataset.
We anticipate that the improved quality, structuring, and encoding of the data, as well as the added semantic information, will facilitate improved performance and interpretability of subsequent analyses in health datasets. We are currently developing an R package to share these approaches with the community.
Short Abstract: Since the early 1990s, fluorescence microscopy-based assessment of chromosomal copy number variation has been superseded by sequence-based methods such as comparative genomic hybridization (CGH) and next-generation sequencing. Sequence-based methods are more efficient and have an improved resolution. However, they are presently not quantitative. Instead, mean-normalized relative copy numbers (relating to an unknown ploidy level) are commonly processed to reconstruct absolute copy numbers. In our hands, published algorithms did not meet our needs. Specifically, they did not allow us to handle differences in individual sample quality and technical artifacts that are difficult to foresee. Because generally few disseminated or circulating tumor cells can be analyzed per patient, every sample is valuable. Accordingly, we developed SunShine, a semi-automated graphical workflow using R Shiny that performs automated processing and subsequently assists users in manually adjusting genomic profiles for apparent artifacts. The automated pipeline includes new copy number estimation methods and an option to mount an external classifier (in our case, a classifier resting on the Mitelman database). SunShine is presently in use for in-house arrayCGH and low-pass sequencing data. Evaluation on public cancer cell line data is ongoing.
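The step from mean-normalized relative copy numbers to absolute copy numbers can be illustrated with a small Python sketch. It assumes a pure sample in which each relative value equals the absolute copy number divided by the mean copy number (the ploidy), and it picks, from a list of candidate ploidies, the one that brings the rescaled values closest to integers. This is a generic textbook-style heuristic with hypothetical function names and toy data, not the estimation method used in SunShine.

```python
def integer_fit_error(relative, ploidy):
    """Mean distance of the rescaled copy numbers from the nearest integer."""
    scaled = [r * ploidy for r in relative]
    return sum(abs(s - round(s)) for s in scaled) / len(scaled)


def estimate_absolute_copy_numbers(relative, candidate_ploidies):
    """Pick the candidate ploidy with the best integer fit and rescale.

    relative: mean-normalized relative copy numbers, one per segment.
    candidate_ploidies: hypothetical grid of ploidy values to try.
    """
    best = min(candidate_ploidies, key=lambda p: integer_fit_error(relative, p))
    return best, [round(r * best) for r in relative]


# Toy profile: true absolute copy numbers per segment (hypothetical).
true_copies = [2, 2, 3, 1, 2, 4]
mean_copy = sum(true_copies) / len(true_copies)
relative_profile = [c / mean_copy for c in true_copies]

ploidy, absolute = estimate_absolute_copy_numbers(
    relative_profile, [1.5, 2.0, 2.33, 3.0, 3.5]
)
```

Real profiles are noisy and affected by tumor purity and segment length, so an integer-fit criterion like this one can have several near-equivalent solutions, which is one reason manual review of the fitted profile can be valuable.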
Short Abstract: Achilles’ heel relationships arise when the status of one gene exposes a cell’s vulnerability to perturbation of a second gene, providing therapeutic opportunities for precision oncology. Here we present the web server SynLeGG (www.overton-lab.uk/synlegg), developed using R and shiny, that identifies and visualizes mutually exclusive loss signatures in ‘omics data to enable discovery of genetic dependency relationships (GDRs) across 783 cancer cell lines and 30 tissues. SynLeGG depends upon the MultiSEp algorithm for unsupervised assignment of cell lines into gene expression clusters, which provide the basis for analysis of CRISPR scores and mutational status in order to propose candidate GDRs.
Results, generated at both the pan-cancer and tissue-specific levels, are searchable, allowing the user to recover established relationships, such as synthetic lethality for SMARCA2 with SMARCA4. Proteomics, Gene Ontology, protein-protein interactions and paralogue information are provided to assist interpretation and candidate drug target prioritization. Benchmarking using SynLethDB demonstrates favourable performance for MultiSEp against competing approaches, with a significantly higher area under the Receiver Operating Characteristic curve and between 2.8-fold and 8.5-fold greater coverage. We hope SynLeGG will expedite the clinical positioning of existing therapies and the discovery of more focused and effective cancer treatments.
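For readers unfamiliar with the benchmark metric mentioned above, the following Python sketch computes the area under the Receiver Operating Characteristic curve as the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, counting ties as half. The scores and labels are hypothetical toy values, not SynLethDB data, and the function is a generic illustration rather than the benchmarking code used for MultiSEp.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: predicted scores, higher meaning more likely positive.
    labels: truthy for known positives, falsy for negatives.
    """
    positives = [s for s, lab in zip(scores, labels) if lab]
    negatives = [s for s, lab in zip(scores, labels) if not lab]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for six gene pairs (1 = known dependency).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
```

In this toy example eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9.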
Short Abstract: There are several instances in The Cancer Genome Atlas (TCGA) where samples from patients are not fully characterized. Triple-negative breast cancer (TNBC) is a type of breast cancer lacking the expression of estrogen receptors, progesterone receptors, and human epidermal growth factor receptor 2. This study aimed to use machine learning (ML) to predict the receptor expression status of uncharacterized samples. We used the pan-cancer TCGA 2016 dataset and grouped instances into training and test sets. After evaluating the performance of six different ML classifiers, we chose J48 for the prediction of test set classes due to its consistent precision and sensitivity among all the receptor subtypes. TNBC was contrasted with non-TNBC (nTNBC). For validation, we identified differentially expressed proteins (DEPs) between the two groups, as well as the pathways overrepresented between TNBC and nTNBC, using the training set and then the test set (with predicted classes). We also used these DEPs to analyze protein-protein interactions using the STRING network in Cytoscape. We identified 8 common DEPs. Activation of the RAS/RAF/MAPK pathway was common to both sets. There were protein-protein interactions common to both sets. Thus, we were able to characterize and validate the uncharacterized receptor status of samples using ML.
Short Abstract: Glial tumors are traditionally classified based on their histological resemblance to normal brain astrocytes and oligodendrocytes. Morphological features are now combined with genetic alterations into integrated diagnoses providing increased accuracy. Isocitrate dehydrogenase (IDH1/2) gene mutations define glial tumor categories associated with longer survival. However, the role of these mutations in malignancy development remains unclear.
IDH1/2 encode enzymes that convert isocitrate to alpha-ketoglutarate, a fundamental substrate involved in multiple metabolic pathways. Alpha-ketoglutarate is also the precursor of the two main neurotransmitters, glutamate and GABA. This observation may suggest a close relationship between the two neurotransmitter systems and glioma aggressiveness.
In this study, we hypothesized that the tumor expression patterns of GABA and glutamate signaling elements may define clinically relevant glial tumor categories. Analyses of GABA and glutamate expression profiles from 661 gliomas from TCGA recapitulated established glial tumor categories but also defined novel groups with different clinical and cellular characteristics, such as an altered tumor immune micro-environment. These findings suggest an important role of glutamate and GABA neurotransmission in gliomagenesis associated with immune micro-environment regulation. This research will refine the current classification and lead to a better understanding of novel druggable neurotransmitter-related signaling pathways in gliomas.
Short Abstract: wtVHL ccRCC accounts for 5-12% of all ccRCC but has been shown to be more aggressive, conferring worse survival. A portion of wtVHL ccRCC is thought to be TCEB1 ccRCC, a novel and contentious subtype. We combine publicly available resources with 369 ccRCC samples from the University Hospital Zürich Renal Cancer Biobank to identify phenotypic characteristics and molecular changes promoting the aggressive nature of wtVHL ccRCC, using histological, genetic, epigenetic, transcriptomic, and proteomic datasets. Using a bidirectional network diffusion method (NetICS) to integrate mutation, CNV, and gene expression data, we identify genes central to orchestrating downstream differential gene expression given the upstream aberrations in a protein-protein interaction network. We also apply unsupervised clustering to determine where wtVHL samples lie on the broader spectrum of renal carcinoma, given the various omics datasets and with the addition of papillary and chromophobe RCCs. We find ccRCC samples with both TCEB1 mutations and VHL aberrations, dispelling the notion that these are mutually exclusive. HMGA1 is identified as a key mediator within wtVHL ccRCC. We believe the identified factors promoting and permitting EMT, extracellular matrix degradation, cell mobility, and cell migration result in the more invasive and metastatic phenotype attributed to wtVHL ccRCC tumours.
Short Abstract: The Wearables for Wellness programme is a unique pilot study that aims to leverage the latest in technology, especially increasingly low-cost wearables, and data science to help the primary health care provider in an LMIC context. With an increasing risk of non-communicable diseases especially CVDs in developing countries and a trend of non-adherence to medication that increases in older adults, we set out to monitor the wellness of our patients and to enable the early detection of disease onset. We aim to generate a preliminary dataset by tracking 6 individuals (3 males and 3 females; 3 physically active and 3 non-active) for a period of 60 days (May-June) using the popular Mi Band 5 smartwatch (By Xiaomi; $30) along with an array of four digital health monitoring gadgets including a scale, thermometer, oximeter and blood pressure unit. We measure the following to extract features and undertake machine learning experiments: heart-rate, sleep, steps, calories burnt, exercise, blood pressure, SpO2, body temperature, weight as well as body measurements (chest, abdomen and thighs), daily food intake and daily mental and physical fatigue levels using questionnaires. We hope to begin to address the dearth of data especially in the case of NCDs in LMICs.