Schedule subject to change
All times listed are in EDT
Monday, July 15th
10:40-11:00
Proceedings Presentation: An Empirical Study on KDIGO-Defined Acute Kidney Injury Prediction in the Intensive Care Unit
Confirmed Presenter: Xinrui Lyu, Department of Computer Science, ETH Zürich, Switzerland, Switzerland

Room: 519
Format: Live Stream

Moderator(s): Iman Hajirasouliha


Authors List:

  • Xinrui Lyu, Department of Computer Science, ETH Zürich, Switzerland, Switzerland
  • Bowen Fan, Department of Biosystems Science and Engineering, ETH Zürich, Switzerland, Switzerland
  • Matthias Hüser, Department of Computer Science, ETH Zürich, Switzerland, Switzerland
  • Philip Hartout, Department of Biosystems Science and Engineering, ETH Zürich, Switzerland, Switzerland
  • Thomas Gumbsch, Department of Biosystems Science and Engineering, ETH Zürich, Switzerland, Germany
  • Martin Faltys, Department of Intensive Care, Austin Hospital, Melbourne, Australia, Australia
  • Tobias Merz, Cardiovascular Intensive Care Unit, Auckland City Hospital, New Zealand, Switzerland
  • Gunnar Rätsch, Department of Computer Science, ETH Zürich, Switzerland, Switzerland
  • Karsten Borgwardt, Department of Biosystems Science and Engineering, ETH Zürich, Switzerland, Switzerland

Presentation Overview:

Motivation: Acute kidney injury (AKI) is a syndrome that affects up to a third of all critically ill patients, and early diagnosis, which allows patients to receive adequate treatment, is as imperative as it is challenging. Consequently, machine learning approaches have been developed to predict AKI ahead of time. However, the prevalence of AKI is often underestimated in state-of-the-art approaches, as they rely on an AKI event annotation based solely on creatinine, ignoring urine output.
Methods: We construct and evaluate early warning systems for AKI in a multi-disciplinary ICU setting, using the complete KDIGO definition of AKI. We propose several variants of gradient-boosted decision tree (GBDT)-based models, including a novel time-stacking-based approach. A state-of-the-art LSTM-based model previously proposed for AKI prediction, which has not yet been specifically evaluated in ICU settings, is used as a comparison.
Results: We find that optimal performance is achieved by using GBDT with the time-based stacking technique (AUPRC=65.7%, compared with the LSTM-based model's AUPRC=62.6%), which is motivated by the high relevance of time since ICU admission for this task. Both models show mildly reduced performance in the limited training data setting, perform fairly across different subcohorts, and exhibit no issues in gender transfer.
Conclusion: Following the official KDIGO definition substantially increases the number of annotated AKI events. In our study GBDTs outperform LSTM models for AKI prediction. Generally, we find that both model types are robust in a variety of challenging settings arising in the ICU.
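The AUPRC values quoted in the Results summarize how well predicted risk scores rank true AKI events above non-events. As a minimal sketch, assuming binary event labels and one risk score per patient window, the step-wise estimate of this area (often called average precision) can be computed as below. The function name and the toy data are illustrative only and are not taken from the study.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    # Rank examples from highest to lowest predicted risk.
    # Ties keep their input order here; a production metric would handle ties explicitly.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    true_positives = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at this rank, weighted by the recall step 1 / total_positives.
            area += (true_positives / rank) / total_positives
    return area


# Toy example (hypothetical data): 1 = event occurred, 0 = no event.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
ap = average_precision(labels, scores)
print(f"average precision = {ap:.3f}")
```

Unlike AUROC, this metric depends on event prevalence, which is one reason it is commonly reported for imbalanced prediction tasks.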

11:00-11:20
Mapping spatial omics when tissue architecture doesn’t match.
Confirmed Presenter: Patrick Martin, Cedars-Sinai Medical Center, United States

Room: 519
Format: In Person

Moderator(s): Iman Hajirasouliha


Authors List:

  • Patrick Martin, Cedars-Sinai Medical Center, United States
  • Kyoung Jae Won, Cedars-Sinai Medical Center, United States

Presentation Overview:

The reality of using spatial transcriptomics in a clinical setting is that samples taken from different patients, under different conditions, and at different times will almost always present a drastically different tissue architecture. Yet, to better understand the effects of a disease or a treatment, comparing similar cell groups between samples is crucial. The challenge of mapping cells between spatial omics samples is to retain the spatial context.
To address this challenge, we developed a context-aware mapping approach that finds optimally matching cell pairs between samples. Our algorithm solves a linear assignment problem by minimizing a cost matrix constructed from a variety of cellular contexts including cell similarity, niche similarity, and tissue structure similarity. We provide a highly flexible approach that allows the use of context-specific cost matrices to fine-tune cell comparisons across samples. We benchmarked the performance of our method in simulated data to unequivocally demonstrate how spatial context aids in accurately mapping cells across samples – outperforming spatial alignment methods in the process. We demonstrate its broad applicability by mapping cells across different biological samples, different technologies, and different developmental time points. Finally, we demonstrate how mapping scores can be used to stratify patients by clustering patient samples with a similar spatial context.
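The linear assignment formulation described above can be illustrated with a small sketch. It assumes two precomputed cost matrices (cell dissimilarity and niche dissimilarity) and blends them with a hypothetical weight. The exhaustive search is only practical for a handful of cells; it is not the authors' implementation, and real data would call for a polynomial-time solver such as the Hungarian algorithm.

```python
from itertools import permutations


def combined_cost(cell_cost, niche_cost, weight=0.5):
    """Blend two cost matrices; `weight` is a hypothetical tuning parameter."""
    return [
        [weight * c + (1.0 - weight) * n for c, n in zip(cell_row, niche_row)]
        for cell_row, niche_row in zip(cell_cost, niche_cost)
    ]


def solve_assignment(cost):
    """Exhaustively find the one-to-one pairing with minimal total cost."""
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total


# Toy costs (hypothetical): rows are cells in sample A, columns are cells in sample B.
# Cells 1 and 2 of sample A are ambiguous on expression alone ...
cell_cost = [[0.1, 0.8, 0.9],
             [0.8, 0.3, 0.3],
             [0.9, 0.3, 0.3]]
# ... but their niches differ, which resolves the ambiguity.
niche_cost = [[0.2, 0.7, 0.9],
              [0.9, 0.8, 0.1],
              [0.8, 0.2, 0.9]]

mapping, total = solve_assignment(combined_cost(cell_cost, niche_cost))
print(mapping, round(total, 3))
```

Here `mapping[i]` is the index of the sample B cell paired with cell `i` of sample A. Swapping in a different context-specific cost matrix changes only the input to `solve_assignment`.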

11:20-11:40
MALAT1 expression consistently indicates cell quality in single-cell RNA and single-nucleus RNA sequencing
Confirmed Presenter: Zoe Clarke, University of Toronto, Canada

Room: 519
Format: In Person

Moderator(s): Iman Hajirasouliha


Authors List:

  • Zoe Clarke, University of Toronto, Canada
  • Gary Bader, University of Toronto, Canada

Presentation Overview:

Single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq) have revolutionized our understanding of cell types, as they give us an unbiased lens through which to view the different cell types that make up a tissue. However, empty droplets and poor-quality cells are often captured in the experiment and need to be filtered out to avoid the false interpretation of certain cell types. Many automated and manual methods exist to filter out these cells or droplets, such as minimum RNA count thresholds and comparing the gene expression profile of an individual cell to the overall background expression of the experiment. Recently, a method called DropletQC was developed to calculate the overall RNA splice ratios of cells and use them to determine whether a cell is actually an empty droplet. We have found that MALAT1 expression is correlated with cell quality inferred by DropletQC, and consistently identifies low-quality cells across scRNA-seq and snRNA-seq data. MALAT1 is a non-coding RNA retained in the nucleus and ubiquitously expressed across cell types, and low expression values indicate a poor-quality cell. Since it is easy to visualize the expression of MALAT1 in single-cell maps, its expression can be used to determine cell quality and improve the interpretation of cells in any tissue. Further, old maps can be reviewed to retroactively determine the quality of different cell populations.
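A rough sketch of how such a MALAT1-based check might look, assuming per-cell raw count dictionaries and a simple log-normalization. The scale factor, the threshold of 2.0, and the cell data are hypothetical placeholders and are not values from the presentation; an appropriate cutoff would need to be chosen per dataset.

```python
import math


def normalized_expression(counts, gene, scale=10_000):
    """Log-normalized expression of `gene` in one cell: log1p(count / total * scale)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return math.log1p(counts.get(gene, 0) / total * scale)


def flag_low_quality(cells, threshold=2.0):
    """Flag cells whose normalized MALAT1 expression falls below a (hypothetical) threshold."""
    return {
        cell_id: normalized_expression(counts, "MALAT1") < threshold
        for cell_id, counts in cells.items()
    }


# Toy raw counts per cell (hypothetical data).
cells = {
    "cell_a": {"MALAT1": 120, "ACTB": 300, "GAPDH": 180},
    "cell_b": {"MALAT1": 0, "ACTB": 40, "GAPDH": 10},
    "cell_c": {"MALAT1": 1, "ACTB": 2500, "GAPDH": 1500},
}

flags = flag_low_quality(cells)
for cell_id, is_low in flags.items():
    print(cell_id, "low MALAT1" if is_low else "ok")
```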

11:40-12:00
Combining DNA and protein alignments to improve genome annotation with LiftOn
Confirmed Presenter: Kuan-Hao Chao, Department of Computer Science, Center for Computational Biology, Johns Hopkins University, United States

Room: 519
Format: In Person

Moderator(s): Iman Hajirasouliha


Authors List:

  • Kuan-Hao Chao, Department of Computer Science, Center for Computational Biology, Johns Hopkins University, United States
  • Jakob M. Heinz, Department of Biomedical Informatics, Harvard Medical School, United States
  • Celine Hoh, Department of Computer Science, Center for Computational Biology, Johns Hopkins University, United States
  • Alan Mao, Department of Computer Science, Department of Biomedical Engineering, Johns Hopkins University, United States
  • Alaina Shumate, Department of Biomedical Engineering, Center for Computational Biology, Johns Hopkins University, United States
  • Mihaela Pertea, Department of Computer Science, Department of Biomedical Engineering, Johns Hopkins University, United States
  • Steven L Salzberg, Department of Computer Science, Department of Biomedical Engineering, Johns Hopkins University, United States

Presentation Overview:

As the number and variety of assembled genomes continue to grow, the number of annotated genomes is falling behind, particularly for eukaryotes. DNA-based mapping tools help to address this challenge, but they are only able to transfer annotation between closely related species. Here we introduce LiftOn, a homology-based software tool that integrates DNA and protein alignments to enhance the accuracy of genome-scale annotation and to allow mapping between relatively distant species. LiftOn's protein-centric algorithm considers both types of alignments, chooses optimal open reading frames, resolves overlapping gene loci, and finds additional gene copies where they exist. LiftOn can reliably transfer annotation between genomes representing members of the same species, as we demonstrate on human, mouse, honey bee, rice, and Arabidopsis thaliana. It can further map annotation effectively across species pairs as far apart as mouse and rat or Drosophila melanogaster and D. erecta.

12:00-12:20
Proceedings Presentation: Optimal Phylogenetic Reconstruction of Insertion and Deletion Events.
Confirmed Presenter: Sanjana Tule, The University of Queensland, Australia

Room: 519
Format: In Person

Moderator(s): Iman Hajirasouliha


Authors List:

  • Sanjana Tule, The University of Queensland, Australia
  • Gabriel Foley, The University of Queensland, Australia
  • Chongting Zhao, The University of Queensland, Australia
  • Michael Forbes, The University of Queensland, Australia
  • Mikael Boden, The University of Queensland, Australia

Presentation Overview:

Insertions and deletions (indels) influence the genetic code in fundamentally distinct ways from substitutions, significantly impacting gene product structure and function. Despite their influence, the evolutionary history of indels is often neglected in phylogenetic tree inference and ancestral sequence reconstruction, hindering efforts to comprehend biological diversity determinants and engineer variants for medical and industrial applications.

We frame determining the optimal history of indel events as a single Mixed-Integer Programming (MIP) problem, across all branch points in a phylogenetic tree adhering to topological constraints, and all sites implied by a given set of aligned, extant sequences. By disentangling the impact on ancestral sequences at each branch point, this approach identifies the minimal indel events that jointly explain the diversity in sequences mapped to the tips of that tree. MIP can recover alternate optimal indel histories, if available.

We evaluated MIP for indel inference on a dataset comprising 15 real phylogenetic trees associated with protein families ranging from 165 to 2000 extant sequences, and on 60 synthetic trees at comparable scales of data and reflecting realistic rates of mutation. Across relevant metrics, MIP outperformed alternative parsimony-based approaches and reported the fewest indel events, on par or below their occurrence in synthetic datasets. MIP offers a rational justification for indel patterns in extant sequences; importantly, it uniquely identifies global optima on complex protein data sets without making unrealistic assumptions of independence or evolutionary underpinnings, promising a deeper understanding of molecular evolution and aiding novel protein design.

14:20-14:40
Proceedings Presentation: Approximating facial expression effects on diagnostic accuracy via generative AI
Confirmed Presenter: Tanviben Patel, Medical Genomics Unit, Medical Genetics Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States

Room: 519
Format: In Person

Moderator(s): Gary Bader


Authors List:

  • Tanviben Patel, Medical Genomics Unit, Medical Genetics Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States
  • Amna Othman, Medical Genomics Unit, Medical Genetics Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States
  • Omer Sumer, Institute of Computer Science, Augsburg University, Augsburg, 86159, Bavaria, Germany, Germany
  • Fabio Hellman, Institute of Computer Science, Augsburg University, Augsburg, 86159, Bavaria, Germany, Germany
  • Peter Krawitz, Institute for Genomic Statistics and Bioinformatics, University of Bonn, Germany, Germany
  • Elisabeth Andre, Institute of Computer Science, Augsburg University, Augsburg, 86159, Bavaria, Germany, Germany
  • Molly E. Ripper, Medical Genomics Unit, Medical Genetics Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States
  • Chris Fortney, Social and Behavioral Research Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States
  • Susan Persky, Social and Behavioral Research Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States
  • Ping Hu, Medical Genomics Unit, Medical Genetics Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States
  • Cedrik Tekendo-Ngongang, Medical Genomics Unit, Medical Genetics Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States
  • Suzanna Ledgister Hanchard, Medical Genomics Unit, Medical Genetics Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States
  • Kendall A. Flaharty, Medical Genomics Unit, Medical Genetics Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States
  • Rebekah L. Waikel, Medical Genomics Unit, Medical Genetics Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States
  • Dat Duong, Medical Genomics Unit, Medical Genetics Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States
  • Benjamin D. Solomon, Medical Genomics Unit, Medical Genetics Branch, NHGRI, Bethesda, 20892, Maryland, USA, United States

Presentation Overview:

Artificial Intelligence (AI) is increasingly used in genomics research and practice, and generative AI has garnered significant recent attention. In clinical applications of generative AI, aspects of the underlying datasets can impact results, and confounders should be studied and mitigated. One example involves the facial expressions of people with genetic conditions. Stereotypically, Williams (WS) and Angelman (AS) syndromes are associated with a “happy” demeanor, including a smiling expression. Clinical geneticists may be more likely to identify these conditions in images of smiling individuals. To study the impact of facial expression, we analyzed publicly available facial images of approximately 3500 individuals with genetic conditions. Using a deep learning (DL) image classifier, we found that WS and AS images with non-smiling expressions had significantly lower prediction probabilities for the correct syndrome labels than those with smiling expressions. This was not seen for 22q11.2 deletion and Noonan syndromes, which are not associated with a smiling expression. To further explore the effect of facial expressions, we computationally altered the facial expressions for these images. We trained HyperStyle, a GAN-inversion technique compatible with StyleGAN2, to determine the vector representations of our images. Then, following the concept of InterFaceGAN, we edited these vectors to recreate the original images in a phenotypically accurate way but with a different facial expression. Through online surveys and an eye-tracking experiment, we examined how altered facial expressions affect the performance of human experts. Overall, we found that the association between facial expression and diagnostic accuracy varies across genetic conditions.

14:40-15:00
Using Relation Equivariant Graph Neural Networks to Explore the Mosaic-like Tissue Architecture of Kidney Diseases with Spatially Resolved Transcriptomics
Confirmed Presenter: Mauminah Raina, Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, United States

Room: 519
Format: In Person

Moderator(s): Gary Bader


Authors List:

  • Mauminah Raina, Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, United States
  • Juexin Wang, Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, United States
  • Dong Xu, Department of Electrical Engineering and Computer Science, University of Missouri, United States
  • Qin Ma, Department of Biomedical Informatics, The Ohio State University, United States
  • Michael T. Eadon, Department of Medicine, Indiana University, United States
  • Ricardo Melo Ferreira, Department of Medicine, Indiana University, United States

Presentation Overview:

Chronic kidney disease is one of the leading public health problems, with a prevalence of around 15% of adults worldwide. Emerging spatially resolved transcriptomics (SRT) technologies provide unprecedented opportunities to discover the spatial patterns of gene expression in diseases. Currently, most existing computational tools are designed and tested on the ribbon-like brain cortex, often making it challenging to identify highly heterogeneous mosaic-like tissue architectures, such as tissues from kidney diseases. This hurdle demands heightened precision in discerning the cellular and morphological changes within renal tubules and their interstitial niches. We present an empowered graph deep learning framework, REGNN (Relation Equivariant Graph Neural Networks), for SRT data analyses on heterogeneous tissue structures. To increase expressive power in the SRT lattice using graph modeling, REGNN integrates equivariance, handling the rotational and translational symmetries of the spatial space, and Positional Encoding to strengthen the relative spatial relations of the nodes uniformly distributed in the lattice. With limited availability of labels, both Graph Autoencoder and graph self-supervised learning strategies are built on REGNN. Our study finds that REGNN outperforms existing computational tools in identifying tissue architectures in mosaic-like heterogeneous samples sourced from different kidney diseases using the 10X Visium platform. In case studies, the results identified by REGNN are validated by annotations from experienced nephrology physicians. This proposed framework explores the expression patterns of highly heterogeneous tissues with an enhanced graph deep learning model and paves the way to pinpoint underlying pathological mechanisms that contribute to the progression of complex diseases.

15:00-15:20
scResolve: Recovering single cell expression profiles from multi-cellular spatial transcriptomics
Confirmed Presenter: Young Je Lee, Carnegie Mellon University, United States

Room: 519
Format: In Person

Moderator(s): Gary Bader


Authors List:

  • Hao Chen, Carnegie Mellon University, United States
  • Young Je Lee, Carnegie Mellon University, United States
  • Ziv Bar-Joseph, Carnegie Mellon University, United States
  • Jose Lugo-Martinez, Carnegie Mellon University, United States

Presentation Overview:

Many popular spatial transcriptomics techniques lack single-cell resolution. Instead, these methods measure the collective gene expression for each location from a mixture of cells, potentially containing multiple cell types. Here, we developed scResolve, a method for recovering single-cell expression profiles from spatial transcriptomics measurements at multi-cellular resolution. scResolve accurately restores expression profiles of individual cells at their locations, which is unattainable from cell type deconvolution. Applications of scResolve on human breast cancer data and human lung disease data demonstrate that scResolve enables cell type-specific differential gene expression analysis between different tissue contexts and accurate identification of rare cell populations. The spatially resolved cellular-level expression profiles obtained through scResolve facilitate more flexible and precise spatial analysis that complements raw multi-cellular level analysis.

15:20-15:40
Multiview factorization for joint modeling of spatial multi-omics and histology images via NMF
Confirmed Presenter: William Bowie, Baylor College of Medicine, United States

Room: 519
Format: In Person

Moderator(s): Gary Bader


Authors List:

  • William Bowie, Baylor College of Medicine, United States
  • Stacy Wang, Baylor College of Medicine, United States
  • Benjamin Strope, Baylor College of Medicine, United States
  • Qian Zhu, Baylor College of Medicine, United States

Presentation Overview:

An increasing number of cancer research studies employ spatially resolved transcriptomics (SRT) to investigate the composition of the tumor microenvironment (TME) in a cancer type of interest. These studies have defined TME states and spatial domains based on clustering spatial gene expression patterns in SRT in an unbiased manner, yet a more thorough delineation of TME states requires the incorporation of the tumor’s histology image. Here, we implement a multiview formulation of NMF, a matrix factorization approach that is suitable for cancer research studies where joint profiles of spatial multi-omics and tumor histology images are available. We apply our implementation, MultiNMF-XT (pronounced extra), to analyze a set of TNBC SRT primary tumor samples and reveal TME states in the stromal, epithelial, and immune enriched compartments, defined by distinct histomorphological features. MultiNMF-XT is written in C++ and is 3.5-fold faster than a Python NMF implementation.

MultiNMF-XT can factorize the SRT into components well-supported by histological evidence. It identified T-cell infiltrating regions and EMT-enriched regions with distinct immune and stroma morphological characteristics. The T-cell infiltration neighbors a region that has an appearance of necrosis according to pathologist evaluation. These domains further demonstrate enrichment of motifs according to SCENIC. We further illustrate the approach’s ability to extend to paired spatial ATAC-seq and histology of HER2 breast cancer. This analysis not only reveals ERBB2 amplicon amplification, but also derives regulatory regions for T- and B-cells that are variable between clones. MultiNMF-XT thus permits an automated, data-driven decomposition of SRT and spatial ATAC-seq supported by histomorphological evidence.

15:40-16:00
Predicting gene functional associations from coevolutionary signals with EvoWeaver
Confirmed Presenter: Erik Wright, University of Pittsburgh, United States

Room: 519
Format: In Person

Moderator(s): Gary Bader


Authors List:

  • Erik Wright, University of Pittsburgh, United States
  • Aidan Lakshman, University of Pittsburgh, United States

Presentation Overview:

The universe of uncharacterized proteins is expanding far faster than our ability to annotate their functions through laboratory study. Computational annotation approaches rely on similarity to previously studied proteins, thereby ignoring unstudied proteins. This phenomenon gives rise to a "rich get richer" scenario: the majority of research focuses on a small subset of proteins. Coevolutionary approaches hold promise for injecting new information into our knowledge of the protein universe by linking proteins through 'guilt-by-association'. However, existing coevolutionary algorithms have insufficient accuracy and scalability to connect the entire universe of proteins. We present EvoWeaver, an algorithm that weaves together 12 distinct signals of coevolution to quantify the degree of shared evolution between genes. EvoWeaver's signals encompass phylogenetic profiling, phylogenetic structure, gene organization, and sequence-level methods that broadly capture coevolution between sequences. EvoWeaver accurately identifies proteins involved in protein complexes or separate steps of a biochemical pathway. We demonstrate the merits of EvoWeaver by partly reconstructing known biochemical pathways without any prior knowledge other than genome sequences. Additionally, we show that EvoWeaver's predictions rival those of the widely used STRING database without reliance on prior biological knowledge. Finally, we leverage EvoWeaver's predictions to uncover experimentally validated functional associations among genes that are absent from existing databases. This work forms one of the largest scale analyses of protein functional relationships to date, encompassing 1,545 gene groups from 8,564 genomes. Given its predictive power and speed, EvoWeaver has the potential to revolutionize protein functional prediction at the scale of the protein universe.
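Phylogenetic profiling, one of the signal families named above, can be illustrated generically: genes that are gained and lost together across genomes tend to have similar presence/absence profiles. The sketch below scores profile similarity with a Jaccard index. This is a textbook simplification under assumed toy data, not EvoWeaver's algorithm or API, and the gene names and profiles are hypothetical.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles over the same genomes."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Toy profiles (hypothetical): 1 = gene present in that genome, 0 = absent.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 0],
}

# Score every gene pair and rank from most to least similar.
pair_scores = {
    (g1, g2): jaccard(profiles[g1], profiles[g2])
    for g1, g2 in combinations(sorted(profiles), 2)
}
best_pair = max(pair_scores, key=pair_scores.get)
print(best_pair, pair_scores[best_pair])
```

A raw Jaccard score ignores the phylogenetic relatedness of the genomes, which is one reason combining several complementary signals can be more informative than any single one.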

16:40-17:00
Proceedings Presentation: Efficient parameter estimation for ODE models of cellular processes using semi-quantitative data
Confirmed Presenter: Domagoj Doresic, IRU Mathematics and Life Sciences, University of Bonn; Helmholtz Zentrum München, Computational Health Center, Germany

Room: 519
Format: In Person

Moderator(s): Shamini Ayyadhury


Authors List:

  • Domagoj Doresic, IRU Mathematics and Life Sciences, University of Bonn; Helmholtz Zentrum München, Computational Health Center, Germany
  • Stephan Grein, IRU Mathematics and Life Sciences, University of Bonn, Germany
  • Jan Hasenauer, IRU Mathematics and Life Sciences, University of Bonn; Technische Universität München; Helmholtz Zentrum München, Germany

Presentation Overview:

Quantitative dynamical models facilitate the understanding of biological processes and the prediction of their dynamics. The parameters of these models are commonly estimated from experimental data. Yet, experimental data generated from different techniques do not provide direct information about the state of the system but a non-linear (monotonic) transformation of it. For such semi-quantitative data, when this transformation is unknown, it is not apparent how the model simulations and the experimental data can be compared. Here, we propose a versatile spline-based approach for the integration of a broad spectrum of semi-quantitative data into parameter estimation. We derive analytical formulas for the gradients of the hierarchical objective function and show that this substantially increases the estimation efficiency. Subsequently, we demonstrate that the method allows for the reliable discovery of unknown measurement transformations. Furthermore, we show that this approach can significantly improve the parameter inference based on semi-quantitative data in comparison to available methods. Modelers can easily apply our method by using our implementation in the open-source Python Parameter EStimation TOolbox (pyPESTO).

17:00-17:20
NCBI’s RNA-seq analysis pipeline produces millions of pre-computed gene expression counts to accelerate data reuse and discovery
Confirmed Presenter: Emily Clough, NCBI/NLM/NIH, United States

Room: 519
Format: In Person

Moderator(s): Shamini Ayyadhury


Authors List:

  • Emily Clough, NCBI/NLM/NIH, United States
  • Ryan Connor, NCBI/NLM/NIH, United States
  • Vadim Zalunin, NCBI/NLM/NIH, United States
  • Andrey Kochergin, NCBI/NLM/NIH, United States
  • Lukas Wagner, NCBI/NLM/NIH, United States
  • Maxim Tomashevsky, NCBI/NLM/NIH, United States
  • Naigong Zhang, NCBI/NLM/NIH, United States
  • Nadezhda Serova, NCBI/NLM/NIH, United States
  • Alexandra Soboleva, NCBI/NLM/NIH, United States
  • Ilene Mizrachi, NCBI/NLM/NIH, United States
  • James Brister, NCBI/NLM/NIH, United States

Presentation Overview:

The NIH Sequence Read Archive (SRA) is a diverse collection of DNA and RNA sequences that together document the genetic diversity across the tree of life. This repository provides a platform where individual contributors can add their data to a growing corpus that is available for subsequent reanalysis by the scientific community. SRA has grown exponentially over the past decade to more than 29 petabytes of data with 27 million sequence samples, of which 40% are derived from RNA. RNA-sequencing (RNA-seq) is a powerful method used to analyze transcription quantitatively and qualitatively at the subcellular, single-cell and tissue level and has become a standard molecular biology application. To propel reanalysis and use of SRA’s expansive volume of RNA-seq data, SRA has built a cloud-based RNA-seq analysis pipeline using publicly available software to provide users with consistently computed gene expression counts for all human RNA-seq samples contained within SRA. To date, counts have been produced for 1.3 million human RNA-seq runs. The pipeline runs continuously, creating new count files as RNA-seq runs are released for public access. SRA’s RNA-seq counts are used to create read count matrices for ~26,000 studies held in NIH’s Gene Expression Omnibus (GEO) database and have been incorporated into GEO’s online analysis tool, GEO2R. The run-level counts generated by SRA will be made available through Amazon Web Services (AWS). Consistently computed RNA-seq gene counts reduce cost, effort and time barriers to data reuse and enable large-scale analyses with high statistical power.

17:20-17:40
Enabling Affordable Single-Cell Data in Large Cohort Studies via Deep Generative Neural Networks and Active Learning
Confirmed Presenter: Jingtao Wang, Meakins-Christie Laboratories, Research Institute of McGill University Health Centre, Canada

Room: 519
Format: In Person

Moderator(s): Shamini Ayyadhury


Authors List:

  • Jingtao Wang, Meakins-Christie Laboratories, Research Institute of McGill University Health Centre, Canada
  • Gregory Fonseca, Meakins-Christie Laboratories, Research Institute of McGill University Health Centre, Canada
  • Jun Ding, Meakins-Christie Laboratories, Research Institute of McGill University Health Centre, Canada

Presentation Overview:

Single-cell sequencing is a crucial tool for dissecting the cellular intricacies of complex diseases. Its prohibitive cost, however, hampers its application in expansive biomedical studies. Traditional cellular deconvolution approaches can infer cell type proportions from more affordable bulk sequencing data, yet they fall short in providing the detailed resolution required for single-cell-level analyses. To overcome this challenge, we introduce “scSemiProfiler”, an innovative computational framework that marries deep generative models with active learning strategies. This method adeptly infers single-cell profiles across large cohorts by fusing bulk sequencing data with targeted single-cell sequencing from a few carefully chosen representatives. Extensive validation across heterogeneous datasets verifies the precision of our semi-profiling approach, aligning closely with true single-cell profiling data and empowering refined cellular analyses. Originally developed for extensive disease cohorts, “scSemiProfiler” is adaptable for broad applications. It provides a scalable, cost-effective solution for single-cell profiling, facilitating in-depth cellular investigation in various biological domains.

17:40-18:00
HINN: A Novel Neural Network Architecture to Integrate Multi-Omics Data based on their Biological Relationships
Confirmed Presenter: Yashu Vashishath, University of North Texas, United States

Room: 519
Format: In Person

Moderator(s): Shamini Ayyadhury


Authors List:

  • Yashu Vashishath, University of North Texas, United States
  • Sarah Beaver, University of North Texas, United States
  • Fahad Saeed, Florida International University, United States
  • Serdar Bozdag, University of North Texas, United States

Presentation Overview:

Introduction:
Detecting declines in cognitive function is a critical global health concern, highlighting the need for timely identification to implement effective intervention strategies. This study investigates the potential of blood-based biomarkers as accurate and non-invasive measures of cognitive function. We developed a novel deep learning architecture that integrates multi-omics data by considering the relationships among them. Specifically, we integrated single nucleotide polymorphisms (SNPs), gene expression, and DNA methylation data to predict cognitive test scores.

Methods:
A novel computational architecture called Hierarchical Input Neural Network (HINN) (Figure 1) was developed to integrate multi-omics datasets. In this study, HINN had three input layers receiving SNPs, DNA Methylation, and gene expression data, respectively. First, we conducted multiple GWAS runs to identify significant SNPs associated with cognitive scores based on different cognitive tests and found 373 SNPs across these runs. We identified 13037 DNA methylation sites proximal to these SNPs and 455 genes associated with these probes.

Results:
We observed that the MAE of the HINN model for MMSE, MoCA, ADAS11, and RAVLT-I was 2.28, 2.98, 4.54, and 8.88, respectively. Similarly, the MSE of the same model for MMSE, MoCA, ADAS11, and RAVLT-I was 10.41, 17.16, 45.30, and 135.59, respectively. We compared our model with other baseline models, namely L1-regularized regression, support vector machine, random forest, and deep neural network. We also compared HINN with a pathway-guided deep neural network in which all omics data were fed to a single layer, which was connected based on biological process relationships.

Conclusion:
The findings derived from the HINN model highlight its efficacy in detecting novel biomarkers for cognitive assessment. Through the hierarchical input approach, we outperformed baseline methods in predicting cognitive scores. Notably, our method is characterized by its independence from assumptions and subjective scoring, ensuring the robustness and reliability of predictions across diverse cognitive measures.
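For readers less familiar with the two error metrics reported in the Results, the following minimal sketch shows how MAE and MSE are computed from true and predicted scores. The score values are made up for illustration and are unrelated to the study's data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference; large errors are penalized more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy cognitive test scores (hypothetical values).
y_true = [28, 25, 30, 22]
y_pred = [26, 27, 29, 25]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"MAE = {mae}, MSE = {mse}")
```

Because MSE squares each error, a few large misses raise it much more than they raise MAE, so the two metrics together indicate both typical error size and the presence of outliers.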