July 14, 2024
10:40-11:20
Invited Presentation: Single-cell and single-molecule computational epigenomics
Confirmed Presenter: Maria Colomé Tatché
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Alejandra Medina-Rivera


Authors List:

  • Maria Colomé Tatché

Presentation Overview:

Recent breakthroughs in high-throughput sequencing of single cells are revolutionizing the biological and biomedical sector. Among the different -omics layers that can be measured at the single-cell level, single-cell epigenomic measurements present a rich layer of regulatory information that stands between the genome and the transcriptome. These measurements can be obtained for large heterogeneous samples of single cells to profile tissues, organs and whole organisms, and to study dynamic processes like cellular differentiation, reprogramming or cancer evolution. These data types provide an unprecedented level of measurement resolution.
In this talk I will discuss how single-cell ATAC-seq and single-cell DNA methylation data can be used to study cell identity [1,2]. I will introduce and compare multiple feature space constructions for epigenetic data analysis and show the feasibility of common clustering, dimension reduction, batch integration and trajectory learning techniques for both single-cell DNA methylation data and scATAC-seq data.
Studying DNA methylation heterogeneity with single-cell DNA methylation measurements is, however, complicated, as experimental protocols are costly and difficult to implement. I will present an alternative strategy, which involves MinION sequencing combined with deconvolution of single-molecule methylation signals to reconstruct cell-type methylation profiles. I will show how, using this method, it is possible to deconvolve the methylomes of different cell types from an in-silico mix of cells.
Another level of genomic information that can be extracted from single-cell data is single-cell copy number variation (CNV). I will present a novel algorithm, epiAneufinder [3], which exploits the read count information from scATAC-seq data to extract genome-wide CNVs for individual cells, and I will show how the obtained CNVs are comparable to the ones obtained from single-cell whole genome sequencing data. Thanks to epiAneufinder it is therefore possible to add a relevant extra layer of genomic information, namely single-cell copy number variation, to every scATAC-seq dataset without the need for additional experiments.

[1] A. Danese, M.L. Richter, D.S. Fischer, F.J. Theis and M. Colomé-Tatché*. EpiScanpy: integrated single-cell epigenomic analysis. Nat. Commun. 12, 5228 (2021).
[2] M.D. Luecken, M. Büttner, K. Chaichoompu, A. Danese, M. Interlandi, M.F. Mueller, D.C. Strobl, L. Zappia, M. Dugas, M. Colomé-Tatché*, F.J. Theis*. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
[3] A. Ramakrishnan, A. Symeonidi, P. Hanel, K. T. Schmid, M. L. Richter, M. Schubert, M. Colomé-Tatché*. epiAneufinder identifies copy number alterations from single-cell ATAC-seq data. Nat. Commun. 14, 5846 (2023).

July 14, 2024
11:20-11:40
Proceedings Presentation: REUNION: transcription factor binding prediction and regulatory association inference from single-cell multi-omics data
Confirmed Presenter: Yang Yang, Memorial Sloan Kettering Cancer Center, Howard Hughes Medical Institute
Track: RegSys

Room: 518
Format: Live Stream
Moderator(s): Alejandra Medina-Rivera


Authors List:

  • Yang Yang, Memorial Sloan Kettering Cancer Center
  • Dana Pe'er, Memorial Sloan Kettering Cancer Center

Presentation Overview:

Motivation: Profiling of gene expression and chromatin accessibility by single-cell multi-omics approaches can help to systematically decipher how transcription factors (TFs) regulate target gene expression via cis-region interactions. However, integrating information from different modalities to discover regulatory associations is challenging, in part because motif scanning approaches miss many likely TF binding sites.
Results: We develop REUNION, a framework for predicting genome-wide TF binding and cis-region-TF-gene “triplet” regulatory associations using single-cell multi-omics data. The first component of REUNION, Unify, utilizes information theory-inspired complementary score functions that incorporate TF expression, chromatin accessibility, and target gene expression to identify regulatory associations. The second component, Rediscover, takes Unify estimates as input for pseudo semi-supervised learning to predict TF binding in accessible genomic regions that may or may not include detected TF motifs. Rediscover leverages latent chromatin accessibility and sequence feature spaces of the genomic regions, without requiring chromatin immunoprecipitation data for model training. Applied to peripheral blood mononuclear cell data, REUNION outperforms alternative methods in TF binding prediction on average performance. In particular, it recovers missing region-TF associations from regions lacking detected motifs, which circumvents the reliance on motif scanning and facilitates discovery of novel associations involving potential co-binding transcriptional regulators. Newly identified region-TF associations, even in regions lacking a detected motif, improve the prediction of target gene expression in regulatory triplets, and are thus likely to genuinely participate in the regulation.
Availability and implementation: All source code is available at https://github.com/yangymargaret/REUNION.

July 14, 2024
11:40-12:00
scHOCMO: Higher Order Correlation Model for Single-cell Multi-omics
Confirmed Presenter: Reetika Ghag, Washington University in St. Louis School of Medicine, United States
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Alejandra Medina-Rivera


Authors List:

  • Reetika Ghag, Washington University in St. Louis School of Medicine
  • Shamim Mollah, Washington University in St. Louis School of Medicine

Presentation Overview:

Single-cell technologies enable system-level interrogations across several molecular layers at single-cell resolution. Current single-cell technologies incorporate massive parallelism, enabling high-throughput joint profiling of various modalities in a cell. Multi-modal single-cell data can be further integrated to understand the causal relationships among the several molecular layers driving regulatory mechanisms in disease progression. However, the heterogeneity introduced by various modalities and their feature spaces makes it challenging to unify data into a single inference framework. Here, we propose a novel method, scHOCMO (Higher Order Correlation Model for Single cell multiomics), to address the scalability and generalizability challenges in existing single-cell multimodal data integration methods. We extend the previously developed tensor-based HOCMO (Higher Order Correlation Model) to improve scalability for analyzing single-cell data from the 10^7 to the 10^13 scale using the Trillion-Tensor framework. We illustrate our method using single-nucleus RNA-seq (sn-RNA) and single-nucleus ATAC-seq (sn-ATAC) data for Diabetic Kidney Disease. Using differentially expressed genes, we aim to elucidate the regulatory dynamics of disease progression based on these disease-specific marker genes and cell types.

July 14, 2024
12:00-12:20
Pan-cell type continuous chromatin state annotation of all IHEC epigenomes
Confirmed Presenter:
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Alejandra Medina-Rivera


Authors List:

  • Habib Daneshpajouh, Simon Fraser University
  • Kay C. Wiese, Simon Fraser University
  • Maxwell W. Libbrecht, Simon Fraser University

Presentation Overview:

Understanding the mechanistic basis of genetic disease requires annotating the regulatory elements in the human genome. To this end, international consortia such as IHEC, ENCODE, and Roadmap Epigenomics have generated thousands of epigenomic datasets such as ChIP-seq, DNase-seq, and ATAC-seq that measure various biochemical activities in the genome, including transcription factor binding, histone modification, and DNA accessibility. Currently, the predominant methods for integrating these data sets to annotate regulatory elements are segmentation and genome annotation (SAGA) algorithms such as ChromHMM and Segway. SAGA algorithms partition the genome and assign a chromatin state label to each segment, indicating the epigenetic activity at that position. To alleviate the limitations of the discrete SAGA framework, we recently developed epigenome-ssm, a method that produces a vector of continuous chromatin state features at each position that summarizes epigenetic activity. Unlike discrete labels, these continuous features can easily represent varying strengths of a given element and can represent combinatorial elements with multiple types of activities. Here, we present a continuous chromatin state feature map generated using epigenome-ssm on 9,539 genome-wide signal tracks from six core histone modification assays across 1,698 epigenomes. We show that these feature maps constitute an intuitive and visualizable summary of epigenomic data and enable accurate identification of mechanisms of disease association.

July 14, 2024
12:00-12:20
Automated and genome-scale exploration of the cis-regulatory code involved in neuronal differentiation
Confirmed Presenter:
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Alejandra Medina-Rivera


Authors List:

  • Océane Cassan, LIRMM
  • Christophe Vroland, Institut de Génétique Moléculaire de Montpellier
  • Raynal Julien, Institut de Génétique Moléculaire de Montpellier
  • Masaki Kato, RIKEN Center for Integrative Medical Sciences
  • Hazuki Takahashi, RIKEN Center for Integrative Medical Sciences
  • Takeya Kasukawa, RIKEN Center for Integrative Medical Sciences
  • Piero Carninci, RIKEN Center for Integrative Medical Sciences
  • Chi Wai Yip, RIKEN Center for Integrative Medical Sciences
  • Laurent Bréhélin, LIRMM
  • Charles-Henri Lecellier

Presentation Overview:

Gene expression is controlled by proximal and distal cis-regulatory elements (CREs), containing DNA motifs bound by various transcription factors (TFs). Other sequence features, such as specific k-mers or low complexity regions, have also been implicated.
However, in a dynamic biological process such as cell differentiation, we lack an understanding of how the transcriptional activity of CREs progressively changes and what sequence features underlie these transitions, which may reflect common and/or coordinated regulatory processes.
Here, we use single-nucleus ATAC-seq with single-cell 5’ RNA-seq to follow, at a genome scale, CREs along differentiation of induced pluripotent stem cells into cortical neurons. We propose a guided clustering algorithm, STOIC (Statistical learning To Optimize Integrative Clustering) that jointly learns the different CRE clusters and their distinctive sequence-level features using an interpretable machine learning approach.
This procedure explores the expression space and delineates the CRE clusters iteratively in order to optimize the performance of a supervised classifier predicting CRE cluster membership based on DNA sequence features.
We show that STOIC provides more predictive sequence-level features than standard k-means clustering. Furthermore, orthogonal chromatin and TF binding data collected in the same settings are used to validate the inferred CRE clusters and their sequence features, and to associate them with specific enhancer or promoter signatures and biological processes. Our results explore the complexity of the cis-regulatory code at the genome scale and provide an updated perspective on the transcriptional regulation at play during neuronal differentiation.

July 14, 2024
12:00-12:20
Expanding GTEx dataset with brain ontology-based graph neural networks to investigate genetic impacts on brain diseases
Confirmed Presenter:
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Alejandra Medina-Rivera


Authors List:

  • Jianfeng Ke, University of Massachusetts Lowell
  • Rachel Melamed, University of Massachusetts Lowell
  • Tingjian Ge, University of Massachusetts Lowell

Presentation Overview:

The human brain, with its intricate network of diverse regions, profoundly influences disease development. The Genotype-Tissue Expression (GTEx) program gathered transcriptome data and matched genotype data from over three hundred post-mortem donors, which allows us to understand how genetic variation can impact gene expression in diverse regions. However, the GTEx dataset included only 13 brain regions and only 10% of subjects had all brain regions measured. Improving the completeness of gene expression data within the GTEx project has the potential to elucidate the impact of disease risk variants on gene regulation in crucial tissues relevant to disease development. A possible resource to address this issue is the Allen Human Brain Atlas dataset. It collected transcriptome data from post-mortem brain tissue samples from 6 individuals, covering over a hundred distinct brain subregions. Leveraging the Allen dataset, we proposed a graph neural network model based on an expert ontology describing a hierarchy of increasingly fine-grained brain regions. This Graph Ontology model can predict 103 subordinate or previously uncollected brain regions for subjects within the GTEx dataset. We showed that our model outperformed several existing multi-tissue imputation models. Our model extended the initial 13 GTEx regions to 103 subordinate regions, enabling us to explore how genetic variation represented in GTEx can impact diverse disease-relevant regions that were not originally covered by the GTEx. Our prediction results can serve as a foundation for future investigations into how specific genetic variations influence diseases by altering gene expression patterns across a wide range of brain regions.

July 14, 2024
12:00-12:20
Interpretable single-cell factor decomposition using sciRED
Confirmed Presenter:
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Alejandra Medina-Rivera


Authors List:

  • Delaram Pouyabahar, Department of Molecular Genetics
  • Tallulah Andrews, Departments of Biochemistry and Computer Science
  • Gary Bader, Departments of Molecular Genetics and Computer Science

Presentation Overview:

Single-cell RNA sequencing (scRNA-seq) enables the exploration of gene expression heterogeneity within large cell populations, arising from biological and technical factors. Inferring gene expression programs from scRNA-seq data is challenging due to noise, sparsity, and high dimensionality, which computational approaches such as matrix factorization aim to address. Specialized factorization techniques tailored for scRNA-seq, such as glmPCA and cNMF, have emerged in recent years. However, the resulting factors must be manually interpreted. To address this gap, we developed sciRED as a tool to improve the interpretation of scRNA-seq factor analysis. sciRED implements a four-step approach to characterizing gene expression programs: (1) removing confounding effects and using rotations to maximize factor interpretability, (2) calculating association statistics to map factors to known covariates, (3) highlighting unexplained factors that may indicate hidden biological phenomena, and (4) determining the genes and biological processes represented by unexplained factors. We apply our method, sciRED, across diverse datasets including the scMixology benchmark dataset and four biological single-cell atlases. Specifically, we showcase its application in identifying cell identity programs and sex-specific variations in a kidney map, discerning strong and weak stimulation signals in a PBMC dataset, eliminating ambient RNA contamination in a rat liver atlas to unveil strain variations, and revealing the hidden biology, represented by a rare cell type signature and anatomical zonation gene programs, in the healthy human liver map. These results demonstrate the utility of our approach on real datasets for characterizing intricate biological signals within scRNA-seq maps.

July 14, 2024
12:00-12:20
Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Alejandra Medina-Rivera


Authors List:

  • Alexis Morrissey, Penn State University
  • Jeffrey Shi, Penn State University
  • Daniela James, Penn State University
  • Shaun Mahony, Penn State University

Presentation Overview:

Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. Unfortunately, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq, as most regulatory genomics analysis pipelines discard “multi-mapped” reads. To address this shortcoming, we developed Allo, a new approach to allocate multi-mapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multi-mapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multi-mapping read assignment. To demonstrate Allo’s potential, we apply it to reanalyze almost 500 transcription factor ChIP-seq datasets from K562 cells. This analysis resulted in over 385,000 previously unidentified transcription factor binding sites in repetitive regions of the genome. We find that Allo is particularly beneficial in identifying ChIP-seq peaks at centromeres and in younger TEs. In particular, we find novel associations between particular TFs and the recently expanded SVA and ERVK transposon families. We also find that Allo has a striking ability to disambiguate multi-mapped reads at recently duplicated genes. Using Allo, we analyze how regulatory elements diverge at recently generated paralogous genes, enabling new regulatory insights at sites of recent evolutionary novelty that often get overlooked in regulatory genomics analyses. Finally, we demonstrate that TF binding sites harbored by repeats are particularly difficult for neural network-based methods to predict de novo, and we speculate on approaches that can offer improved performance.
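To illustrate what "probabilistic mapping of multi-mapped reads" can look like, here is a minimal sketch. It assumes a simple rule in which a read is assigned to one of its candidate locations with probability proportional to the number of uniquely mapped reads near each location, plus a pseudocount. This rule, the function names, and the pseudocount are illustrative assumptions; the abstract does not specify Allo's actual scoring, and the convolutional neural network component is not modeled here.

```python
import random


def allocation_probabilities(unique_counts, pseudocount=1.0):
    """Probability of assigning a multi-mapped read to each candidate location.

    unique_counts[i] is the (hypothetical) number of uniquely mapped reads
    observed near candidate location i. The pseudocount keeps locations with
    no unique reads from being ruled out entirely.
    """
    weights = [count + pseudocount for count in unique_counts]
    total = sum(weights)
    return [w / total for w in weights]


def allocate_read(unique_counts, rng, pseudocount=1.0):
    """Sample the index of the location that receives the read."""
    probs = allocation_probabilities(unique_counts, pseudocount)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# A read that maps equally well to two repeat copies; the first copy has
# nine uniquely mapped reads nearby and the second has none.
probs = allocation_probabilities([9, 0])
chosen = allocate_read([9, 0], random.Random(0))
print(probs, chosen)
```

With these toy counts the first copy receives the read with probability 10/11, so most multi-mapped reads are pulled toward locations that already show signal.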

July 14, 2024
12:00-12:20
Q&A for Flash Talks
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Alejandra Medina-Rivera



July 14, 2024
14:20-14:40
Proceedings Presentation: A count-based model for delineating cell-cell interactions in spatial transcriptomics data
Confirmed Presenter: Hirak Sarkar, Princeton University, United States
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Amin Emad


Authors List:

  • Hirak Sarkar, Princeton University
  • Uthsav Chitra, Princeton University
  • Julian Gold, Princeton University
  • Benjamin Raphael, Princeton University

Presentation Overview:

Motivation: Cell-cell interactions (CCIs) consist of cells exchanging signals with themselves and neighboring cells by expressing ligand and receptor molecules, and play a key role in cellular development, tissue homeostasis, and other critical biological functions. Since direct measurement of CCIs is challenging, multiple methods have been developed to infer CCIs by quantifying correlations between the gene expression of the ligands and receptors that mediate CCIs, originally from bulk RNA sequencing data and more recently from single-cell or spatial transcriptomics data. Spatial transcriptomics has a particular advantage over single-cell approaches since ligand-receptor correlations can be computed between cells or spots that are physically close in the tissue. However, the transcript counts of individual ligands and receptors in spatial transcriptomics data are generally low, complicating the inference of CCIs from expression correlations.

Results: We introduce Copulacci, a count-based model for inferring CCIs from spatial transcriptomics data. Copulacci uses a Gaussian copula to model dependencies between the expression of ligands and receptors from nearby spatial locations even when the transcript counts are low. On simulated data, Copulacci outperforms existing CCI inference methods based on the standard Spearman and Pearson correlation coefficients. Using several real spatial transcriptomics datasets, we show that Copulacci discovers biologically meaningful ligand-receptor interactions that are lowly expressed and undiscoverable by existing CCI inference methods.

Availability: Copulacci is implemented in Python and available at https://github.com/raphael-group/copulacci
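
As a rough illustration of the copula idea in the Results paragraph, the sketch below computes a normal-scores rank correlation: each variable is converted to ranks, the ranks are mapped through the inverse standard normal CDF, and the Pearson correlation of the transformed values is taken. This is a simple rank-based estimate of a Gaussian copula correlation that is appropriate for continuous data. It is not the Copulacci model, which is described as count-based and designed for low transcript counts; the function names and the toy ligand and receptor counts are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def normal_scores(values):
    """Map ranks into (0, 1) and then through the inverse normal CDF."""
    n = len(values)
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(r / (n + 1)) for r in average_ranks(values)]


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def normal_scores_correlation(x, y):
    return pearson(normal_scores(x), normal_scores(y))


# Hypothetical low counts for a ligand and a receptor at neighboring spots.
ligand = [0, 0, 1, 0, 2, 3]
receptor = [0, 1, 1, 0, 3, 2]
print(normal_scores_correlation(ligand, receptor))
```

The many tied zeros in the toy data show why a rank transform alone is a blunt tool for sparse counts, which is the setting a count-based copula model is meant to handle.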

July 14, 2024
14:40-15:00
Mapping lineage-resolved scRNA-seq data with spatial transcriptomics using TemSOMap
Confirmed Presenter: Xinhai Pan, Georgia Institute of Technology, United States
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Amin Emad


Authors List:

  • Xinhai Pan, Georgia Institute of Technology
  • Xiuwei Zhang, Georgia Institute of Technology

Presentation Overview:

Spatial transcriptomics (ST) has become a powerful technique that bridges the gap between traditional gene expression analysis and spatial information within tissues or organisms. While ST can obtain a snapshot of cells’ spatial gene expressions, the library size is relatively limited compared to scRNAseq datasets. This limitation can be overcome by integrating scRNAseq data with the ST data. By mapping the single cells onto the spatial data, we can also infer the spatial coordinates of the cells from the scRNAseq dataset. On the other hand, CRISPR/Cas9-based lineage tracing technologies have enabled paired sequencing of cells’ gene expression and lineage barcodes. The reconstructed cell lineage tree from the barcodes represents cells’ clonal distances. With the availability of single-cell spatial and temporal information at the single-cell resolution, it is of great interest to look into the spatio-temporal dynamics of cells, which requires the inference of spatial coordinates of the lineage-traced cells. Therefore, we developed TemSOMap (Temporal and Spatial-Omics Mapping of single cells), which infers the spatial coordinates of cells by mapping a paired gene expression and lineage barcode dataset onto a spatial transcriptomics dataset using deep learning. The method aims to improve the accuracy of state-of-the-art mapping methods by utilizing the temporal and spatial information in the data. We show that TemSOMap can more accurately infer the spatial location of single cells, and can help us better understand the spatio-temporal dynamics of single cells using the spatially-resolved cell lineage and transcriptomic map.

July 14, 2024
15:00-15:20
Enhancing spatial transcriptomics analysis using deep learning-based batch effect mitigation
Confirmed Presenter: Rian Pratama, Pusan National University, South Korea
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Amin Emad


Authors List:

  • Rian Pratama, Pusan National University
  • Jason Hilton, Stanford University
  • Jeonghoon Choi, Pusan National University
  • J. Michael Cherry, Stanford University
  • Giltae Song, Pusan National University

Presentation Overview:

Spatial transcriptomics (ST) is a groundbreaking technique for studying the correlation between cellular organization within a tissue and its physiological and pathological properties. Every facet of spatial information, including cell/spot proximity, distribution, and dimensionality, holds significance. Most methods lean heavily on proximity for ST analysis; each yields useful insights but leaves other aspects untapped. In addition, samples procured at different times, from different donors, and with different technologies introduce batch effects that hinder the statistical approaches employed by most analysis tools. To address these challenges, we have developed a deep learning method for analyzing multiple integrated ST datasets, focusing on the distribution aspect. Additionally, our method leverages single-cell analysis tools.

Our study introduces Spatial Gene Net, a data integration pipeline that uses a representation learning approach to extract the spatial distribution of genes into the same feature space as gene expression features. We employ an encoder network to extract spatial embeddings, facilitating the projection of spatial features into the gene expression feature space. Our approach allows seamless integration of multiple samples with minimal detriment, bolstering the statistical power of ST data analysis tools. We show an application of our method to a human DLPFC dataset. Our method consistently improves the clustering performance of Seurat tools, with the largest increase observed in sample 151673, where the ARI score rises from 0.236 to 0.405 (about a 1.7-fold increase). This result reveals the potential of the spatial distribution of genes and encourages the development of better spatial feature extractors, emphasizing the impact of integration and batch effect correction for understanding tissue characteristics.
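
For readers unfamiliar with the ARI score quoted above, the following sketch computes the adjusted Rand index between two clusterings from its standard pair-counting definition. The function name and the toy labels are illustrative; the sketch says nothing about how the authors computed their scores.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree about

    # Count pairs of items that are grouped together under each view.
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings are a single cluster
    return (sum_joint - expected) / (maximum - expected)


# Relabeling clusters does not change the score.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
# A clustering that splits every true cluster scores below chance level.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5
```

An ARI of 1 means the two clusterings agree exactly and values near 0 mean agreement no better than chance, so the reported scores of 0.236 and 0.405 both indicate partial agreement with the reference layers.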

July 14, 2024
15:20-15:40
Gene Regulatory Networks analysis from single cell multi-omics data
Confirmed Presenter: Zhana Duren, Clemson University, United States
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Amin Emad


Authors List:

  • Qiuyue Yuan, Clemson University
  • Zhana Duren, Clemson University

Presentation Overview:

Existing methods for gene regulatory networks (GRN) inference rely on gene expression data alone or on lower resolution bulk data. Despite the recent integration of chromatin accessibility and RNA sequencing data, learning complex mechanisms from limited independent data points still presents a daunting challenge. Here we present LINGER (Lifelong neural network for gene regulation), a machine-learning method to infer GRNs from single-cell paired gene expression and chromatin accessibility data. LINGER incorporates atlas-scale external bulk data across diverse cellular contexts and prior knowledge of transcription factor motifs as a manifold regularization. LINGER achieves a fourfold to sevenfold relative increase in accuracy over existing methods and reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Following the GRN inference from reference single-cell multiome data, LINGER enables the estimation of transcription factor activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies.

July 14, 2024
15:40-16:00
Dynamic Gene Regulatory Network Inference with Interpretable, Biophysically-Motivated Neural ODEs
Confirmed Presenter: Maggie Beheler-Amass, New York University, United States
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Amin Emad


Authors List:

  • Maggie Beheler-Amass, New York University
  • Christopher A Jackson, New York Genome Center
  • David Gresham, New York University
  • Richard Bonneau, Prescient Design

Presentation Overview:

Gene Regulatory Networks (GRNs) are complex dynamical systems that modulate gene expression and drive transitions between phenotypic cell states. Determining these networks is crucial in understanding how gene dysregulation can lead to phenotypic variation and diseases. We present a novel biophysically-motivated neural ordinary differential equation (ODE) model framework with a biologically interpretable deep learning architecture that leverages dynamic single-cell data. This model framework infers GRNs by implicitly estimating underlying biophysical parameters such as RNA velocity, mRNA transcription rate, and mRNA decay rate.

To test the accuracy of our model, we apply it to a simulated dataset with a known ground truth GRN. We demonstrate that the neural ODE can successfully predict gene expression at unseen time points, and decompose the inferred RNA velocity into the transcription and degradation driving the system, while inferring the underlying GRN. Next, we train a model on a single-cell Saccharomyces cerevisiae dataset dynamically responding to rapamycin treatment. The model learns regulatory responses to the rapamycin perturbation and reveals key genes involved in the cellular response in silico. Finally, we apply the model to a dynamic hematopoiesis dataset to test whether the model can capture bifurcations of hematopoietic stem cells progressing along the myeloid and lymphoid lineages. The applicability to real-world datasets highlights the utility of neural ODEs coupled with interpretable deep learning. This framework has the potential to advance our understanding of complex biological systems and aid in the discovery of regulatory mechanisms underlying cellular responses to perturbations.
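
The decomposition of RNA velocity into transcription and degradation mentioned above can be sketched for a single gene. The example assumes the simplest biophysical form, dx/dt = alpha - decay * x, with a constant transcription rate alpha and a first-order decay rate, and integrates it with forward Euler steps. The constant rate, the parameter values, and the function name are assumptions for illustration only; the abstract does not give the model's actual parameterization, and no neural network is involved here.

```python
def simulate_expression(alpha, decay, x0, dt, steps):
    """Forward-Euler integration of dx/dt = alpha - decay * x for one gene.

    Returns the expression trajectory and, for each step, the velocity
    split into its transcription and degradation components.
    """
    x = x0
    trajectory = [x]
    components = []
    for _ in range(steps):
        transcription = alpha      # production term (assumed constant here)
        degradation = decay * x    # first-order loss term
        velocity = transcription - degradation
        components.append((transcription, degradation, velocity))
        x = x + dt * velocity
        trajectory.append(x)
    return trajectory, components


# Hypothetical parameters: expression relaxes toward alpha / decay.
trajectory, components = simulate_expression(alpha=2.0, decay=0.5, x0=0.0, dt=0.01, steps=5000)
print(trajectory[-1], components[-1])
```

In this toy setting expression approaches the steady state alpha / decay, where transcription and degradation balance and the velocity goes to zero.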

July 14, 2024
16:40-17:00
Proceedings Presentation: Optimal sequencing budget allocation for trajectory reconstruction of single cells
Confirmed Presenter: Noa Moriel, Hebrew University of Jerusalem, Israel
Track: RegSys

Room: 518
Format: Live Stream
Moderator(s): Marcel Schulz


Authors List:

  • Noa Moriel, Hebrew University of Jerusalem
  • Edvin Memet, Harvard University
  • Mor Nitzan, Hebrew University of Jerusalem

Presentation Overview:

Charting cellular trajectories over gene expression is key to understanding dynamic cellular processes and their underlying mechanisms. While advances in single-cell RNA-sequencing technologies and computational methods have pushed forward the recovery of such trajectories, trajectory inference remains a challenge due to the noisy, sparse, and high-dimensional nature of single-cell data. This challenge can be alleviated by increasing either the number of cells sampled along the trajectory (breadth) or the sequencing depth, i.e. the number of reads captured per cell (depth). Generally, these two factors are coupled due to an inherent breadth-depth tradeoff that arises when the sequencing budget is constrained due to financial or technical limitations. Here we study the optimal allocation of a fixed sequencing budget to optimize the recovery of trajectory attributes. Empirical results reveal that reconstruction accuracy of internal cell structure in expression space scales with the logarithm of either the breadth or depth of sequencing. We additionally observe a power law relationship between the optimal number of sampled cells and the corresponding sequencing budget. For linear trajectories, non-monotonicity in trajectory reconstruction across the breadth-depth tradeoff can impact downstream inference, such as expression pattern analysis along the trajectory. We demonstrate these results for five single-cell RNA-sequencing datasets encompassing differentiation of embryonic stem cells, pancreatic β cells, hepatoblast and multipotent haematopoietic cells, as well as induced reprogramming of embryonic fibroblasts into neurons. By addressing the challenges of single-cell data, our study offers insights into maximizing the efficiency of cellular trajectory analysis through strategic allocation of sequencing resources.
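
The breadth-depth tradeoff described above can be made concrete with a small sketch. It assumes the simplest budget model, in which the total number of reads is the number of cells times the reads per cell; the function name and the example budget are hypothetical and not taken from the paper.

```python
def budget_allocations(budget_reads, cell_counts):
    """List (cells, reads_per_cell) pairs that spend a fixed read budget.

    Assumes budget = cells * reads_per_cell, so sampling more cells (breadth)
    necessarily lowers the reads available per cell (depth).
    """
    return [
        (cells, budget_reads // cells)
        for cells in cell_counts
        if 0 < cells <= budget_reads
    ]


# One million reads split across increasingly many cells.
for cells, depth in budget_allocations(1_000_000, [100, 1_000, 10_000]):
    print(f"{cells} cells at {depth} reads per cell")
```

Each row spends the same budget, which is why the two factors cannot be tuned independently once the budget is fixed.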

July 14, 2024
17:00-17:20
Charting the role of RNA binding proteins in tissue-specific alternative splicing using machine explanations
Confirmed Presenter: Ayan Paul, Northeastern University, United States
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Marcel Schulz


Authors List:

  • Ayan Paul, Northeastern University
  • Shalini Karthyk, Northeastern University
  • Yogi Raghav, University of Virginia
  • Jennifer Dy, Northeastern University
  • John Platig, University of Virginia
  • Peter Castaldi, Harvard Medical School

Presentation Overview:

The regulation of alternative splicing by RNA Binding Proteins (RBPs) is an essential mechanism in determining tissue specificity. The nuances of the variation in the role of the RBPs, singly and collectively, in various tissues are not well understood. We present a study of two cell lines, HepG2 and K562, using eCLIP RBP binding data and shRNA RBP knockdown followed by RNA-seq data from the ENCODE project to chart the role of RBP cooperativity in regulating exon skipping, one of the primary modes of alternative splicing. We build RBP binding graphs from exon triplets and train machine learning models, both linear and non-linear, to map RBP binding to exon-skipping quantification. We show that significant non-linearities are present in both cell lines. We achieve state-of-the-art performance with Extreme Gradient Boosted Decision Trees and Deep Neural Networks with skip connections. We use Shapley values as post-hoc explanations of machine learning models to quantify the importance of individual RBPs and to identify instances of cooperative regulation between sets of RBPs. We explore RBP activity in close proximity to intron-exon junctions and in deep intronic regions. We show that RBPs have a subset of cell-line-agnostic roles and a subset of cell-line-specific roles in regulating alternative splicing. Furthermore, we identify binding-region-specific roles of RBPs as splicing enhancers or silencers, displaying the power of our analysis in elucidating the functional roles by which RBPs regulate alternative splicing.
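
To show how Shapley values attribute a model's output to individual inputs, the sketch below computes exact Shapley values for a toy model by averaging each input's marginal contribution over all orderings. The three RBP names and the toy scoring function are hypothetical; exact enumeration is only practical for a handful of inputs, and the abstract does not say which estimation procedure the authors used for their models.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            extended = coalition | {player}
            totals[player] += value(extended) - value(coalition)
            coalition = extended
    return {player: total / len(orderings) for player, total in totals.items()}


def toy_skipping_model(bound_rbps):
    """Hypothetical model: RBP_A and RBP_B act only together, RBP_C acts alone."""
    score = 0.0
    if "RBP_A" in bound_rbps and "RBP_B" in bound_rbps:
        score += 0.4
    if "RBP_C" in bound_rbps:
        score += 0.1
    return score


phi = shapley_values(["RBP_A", "RBP_B", "RBP_C"], toy_skipping_model)
print(phi)
```

In this toy case the cooperative effect of 0.4 is split evenly between RBP_A and RBP_B, while RBP_C keeps its independent 0.1, and the attributions sum to the model's full output of 0.5.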

July 14, 2024
17:20-18:00
Invited Presentation: Harnessing deep learning to amplify insights from GWAS
Confirmed Presenter: Hae Kyung Im
Track: RegSys

Room: 518
Format: In Person
Moderator(s): Marcel Schulz


Authors List:

  • Hae Kyung Im

Presentation Overview:

Genome-wide Association Studies (GWAS) have identified associations with thousands of complex traits across a significant portion of the genome. Transcriptome-wide Association Studies (TWAS) and similar methods (xWAS) aim to uncover causal mechanisms by leveraging genetic predictors of molecular traits. However, their effectiveness is constrained by the current limitations in predicting these traits from genotypes. In this talk, I will explore recent advancements in deep learning methods for predicting gene expression from DNA sequences and demonstrate how these techniques can enhance the power of TWAS. By fine-tuning pre-trained large-scale models, we can predict molecular traits on a much larger scale than is possible with traditional population-based approaches. This methodology holds promise for addressing challenges related to portability across ancestries and species, rare variations, linkage disequilibrium (LD) confounding, and single-cell expression analysis.