RegSys

Schedule subject to change
All times listed are in CEST
Tuesday, July 25th
10:30-11:10
Invited Presentation: Exploring in-silico representations of the regulatory code
Room: Salle Rhone 2
Format: Live from venue

  • Julien Gagneur


In the first part of my talk, I will present an evaluation of state-of-the-art models of human transcriptional regulatory sequences against data from two large-scale observational studies and five deep perturbation assays. Our results indicate that while causal determinants of human promoters are by and large well captured, current models fail to capture the effects of enhancers on expression, notably at medium to long distances.

In the second part of the talk, I will present results from self-supervised modeling of genomic sequences. Specifically, we train a masked language model on more than 800 fungal species spanning over 500 million years of evolution. We show that explicitly modeling species is instrumental in capturing conserved yet evolving regulatory elements and in controlling for oligomer biases. The utility of the learned sequence embeddings will be demonstrated for a range of downstream transcriptional and post-transcriptional predictive tasks.

I will finish with a discussion of the potential merits of self-supervised versus purely supervised learning for modeling regulatory sequences.

11:10-11:30
Impact of transcription initiation at microsatellites on gene expression
Room: Salle Rhone 2
Format: Live from venue

  • Mathys Grapotte, CNRS - Sanofi, France
  • Diego Garrido-Martin, CRG, Spain
  • Lisa Calero, CNRS, France
  • Quentin Bouvier, CNRS, France
  • Laurent Brehelin, CNRS, France
  • Clement Chatelain, Sanofi, United States
  • Roderic Guigo, CRG, Spain
  • Charles Lecellier, CNRS, France


Microsatellites, also called Short Tandem Repeats (STRs), are key players in gene regulation. Using the FANTOM5 Cap Analysis of Gene Expression (CAGE) technology, we discovered widespread transcription initiation at STRs, predictable by sequence-based neural network models.
Here, we develop fully interpretable deep learning models to probe the functional impact of this STR transcription on gene expression. We leverage GTEx genetic and expression data to revisit expression (e)STR computations considering only SNPs located around STRs and using the prediction of our models as regressors. This new eSTR catalog complements existing eQTLs and eSTRs based solely on STR length variation. Together, our findings contribute to the understanding of molecular regulatory mechanisms orchestrated by non-coding RNAs in human tissues and their impact on complex traits. Our work constitutes a useful resource for the interpretation of thousands of genetic variants located in the vicinity of microsatellites.

11:30-11:50
Joint sequence and chromatin neural networks characterize the differential abilities of Forkhead transcription factors to engage inaccessible chromatin
Room: Salle Rhone 2
Format: Live from venue

  • Sonny Arora, The Pennsylvania State University, United States
  • Tomohiko Akiyama, Keio University, Japan
  • Jianyu Yang, The Pennsylvania State University, United States
  • Daniela James, The Pennsylvania State University, United States
  • Thomas Blanda, The Pennsylvania State University, United States
  • Nitika Badjatia, The Pennsylvania State University, United States
  • William Lai, Cornell University, United States
  • Minoru Ko, Keio University, Japan
  • B. Franklin Pugh, Cornell University, United States
  • Shaun Mahony, The Pennsylvania State University, United States


To understand the cell-specific determinants of TF DNA-binding specificity, we need to examine how newly activated TFs interact with sequence and preexisting chromatin landscapes to select their binding sites. Here, we present a neural network that jointly models sequence and prior chromatin data to interpret the binding specificity of TFs that have been induced in well-characterized chromatin environments. The network architecture allows us to quantify the degree to which sequence and prior chromatin features explain induced TF binding, both at individual sites and genome-wide.
We apply our approach to characterize differential binding activities across a selection of Forkhead-domain TFs when each is expressed in mouse embryonic stem cells. Despite having similar in vitro DNA-binding preferences, the various Fox TFs bind different DNA targets and drive differential gene expression patterns, even when expressed in the same chromatin environment. Using our neural network, we demonstrate that the differential Fox binding activities are explained by a mixture of differential DNA-binding preferences and differential abilities to engage relatively inaccessible chromatin. We propose that modifying preferences for preexisting chromatin states is an important strategy by which evolution enables the functional diversification of paralogous TFs.

11:50-12:10
Proceedings Presentation: ChromDL: A Next-Generation Regulatory DNA Classifier
Room: Salle Rhone 2
Format: Live from venue

  • Christopher Hill, NIH, United States
  • Sanjarbek Hudaiberdiev, NIH, United States
  • Ivan Ovcharenko, NIH, United States


Motivation: Predicting the regulatory function of non-coding DNA using only the DNA sequence continues to be a major challenge in genomics. With the advent of improved optimization algorithms, faster GPU speeds, and more intricate machine learning libraries, hybrid convolutional and recurrent neural network architectures can be constructed and applied to extract crucial information from non-coding DNA.
Results: Using a comparative analysis of the performance of thousands of Deep Learning (DL) architectures, we developed ChromDL, a neural network architecture combining bidirectional gated recurrent units (BiGRU), convolutional neural networks (CNNs), and bidirectional long short-term memory units (BiLSTM), which significantly improves upon a range of prediction metrics compared to its predecessors in transcription factor binding site (TFBS), histone modification (HM), and DNase-I hypersensitive site (DHS) detection. Combined with a secondary model, it can be utilized for accurate classification of gene regulatory elements. The model can also detect weak transcription factor (TF) binding with higher accuracy as compared to previously developed methods and has the potential to accurately delineate TF binding motif specificities.
Availability: The ChromDL source code can be found at https://github.com/chrishil1/ChromDL.

12:10-12:30
Proceedings Presentation: An intrinsically interpretable neural network architecture for sequence to function learning
Room: Salle Rhone 2
Format: Live from venue

  • Ali Tugrul Balci, University of Pittsburgh, United States
  • Mark Maher Ebeid, University of Pittsburgh, United States
  • Panayiotis Benos, University of Florida, United States
  • Dennis Kostka, University of Pittsburgh, United States
  • Maria Chikina, University of Pittsburgh, United States


Motivation: Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post-hoc analyses, and even then we often cannot explain the internal mechanics of highly parameterized models. Here, we introduce a deep learning architecture called tiSFM (totally interpretable sequence to function model). tiSFM improves upon the performance of standard multi-layer convolutional models while using fewer parameters. Additionally, while tiSFM is itself technically a multi-layer neural network, internal model parameters are intrinsically interpretable in terms of relevant sequence motifs.
Results: tiSFM’s model architecture makes use of convolutions with a fixed set of kernel weights representing known transcription factor (TF) binding site motifs. Analyzing published open chromatin measurements across hematopoietic lineage cell types, we demonstrate that tiSFM outperforms a state-of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context-specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B-cells, and Rorc for innate lymphoid cells. tiSFM’s model parameters have biologically meaningful interpretations, and we show the utility of our approach on a complex task of predicting the change in epigenetic state as a function of developmental transition.
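To illustrate what a convolution with fixed kernel weights representing a motif might look like, here is a minimal sketch. It is not the tiSFM implementation: the three-position motif `TOY_MOTIF`, its scores, and the function names are hypothetical, chosen only to show a non-trainable motif kernel being slid along a one-hot encoded sequence.

```python
BASES = "ACGT"

# Hypothetical toy motif: one row per motif position, columns ordered A, C, G, T.
# A real model would load known TF binding motifs instead of these made-up scores.
TOY_MOTIF = [
    [1.0, -1.0, -1.0, -1.0],  # position 0 prefers A
    [-1.0, 1.0, -1.0, -1.0],  # position 1 prefers C
    [-1.0, -1.0, 1.0, -1.0],  # position 2 prefers G
]


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element indicator rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in sequence]


def scan_fixed_motif(sequence, motif):
    """Slide a fixed (non-trainable) motif kernel along the sequence.

    Returns one match score per window start position.
    """
    encoded = one_hot(sequence)
    width = len(motif)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * motif[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        scores.append(score)
    return scores


scores = scan_fixed_motif("TTACGTT", TOY_MOTIF)
best_position = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_position)
```

Because the kernel weights are fixed to a motif, each score can be read directly as "how well this window matches that motif", which is the sense in which such parameters are interpretable.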

13:50-14:30
Invited Presentation: Causal Representation Learning in the Context of Gene Regulation
Room: Salle Rhone 2
Format: Live from venue

  • Caroline Uhler


The development of CRISPR-based assays and small molecule screens holds the promise of engineering precise cell state transitions to move cells from one cell type to another or from a diseased state to a healthy state. The main bottleneck is the huge space of possible perturbations/interventions, where even with the breathtaking technological advances in single-cell biology it will never be possible to experimentally perturb all combinations of thousands of genes or compounds. This important biological problem calls for a framework that can integrate data from different modalities to identify causal representations, predict the effect of unseen interventions, and identify the optimal interventions to induce precise cell state transitions. Traditional representation learning methods, although often highly successful in predictive tasks, do not generally elucidate causal relationships. In this talk, we will present initial ideas towards building a statistical and computational framework for causal representation learning and its application towards understanding gene regulation and optimal intervention design.

14:30-14:50
Biochemical activity is the default DNA state in eukaryotes
Room: Salle Rhone 2
Format: Live from venue

  • Ishika Luthra, University of British Columbia, Canada
  • Xinyi E. Chen, University of British Columbia, Canada
  • Cassandra Jensen, University of British Columbia, Canada
  • Abdul Muntakim Rafi, University of British Columbia, Canada
  • Asfar Lathif Salaudeen, University of British Columbia, Canada
  • Carl G. de Boer, University of British Columbia, Canada


Genomes encode genes and the regulatory signals that enable those genes to be transcribed, and are continually shaped by evolution. Genomes, including those of human and yeast, encode numerous regulatory elements and transcripts that have limited evidence of conservation or function. Here, we sought to create a genomic null hypothesis by quantifying the gene regulatory activity of evolutionarily naïve DNA, using RNA-seq of evolutionarily distant DNA expressed in yeast and computational predictions of random DNA activity in human cells and tissues. In yeast, we found that >99% of bases in naïve DNA were expressed as part of one or more transcripts. In humans, we found that, while random DNA is predicted to have minimal activity, dinucleotide content-matched randomized DNA is predicted to have much of the regulatory activity of evolved sequences. Naïve human DNA is predicted to be more cell type-specific than evolved DNA and is predicted to generate co-occurring chromatin marks, indicating that these are not reliable indicators of selection. Our results indicate that evolving regulatory activity from naïve DNA is comparatively easy in both yeast and humans, and we expect to see many biochemically active and cell type-specific DNA sequences in the absence of selection.

14:50-15:10
CRISPR-CLEAR - In-Situ Investigation of Genotype-to-Phenotype Relationship with Nucleotide Level Resolution CRISPR saturation mutagenesis screens
Room: Salle Rhone 2
Format: Live from venue

  • Basheer Becerra, Harvard Medical School, Massachusetts General Hospital, Boston Children’s Hospital, United States
  • Martin Jankowiak, Broad Institute, United States
  • Sandra Wittibschlager, St. Anna Children’s Cancer Research Institute (CCRI), Austria
  • Anzhelika Karjalainen, St. Anna Children’s Cancer Research Institute (CCRI), Austria
  • Ana Patricia Kutschat, St. Anna Children’s Cancer Research Institute (CCRI), Austria
  • Ting Wu, Boston Children’s Hospital, Harvard Medical School, Broad Institute, United States
  • Marlena Starrs, Boston Children’s Hospital, Harvard Medical School, Broad Institute, United States
  • Zain Patel, Harvard Medical School, Massachusetts General Hospital, Boston Children’s Hospital, Broad Institute, United States
  • Daniel Bauer, Boston Children’s Hospital, Harvard Medical School, Broad Institute, United States
  • Davide Seruggia, St. Anna Children’s Cancer Research Institute (CCRI), Austria
  • Luca Pinello, Harvard Medical School, Massachusetts General Hospital, Boston Children’s Hospital, Broad Institute, United States


CRISPR-CLEAR is a novel assay designed for in-situ examination of genotype-phenotype relationships at nucleotide and variant-level resolution. By integrating CRISPR technologies with cutting-edge computational strategies, we facilitate high-resolution analysis of sequence variants, connecting phenotypes to causal mutations.

As proof of concept, we explored a regulatory element upstream of CD19 in NALM-6 cells, a B-cell leukemia model. We designed 200 sgRNAs tiling a candidate enhancer, and performed four screens using nucleases and base editors. Cells were sorted into CD19+ or CD19- populations according to CD19 expression. We sequenced both sgRNA (perturbation counts) and endogenous targeted regions (direct allele readout) to pinpoint functional sub-regions critical for CD19 regulation.

We developed a Bayesian regression model to investigate genotype-phenotype relationships at nucleotide resolution, assigning importance scores to mutations based on their abundance in the sorted populations. We identified and validated a mutation hotspot in a 20 bp region corresponding to significant transcription factor binding sites, such as PAX5.

CRISPR-CLEAR outperforms existing methods in signal detection and functional resolution, and offers a powerful tool for investigating genotype-phenotype relationships by directly observing edits at the endogenous locus. This framework promises a comprehensive solution for classifying non-coding variants of uncertain significance and discovering causal regulatory elements.

15:10-15:30
Genome-wide analysis of CRISPR perturbations indicates that enhancers act multiplicatively, but provides no evidence for epistatic-like enhancer interactions
Room: Salle Rhone 2
Format: Live from venue

  • Jessica Zhou, University of California San Diego, Salk Institute for Biological Studies, United States
  • Karthik Guruvayurappan, University of California San Diego, Salk Institute for Biological Studies, United States
  • Graham McVicker, Salk Institute for Biological Studies, University of California San Diego, United States


A single gene may be regulated by multiple enhancers, but how these enhancers work in concert to control gene expression is poorly understood. Prior studies have primarily interrogated single loci rather than examining genome-wide interactions between enhancers and have reached inconsistent conclusions about synergistic effects between enhancers. To interrogate interactions between enhancers on a genome-wide scale, we reanalyzed a single-cell CRISPR interference (CRISPRi) screen that delivered random combinations of guide RNAs (gRNAs) targeting putative enhancers to each cell. Using negative binomial generalized linear models (GLMs), we first improved the modeling of enhancer-gene interactions by accounting for variable gRNA efficiencies. We then queried interaction effects between enhancers by modeling 3,808 enhancer pairs throughout the genome where both enhancers were located within 1 Mb of a common target gene. We found no evidence of strong interaction effects between pairs of enhancers acting on the same gene using this dataset. Rather, the data support a model where the effects of multiple enhancers combine multiplicatively to regulate their target genes. Our results provide a scalable statistical framework for quantifying interactions between enhancers via single-cell RNA-seq data, and support a model where most pairs of enhancers act multiplicatively to control gene expression.
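The link between a GLM with a log link and multiplicative enhancer effects can be made concrete with a small sketch. It assumes the usual log link of a negative binomial GLM and uses hypothetical coefficient names and values (`beta_a`, `beta_b`, `beta_ab`); it does not fit any data and is not the authors' code. With the interaction coefficient at zero, the fold change of a double perturbation equals the product of the two single-perturbation fold changes.

```python
import math


def expected_expression(baseline, beta_a, beta_b, beta_ab, a_perturbed, b_perturbed):
    """Mean expression under a log-link model with two perturbation indicators (0 or 1)."""
    linear_predictor = (
        math.log(baseline)
        + beta_a * a_perturbed
        + beta_b * b_perturbed
        + beta_ab * a_perturbed * b_perturbed  # interaction term
    )
    return math.exp(linear_predictor)


# Hypothetical values: perturbing enhancer A halves expression, enhancer B cuts it by 20%.
BASELINE = 100.0
BETA_A = math.log(0.5)
BETA_B = math.log(0.8)


def fold_change(beta_ab, a_perturbed, b_perturbed):
    """Expression relative to the unperturbed baseline."""
    mean = expected_expression(BASELINE, BETA_A, BETA_B, beta_ab, a_perturbed, b_perturbed)
    return mean / BASELINE


# No interaction: effects combine multiplicatively on the expression scale.
fc_a = fold_change(0.0, 1, 0)
fc_b = fold_change(0.0, 0, 1)
fc_ab = fold_change(0.0, 1, 1)

# A non-zero interaction coefficient makes the double perturbation deviate from the product.
fc_ab_interacting = fold_change(math.log(0.5), 1, 1)
print(fc_a, fc_b, fc_ab, fc_ab_interacting)
```

In this framing, testing for an enhancer-enhancer interaction amounts to asking whether the interaction coefficient differs from zero.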

16:00-16:40
Invited Presentation: Analysis for single cell genomics in multi-donor settings
Room: Salle Rhone 2
Format: Live from venue

  • Nir Yosef


The popularization of assays for single-cell RNA-sequencing is leading to a marked increase in the number of samples that are included in datasets, either ones assembled computationally from multiple sources or ones generated de novo by individual studies. As part of this trend, an increasing number of studies in humans consider cohorts with complex designs that incorporate hundreds of donors. Current approaches for analyzing sample-level variation in these settings often rely on a simplification of the data such as an aggregation at the cell-type or cell-state-neighborhood level. In this talk, I will discuss our thoughts on developing ways for inferring and interpreting sample-level variation while leveraging the full resolution of single-cell sequencing data. As a leading case, I will present MrVI, a deep generative model for analyzing variation between samples at an arbitrary level of resolution, down to that of a single cell. Leveraging counterfactual calculations, MrVI facilitates metadata-centric analysis: detecting differences between sample groups in terms of both gene expression and cell type composition, while accounting for uncertainties in the data and in the model, and in a manner independent of clustering or cell typing. MrVI also enables unsupervised analysis, discovering for any given cellular context in the data the corresponding stratification of the donor population. I will demonstrate MrVI using the immune aging atlas, a recent collaborative project that measures the transcriptomes of immune cells in humans across tissues and age groups. We will see how analysis with MrVI and other tools can help unveil key transcriptional signatures that are associated with age for different immune lineages and at different tissue locations.

16:40-17:00
scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single cell RNA-sequencing data
Room: Salle Rhone 2
Format: Live-stream

  • Ziqi Zhang, Georgia Institute of Technology, United States
  • Xinye Zhao, Georgia Institute of Technology, United States
  • Peng Qiu, Georgia Institute of Technology, United States
  • Xiuwei Zhang, Georgia Institute of Technology, United States


Single-cell RNA-sequencing (scRNA-seq) has been widely used for disease studies, where sample batches are collected from donors under different conditions including demographic groups, disease stages, and drug treatments. It is worth noting that the differences among sample batches in such a study are a mixture of technical confounders caused by batch effect and the biological variations caused by condition effect. However, current batch effect removal methods often eliminate both technical batch effects and meaningful condition effects, while perturbation prediction methods solely focus on condition effects, resulting in inaccurate gene expression predictions due to unaccounted batch effects.

We introduce scDisInFact, a deep learning framework that models both batch effect and condition effect among batches in scRNA-seq data. scDisInFact learns latent factors that disentangle condition effects from batch effects, enabling it to simultaneously perform three major tasks: batch effect removal, condition-associated key gene detection, and perturbation prediction. We evaluated scDisInFact on both simulated and real datasets, and compared its performance to baseline methods for each task. Our results demonstrate that scDisInFact outperforms existing methods that focus on individual tasks, providing a more comprehensive and accurate approach for integrating multi-batch multi-condition scRNA-seq data and predicting scRNA-seq data of unseen conditions.

17:00-17:20
Simulating scRNA-seq using causal generative adversarial networks
Room: Salle Rhone 2
Format: Live from venue

  • Seyed Yazdan Zinati, McGill University, Canada
  • Abdulrahman Takiddeen, McGill University, Canada
  • Amin Emad, McGill University, Canada


We present GRouNdGAN, a gene regulatory network (GRN)-guided causal implicit generative model for simulation of single-cell RNA-seq data, in-silico perturbation experiments, and benchmarking of GRN inference methods. Through the imposition of a user-defined GRN, describing TF-gene regulatory interactions, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating TFs. Our model is trained on a reference single-cell dataset; it captures non-linear TF-gene dependence and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise with no user manipulation and only implicit parameterization. Despite imposing a rigid constraint on causality, GRouNdGAN outperforms state-of-the-art simulators in generating realistic cells by incorporating domain knowledge through the GRN. GRouNdGAN learns meaningful causal regulatory dynamics, allowing sampling from both interventional and observational distributions to synthesize cells under conditions that do not occur in the dataset at inference time, which enables in-silico TF knockout and perturbation experiments. Interactions imposed through the GRN are emphasized, resulting in GRN inference algorithms assigning them higher scores than edges not imposed but of equal importance. We used these properties to benchmark various GRN inference algorithms, including those that utilize the concept of pseudo-time.

17:20-17:40
scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics
Room: Salle Rhone 2
Format: Live from venue

  • Dongyuan Song, University of California, Los Angeles, United States
  • Qingyang Wang, University of California, Los Angeles, United States
  • Guanao Yan, University of California, Los Angeles, United States
  • Tianyang Liu, University of California, Los Angeles, United States
  • Tianyi Sun, Department of Statistics, UCLA, United States
  • Jingyi Jessica Li, University of California, Los Angeles, United States


In the single-cell and spatial omics field, computational challenges include method benchmarking, data interpretation, and in silico data generation. To address these challenges, we propose an all-in-one statistical simulator, scDesign3, to generate realistic single-cell and spatial omics data, including various cell states, experimental designs, and feature modalities, by learning interpretable parameters from real datasets. Furthermore, using a unified probabilistic model for single-cell and spatial omics data, scDesign3 can infer biologically meaningful parameters, assess the goodness-of-fit of inferred cell clusters, trajectories, and spatial locations, and generate in silico negative and positive controls for benchmarking computational tools.

17:40-18:00
DragoNNFruit: Learning cis- and trans-regulatory factors of chromatin accessibility profiles at single base and single cell resolution
Room: Salle Rhone 2
Format: Live from venue

  • Jacob Schreiber, Stanford University, United States
  • Surag Nair, Stanford University, United States
  • Anshul Kundaje, Stanford University, United States


We introduce DragoNNFruit, a first-of-its-kind approach that jointly models cis- and trans-regulatory factors of genome-wide chromatin accessibility at single-cell and base-pair resolution. By explicitly modeling both cis- and trans-regulatory factors, DragoNNFruit departs from current regulatory modeling approaches which primarily model either cis- or trans-factors alone, and so can model perturbations in both cis- and trans-regulatory components, i.e., variant effect and protein concentration/expression respectively. We showcase DragoNNFruit by modeling single-cell chromatin dynamics across a fibroblast to iPSC reprogramming time course and show that its predictions allow precise timestamping of each enhancer’s activation and repression along the continuous reprogramming trajectory. Through an explicit model of Tn5 sequence bias, DragoNNFruit tracks TF binding footprints that are not visible from the experimental data alone across a continuum of cell states for each enhancer. DragoNNFruit reveals that scATAC-seq encodes differences in TF footprint depths that correlate with TF stoichiometry and motif affinity. These analyses highlight the locus-specific, temporal, and quantitative interplay between cis- and trans-regulatory factors, and demonstrate that DragoNNFruit offers a powerful new paradigm for understanding dynamic regulation of cell fate decisions at single cell resolution.

Wednesday, July 26th
10:30-11:10
Invited Presentation: Probing the relationship between enhancer activity, connectivity, and gene expression
Room: Salle Rhone 2
Format: Live from venue

  • Mikhail Spivakov

11:10-11:30
BootTHiC: Integrating HiC and transcriptomics to detect transcriptional hubs
Room: Salle Rhone 2
Format: Live from venue

  • Vipin Kumar, NCMM, University of Oslo, Norway
  • Fabrizio Guidotti, University of Bologna, Italy
  • Anthony Mathelier, NCMM, University of Oslo, Norway


The regulatory mechanisms that give the genome its transcriptional agility, and so produce the variety of tissue functions observed in eukaryotes, remain an active field of study. In particular, reconciling this complex behaviour with the conventional linear representation of the genome highlights the limits of this narrow description.
The broader spatial reconfiguration that the genome undergoes to produce adequate transcription, or upon disease onset, has recently highlighted the regulatory importance of genome architecture.
In this study, we introduce BootTHiC, a method integrating HiC data and CAGE sequencing to detect candidate transcriptional hubs, which we believe constitute decisive loci of gene regulation.
We support the biological relevance of the detected transcriptional hubs by observing their enrichment in cell-line-specific genes, significantly more active enhancers and downregulated genes when compared to matching cancer cell lines. Taken together, these observations indicate a likely contribution of our transcriptional hubs to determining cell identity.
We provide BootTHiC as a Snakemake pipeline.

11:30-11:50
UniversalEPI: An Attention-based Method to Predict Chromatin Interactions in Unseen and Rare Cell Types
Room: Salle Rhone 2
Format: Live from venue

  • Aayush Grover, Department of Computer Science, ETH Zürich, Switzerland
  • Simeon Häfliger, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland
  • Tuna Acisu, Technical University of Munich, Germany
  • Felix Tockner, Department of Computer Science, ETH Zurich, Switzerland
  • Ignacio Ibarra, Institute of Computational Biology, Computational Health Center, Helmholtz Zentrum Muenchen, Germany
  • Valentina Boeva, Department of Computer Science, ETH Zurich, Switzerland


Non-coding mutations represent more than 95% of all cancer variants. Most of these variants do not affect the transcription rate of proximal genes and are therefore overlooked in cancer studies. However, a small proportion of non-coding variants may have vast consequences on gene regulation through modifications in the structure of chromatin and/or modulation of proximal and distal regulatory elements. Therefore, prediction of the chromatin structure in a specific tissue or cancer type is an important task and is crucial for our understanding of the mechanisms of cancer development and progression. While several methods have been shown to accurately predict the chromatin structure for a given cell type, little attention has been given to predicting the chromatin structure in unseen cell types. Here, we propose an attention-based deep neural model, called UniversalEPI, that can accurately predict the interactions between regulatory elements in an unseen cell type. We show that UniversalEPI accurately captures transcription factors that are important for chromatin organization and can be used along with open chromatin information to robustly predict chromatin interactions in unseen cell types. This in-silico 3D-modelling of DNA represents a crucial step in evaluating the role of mutational processes in different cancer types.

11:50-12:10
Identifying genetic variants associated with chromatin looping and genome organization
Room: Salle Rhone 2
Format: Live from venue

  • Sourya Bhattacharyya, La Jolla Institute for Immunology, United States
  • Vivek Chandra, La Jolla Institute for Immunology, United States
  • Pandurangan Vijayanand, La Jolla Institute for Immunology, United States
  • Ferhat Ay, La Jolla Institute for Immunology, United States


attached as long abstract

12:10-12:30
Proceedings Presentation: Reference panel guided super-resolution inference of Hi-C data
Room: Salle Rhone 2
Format: Live from venue

  • Yanlin Zhang, McGill University, Canada
  • Mathieu Blanchette, McGill University, Canada


Motivation: Accurately assessing contacts between DNA fragments inside the nucleus with Hi-C experiments is crucial for understanding the role of 3D genome organization in gene regulation. This task is challenging in part because high-resolution analyses require Hi-C libraries with high sequencing depth. Most existing Hi-C data are collected with limited sequencing coverage, leading to poor chromatin interaction frequency estimation. Current computational approaches to enhance Hi-C signals focus on the analysis of individual Hi-C data sets of interest, without taking advantage of the facts that (i) several hundred Hi-C contact maps are publicly available, and (ii) the vast majority of local spatial organizations are conserved across multiple cell types.

Results: Here, we present RefHiC-SR, an attention-based deep learning framework that uses a reference panel of Hi-C datasets to facilitate the enhancement of Hi-C data resolution of a given study sample. We compare RefHiC-SR against tools that do not use reference samples and find that RefHiC-SR outperforms other programs across different cell types and sequencing depths. It also enables high-accuracy mapping of structures such as loops and topologically associating domains.

Availability: https://github.com/BlanchetteLab/RefHiC

Contact: blanchem@cs.mcgill.ca

13:50-14:30
Invited Presentation: Multi-modal learning for single-cell multi-omics data integration
Room: Pasteur Auditorium
Format: Live from venue

  • Laura Cantini, CNRS and Institut Pasteur, France


Single-cell RNA sequencing (scRNAseq) is revolutionizing biology and medicine. The possibility to assess cellular heterogeneity at a previously inaccessible resolution has profoundly impacted our understanding of development, of immune system function and of many diseases. While scRNAseq is now mature, the single-cell technological development has shifted to other large-scale quantitative measurements, a.k.a. ‘omics’, and even spatial positioning. In addition, combined omics measurements profiled from the same single cell are becoming available.

Each single-cell omics has intrinsic limitations and provides different and complementary information on the same cell. The main current challenge in computational biology is to design appropriate methods to integrate this wealth of information and translate it into actionable biological knowledge.

In this talk, I will discuss two main computational directions for multi-omics integration, currently explored in my team: (i) joint dimensionality reduction to study cellular heterogeneity simultaneously from multiple omics and (ii) multilayer networks to integrate a large range of interactions between the features of various omics and isolate the regulators underlying cellular heterogeneity.

14:30-14:50
Active repression of alternative cell fates safeguards hepatocyte identity and prevents liver tumorigenesis
Room: Pasteur Auditorium
Format: Live from venue

  • Aryan Kamal, EMBL, Germany
  • Bryce Lim, DKFZ, Germany
  • Juan Adrian Segarra, DKFZ, Germany
  • Ignacio Ibarra, EMBL, Germany
  • Judith Zaugg, EMBL, Germany
  • Moritz Mall, DKFZ, Germany


Presentation Overview:

Maintaining a stable cell identity requires suppressing inappropriate transcriptional programs. The current dogma suggests that this is achieved through passive epigenetic silencing. Here, we propose that active transcriptional repression by safeguard repressors is crucial for lifelong cell fate stability. This process prevents the loss of cell identity and errors that may lead to developmental disorders or cancer. To support our proposition, we devised a strategy to identify safeguard repressor candidates using single-cell RNA-seq and transcription factor motif data in eighteen cell types from all germ layers. Then, to investigate whether this mechanism could prevent diseases associated with plasticity, such as cancer, we overexpressed one of the safeguard repressor candidates for hepatocytes in an in vivo model of hepatocarcinoma. We found that exogenous overexpression entirely blocked tumor initiation and improved survival. To understand the mechanism, we used direct cellular reprogramming to assess whether the candidate affects cell fate plasticity. Indeed, we found that overexpression of the candidate repressed alternative cell fates by targeting, and thereby repressing, master regulators of several alternative fates. In summary, our findings suggest that cell type-specific safeguard repressors maintain lineage commitment by actively repressing alternative identities.

14:50-15:10
Normal and cancer tissues are accurately characterised by intergenic transcription at RNA polymerase 2 binding sites
Room: Pasteur Auditorium
Format: Live from venue

  • Pierre De Langen, Aix Marseille Univ, INSERM, TAGC, Marseille, France
  • Fayrouz Hammal, Aix Marseille Univ, INSERM, TAGC, Marseille, France
  • Elise Guéret, Aix Marseille Univ, INSERM, TAGC, Marseille, France
  • Lionel Spinelli, Aix Marseille Univ, INSERM, TAGC, Marseille, France
  • Benoit Ballester, Aix Marseille Univ, INSERM, TAGC, Marseille, France


Presentation Overview:

Intergenic transcription in normal and cancerous tissue is pervasive and incompletely understood. To investigate this activity at a global level, we constructed an atlas of over 180,000 consensus RNA Polymerase II (RNAP2) bound intergenic regions from more than 900 RNAP2 ChIP-seq experiments across normal and cancer samples. Using unsupervised analysis, we identified 51 RNAP2 consensus clusters, many of which map to specific biotypes and identify tissue-specific regulatory signatures. We developed a meta-clustering methodology to integrate our RNAP2 atlas with active transcription across 28,797 RNA-seq samples from TCGA, GTEx and ENCODE, which revealed strong tissue- and disease-specific interconnections between RNAP2 occupancy and transcription. We demonstrate that intergenic transcription at RNAP2-bound regions provides novel per-cancer and pan-cancer biomarkers with genomically and clinically relevant characteristics, including the ability to differentiate cancer subtypes and an association with overall survival. Our results demonstrate the effectiveness of coherent data integration to uncover and characterise intergenic transcriptional activity in both normal and cancer tissues.

15:10-15:30
Proceedings Presentation: CLARIFY: Cell-cell interaction and gene regulatory network refinement from spatially resolved transcriptomics
Room: Pasteur Auditorium
Format: Live from venue

  • Mihir Bafna, Georgia Institute of Technology, United States
  • Hechen Li, Georgia Institute of Technology, United States
  • Xiuwei Zhang, Georgia Institute of Technology, United States


Presentation Overview:

Motivation: Gene regulatory networks (GRNs) in a cell provide the tight feedback needed to synchronize cell actions. However, genes in a cell also take input from, and provide signals to, other, neighboring cells. These cell-cell interactions (CCIs) and the GRNs deeply influence each other. Many computational methods have been developed for GRN inference in cells. More recently, methods were proposed to infer CCIs using single cell gene expression data with or without cell spatial location information. However, in reality, the two processes do not exist in isolation and are subject to spatial constraints. Despite this rationale, no methods currently exist to infer GRNs and CCIs using the same model.

Results: We propose CLARIFY, a tool that takes GRNs as input, uses them to predict CCIs, and simultaneously uses the CCIs to refine and output cell-specific GRNs. CLARIFY uses a novel multi-level graph neural network, which mimics cellular networks at a higher level and cell-specific GRNs at a deeper level. We applied CLARIFY to two real spatial transcriptomic datasets, one using SeqFISH and the other using MERFISH, and also tested it on simulated datasets from scMultiSim. We compared the quality of predicted GRNs and CCIs with state-of-the-art baseline methods that inferred either only GRNs or only CCIs. The results show that CLARIFY consistently outperforms the baselines in terms of commonly used evaluation metrics. Our results point to the importance of co-inference of CCIs and GRNs and to the use of layered graph neural networks as an inference tool for biological networks.

16:00-16:40
Invited Presentation: Third-generation sequencing technologies to investigate the complexity of transcriptomes
Room: Pasteur Auditorium
Format: Live from venue

  • Ana Conesa


Presentation Overview:

Third-generation sequencing technologies have the potential to obtain full-length transcript sequences and study transcriptome biology with high resolution. Within the LRGASP project, we have extensively benchmarked these methods and identified a significant lack of agreement among methodologies. We also validated many novel, low-expression, and rare isoforms, which suggests a component of transcriptome complexity that will challenge genome annotation using these approaches.

16:40-17:00
STAN, a computational framework for inferring spatially informed transcription factor activity networks
Room: Pasteur Auditorium
Format: Live from venue

  • April Sagan, University of Pittsburgh, United States
  • Hatice Osmanbeyoglu, University of Pittsburgh, United States


Presentation Overview:

Transcription factors (TFs) are important modulators of cell fate and function, responsible for large-scale changes in response to the environment or intercellular communication. Other types of cells in proximity are critical for instructing TF activity levels. Spatial transcriptomics technologies measure genome-wide mRNA expression across thousands of spots on a tissue slice while preserving information about the location of spots. Here, we present STAN (Spatially informed Transcription Factor Activity Network), a computational method to predict spot-specific TF activities by utilizing spatial transcriptomics data and TF-target gene priors. Specifically, we develop a linear mixed-effect model that integrates curated TF-target gene priors, mRNA expression, spatial coordinates, and imaging data to learn gene regulatory programs that predict the expression of target genes. Spatial coordinates and morphological features extracted from corresponding imaging data are used to promote spatially cohesive gene regulatory programs. We apply STAN to lymph node and breast cancer spatial transcriptomics datasets and identify TFs whose activity varies across cell types, pathological regions, and spatial domains. We also elucidate ligands whose gene expression is associated with TFs in neighboring spots. Overall, STAN enhances the utility of spatial transcriptomics datasets to uncover TF and spatial relationships in diverse cellular states.
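
As a rough illustration of the idea of predicting spot-level TF activity from a TF-target prior, the sketch below fits a plain ridge regression of expression on the prior matrix. This is a deliberate simplification and not the STAN model: it omits the spatial and imaging terms described in the abstract, and the function name, the matrix shapes, and the `ridge` parameter are all assumptions made for the example.

```python
import numpy as np


def infer_tf_activity(prior, expression, ridge=1.0):
    """Estimate TF activities per spot with ridge regression.

    prior:      genes x TFs matrix (hypothetical TF-target gene prior)
    expression: genes x spots matrix of mRNA expression
    Returns a TFs x spots matrix W minimising
    ||expression - prior @ W||^2 + ridge * ||W||^2.
    """
    n_tfs = prior.shape[1]
    gram = prior.T @ prior + ridge * np.eye(n_tfs)
    return np.linalg.solve(gram, prior.T @ expression)


# Toy data: 3 genes, 2 TFs, 2 spots (values are illustrative only).
toy_prior = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_activity = np.array([[2.0, -1.0], [0.5, 3.0]])
toy_expression = toy_prior @ toy_activity

# With a negligible ridge penalty the known activities are recovered.
estimated = infer_tf_activity(toy_prior, toy_expression, ridge=1e-10)
```

A spatially informed variant would add a term that couples the activities of neighboring spots; that part is left out here because the abstract does not specify its form.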

17:00-17:20
Proceedings Presentation: scKINETICS: inference of regulatory velocity with single-cell transcriptomics data
Room: Pasteur Auditorium
Format: Live from venue

  • Cassandra Burdziak, Memorial Sloan Kettering Cancer Center, United States
  • Chujun Zhao, Memorial Sloan Kettering Cancer Center/Columbia University, United States
  • Doron Haviv, Memorial Sloan Kettering Cancer Center/Weill Cornell Medicine, United States
  • Direna Alonso-Curbelo, Memorial Sloan Kettering Cancer Center/The Barcelona Institute of Science and Technology, Spain
  • Scott Lowe, Memorial Sloan Kettering Cancer Center/Howard Hughes Medical Institute, United States
  • Dana Pe'Er, Memorial Sloan Kettering Cancer Center/Howard Hughes Medical Institute, United States


Presentation Overview:

Motivation: Transcriptional dynamics governed by the action of regulatory proteins are fundamental to systems ranging from normal development to disease progression. There has been substantial progress in deriving mechanistic insight into regulators of static populations with single-cell transcriptomic data, yet prevalent methods tracking phenotypic dynamics are naive to the regulatory drivers of gene expression variability through time.

Results: We introduce scKINETICS (Key regulatory Interaction NETwork for Inferring Cell Speed), a dynamical model of gene expression change which is fit with the simultaneous learning of per-cell transcriptional velocities and a governing gene regulatory network. This is accomplished through an expectation-maximization approach derived to learn the impact of each regulator on its target genes, leveraging biologically motivated priors from epigenetic data, gene-gene co-expression, and constraints on cells’ future states imposed by the phenotypic manifold. Applying this approach to an acute pancreatitis dataset recapitulates a well-studied axis of acinar-to-ductal trans-differentiation whilst proposing novel regulators of this process, including factors with previously appreciated roles in driving pancreatic tumorigenesis. In benchmarking experiments, we show that scKINETICS successfully extends and improves existing velocity approaches to generate interpretable, mechanistic models of gene regulatory dynamics.

17:20-17:40
Cell type directed design of synthetic enhancers
Room: Pasteur Auditorium
Format: Live from venue

  • Ibrahim Ihsan Taskiran, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  • Katina I. Spanier, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  • Valerie Christiaens, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  • David Mauduit, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  • Stein Aerts, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium


Presentation Overview:

Enhancers are the core elements of gene regulatory networks where transcription factors are bound to orchestrate cell-type specific gene expression. We recently studied the regulatory code of enhancers by training AI models on enhancer sequences of neuronal cell types in the Drosophila brain (DeepFlyBrain) and human melanocytes (DeepMEL). However, “what we cannot create, we do not understand”. Here we implemented three different enhancer design strategies: (1) directed sequence evolution; (2) iterative motif implanting; (3) generative design, to generate de novo enhancers with specific spatiotemporal activity patterns for targeted cell types.

The first strategy also proved useful for modifying existing genomic sequences, namely: (1) to prune enhancers, making them specific to only one cell type; (2) to augment enhancers, making them active in multiple chosen cell types; and (3) to rescue “lost” enhancers that only have partial enhancer codes.

For each strategy, we evaluated synthetic enhancers in vivo using transgenic flies, and in vitro in human cell culture. We investigated the explainability of each method and compared in detail the engineered enhancers with genomic enhancers. In conclusion, enhancer design guided by deep learning leads to a better understanding of how enhancers work and shows that their code can be exploited to manipulate cell states.
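
The directed sequence evolution strategy mentioned above can be pictured as a greedy in-silico search: at each step, every single-nucleotide substitution is scored and the best one is kept. The sketch below is only an assumed illustration of that loop. In particular, `toy_score` is a hypothetical stand-in for a trained enhancer model such as DeepFlyBrain or DeepMEL, and the target motif is an arbitrary example string.

```python
ALPHABET = "ACGT"


def toy_score(seq, motif="TGACTCA"):
    """Hypothetical scorer: best fraction of motif positions matched in any window."""
    best = 0
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        matches = sum(a == b for a, b in zip(window, motif))
        best = max(best, matches)
    return best / len(motif)


def evolve(seq, score_fn, n_steps=10):
    """Greedy evolution: keep the best single-nucleotide substitution per step."""
    current, current_score = seq, score_fn(seq)
    history = [current_score]
    for _ in range(n_steps):
        best_seq, best_score = current, current_score
        for pos in range(len(current)):
            for base in ALPHABET:
                if base == current[pos]:
                    continue
                candidate = current[:pos] + base + current[pos + 1:]
                candidate_score = score_fn(candidate)
                if candidate_score > best_score:
                    best_seq, best_score = candidate, candidate_score
        if best_seq == current:
            break  # no substitution improves the score
        current, current_score = best_seq, best_score
        history.append(current_score)
    return current, history


evolved_seq, score_history = evolve("A" * 30, toy_score, n_steps=10)
```

Replacing `toy_score` with the prediction of a real sequence model for a chosen cell type would turn the same loop into a model-guided design procedure; how the authors actually select mutations is not stated in the abstract.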

17:40-18:00
A robust statistical framework for genewise single cell differential expression metaanalysis in the context of population based single cell studies.
Room: Pasteur Auditorium
Format: Live from venue

  • Aida Ripoll, Barcelona Supercomputing Center, Spain
  • Maria Sopena, Barcelona Supercomputing Center, Spain
  • Lude Franke, University Medical Center Groningen, Department of Genetics, Groningen, Netherlands
  • Marc Jan Bonder, German Cancer Research Center, Division of Computational Genomics and Systems Genetics, Heidelberg, Germany
  • Monique van der Wijt, University Medical Center Groningen, Department of Genetics, Groningen, Netherlands
  • Marta Mele, Barcelona Supercomputing Center, Spain


Presentation Overview:

Single-cell RNA sequencing has enabled deciphering the human transcriptome at an unprecedented resolution. As scale, cost, and sensitivity improve, it is now possible to study transcriptomic changes across many individuals. With this aim, we have founded the sc-eQTLGen consortium. Our consortium builds on a federated structure, thereby overcoming the necessity to share privacy-sensitive data while concurrently reducing computational load. Here, we expand the sc-eQTLGen setup to study how specific individual traits affect gene expression at single-cell resolution. To do this, we developed a novel statistical framework to conduct a cell type-specific differential expression meta-analysis (SiGMetaDE). We applied this framework to several PBMC datasets to study how sex and age affect gene expression. We show that our approach substantially increases the statistical power to detect differentially expressed (DE) genes and identifies known and novel sex- and age-associated DE genes. Our approach provides a solid framework to study the effects of individual traits and environmental conditions on gene regulation across many cohorts, and can be expanded to single-cell chromatin accessibility or DNA methylation studies.
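
To make the notion of a gene-wise meta-analysis across cohorts concrete, the sketch below pools per-cohort effect estimates for one gene with a standard fixed-effect, inverse-variance weighting. This is a generic textbook scheme shown under stated assumptions; it is not claimed to be the SiGMetaDE procedure, and the function name and example numbers are illustrative only.

```python
import math


def inverse_variance_meta(effects, std_errors):
    """Fixed-effect meta-analysis of one gene's effect across cohorts.

    effects:    per-cohort effect estimates (e.g. log fold changes)
    std_errors: matching per-cohort standard errors
    Returns (pooled effect, pooled standard error, z-score, two-sided p-value).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled, pooled_se, z_score, p_value


# Two hypothetical cohorts with equal precision.
pooled, pooled_se, z_score, p_value = inverse_variance_meta([0.2, 0.4], [0.1, 0.1])
```

Because only summary statistics (effects and standard errors) are combined, a scheme of this kind fits a federated setup in which individual-level data never leave each cohort.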