Posters - Schedules
Monday, July 24, between 18:00 CEST and 19:00 CEST
Tuesday, July 25, between 18:00 CEST and 19:00 CEST
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 24, between 08:00 CEST and 08:45 CEST
Session A Posters dismantle:
Monday, July 24, at 19:00 CEST
Session B Poster Set-up and Dismantle
Session B Posters set up:
Tuesday, July 25, between 08:00 CEST and 08:45 CEST
Session B Posters dismantle:
Tuesday, July 25, at 19:00 CEST
Wednesday, July 26, between 18:00 CEST and 19:00 CEST
Session C Poster Set-up and Dismantle
Session C Posters set up:
Wednesday, July 26, between 08:00 CEST and 08:45 CEST
Session C Posters dismantle:
Wednesday, July 26, at 19:00 CEST
Virtual
B-313: Uncovering the missing genetic mechanisms of neuropsychiatric disorders through multi-omics data integration
Track: RegSys
  • Jingqi Chen, Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China, China
  • Liting Song, Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China, China
  • Jixin Cao, Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China, China
  • Chun-Yi Zac Luo, Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China, China
  • Qihao Guo, Department of Gerontology, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, Shanghai 200233, China, China
  • Jianfeng Feng, Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China, China
  • Xing-Ming Zhao, Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China, China


Presentation Overview:

The genetics underlying neuropsychiatric disorders are far from being fully understood. Moreover, neurological and psychiatric disorders have overlapping phenotypic profiles, but the underlying tissue-specific functional processes are largely unknown. We explored the shared tissue-specificity among 14 neuropsychiatric disorders through disrupted long-range gene regulation. On average, 38.0% and 17.2% of the intergenic regulatory SNPs can be linked to target genes in brain and non-brain tissues, respectively. Interestingly, while the target genes in the brain tend to be enriched in processes related to nervous system development, those in the non-brain tissues tend to affect synapse- and neuroinflammation-related processes. Compared to psychiatric disorders, neurological disorders show neuroinflammatory processes more prominently in both brain and non-brain tissues. We then constructed a disorder similarity network across two brain and three non-brain tissues, highlighting unexpected disorder clusters (e.g. Parkinson’s disease is consistently grouped with psychiatric disorders). We showcase the potential pharmaceutical applications of the small bowel and its disorder clusters, exemplified by the known drug targets NR1I3 and NFATC1. Additionally, zooming in on one of the disorders, Alzheimer's disease (AD), we performed transcriptome analysis for preclinical AD in a Chinese cohort and identified the type I interferon signaling pathway as a novel biomarker.

B-314: scReadSim: a single-cell RNA-seq and ATAC-seq read simulator
Track: RegSys
  • Guanao Yan, University of California, Los Angeles, United States
  • Jingyi Jessica Li, University of California, Los Angeles, United States


Presentation Overview:

Benchmarking single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq (scATAC-seq) computational tools demands simulators to generate realistic sequencing reads. However, none of the few read simulators aim to mimic real data. To fill this gap, we introduce scReadSim, a single-cell RNA-seq and ATAC-seq read simulator that allows user-specified ground truths and generates synthetic sequencing reads (in FASTQ and BAM formats) by mimicking real data. At both read-sequence and read-count levels, scReadSim mimics real scRNA-seq and scATAC-seq data. Moreover, scReadSim provides ground truths, including unique molecular identifier counts for scRNA-seq and open chromatin regions for scATAC-seq. In particular, scReadSim allows users to design cell-type-specific ground-truth open chromatin regions for scATAC-seq data generation. In benchmark applications of scReadSim, we show that cellranger is a preferred scRNA-seq UMI deduplication tool, and HMMRATAC and MACS3 achieve top performance in scATAC-seq peak calling.

B-315: A continuous epistasis model for predicting growth rate given combinatorial variation in gene expression and environment
Track: RegSys
  • Ryan Otto, The University of Texas Southwestern Medical Center, United States
  • Agata Turska-Nowak, Polish Stem Cells Bank, Warsaw, Poland
  • Philip Brown, The University of Texas Southwestern Medical Center, United States
  • Kim Reynolds, The University of Texas Southwestern Medical Center, United States


Presentation Overview:

The ability to predict changes in cellular growth rate given variation in gene expression is critical to interpret disease-causing mutations, engineer biosynthetic pathways, and understand evolutionary constraints on mRNA abundance. However, the relationship between gene expression and growth rate is nonlinear and shaped by environmental and genetic context. Consequently, genetic studies focused on discrete perturbations with strong effects like total knockouts or complete nutrient depletion miss critical information relevant to organismal variation, evolution, and human disease. To address these challenges, we developed a strategy in which we (1) measured the growth rate effects of titrated changes in enzyme expression under varied environmental contexts using CRISPR interference and (2) used a subset of these data to train an interpretable, low-parameter model of E. coli growth. When tested across thousands of measurements, we found that models trained using a sparse subset of our experimental data – sampling only pairwise perturbations over genes and environments – were sufficient to predict the effects of higher-order combinations of up to four expression and environmental perturbations. This framework provides a strategy for characterizing the growth rate effects of altering gene expression across entire metabolic pathways, or even genomes, in varied environments using sparsely sampled, low-order measurements.

B-316: HYENA detects non-coding genes activated by distal enhancers in cancer
Track: RegSys
  • Lixing Yang, University of Chicago, United States
  • Ali Yesilkanal, University of Chicago, United States
  • Anqi Yu, University of Chicago, United States


Presentation Overview:

Somatic structural variations (SVs) in cancer can shuffle DNA content in the genome, relocate regulatory elements, and alter genome organization. Enhancer hijacking occurs when SVs relocate distal enhancers to activate gene expression. However, most enhancer hijacking studies have focused only on protein-coding genes. Here, we develop a computational algorithm, “HYENA”, to identify candidate oncogenes (both protein-coding and non-coding) activated by enhancer hijacking based on tumor whole-genome and transcriptome sequencing data. HYENA detects genes whose elevated expression is associated with somatic SVs by using a rank-based regression model. We systematically analyze 1,148 tumors across 25 types of adult tumors and identify a total of 192 candidate oncogenes, including many non-coding genes. A long non-coding RNA, TOB1-AS1, is activated by various types of SVs in 10% of pancreatic cancers through altered three-dimensional genome structure. We find that high expression of TOB1-AS1 can promote cell invasion and metastasis. Our study highlights the contribution of genetic alterations in non-coding regions to tumorigenesis and tumor progression.

B-317: Using network propagation to walk through the enhancer-gene regulatory network
Track: RegSys
  • Dennis Hecker, Goethe University Frankfurt, Germany
  • Nina Baumgarten, Goethe University Frankfurt, Germany
  • Marcel Schulz, Goethe University Frankfurt, Germany


Presentation Overview:

Enhancers, also referred to as regulatory elements, are key players in the regulation of gene expression. While there is a variety of approaches to annotate and identify enhancers in the genome, there are just as many methods to subsequently predict their target genes. Popular data modalities in this endeavour are measurements that indicate enhancer activity – such as chromatin accessibility or specific histone modifications – as well as chromatin contacts, which give insights into the 3D connectivity. Rarely considered are 3D contacts between enhancers, despite increasing evidence that enhancers are not independent of each other and can act synergistically. We model the interactome of enhancers and genes as a joint network of enhancer-gene and enhancer-enhancer interactions with the contact frequency between nodes as edge weights. Based on this weighted graph, we use network propagation, more specifically random walk with reset, to propagate the activity of enhancers through the network. This setup allows us to estimate the importance of individual enhancers by examining their assortativity, and to predict their effect on the expression of individual genes. We validate our model on three CRISPRi screens, eQTL data, and its performance for predicting gene expression.
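
The propagation step described above can be illustrated with a minimal sketch of a random walk with reset on a small weighted graph. The function name, the reset probability of 0.3, and the three-node toy network (two enhancers and one gene with made-up contact frequencies) are assumptions for illustration only and are not taken from the poster.

```python
import numpy as np

def random_walk_with_reset(weights, seed_activity, reset=0.3, tol=1e-10, max_iter=1000):
    """Propagate seed activity over a weighted, undirected graph.

    weights: symmetric matrix of edge weights (here: contact frequencies).
    seed_activity: non-negative starting activity per node.
    reset: probability of jumping back to the seed distribution (assumed value).
    """
    weights = np.asarray(weights, dtype=float)
    # Column-normalise so that each column is a transition distribution.
    col_sums = weights.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes send no mass anywhere
    transition = weights / col_sums
    p0 = np.asarray(seed_activity, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - reset) * (transition @ p) + reset * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical toy network: nodes 0 and 1 are enhancers, node 2 is a gene.
contacts = [[0, 5, 1],
            [5, 0, 4],
            [1, 4, 0]]
# Only enhancer 0 is active at the start.
scores = random_walk_with_reset(contacts, seed_activity=[1, 0, 0])
print(scores)
```

In this toy example the gene (node 2) receives activity both directly from enhancer 0 and indirectly through the enhancer-enhancer edge, which is the effect that a purely enhancer-gene model would miss.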

B-318: Multi-omics topic modelling for cancer subtype identification and characterization.
Track: RegSys
  • Michele Caselle, Torino university, Italy


Presentation Overview:

Topic modeling is an innovative approach to perform community detection on bipartite networks and has recently been shown to be a powerful tool to extract information from gene expression datasets. In this contribution, focusing on breast cancer, we present a new network-based topic modeling algorithm running on multi-branch networks. The algorithm is based on a hierarchical version of stochastic block modelling. It can be used to integrate any combination of 'omics data and allows learning from multiple sources of data concurrently. We show that integrating the information from microRNAs and protein-coding mRNAs in a breast cancer dataset taken from TCGA leads to an overall improvement in the discrimination between tumours and normal samples and in the identification of breast cancer subtypes. Taking advantage of the probabilistic nature of topic modelling, we investigate which microRNAs and protein-coding genes are more effective in discriminating among different cancer subtypes or in affecting the survival probability of patients. We also discuss a similar analysis performed on single-cell data of luminal and triple-negative breast cancer cells, combining protein-coding mRNAs and lncRNAs. In this way, we identify a few lncRNAs strongly associated with specific cancer-related cell clusters.

B-319: Using Parenclitic networks to study DNA methylation alterations in aggressive neuroendocrine cancer
Track: RegSys
  • Dimitria Brempou, King's College London, United Kingdom
  • Louise Izatt, Guy’s & St Thomas' NHS Foundation Trust, United Kingdom
  • Cynthia Andoniadou, King's College London, United Kingdom
  • Rebecca Oakey, King's College London, United Kingdom


Presentation Overview:

Pheochromocytomas and paragangliomas, collectively referred to as PPGLs, are rare neuroendocrine tumours associated with genetic pathogenic variants. Despite their high degree of heritability, PPGLs present large phenotypic variability, which remains unexplained. PPGLs with pathogenic variants in the gene SDHB are associated with higher rates of aggressive phenotype and are characterised by global hypermethylation. DNA methylation is an epigenetic mechanism regulating gene expression and is often perturbed in cancer. Understanding the DNA methylation perturbations between aggressive and non-aggressive PPGLs is essential for decoding the molecular mechanism of aggressive progression and developing personalised treatment options.
In order to overcome common limitations of DNA methylation approaches, we built a model of the DNA methylation status in non-aggressive cases and studied the deviation of aggressive cases from this model. We represented the DNA methylation status of each PPGL sample by a parenclitic network and extracted the topological features. Leveraging these features, we were able to predict the phenotype of PPGL samples based on their DNA methylation status. This approach allowed us to overcome the “small n, big k” problem without sacrificing the biological interpretability of the data. Our results provide evidence of distinctive DNA methylation patterns in aggressive PPGLs with predictive potential.
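
As a rough illustration of the idea of modelling the reference group and measuring each sample's deviation from it, the sketch below builds a parenclitic-style network for one sample. It assumes a pairwise linear fit on the reference samples and a residual z-score as the edge weight; the poster does not state which model or features were used, and all names and data here are hypothetical.

```python
import numpy as np

def parenclitic_network(reference, sample):
    """Edge (i, j) = how far the sample deviates from the reference i-j relationship.

    reference: array (n_samples, n_features), e.g. methylation in reference cases.
    sample: array (n_features,), one sample to be compared with the reference.
    Assumption: a linear fit per feature pair, deviation scaled by residual spread.
    """
    reference = np.asarray(reference, dtype=float)
    sample = np.asarray(sample, dtype=float)
    n_features = reference.shape[1]
    edges = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            # Fit feature j as a linear function of feature i on the reference group.
            slope, intercept = np.polyfit(reference[:, i], reference[:, j], 1)
            residuals = reference[:, j] - (slope * reference[:, i] + intercept)
            sigma = residuals.std(ddof=2)
            deviation = abs(sample[j] - (slope * sample[i] + intercept))
            edges[i, j] = edges[j, i] = deviation / sigma
    return edges

# Synthetic reference group: three correlated features with small noise.
rng = np.random.default_rng(0)
x0 = rng.uniform(0.2, 0.8, size=30)
reference = np.column_stack([
    x0,
    x0 + rng.normal(0, 0.02, size=30),
    1 - x0 + rng.normal(0, 0.02, size=30),
])

typical_edges = parenclitic_network(reference, [0.5, 0.5, 0.5])
deviating_edges = parenclitic_network(reference, [0.5, 0.9, 0.5])

# A simple topological feature: node strength (sum of incident edge weights).
print(typical_edges.sum(axis=0))
print(deviating_edges.sum(axis=0))
```

Topological features such as node strength can then be fed to any classifier, which keeps the number of model inputs small even when the number of measured sites is large.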

B-320: Consensus peaks of chromatin accessibility in the human genome
Track: RegSys
  • Qiuchen Meng, Tsinghua University, China
  • Xinze Wu, Tsinghua University, China
  • Lei Wei, Tsinghua University, China
  • Xuegong Zhang, Tsinghua University, China


Presentation Overview:

Chromatin accessibility profiling methods such as assay for transposase-accessible chromatin using sequencing (ATAC-seq) have promoted the identification of gene regulatory elements and the characterization of epigenetic landscapes. Unlike gene expression data, there is no consistent reference for chromatin accessibility data, which hinders large-scale integration analysis. By analyzing more than 1000 ATAC-seq samples and 100 scATAC-seq samples, we found that cells share the same set of potential open regions. We thus proposed a reference called consensus peaks (cPeaks) to represent open regions across different cell types, and developed a deep-learning model to predict all potential open regions in the human genome. We showed that cPeaks can be regarded as a new set of epigenomic elements in the human genome, and that using cPeaks can increase the performance of cell annotations and facilitate the discovery of rare cell types. cPeaks also performed well in analyzing dynamic biological processes and diseases. cPeaks can serve as a general reference for epigenetic studies, much like the reference genome for genomic studies, making the research faster, more accurate, and more scalable.

B-321: Integrating whole genome sequencing, methylation, gene expression, topological associated domain information in regulatory mutation prediction: A study of follicular lymphoma
Track: RegSys
  • Amna Farooq, Oslo University Hospital - Norwegian Radium Hospital, Norway
  • Gunhild Trøen, Oslo University Hospital - Norwegian Radium Hospital, Norway
  • Jan Delabie, University Health Network and University of Toronto, Canada
  • Junbai Wang, University of Oslo, Norway


Presentation Overview:

A major challenge in human genetics is the analysis of the interplay between genetic and epigenetic factors in a multifactorial disease like cancer. Here, a novel methodology is proposed to investigate genome-wide regulatory mechanisms in cancer, using follicular lymphoma (FL) as an example. In a first phase, a new machine-learning method is designed to identify Differentially Methylated Regions (DMRs) by computing six attributes. In a second phase, an integrative data analysis method is developed to study regulatory mutations in FL by considering differential methylation information together with DNA sequence variation, differential gene expression, the 3D organization of the genome (e.g., topologically associated domains), and enriched biological pathways. The resulting mutation block-gene pairs are then ranked to identify the significant ones. By this approach, BCL2 and BCL6 were identified as top-ranking FL-related genes with several mutation blocks and DMRs acting on their regulatory regions. Two additional genes, CDCA4 and CTSO, also ranked highly, with significant DNA sequence variation and differential methylation in neighboring areas, pointing towards their potential use as biomarkers for FL. This work combines genomic and epigenomic information to investigate genome-wide gene regulatory mechanisms in cancer and to contribute to devising novel treatment strategies.

B-322: Influence of sequencing platforms on single-cell gene regulatory network properties
Track: RegSys
  • Jens Uwe Loers, Ghent University, Belgium
  • Joke Deschildre, Ghent University, Belgium
  • Vanessa Vermeirssen, Ghent University, Belgium


Presentation Overview:

Gene regulatory networks (GRNs) inferred from single-cell RNA-sequencing data (scRNA-seq) are great models to study diseases in a cell state-specific way. For this, several options regarding experimental platforms, data integration, and GRN inference methodologies are available. Two distinct technology platforms are widely used: droplet-based methods sequence large numbers of cells with low read counts per cell, and plate-based methods sequence fewer cells with higher read counts. So far, platform comparisons have been limited to simple downstream analyses, e.g., differential expression, and it remains unclear how platform-specific differences affect GRN inference. We performed a technical comparison of GRNs from SCENIC and Inferelator3.0, based on glioblastoma patient and retinal organoid samples, sequenced with both droplet-based methods (10x) and plate-based methods (SMART-seq2, FLASH-seq). We compared regulon activity calculated with JASMINE and AUCell, which for both platforms exhibited similar performance in clustering cells into cell states. Furthermore, we worked out differences and similarities between multiple platforms, such as variations in regulon size, regulon composition, precision and recall of known interactions, and cancer marker recovery. We conclude that both platforms can deliver biologically relevant GRNs, but suffer from the curse of dimensionality and will benefit from higher cell counts, especially when studying rare cell types.

B-323: Multi-omics analysis in primary T cells elucidates mechanisms behind disease associated genetic loci.
Track: RegSys
  • Chuan Fu Yap, The University of Manchester, United Kingdom
  • Gisela Orozco, The University of Manchester, United Kingdom
  • Magnus Rattray, The University of Manchester, United Kingdom
  • Pauline Ho, The University of Manchester, United Kingdom
  • Anne Barton, The University of Manchester, United Kingdom
  • John Bowes, The University of Manchester, United Kingdom
  • Stephen Eyre, The University of Manchester, United Kingdom
  • Paul Martin, The University of Manchester, United Kingdom
  • Darren Plant, The University of Manchester, United Kingdom
  • Chenfu Shi, The University of Manchester, United Kingdom
  • Stefano Rossi, The University of Manchester, United Kingdom
  • Ryan Hum, The University of Manchester, United Kingdom
  • Antonios Frantzeskos, The University of Manchester, United Kingdom
  • Charlotte Wynn, The University of Manchester, United Kingdom
  • Carlo Ferrazzano, The University of Manchester, United Kingdom
  • James Ding, The University of Manchester, United Kingdom
  • Danyun Zhao, The University of Manchester, United Kingdom


Presentation Overview:

Genome-Wide Association Studies (GWAS) have identified the genetic variants associated with many traits and diseases. These variants predominantly affect regulatory elements, which can alter the expression of distal genes. Previous studies have used functional genomics to study these variants, but they have mostly relied on data from cell lines, which differ significantly from primary cells, and often used small sample sizes with few or no replicates.
Here, we present the largest collection of chromatin conformation maps to date, along with matching ATAC-seq and RNA-seq data from primary T cells obtained from 55 psoriatic arthritis (PsA) patients and 19 healthy control subjects. To manage this large dataset, we developed innovative analysis methods, which allowed us to examine GWAS loci linked to autoimmune diseases in greater depth than previously possible. For example, the RA locus rs13396472 is assigned to the gene ACOXL, since the variants in LD are in the intron of ACOXL. However, we find that the variant rs13401811 displays strong allelic imbalance in chromatin accessibility. We find that the enhancer affected by the variant is linked by chromatin looping to the promoter of BCL2L11 and the interactions around the locus are correlated with its activity.

B-324: T3E: a tool for characterising the epigenetic profile of transposable elements using ChIP-seq data
Track: RegSys
  • Michelle Almeida da Paz, Graz University of Technology, Austria
  • Leila Taher, Graz University of Technology, Austria


Presentation Overview:

The epigenetic profiles of noncoding sequences in the human genome have been assessed by Chromatin Immunoprecipitation Sequencing (ChIP-seq) by large international efforts like ENCODE. However, such analyses have traditionally disregarded transposable elements (TEs). The repetitive nature of TEs results in ambiguously mapping ChIP-seq reads, which makes the characterization of the epigenetic profiles of TEs technically challenging. Furthermore, standard approaches to ChIP-seq enrichment analysis randomly permute the genomic location of the read mappings to build a background. We demonstrate that backgrounds constructed using such approaches do not reflect experimental biases and can result in artifactual enrichment. To address these problems and unveil the functional properties of TEs, we developed the Transposable Element Enrichment Estimator (T3E) tool. Specifically, T3E estimates the read mapping coverage of TE families/subfamilies at a single-nucleotide resolution by weighting the number of read mappings associated with a TE family/subfamily by the total number of loci to which the corresponding reads map in the genome. Additionally, T3E computes ChIP-seq enrichment relative to a background constructed based on the structure of ChIP-seq input control. We show that T3E is able to detect context-specific enrichments at TEs by examining several ChIP-seq datasets in human and mouse.
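
The weighting of multi-mapping reads described above can be sketched as follows: each mapping of a read contributes one divided by the total number of loci that read maps to. This is a simplified, family-level illustration and not T3E's actual implementation (which works at single-nucleotide resolution); the function name, input layout, and read data are hypothetical.

```python
from collections import defaultdict

def weighted_family_counts(read_mappings):
    """Sum fractional read mappings per TE family.

    read_mappings: dict mapping a read id to a list with one entry per mapped
    locus; each entry is the TE family at that locus, or None for a non-TE locus.
    """
    counts = defaultdict(float)
    for loci in read_mappings.values():
        if not loci:
            continue  # unmapped read
        weight = 1.0 / len(loci)  # split the read evenly across all its loci
        for family in loci:
            if family is not None:
                counts[family] += weight
    return dict(counts)

# Hypothetical example: r1 maps uniquely, r2 maps to two loci, r3 to four.
mappings = {
    "r1": ["L1"],
    "r2": ["L1", "Alu"],
    "r3": ["Alu", "Alu", None, "L1"],
}
counts = weighted_family_counts(mappings)
print(counts)
```

Under this scheme a read never contributes more than one unit in total, so highly repetitive families are not inflated simply because their reads map to many places.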

B-325: Gene regulatory networks inference in the pigs embryos from scRNAseq and scMulti-omics
Track: RegSys
  • Adrien Dufour, Université Paris Saclay, INRAE, AgroParisTech, GABI, Domaine de Vilvert, 78350 Jouy en Josas, France, France
  • Doriane Guion, Université Paris Saclay, INRAE, AgroParisTech, GABI, Domaine de Vilvert, 78350 Jouy en Josas, France, France
  • Cyril Kurilo, Université de Toulouse, INRAE, ENVT, GenPhySE, Chemin de Borde Rouge, 31326 Castanet-Tolosan, France, France
  • Jan Stockl, Ludwig-Maximilians-Universität München, Genzentrum, Feodor-Lynen-Str. 25, 81377 München, Germany, Germany
  • Denis Laloe, Université Paris Saclay, INRAE, AgroParisTech, GABI, Domaine de Vilvert, 78350 Jouy en Josas, France, France
  • Yoann Bailly, INRAE, GenESI, La Gouvanière, 86480 Rouillé, France, France
  • Patrick Manceau, INRAE, GenESI, La Gouvanière, 86480 Rouillé, France, France
  • Frédéric Martins, Université de Lyon, Inserm, INRAE, SBRI, 18 Av. du Doyen Jean Lépine, 69500 Bron, France, France
  • Stéphane Ferchaud, INRAE, GenESI, La Gouvanière, 86480 Rouillé, France, France
  • Thomas Frolich, Ludwig-Maximilians-Universität München, Genzentrum, Feodor-Lynen-Str. 25, 81377 München, Germany, Germany
  • Sylvain Foissac, Université de Toulouse, INRAE, ENVT, GenPhySE, Chemin de Borde Rouge, 31326 Castanet-Tolosan, France, France
  • Jérome Artus, Université Paris Saclay, Inserm, UMRS1310, 7 rue Guy Moquet, 94800 Villejuif, France, France
  • Hervé Acloque, Université Paris Saclay, INRAE, AgroParisTech, GABI, Domaine de Vilvert, 78350 Jouy en Josas, France, France


Presentation Overview:

Single-cell technologies are powerful tools to unravel tissue complexity at the cellular scale and to identify new cell populations. They have been widely applied to regulatory genomics, and novel methods based on single-cell multi-omics (scRNAseq and scATACseq combined) have opened new possibilities. To illustrate their power and limitations, I will take the example of my PhD work on the pig pre-implantation blastocyst. Embryos were collected at different developmental stages (early, expanded, spherical and ovoid blastocysts), and we performed single-cell RNAseq, single-cell multi-omics and uterine fluid proteomics on them. I will discuss the advantages and drawbacks of single-cell RNAseq and single-cell multi-omics for gene regulation inference using the SCENIC and SCENIC+ pipelines. We will connect the resulting regulons to the receptors expressed by the cells using the CellComm package and, finally, look for possible interactions between the uterine fluid proteome and the embryo.

B-326: DysRegNet: Patient-specific and confounder-aware dysregulated network inference
Track: RegSys
  • Johannes Kersting, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany, Germany
  • Olga Lazareva, Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany, Germany
  • Zakaria Louadi, Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany, Germany
  • Jan Baumbach, Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany, Germany
  • David B. Blumenthal, Department Artificial Intelligence in Biomedical Engineering (AIBE), FAU, Erlangen, Germany, Germany
  • Markus List, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany, Germany


Presentation Overview:

Gene regulation is frequently altered in diseases in unique and often patient-specific ways. Hence, personalized strategies have been proposed to infer patient-specific gene-regulatory networks. However, existing methods do not focus on disease-specific dysregulation or lack assessments of statistical significance. Moreover, they do not account for clinically important confounders such as age, sex or treatment history.

To overcome these shortcomings, we present DysRegNet, a novel method for inferring patient-specific regulatory alterations (dysregulations) from gene expression profiles. We compared DysRegNet to state-of-the-art methods and demonstrated that DysRegNet produces more interpretable and biologically meaningful networks. Independent information on promoter methylation and single nucleotide variants further corroborates our results. We apply DysRegNet to eleven TCGA cancer types and illustrate how the inferred networks can be used for downstream analysis. We show that unique as well as cancer-type-specific dysregulation patterns exist and highlight immune-related mechanisms that are not obvious when focusing on individual genes rather than their interactions.

B-327: A robust statistical framework for genewise single cell differential expression metaanalysis in the context of population based single cell studies.
Track: RegSys
  • Aida Ripoll, Barcelona Supercomputing Center, Spain
  • Maria Sopena, Barcelona Supercomputing Center, Spain
  • Lude Franke, University Medical Center Groningen, Department of Genetics, Groningen, Netherlands
  • Marc Jan Bonder, German Cancer Research Center, Division of Computational Genomics and Systems Genetics, Heidelberg, Germany, Germany
  • Monique van der Wijt, University Medical Center Groningen, Department of Genetics, Groningen, Netherlands
  • Marta Mele, Barcelona Supercomputing Center, Spain


Presentation Overview:

Single-cell RNA sequencing has enabled deciphering the human transcriptome at an unprecedented resolution. As scale, cost, and sensitivity improve, it is now possible to study transcriptomic changes across many individuals. With this aim, we have founded the sc-eQTLGen consortium. Our consortium builds on a federated structure, thereby overcoming the necessity to share privacy-sensitive data while reducing computational load. Here, we expand the sc-eQTLGen setup to study how specific individual traits affect gene expression at single-cell resolution. To do this, we developed a novel statistical framework to conduct a cell-type-specific differential expression meta-analysis (SiGMetaDE). We applied this framework to several PBMC datasets to study how sex and age affect gene expression. We show that our approach substantially increases the statistical power to detect differentially expressed (DE) genes and identifies known and novel sex and age DE genes. Our approach provides a solid framework to study the effects of individual traits and environmental conditions on gene regulation across many cohorts, and can be expanded to single-cell chromatin accessibility or DNA methylation studies.

B-328: Gene regulatory network remodeling through embryonic development trajectories
Track: RegSys
  • Celine Sin, Max Perutz Labs | CeMM, Austria
  • Salvo Danilo Lombardo, Max Perutz Labs | CeMM, Austria
  • Daniel Malzl, Max Perutz Labs | CeMM, Austria
  • Jörg Menche, Max Perutz Labs | University of Vienna, Austria


Presentation Overview:

During embryo development, gene regulatory networks orchestrate dynamically changing gene expression programs to enable the complex process of coordinated cell differentiation juxtaposed with stable maintenance of diverse cell types. While a handful of key transcription factors have been identified to promote certain differentiation events, it is not clear how gene regulatory networks can produce the dynamic range of gene expression programs. Network theory has demonstrated numerous relationships between the structure of networks and the dynamic processes arising from them, but these mathematically rigorous principles have not been linked to gene regulatory networks. We build an embryo development atlas from conception to organogenesis, harmonizing 21 scRNA-seq datasets (5 million cells) to build a series of gene regulatory networks resolved to cell types and developmental time. With the harmonized data, we were able to trace pseudotime trajectories and reconstruct the transcriptomic manifold to evaluate expression program stability, attraction points, and bifurcation points. These local manifold properties are tied to the dynamic processes controlling cell fates, and by linking them to the topological structure of the underlying gene regulatory networks, we will better understand gene regulatory network remodeling and how it contributes to the enormous dynamic range of gene expression programs. Our findings guide future research in cell differentiation and are particularly relevant for the development of early embryo organoids.

B-329: sc-STITCHIT: Predicting gene-specific regulation with transcriptomic and epigenetic single-cell data
Track: RegSys
  • Laura Rumpf, Goethe University Frankfurt Main, Germany
  • Fatemeh Behjati, Goethe University Frankfurt Main, Germany
  • Dennis Hecker, Goethe University Frankfurt Main, Germany
  • Florian Schmidt, Immunoscape, Singapore
  • Marcel Schulz, Goethe University Frankfurt Main, Germany


Presentation Overview: Show

To understand gene regulation mechanisms, it is essential to fathom out the role of enhancers, also called regulatory elements. Integrative analysis of single-cell epigenetic and transcriptomic data can be used to gain insights into gene-expression regulation in specific phenotypes.

For this purpose, we introduce the sc-STITCHIT pipeline as a single-cell extension of the STITCHIT algorithm. STITCHIT utilizes both epigenetic and transcriptomic information to learn a regression-based model per gene. First, it identifies informative regions in a segmentation step; it then predicts the gene’s expression based on the regions’ accessibility. The coefficients learned by the model can then be used to prioritize enhancers of the gene.

We address the inherent sparsity of single-cell data by summarizing the epigenetic and transcriptomic signal of individual cells into metacells based on the similarity of their gene activity measurements.

The sc-STITCHIT pipeline has been successfully applied to a human blood single-cell dataset to identify immune cell-specific enhancer-gene interactions. sc-STITCHIT enables large-scale analysis of scATAC and scRNA-seq data in an automated fashion. It allows time-efficient analysis and obtains reliable models of gene expression, which can be used to study gene regulatory elements in any organism for which the data becomes available.
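The metacell step described above can be sketched as a simple aggregation. This is a minimal illustration, not the sc-STITCHIT implementation: it assumes that each cell has already been assigned to a metacell (the abstract says the grouping is based on similarity of gene activity, which is not reproduced here) and that signals are summarized by averaging. The function name and the toy values are hypothetical.

```python
from collections import defaultdict

def build_metacells(cell_vectors, assignments):
    """Average per-cell signal vectors within each metacell to reduce sparsity."""
    sums = {}
    counts = defaultdict(int)
    for vector, metacell in zip(cell_vectors, assignments):
        if metacell not in sums:
            sums[metacell] = [0.0] * len(vector)
        for i, value in enumerate(vector):
            sums[metacell][i] += value
        counts[metacell] += 1
    return {m: [s / counts[m] for s in sums[m]] for m in sums}

# Toy accessibility counts for two regions in four sparse cells.
accessibility = [[0, 1], [2, 1], [0, 0], [4, 2]]
# Hypothetical metacell assignment, assumed to come from an upstream similarity step.
assignments = ["m1", "m1", "m2", "m2"]

metacells = build_metacells(accessibility, assignments)
print(metacells)
```

The same aggregation would be applied to the expression matrix so that each metacell carries a paired accessibility and expression profile for the per-gene regression.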

B-330: SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks
Track: RegSys
  • Seppe De Winter, KU Leuven & VIB, Belgium
  • Carmen Bravo Gonzalez-Blas, KU Leuven & VIB, Belgium
  • Gert Hulselmans, KU Leuven & VIB, Belgium
  • Nikolai Hecker, KU Leuven & VIB, Belgium
  • Irina Matetovici, VIB, Belgium
  • Valerie Christiaens, KU Leuven & VIB, Belgium
  • Suresh Poovathingal, VIB, Belgium
  • Jasper Wouters, KU Leuven & VIB, Belgium
  • Sara Aibar, KU Leuven & VIB, Belgium
  • Stein Aerts, KU Leuven & VIB, Belgium


Presentation Overview: Show

cis-regulatory elements (CREs) integrate information of the cell’s state by consisting of specific combinations of transcription factor binding sites (TFBS). This drives cell-type-specific gene expression and underlies cell type identity. Due to challenges associated with high-throughput experimental identification of TFBS, we lack gene regulatory networks (GRNs) that include CREs as nodes. Here we present SCENIC+, a computational method that accurately infers such enhancer-driven GRNs using single-cell chromatin accessibility and gene expression data combined with sequence analysis. For this, we curated a motif collection with more than 30k position weight matrices and benchmarked SCENIC+ on diverse data sets, including human peripheral blood mononuclear cells, ENCODE cell lines, melanoma cell states and Drosophila retinal development. We show that SCENIC+ can be used to answer biological questions including conservation of GRNs across human and mouse cerebral cortex, to predict the effect of TF perturbations and to study dynamics of gene regulation along differentiation trajectories. Finally, we applied SCENIC+ to a complex multiome dataset generated on differentiating human neural tube organoids and combined its inference with deep learning to identify enhancers, and their code, underlying neuron progenitor patterning, neurogenesis and neural crest differentiation, which we validated experimentally in vitro and in vivo.

B-331: CREMA: Automated modelling of genome-wide chromatin state in terms of local constellations of regulatory sites.
Track: RegSys
  • Anne Kraemer, Biozentrum, University of Basel, Switzerland
  • Mikhail Pachkov, Biozentrum, University of Basel, Switzerland
  • Erik van Nimwegen, Biozentrum, University of Basel, Switzerland


Presentation Overview: Show

Understanding the key players and interactions in the regulatory networks that control gene expression and chromatin state across different cell types and tissues in metazoans remains one of the central challenges in systems biology. Our laboratory has pioneered a number of methods for automatically inferring core gene regulatory networks directly from high-throughput data by modeling gene expression (RNA-seq) and chromatin state (ChIP-seq) measurements in terms of genome-wide computational predictions of regulatory sites for hundreds of transcription factors and micro-RNAs. The most recent development is an automated system for modeling dynamics of genome-wide chromatin state in terms of local constellations of regulatory sites. The CREMA analysis tool calls significant peaks across a set of samples, uses a curated set of transcription factor motifs to predict transcription factor binding sites within the inferred peaks, and finally uses a linear model to infer transcription factor activity and identify the regulators that best explain observed chromatin changes across experimental conditions. For ChIP/ATAC/DNase-Seq data, CREMA infers significant peaks, key regulators and their activities, regulator targets, peak-to-gene associations and putative pathways targeted by regulators.

We believe that such an automated service would be of great help to the scientific community in developing an understanding of regulatory processes in various systems.

B-332: Single-cell RNA sequencing reveals a rewiring of transcriptional relationships in Alzheimer’s Disease associated with risk variants
Track: RegSys
  • Gerard Bouland, Delft University of Technology, Netherlands
  • Kevin Marinus, VU University Amsterdam, Netherlands
  • Ronald van Kesteren, VU University Amsterdam, Netherlands
  • Guus Smit, VU University Amsterdam, Netherlands
  • Ahmed Mahfouz, Leiden University Medical Center, Netherlands
  • Marcel Reinders, Delft University of Technology, Netherlands


Presentation Overview: Show

Alzheimer’s Disease (AD) is a progressive neurodegenerative disease characterized by loss of cognitive functions and autonomy, eventually leading to death. Many hypotheses about the etiology of AD exist, highlighting the complexity of AD. Genome-wide association studies have provided a compendium of genomic loci that are associated with the risk for AD. However, understanding how these risk variants contribute to AD etiology remains a challenge. We studied pairs of genes that show a change in correlation between AD and control (CT) individuals in a cell type-specific manner. Using seven scRNAseq datasets (>1.3 million cells) from 177 individuals (87 AD, 90 controls), we created a network of differentially correlated gene pairs to identify key regulators involved in AD. Using the number of differential correlations (hubness) to rank genes located near AD genetic risk variants, we prioritized known causal genes and identified potentially novel ones. Altogether, we have provided insight into cell type-specific and coordinated transcriptional changes between AD and healthy individuals and pinpointed putative key regulators. Most importantly, we have provided a prioritization scheme that identifies probable causal genes, as well as cell types that undergo most changes, by superimposing the set of most differentially correlated genes onto genes located near AD risk variants.
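The idea of differential correlation and hubness can be sketched as follows. The abstract does not state which test the authors used, so this sketch assumes a common choice: comparing two Pearson correlations through the Fisher z-transform. The function names, sample sizes, and the example gene pairs are hypothetical.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def differential_correlation_z(r_case, n_case, r_control, n_control):
    """Z statistic for the difference of two correlations (Fisher z-transform).

    Requires |r| < 1 and more than 3 samples per group.
    """
    z_case, z_control = math.atanh(r_case), math.atanh(r_control)
    return (z_case - z_control) / math.sqrt(1.0 / (n_case - 3) + 1.0 / (n_control - 3))

def hubness(differential_pairs):
    """Count in how many differentially correlated pairs each gene appears."""
    counts = Counter()
    for gene_a, gene_b in differential_pairs:
        counts[gene_a] += 1
        counts[gene_b] += 1
    return counts

# A gene pair correlated at r=0.8 in one group and r=0.1 in the other, 53 samples each.
z_stat = differential_correlation_z(0.8, 53, 0.1, 53)

# Hypothetical list of pairs that passed a significance threshold.
hub_counts = hubness([("APOE", "BIN1"), ("APOE", "CLU"), ("BIN1", "CLU"), ("APOE", "TREM2")])
print(round(z_stat, 3), hub_counts.most_common(1))
```

Ranking genes near risk loci by this count is one way to read the prioritization scheme described in the abstract.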

B-333: Single-cell gene regulatory network inference and comparison using GRaNIE and the GRaNIEverse
Track: RegSys
  • Christian Arnold, EMBL, Germany
  • Judith Zaugg, EMBL, Germany
  • Rim Moussa, EMBL, Germany


Presentation Overview: Show

Enhancers are crucial in cell-type-specific gene regulation and in mediating the impact of noncoding genetic variants associated with complex traits. Their activity is regulated by transcription factors (TFs), epigenetic mechanisms and genetic variants. Gene regulatory networks (GRNs) are powerful tools to study cell-type-specific interactions between TFs, regulatory elements and genes. We recently introduced GRaNIE (Gene Regulatory Network Inference including Enhancers) to build enhancer-mediated GRNs (eGRNs) based on covariation of chromatin accessibility and RNA-seq across samples (https://grp-zaugg.embl-community.io/GRaNIE/). While GRaNIE was originally designed for bulk data, we here show its suitability for single-cell GRN inference. Furthermore, we introduce the GRaNIEverse package that complements GRaNIE for single-cell preprocessing, running GRaNIE in batch mode, and post-processing steps such as comparing different GRNs (from other tools) and eGRNs with each other.
We inferred single-cell eGRNs from over 15 independent 10x multiome datasets, which comprise a diverse mixture of technical and biological differences, and present the general principles we learned from them. In addition, we present functionalities and initial results on comparing eGRNs to better understand the biological mechanisms underlying each GRN. For example, we show how eGRNs can be used to investigate changes in TF function across pseudotime and trajectories.

B-334: The oncogenic NAB2-STAT6 fusion protein activates an EGR1 transcriptional program in Solitary Fibrous Tumors
Track: RegSys
  • Connor Hill, University of Pennsylvania/The Wistar Institute, United States
  • Alessandro Gardini, The Wistar Institute, United States


Presentation Overview: Show

The pathogenesis of many rare tumor types, such as Solitary Fibrous Tumors (SFTs), is poorly understood, preventing the design of effective treatments. SFTs are a rarely studied tumor of mesenchymal origin found throughout the body, with 10-20% becoming malignant and fatal. The only molecular hallmark of SFTs is a gene fusion between NAB2 and STAT6 that produces a chimeric protein of unknown function called NAB2-STAT6. STAT6 is a transcription factor in the JAK/STAT signaling pathway and is believed to be the primary driver of NAB2-STAT6’s function. However, we found that NAB2-STAT6 functions through the EGR1 transcription factor, a cofactor of NAB2, which plays an important role in neuronal development and the development of many cancers. Using RNA-seq, ChIP-seq, and ATAC-seq, we discovered that NAB2-STAT6 expression in U2OS cells activated EGR1 targets by increasing their chromatin accessibility. Immunoprecipitation of NAB2-STAT6 revealed that the fusion protein interacts with EGR1 and transcriptional activators. We validated our results using RNA-seq and ChIP-seq from primary SFTs and confirmed that an EGR1 signature underlies tumorigenesis in SFTs. Our discovery that NAB2-STAT6 activates EGR1 targets provides new insights into SFT pathogenesis and provides potential targets for developing effective treatments, ultimately leading to improved patient outcomes.

B-335: Pan-cancer analysis uncovers new genes and signatures involved in cuproplasia
Track: RegSys
  • Vu Viet Hoang Pham, University of New South Wales, Australia
  • Toni Jue, Children’s Cancer Institute, Australia
  • Fabio Luciani, University of New South Wales, Australia
  • Jessica Bell, Children’s Cancer Institute, Australia
  • Chelsea Mayoh, Children’s Cancer Institute, Australia
  • Orazio Vittorio, University of New South Wales, Australia


Presentation Overview: Show

Copper is a vital micronutrient involved in many biological processes. Tumour cells require copper as it is an essential component for their growth and migration. Cuproplasia is defined as aberrant copper-dependent cell growth and proliferation, and copper chelation therapy (CCT) has shown good efficacy in several clinical trials against cancer. Although the molecular pathways associated with cuproplasia are partially known, the genetic heterogeneity of different cancer types has limited the prediction of how CCT may contribute to patients’ survival. We analysed the expression level of genes involved in cuproplasia by comparing RNA-sequencing data from The Cancer Genome Atlas (TCGA) with the Genotype-Tissue Expression (GTEx) project. The comprehensive analysis of gene regulatory networks for 23 different cancer types revealed an 18-gene signature involved in cuproplasia associated with poor survival, and another 12-gene signature related to higher chances of survival. Interestingly, these gene signatures showed single-nucleotide variants (SNVs), copy-number variants (CNVs), and differences in methylation levels in many pan-cancer patients. These findings provide evidence that cuproplasia-related genes can be used to build risk score models, to identify patients who could benefit from copper-chelation therapy, and to develop novel targeted therapeutic strategies.

B-336: Computational analysis for functional screens of chromatin factors in normal and malignant hematopoietic development
Track: RegSys
  • Nikolaus Fortelny, University of Salzburg, Austria
  • David Lara-Astiaso, University of Navarra, Spain
  • Brian Huntly, University of Cambridge, United Kingdom
  • Ainhoa Goñi-Salaverri, University of Navarra, Spain
  • Julen Mendieta-Esteban, University of Navarra, Spain


Presentation Overview: Show

BACKGROUND: Cellular differentiation requires extensive alterations in chromatin structure, elicited by chromatin-factors (CFs). However, approaches to functionally dissect this large group of key regulators in a developing system are challenging.

METHODS: Using hematopoiesis as a model-system, we combined bulk and single-cell CRISPR screens in vivo and ex vivo to characterize the role of CFs, functionally screening almost 700 of all 1000 CFs. These analyses represent the first systematic characterization of CF functions in a differentiation setting. To analyze these large and complex data, we developed novel computational analyses to analyze Cas9 versus control non-Cas9 systems, perturbed cluster formation upon knockouts, and blocked differentiation trajectories.

RESULTS: We present broadly applicable experimental and computational methodology for the analysis of bulk and single-cell CRISPR screens of differentiation phenotypes. Based on our methodology, we uncover marked lineage specificities for 142 CFs with 60 CFs validated in detail ex vivo. These analyses reveal functional diversity among related CFs as well as shared roles for unrelated CFs. Studying CFs in leukemia single-cell trajectories, we show that malignant cells corrupt CFs that facilitate differentiation, thus blocking differentiation by engaging in leukemia-specific interactions with transcription factors, which suggests novel therapeutic avenues.

B-337: Bioinformatics analysis of sequence determinants of heterodimeric transcription factor binding
Track: RegSys
  • Maria Osmala, University of Helsinki, Finland
  • Jussi Taipale, University of Cambridge, Department of Biochemistry, United Kingdom


Presentation Overview: Show

Sequence-specific binding of transcription factors (TFs) at non-coding cis-regulatory elements drives expression of the target genes. Single nucleotide variations at TF binding sites can lead to diseases such as cancer. The preferred binding sequences of a TF are represented as a motif or position weight matrix. However, as TFs often co-operate with each other to achieve binding, it is also important to determine motifs for a pair of TFs.

The motifs are derived from short DNA sequences that are known to bind TFs. The short sequences are obtained in vitro using high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX). SELEX has characterized 4000 monomeric and (hetero)dimeric binding specificities of TFs. Heterodimeric motifs are identified by consecutive affinity-purification SELEX (CAP-SELEX). The motifs are largely redundant, and therefore we identify a smaller representative set of motifs by network analysis and clustering. A new similarity measure required for this analysis is developed and compared to the existing ones.

Genome-wide motif matching finds all occurrences of potential TF binding. The motif matches are the basis for further analysis such as the enrichment of motifs at cell-type-specific cis-regulatory elements, target pathway identification, and comparative genomics analysis.
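Motif matching with a position weight matrix can be illustrated with a small log-odds scan. This is a generic sketch, not the authors' pipeline: it assumes a uniform background, a fixed pseudocount, forward-strand scanning only, and a made-up three-base motif with a hypothetical score threshold.

```python
import math

def pwm_from_counts(count_columns, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in count_columns:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan_sequence(sequence, pwm, threshold):
    """Return (position, score) for every window whose log-odds score meets the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(pwm[offset][sequence[start + offset]] for offset in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Made-up counts from eight bound sequences that all read "TGA".
counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
pwm = pwm_from_counts(counts)
hits = scan_sequence("ACTGACC", pwm, threshold=3.0)
print(hits)
```

A genome-wide scan would also cover the reverse complement strand and use a calibrated threshold, both of which are left out here for brevity.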

B-338: Systematic evaluation of Gene Regulatory Networks
Track: RegSys
  • Pau Badia i Mompel, Institute for Computational Biomedicine, Germany
  • Julio Saez Rodriguez, Institute for Computational Biomedicine, Germany


Presentation Overview: Show

Gene regulatory networks (GRNs) model gene regulation by linking transcription factors (TFs) to downstream target genes. Recent single-cell multiomics technologies have sparked a wave of novel computational methods that leverage transcriptomic and chromatin accessibility information to infer GRNs more accurately at an unprecedented resolution. There have been previous benchmarking studies of GRNs (Pratapa et al., 2020; Stone et al., 2021; Marbach et al., 2012), but they focus on methods that rely only on RNA-seq. Since defining a suitable ground truth to compare to is hard, and GRN methods have their own biases and assumptions, there is a need to find a common ground to benchmark them. Here we propose Gene REgulation neTwork Analysis (GRETA), a Python framework to evaluate and analyze GRNs. Inside GRETA, we provide prior knowledge and statistical methods that can be used to evaluate GRNs using a collection of metrics and to also perform downstream analyses such as feature selection, community detection, GWAS and functional enrichment. Leveraging GRETA, we plan to assess novel multiomic GRN methods in different tasks and biological contexts.
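The abstract does not list the metrics GRETA provides. As one simple example of what evaluating a GRN against prior knowledge can look like, the sketch below computes precision, recall and F1 over TF-target edges. It is not GRETA's API; the function name and the edge lists are hypothetical.

```python
def edge_precision_recall(predicted_edges, reference_edges):
    """Compare predicted TF-target edges with a reference set of edges."""
    predicted, reference = set(predicted_edges), set(reference_edges)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical inferred network and hypothetical prior-knowledge edges.
predicted = [("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC"), ("TF2", "geneD")]
reference = [("TF1", "geneA"), ("TF2", "geneC"), ("TF3", "geneE")]

precision, recall, f1 = edge_precision_recall(predicted, reference)
print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```

Because any reference set is incomplete, a low precision here may reflect missing prior knowledge as much as false predictions, which is the ground-truth difficulty the abstract points to.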

B-339: PHLOWER: A powerful trajectory inference method using graph Hodge Laplacian decomposition
Track: RegSys
  • Mingbo Cheng, Institute for Computational Genomics, Germany
  • Jitske Jansen, Institute of Experimental Medicine and Systems Biology, Germany
  • James Nagai, Institute for Computational Genomics, Germany
  • Martin Grasshoff, Institute for Computational Genomics, Germany
  • Rafael Kramann, Institute of Experimental Medicine and Systems Biology, Germany
  • Christoph Kuppe, Institute of Experimental Medicine and Systems Biology, Germany
  • Michael Schaub, Department of Computer Science, Germany
  • Ivan Costa, Institute for Computational Genomics, Germany


Presentation Overview: Show

We propose PHLOWER (graPh Hodge Laplacian inferring trajectOry floWs of cEll diffeRentiation). We take advantage of the decomposition of the normalized Hodge 1-Laplacian (Schaub et al., 2020), which can capture the relation between the edges (1-simplices) and triangles (2-simplices) of a graph. This Hodge decomposition enables us to define a space on paths, i.e. a point is represented by a differentiation path. This can be used to cluster the paths found in single-cell data into major trajectories. Moreover, the cumulative sum of eigenvectors enables us to detect the branching points of trajectory clusters, which allows us to precisely infer arbitrarily complex trees. Finally, we leverage the tree model from STREAM for visualisation. PHLOWER also retains the cell information for each branching point and end leaf, with which we can detect genes and regulatory programs driving changes during differentiation. PHLOWER was successfully applied to complex multiomic single-cell data on hematopoiesis and kidney organoids.
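For readers unfamiliar with the Hodge 1-Laplacian, the sketch below builds it for a tiny graph from the node-to-edge and edge-to-triangle boundary matrices, using the standard definition L1 = B1^T B1 + B2 B2^T. This is the unnormalized form, whereas the abstract refers to a normalized variant, and it is not PHLOWER's code. The helper names and the example graph are illustrative assumptions.

```python
def transpose(matrix):
    return [list(row) for row in zip(*matrix)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def boundary_matrices(n_nodes, edges, triangles):
    """Build B1 (nodes x edges) and B2 (edges x triangles).

    Edges (i, j) with i < j are oriented from i to j; triangles (i, j, k) have i < j < k.
    """
    edge_index = {edge: col for col, edge in enumerate(edges)}
    b1 = [[0] * len(edges) for _ in range(n_nodes)]
    for col, (i, j) in enumerate(edges):
        b1[i][col] = -1
        b1[j][col] = 1
    b2 = [[0] * len(triangles) for _ in range(len(edges))]
    for col, (i, j, k) in enumerate(triangles):
        # Boundary of triangle (i, j, k) is (j, k) - (i, k) + (i, j).
        b2[edge_index[(i, j)]][col] = 1
        b2[edge_index[(j, k)]][col] = 1
        b2[edge_index[(i, k)]][col] = -1
    return b1, b2

def hodge_1_laplacian(b1, b2):
    """Unnormalized Hodge 1-Laplacian: B1^T B1 + B2 B2^T."""
    lower = matmul(transpose(b1), b1)
    if not b2 or not b2[0]:
        return lower  # graph without triangles
    upper = matmul(b2, transpose(b2))
    return [[x + y for x, y in zip(row_l, row_u)] for row_l, row_u in zip(lower, upper)]

# One filled triangle (0, 1, 2) with a dangling edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
triangles = [(0, 1, 2)]
b1, b2 = boundary_matrices(4, edges, triangles)
laplacian = hodge_1_laplacian(b1, b2)
for row in laplacian:
    print(row)
```

The matrix acts on edge flows, which is why its eigenvectors can be used to embed paths rather than individual cells.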

B-340: Multi-omics analysis of HNSCC reveal non-overlapping epigenomic regulation of gene expression
Track: RegSys
  • Katarina Mandić, Ruđer Bošković Institute, Croatia
  • Ksenija Božinović, Ruđer Bošković Institute, Croatia
  • Nina Milutin Gašperov, Ruđer Bošković Institute, Croatia
  • Ivan Sabol, Ruđer Bošković Institute, Croatia
  • Anja Barešić, Ruđer Bošković Institute, Croatia
  • Emil Dediol, Clinical Hospital Dubrava, Croatia
  • Nathaniel Edward Bennet Saidu, Ruđer Bošković Institute, Croatia
  • Jure Krasić, University of Zagreb School of Medicine, Croatia
  • Nino Sinčić, University of Zagreb School of Medicine, Croatia
  • Magdalena Grce, Ruđer Bošković Institute, Croatia


Presentation Overview: Show

Head and neck squamous cell carcinoma (HNSCC) is a common malignancy which demonstrates large diversity, partly due to alternate anatomical sites of origin of the tumor despite arising from a single cell type. Previous studies have shown that neither the grouping of patients into two classes according to a combination of HPV infection, lifestyle, age, and 5-year survival rate, nor the interdependence of genetics and epigenetics in cancer progression and etiology, is fully understood. This study aims to identify the interplay of microRNA and DNA methylation and their effect on gene expression in HNSCC.
The study cohort consisted of tumor samples from 20 individuals (10 HPV-positive, HPV+), 10 of which were collected from the oropharynx (6 HPV+) and another 10 from the oral cavity (4 HPV+), with 3 tonsil and 8 oral mucosa samples from healthy participants as the control group.
We performed a multi-omics analysis that included transcriptome, miRNome, and methylome investigating the association of the two regulatory mechanisms with gene expression and changes related to HPV infection. Results suggest that differentially-expressed genes are mostly regulated by one mechanism - either microRNAs or DNA methylation. Additionally, expression and methylation patterns show distinct biological pathways for each regulatory mechanism studied.

B-341: Modelling gene regulation for drug target identification
Track: RegSys
  • Mathilde Robin, ICM, Montpellier, France ; Fondazione G. Bonadonna, Milan, Italy ; LIRMM, Univ Montpellier, CNRS, Montpellier, France, France
  • Lisa Héron-Milhavet, IRCM, Univ Montpellier, Inserm, ICM, Montpellier, France, France
  • Alexandre Djiane, IRCM, Univ Montpellier, Inserm, ICM, Montpellier, France, France
  • Céline Gongora, IRCM, Univ Montpellier, Inserm, ICM, Montpellier, France, France
  • Charles Lecellier, IGMM, Univ Montpellier, CNRS, Montpellier, France ; LIRMM, Univ Montpellier, CNRS, Montpellier, France, France
  • Sophie Lèbre, IMAG, Univ Montpellier, CNRS, Montpellier, France ; LIRMM, Univ Montpellier, CNRS, Montpellier, France, France
  • Diego Tosi, ICM, Montpellier, France ; Fondazione G. Bonadonna, Milan, Italy, France
  • Laurent Bréhélin, LIRMM, Univ Montpellier, CNRS, Montpellier, France, France


Presentation Overview: Show

Cell adaptive responses represent an important mechanism mediating cancer cell resistance to chemotherapy. Hence, untangling these responses would provide valuable information to select targeted treatments to complement chemotherapy. In this context, our project aims to identify the signalling cascades that control the cell's response to oxaliplatin, a chemotherapy administered to colorectal cancer (CRC) patients.

We designed a machine learning approach to investigate the link between regulatory sequences (promoters, enhancers) and gene expression measured in CRC cells treated with oxaliplatin. Our model predicts whether a gene is differentially expressed based on a vector of scores reflecting the affinity of transcription factors (PWMs from the JASPAR database) for its regulatory sequences. Incorporating prior knowledge, we trained a model for each signalling pathway of the KEGG database, using only the PWMs associated with that pathway's specific TFs. By measuring the accuracy of the different models, we identify the most likely pathways used by the cell in response to oxaliplatin.

Our first results highlight the importance of pathways associated with P53, a classic TF controlling cell death programs after DNA damage, validating our approach. These analyses identified other pathways such as those associated with TEAD, which may provide new opportunities to tackle resistance to oxaliplatin.

B-342: Species-aware DNA language modeling
Track: RegSys
  • Dennis Gankin, Technical University of Munich, Germany
  • Alexander Karollus, Technical University of Munich, Germany
  • Johannes Hingerl, Technical University of Munich, Germany
  • Martin Grosshauser, Technical University of Munich, Germany
  • Kristian Klemon, Technical University of Munich, Germany
  • Julien Gagneur, Technical University of Munich, Germany


Presentation Overview: Show

Predicting gene expression from DNA is hamstrung by the lack of labelled data. Pretraining on unlabelled data using masked language modeling has proven highly successful in overcoming data constraints in natural language and proteomics. However, in genomics, this approach has generally only been applied to single genomes, leveraging neither the conservation of regulatory sequences across species nor the vast number of available genomes. We train a masked language model on more than 800 species spanning over 500 million years of evolution. We show that explicitly modeling species is instrumental in capturing conserved yet evolving regulatory elements and in controlling for oligomer biases. We extract embeddings for 3’ untranslated regions of Saccharomyces cerevisiae and Schizosaccharomyces pombe and achieve mRNA half-life predictions that are better than or on par with the state of the art, demonstrating the utility of the approach for regulatory genomics. Moreover, we show that the per-base reconstruction probability of our model significantly predicts RNA-binding protein bound sites directly. Altogether, our work establishes a self-supervised framework to leverage large genome collections of evolutionarily distant species for regulatory genomics and contributes to alignment-free comparative genomics.
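To make the masked language modeling objective concrete, the sketch below prepares one training example from a DNA sequence. The abstract does not specify the tokenization, the masking rate, or how species information is provided to the model, so the single-nucleotide tokens, the 15% rate, and the species token prepended to the sequence are all assumptions made for illustration.

```python
import random

MASK_TOKEN = "[MASK]"

def make_mlm_example(sequence, species, mask_rate=0.15, seed=0):
    """Prepend a species token and mask a fraction of nucleotides.

    Returns (tokens, labels); labels hold the original base at masked
    positions and None elsewhere, so a loss is computed only on masked bases.
    """
    rng = random.Random(seed)
    tokens = [f"[{species}]"] + list(sequence)
    labels = [None] * len(tokens)
    n_masked = max(1, round(mask_rate * len(sequence)))
    # Position 0 is the species token and is never masked.
    for position in rng.sample(range(1, len(tokens)), n_masked):
        labels[position] = tokens[position]
        tokens[position] = MASK_TOKEN
    return tokens, labels

dna = "ACGTACGTACGTACGTACGT"
tokens, labels = make_mlm_example(dna, "S_cerevisiae")
print(tokens)
```

A model trained on such examples must reconstruct the hidden bases from their context and the species token, and its per-base reconstruction probabilities are the quantity the abstract relates to bound sites.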

B-343: Gene Regulatory Network Inference of Axolotl Bone Regeneration
Track: RegSys
  • Ines Rivero-Garcia, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Spain
  • Anastasia Polikarpova, Research Institute of Molecular Pathology (IMP), Austria
  • Tobias Gerber, European Molecular Biology Laboratory (EMBL), Germany
  • Miguel Torres, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Spain
  • Fátima Sánchez Cabo, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Spain
  • Elly Tanaka, Research Institute of Molecular Pathology (IMP), Austria


Presentation Overview: Show

Regenerative abilities vary considerably within the animal kingdom. The axolotl (Ambystoma mexicanum) regenerates amputated limbs through the formation of a progenitor cell mass, the blastema, largely comprised of dedifferentiated connective tissue (CT) cells from the stump. Conversely, CT in axolotl bone critical-size defects (CSD, >30% of the bone length) fails to form a blastema and therefore to regenerate. The differences between the CT GRNs in these two injuries remain unknown.
We performed a time-series scRNA-seq of axolotl blastema and CSD to uncover injury-specific CT populations. Whole transcriptome GRN inference using transfer entropy revealed the loss of several blastema hubs in CSD, including the Wnt effector LEF1. Early injury Wnt signaling pathway models predicted with deep learning and coupled bulk ATAC-seq and scRNA-seq data unveiled a differential usage of key TFs from this pathway, with LEF1 and TCF7L2 being preferential in blastema and CSD, respectively. Importantly, the LEF1 regulon, built using CNNC and associated with organogenesis, included the matrix remodelers MMP11 and MMP13. Conversely, the TCF7L2 regulon was associated with reduced matrix remodelling. These results suggest that CSD CT fails to rewire its GRN and retains a homeostatic behaviour that might be redirected by modulating the expression of LEF1 or TCF7L2.

B-345: Predicting transcriptional regulation through genomic context and evolutionary conservation of transcription factor binding sites
Track: RegSys
  • Laura Turchi, Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, Grenoble, France, France
  • Antoine Frenoy, TIMC, Univ. Grenoble Alpes, CNRS, UMR5525, Rond-Point de la Croix de Vie, 38706, La Tronche, France, France
  • Nicolas Thierry-Mieg, TIMC, Univ. Grenoble Alpes, CNRS, UMR5525, Rond-Point de la Croix de Vie, 38706, La Tronche, France, France
  • Romain Blanc-Mathieu, Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, Grenoble, France, France
  • François Parcy, Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, Grenoble, France, France


Presentation Overview: Show

Transcription factors (TFs) control gene expression through binding to specific DNA sequences, called TF binding sites (TFBS). However, the intrinsic binding affinity of a TF for its TFBS does not always translate into in vivo binding and to a transcriptional effect. Therefore, models based on motif recognition can be poor predictors of functional TFBS (i.e. TFBS bound and linked to gene expression changes in vivo). Since (i) different combinations of TFs can influence gene expression and (ii) conserved noncoding sequences can have regulatory functions, we sought to make a model to predict transcriptional regulation by combining genomic context and evolutionary conservation. We focused on the model plant Arabidopsis thaliana and on LEAFY (LFY), the master floral regulator. We combined genomic context and conservation of LFY TFBS with gene expression and DNA binding data to build a classifier that predicts whether LFY regulates a given DNA sequence. We found that random forests trained on this dataset achieve a higher predictive power than classical motif-based models. Using our model, we identified candidate LFY target sites on a new set of sequences. Our approach revealed new potential targets of the LFY master regulator and can be extended to other species and TFs.

B-346: Cell type directed design of synthetic enhancers
Track: RegSys
  • Ibrahim Ihsan Taskiran, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  • Katina I. Spanier, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  • Valerie Christiaens, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  • David Mauduit, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  • Stein Aerts, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium


Presentation Overview: Show

Enhancers are the core elements of gene regulatory networks where transcription factors are bound to orchestrate cell-type specific gene expression. We recently studied the regulatory code of enhancers by training AI models on enhancer sequences of neuronal cell types in the Drosophila brain (DeepFlyBrain) and human melanocytes (DeepMEL). However, “what we cannot create, we do not understand”. Here we implemented three different enhancer design strategies: (1) directed sequence evolution; (2) iterative motif implanting; (3) generative design, to generate de novo enhancers with specific spatiotemporal activity patterns for targeted cell types.

The first strategy also proved useful for modifying existing genomic sequences, namely: (1) to prune enhancers, making them specific to only one cell type; (2) to augment enhancers, making them active in multiple chosen cell types; and (3) to rescue “lost” enhancers that only have partial enhancer codes.

For each strategy, we evaluated synthetic enhancers in vivo using transgenic flies and in vitro using human cell culture. We investigated the explainability of each method and compared in detail the engineered enhancers with genomic enhancers. In conclusion, enhancer design guided by deep learning leads to better understanding of how enhancers work and shows that their code can be exploited to manipulate cell states.

B-347: A gene signature-based approach for predicting cell type and spot composition in single-cell and spatial transcriptomic data
Track: RegSys
  • Ella Goldschmidt, Tel Aviv University, Israel
  • Ayelet Kaminitz, Tel Aviv University, Israel
  • Elhanan Borenstein, Tel Aviv University, Israel
  • Asaf Madi, Tel Aviv University, Israel


Presentation Overview: Show

A key challenge in single-cell data analysis is identifying the type of each cell based on measured expression of various genes. Cell populations are commonly identified by manually annotating cell clusters based on established marker genes. This process and the assessment of the resulting pattern for each gene are time-consuming. Moreover, recent gene expression technologies, which measure transcriptomic profiles with spatial context, pose an even more daunting challenge. Specifically, these technologies result in two matrices: an expression matrix listing gene expression for each spot, and a spatial locations matrix. In contrast to single-cell data, in spatial data, there is a need to identify both the cell types present in each spot and their abundances. To address these challenges, we first developed a signature projection framework that classifies cell clusters based on their respective marker genes in a given tissue. The algorithm achieved 96.46% accuracy in identifying the appropriate cell type. We then set out to use this signature-based approach to infer the cell types and abundances in every location (spot) in spatial transcriptomic data. On simulated data, we achieved an average Pearson correlation of 0.822 across cell types and 0.795 across simulated spots between predicted and actual cell type compositions.
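The two evaluation figures quoted above can be illustrated with a minimal sketch. It assumes that "across cell types" means one correlation per cell type (computed over spots) and "across spots" means one correlation per spot (computed over cell types), each then averaged; the abstract does not spell this out. The function names and the toy composition matrices are hypothetical and are not taken from the authors' software.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def mean_or_none(values):
    """Average the defined correlations, skipping undefined (constant) ones."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else None


def mean_correlations(predicted, actual):
    """Rows are spots, columns are cell types.

    Returns (mean correlation across cell types, mean correlation across spots).
    """
    # One correlation per cell type: compare a column over all spots.
    by_type = [pearson(p, a) for p, a in zip(zip(*predicted), zip(*actual))]
    # One correlation per spot: compare a row over all cell types.
    by_spot = [pearson(p, a) for p, a in zip(predicted, actual)]
    return mean_or_none(by_type), mean_or_none(by_spot)


# Toy data (hypothetical): 4 simulated spots, 3 cell types, proportions per spot.
actual = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.4, 0.4, 0.2]]
predicted = [[0.5, 0.35, 0.15], [0.25, 0.45, 0.3], [0.15, 0.25, 0.6], [0.35, 0.4, 0.25]]

r_types, r_spots = mean_correlations(predicted, actual)
print(f"across cell types: {r_types:.3f}, across spots: {r_spots:.3f}")
```

Reporting both averages is useful because a method can rank spots correctly for each cell type while still misjudging the mixture within individual spots, or the reverse.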

B-348: Whole-body coordination pattern of metabolic transcriptomes
Track: RegSys
  • Ehud Sussman, University of Haifa, Israel
  • Danna Mor, University of Haifa, Israel
  • Judith Somekh, University of Haifa, Israel


Presentation Overview: Show

Motivation: Whole-body physiological homeostasis is pivotal for maintaining human health and is achieved through complex cross-tissue co-regulation. Detecting the general system-level coordination pattern of transcriptomes at the whole-body level is essential for understanding health and disease.
Methods: We developed a methodology encompassing dimensionality reduction of co-expression modules to calculate the levels of inter-tissue coordination of functionally similar transcriptomes at a whole-body level. We applied our approach to metabolic transcriptomes across 19 tissues from two human cohorts derived from the GTEx project. Our methodology includes co-expression module generation, annotation, dimensionality reduction, randomization tests, and community analysis to evaluate the degree of coordination of metabolic co-expression modules across the whole human body.
Results: We generated and annotated 609 and 615 modules for cohort 1 and cohort 2, respectively, of which 40 and 50 were metabolic. Metabolic modules exhibited significantly more inter-tissue positive correlations and a higher ratio of positive-to-negative correlations than the randomly chosen modules and formed larger communities.
Conclusion: Our approach demonstrates a general inter-tissue co-regulation pattern of metabolic transcriptomes across the whole body and can serve as a valuable tool for detecting coordination patterns across systems and conditions.
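
The positive-to-negative correlation ratio mentioned in the Results can be sketched as follows. This is only an illustration under stated assumptions: each co-expression module is assumed to be summarized by one score per individual (for example, the output of a dimensionality reduction step), only pairs of modules from different tissues are compared, and the randomization test against randomly chosen modules is not shown. The tissue names, scores, and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def inter_tissue_sign_ratio(modules):
    """modules: list of (tissue, per-individual scores).

    Returns (number of positive, number of negative, positive-to-negative ratio)
    over all pairs of modules that come from different tissues.
    """
    pos = neg = 0
    for (tissue_a, scores_a), (tissue_b, scores_b) in combinations(modules, 2):
        if tissue_a == tissue_b:
            continue  # only inter-tissue pairs are of interest here
        r = pearson(scores_a, scores_b)
        if r > 0:
            pos += 1
        elif r < 0:
            neg += 1
    ratio = pos / neg if neg else float("inf")
    return pos, neg, ratio


# Toy module scores (hypothetical) for five individuals.
modules = [
    ("liver", [1, 2, 3, 4, 5]),
    ("muscle", [2, 4, 5, 8, 10]),
    ("adipose", [5, 4, 3, 2, 1]),
    ("liver", [1, 3, 2, 5, 4]),
]

pos, neg, ratio = inter_tissue_sign_ratio(modules)
print(pos, neg, round(ratio, 3))
```

In practice the same statistic would be computed for the metabolic modules and for repeated random draws of non-metabolic modules, so the observed ratio can be compared with a null distribution.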

B-349: UniversalEPI: An Attention-based Method to Predict Chromatin Interactions in Unseen and Rare Cell Types
Track: RegSys
  • Aayush Grover, Department of Computer Science, ETH Zürich, Switzerland
  • Simeon Häfliger, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland
  • Tuna Acisu, Technical University of Munich, Germany
  • Felix Tockner, Department of Computer Science, ETH Zurich, Switzerland
  • Ignacio Ibarra, Institute of Computational Biology, Computational Health Center, Helmholtz Zentrum Muenchen, Germany
  • Valentina Boeva, Department of Computer Science, ETH Zurich, Switzerland


Presentation Overview: Show

Non-coding mutations represent more than 95% of all cancer variants. Most of these variants do not affect the transcription rate of proximal genes and are therefore overlooked in cancer studies. However, a small proportion of non-coding variants may have vast consequences on gene regulation through modifications in the structure of chromatin and/or modulation of proximal and distal regulatory elements. Therefore, prediction of the chromatin structure in a specific tissue or cancer type is an important task and is crucial for our understanding of the mechanisms of cancer development and progression. While several methods have been shown to accurately predict the chromatin structure for a given cell type, little attention has been given to predicting the chromatin structure in unseen cell types. Here, we propose an attention-based deep neural model, called UniversalEPI, that can accurately predict the interactions between regulatory elements in an unseen cell type. We show that UniversalEPI accurately captures transcription factors that are important for chromatin organization and can be used along with open chromatin information to robustly predict chromatin interactions in unseen cell types. This in-silico 3D-modelling of DNA represents a crucial step in evaluating the role of mutational processes in different cancer types.

B-350: Simulating scRNA-seq using causal generative adversarial networks
Track: RegSys
  • Seyed Yazdan Zinati, McGill University, Canada
  • Abdulrahman Takiddeen, McGill University, Canada
  • Amin Emad, McGill University, Canada


Presentation Overview: Show

We present GRouNdGAN, a gene regulatory network (GRN)-guided causal implicit generative model for simulation of single-cell RNA-seq data, in-silico perturbation experiments, and benchmarking of GRN inference methods. Through the imposition of a user-defined GRN, describing TF-gene regulatory interactions, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating TFs. Our model is trained on a reference single-cell dataset; it captures non-linear TF-gene dependence and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise with no user manipulation and only implicit parameterization. Despite imposing a rigid constraint on causality, GRouNdGAN outperforms state-of-the-art simulators in generating realistic cells by incorporating domain knowledge through the GRN. GRouNdGAN learns meaningful causal regulatory dynamics, allowing sampling from both interventional and observational distributions to synthesize cells under conditions that do not occur in the dataset at inference time, which enables in-silico TF knockout and perturbation experiments. Interactions imposed through the GRN are emphasized, resulting in GRN inference algorithms assigning them higher scores than edges not imposed but of equal importance. We used these properties to benchmark various GRN inference algorithms, including those that utilize the concept of pseudo-time.

B-351: BootTHiC: Integrating HiC and transcriptomics to detect transcriptional hubs
Track: RegSys
  • Vipin Kumar, NCMM, University of Oslo, Norway
  • Fabrizio Guidotti, University of Bologna, Italy
  • Anthony Mathelier, NCMM, University of Oslo, Norway


Presentation Overview: Show

The regulatory mechanisms that enable the genome’s transcriptional agility, producing the variety of tissue functions observed in eukaryotes, remain an active field of study. In particular, reconciling this complex behaviour with the conventional linear representation of the genome highlights the limits of this narrow description.
The broader spatial reconfiguration the genome goes through to produce adequate transcription or upon disease onset has recently highlighted the regulatory importance of genome architecture.
In this study, we introduce BootTHiC, a method integrating HiC data and CAGE sequencing to detect candidate transcriptional hubs, which we believe constitute a decisive locus of gene regulation.
We support the biological relevance of the detected transcriptional hubs by observing their enrichment in cell-line specific genes, significantly more active enhancers and downregulated genes when compared to matching cancer cell-lines. Taken together, these observations indicate a likely contribution of our transcriptional hubs in determining cell identity.
We provide BootTHiC as a Snakemake pipeline.

B-352: Profiling the three-dimensional genomic networks essential for myeloid cell differentiation
Track: RegSys
  • Sandra Deliard, The Wistar Institute, United States
  • Alessandro Gardini, The Wistar Institute, United States


Presentation Overview: Show

The organization and dynamics of enhancer networks are key components in regulating myeloid cell differentiation. However, the underpinnings of the 3D chromatin conformation that mediates differentiation have not been thoroughly examined. We aimed to understand enhancer regulation upon monocytic differentiation by employing Capture Hi-C to examine the chromatin looping to the CSF1R gene locus. Further, we mutated enhancer elements located in the CSF1R gene, then assessed genome-wide protein binding and chromatin interactions via ChIP-Seq and Capture Hi-C, respectively, in myeloid cells. After differentiating monocytes to macrophages, we detected an increase in chromatin interactions to CSF1R. These interactions were enriched for transcription factor EGR1 in addition to known chromatin looping modulators such as CTCF. The deletion of just one enhancer in CSF1R led to loss of EGR1 binding and disruption of the regulatory element. This was accompanied by a significant loss of DNA looping to CSF1R and dysregulation of CSF1R and neighboring gene expression. These data demonstrate the critical role of this enhancer in maintaining a network for controlling the transcriptional and functional changes required for differentiation and effective inflammatory response at the site of infection.

B-353: Somatic short tandem repeat mutations may regulate gene expression in colorectal cancer
Track: RegSys
  • Max Verbiest, Zurich University of Applied Sciences, Switzerland
  • Oxana Lundström, Stockholm University, Sweden
  • Feifei Xia, Zurich University of Applied Sciences, Switzerland
  • Tugce Bilgin-Sonay, Columbia University, United States
  • Maria Anisimova, Zurich University of Applied Sciences, Switzerland


Presentation Overview: Show

Colorectal cancer (CRC) is caused by (epi-)genetic alterations. Short tandem repeats (STRs) are among the most variable loci in the human genome. STRs are associated with complex traits and regulate gene expression in healthy tissue. While STRs are used clinically to stratify CRC patients, it is unclear whether somatic STR mutations affect gene expression in cancer.

Therefore, we investigated STR mutations in CRC patients from The Cancer Genome Atlas (TCGA). We find that STR mutability in CRC depends on the repeat motif size and the repeat copy number. Patients with defective DNA mismatch repair show elevated STR mutation rates. We present 1281 putative expression STRs (eSTRs) for which the STR copy number correlates with gene expression in CRC tumors. The copy number of 70 eSTRs correlates with the expression of cancer-related genes, 11 of which are CRC-specific genes. Using our eSTRs, we can predict gene expression changes in response to eSTR mutations.

In summary, our work improves the understanding of the functional impact of STR mutations in CRC. Our evidence of gene regulatory roles for eSTRs in CRC represents a new way through which STR mutations may influence cancer. This could lead to new, STR-based targets in the treatment of cancer.
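
The idea of predicting an expression change from an eSTR mutation can be illustrated with a minimal sketch. The abstract does not state which model is used; the example below assumes a simple least-squares line relating STR copy number to the expression of the associated gene, and all values and function names are hypothetical.

```python
def fit_line(copy_numbers, expression):
    """Ordinary least-squares fit: expression = slope * copy_number + intercept."""
    n = len(copy_numbers)
    mx = sum(copy_numbers) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in copy_numbers)
    if sxx == 0:
        raise ValueError("copy number does not vary; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(copy_numbers, expression))
    slope = sxy / sxx
    return slope, my - slope * mx


def predicted_expression_change(slope, reference_copies, mutated_copies):
    """Expected change in expression when the repeat gains or loses units."""
    return slope * (mutated_copies - reference_copies)


# Toy data (hypothetical): STR copy number per tumor and expression of the linked gene.
copy_numbers = [10, 11, 12, 13, 14]
expression = [5.0, 5.5, 6.0, 6.5, 7.0]

slope, intercept = fit_line(copy_numbers, expression)
# A somatic contraction from 12 to 10 repeat units.
delta = predicted_expression_change(slope, reference_copies=12, mutated_copies=10)
print(slope, intercept, delta)
```

A real analysis would also need to control for covariates and multiple testing before calling a locus an eSTR; the sketch only shows how a fitted slope translates a copy number change into an expected expression change.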

B-354: Analyzing cell-cell dysregulations in healthy and impaired bone marrow hematopoietic niche
Track: RegSys
  • Maksim Kholmatov, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
  • Karin Prummel, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
  • Kevin Woods, German Cancer Research Institute (DKFZ), DKTK Frankfurt/Mainz, Germany
  • Borhane Guezguez, German Cancer Research Institute (DKFZ), DKTK Frankfurt/Mainz, Germany
  • Judith Zaugg, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany


Presentation Overview: Show

In humans, adult haematopoiesis occurs in the bone marrow (BM) niche, where haematopoietic stem and progenitor cells (HSPCs) differentiate to produce mature blood cells. BM niche cells such as mesenchymal stromal cells (MSCs) and mature T-cells play an important role in regulating haematopoiesis. Upon ageing, clonal expansion among HSPCs can cause clonal haematopoiesis (CH) and myelodysplastic syndromes (MDS), but the role of BM niche cells in this process remains unknown.
We analyzed the single cell landscape of HSPC, MSC and T-cell populations from a cohort of 10 BM biopsies (3 Healthy, 3 CH and 4 MDS) using scRNA-seq. We observe significant alterations in cell type composition within each population, where BM of CH/MDS displays myeloid-skewing of HSPCs, surge of stress-induced MSCs and accumulation of cytotoxic T-cells. Using across-individual correlation of gene expression between different cell types we identify modules of co-expression. These modules are then analyzed for enrichment of signalling pathways as well as transcription factor regulons identified using SCENIC. This joint analysis allows us to explore the mechanisms involved in co-regulation between HSPCs, MSC and T-cells and their dysregulation during CH/MDS. These results highlight the cross-talk in the BM niche in healthy and impaired haematopoiesis.

B-355: Multi-omics profiling unveils regulatory subgroups of PAH-associated right ventricular remodeling
Track: RegSys
  • Fatemeh Khassafi, Max Planck Institute for Heart and Lung Research, Germany
  • Kristina Müller, Max Planck Institute for Heart and Lung Research, Germany
  • Prakash Chelladurai, Max Planck Institute for Heart and Lung Research, Germany
  • Olivier Boucherat, Department of Medicine, Laval University, Quebec, Canada
  • Aryan Kamal, European Molecular Biology Laboratory, Heidelberg, Germany
  • Stefan Günther, Max Planck Institute for Heart and Lung Research, Germany
  • Rajkumar Savai, Max Planck Institute for Heart and Lung Research, Germany
  • Baktybek Kojonazarov, Justus-Liebig University, Giessen, Germany
  • Sébastien Bonnet, Department of Medicine, Laval University, Quebec, Canada
  • Mario Looso, Max Planck Institute for Heart and Lung Research, Germany
  • Soni Pullamsetti, Max Planck Institute for Heart and Lung Research, Germany


Presentation Overview: Show

The right ventricle (RV) plays a crucial role in the functional outcome of Pulmonary Arterial Hypertension (PAH) patients, yet its underlying mechanisms are poorly understood. Here, we first identified different subgroups of RV conditions in PAH through an unsupervised RNAseq analysis of human samples, in which the decompensated subgroups were later labeled as ‘early’ and ‘late’, based on the similarity of their expression to an animal model of RV remodeling (MCT-induced PAH rats). Furthermore, by integration with an independent plasma proteome, we introduced a panel of five proteins that significantly classify patients with early and late decompensated RV. Secondly, we performed multi-omic profiling to investigate the longitudinal alterations in PAH-associated RV of MCT rats. We could validate fibroblast-to-myofibroblast transition in the early stages of maladaptive RV by identifying different subtypes of fibroblasts using a combination of both modalities, in which a family of TFs (FOSL1-FOSL2) was found to be one of the key master regulators. This comprehensive multi-omics analysis identified new subgroups of human RV hypertrophy beyond the clinical measurements, while the combination with the proteome proposed a panel of potential biomarkers for PAH. Moreover, the integrative single-cell approach revealed significant upstream regulatory complexes that govern the early-to-late transition in PAH RV remodeling.

B-356: Learning regulatory motifs linked with cell transitions in single-cell genomics
Track: RegSys
  • Ignacio Ibarra, Helmholtz Zentrum München, Germany
  • Johanna Schneeberger, Helmholtz Zentrum München, Germany
  • Ege Erdogan, Helmholtz Zentrum München, Germany
  • Dominik Klein, Helmholtz Zentrum München, Germany
  • Laura Martens, Helmholtz Zentrum München, Germany
  • Hananeh Aliee, Helmholtz Zentrum München, Germany
  • Fabian Theis, Helmholtz Zentrum München, Germany


Presentation Overview: Show

Mubind uses DNA sequences and kNN graphs to accurately predict single-cell genomics counts, filters (motifs), and their activities linked with cell-to-cell relationships, identifying filter activities that connect cell states in a unique optimization task. We demonstrate applications on several scATAC-seq datasets, showcasing desirable running time, memory usage, and robust metrics. Interpretation of learned models showcases how mubind improves the understanding of single-cell transitions.

B-357: Active repression of alternative cell fates safeguards hepatocyte identity and prevents liver tumorigenesis
Track: RegSys
  • Aryan Kamal, EMBL, Germany
  • Bryce Lim, DKFZ, Germany
  • Juan Adrian Segarra, DKFZ, Germany
  • Ignacio Ibarra, EMBL, Germany
  • Judith Zaugg, EMBL, Germany
  • Moritz Mall, DKFZ, Germany


Presentation Overview: Show

Maintaining a stable cell identity requires suppressing inappropriate transcriptional programs. The current dogma suggests that this is achieved through passive epigenetic silencing. Here, we propose that active transcriptional repression by safeguard repressors is crucial for lifelong cell fate stability. This process prevents the loss of cell identity and errors that may lead to developmental disorders or cancer. To support our proposition, we devised a strategy to identify safeguard repressor candidates using single-cell RNA-seq and transcription factor motif data in eighteen cell types from all germ layers. Then, to investigate whether this mechanism could prevent diseases associated with plasticity, such as cancer, we overexpressed one of the safeguard repressor candidates for hepatocytes in an in vivo model of hepatocarcinoma. We found that exogenous overexpression entirely blocked tumor initiation and improved survival. To understand the mechanism, we used direct cellular reprogramming to assess whether the candidate affects cell fate plasticity. Indeed we found that overexpression of the candidate repressed alternative cell fates by targeting - and thereby repressing - master regulators of several alternative fates. In summary, our findings suggest that cell type-specific safeguard repressors maintain lineage commitment by actively repressing alternative identities.

B-358: CDState: Reference-free inference of intratumor transcriptional heterogeneity from bulk RNA-seq cancer data
Track: RegSys
  • Agnieszka Kraft, Institute for Machine Learning, Department of Computer Science, ETH Zurich, Zurich, Switzerland; Department of Thoracic Surgery, University Hospital Zurich, Zurich, Switzerland, Switzerland
  • Laure Ciernik, Institute for Machine Learning, Department of Computer Science, ETH Zurich, Zurich, Switzerland, Switzerland
  • Jiayi Wang, Institute for Machine Learning, Department of Computer Science, ETH Zurich, Zurich, Switzerland, Switzerland
  • Josephine Yates, Institute for Machine Learning, Department of Computer Science, ETH Zurich, Zurich, Switzerland, Switzerland
  • Florian Barkmann, Institute for Machine Learning, Department of Computer Science, ETH Zurich, Zurich, Switzerland, Switzerland
  • Mayura Meerang, Department of Thoracic Surgery, University Hospital Zurich, Zurich, Switzerland, Switzerland
  • Michaela B Kirschner, Department of Thoracic Surgery, University Hospital Zurich, Zurich, Switzerland, Switzerland
  • Isabelle Opitz, Department of Thoracic Surgery, University Hospital Zurich, Zurich, Switzerland, Switzerland
  • Valentina Boeva, Institute for Machine Learning, Department of Computer Science, ETH Zurich, Zurich, Switzerland, Switzerland


Presentation Overview: Show

Intratumor transcriptional heterogeneity has been recognized as one of the key drivers of cancer progression, metastasis and treatment resistance. Recent studies in several cancer types, including melanoma and glioblastoma, identified populations of malignant cells exhibiting heterogeneous expression of known hallmarks of cancer, and associated with varied response to anticancer therapy. These studies highlight the need for an accurate assessment of malignant cell phenotypes and abundance to develop effective treatments.
Here, we introduce CDState: a Nonnegative Matrix Factorization (NMF) model which discovers and enumerates source cell states in tumor bulk RNA sequencing data. By extending NMF with a sum-to-one constraint on the weight matrix and a random walk hyperplane search, CDState identifies malignant and nonmalignant cell states along with their proportions in a fully unsupervised manner. To facilitate biological interpretation of malignant cell states, CDState links their expression profiles with the known hallmarks of cancer. We validate CDState using "bulkified" data from publicly available single-cell RNA sequencing datasets of glioblastoma, esophageal, lung, and colorectal cancer patients. Finally, we apply CDState to 29 cancer datasets from The Cancer Genome Atlas to evaluate the extent of intratumor transcriptional heterogeneity on a pan-cancer scale and identify potential genetic drivers of the identified malignant cell states.
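
The validation on "bulkified" single-cell data can be pictured with a minimal sketch. It assumes that bulkification means summing per-cell counts within each patient, and that the known per-cell state labels provide ground-truth proportions against which inferred weights could be compared; the abstract does not describe the exact procedure. Patient IDs, state labels, gene names, and the function name are hypothetical, and this is not the CDState implementation.

```python
from collections import Counter, defaultdict


def bulkify(cells):
    """Aggregate single cells into one pseudo-bulk profile per patient.

    cells: iterable of dicts with keys 'patient', 'state', and 'counts'
    (a mapping from gene name to read count).
    Returns (bulk profiles per patient, true state proportions per patient).
    """
    bulk = defaultdict(Counter)
    state_counts = defaultdict(Counter)
    for cell in cells:
        bulk[cell["patient"]].update(cell["counts"])  # sum counts gene by gene
        state_counts[cell["patient"]][cell["state"]] += 1
    proportions = {
        patient: {state: n / sum(counts.values()) for state, n in counts.items()}
        for patient, counts in state_counts.items()
    }
    return {patient: dict(profile) for patient, profile in bulk.items()}, proportions


# Toy single-cell data (hypothetical).
cells = [
    {"patient": "P1", "state": "malignant_A", "counts": {"GENE1": 3, "GENE2": 0}},
    {"patient": "P1", "state": "malignant_B", "counts": {"GENE1": 1, "GENE2": 4}},
    {"patient": "P1", "state": "malignant_A", "counts": {"GENE1": 2, "GENE2": 1}},
    {"patient": "P2", "state": "immune", "counts": {"GENE1": 0, "GENE2": 5}},
]

bulk, proportions = bulkify(cells)
print(bulk["P1"], proportions["P1"])
```

Because the true proportions for each patient sum to one, they are directly comparable to weights estimated under a sum-to-one constraint.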

B-359: Applying machine learning algorithms to single cell multiome acute exercise data identifies cross-ome regulated molecular circuits within individual cell types
Track: RegSys
  • Gregory Smith, Icahn School of Medicine at Mount Sinai Hospital, United States
  • Zidong Zhang, Icahn School of Medicine at Mount Sinai Hospital, United States
  • Xi Chen, Flatiron Institute, United States
  • Aliza Rubenstein, Icahn School of Medicine at Mount Sinai Hospital, United States
  • Wan Sze Cheng, Icahn School of Medicine at Mount Sinai Hospital, United States
  • Frederique Ruf-Zamojski, Icahn School of Medicine at Mount Sinai Hospital, United States
  • Toby Chambers, Ball State University, United States
  • Maria Chikina, University of Pittsburgh, United States
  • Elena Zaslavsky, Icahn School of Medicine at Mount Sinai Hospital, United States
  • Olga Troyanskaya, Princeton University, United States
  • Scott Trappe, Ball State University, United States
  • Stuart Sealfon, Icahn School of Medicine at Mount Sinai Hospital, United States


Presentation Overview: Show

Understanding the effects of exercise and how to tailor exercise regimens to individual needs is critical in resisting the rise of obesity and heart disease. This requires a deeper knowledge of the molecular response to exercise and how that varies between and within individuals. Human tissue is a mixture of cell types with distinct functions and gene expression patterns. An external stimulus, such as acute exercise, produces a heterogeneous response even within the same cell type. Single cell multiomics data affords incredible resolution of both the gene expression and chromatin accessibility for individual cells, enabling inference of linked responses over both omes. We generated single cell multiomic data from human vastus lateralis tissue before and after an acute exercise bout and implemented computational algorithms PLIER and MAGICAL to find linked cross-ome exercise responses. We identified instances of up-regulation and down-regulation reflecting both promoter-driven and distal regulation processes. We connected conserved gene-peak associations with known pathways and enriched transcription factors, highlighting transcription factor RORC, whose targets are enriched for exercise up-regulation. This combined analysis highlights the ability to use multiomic data to unravel complex regulatory patterns and develop a deeper understanding of the mechanistic responses to human exercise.

B-360: Joint reanalysis of genetic variation affecting mRNA and protein abundance
Track: RegSys
  • Krisna Van Dyke, University of Minnesota Department of Genetics, Cell Biology, and Development, United States
  • Matthew Feraru, University of Minnesota Department of Genetics, Cell Biology, and Development, United States
  • Frank Albert, University of Minnesota Department of Genetics, Cell Biology, and Development, United States


Presentation Overview: Show

Phenotypic variation is shaped by DNA variation that influences gene expression. Expression quantitative trait loci that alter mRNA abundance (eQTLs) or protein abundance (pQTLs) have been mapped several times in a cross between the BY and RM strains of Saccharomyces cerevisiae in a series of studies that independently quantified mRNA and/or protein. Pairwise comparisons among these studies suggest considerable discrepancies between genetic influences on mRNA vs protein. However, these studies used different analysis methods, leaving it unclear to what extent discrepancies between eQTLs and pQTLs may be inflated by technical variation.

Here, we implement standardized, joint analyses that leverage the statistical power provided by these multiple mRNA and protein datasets. First, we are applying an advanced mapping pipeline to all datasets. This pipeline improves signal via mixed models that correct for polygenic effects and strong QTLs on other chromosomes. This algorithm will likely discover new QTLs previously missed in older datasets. Second, we will develop a mapping method that integrates the multiple mRNA and protein datasets into one joint analysis. This analysis will improve power to detect QTLs by using all available data jointly, and provide precise and rigorous estimates of the mRNA or protein specificity of individual QTLs.

B-361: Sex-specific and multiomic integration enhance accuracy of peripheral blood biomarkers of major depressive disorder
Track: RegSys
  • Amazigh Mokhtari, NeuroDiderot, Inserm U1141, Université de Paris, France
  • El Cherif Ibrahim, Aix-Marseille Université, CNRS, Institut de Neurosciences de la Timone, Marseille, France
  • Arnaud Gloaguen, Centre National de Recherche en Génomique Humaine, Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
  • David Cohen, NeuroDiderot, Inserm U1141, Université de Paris, France
  • Margot Derouin, NeuroDiderot, Inserm U1141, Université de Paris, France
  • Claire-Cécile Barrot, NeuroDiderot, Inserm U1141, Université de Paris, France
  • Hortense Vachon, Aix-Marseille Université, INSERM, TAGC, Marseille, France
  • Guillaume Charbonnier, Aix-Marseille Université, INSERM, TAGC, Marseille, France
  • Béatrice Loriod, Aix-Marseille Université, INSERM, TAGC, Marseille, France
  • Ipek Yalcin, CNRS, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, Strasbourg, France
  • Cynthia Marie-Claire, Université de Paris, INSERM UMR-S 1144, Optimisation thérapeutique en neuropsychopharmacologie, OTeN, Paris, France
  • Bruno Etain, Université de Paris, INSERM UMR-S 1144, Optimisation thérapeutique en neuropsychopharmacologie, OTeN, Paris, France
  • Raoul Belzeaux, Aix-Marseille Université, CNRS, Institut de Neurosciences de la Timone, Marseille, France
  • Andrée Delahaye-Duriez, NeuroDiderot, Inserm U1141, Université de Paris, France
  • Pierre-Eric Lutz, CNRS, Université de Strasbourg, Institut des Neurosciences Cellulaires et Intégratives UPR 3212, Strasbourg, France


Presentation Overview: Show

Major depressive disorder (MDD) is a leading cause of disability worldwide. Its heterogeneity and the variety of environmental factors that are implicated help explain why identifying reliable molecular biomarkers has proved difficult.
To address these challenges, here we implemented a multiomic integrative framework designed to extract peripheral blood molecular signatures of MDD, and quantify the value of combining several types of omic data for classifying patients and controls. To do so, we analyzed the transcriptome (RNA-Sequencing), micro-RNAs (small RNA-sequencing) and DNA methylation (EPIC arrays) in a cohort of 80 MDD individuals and 89 healthy controls.
We first conducted comparisons for each omic layer, in order to identify shared and sex-specific signatures of MDD. Then, we implemented 6 methods for joint dimensionality reduction (using Momix and SNF), and identified RNAs, micro-RNAs, and DNA methylation sites that best discriminated between patients and controls. Across methods, higher accuracy was observed when analyses were conducted separately in males and females, and when increasing the number of omic layers used, from 1 to 3.
Towards the development of clinically useful MDD biomarkers, our results, therefore, highlight the importance of accounting for sex, and illustrate the informational gain that may result from multiomic strategies.

B-362: Accurate estimation and correction of open chromatin and intrinsic biases in bulk and single-cell CUT&Tag data
Track: RegSys
  • Shengen Shawn Hu, University of Virginia, United States
  • Qingying Chen, University of Virginia, United States
  • Megan Grieco, University of Virginia, United States
  • Lin Liu, Shanghai Jiao Tong University, China
  • Chongzhi Zang, University of Virginia, United States


Presentation Overview: Show

The accurate detection of transcription factor (TF) binding sites and histone modifications (HM) on a genome-wide scale is essential for studying functional genomics and gene regulation. Cleavage Under Targets & Tagmentation (CUT&Tag) is a low-cost and easy-to-implement epigenomic profiling method that can be performed on a low number of cells and on the single-cell level. CUT&Tag experiments use the hyperactive transposase Tn5 for tagmentation. We find that Tn5 is subject to intrinsic sequence insertion bias (intrinsic bias). Additionally, preference of Tn5 insertion toward accessible chromatin regions also affects the distribution of CUT&Tag reads (open chromatin bias). Both of these biases can significantly confound the analysis of CUT&Tag data, requiring careful assessment and new analytical methods. High sparsity of single-cell data makes the effect of both biases more substantial compared to bulk data. To address these challenges, we present a statistical model for systematic characterization and correction of open chromatin and intrinsic sequence biases in CUT&Tag data. This work is the first of its kind and has paved the way for further development of bioinformatics tools for improving both bulk and single-cell CUT&Tag data analysis.

B-363: Clustered pattern of transcription factor binding at phase-separated transcriptional condensates
Track: RegSys
  • Zhenjia Wang, University of Virginia, United States
  • Shengyuan Wang, University of Virginia, United States
  • Chongzhi Zang, University of Virginia, United States


Presentation Overview: Show

Transcription factors (TFs) and coactivators have been shown to bind at super-enhancers and form transcriptional condensates, activating transcription in various cellular systems including development and cancer. However, the genomic and epigenomic determinants of phase-separated transcriptional condensate formation remain poorly understood. Here we use an integrative computational approach to systematically analyze DNA sequence features and TF binding profiles across cell types to identify the molecular features that contribute to the formation of transcriptional condensates. We find that most TF motif sequences exhibit a clustered pattern, distinct from a random distribution, and are enriched at potential super-enhancer (SE) regions in the genome. TF binding sites are further clustered in the genome and enriched at cell type-specific SEs. TFs with highly clustered patterns have a high potential for liquid-liquid phase separation. Densely clustered TF binding sites are more enriched at cell type-specific SEs with higher chromatin accessibility, higher chromatin interaction, and higher association with cancer outcome. These results suggest that the clustered pattern of TF binding and phase separation properties of TF proteins collectively affect transcription condensate formation at super-enhancers.

B-364: Distal cis-regulatory network inference from single cell accessibility maps uncovers regulatory hubs in cellular reprogramming
Track: RegSys
  • Spencer Halberg-Spencer, Wisconsin Institute for Discovery, Department of Biostatistics and Medical Informatics University of Wisconsin-Madison, United States
  • Shilu Zhang, Wisconsin Institute for Discovery, United States
  • Stefan Pietrzak, Wisconsin Institute for Discovery, United States
  • Rupa Sridharan, Wisconsin Institute for Discovery, Department of Cell and Regenerative Biology University of Wisconsin-Madison, United States
  • Sushmita Roy, Wisconsin Institute for Discovery, Department of Biostatistics and Medical Informatics University of Wisconsin-Madison, United States


Presentation Overview: Show

Cis-regulatory interactions, which specify which regulatory sequence elements control a gene’s expression, play a major role in defining the structure of gene regulatory networks (GRNs). Such interactions can occur between regulatory elements in the gene’s promoter as well as in distally located sequence units such as enhancers. The availability of single cell transposase accessible chromatin sequencing (scATAC-seq) assays offers unique opportunities for the identification of cell type-specific cis-regulatory interactions. We present scCisInt, which combines sparse non-negative matrix factorization and non-linear regression to predict gene-specific cis-regulatory elements within a 1 Mb radius of a gene.

We applied scCisInt to examine the distal regulatory interaction landscape during mouse cellular reprogramming from mouse embryonic fibroblasts (MEFs) to pluripotent stem cells. scCisInt interactions recapitulate small-scale interactions of well-characterized loci and show good overlap with significant interactions from existing Hi-C datasets. scCisInt predictions identified several regulatory sequence hubs that regulate multiple genes. We performed in vivo validation of a novel enhancer hub predicted to repress reprogramming efficiency using CRISPRi. Repression of activity of this hub improves reprogramming efficiency and also significantly impacts the expression of its predicted target genes. Our work demonstrates that scCisInt may be a powerful tool for uncovering cis-regulatory interactions from single cell accessibility datasets.

B-365: Histone H1 deficiency leads to aggressive B-cell lymphomas by disrupting 3D chromatin architecture
Track: RegSys
  • Ceyda Durmaz, Weill Cornell Medicine, United States
  • Antonin Papin, Weill Cornell Medicine, United States
  • Matthew Teater, Weill Cornell Medicine, United States
  • Cem Meydan, Weill Cornell Medicine, United States
  • Viviana Risca, Rockefeller University, United States
  • Alexey Soshnev, University of Texas at San Antonio, United States
  • Ethel Cesarman, Weill Cornell Medicine, United States
  • Christopher Mason, Weill Cornell Medicine, United States
  • Ari Melnick, Weill Cornell Medicine, United States


Presentation Overview: Show

Histone H1 proteins, critical for assisting chromatin to fold into higher-order structures, act as transcriptional repressors by limiting chromatin accessibility and are depleted from actively transcribed regions. While H1 is mutated in many human cancers, B-cell lymphomas have the highest frequency, where H1 variants B through E are highly recurrent. In a recent study, we showed that H1 variants act as genetic driver mutations in lymphomas, with the most mutated variants being H1C and H1E. However, it is unknown how these mutations drive lymphoma programs. The dynamic folding of chromatin acts as a key regulatory layer of gene expression and cell identity by allowing select chromatin regions to facilitate spatiotemporal rewiring of enhancers-promoters and transcription. Regulatory hubs, highly interactive regions (often enhancers) that can interact with multiple genes, help to coordinate activation of genes at a higher probability compared to pairs of non-interacting genes. Here, we integrate RNA-seq, ATAC-seq, ChIP-seq, HiC, and promoter-capture HiC data from mice modeling the MCD subtype with H1c+/-H1e+/- and H1c-/-H1e-/- deficiencies to show that compartment reorganization by H1 deficiency rewires promoter interactions with regulatory elements, altering gene expression by hijacking stem cell genes to form aberrant hubs.

B-366: Considerations and comparisons of SNP enrichment tools to link GWAS SNPs to disease-relevant cell types by open chromatin regions
Track: RegSys
  • Rachel M. Moss, University of Minnesota, United States
  • Lauren J. Mills, University of Minnesota, United States
  • Logan G. Spector, University of Minnesota, United States


Presentation Overview: Show

Identifying the cell of origin of a cancer is crucial for understanding its underlying molecular mechanisms, establishing model systems, and developing effective treatments. Our study compares different algorithms to identify disease-relevant cell types using SNP enrichment from GWAS data in open chromatin regions, specifically ATAC-seq peaks. The algorithms we tested included goShifter, CHEERS, and stratified linkage disequilibrium score regression across various cancer and phenotype GWAS summary statistics. Although all algorithms were able to identify a known cell of origin for some cancers or phenotypes, the results were inconsistent. These results highlight the challenges of using GWAS SNP enrichment with ATAC-seq. While the approach holds promise for identifying an unknown cell of origin and linking disease associated SNPs to specific genes, our results suggest that the accuracy and reliability of the results may be affected by the choice of algorithm, the size of the GWAS dataset, normalization of ATAC-seq quality across different experiments, the choice of cell-type specific ATAC-seq peak sets, and the mutational complexity of the cancer itself. Future research is needed to develop more accurate and reliable methods for identifying the cell of origin in cancer using ATAC-seq data to enhance the specificity and sensitivity of the analysis.

B-367: Integrative chromatin state annotation of cis-regulatory elements in ENCODE4 reference cell types
Track: RegSys
  • Marjan Farahbod, School of Computing Science, Simon Fraser University, Canada
  • Abdul Rahman Diab, School of Computing Science, Simon Fraser University, Canada
  • Paul Sud, School of Medicine, Stanford University, United States
  • Mehdi Foroozandeh, School of Computing Science, Simon Fraser University, Canada
  • Ishan Goel, School of Computing Science, Simon Fraser University, Canada
  • Habib Daneshpajouh, School of Computing Science, Simon Fraser University, Canada
  • Meenakshi Kagda, School of Medicine, Stanford University, United States
  • Ian Whaling, School of Medicine, Stanford University, United States
  • Benjamin Hitz, School of Medicine, Stanford University, United States
  • J. Michael Cherry, School of Medicine, Stanford University, United States
  • Maxwell Libbrecht, School of Computing Science, Simon Fraser University, Canada


Presentation Overview: Show

Genome annotation has applications in interpreting GWAS variants. It is also a primary step in identifying genomic regulatory elements and untangling gene regulatory mechanisms. Obtaining high quality annotations for various human cell types is an ongoing effort.
Segmentation and genome annotation (SAGA) methods such as Segway are widely used to gain an integrative understanding of genomic activity.
Here we present a new set of Segway annotations. In these new annotations, we have updated the interpretation terms to better capture the known genomic functions based on data characteristics, for over 300 samples using data from 2400 sequencing assays. We show that Segway annotations can capture biological signal.
We discuss the differences between the two SAGA methods available for ENCODE data, and show that despite their differences, both ChromHMM and Segway capture meaningful regulatory and transcribed regions in the genome, with predictive power. We also demonstrate how differences in model outputs can lead to better understanding of the genomic data and tissue-specific behavior of the regulatory regions.

B-368: motifReg: relating transcription factor binding specificity to chromatin accessibility using supervised regression analysis
Track: RegSys
  • Zhengqiao Zhao, Princeton University, United States
  • Yuri Pritykin, Princeton University, United States


Presentation Overview: Show

Single-cell ATAC-seq (scATAC-seq) has been widely used to profile chromatin accessibility and study the overall epigenomic and regulatory state of heterogeneous cell populations. Furthermore, differential transcription factor (TF) binding is often associated with differential chromatin accessibility. Therefore, the additional power of scATAC-seq analysis is in the ability to leverage prior knowledge about TF binding specificities, typically available in the form of positional weight matrices or motifs.
Here we present a supervised motif regression approach, motifReg, for analysis and interpretation of scATAC-seq data. We comprehensively compare motifReg with other popular methods for motif and sequence analysis in scATAC-seq data, scBasset and chromVAR, and demonstrate benefits of using motifReg. We show that motifReg can efficiently analyze large-scale scATAC-seq datasets and learn interpretable associations between TF motifs and the accessible peaks. The resultant lower-dimensional cell representations facilitate downstream analyses of scATAC-seq data.
Furthermore, our supervised formulation allows us to prioritize TF motifs by significance of their association with chromatin accessibility, and compare performance of the model with respect to different aspects of input data preprocessing and perturbations, thus developing guidelines and best practices.

B-369: Identification of gene expression features concordant with Topologically Associating Domains
Track: RegSys
  • Patrycja Rosa, University of Warsaw; Nencki Institute of Experimental Biology, Polish Academy of Sciences, Poland
  • Aleksander Jankowski, University of Warsaw, Poland


Presentation Overview: Show

Gene expression profiles differ between distinct tissues, and also within heterogeneous populations of cells forming a single tissue. The variation in gene expression helps achieve robustness of gene expression programs. We aimed to identify features quantifying this variation that would be best correlated with the spatial chromatin structure, as defined by Topologically Associating Domains (TADs). We considered mean gene expression across cell types, standard deviation of gene expression, as well as their ratio (coefficient of variation) and its residual on mean expression.

For our study, we used multiple scRNA-seq datasets from human, mouse and fruit fly. For each dataset, we clustered the cells, calculated the average gene expression level in each cell cluster, and further used the mean and standard deviation of these averages. To obtain the information about gene location in the chromatin structure, we used TAD boundary annotations from published Hi-C data on similar tissues.

We found some dependence between gene expression variation and gene location in a TAD, although the results differed between datasets and species. We further confirmed spatial autocorrelation of gene expression variation within TADs, by considering pairs of genes at different distance ranges, and comparing them to gene pairs with permuted gene expression profiles.
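The variation features named in this abstract (mean, standard deviation, coefficient of variation, and the residual of the coefficient of variation on mean expression) can be illustrated with a minimal sketch. The function name, the input layout (genes by cell clusters) and the use of a simple linear fit for the residual are assumptions made for illustration; the abstract does not state how the authors computed the residual.

```python
import numpy as np

def expression_variation_features(cluster_means):
    """Per-gene variation features from a genes x clusters matrix.

    cluster_means[g, c] is the average expression of gene g in cell
    cluster c (hypothetical input layout; means are assumed positive).
    """
    cluster_means = np.asarray(cluster_means, dtype=float)
    mean = cluster_means.mean(axis=1)
    sd = cluster_means.std(axis=1, ddof=1)   # sample standard deviation
    cv = sd / mean                           # coefficient of variation

    # Residual of CV after a least-squares linear fit on mean expression
    # (assumed form of the regression; other fits are possible).
    design = np.column_stack([np.ones_like(mean), mean])
    coef, *_ = np.linalg.lstsq(design, cv, rcond=None)
    residual = cv - design @ coef
    return mean, sd, cv, residual
```

The residual removes the part of the coefficient of variation that is explained by expression level alone, so genes with very different mean expression can be compared on the same scale.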

B-370: Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers
Track: RegSys
  • Alexander Karollus, Technical University of Munich, Germany
  • Thomas Mauermeier, Technical University of Munich, Germany
  • Julien Gagneur, Technical University of Munich, Germany


Presentation Overview: Show

The largest current sequence-based models of transcription control are obtained by predicting genome-wide regulatory assays across the human genome. This setting is correlative, as those models are exposed during training solely to the sequence variation between human genes that arose through evolution, questioning the extent to which they capture genuine causal signals. Here we confront predictions of state-of-the-art models of transcription regulation against data from large-scale observational studies and deep perturbation assays. The most advanced model, Enformer, generally captures causal determinants of human promoters. However, models fail to capture the causal effects of enhancers on expression, notably at medium to long distances. More generally, the predicted impact of distal elements on gene expression is small and the ability to correctly integrate long-range information is significantly more limited than the receptive fields of the models suggest. This is likely caused by the escalating class imbalance between actual and candidate regulatory elements as distance increases. Our results suggest that models have advanced to the point that in-silico study of promoter regions and variants can provide meaningful insights. Moreover, we foresee that it will require significantly more and new kinds of data to train models which accurately account for distal elements.

B-371: MethyLasso: a segmentation approach to analyze DNA methylation patterns and identify differentially methylated regions from whole-genome datasets
Track: RegSys
  • Delphine Balaramane, UMR7242 Biotechnology and Cell Signaling, CNRS UMR7242, University of Strasbourg, Illkirch, France, France
  • Yannick Spill, UMR7242 Biotechnology and Cell Signaling, CNRS UMR7242, University of Strasbourg, Illkirch, France, France
  • Michaël Weber, UMR7242 Biotechnology and Cell Signaling, CNRS UMR7242, University of Strasbourg, Illkirch, France, France
  • Anaïs Flore Bardet, IGBMC, CNRS UMR7104, INSERM U1258, University of Strasbourg, Illkirch, France, France


Presentation Overview: Show

DNA methylation is an epigenetic modification affecting cytosines, mainly within CpG dinucleotides. It plays a role in the regulation of gene expression as it can block transcription factors from binding to DNA and aberrant DNA methylation is linked to numerous diseases, including cancer. DNA methylation can be profiled at the single cytosine resolution on the whole genome and has been performed in many cell types and conditions. Computational approaches are then essential to study DNA methylation patterns in a single condition or capture dynamic changes of DNA methylation levels across conditions.
Towards this goal, we developed MethyLasso, a novel segmentation approach that models DNA methylation data. It applies a fused lasso to identify regions in which methylation is constant. MethyLasso can identify low methylated regions (LMRs), unmethylated regions (UMRs), DNA methylation valleys (DMVs) and partially methylated domains (PMDs) in a single condition and differentially methylated regions (DMRs) between two conditions. We performed a rigorous benchmarking comparing existing approaches by evaluating the number, size, level of DNA methylation, boundaries, CpG content and coverage of the regions using several real datasets as well as the sensitivity and precision of the approaches using simulated data and show that MethyLasso performs best overall.
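To make the fused lasso idea concrete, the following is a minimal sketch of a generic one-dimensional fused lasso (total variation denoising), which yields a piecewise-constant fit to a noisy signal such as per-CpG methylation levels. It is not MethyLasso's implementation: the function name, the squared-error loss, and the dual projected-gradient solver are assumptions chosen for brevity.

```python
import numpy as np

def fused_lasso_1d(y, lam, n_iter=5000):
    """Minimize 0.5 * ||y - x||^2 + lam * sum_i |x[i+1] - x[i]|.

    Solved by projected gradient descent on the dual problem, where
    u holds one dual variable per pair of adjacent positions and is
    constrained to the interval [-lam, lam].
    """
    y = np.asarray(y, dtype=float)
    u = np.zeros(len(y) - 1)
    step = 0.25  # safe step: largest eigenvalue of D D^T is below 4
    for _ in range(n_iter):
        # Primal estimate x = y - D^T u, with D the first-difference operator
        x = y + np.diff(u, prepend=0.0, append=0.0)
        # Gradient step on the dual, then projection onto the box
        u = np.clip(u + step * np.diff(x), -lam, lam)
    return y + np.diff(u, prepend=0.0, append=0.0)
```

Larger values of `lam` merge neighbouring positions into fewer, longer constant segments; segment boundaries are the positions where the fitted value changes.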

B-372: Expression kinetics of differentiating mouse embryonic stem cells
Track: RegSys
  • Fabian Titz-Teixeira, CECAD Cologne, University of Cologne, Germany
  • Michelle Huth, Max F. Perutz Laboratories, University of Vienna, Austria
  • Martin Leeb, Max F. Perutz Laboratories, University of Vienna, Austria
  • Andreas Beyer, CECAD Cologne, University of Cologne, Germany


Presentation Overview: Show

Due to their ability to self-renew and differentiate into any cell type, embryonic stem cells have great potential in a broad range of applications. In contrast to pluripotency, the exit from pluripotency is only poorly characterized. In order to identify cellular networks involved in the exit we performed transcriptomic characterization of 73 mouse knockout cell lines exhibiting delayed differentiation potential. Utilizing a dense 32-hour expression time course of differentiating mESCs, we developed a strategy to ‘position’ each knockout on the time axis of differentiation. This enabled us to quantify differentiation delay of the entire transcriptome and of molecular sub-networks. Furthermore, we devised a computational strategy for deriving functional dependencies between cellular processes from these data. Integrating these results with single-cell RNA sequencing data resulted in a set of high-confidence relationships between genes involved in basic cellular functions and specialized networks controlling differentiation and cell state transitions.
This analysis led to the surprising finding that upon exit from pluripotency multiple basic cellular processes start to adapt to the primed state independent of the pluripotency network. Thus, whereas the pluripotency network is essential to maintain pluripotency, genes required for establishing subsequent cellular states are under control of distinct regulatory input.

B-373: scMOSim: Single-Cell Multi-Omics Simulation in R
Track: RegSys
  • Carolina Monzó, Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Spain
  • Arianna Febbo, Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Spain
  • Ángeles Arzalluz-Luque, Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Spain
  • Ana Conesa, Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Spain
  • Sonia Tarazona, Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Spain


Presentation Overview: Show

As single-cell technologies continue to advance, the need for simulation tools that can generate realistic and diverse single-cell multi-omics datasets becomes increasingly important. We present a novel extension of our previously developed R package for bulk multi-omics simulation (MOSim), called scMOSim, which enables the simulation of single-cell multi-omics data. scMOSim allows for the generation of single-cell transcriptomics data, as well as the incorporation of additional regulatory omics layers such as ATAC-seq and transcription factor binding. The tool supports various designs, including simulation of co-expression patterns of genes, simulation of replicates and differential expression between experimental conditions.
scMOSim provides users with the ability to generate count matrices for each simulated omics data type, capturing the heterogeneity and complexity of single-cell multi-omics datasets. Moreover, scMOSim identifies differentially expressed features within each omics layer and elucidates the active regulatory relationships between regulatory omics and gene expression data at the single-cell level.
By harnessing the capabilities of scMOSim, researchers will be able to generate realistic and customizable single-cell multi-omics datasets, allowing them to benchmark and validate analytical methods specifically tailored for integrative analysis of diverse regulatory omics data.

B-374: scMultiSim: simulation of multi-modality single cell data guided by cell-cell interactions and gene regulatory networks
Track: RegSys
  • Hechen Li, Georgia Institute of Technology, United States
  • Ziqi Zhang, Georgia Institute of Technology, United States
  • Michael Squires, Georgia Institute of Technology, United States
  • Xi Chen, Southern University of Science and Technology, United States
  • Xiuwei Zhang, Georgia Institute of Technology, United States


Presentation Overview: Show

Simulated single-cell data is essential for designing and evaluating computational methods in the absence of experimental ground truth. Existing de novo simulators typically focus on modeling one or two specific biological factors or mechanisms that affect the output data, which limits their capacity to simulate the complexity and multi-modality in real data. Here, we present scMultiSim, an in silico simulator that generates multi-modal single-cell data, including gene expression, chromatin accessibility, RNA velocity, and spatial cell locations, while accounting for the relationships between modalities. scMultiSim jointly models various biological factors that affect the output data, including cell identity, within-cell gene regulatory networks (GRNs), cell-cell interactions (CCIs), and chromatin accessibility, while also incorporating technical noise. Additionally, users can easily adjust the effect of each factor. We validated scMultiSim's simulated biological effects and demonstrated its applications by benchmarking various computational tasks, including multi-modal and multi-batch data integration, RNA velocity estimation, GRN inference and CCI inference using spatially resolved gene expression data. Compared to existing simulators, scMultiSim can benchmark an unprecedented range of computational problems, including those never benchmarked before and even new potential tasks.

B-375: Quantitative assessment of the cooperation between a small RNA and an endoribonuclease in regulation of gene expression in bacteria
Track: RegSys
  • Meshi Barsheshet Vigoda, The Hebrew University of Jerusalem, Israel
  • Liron Argaman, The Hebrew University of Jerusalem, Israel
  • Hanah Margalit, The Hebrew University of Jerusalem, Israel


Presentation Overview: Show

Small RNAs (sRNAs) are gene expression regulators in bacteria, known to control translation by base pairing with their target mRNAs, blocking or exposing the ribosome binding site. However, accumulating evidence suggests that sRNAs can also affect transcript stability by influencing mRNA accessibility to endoribonucleases, either indirectly through their effect on ribosome binding, or directly by assisting or interfering with the cleavage. While cooperation of sRNAs with endoribonucleases was demonstrated for several target genes, there has not been a systematic study quantitatively assessing this cooperation at a transcriptome-wide scale. We use large-scale RNA-seq-based approaches to study the extent of cooperation between the major endoribonuclease in Escherichia coli, RNase E, and the sRNA GcvB. To study the cooperation between GcvB and RNase E we apply the Double Mutant Cycle approach to four strains of E. coli: with or without GcvB expression or RNase E activity. By fitting a linear model to the differential gene expression data, we infer different modes of cooperation between RNase E and GcvB. Furthermore, using the 5’P mapping approach to map RNase E cleavage sites transcriptome-wide and analysing them in view of GcvB binding sites, we infer the mechanisms of cooperation between the two regulators.
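A double mutant cycle across four strains can be expressed as a two-factor linear model with an interaction term, where a non-zero interaction indicates that the effects of the two regulators are not simply additive. The sketch below illustrates this generic formulation for one gene; the function name, the 0/1 coding of the factors, and the use of log expression values are assumptions, since the abstract does not give the exact form of the authors' model.

```python
import numpy as np

def double_mutant_cycle_fit(log_expr, gcvb, rnase_e):
    """Fit log_expr ~ b0 + b1*gcvb + b2*rnase_e + b3*gcvb*rnase_e.

    gcvb and rnase_e are 0/1 indicators per sample (GcvB expressed,
    RNase E active). The interaction coefficient b3 measures how far
    the combined effect departs from the sum of the single effects.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    gcvb = np.asarray(gcvb, dtype=float)
    rnase_e = np.asarray(rnase_e, dtype=float)
    design = np.column_stack(
        [np.ones_like(gcvb), gcvb, rnase_e, gcvb * rnase_e]
    )
    coef, *_ = np.linalg.lstsq(design, log_expr, rcond=None)
    names = ["baseline", "gcvb", "rnase_e", "interaction"]
    return dict(zip(names, coef))
```

With a single measurement per strain, the interaction term reduces to the classic cycle quantity: expression with both factors present, minus each single-factor value, plus the value with neither.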

B-376: Integrative Analysis of Enhancer Hijacking Events in Cancer
Track: RegSys
  • Emel Comak, Max Planck Institute for Molecular Genetics, Germany
  • Stefan Haas, Max Planck Institute for Molecular Genetics, Germany


Presentation Overview: Show

Enhancer hijacking is a phenomenon in which active enhancers, as a part of regulatory elements of transcribed genes, are relocated to the proximity of another gene that usually plays a role in another cellular pathway. Such hijacking events are more common in cancer as a way of altering the oncogenic pathways. Here we present a tool that prioritizes rearrangements in cancer according to their enhancer hijacking potential and the genes involved. The project’s main step is to derive cell-type specific enhancer clusters associated with cancer-related breakpoints. This approach highlights changes in tissue-specificity of enhancers around the breakpoints of structural variations, which potentially change the expression of nearby genes. As a ranking strategy, we prioritized enhancer cluster-gene pairs around the breakpoints, considering tissue-specificity, recurrence across multiple patients, gene expression change and further annotations of clusters, including distance to topologically associating domain boundaries and evolutionary conservation scores. We tested our tool on the PCAWG consortium's somatic structural variations.

B-377: Introducing Augusta, a Python package for inferring Gene Regulatory and Boolean Networks from RNA-Seq data, and its application to the Caldimonas thermodepolymerans DSM 15344T
Track: RegSys
  • Jana Musilova, Brno University of Technology, FEEC, Department of Biomedical Engineering, Czechia
  • Zdenek Vafek, Brno University of Technology, Institute of Forensic Engineering, Czechia
  • Kristyna Hermankova, Brno University of Technology, FEEC, Department of Biomedical Engineering, Czechia
  • Xenie Kourilova, Brno University of Technology, FCH, Department of Food Chemistry and Biotechnology, Czechia
  • Iva Pernicova, Brno University of Technology, FCH, Department of Food Chemistry and Biotechnology, Czechia
  • Bhanwar Puniya, University of Nebraska-Lincoln, Department of Biochemistry, United States
  • Stanislav Obruca, Brno University of Technology, FCH, Department of Food Chemistry and Biotechnology, Czechia
  • Tomas Helikar, University of Nebraska-Lincoln, Department of Biochemistry, United States
  • Karel Sedlar, Brno University of Technology, FEEC, Department of Biomedical Engineering, Czechia


Presentation Overview: Show

The study of gene regulation is crucial in understanding complex biological processes. RNA-Sequencing has emerged as a powerful technology for quantifying gene expression and identifying crucial pathways. Gene Regulatory Networks (GRNs) and Boolean Networks (BNs) are in silico models for studying gene regulation, as they provide a way to model gene interactions. However, inferring these networks, especially in terms of whole-genomes, remains challenging.

To address these challenges, we developed Augusta, a Python package for inferring whole-genome GRNs from time-series RNA-Seq. Augusta implements a multi-step pipeline including input count table normalization, GRN inference by mutual information computation, and a two-step network validation. First, transcription factor (TF) motifs are discovered de novo. Next, curated databases are searched and relevant information is added to the network. Moreover, Augusta enables inferring BNs by combining database-obtained data with logical rules.

We demonstrate Augusta's utility by applying it to our dataset of Caldimonas thermodepolymerans DSM 15344T, a bacterium with the potential for industrial production of polyhydroxyalkanoates – ecologically friendly plastics. We inferred a network containing 3,650 genes, which provides insights into key regulatory elements underlying the bacterium's unique properties.

Augusta is available from github.com/JanaMus/Augusta along with documentation, examples, and tutorials.
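The mutual information step of such a pipeline can be illustrated with a generic sketch that bins two expression profiles and scores every gene pair. This is not Augusta's API: the function names, the equal-width binning, and the genes by time points input layout are assumptions made for illustration only.

```python
import numpy as np

def mutual_information(x, y, bins=4):
    """Mutual information (in nats) between two binned profiles."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()               # joint distribution
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y
    nonzero = pxy > 0                       # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

def mi_matrix(expr, bins=4):
    """Symmetric gene-by-gene mutual information matrix.

    expr[g, t] is the (normalized) expression of gene g at time point t.
    The diagonal is left at zero.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[0]
    scores = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            scores[i, j] = scores[j, i] = mutual_information(expr[i], expr[j], bins)
    return scores
```

Candidate regulatory edges would then be the gene pairs whose score exceeds a chosen threshold; how that threshold is set, and how edges are validated, is specific to each tool.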

B-378: Genomic analysis reveals epigenetic changes during Neonatal D2-mediated T3 production affect adult hepatic transcriptome.
Track: RegSys
  • Tatiana Fonseca, University of Chicago, United States
  • Murlidharan Nair, Indiana University South Bend, United States
  • Antonio Bianco, University of Chicago, United States


Presentation Overview: Show

In neonatal liver, increased expression of hepatic type 2 deiodinase (Dio2/D2) upregulates thyroid hormone triiodothyronine (T3) and T3 responsive genes, thus modifying the hepatic transcriptome. How this D2-T3 signaling interferes with epigenetic reprogramming has been delineated using genomic analysis of a liver-specific Dio2 knockdown system. Genome-wide methylome analysis revealed de novo methylation of ~1500 DNA hypermethylation sites (H-sites) that were associated with ~1800 regions of reduced chromatin accessibility (RCA). Further, to delineate how the H-sites affect chromatin remodeling, Hi-C analysis of the Alb-D2KO genome was carried out and compared with control mouse liver. Results revealed extensive changes in topologically associating domains (TADs). Detailed analysis of chromatin loops revealed changes in 3D nuclear organization between control and Dio2 knockdown. Further, transcriptomic analysis of a subset of genes downregulated in adult Alb-D2KO mouse liver that have RCA regions within their core promoters revealed genes that were within the affected TADs. The data provide evidence to suggest that changes in chromatin remodeling may directly affect the hepatic transcriptome of the adult mouse.

B-380: A statistical model inferring receptor activities from gene expression provides insights into intercellular communication
Track: RegSys
  • Szilvia Barsi, Semmelweis University, Faculty of Medicine, Department of Physiology, Budapest, Hungary
  • Daniel Dimitrov, Heidelberg University and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
  • Julio Saez-Rodriguez, Heidelberg University and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
  • László Hunyady, Research Centre for Natural Sciences, Institute of Enzymology, Budapest, Hungary
  • Bence Szalai, Research Centre for Natural Sciences, Institute of Enzymology, Budapest, Hungary


Presentation Overview:

Cell-cell communication is a fundamental process whereby a cell-mediated signal, the ligand, reaches a target cell and, via its corresponding receptors, initiates downstream signalling, thereby changing the cell's state. Understanding intercellular communication is crucial to identify pathological mechanisms in cancer and other diseases. Computational methods for inferring receptor or ligand signalling activities can be classified into prior-knowledge-based approaches, which have limited coverage of known interactions and fail to identify novel interactions, and data-driven approaches, which are more powerful than the knowledge-driven models but require high-quality, curated data. To overcome the limitations of the existing methods, we have developed a computational model based on large-scale transcriptomic data to estimate receptor activities. We collected receptor and ligand chemical and genetic perturbation gene expression signatures from the LINCS L1000 database and used ligand-receptor interactions described in the literature to construct a linear model that aggregates receptor-specific gene expression patterns. The model can therefore be used to infer receptor activities, summarising high-dimensional gene expression profiles into interpretable mechanisms driving disease.
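
As a rough illustration of what a linear model over receptor-specific signatures could look like, the sketch below scores each receptor as a weighted sum of a sample's gene expression values. It is not the authors' model: the function name, the receptor and gene labels, and all weights and expression values are hypothetical toy data, and the abstract does not describe how the signatures are aggregated or how the weights are fitted.

```python
from typing import Dict

# Hypothetical consensus signatures: receptor -> {gene: weight}.
# The abstract derives signatures from perturbation data; these are toy values.
SIGNATURES: Dict[str, Dict[str, float]] = {
    "RECEPTOR_1": {"GENE_A": 1.0, "GENE_B": 0.5, "GENE_C": -1.0},
    "RECEPTOR_2": {"GENE_B": -0.5, "GENE_D": 2.0},
}


def receptor_activities(
    expression: Dict[str, float],
    signatures: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Score each receptor as the weighted sum of the sample's gene values."""
    scores = {}
    for receptor, weights in signatures.items():
        # Genes missing from the sample contribute nothing to the score.
        scores[receptor] = sum(
            weight * expression.get(gene, 0.0) for gene, weight in weights.items()
        )
    return scores


# Toy expression profile (for example, log fold changes) for one sample.
sample = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -1.0, "GENE_D": 0.5}
activities = receptor_activities(sample, SIGNATURES)
```

A positive score here means the sample's expression pattern agrees in sign with the receptor's signature; in practice such scores would typically be compared against a null distribution before being interpreted.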

B-381: Characterisation of alternative polyadenylation at single cell resolution in Alzheimer disease
Track: RegSys
  • Franz Ake, IDIBELL, Spain


Presentation Overview:

Alternative polyadenylation (APA) is a widespread mechanism of gene regulation that generates mRNA isoforms with distinct 3’ ends. APA is well known to be regulated during cell differentiation and is a major source of gene regulation in the brain. Proliferating cells tend to have shorter 3’ UTRs while differentiated cells have longer 3’ UTRs. Changes in APA patterns are not only characteristic of cellular differentiation but have also been associated with pathological processes such as cancer or neurodegenerative diseases like Alzheimer’s disease (AD). Here we present SCALPEL, a tool for characterizing APA sites at single-cell resolution using 10X Genomics or Drop-seq scRNA-seq datasets. SCALPEL allows quantifying RNA expression at isoform level and single-cell resolution and identifying changes in isoform usage across cell populations and conditions. We used SCALPEL to study the changes in APA during the differentiation of induced pluripotent stem cells (iPSCs) to neuroprogenitor cells (NPCs). The results of our analysis show clear changes in 3’ end usage between iPSCs and NPCs. We plan to use SCALPEL to investigate how APA changes during neural differentiation, how these changes are altered in AD, and what role APA plays in the development of AD.
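
To make the idea of a change in isoform usage between populations concrete, the sketch below computes each isoform's share of its gene's total count in two populations and reports the shift. It does not use SCALPEL or reproduce its method: the function name, gene and isoform labels, and counts are hypothetical toy values chosen only to mirror the shorter-to-longer 3’ UTR trend described above.

```python
from collections import defaultdict
from typing import Dict, Tuple

IsoformKey = Tuple[str, str]  # (gene, isoform)


def isoform_usage(counts: Dict[IsoformKey, float]) -> Dict[IsoformKey, float]:
    """Return each isoform's share of its gene's total count."""
    gene_totals = defaultdict(float)
    for (gene, _isoform), n in counts.items():
        gene_totals[gene] += n
    # Genes with zero total count are skipped to avoid division by zero.
    return {
        key: n / gene_totals[key[0]]
        for key, n in counts.items()
        if gene_totals[key[0]] > 0
    }


# Toy pooled counts per population (invented for illustration).
ipsc = {("GENE_X", "short_3utr"): 30, ("GENE_X", "long_3utr"): 10}
npc = {("GENE_X", "short_3utr"): 10, ("GENE_X", "long_3utr"): 30}

usage_ipsc = isoform_usage(ipsc)
usage_npc = isoform_usage(npc)

# Positive shift: the long 3' UTR isoform is used more in the second population.
shift = usage_npc[("GENE_X", "long_3utr")] - usage_ipsc[("GENE_X", "long_3utr")]
```

Working with proportions rather than raw counts keeps the comparison from being driven by overall expression differences between populations; a real analysis would also need a statistical test on the per-cell counts.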

B-382: Epigenetic regulation in Acinetobacter baumannii
Track: RegSys
  • Anna Ershova, Trinity College Dublin, Ireland
  • Carsten Kröger, Trinity College Dublin, Ireland


Presentation Overview:

Acinetobacter baumannii is a nosocomial pathogen with a high level of genomic plasticity and the ability to colonise different environments. Acinetobacter species possess a conserved orphan methyltransferase that can affect different traits of these bacteria, including growth, stress response, antibiotic sensitivity and virulence. In this work, we investigated the underlying molecular mechanisms.
We identified differentially expressed genes in the methyltransferase-deficient mutant of A. baumannii strain AB5075 for two different growth phases and analysed the distribution of methylated sites in the bacterial genomes for both growth stages.
We found that about 10% of A. baumannii AB5075 genes were differentially expressed in at least one growth stage. The analysis of methylated sites showed a surprisingly low level of genome methylation (~30%). We observed a difference in methylomic profiles between the stages; however, there was no correspondence between the distribution of differentially expressed genes and methylated sites. We hypothesise that methylation can have a global effect on chromosome organisation rather than a local effect on the binding of transcription factors. We also speculate that the bacterial population is heterogeneous in its methylation state and that new approaches are needed to reveal weak methylation signals.

B-383: Cross-species view of transcription termination in bacteria
Track: RegSys
  • Amir Bar, The Hebrew University of Jerusalem, Jerusalem, Israel
  • Liron Argaman, The Hebrew University of Jerusalem, Jerusalem, Israel
  • Michal Eldar, The Hebrew University of Jerusalem, Jerusalem, Israel
  • Hanah Margalit, The Hebrew University of Jerusalem, Jerusalem, Israel


Presentation Overview:

To adapt to environmental changes, bacteria strictly regulate their gene expression through versatile regulatory mechanisms at both the transcriptional and post-transcriptional levels. In addition to the extensively studied regulation of transcription initiation by transcription factors, there are regulatory mechanisms that reshape the 3’ termini of initiated and already transcribed transcripts, involving termination factors, attenuators, riboswitches, and ribonucleases. To study the evolutionary conservation of these mechanisms and their dynamics under various growth conditions, the 3’ termini of gene transcripts need to be explicitly determined. Recent data generated by tailored RNA-seq approaches that aim to globally identify 3’ termini are limited to only a few bacteria and a few growth conditions. To address this limitation, we developed a computational approach that determines transcript 3’ termini with high accuracy from sequencing data generated by the RNAtag-seq protocol. Our method exploits a signal identified in the RNAtag-seq data, without the need for special experimental manipulations. The ample RNAtag-seq data available for many bacterial pathogens, grown under various stress and virulence-inducing conditions, enable us to use our computational method to construct an atlas of 3’ termini across bacteria and growth conditions, and to study them from an evolutionary point of view.