July 21, 2025
11:20-11:40
Proceedings Presentation: Fair molecular feature selection unveils universally tumor lineage-informative methylation sites in colorectal cancer
Confirmed Presenter: Cenk Sahinalp, National Cancer Institute, United States
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Xuan Li, University of Maryland
  • Yuelin Liu, University of Maryland
  • Alejandro Schäffer, National Cancer Institute
  • Stephen Mount, University of Maryland
  • Cenk Sahinalp, National Cancer Institute

Presentation Overview:

In the era of precision medicine, performing comparative analysis over diverse patient populations is a fundamental step towards tailoring healthcare interventions. However, the critical aspect of fairly selecting molecular features across multiple patients is often overlooked. To address this challenge, we introduce FALAFL (FAir muLti-sAmple Feature seLection), an algorithmic approach based on combinatorial optimization. FALAFL is designed to perform feature selection in sequencing data which ensures a balanced selection of features from all patient samples in a cohort. We have applied FALAFL to the problem of selecting lineage-informative CpG sites within a cohort of colorectal cancer patients subjected to low read coverage single-cell methylation sequencing. Our results demonstrate that FALAFL can rapidly and robustly determine the optimal set of CpG sites, which are each well covered by cells across the vast majority of the patients, while ensuring that in each patient a large proportion of these sites have high read coverage. An analysis of the FALAFL-selected sites reveals that their tumor lineage-informativeness exhibits a strong correlation across a spectrum of diverse patient profiles. Furthermore, these universally lineage-informative sites are highly enriched in the inter-CpG island regions. FALAFL brings unsupervised fairness considerations into the molecular feature selection from single-cell sequencing data obtained from a patient cohort. We hope that it will aid in designing panels for diagnostic and prognostic purposes and help propel fair data science practices in the exploration of complex diseases.

July 21, 2025
11:40-12:00
Proceedings Presentation: Fast tumor phylogeny regression via tree-structured dual dynamic programming
Confirmed Presenter: Henri Schmidt, Princeton University, United States
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Henri Schmidt, Princeton University
  • Yuanyuan Qi, University of Illinois at Urbana-Champaign
  • Ben Raphael, Princeton University
  • Mohammed El-Kebir, University of Illinois at Urbana-Champaign

Presentation Overview:

Reconstructing the evolutionary history of tumors from bulk DNA sequencing of multiple tissue samples remains a challenging computational problem, requiring simultaneous deconvolution of the tumor tissue and inference of its evolutionary history. Recently, phylogenetic reconstruction methods have made significant progress by breaking the reconstruction problem into two parts: a regression problem over a fixed topology and a search over tree space. While effective techniques have been developed for the latter search problem, the regression problem remains a bottleneck in both method design and implementation due to the lack of fast, specialized algorithms. Here, we introduce fastppm, a fast tool to solve the regression problem via tree-structured dual dynamic programming. fastppm supports arbitrary, separable convex loss functions including the L2, piecewise linear, binomial and beta-binomial loss and provides asymptotic improvements for the L2 and piecewise linear loss over existing algorithms. We find that fastppm empirically outperforms both specialized and general-purpose regression algorithms, obtaining 50-450x speedups while providing solutions as accurate as those of existing approaches. Incorporating fastppm into several phylogeny inference algorithms immediately yields up to 400x speedups, requiring only a small change to the program code of existing software. Finally, fastppm enables analysis of low-coverage bulk DNA sequencing data, both on simulated data and in a patient-derived mouse model of colorectal cancer, outperforming state-of-the-art phylogeny inference algorithms in terms of both accuracy and runtime.

July 21, 2025
12:00-12:20
Proceedings Presentation: Bayesian inference of fitness landscapes via tree-structured branching processes
Confirmed Presenter: Xiang Ge Luo, ETH Zurich, Switzerland
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Xiang Ge Luo, ETH Zurich
  • Jack Kuipers, ETH Zurich
  • Kevin Rupp, ETH Zurich
  • Koichi Takahashi, MD Anderson Cancer Center
  • Niko Beerenwinkel, ETH Zurich

Presentation Overview:

Motivation: The complex dynamics of cancer evolution, driven by mutation and selection, underlie the molecular heterogeneity observed in tumors. The evolutionary histories of tumors of different patients can be encoded as mutation trees and reconstructed in high resolution from single-cell sequencing data, offering crucial insights for studying fitness effects of and epistasis among mutations. Existing models, however, either fail to separate mutation and selection or neglect the evolutionary histories encoded by the tumor phylogenetic trees.

Results: We introduce FiTree, a tree-structured multi-type branching process model with epistatic fitness parameterization and a Bayesian inference scheme to learn fitness landscapes from single-cell tumor mutation trees. Through simulations, we demonstrate that FiTree outperforms state-of-the-art methods in inferring the fitness landscape underlying tumor evolution. Applying FiTree to a single-cell acute myeloid leukemia dataset, we identify epistatic fitness effects consistent with known biological findings and quantify uncertainty in predicting future mutational events. The new model unifies probabilistic graphical models of cancer progression with population genetics, offering a principled framework for understanding tumor evolution and informing therapeutic strategies.

July 21, 2025
12:20-12:40
MiClone: A Probabilistic Method for Inferring Cell Phylogenies from Mitochondrial Variants
Confirmed Presenter: Emilia Hurtado, The University of British Columbia, Canada
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Emilia Hurtado, The University of British Columbia
  • Andrew Roth, University of British Columbia

Presentation Overview:

Cancer development and progression are largely fuelled by somatic mutations that give rise to clones – distinct subpopulations of malignant cells that emerge as a result of differential mutation and proliferation within tumours. Given that clones may exhibit selective survivorship in response to treatment, characterising the evolutionary history of a tumour’s clones is a critical task in cancer research. With the advent of single-cell resolution sequencing technologies, bulk clonal deconvolution is no longer strictly required – as the samples themselves are already separate cellular representations. However, even at single-cell resolution, challenges remain in the study of human disease. Interestingly, there does exist an adjacent source of potential phylogenetic signal in these single-cell measures, in the form of the mitochondrial genome. In the context of the cancer evolution characterisation problem, the use of mitochondrial variants could allow for improved study of copy number stable and low nuclear-genomic somatic variant dependent cancers. In this work we present MiClone, a Bayesian method to infer the phylogenetic relationship of single-cell genomes using the signal available in the mitochondrial genome. MiClone uses the proportions of mitochondrial variants across cells as input, treating each individual single-cell sample as a bulk mixture of mitochondrial genomes. Leveraging the PhyClone phylogenetic machinery, MiClone is able to scalably process thousands of single-cell samples to produce fine-grained and accurate clonal-prevalence estimates for each cell. We demonstrate MiClone’s performance using real-world datasets from 20 patients across a variety of cancer types, each consisting of thousands of single-cell genomes.

July 21, 2025
12:40-12:50
scVarID: Mapping Genetic Variants at Single-Cell Resolution to Uncover Precursor Cells in Cancer
Confirmed Presenter: Jonghyun Lee, National Cancer Center, South Korea
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Jonghyun Lee, National Cancer Center
  • Juyeon Cho, National Cancer Center
  • Byungjo Lee, 323
  • Dongkwan Shin, National Cancer Center

Presentation Overview:

Next-generation sequencing technologies enable the identification of genetic variants and gene expression; however, measuring both features simultaneously within the same cell has remained challenging. Difficulties in co-isolating DNA and RNA have limited our ability to directly connect somatic mutations with transcriptional consequences. Additionally, variant detection from bulk sequencing cannot resolve haplotype-specific or cell-to-cell genetic heterogeneity—information that is crucial for understanding the progression of genetic diseases such as cancer.
To address this, we developed scVarID, an algorithm that maps variant calls onto cell-level transcriptomes, producing cell-by-variant matrices of both variant and reference allele counts for each transcript across single cells.
Using scVarID, we uncovered widespread allelic imbalance in peripheral blood mononuclear cells (PBMCs), particularly in subsets of cells exhibiting strong allele-specific expression (ASE) of HLA-related genes. This allelic imbalance was also observed in both normal and tumor epithelial cells from colorectal cancer patients. Most notably, ASE analysis of paired normal samples revealed a subset of cells with altered variant ratios in HLA-A, a gene frequently associated with ASE loss in tumors. This suggests the presence of potential precursor cell states marked by early post-transcriptional imbalance, which may precede tumorigenesis.
These findings demonstrate scVarID’s ability to resolve genotype–phenotype relationships at single-cell resolution and to identify early-stage cellular alterations that may contribute to cancer development. This approach opens new avenues for early cancer detection and for studying the functional impact of somatic variation in pre-malignant cell populations.
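
The cell-by-variant matrices described in this abstract can be illustrated with a minimal sketch. It is not scVarID's implementation: the input format (one tuple of cell barcode, variant identifier, and observed allele per read) and the function name `build_allele_matrices` are assumptions made purely for illustration.

```python
def build_allele_matrices(observations):
    """Tally reference and variant allele reads per (cell, variant) pair.

    `observations` is a hypothetical input: an iterable of
    (cell_barcode, variant_id, allele) tuples, with allele "ref" or "alt".
    """
    observations = list(observations)
    cells = sorted({cell for cell, _, _ in observations})
    variants = sorted({variant for _, variant, _ in observations})
    row = {cell: i for i, cell in enumerate(cells)}
    col = {variant: j for j, variant in enumerate(variants)}

    # One matrix per allele type; rows are cells, columns are variants.
    ref_counts = [[0] * len(variants) for _ in cells]
    alt_counts = [[0] * len(variants) for _ in cells]

    for cell, variant, allele in observations:
        if allele == "ref":
            ref_counts[row[cell]][col[variant]] += 1
        elif allele == "alt":
            alt_counts[row[cell]][col[variant]] += 1
        else:
            raise ValueError(f"unknown allele label: {allele!r}")
    return cells, variants, ref_counts, alt_counts


# Toy reads: two cells covering one made-up variant.
reads = [
    ("AAAC", "chr6:100:A>G", "ref"),
    ("AAAC", "chr6:100:A>G", "alt"),
    ("AAAC", "chr6:100:A>G", "alt"),
    ("TTTG", "chr6:100:A>G", "ref"),
]
cells, variants, ref_counts, alt_counts = build_allele_matrices(reads)
```

From the two matrices, a per-cell variant allele fraction, alt / (ref + alt), is one simple way to look for the kind of allelic imbalance the abstract reports.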

July 21, 2025
12:50-13:00
High-Resolution Discovery of Lineage-Specific SVs in Pan Genus Through Assembly Comparisons
Confirmed Presenter: Aisha, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Aisha, Beijing Institute of Genomics
  • Hua Chen, Beijing Institute of Genomics

Presentation Overview:

The two sister species in the Pan genus, chimpanzees (Pan troglodytes) and bonobos (Pan paniscus), exhibit lineage-specific differences in several behavioral and physiological traits. In this work, our goal was to identify the genome regions within the Pan genus that have experienced structural rearrangements in a lineage-specific manner. Using high-quality genome assemblies of chimpanzee, bonobo and human, we performed genome assembly comparisons with the human genome (GRCh38) as a reference. Lineage-specific structural variants (LSSVs) in the Pan genus provided enhanced insights into the genomic rearrangements that are likely to affect gene function and phenotypes. We observed considerable variation in SV distribution between the two species, with SVs widespread in the chimpanzee assembly and scarce in the bonobo assembly. Focusing on the SVs harbored in CDS regions of protein-coding genes, we found 22 genes highly impacted by SVs in bonobos, which lead to either feature truncation or transcript ablation. Notably, a total of 232 SV-impacted genes experienced transcript ablation and were found to be involved in olfactory transduction, keratinization and transcriptional regulation. Functional enrichment analysis of LSSV-impacted genes revealed enrichment for body growth, brain function, and neurological disorders in the bonobo lineage, while metabolism and transcriptional regulation showed enrichment in the chimpanzee lineage. Additionally, we found that a small number of the SV-affected genes were responsible for the distinctive behavioral differences between the two lineages, indicating their role in determining the lineage-specific characteristics present in the Pan genus.

July 21, 2025
14:00-14:20
Tracing the functional divergence of duplicated genes
Confirmed Presenter: Irene Julca, University of Lausanne, Switzerland
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Alex Warwick Vesztrocy, BioSoft Research UK
  • Natasha Glover, University of Lausanne
  • Paul D. Thomas, University of Southern California
  • Christophe Dessimoz, University of Lausanne
  • Irene Julca, University of Lausanne

Presentation Overview:

Gene duplication is a fundamental driver of functional innovation in evolution. Following duplication, paralogous genes may be retained, diverge in function (through sub- or neo-functionalisation), or be lost. When paralogues are retained, identifying which copy preserves the ancestral function can be challenging. The “least diverged orthologue” (LDO) conjecture proposes that the paralogue evolving more slowly at the sequence level is more likely to retain the ancestral function. In this study, we systematically test this hypothesis using a novel method that detects asymmetric sequence evolution in gene families. We applied this approach to all gene trees from the PANTHER database, encompassing gene duplications across the Tree of Life. We further integrated structural data for over one million proteins and gene expression data from 16 animal and 20 plant species. Our analysis, spanning thousands of gene families, reveals that although many paralogues evolve at comparable rates, a substantial fraction exhibits marked asymmetry in sequence divergence. This asymmetry correlates with differences in expression profiles and predicted functional annotations. Together, the results strongly support the LDO conjecture: the least diverged paralogue tends to retain ancestral function, while the more rapidly evolving copy may acquire specialised or novel roles. These findings have significant implications for orthology prediction and functional annotation in comparative genomics.

July 21, 2025
14:20-14:40
Duplication Episode Clustering in Phylogenetic Networks
Confirmed Presenter: Agnieszka Mykowiecka, Faculty of Mathematics, Informatics and Mechanics
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Pawel Gorecki, University of Warsaw
  • Jarosław Paszek, Faculty of Mathematics
  • Agnieszka Mykowiecka, Faculty of Mathematics

Presentation Overview:

Phylogenetic networks provide a powerful framework for representing complex evolutionary histories that traditional tree-based models cannot adequately capture. In particular, phylogenetic networks allow modeling evolutionary processes that involve reticulate events such as hybridization, horizontal gene transfer, and introgression. At the same time, macro-evolutionary events such as genomic and whole-genome duplications add another layer of complexity, leading to gene family expansions, functional divergence, and lineage-specific innovation. While each process has been studied extensively in isolation, recent advances highlight the need to consider them jointly, as they often co-occur in shaping genomic landscapes.

We present two novel problems that aim to infer genomic duplication episodes by duplication clustering in the phylogenetic network using a collection of gene family trees. First, we propose a polynomial-time dynamic programming (DP) formulation that verifies the existence of a set of episodes from a predefined set of episode candidates. We then demonstrate how to use DP to design an algorithm that solves a general inference problem. To evaluate our method, we perform computational experiments on empirical data containing whole genome duplication events in a network of Pandanales species, showing that our algorithms can accurately verify genomic duplication hypotheses.

July 21, 2025
14:40-15:00
PhytClust: Accurate and Fast Clustering in Phylogenetic Trees
Confirmed Presenter: Katyayni Ganesan, ICCB Cologne, Germany
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Katyayni Ganesan, ICCB Cologne
  • Elisa Billard, ICCB Cologne
  • Tom L Kaufmann, ICCB Cologne
  • Cody B Duncan, ICCB Cologne
  • Maja C Cwikla, ICCB Cologne
  • Adrian Altenhoff, Swiss Institute of Bioinformatics & ETH Zurich
  • Christophe Dessimoz, Swiss Institute of Bioinformatics & ETH Zurich
  • Roland F Schwarz, ICCB Cologne

Presentation Overview:

Phylogenetic trees play an important role in disentangling the evolutionary relationships between taxa across diverse fields. A key question is the identification of distinct subpopulations within a phylogenetic tree. Several methods have been developed to classify taxa in phylogenetic trees into clusters based on their evolutionary distance. However, these approaches tend to rely on arbitrary thresholds that vary across studies, making meaningful interpretation and comparison between results challenging. Additionally, they often rely on heuristics to limit their cluster search, as enumerating all possible clusters within a tree is prohibitive for large trees. Here, we present PhytClust, a novel tool that provides a standardized approach to clustering taxa in phylogenetic trees into monophyletic clusters, bypassing the use of subjective parameters. PhytClust uses a score derived from the cumulative intra-cluster branch lengths to (i) find the optimal set of clusters for a given number of clusters, and (ii) select, from these candidate cluster sets, the one that optimally represents the tree’s topology and genetic distances. PhytClust provides an exact and efficient solution to the clustering problem based on dynamic programming, making it suitable for large trees with up to 100k taxa. Compared to existing methods, PhytClust is faster and more accurate in recovering ground truth clusters. We apply PhytClust to data spanning various biological domains, including cancer genomics, avian phylogenomics, bacterial and archaeal phylogenetics, and plant genomics. By providing a standardized method for node clustering, PhytClust can help infer optimal clusters for a phylogenetic tree without any additional parameters.
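
To make step (i) of this abstract concrete, here is a minimal sketch of a dynamic program that finds, for each number of clusters k, the cheapest partition of a rooted tree's leaves into monophyletic clusters. It is not PhytClust's code. The objective (the plain sum of branch lengths inside each cluster's subtree), the tree encoding, and the name `cluster_costs` are assumptions for illustration, and step (ii), choosing k, is left out. Every internal node is assumed to have at least two children.

```python
import math


def cluster_costs(tree, node, max_k):
    """Return (subtree_length, costs) for the subtree rooted at `node`.

    `tree` maps each internal node to a list of (child, branch_length) pairs;
    leaves do not appear as keys. `costs[k]` is the minimum total
    intra-cluster branch length when the leaves below `node` are split into
    exactly k monophyletic clusters (only feasible k appear).
    """
    children = tree.get(node, [])
    if not children:
        return 0.0, {1: 0.0}  # a leaf is a single cluster with no branches

    subtree_length = 0.0
    combined = {0: 0.0}  # cost of distributing clusters over children so far
    for child, length in children:
        child_length, child_costs = cluster_costs(tree, child, max_k)
        subtree_length += length + child_length
        merged = {}
        for k1, c1 in combined.items():
            for k2, c2 in child_costs.items():
                if k1 + k2 <= max_k:
                    best = merged.get(k1 + k2, math.inf)
                    merged[k1 + k2] = min(best, c1 + c2)
        combined = merged

    # Splitting at `node` cuts the branches to its children, so they cost
    # nothing; keeping everything together costs the whole subtree length.
    costs = dict(combined)
    costs[1] = subtree_length
    return subtree_length, costs


# Toy tree ((A:1,B:1):3,(C:1,D:1):3) with made-up node names.
toy_tree = {
    "root": [("ab", 3.0), ("cd", 3.0)],
    "ab": [("A", 1.0), ("B", 1.0)],
    "cd": [("C", 1.0), ("D", 1.0)],
}
_, costs = cluster_costs(toy_tree, "root", max_k=4)
```

On the toy tree, two clusters cost 4.0 (the clades {A, B} and {C, D}), against 10.0 for a single cluster. The recursion is kept for readability; a tree with many thousands of taxa would need an iterative traversal.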

July 21, 2025
15:00-15:20
Learning the Language of Phylogeny with MSA Transformer
Confirmed Presenter: Ruyi Chen, University of Queensland, Australia
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Ruyi Chen, University of Queensland
  • Gabriel Foley, University of Queensland
  • Mikael Boden, University of Queensland

Presentation Overview:

Classical phylogenetic inference assumes independence between sites, potentially undermining the accuracy of evolutionary analyses in the presence of epistasis. Some protein language models have the capacity to encode dependencies between sites in conserved structural and functional domains across the protein universe. We employ the MSA Transformer, which takes a multiple sequence alignment (MSA) as an input, and is trained with masked language modeling objectives, to investigate if and how effects of epistasis can be captured to enhance the analysis of phylogenetic relationships.

We test whether the MSA Transformer internally encodes evolutionary distances between the sequences in the MSA despite this information not being explicitly available during training. We investigate the model's reliance on information available in columns as opposed to rows in the MSA, by systematically shuffling sequence content. We then use MSA Transformer on both natural and simulated MSAs to reconstruct entire phylogenetic trees with implied ancestral branchpoints, and assess their consistency with trees from maximum likelihood inference.

We demonstrate how both previously known and novel evolutionary relationships are available from a "non-classical" approach with very different computational requirements, by reconstructing phylogenetic trees for the RNA virus RNA-dependent RNA polymerase and the nucleo-cytoplasmic large DNA virus domain. We anticipate that MSA Transformer will not replace but rather complement classical phylogenetic inference, to accurately recover the evolutionary history of protein families.

July 21, 2025
15:20-15:40
Accurate multiple sequence alignment at protein-universe scale with FAMSA 2
Confirmed Presenter: Adam Gudys, Silesian University of Technology, Poland
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Adam Gudys, Silesian University of Technology
  • Andrzej Zielezinski, Adam Mickiewicz University
  • Cedric Notredame, Centre for Genomic Regulation
  • Sebastian Deorowicz, Silesian University of Technology

Presentation Overview:

Multiple sequence alignment (MSA) is a crucial analysis in computational biology, applied, e.g., in phylogeny reconstruction or protein function prediction. Within a few years, large-scale sequencing efforts such as the Earth BioGenome Project will produce billions of sequences representing the full diversity of the protein universe. However, state-of-the-art MSA algorithms do not keep pace with the exponentially increasing size of sequence repositories.

To address this issue, we present FAMSA 2. Compared to its predecessor, it offers higher accuracy and speed, enhanced robustness to non-homologous sequence contamination, and a number of usability features like alignment trimming. The algorithmic improvements include a novel guide tree heuristic based on medoid clustering particularly suited for ultra-scale analyses.

The performance of FAMSA 2 has been evaluated on several data sets. They included Pfam families enriched with Homstrad reference alignments, AliSim-simulated alignments, and AlphaFold clusters. The experiments confirmed the presented algorithm to match or exceed the accuracy of Muscle5, Clustal Omega, or T-Coffee's regressive method, being orders of magnitude faster. FAMSA 2 was the only algorithm to align a set of over 12 million sequences. This was done in 40 minutes on a 64 GB RAM workstation.

The first version of FAMSA, with almost 60 000 downloads and applications in milestone projects like AlphaFold and Pfam, confirmed its usability to the community. We believe that FAMSA 2, by enabling evolutionary and structural analyses at a scale beyond the reach of competing tools, will have an even larger impact in the field.

July 21, 2025
15:40-16:00
Community detection at unprecedented scales with ExoLabel
Confirmed Presenter: Erik Wright, University of Pittsburgh, United States
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: Live stream

Authors List:

  • Aidan Lakshman, University of Pittsburgh
  • Erik Wright, University of Pittsburgh

Presentation Overview:

Many approaches in comparative genomics rely on clusters of orthologous genes (COGs). Methods for constructing COGs often employ community detection algorithms to identify clusters within a network of pairwise similarities among genes. As the number of available genome sequences continues to grow exponentially, this community detection step has proven to be the limiting factor for scaling COGs to more genomes, both in terms of memory and time required. In this study, we developed ExoLabel, a community detection program that can scale to enormous graphs by applying a linear-time algorithm to data outside of memory (i.e., on disk). We show that ExoLabel's accuracy rivals popular programs for identifying COGs but is orders of magnitude faster and more memory efficient than existing programs. We demonstrate ExoLabel's performance by clustering a graph with 16.2 million nodes (genes) and 18.3 billion edges (pairwise similarities) in less than a day using only a few gigabytes of RAM. ExoLabel democratizes comparative genomics in settings without access to supercomputers and scales COG detection to new heights.

July 21, 2025
16:40-17:00
EdgeHOG: Scalable and Fine-Grained Ancestral Gene Order Inference Across the Tree of Life
Confirmed Presenter: Charles Bernard, Institut Pasteur, France
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Charles Bernard, Institut Pasteur
  • Yannis Nevers, University of Strasbourg
  • Alex Warwick Vesztrocy, University of Lausanne
  • Natasha Glover, University of Lausanne
  • Adrian Altenhoff, ETH Zurich
  • Christophe Dessimoz, University of Lausanne

Presentation Overview:

Ancestral genomes are essential for studying the diversification of life from the last universal common ancestor to modern organisms. Methods have been proposed to infer ancestral gene order, but they lack scalability, limiting the depth to which gene neighborhood evolution can be traced back. We introduce edgeHOG, a tool designed for accurate ancestral gene order inference with linear time complexity. Validated on various benchmarks, edgeHOG was applied to the entire OMA orthology database, encompassing 2,845 extant genomes across all domains of life. This represents the first tree-of-life scale inference, resulting in 1,133 ancestral genomes. In particular, we reconstructed ancestral contigs for the last common ancestor of eukaryotes, dating back around 1.8 billion years, and observed significant functional association among neighboring genes. The method also dates gene adjacencies, revealing conserved histone clusters and rapid chromosome rearrangements, enabling computational inference of these features.

July 21, 2025
17:00-17:20
JIGSAW: Accurate inference of exact copy numbers from targeted single-cell DNA sequencing
Confirmed Presenter: Sophia Chirrane, University College London Cancer Institute, Cancer Research UK Lung Cancer Centre of Excellence
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Sophia Chirrane, University College London Cancer Institute
  • Simone Zaccaria, University College London Cancer Institute

Presentation Overview:

Tumorigenesis is driven by the interplay between somatic single-nucleotide variants (SNVs) and larger structural alterations, like copy number alterations (CNAs), that are simultaneously accumulated in cancer cell genomes. This process results in highly heterogeneous tumours composed of distinct subpopulations of cells, or tumour clones, with different SNV and CNA combinations driving cancer progression and the development of treatment resistance.

Recent targeted single-cell sequencing technologies (tag-scDNA-seq, e.g., the Mission Bio Tapestri platform) provide ideal data to study this interplay because the deep, unbiased sequencing coverage of a targeted gene panel allows the assessment of both SNVs and CNAs in each cell. However, while analyzing SNVs is relatively straightforward, no method currently exists to accurately infer CNAs from tag-scDNA-seq due to the extremely high level of random variance caused by the very low number of reads sequenced from only a minimal fraction of the genome.

Here, we introduce JIGSAW (Joint Inference by Grouping Single-cell-clones of Amplicon-copy-numbers without Whole-genome), the first algorithm to infer CNAs from tag-scDNA-seq data. JIGSAW overcomes tag-scDNA-seq sparsity challenges by jointly grouping amplicons that share the same CNA state and clustering cells into clones using a Bayesian framework. Through extensive realistic simulations, we demonstrated that JIGSAW can not only accurately retrieve CNAs but is also robust to increasing levels of CNA heterogeneity, cell-specific noise, and both clonal and subclonal whole-genome doublings. Applied to 2,153 pancreatic ductal adenocarcinoma cells and 12,000 AML cells, JIGSAW uncovered novel CNAs affecting cancer driver genes in conjunction with SNVs.

July 21, 2025
17:20-17:40
EASYstrata: a fully integrated workflow to infer evolutionary strata along sex chromosomes and other supergenes
Confirmed Presenter: Ricardo C. Rodriguez de la Vega, GEE - ESE, CNRS
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Quentin Rougemont, GEE - Écologie Societé Évolution
  • Elise Lucotte, GEE - Écologie Societé Évolution
  • Loreleï Boyer, GEE - Écologie Societé Évolution
  • Alexandra Jalaber de Dinechin, AGIPP - Institut Jean-Pierre Bourgin
  • Alodie Snirc, GEE - Écologie Societé Évolution
  • Tatiana Giraud, GEE - ESE
  • Ricardo C. Rodriguez de la Vega, GEE - ESE

Presentation Overview:

New reference-level genomes are becoming increasingly available across the tree of life, opening new avenues for addressing exciting evolutionary questions. However, challenges remain in genome annotation, sequence alignment and evolutionary inference, compounded by a general lack of methodological standardization. Here, we present a new workflow designed to overcome these challenges in evolutionary analyses, facilitating the detection of recombination suppression and its consequences, such as structural rearrangements, transposable element accumulation and coding sequence degeneration. To achieve this, we integrate multiple bioinformatic steps into a single, reproducible and user-friendly pipeline. This workflow combines state-of-the-art tools to efficiently detect transposable elements, annotate newly assembled genomes, infer gene orthology, compute sequence divergence, and objectively identify stepwise extensions of recombination suppression (i.e., evolutionary strata) and their associated structural changes, while visualizing results throughout the process. We demonstrate how this Evolutionary analysis with Ancestral SYnteny for strata identification (EASYstrata) workflow was used to re-annotate 42 published Microbotryum genomes and a pair of giant plant sex chromosomes. We recovered all previously described strata and identified several that had gone unnoticed. While primarily developed to infer divergence between sex or mating-type chromosomes, EASYstrata can also be applied to any pair of haplotypes with diverging regions of interest, such as autosomal supergenes. This workflow will facilitate the study of the many non-model species for which newly sequenced, phased diploid genomes are now becoming available.
EASYstrata and detailed use cases can be found at https://github.com/QuentinRougemont/EASYstrata
Preprint: https://www.biorxiv.org/content/10.1101/2025.01.06.631483v1.full

July 21, 2025
17:40-18:00
The Phylogenetic Dynamic Regulatory Module Networks (P-DRMN) study infers Cis-regulatory features responsible for evolution of mammalian gene regulatory programs in aortic endothelium
Confirmed Presenter: Suvojit Hazra, Wisconsin Institute for Discovery, University of Wisconsin-Madison
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: Live stream

Authors List:

  • Suvojit Hazra, Wisconsin Institute for Discovery
  • Sara A Knaack, Wisconsin Institute for Discovery
  • Erika Da-Inn Lee, Wisconsin Institute for Discovery
  • Liangxi Wang, Genetics and Genome Biology section
  • Mohamed Hawash, Genetics and Genome Biology section
  • Huayun Hou, Genetics and Genome Biology section
  • Michael Wilson, Genetics and Genome Biology section
  • Sushmita Roy, Wisconsin Institute for Discovery

Presentation Overview:

Cis-regulatory elements (CREs), such as promoters and enhancers, interact with transcription factors (TFs) to drive gene regulatory programs and contribute to morphological diversity across species. Comparative regulatory genomics, which integrates omic measurements across species, offers a powerful framework for studying the evolution of gene regulation. While multi-omic profiling, combining transcriptomic and epigenomic data, has advanced, computational tools that are both phylogenetically aware and capable of analyzing high-dimensional, cross-species data remain scarce. To address this, we introduce Phylogenetic Dynamic Regulatory Module Networks (P-DRMN), a novel multi-task regression-based algorithm that models dynamic gene module regulatory networks using RNA-seq, ATAC-seq, and ChIP-seq data while incorporating phylogenetic relationships. P-DRMN clusters genes into similarly expressing, discrete gene modules based on expression levels and uses a regression function of upstream CREs to infer species-specific module-TF regulatory programs. We applied P-DRMN to aortic endothelial cell data, including gene expression, promoter/motif accessibility, and five histone modifications (H3K27ac, H3K36me3, H3K4me3, H3K4me2, H3K27me3), from five mammals (human, rat, cow, pig, and dog). P-DRMN inferred 19-65% conservation of gene modules across species, with high- and low-expression modules being the most conserved and diverged, respectively. We identified 103 transitioning gene sets with species- or clade-specific expression patterns, many regulated by distinct TFs and chromatin marks, for example, CTCF in human-specific high-expression modules, and SHOX2, H3K27me3, and H3K4me3 in pig/cow-specific high-expression modules. These results demonstrate how CREs and chromatin states shape species-specific gene expression. Overall, P-DRMN provides a powerful framework for integrating multi-omics data to study the evolutionary dynamics of gene regulation.