Attention presenters: please review the Speaker Information Page.
Schedule subject to change
All times listed are in BST
Monday, July 21st
11:20-11:40
Proceedings Presentation: Fair molecular feature selection unveils universally tumor lineage-informative methylation sites in colorectal cancer
Confirmed Presenter: Cenk Sahinalp, National Cancer Institute, United States

Format: In person


Authors List:

  • Xuan Li, University of Maryland, College Park, United States
  • Yuelin Liu, University of Maryland, College Park, United States
  • Alejandro Schäffer, National Cancer Institute, United States
  • Stephen Mount, University of Maryland, College Park, United States
  • Cenk Sahinalp, National Cancer Institute, United States

Presentation Overview:

In the era of precision medicine, comparative analysis across diverse patient populations is a fundamental step towards tailoring healthcare interventions. However, the critical question of how to select molecular features fairly across multiple patients is often overlooked. To address this challenge, we introduce FALAFL (FAir muLti-sAmple Feature seLection), an algorithmic approach based on combinatorial optimization. FALAFL performs feature selection on sequencing data in a way that ensures a balanced selection of features from all patient samples in a cohort. We have applied FALAFL to the problem of selecting lineage-informative CpG sites within a cohort of colorectal cancer patients profiled by low-read-coverage single-cell methylation sequencing. Our results demonstrate that FALAFL can rapidly and robustly determine the optimal set of CpG sites, each of which is well covered by cells in the vast majority of patients, while ensuring that in each patient a large proportion of these sites have high read coverage. An analysis of the FALAFL-selected sites reveals that their tumor lineage-informativeness is strongly correlated across a spectrum of diverse patient profiles. Furthermore, these universally lineage-informative sites are highly enriched in the inter-CpG island regions. FALAFL brings unsupervised fairness considerations into molecular feature selection from single-cell sequencing data obtained from a patient cohort. We hope that it will aid in designing panels for diagnostic and prognostic purposes and help propel fair data science practices in the exploration of complex diseases.
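The abstract names two criteria: each selected site should be well covered across most patients, and each patient should cover a large share of the selected sites. A simple per-site threshold filter makes both criteria concrete. This is only a sketch under assumptions. It is not FALAFL's combinatorial optimization, and the coverage matrix, `min_depth`, and `min_patient_frac` are hypothetical.

```python
from typing import List

# Hypothetical input: coverage[p][s] = read depth at CpG site s in patient p.
Coverage = List[List[int]]


def select_sites(coverage: Coverage, min_depth: int, min_patient_frac: float) -> List[int]:
    """Keep sites with depth >= min_depth in at least min_patient_frac of patients."""
    n_patients = len(coverage)
    n_sites = len(coverage[0]) if coverage else 0
    selected = []
    for s in range(n_sites):
        covered = sum(1 for row in coverage if row[s] >= min_depth)
        if covered >= min_patient_frac * n_patients:
            selected.append(s)
    return selected


def per_patient_fraction(coverage: Coverage, sites: List[int], min_depth: int) -> List[float]:
    """For each patient, the fraction of the selected sites that it covers well."""
    if not sites:
        return [0.0 for _ in coverage]
    return [sum(1 for s in sites if row[s] >= min_depth) / len(sites) for row in coverage]


coverage = [
    [5, 0, 3, 8],
    [4, 1, 0, 9],
    [6, 0, 2, 7],
]
sites = select_sites(coverage, min_depth=2, min_patient_frac=0.6)
print(sites)  # [0, 2, 3]
print(per_patient_fraction(coverage, sites, 2))
```

This filter judges each site on its own, so it does not guarantee the second, per-patient criterion. The abstract instead describes FALAFL as optimizing the selected set as a whole.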

11:40-12:00
Proceedings Presentation: Fast tumor phylogeny regression via tree-structured dual dynamic programming
Confirmed Presenter: Henri Schmidt, Princeton University, United States

Format: In person


Authors List:

  • Henri Schmidt, Princeton University, United States
  • Yuanyuan Qi, University of Illinois at Urbana-Champaign, United States
  • Ben Raphael, Princeton University, United States
  • Mohammed El-Kebir, University of Illinois at Urbana-Champaign, United States

Presentation Overview:

Reconstructing the evolutionary history of tumors from bulk DNA sequencing of multiple tissue samples remains a challenging computational problem, requiring simultaneous deconvolution of the tumor tissue and inference of its evolutionary history. Recently, phylogenetic reconstruction methods have made significant progress by breaking the reconstruction problem into two parts: a regression problem over a fixed topology and a search over tree space. While effective techniques have been developed for the search problem, the regression problem remains a bottleneck in both method design and implementation due to the lack of fast, specialized algorithms. Here, we introduce fastppm, a fast tool that solves the regression problem via tree-structured dual dynamic programming. fastppm supports arbitrary separable convex loss functions, including the L2, piecewise linear, binomial and beta-binomial losses, and provides asymptotic improvements over existing algorithms for the L2 and piecewise linear losses. We find that fastppm empirically outperforms both specialized and general-purpose regression algorithms, obtaining 50-450x speedups while providing solutions as accurate as those of existing approaches. Incorporating fastppm into several phylogeny inference algorithms immediately yields up to 400x speedups and requires only a small change to the code of existing software. Finally, fastppm enables analysis of low-coverage bulk DNA sequencing data, both on simulated data and in a patient-derived mouse model of colorectal cancer, outperforming state-of-the-art phylogeny inference algorithms in both accuracy and runtime.
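The regression problem over a fixed topology can be stated concretely. Fitted mutation frequencies must satisfy the sum condition: each node's frequency is at least the sum of its children's frequencies. The fit is then scored by a loss such as L2. The sketch below only checks feasibility and evaluates the L2 loss for a single sample. It does not implement fastppm's dual dynamic program, and the tree encoding and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical encoding: parent[v] = parent of node v (roots have no entry);
# freq[v] = fitted frequency of the mutation introduced at node v in one sample.


def children_of(parent: Dict[str, str]) -> Dict[str, List[str]]:
    kids: Dict[str, List[str]] = {}
    for child, par in parent.items():
        kids.setdefault(par, []).append(child)
    return kids


def clone_proportions(parent: Dict[str, str], freq: Dict[str, float]) -> Dict[str, float]:
    """u_v = f_v minus the sum of f_c over the children c of v."""
    kids = children_of(parent)
    return {v: f - sum(freq[c] for c in kids.get(v, [])) for v, f in freq.items()}


def satisfies_sum_condition(parent: Dict[str, str], freq: Dict[str, float],
                            tol: float = 1e-9) -> bool:
    """Feasible if every clone proportion is >= 0 and root frequencies sum to <= 1."""
    u = clone_proportions(parent, freq)
    roots = [v for v in freq if v not in parent]
    return all(x >= -tol for x in u.values()) and sum(freq[r] for r in roots) <= 1 + tol


def l2_loss(observed: Dict[str, float], fitted: Dict[str, float]) -> float:
    return sum((observed[v] - fitted[v]) ** 2 for v in observed)


parent = {"B": "A", "C": "A"}
fitted = {"A": 0.9, "B": 0.5, "C": 0.3}
print(satisfies_sum_condition(parent, fitted))                          # True
print(satisfies_sum_condition(parent, {"A": 0.9, "B": 0.5, "C": 0.5}))  # False
print(l2_loss({"A": 1.0, "B": 0.5, "C": 0.3}, fitted))
```

The regression problem is to minimize such a loss over all frequencies that satisfy the sum condition. According to the abstract, fastppm performs that minimization with tree-structured dual dynamic programming, which this sketch does not attempt.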

12:00-12:20
Proceedings Presentation: Bayesian inference of fitness landscapes via tree-structured branching processes
Confirmed Presenter: Xiang Ge Luo, ETH Zurich, Switzerland

Format: In person


Authors List:

  • Xiang Ge Luo, ETH Zurich, Switzerland
  • Jack Kuipers, ETH Zurich, D-BSSE, Computational Biology Group, Switzerland
  • Kevin Rupp, ETH Zurich, Switzerland
  • Koichi Takahashi, MD Anderson Cancer Center, United States
  • Niko Beerenwinkel, ETH Zurich, Switzerland

Presentation Overview:

Motivation: The complex dynamics of cancer evolution, driven by mutation and selection, underlie the molecular heterogeneity observed in tumors. The evolutionary histories of tumors of different patients can be encoded as mutation trees and reconstructed in high resolution from single-cell sequencing data, offering crucial insights for studying the fitness effects of, and epistasis among, mutations. Existing models, however, either fail to separate mutation and selection or neglect the evolutionary histories encoded by the tumor phylogenetic trees.

Results: We introduce FiTree, a tree-structured multi-type branching process model with epistatic fitness parameterization and a Bayesian inference scheme to learn fitness landscapes from single-cell tumor mutation trees. Through simulations, we demonstrate that FiTree outperforms state-of-the-art methods in inferring the fitness landscape underlying tumor evolution. Applying FiTree to a single-cell acute myeloid leukemia dataset, we identify epistatic fitness effects consistent with known biological findings and quantify uncertainty in predicting future mutational events. The new model unifies probabilistic graphical models of cancer progression with population genetics, offering a principled framework for understanding tumor evolution and informing therapeutic strategies.
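One common way to parameterize epistatic fitness is a log-additive model with pairwise interaction terms; it is used here purely as an assumption. Under this model, a genotype's fitness is the exponential of the sum of its single-mutation effects plus one epistatic term for each co-occurring pair. In a mutation tree, each node's genotype is the set of mutations on its path from the root. The sketch shows only this parameterization. It does not reproduce FiTree's actual parameterization, branching-process likelihood, or Bayesian inference, and all names and values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

# Hypothetical mutation tree: parent[m] = parent mutation of m ("root" = wild type).


def genotypes(parent: Dict[str, str], root: str = "root") -> Dict[str, FrozenSet[str]]:
    """Genotype of each node = the set of mutations on its path from the root."""
    geno: Dict[str, FrozenSet[str]] = {root: frozenset()}

    def resolve(v: str) -> FrozenSet[str]:
        if v not in geno:
            geno[v] = resolve(parent[v]) | {v}
        return geno[v]

    for v in parent:
        resolve(v)
    return geno


def fitness(genotype: FrozenSet[str], main: Dict[str, float],
            pairwise: Dict[Tuple[str, str], float]) -> float:
    """exp(sum of single-mutation effects + pairwise epistatic terms)."""
    log_f = sum(main.get(m, 0.0) for m in genotype)
    for a, b in combinations(sorted(genotype), 2):
        log_f += pairwise.get((a, b), 0.0)
    return math.exp(log_f)


parent = {"m1": "root", "m2": "m1", "m3": "root"}
main = {"m1": 0.10, "m2": 0.05, "m3": 0.02}
pairwise = {("m1", "m2"): 0.20}  # positive epistasis between m1 and m2
geno = genotypes(parent)
for node in sorted(geno):
    print(node, sorted(geno[node]), round(fitness(geno[node], main, pairwise), 3))
```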

12:20-12:40
MiClone: A Probabilistic Method for Inferring Cell Phylogenies from Mitochondrial Variants
Format: In person


Authors List:

  • Emilia Hurtado, The University of British Columbia, Canada
  • Andrew Roth, University of British Columbia, Canada

Presentation Overview:

Cancer development and progression are largely fuelled by somatic mutations that give rise to clones: distinct subpopulations of malignant cells that emerge as a result of differential mutation and proliferation within tumours. Because clones may exhibit selective survivorship in response to treatment, characterising the evolutionary history of a tumour’s clones is a critical task in cancer research. With the advent of single-cell sequencing technologies, bulk clonal deconvolution is no longer strictly required, since each sample already represents a single cell. However, even at single-cell resolution, challenges remain in the study of human disease. These single-cell measurements contain an additional source of phylogenetic signal: the mitochondrial genome. For characterising cancer evolution, mitochondrial variants could enable improved study of cancers that are copy-number stable or that carry few nuclear somatic variants. In this work we present MiClone, a Bayesian method to infer the phylogenetic relationships of single-cell genomes using the signal available in the mitochondrial genome. MiClone takes the proportions of mitochondrial variants across cells as input, treating each single-cell sample as a bulk mixture of mitochondrial genomes. Leveraging the PhyClone phylogenetic machinery, MiClone scales to thousands of single-cell samples and produces fine-grained, accurate clonal-prevalence estimates for each cell. We demonstrate MiClone’s performance on real-world datasets from 20 patients across a variety of cancer types, each consisting of thousands of single-cell genomes.
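The abstract treats each cell as a bulk mixture of mitochondrial genomes. The sketch below illustrates how per-cell variant proportions carry clonal signal. It assumes, hypothetically, that each clone has a known expected fraction for every mitochondrial variant, and scores a cell's variant read counts against each clone with a binomial likelihood. MiClone itself infers the phylogeny and prevalences with PhyClone's machinery, which is not shown. All names and numbers are illustrative.

```python
import math
from typing import Dict, List


def binom_loglik(alt: int, depth: int, p: float, eps: float = 1e-6) -> float:
    """Binomial log-likelihood of `alt` variant reads out of `depth` at fraction p."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    log_choose = math.lgamma(depth + 1) - math.lgamma(alt + 1) - math.lgamma(depth - alt + 1)
    return log_choose + alt * math.log(p) + (depth - alt) * math.log(1.0 - p)


def assign_cell(alt: List[int], depth: List[int], clones: Dict[str, List[float]]) -> str:
    """Return the clone whose expected variant fractions best explain one cell."""
    def score(fracs: List[float]) -> float:
        return sum(binom_loglik(a, d, p) for a, d, p in zip(alt, depth, fracs))
    return max(clones, key=lambda c: score(clones[c]))


# Hypothetical expected heteroplasmy of two mitochondrial variants in two clones.
clones = {"clone_1": [0.90, 0.05], "clone_2": [0.05, 0.80]}
print(assign_cell([45, 2], [50, 50], clones))  # clone_1
print(assign_cell([1, 38], [40, 50], clones))  # clone_2
```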

12:40-12:50
scVarID: Mapping Genetic Variants at Single-Cell Resolution to Uncover Precursor Cells in Cancer
Format: In person


Authors List:

  • Jonghyun Lee, National Cancer Center, South Korea
  • Juyeon Cho, National Cancer Center, South Korea
  • Byungjo Lee, 323, Ilsan-ro, Ilsandong-gu, Goyang-si, Gyeonggi-do, South Korea
  • Dongkwan Shin, National Cancer Center, South Korea

Presentation Overview:

Next-generation sequencing technologies enable the identification of genetic variants and the measurement of gene expression; however, measuring both features simultaneously within the same cell has remained challenging. Difficulties in co-isolating DNA and RNA have limited our ability to directly connect somatic mutations with their transcriptional consequences. Additionally, variant detection from bulk sequencing cannot resolve haplotype-specific or cell-to-cell genetic heterogeneity, information that is crucial for understanding the progression of genetic diseases such as cancer.
To address this, we developed scVarID, an algorithm that maps variant calls onto cell-level transcriptomes, producing cell-by-variant matrices of both variant and reference allele counts for each transcript across single cells.
Using scVarID, we uncovered widespread allelic imbalance in peripheral blood mononuclear cells (PBMCs), particularly in subsets of cells exhibiting strong allele-specific expression (ASE) of HLA-related genes. This allelic imbalance was also observed in both normal and tumor epithelial cells from colorectal cancer patients. Most notably, ASE analysis of paired normal samples revealed a subset of cells with altered variant ratios in HLA-A, a gene frequently associated with ASE loss in tumors. This suggests the presence of potential precursor cell states marked by early post-transcriptional imbalance, which may precede tumorigenesis.
These findings demonstrate scVarID’s ability to resolve genotype–phenotype relationships at single-cell resolution and to identify early-stage cellular alterations that may contribute to cancer development. This approach opens new avenues for early cancer detection and for studying the functional impact of somatic variation in pre-malignant cell populations.
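Cell-by-variant matrices of the kind described for scVarID can be built from per-read observations. This minimal sketch assumes the reads have already been reduced to hypothetical (cell barcode, chromosome, position, base) tuples. scVarID's actual input handling, filtering, and per-transcript bookkeeping are not shown. From the two matrices, a cell's allelic ratio at a variant is alt / (ref + alt), the quantity that allelic-imbalance analyses examine.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Variant = Tuple[str, int, str, str]      # (chrom, pos, ref_base, alt_base)
Observation = Tuple[str, str, int, str]  # (cell_barcode, chrom, pos, read_base)


def count_alleles(observations: Iterable[Observation], variants: List[Variant]):
    """Tally reference and alternative reads per (cell, variant)."""
    by_site: Dict[Tuple[str, int], List[Variant]] = defaultdict(list)
    for v in variants:
        by_site[(v[0], v[1])].append(v)
    ref_counts: Dict[Tuple[str, Variant], int] = defaultdict(int)
    alt_counts: Dict[Tuple[str, Variant], int] = defaultdict(int)
    for cell, chrom, pos, base in observations:
        for v in by_site.get((chrom, pos), []):
            if base == v[2]:
                ref_counts[(cell, v)] += 1
            elif base == v[3]:
                alt_counts[(cell, v)] += 1
    return ref_counts, alt_counts


def to_matrix(counts, cells: List[str], variants: List[Variant]) -> List[List[int]]:
    """Dense cell-by-variant matrix (rows = cells, columns = variants)."""
    return [[counts.get((c, v), 0) for v in variants] for c in cells]


variants = [("chr1", 1000, "A", "G")]
obs = [
    ("CELL_1", "chr1", 1000, "A"),
    ("CELL_1", "chr1", 1000, "G"),
    ("CELL_1", "chr1", 1000, "G"),
    ("CELL_2", "chr1", 1000, "A"),
    ("CELL_2", "chr1", 2000, "C"),  # not a variant site: ignored
]
ref, alt = count_alleles(obs, variants)
print(to_matrix(ref, ["CELL_1", "CELL_2"], variants))  # [[1], [1]]
print(to_matrix(alt, ["CELL_1", "CELL_2"], variants))  # [[2], [0]]
```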

12:50-13:00
High-Resolution Discovery of Lineage-Specific SVs in Pan Genus Through Assembly Comparisons
Format: In person


Authors List:

  • Aisha, Government College University, Faisalabad, Pakistan
  • Hua Chen, Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, Beijing, China

Presentation Overview:

The two sister species in the Pan genus, chimpanzees (Pan troglodytes) and bonobos (Pan paniscus), exhibit lineage-specific differences in several behavioral and physiological traits. In this work, our goal was to identify the genomic regions within the Pan genus that have undergone structural rearrangements in a lineage-specific manner. Using high-quality genome assemblies of chimpanzee, bonobo and human, we performed genome assembly comparisons with the human genome (GRCh38) as the reference. Lineage-specific structural variants (LSSVs) in the Pan genus provided enhanced insights into genomic rearrangements that are likely to affect gene function and phenotypes. We observed considerable variation in SV distribution between the two species, with SVs widespread in the chimpanzee assembly and scarce in the bonobo assembly. Focusing on SVs in the CDS regions of protein-coding genes, we found 22 genes in bonobos highly impacted by SVs that lead to either feature truncation or transcript ablation. Notably, a total of 232 SV-impacted genes experienced transcript ablation and were found to be involved in olfactory transduction, keratinization and transcriptional regulation. Functional enrichment analysis of LSSV-impacted genes revealed enrichment for body growth, brain function and neurological disorders in the bonobo lineage, and for metabolism and transcriptional regulation in the chimpanzee lineage. Additionally, we found that a small number of the SV-affected genes were responsible for the distinctive behavioral differences between the two lineages, indicating their role in determining the lineage-specific characteristics present in the Pan genus.

14:00-14:20
Tracing the functional divergence of duplicated genes
Format: In person


Authors List:

  • Alex Warwick Vesztrocy, BioSoft Research UK, 3rd Floor, 86-90 Paul Street, London, EC2A 4NE, United Kingdom
  • Natasha Glover, University of Lausanne, Switzerland
  • Paul D. Thomas, University of Southern California, United States
  • Christophe Dessimoz, University of Lausanne, Switzerland
  • Irene Julca, University of Lausanne, Switzerland

Presentation Overview:

Gene duplication is a fundamental driver of functional innovation in evolution. Following duplication, paralogous genes may be retained, diverge in function (through sub- or neo-functionalisation), or be lost. When paralogues are retained, identifying which copy preserves the ancestral function can be challenging. The “least diverged orthologue” (LDO) conjecture proposes that the paralogue evolving more slowly at the sequence level is more likely to retain the ancestral function. In this study, we systematically test this hypothesis using a novel method that detects asymmetric sequence evolution in gene families. We applied this approach to all gene trees from the PANTHER database, encompassing gene duplications across the Tree of Life. We further integrated structural data for over one million proteins and gene expression data from 16 animal and 20 plant species. Our analysis, spanning thousands of gene families, reveals that although many paralogues evolve at comparable rates, a substantial fraction exhibits marked asymmetry in sequence divergence. This asymmetry correlates with differences in expression profiles and predicted functional annotations. Together, the results strongly support the LDO conjecture: the least diverged paralogue tends to retain ancestral function, while the more rapidly evolving copy may acquire specialised or novel roles. These findings have significant implications for orthology prediction and functional annotation in comparative genomics.
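A deliberately simple rule can illustrate the asymmetry that the study tests for. After a duplication, compare the branch lengths leading to the two paralogues. If one is clearly longer, call the shorter one the least diverged. This is a sketch under assumptions, not the authors' detection method. The `ratio_threshold` of 1.5 and the function name are hypothetical, and in practice the branch lengths would come from a gene tree such as those in PANTHER.

```python
from typing import Dict, Optional


def least_diverged(branch_len: Dict[str, float], ratio_threshold: float = 1.5) -> Optional[str]:
    """branch_len: paralogue -> branch length from the duplication node.
    Returns the slower paralogue if the faster one is at least `ratio_threshold`
    times longer, otherwise None (roughly symmetric evolution)."""
    if len(branch_len) != 2:
        raise ValueError("expected exactly two paralogues")
    (slow, d_slow), (_, d_fast) = sorted(branch_len.items(), key=lambda kv: kv[1])
    if d_slow == 0:
        return slow if d_fast > 0 else None
    return slow if d_fast / d_slow >= ratio_threshold else None


print(least_diverged({"paralogue_A": 0.20, "paralogue_B": 0.50}))  # paralogue_A
print(least_diverged({"paralogue_A": 0.30, "paralogue_B": 0.35}))  # None
```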

14:20-14:40
Duplication Episode Clustering in Phylogenetic Networks
Format: In person


Authors List:

  • Pawel Gorecki, University of Warsaw, Poland
  • Jarosław Paszek, Faculty of Mathematics, Informatics and Mechanics University of Warsaw, Poland
  • Agnieszka Mykowiecka, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland

Presentation Overview:

Phylogenetic networks provide a powerful framework for representing complex evolutionary histories that traditional tree-based models cannot adequately capture. In particular, phylogenetic networks allow the modeling of evolutionary processes that involve reticulate events such as hybridization, horizontal gene transfer, and introgression. At the same time, macroevolutionary events such as genomic and whole-genome duplications add another layer of complexity, leading to gene family expansions, functional divergence, and lineage-specific innovation. While each process has been studied extensively in isolation, recent advances highlight the need to consider them jointly, as they often co-occur in shaping genomic landscapes.

We present two novel problems that aim to infer genomic duplication episodes by clustering duplications in the phylogenetic network, using a collection of gene family trees. First, we propose a polynomial-time dynamic programming (DP) formulation that verifies the existence of a set of episodes from a predefined set of episode candidates. We then show how to use this DP to design an algorithm that solves a general inference problem. To evaluate our method, we perform computational experiments on empirical data containing whole-genome duplication events in a network of Panadales species, showing that our algorithms can accurately verify genomic duplication hypotheses.

14:40-15:00
PhytClust: Accurate and Fast Clustering in Phylogenetic Trees
Format: In person


Authors List:

  • Katyayni Ganesan, ICCB Cologne, Germany
  • Elisa Billard, ICCB Cologne, Germany
  • Tom L Kaufmann, ICCB Cologne, Germany
  • Maja C Cwikla, ICCB Cologne, Germany
  • Cody B Duncan, ICCB Cologne, Germany
  • Adrian Altenhoff, Swiss Institute of Bioinformatics & ETH Zurich, Switzerland
  • Christophe Dessimoz, Swiss Institute of Bioinformatics & ETH Zurich, Switzerland
  • Roland F Schwarz, ICCB Cologne, Germany

Presentation Overview:

Phylogenetic trees play an important role in disentangling the evolutionary relationships between taxa across diverse fields. A key question is the identification of distinct subpopulations within a phylogenetic tree. Several methods have been developed to classify taxa in phylogenetic trees into clusters based on their evolutionary distance. However, these approaches tend to rely on arbitrary thresholds that vary across studies, making meaningful interpretation and comparison between results challenging. Additionally, they often rely on heuristics to limit their cluster search, as enumerating all possible clusters within a tree is prohibitive for large trees. Here, we present PhytClust, a novel tool that provides a standardized approach to clustering taxa in phylogenetic trees into monophyletic clusters without relying on subjective parameters. PhytClust uses a score derived from the cumulative intra-cluster branch lengths to (i) find the optimal set of clusters for a given number of clusters, and (ii) select, from these candidate cluster sets, the one that best represents the tree’s topology and genetic distances. PhytClust provides an exact and efficient dynamic programming solution to the clustering problem, making it suitable for large trees with up to 100k taxa. Compared to existing methods, PhytClust is faster and more accurate in recovering ground-truth clusters. We apply PhytClust to data spanning various biological domains, including cancer genomics, avian phylogenomics, bacterial and archaeal phylogenetics, and plant genomics. By providing a standardized method for node clustering, PhytClust can help infer optimal clusters for a phylogenetic tree without any additional parameters.
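A dynamic program can be exact for a fixed number of clusters, and the sketch below illustrates how. It assumes that a cluster's cost is the total branch length inside the subtree that defines it, and it computes the minimum total cost for every number of clusters k. The `Node` class and `cluster_costs` function are hypothetical, and PhytClust's actual score may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List

INF = float("inf")


@dataclass
class Node:
    name: str
    length: float = 0.0  # branch length to the parent
    children: List["Node"] = field(default_factory=list)


def cluster_costs(node: Node) -> Dict[int, float]:
    """best[k] = minimum total intra-cluster branch length when the leaves below
    `node` are split into k monophyletic clusters (each cluster = one subtree)."""
    if not node.children:
        return {1: 0.0}
    child_best = [cluster_costs(c) for c in node.children]
    # Option 1: one cluster rooted here, containing every branch below `node`.
    whole = sum(c.length + cb[1] for c, cb in zip(node.children, child_best))
    # Option 2: each child subtree contributes >= 1 cluster (min-plus convolution).
    combined: Dict[int, float] = {0: 0.0}
    for cb in child_best:
        nxt: Dict[int, float] = {}
        for k, cost in combined.items():
            for i, c_cost in cb.items():
                if cost + c_cost < nxt.get(k + i, INF):
                    nxt[k + i] = cost + c_cost
        combined = nxt
    best = dict(combined)
    best[1] = min(whole, best.get(1, INF))
    return best


def leaf(name: str) -> Node:
    return Node(name, 1.0)


# ((A:1,B:1)X:5,(C:1,D:1)Y:5) with hypothetical branch lengths
root = Node("root", 0.0, [Node("X", 5.0, [leaf("A"), leaf("B")]),
                          Node("Y", 5.0, [leaf("C"), leaf("D")])])
print(dict(sorted(cluster_costs(root).items())))  # {1: 14.0, 2: 4.0, 3: 2.0, 4: 0.0}
```

The sketch returns only costs. Recovering the clusters themselves needs a standard backtracking pass. Choosing the number of clusters needs a separate criterion, which PhytClust derives from its own score; that step is not modeled here.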

15:00-15:20
Community detection at unprecedented scales with ExoLabel
Format: In person


Authors List:

  • Aidan Lakshman, University of Pittsburgh, United States
  • Erik Wright, University of Pittsburgh, United States

Presentation Overview:

Many approaches in comparative genomics rely on clusters of orthologous genes (COGs). Methods for constructing COGs often employ community detection algorithms to identify clusters within a network of pairwise similarities among genes. As the number of available genome sequences continues to grow exponentially, this community detection step has proven to be the limiting factor for scaling COGs to more genomes, in terms of both memory and time. In this study, we developed ExoLabel, a community detection program that can scale to enormous graphs by applying a linear-time algorithm to data outside of memory (i.e., on disk). We show that ExoLabel's accuracy rivals that of popular programs for identifying COGs, while being orders of magnitude faster and more memory efficient than existing programs. We demonstrate ExoLabel's performance by clustering a graph with 16.2 million nodes (genes) and 18.3 billion edges (pairwise similarities) in less than a day using only a few gigabytes of RAM. ExoLabel democratizes comparative genomics in settings without access to supercomputers and scales COG detection to new heights.
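Label propagation is a well-known community detection technique in which each sweep takes time linear in the number of edges. For illustration only, assuming a method of this family, the in-memory sketch below runs it on a weighted similarity graph. It does not reproduce ExoLabel's on-disk processing or its exact update rules, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (gene_u, gene_v, similarity weight)


def label_propagation(edges: List[Edge], max_iter: int = 20) -> Dict[int, int]:
    """Each node repeatedly adopts the label carrying the largest total edge weight
    among its neighbours (ties broken by the smallest label) until nothing changes."""
    adj: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    labels = {n: n for n in adj}  # start: every node is its own community
    for _ in range(max_iter):
        changed = False
        for n in sorted(adj):  # fixed order keeps the sketch deterministic
            weight: Dict[int, float] = defaultdict(float)
            for nb, w in adj[n]:
                weight[labels[nb]] += w
            best = min(weight, key=lambda lab: (-weight[lab], lab))
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:
            break
    return labels


edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),  # first dense group
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),  # second dense group
         (2, 3, 0.1)]                            # weak link between them
print(label_propagation(edges))  # {0: 1, 1: 1, 2: 1, 3: 4, 4: 4, 5: 4}
```

Each sweep touches every edge a constant number of times. Running the same procedure on disk would require streaming the edge list instead of holding `adj` in memory, which this sketch does not attempt.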

15:20-15:40
Accurate multiple sequence alignment at protein-universe scale with FAMSA 2
Format: In person


Authors List:

  • Adam Gudys, Silesian University of Technology, Poland
  • Andrzej Zielezinski, Adam Mickiewicz University, Poland
  • Cedric Notredame, Centre for Genomic Regulation, Spain
  • Sebastian Deorowicz, Silesian University of Technology, Poland

Presentation Overview:

Multiple sequence alignment (MSA) is a crucial analysis in computational biology, applied, e.g., in phylogeny reconstruction and protein function prediction. Within a few years, large-scale sequencing efforts such as the Earth BioGenome Project will produce billions of sequences representing the full diversity of the protein universe. However, state-of-the-art MSA algorithms do not keep pace with the exponentially increasing size of sequence repositories.

To address this issue, we present FAMSA 2. Compared to its predecessor, it offers higher accuracy and speed, enhanced robustness to non-homologous sequence contamination, and a number of usability features like alignment trimming. The algorithmic improvements include a novel guide tree heuristic based on medoid clustering particularly suited for ultra-scale analyses.

The performance of FAMSA 2 has been evaluated on several data sets, including Pfam families enriched with Homstrad reference alignments, AliSim-simulated alignments, and AlphaFold clusters. The experiments showed that the presented algorithm matches or exceeds the accuracy of Muscle5, ClustalOmega, or T-Coffee's regressive method while being orders of magnitude faster. FAMSA 2 was the only algorithm able to align a set of over 12 million sequences, which it did in under 25 minutes using 64 GB of RAM on a mid-range workstation.

The first version of FAMSA, with almost 60,000 downloads and applications in milestone projects such as AlphaFold and Pfam, has proven its usefulness to the community. We believe that FAMSA 2, by enabling evolutionary and structural analyses at scales beyond the reach of competing tools, will have an even larger impact in the field.
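FAMSA 2's guide tree heuristic is based on medoid clustering, which rests on two simple primitives. The first is the medoid of a set: the member with the smallest total distance to all others. The second is the assignment of each sequence to its nearest medoid. The sketch below shows one level of such medoid clustering, using a k-mer Jaccard distance as an assumed stand-in. FAMSA 2's actual distance measure and recursion are not reproduced, and all names are hypothetical.

```python
from typing import List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def medoid(seqs: List[str], k: int = 3) -> int:
    """Index of the sequence with the smallest total distance to all others."""
    profiles = [kmers(s, k) for s in seqs]
    totals = [sum(jaccard_distance(p, q) for q in profiles) for p in profiles]
    return min(range(len(seqs)), key=totals.__getitem__)


def split_by_medoids(seqs: List[str], centers: List[int], k: int = 3) -> List[List[int]]:
    """Assign every sequence to its nearest medoid (one level of medoid clustering)."""
    profiles = [kmers(s, k) for s in seqs]
    groups: List[List[int]] = [[] for _ in centers]
    for i, p in enumerate(profiles):
        nearest = min(range(len(centers)),
                      key=lambda j: jaccard_distance(p, profiles[centers[j]]))
        groups[nearest].append(i)
    return groups


seqs = ["MKVLAAGG", "MKVLAAGA", "MKVLSAGG", "WWPPQQRR"]  # hypothetical toy sequences
print(medoid(seqs))                    # 0
print(split_by_medoids(seqs, [0, 3]))  # [[0, 1, 2], [3]]
```

Applying the split recursively to each group yields a hierarchy that can serve as a guide tree. Each assignment step compares a sequence only with the current medoids rather than with every other sequence.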

15:40-16:00
Learning the Language of Phylogeny with MSA Transformer
Confirmed Presenter: Ruyi Chen, University of Queensland, Australia

Format: In person


Authors List:

  • Ruyi Chen, University of Queensland, Australia
  • Gabriel Foley, University of Queensland, Australia
  • Mikael Boden, University of Queensland, Australia

Presentation Overview:

Classical phylogenetic inference assumes independence between sites, potentially undermining the accuracy of evolutionary analyses in the presence of epistasis. Some protein language models have the capacity to encode dependencies between sites in conserved structural and functional domains across the protein universe. We employ the MSA Transformer, which takes a multiple sequence alignment (MSA) as an input, and is trained with masked language modeling objectives, to investigate if and how effects of epistasis can be captured to enhance the analysis of phylogenetic relationships.

We test whether the MSA Transformer internally encodes evolutionary distances between the sequences in the MSA despite this information not being explicitly available during training. We investigate the model's reliance on information available in columns as opposed to rows in the MSA, by systematically shuffling sequence content. We then use MSA Transformer on both natural and simulated MSAs to reconstruct entire phylogenetic trees with implied ancestral branchpoints, and assess their consistency with trees from maximum likelihood inference.

We demonstrate that both previously known and novel evolutionary relationships can be recovered with a "non-classical" approach that has very different computational requirements, by reconstructing phylogenetic trees for the RNA virus RNA-dependent RNA polymerase and the nucleo-cytoplasmic large DNA virus domain. We anticipate that MSA Transformer will complement rather than replace classical phylogenetic inference in accurately recovering the evolutionary history of protein families.

16:40-17:00
EdgeHOG: Scalable and Fine-Grained Ancestral Gene Order Inference Across the Tree of Life
Format: In person


Authors List:

  • Charles Bernard, Institut Pasteur, France
  • Yannis Nevers, University of Strasbourg, France
  • Alex Warwick Vesztrocy, University of Lausanne, Switzerland
  • Natasha Glover, University of Lausanne, Switzerland
  • Adrian Altenhoff, ETH Zurich, Switzerland
  • Christophe Dessimoz, University of Lausanne, Switzerland

Presentation Overview:

Ancestral genomes are essential for studying the diversification of life from the last universal common ancestor to modern organisms. Methods have been proposed to infer ancestral gene order, but they lack scalability, limiting the depth to which gene neighborhood evolution can be traced back. We introduce edgeHOG, a tool designed for accurate ancestral gene order inference with linear time complexity. Validated on various benchmarks, edgeHOG was applied to the entire OMA orthology database, encompassing 2,845 extant genomes across all domains of life. This represents the first tree-of-life scale inference, resulting in 1,133 ancestral genomes. In particular, we reconstructed ancestral contigs for the last common ancestor of eukaryotes, dating back around 1.8 billion years, and observed significant functional association among neighboring genes. The method also dates gene adjacencies, revealing conserved histone clusters and rapid chromosome rearrangements, enabling computational inference of these features.
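The sketch below makes ancestral gene order inference concrete under several assumptions. Gene families are already mapped to ancestral nodes, as the hierarchical orthologous groups (HOGs) in OMA provide, and an adjacency is a hypothetical unordered pair of family IDs. Ancestral adjacencies are inferred in a single bottom-up pass with a simple parsimony-style rule: an ancestor keeps an adjacency if at least two of its child lineages carry it. This is not edgeHOG's algorithm, only an illustration of a per-node computation done once per tree node.

```python
from typing import Dict, FrozenSet, List, Set

Adjacency = FrozenSet[str]  # unordered pair of gene-family IDs


def adj(a: str, b: str) -> Adjacency:
    return frozenset((a, b))


def ancestral_adjacencies(children: Dict[str, List[str]],
                          extant: Dict[str, Set[Adjacency]],
                          node: str,
                          out: Dict[str, Set[Adjacency]]) -> Set[Adjacency]:
    """Fill `out[node]` bottom-up: an adjacency is kept at an internal node
    if at least two of its child lineages carry it."""
    if node not in children:  # leaf: observed adjacencies
        out[node] = set(extant[node])
        return out[node]
    child_sets = [ancestral_adjacencies(children, extant, c, out) for c in children[node]]
    candidates = set().union(*child_sets)
    out[node] = {a for a in candidates if sum(a in s for s in child_sets) >= 2}
    return out[node]


children = {"root": ["anc1", "Z"], "anc1": ["X", "Y"]}
extant = {
    "X": {adj("a", "b"), adj("b", "c")},
    "Y": {adj("a", "b"), adj("b", "c"), adj("c", "d")},
    "Z": {adj("a", "b")},
}
result: Dict[str, Set[Adjacency]] = {}
ancestral_adjacencies(children, extant, "root", result)
for name in ("anc1", "root"):
    print(name, sorted(tuple(sorted(a)) for a in result[name]))
```

The rule is deliberately simple. A purely bottom-up pass like this one can, for example, drop an adjacency at the root that is still present in some extant genomes.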

17:00-17:20
JIGSAW: Accurate inference of exact copy numbers from targeted single-cell DNA sequencing
Format: In person


Authors List:

  • Sophia Chirrane, University College London Cancer Institute, Cancer Research UK Lung Cancer Centre of Excellence, United Kingdom
  • Simone Zaccaria, University College London Cancer Institute, Cancer Research UK Lung Cancer Centre of Excellence, United Kingdom

Presentation Overview:

Tumorigenesis is driven by the interplay between somatic single-nucleotide variants (SNVs) and larger structural alterations, like copy number alterations (CNAs), that are simultaneously accumulated in cancer cell genomes. This process results in highly heterogeneous tumours composed of distinct subpopulations of cells, or tumour clones, with different SNV and CNA combinations driving cancer progression and the development of treatment resistance.

Recent targeted single-cell DNA sequencing technologies (tag-scDNA-seq, e.g., the Mission Bio Tapestri platform) provide ideal data for studying this interplay, because deep, unbiased sequencing coverage of a targeted gene panel allows the assessment of both SNVs and CNAs in each cell. However, while analyzing SNVs is relatively straightforward, no method currently exists to accurately infer CNAs from tag-scDNA-seq data, due to the extremely high level of random variance caused by the very low number of reads sequenced from only a minimal fraction of the genome.

Here, we introduce JIGSAW (Joint Inference by Grouping Single-cell-clones of Amplicon-copy-numbers without Whole-genome), the first algorithm to infer CNAs from tag-scDNA-seq data. JIGSAW overcomes the sparsity of tag-scDNA-seq data by jointly grouping amplicons that share the same CNA state and clustering cells into clones within a Bayesian framework. Through extensive realistic simulations, we demonstrated that JIGSAW not only accurately retrieves CNAs but is also robust to increasing levels of CNA heterogeneity, cell-specific noise, and both clonal and subclonal whole-genome doublings. Applied to 2,153 pancreatic ductal adenocarcinoma cells and 12,000 AML cells, JIGSAW uncovered novel CNAs affecting cancer driver genes in conjunction with SNVs.
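The sketch below roughly illustrates why pooling amplicons helps with sparse targeted data. It estimates per-amplicon copy numbers for one cell by scaling the cell's read counts against a diploid reference profile. It then pools amplicons assumed to share a copy-number state and rounds their median. This is a simple normalize-and-pool baseline under stated assumptions: a hypothetical expected read share per amplicon, and most amplicons at baseline ploidy. It is not JIGSAW, which infers amplicon groups and cell clones jointly in a Bayesian framework.

```python
from statistics import median
from typing import Dict, List


def raw_copy_numbers(counts: List[int], expected_frac: List[float], ploidy: int = 2) -> List[float]:
    """Per-amplicon copy-number estimates for one cell.
    expected_frac[i] = share of reads amplicon i receives in a diploid cell.
    Simplifying assumption: most amplicons in the cell are at `ploidy`."""
    ratios = [c / p for c, p in zip(counts, expected_frac)]
    baseline = median(ratios)
    return [ploidy * r / baseline for r in ratios]


def grouped_copy_numbers(raw: List[float], groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Pool amplicons assumed to share one copy-number state and round their median."""
    return {g: round(median(raw[i] for i in idx)) for g, idx in groups.items()}


expected = [0.2, 0.2, 0.2, 0.2, 0.2]  # hypothetical diploid profile
counts = [100, 100, 100, 50, 200]     # hypothetical reads for one cell
raw = raw_copy_numbers(counts, expected)
print(grouped_copy_numbers(raw, {"gene_1": [0, 1, 2], "gene_2": [3], "gene_3": [4]}))
# {'gene_1': 2, 'gene_2': 1, 'gene_3': 4}
```

A single cell contributes few reads per amplicon, so individual estimates are noisy. Taking the median over a group of amplicons reduces that noise, which is the intuition behind grouping amplicons that share a CNA state.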

17:20-17:40
EASYstrata: a fully integrated workflow to infer evolutionary strata along sex chromosomes and other supergenes
Confirmed Presenter: Ricardo C. Rodriguez de la Vega, GEE - ESE, CNRS, Université Paris Saclay, AgroParisTech, France

Format: In person


Authors List:

  • Quentin Rougemont, GEE - Écologie Société Évolution, CNRS, Université Paris Saclay, AgroParisTech, France
  • Elise Lucotte, GEE - Écologie Société Évolution, CNRS, Université Paris Saclay, AgroParisTech, France
  • Loreleï Boyer, GEE - Écologie Société Évolution, CNRS, Université Paris Saclay, AgroParisTech, France
  • Alexandra Jalaber de Dinechin, AGIPP - Institut Jean-Pierre Bourgin, INRAe, AgroParisTech, Université Paris Saclay, France
  • Alodie Snirc, GEE - Écologie Société Évolution, CNRS, Université Paris Saclay, AgroParisTech, France
  • Tatiana Giraud, GEE - ESE, CNRS, Université Paris Saclay, AgroParisTech, France
  • Ricardo C. Rodriguez de la Vega, GEE - ESE, CNRS, Université Paris Saclay, AgroParisTech, France

Presentation Overview:

New reference-level genomes are becoming increasingly available across the tree of life, opening new avenues for addressing exciting evolutionary questions. However, challenges remain in genome annotation, sequence alignment, evolutionary inference and a general lack of methodological standardization. Here, we present a new workflow designed to overcome these challenges in evolutionary analyses, facilitating the detection of recombination suppression and its consequences, such as structural rearrangements, transposable element accumulation and coding sequence degeneration. To achieve this, we integrate multiple bioinformatic steps into a single, reproducible and user-friendly pipeline. This workflow combines state-of-the-art tools to efficiently detect transposable elements, annotate newly assembled genomes, infer gene orthology, compute sequence divergence, as well as objectively identify stepwise extensions of recombination suppression (i.e., evolutionary strata) and their associated structural changes, while visualizing results throughout the process. We demonstrate how this Evolutionary analysis with Ancestral SYnteny for strata identification (EASYstrata) workflow was used to re-annotate 42 published Microbotryum genomes and a pair of giant plant sex chromosomes. We recovered all previously described strata and identified several that had gone unnoticed. While primarily developed to infer divergence between sex or mating-type chromosomes, EASYstrata can also be applied to any pair of haplotypes with diverging regions of interest, such as autosomal supergenes. This workflow will facilitate the study of the many non-model species for which newly sequenced, phased diploid genomes are now becoming available.
EASYstrata and detailed use cases can be found at https://github.com/QuentinRougemont/EASYstrata
Preprint: https://www.biorxiv.org/content/10.1101/2025.01.06.631483v1.full
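The sketch below illustrates what objectively identifying evolutionary strata can involve. It assumes per-gene synonymous divergence (dS) between the two haplotypes, ordered along the ancestral gene order, and places stratum boundaries by recursive least-squares binary segmentation. The segmentation method, the `min_gain` threshold, and the function names are assumptions chosen for illustration; EASYstrata's own strata-identification procedure may differ.

```python
from typing import List, Optional, Tuple


def sse(xs: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def best_split(ds: List[float], min_size: int) -> Tuple[Optional[int], float]:
    """Split index i minimising sse(ds[:i]) + sse(ds[i:]); None if no split helps."""
    best_i, best_cost = None, sse(ds)
    for i in range(min_size, len(ds) - min_size + 1):
        cost = sse(ds[:i]) + sse(ds[i:])
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i, best_cost


def strata_boundaries(ds: List[float], min_gain: float = 1e-4,
                      min_size: int = 2, offset: int = 0) -> List[int]:
    """Recursive binary segmentation of per-gene divergence along the chromosome.
    Returns the gene indices at which a new stratum starts."""
    if len(ds) < 2 * min_size:
        return []
    i, cost = best_split(ds, min_size)
    if i is None or sse(ds) - cost < min_gain:
        return []
    return (strata_boundaries(ds[:i], min_gain, min_size, offset)
            + [offset + i]
            + strata_boundaries(ds[i:], min_gain, min_size, offset + i))


# Hypothetical dS values for nine genes in ancestral order
ds = [0.010, 0.012, 0.011, 0.050, 0.052, 0.049, 0.100, 0.098, 0.102]
print(strata_boundaries(ds))  # [3, 6]
```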

17:40-18:00
The Phylogenetic Dynamic Regulatory Module Networks (P-DRMN) study infers Cis-regulatory features responsible for evolution of mammalian gene regulatory programs in aortic endothelium
Confirmed Presenter: Suvojit Hazra, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, United States

Format: In person


Authors List:

  • Suvojit Hazra, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, United States
  • Sara A Knaack, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, United States
  • Erika Da-Inn Lee, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, United States
  • Liangxi Wang, Genetics and Genome Biology section, SickKids Hospital, University of Toronto, Toronto, ON, Canada
  • Mohamed Hawash, Genetics and Genome Biology section, SickKids Hospital, University of Toronto, Toronto, ON, Canada
  • Huayun Hou, Genetics and Genome Biology section, SickKids Hospital, University of Toronto, Toronto, ON, Canada
  • Michael Wilson, Genetics and Genome Biology section, SickKids Hospital, University of Toronto, Toronto, ON, Canada
  • Sushmita Roy, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, United States

Presentation Overview:

Cis-regulatory elements (CREs), such as promoters and enhancers, interact with transcription factors (TFs) to drive gene regulatory programs and contribute to morphological diversity across species. Comparative regulatory genomics, which integrates omic measurements across species, offers a powerful framework for studying the evolution of gene regulation. While multi-omic profiling, combining transcriptomic and epigenomic data, has advanced, computational tools that are both phylogenetically aware and capable of analyzing high-dimensional, cross-species data remain scarce. To address this, we introduce Phylogenetic Dynamic Regulatory Module Networks (P-DRMN), a novel multi-task regression-based algorithm that models dynamic gene module regulatory networks using RNA-seq, ATAC-seq, and ChIP-seq data while incorporating phylogenetic relationships. P-DRMN clusters genes into similarly expressing, discrete gene modules based on expression levels and uses a regression function of upstream CREs to infer species-specific module-TF regulatory programs. We applied P-DRMN to aortic endothelial cell data, including gene expression, promoter/motif accessibility, and five histone modifications (H3K27ac, H3K36me3, H3K4me3, H3K4me2, H3K27me3), from five mammals (human, rat, cow, pig, and dog). P-DRMN inferred 19-65% conservation of gene modules across species, with high- and low-expression modules being the most conserved and diverged, respectively. We identified 103 transitioning gene sets with species- or clade-specific expression patterns, many regulated by distinct TFs and chromatin marks, for example, CTCF in human-specific high-expression modules, and SHOX2, H3K27me3, and H3K4me3 in pig/cow-specific high-expression modules. These results demonstrate how CREs and chromatin states shape species-specific gene expression. Overall, P-DRMN provides a powerful framework for integrating multi-omics data to study the evolutionary dynamics of gene regulation.

Tuesday, July 22nd
11:20-11:30
Introduction
Format: In person


Authors List:

Presentation Overview:

Introduction to the joint session Function and EvolCompGen

11:30-12:10
Invited Presentation: Evolution of function in light of gene expression
Format: In person


Authors List:

  • Marc Robinson-Rechavi

Presentation Overview:

One of the fundamental questions of genome evolution is how gene function changes or is constrained, whether between species (orthologs) or within gene families (paralogs). While computational prediction is making major progress on function in a broad sense, most evolutionary changes concern details that are small in the big picture, yet very significant for organismal function. For example, new organs or new physiological adaptations often come from repurposing genes whose basic molecular function is conserved while they take on a novel role. Gene expression provides a unique window into such fine details of gene function. I will present how gene expression data from diverse species, both bulk and single-cell, are integrated into Bgee; how gene expression can be used to test hypotheses of functional change after duplication (the "ortholog conjecture"); and how the evolution of gene expression provides insight into evolvability and the molecular underpinnings of new functions.

12:10-12:20
Convergent evolution to similar proteins confounds structure search
Format: In person


Authors List:

  • Erik Wright, University of Pittsburgh, United States

Presentation Overview:

Advances in protein structure prediction and structural search tools (e.g., Foldseek and PLMSearch) have enabled large-scale comparison of protein structures. It is now possible to quickly identify structurally similar proteins ("structurlogs"), but it remains unclear whether these similarities reflect homology (common ancestry) or analogy (convergent evolution). In this study, we found that ~2.6% of Foldseek clusters lack sequence-level support for homology, including about 1% of matches with a high TM-score (≥ 0.5). The lack of sequence homology could be due to extreme protein divergence or to independent evolution toward a similar structure. Here, we show that tandem repeats provide strong evidence for the presence of analogous protein structures. Our results suggest that analogs infiltrate structure search results and that care should be taken when relying on structural similarity alone if homology is desired. This problem may extend beyond repeat proteins to other low-complexity folds, and structure search tools could be improved by masking these regions, as sequence search programs already do.

12:20-12:30
Evolution of the Metazoan Protein Domain Toolkit Revealed by a Birth-Death-Gain Model
Format: In person


Authors List:

  • Maureen Stolzer, Carnegie Mellon University, United States
  • Yuting Xiao, Carnegie Mellon University, United States
  • Dannie Durand, Carnegie Mellon University, United States

Presentation Overview:

Domains, sequence fragments that encode protein modules with a distinct structure and function, are the basic building blocks of proteins. The set of domains encoded in the genome serves as the functional toolkit of the species. Here, we use a phylogenetic birth-death-gain model to investigate the evolution of this protein toolkit in metazoa. Given a species tree and the set of protein domain families in each present-day species, this approach estimates the most likely rates of domain origination, duplication, and loss.

Statistical hierarchical clustering of domain family rates reveals sets of domains with similar rate profiles, consistent with groups of domains evolving in concert. Moreover, we find that domains with similar functions tend to have similar rate profiles. Interestingly, domains with functions associated with metazoan innovations, including immune response, cell adhesion, tissue repair, and signal transduction, tend to have the fastest rates. We further infer the expected ancestral domain content and the history of domain family gains, losses, expansions, and contractions on each branch of the species tree. Comparative analysis of these events reveals that a small number of evolutionary strategies, corresponding to toolkit expansion, turnover, specialization, and streamlining, are sufficient to describe the evolution of the metazoan protein domain complement. Thus, the use of a powerful, probabilistic birth-death-gain model reveals a striking harmony between the evolution of domain usage in metazoan proteins and organismal innovation.
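
To make the birth-death-gain process more concrete, the sketch below simulates the copy number of a single domain family along one branch of a species tree. It is a minimal illustration under stated assumptions, not the authors' estimation procedure: duplication (`birth`) and loss (`death`) are treated as per-copy rates, origination (`gain`) as a constant rate independent of copy number, and all rate values are hypothetical. The closed-form mean it is compared with follows from these assumptions; fitting rates to real domain counts would instead require maximizing a likelihood over the whole tree.

```python
import math
import random


def simulate_branch(n0, birth, death, gain, t, rng):
    """Gillespie simulation of one domain family's copy number along a branch.

    birth and death are per-copy rates (duplication, loss); gain is a
    copy-number-independent rate at which new copies originate.
    """
    n, clock = n0, 0.0
    while True:
        total = n * birth + n * death + gain
        if total == 0:
            return n  # absorbing state: no copies left and no gain
        clock += rng.expovariate(total)  # waiting time to the next event
        if clock > t:
            return n  # branch ends before the next event
        u = rng.random() * total
        if u < n * birth:
            n += 1  # duplication
        elif u < n * birth + n * death:
            n -= 1  # loss
        else:
            n += 1  # origination (gain)


def expected_copies(n0, birth, death, gain, t):
    """Analytic mean: n0*e^(rt) + gain/r*(e^(rt) - 1), with r = birth - death."""
    r = birth - death
    if r == 0:
        return n0 + gain * t
    growth = math.exp(r * t)
    return n0 * growth + gain / r * (growth - 1)


if __name__ == "__main__":
    rng = random.Random(1)
    sims = [simulate_branch(1, 0.2, 0.3, 0.5, 2.0, rng) for _ in range(5000)]
    print(sum(sims) / len(sims), expected_copies(1, 0.2, 0.3, 0.5, 2.0))
```

Running the module compares a Monte Carlo mean against `expected_copies`; with these toy rates the two should agree closely.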

12:30-12:40
Deep Phylogenetic Reconstruction Reveals Key Functional Drivers in the Evolution of B1/B2 Metallo-β-Lactamases
Format: In person


Authors List:

  • Samuel Davis, School of Chemistry and Molecular Biosciences, The University of Queensland, Australia
  • Pallav Joshi, School of Chemistry and Molecular Biosciences, The University of Queensland, Australia
  • Ulban Adhikary, School of Chemistry and Molecular Biosciences, The University of Queensland, Australia
  • Julian Zaugg, School of Chemistry and Molecular Biosciences, The University of Queensland, Australia
  • Phil Hugenholtz, School of Chemistry and Molecular Biosciences, The University of Queensland, Australia
  • Marc Morris, School of Chemistry and Molecular Biosciences, The University of Queensland, Australia
  • Gerhard Schenk, School of Chemistry and Molecular Biosciences, The University of Queensland, Australia
  • Mikael Boden, School of Chemistry and Molecular Biosciences, The University of Queensland, Australia

Presentation Overview:

Metallo-β-lactamases (MBLs) comprise a diverse family of antibiotic-degrading enzymes. Despite their growing implication in drug-resistant pathogens, no broadly effective clinical inhibitors against MBLs currently exist. Notably, β-lactam-degrading MBLs appear to have emerged twice from within the broader, catalytically diverse MBL-fold protein superfamily, giving rise to two distinct monophyletic groups: B1/B2 and B3 MBLs.

Comparative analyses have highlighted distinct structural hallmarks of these subgroups, particularly in metal-coordinating residues. However, the precise evolutionary events underlying their emergence remain unclear due to challenges presented by extensive sequence divergence. Understanding the molecular determinants driving the evolution of β-lactamase activity may inform design of broadly effective inhibitors.

We sought to infer the evolutionary features driving the emergence of B1/B2 MBLs via phylogenetics and ancestral reconstruction. To overcome challenges associated with evolutionary analysis at this scale, we developed a phylogenetically aware sequence curation framework centred on iterative profile HMM refinement. This framework was applied over several iterations to construct a comprehensive phylogeny encompassing the B1/B2 MBLs and several other recently diverged clades. The resulting tree represents the most robust hypothesis to date regarding the emergence of B1/B2 MBLs and implies a parsimonious evolutionary history of key features, including variation in active site architecture and insertions and deletions of distinct structural elements.

Ancestral proteins inferred at key internal nodes were experimentally characterised, revealing distinct activity profiles that reflect underlying evolutionary transitions. These findings give rise to testable hypotheses regarding the molecular basis and evolutionary drivers of functional diversification, as well as potential targets for MBL inhibitor design.

12:40-12:50
A compendium of human gene functions derived from evolutionary modeling
Format: In person


Authors List:

  • Marc Feuermann, SIB Swiss Institute of Bioinformatics, Switzerland
  • Huaiyu Mi, University of Southern California, United States
  • Pascale Gaudet, Swiss Institute of Bioinformatics, Switzerland
  • Anushya Muruganujan, University of Southern California, United States
  • Suzanna E. Lewis, Lawrence Berkeley National Lab, United States
  • Dustin Ebert, University of Southern California, United States
  • Tremayne Mushayahama, University of Southern California, United States
  • Gene Ontology Consortium, Various, United States
  • Paul D. Thomas, University of Southern California, United States

Presentation Overview:

A comprehensive, computable representation of the functional repertoire of all macromolecules encoded within the human genome is a foundational resource for biology and biomedical research. We have recently published a paper (Feuermann et al., Nature 640:146, 2025) describing our initial release of a human gene “functionome,” a comprehensive set of human gene function descriptions using Gene Ontology (GO) terms, supported by experimental evidence. This work involved integrating all applicable experimental GO annotations for human genes and their homologs, using a formal, explicit evolutionary modeling framework. We will review this work and its major findings, and describe subsequent progress on an updated version.

In more detail, we will describe the results of a large, international effort to integrate experimental findings from more than 100,000 publications to create a representation of human gene functions that is as complete and accurate as possible. Specifically, we applied an expert-curated, explicit evolutionary modeling approach to all human protein-coding genes, which integrates available experimental information across families of related genes into models reconstructing the gain and loss of functional characteristics over evolutionary time. The resulting set of integrated functions covers ~82% of human protein-coding genes, and the evolutionary models provide insights into the evolutionary origins of human gene functions. We show that our set of function descriptions can improve the widely used genomic technique of GO enrichment analysis. The experimental evidence for each functional characteristic is recorded, enabling the scientific community to help review and improve the resource, available at https://functionome.geneontology.org.
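
GO enrichment analysis is commonly run as a one-sided hypergeometric test per term: given all annotated genes, the genes carrying a term, and a study set, it asks how surprising the observed overlap is. The sketch below assumes that standard formulation (it is not the Functionome team's tooling) and uses toy counts; real analyses test many terms and need multiple-testing correction. A more complete annotation set changes `term_genes` for each term, which is one way improved function descriptions can alter enrichment results.

```python
from math import comb


def hypergeometric_enrichment_p(total_genes, term_genes, study_genes, study_term_genes):
    """One-sided P(X >= study_term_genes) under the hypergeometric null.

    total_genes: annotated background genes; term_genes: background genes
    carrying the GO term; study_genes: size of the study set;
    study_term_genes: study genes carrying the term.
    """
    upper = min(study_genes, term_genes)
    # Sum exact integer counts of favourable draws, then divide once.
    numerator = sum(
        comb(term_genes, i) * comb(total_genes - term_genes, study_genes - i)
        for i in range(study_term_genes, upper + 1)
    )
    return numerator / comb(total_genes, study_genes)


if __name__ == "__main__":
    # Toy example: 10 background genes, 5 with the term, study set of 5.
    print(hypergeometric_enrichment_p(10, 5, 5, 4))
```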

12:50-13:00
pLM in functional annotation: relationship between sequence conservation and embedding similarity
Format: In person


Authors List:

  • Ana Rojas, CABD, Spain
  • Ildefonso Cases, CABD-CSIC, Spain
  • Rosa Fernandez, Metazoa Phylogenomics Lab, Institute of Evolutionary Biology (CSIC-UPF), 08003 Barcelona, Spain
  • Gemma Martínez-Redondo, Metazoa Phylogenomics Lab, Institute of Evolutionary Biology (CSIC-UPF), 08003 Barcelona, Spain
  • Francisco M. Perez-Canales, CABD-CSIC, Spain

Presentation Overview:

Functional annotation of protein sequences remains a bottleneck for understanding the biology of both model and non-model organisms, as conventional homology-based tools often fail to assign functions to the majority of newly sequenced genes. To address this, we developed an annotation pipeline based on protein language models (pLMs). We first benchmarked each pLM on well‐characterized model organisms, demonstrating superior recovery of functional signals from transcriptomic datasets compared to traditional methods. We then applied our pipeline to annotate ~1,000 animal proteomes, encompassing 23 million genes, and discovered candidate genes involved in gill regeneration in a non-model insect. To elucidate how pLM embeddings relate to primary-sequence conservation, we computed cosine distances between protein embeddings and, in parallel, aligned the corresponding sequences to derive percent identity. Statistical analyses, including Pearson correlation, polynomial regression, and quantile regression, revealed complex, non-linear relationships between embedding similarity and sequence identity that vary markedly across models. These findings indicate that pLM embeddings capture functional features orthogonal to simple residue conservation. Altogether, our work highlights the power of pLM-based annotation for expanding functional insights in biodiversity projects and underscores the need to interpret embedding distances in light of each model’s unique representational biases.
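
The embedding-versus-identity comparison can be sketched in plain Python. This is an illustrative assumption of how such protein pairs might be scored, not the authors' pipeline: the embeddings here are toy vectors rather than real pLM outputs, and percent identity is computed over gap-free aligned columns, which is only one of several common denominators.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length, non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def percent_identity(aln_a, aln_b):
    """Percent identical residues over aligned columns where neither row has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    matches = sum(1 for x, y in cols if x == y)
    return 100.0 * matches / len(cols)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical protein pairs: (embedding A, embedding B, aligned A, aligned B).
    pairs = [
        ([0.9, 0.1, 0.3], [0.8, 0.2, 0.3], "MKT-LV", "MKTALV"),
        ([0.2, 0.7, 0.1], [0.6, 0.1, 0.5], "MKQLLV", "MRTAIV"),
    ]
    dists = [cosine_distance(a, b) for a, b, _, _ in pairs]
    idents = [percent_identity(x, y) for _, _, x, y in pairs]
    print(list(zip(dists, idents)))
```

With many pairs, `pearson(dists, idents)` gives the linear summary; the non-linear patterns described above would call for the polynomial and quantile fits instead.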

14:00-14:20
Disentangling SARS-CoV-2 Lineage Importations and the Role of NPIs Using Bayesian Phylogeography of 1.8 Million Genomes
Format: In person


Authors List:

  • Sama Goliaei, Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  • Mohammad-Hadi Foroughmand-Araabi, Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  • Aideen Roddy, Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  • Ariane Weber, Transmission, Infection, Diversification and Evolution Group, Max-Planck Institute of Geoanthropology, Jena, Germany
  • Sanni Översti, Transmission, Infection, Diversification and Evolution Group, Max-Planck Institute of Geoanthropology, Jena, Germany
  • Denise Kühnert, Transmission, Infection, Diversification and Evolution Group, Max-Planck Institute of Geoanthropology, Jena, Germany
  • Alice McHardy, Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany

Presentation Overview:

Nonpharmaceutical interventions (NPIs) were key to limiting SARS-CoV-2 transmission before vaccines, though their effectiveness—especially regarding mask use and socioeconomic trade-offs—remains under discussion. Leveraging a Bayesian phylogeographic framework, we analyzed 1.8 million globally sampled SARS-CoV-2 genomes to quantify lineage importations into Germany during the third pandemic wave (late 2020–early 2021). Across three sampling strategies, we observed a consistent decline in importations following key NPIs, notably the provision of free rapid antigen tests and mandates for surgical/FFP2 mask usage. While mask efficacy has been debated, our data show that upgrading from cloth to medical-grade masks coincided with sharp reductions in importation frequency—particularly in densely populated states.

We introduce a novel metric, the Smoothed Importation Frequency (SIF), and a daily effectiveness measure that allows more precise, real-time assessment of NPIs by smoothing fluctuations in importation data, thus overcoming limitations of previous methods that lacked temporal resolution and clarity. Our findings reveal that major lineage importations clustered around the Christmas holiday period, and spread disproportionately from populous states, identifying these as critical nodes in national transmission dynamics.

These results demonstrate the importance of integrating phylogenetic data with real-world intervention timelines to decode the drivers of pathogen spread. Beyond confirming the effectiveness of masks and rapid testing, our study highlights the notable impact of restricting gatherings and movements, supporting a data-driven, targeted approach to pandemic response. We suggest that scalable, low-cost measures such as rapid testing and surgical-grade masking may be especially useful early in outbreaks, particularly where elimination is unlikely and strict measures are impractical.

14:20-14:40
SARS-CoV-2 Intra-Host Evolution in Immuno-Compromised Individuals: A Fractal Perspective on Genome Geometry
Format: In person


Authors List:

  • Nicole A. Rogowski, Leiden University Medical Center, Netherlands
  • Kees Mourik, Leiden University Medical Center, Netherlands
  • Nithya Kuttiyarthu Veetil, Leiden University Medical Center, Netherlands
  • Stefan A. Boers, Leiden University Medical Center, Netherlands
  • Anna H.E. Roukens, Leiden University Medical Center, Netherlands
  • Simon P. Jochems, Leiden University Medical Center, Netherlands
  • Louis A.C.M. Kroes, Leiden University Medical Center, Netherlands
  • Igor A. Sidorov, Leiden University Medical Center, Netherlands
  • Jelle J. Goeman, Leiden University Medical Center, Netherlands
  • Jutte J.C. de Vries, Leiden University Medical Center, Netherlands

Presentation Overview:

Studies have associated the punctuated evolution of SARS-CoV-2 variants with prolonged infections and subsequent transmission. We describe the genetic signatures of SARS-CoV-2 intra-host evolution in 10 immuno-compromised (IC) patients and 5 immunocompetent controls, across 55 longitudinal samples. We included two types of IC: induced (immunosuppressants) and innate (haematological disease). The mutational profile was analysed between IC types, over time, and in response to treatment (host-directed and antiviral).
However, almost all studies on viral evolution consider only a ‘consensus’ sequence for a virus (mutations >50% frequency represented as ambiguous nucleotides), ignoring the diverse viral pool (quasi-species) arising from replication errors. Including the full viral quasi-species profile is essential to understanding how resistance mutations arise. When we built phylogenetic trees including all variants, the large number of ambiguous positions caused tree construction to fail. Here we report a novel approach based on the chaos game that can leverage viral quasi-species and produce phylogenetic trees regardless of ambiguity.
Graphical representations were generated using Chaos Game Representation (CGR), which draws a “walk” to encode genetic information. Each walk carries one set of independent mutations, and by compiling thousands of walks for each sample, covering most combinations of mutations, Frequency CGR (FCGR) objects were created. Because each sample is represented by a collection of walks, positional ambiguity and complex mutations can easily be incorporated into phylogenies (using topology-based methods), which results in closer relationships between patient samples. The same topology-based methods produced a 3D visualization of the genome space, similar to an antigen map, highlighting distinct signatures in IC patients.
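
A minimal sketch of CGR and FCGR for DNA follows. The corner layout is one common convention, and resolving IUPAC ambiguity codes by sampling a base uniformly at each position is a hypothetical stand-in for the authors' walks, which are built from observed mutations; the tree-building step is not shown. Counting only points reached after k steps means each FCGR cell at resolution k collects the points whose last k bases form one k-mer, so an FCGR is effectively a k-mer frequency table laid out in 2D.

```python
import random

# One common CGR corner assignment for DNA (an assumption; other layouts exist).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

# IUPAC ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def cgr_walk(seq):
    """Chaos game walk: start at the centre, move halfway to each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        cx, cy = CORNERS[base]  # raises KeyError for non-ACGT characters
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def resolve_ambiguity(seq, rng):
    """Hypothetical resolution: pick one allowed base uniformly per position."""
    return "".join(rng.choice(IUPAC[b]) for b in seq.upper())


def fcgr(seq, k=3, n_walks=1000, seed=0):
    """Frequency CGR: count walk points in a 2^k x 2^k grid over many walks.

    Only points from step k onward are counted, so every counted point
    falls in the cell of the k-mer formed by the last k bases read.
    """
    rng = random.Random(seed)
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for _ in range(n_walks):
        for x, y in cgr_walk(resolve_ambiguity(seq, rng))[k - 1:]:
            col = min(int(x * size), size - 1)
            row = min(int(y * size), size - 1)
            grid[row][col] += 1
    return grid


if __name__ == "__main__":
    # "R" and "N" are resolved differently in each walk.
    for row in fcgr("ACGRTTNACG", k=2, n_walks=100):
        print(row)
```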

14:40-15:00
Antarctica as a Viral Reservoir: Insights from Comparative Genomics and Metagenomics
Confirmed Presenter: Caroline Martiniuc, UFRJ, Brazil

Format: In person


Authors List:

  • Caroline Martiniuc, UFRJ, Brazil
  • Igor Taveira, UFRJ, Brazil
  • Fernanda Abreu, UFRJ, Brazil
  • Anderson Cabral, UFRJ, Brazil
  • Rodolfo Paranhos, UFRJ, Brazil
  • Deborah Leite, UTFPR, Brazil
  • Lucy Seldin, UFRJ, Brazil
  • Diogo Jurelevicius, UFRJ, Brazil

Presentation Overview:

Two bioinformatics approaches stand out in the study of viromes in extreme environments: prophage comparative genomics and viral metagenomic analyses. The bacterium Rummeliibacillus stabekisii is an interesting model for investigating extremophilic prophages, as it has been isolated from spacecraft surfaces and Antarctic soils, raising questions about the role of prophages in its environmental resilience. Additionally, Antarctica faces hydrocarbon contamination, making these regions even more hostile. Understanding ecological and metabolic interactions in this context can help elucidate microbial relationships in such environments. For the comparative genomics study, genomes of R. stabekisii from spacecraft surfaces and Antarctic soil were analyzed. PHASTER was used to identify prophages within the genomes, followed by annotation with BLAST. Furthermore, metagenomic analyses were performed on five hydrocarbon-contaminated Antarctic soil samples. The samples were sequenced using Illumina and assembled with MEGAHIT. Viral contigs were identified using VirSorter, and their taxonomy was classified with PhaGCN. Viral hosts were assigned based on data from the International Committee on Taxonomy of Viruses (ICTV) and the CHERRY software. Comparative genomic analysis revealed that Antarctic R. stabekisii harbored the highest number of intact prophages, with genes suggesting adaptive advantages and regions acting as hotspots for recombination. In contaminated soils, the class Caudoviricetes exhibited the highest abundance. Most detected viral hosts belonged to hydrocarbon-degrading bacterial genera within the phyla Pseudomonadota and Actinomycetota. Additionally, auxiliary viral metabolic genes associated with nitrogen and phosphorus cycles were identified. Together, these results reinforce the relevance of viruses as agents of genetic and ecological modulation in Antarctica.

15:00-15:10
Computational Genomics and Biosynthetic Potential Analysis of a Dead Sea Penicillium sp.
Format: In person


Authors List:

  • Dylan Dsouza, Bar-Ilan University, Israel
  • Milana Frenkel-Morgenstern, Reichman University, Israel

Presentation Overview:

Extreme environments harbor unique microbial life with biotechnological potential. Here, we characterize a novel Penicillium sp. isolated from the hypersaline Dead Sea, capable of thriving at 70‰ salt concentration. Whole-genome and transcriptome sequencing were performed, followed by de novo assembly and quality assessment using QUAST and BUSCO. Functional annotation of predicted peptides was conducted using InterProScan, UPIMAPI, and BLAST+ against NCBI-RefSeq and UniProtKB, validating spectral data from LC-MS/MS (nanoAcquity coupled with Q Exactive HFX) analyzed via Proteome Discoverer v2.4, SequestHT, and MS Amanda 2.0. Key enzymes in penicillin biosynthesis were confirmed.

Biosynthetic potential was assessed using antiSMASH and dbCAN3, and secretory proteins were predicted with the machine-learning tool SignalP 6.0. Phylogenetic analysis of single-copy orthologs was performed using OrthoFinder. The genome revealed biosynthetic gene clusters for valuable bioactives, including mellein, lovastatin, sorbicillin, and roquefortine. Extracts from cultures grown in a high-nitrogen medium with phenylacetate and 20% Dead Sea water strongly inhibited E. coli NEB+ STABL.

At the transcript level, RFAM annotation identified THI4 and THI5 riboswitches, with secondary structures predicted via RNAfold and R2DT. Conservation analysis using LocARNA provided insights into regulatory mechanisms.

These findings highlight the computationally driven discovery of biosynthetic pathways and stress-adaptive mechanisms in Penicillium sp., demonstrating its potential for industrial applications in extreme environments.

15:10-15:20
Unravelling the pangenome of autotrophic bacteria: Metabolic commonalities, evolutionary relationships, and industrially relevant traits
Format: In person


Authors List:

  • Dr. Karan Kumar, Institute of Applied Microbiology, Aachen Biology and Biotechnology, RWTH Aachen University, Germany
  • Tobias B. Alter, Institute of Applied Microbiology, Aachen Biology and Biotechnology, RWTH Aachen University, Germany
  • Lars M. Blank, Institute of Applied Microbiology, Aachen Biology and Biotechnology, RWTH Aachen University, Germany

Presentation Overview:

Atmospheric CO₂ fixation by microbial autotrophs presents a sustainable alternative to energy-intensive chemical processes, offering significant potential for biotechnological applications. However, understanding the genetic diversity, evolutionary adaptations, and metabolic capabilities of autotrophic carbon-fixing lineages (ACL) requires a comparative genomic approach. This study employs pangenome analysis to systematically assess the core, accessory, and unique genetic components across diverse ACL bacteria, with a particular focus on the recently revised genus Xanthobacter and the newly proposed Roseixanthobacter. By integrating phylogenetic, functional, and metabolic insights, we aim to elucidate conserved and variable genetic traits that contribute to CO₂ fixation efficiency and industrial relevance. A total of 546 high-quality genomes spanning 121 ACL microbial species were selected for analysis, following rigorous genome quality control measures based on CheckM contamination thresholds, contig limits, and genome size variation criteria. Initial phylogenomic analyses identified 16 microbial genera closely related to Xanthobacter, including Ancylobacter, Azorhizobium, Cupriavidus, Hydrogenophaga, Moorella, and Synechococcus, among others. Genomes were uniformly re-annotated to ensure consistency in gene identification. Pangenome reconstruction, core-genome diversity assessments, orthologous group clustering, and essential metabolic pathway mapping were performed to identify key functional traits enabling inorganic carbon assimilation, H₂ utilization, and N₂ fixation. Among these traits are RuBisCO for CO₂ fixation, hydrogenases for H₂ metabolism, and nitrogenase complexes for converting atmospheric N₂ into bioavailable forms. 
The findings of this study are expected to contribute to metabolic engineering efforts, facilitating the development of optimized microbial strains for sustainable biotechnology applications such as alternative protein production, biofuel production, carbon sequestration, and synthetic biology.

15:20-15:30
Spatiotemporal patterns of human gut dysbiosis contrasted with healthy families
Format: In person


Authors List:

  • Falk Hildebrand, Quadram Institute Bioscience, United Kingdom
  • Katarzyna Sidorczuk, Quadram Institute Bioscience, United Kingdom
  • Rebecca Ansorge, Quadram Institute Bioscience, Germany

Presentation Overview:

The gut microbiome is essential to the wellbeing and health of its human host, yet most studies to date resolve the gut microbial community only at the genus or species level. However, we know that two bacterial strains of the same species can differ by more than half of their genome and that pathogenicity is encoded at the strain level, not the species level. Therefore, my group develops technologies to track bacterial strains in metagenomic time series and to investigate evolutionary pressures.

Our studies have uncovered the extreme persistence of bacterial strains in individual human hosts (doi: 10.1016/j.chom.2021.05.008). Using strain tracking in newly established cohorts, we can uncover the paths microbes take when colonizing multiple family members, creating a “family-specific microbiome”. Selection pressure exerted on microbial genes can indicate microbial functions important for microbiome-human symbiosis. Disease, too, is accompanied by significant shifts in microbial strains: using a meta-analysis of >5,000 metagenomes, I will show typical strain enrichments associated with IBD and their temporal patterns during episodes of inflammatory flares.

These research lines demonstrate the importance of increasing both taxonomic and genomic resolution in microbiome studies to uncover the microbial patterns prevalent in disease and health.

15:30-15:40
Marker discovery in the large
Confirmed Presenter: Beatriz Vieira Mourato, Max Planck Institute for Evolutionary Biology, Germany

Format: In person


Authors List:

  • Beatriz Vieira Mourato, Max Planck Institute for Evolutionary Biology, Germany
  • Ivan Tsers, Max Planck Institute for Evolutionary Biology, Germany
  • Svenja Denker, Max Planck Institute for Evolutionary Biology; Lübeck University, Germany
  • Fabian Klötzl, none, United Kingdom
  • Bernhard Haubold, Max-Planck-Institute for Evolutionary Biology, Germany

Presentation Overview:

Pathogen outbreaks are now routinely tracked by whole genome sequencing. This leads to ever-increasing opportunities for marker discovery beyond the traditional candidate gene approach. Ideal genetic markers are present in all target organisms and nowhere else. Such markers have maximal sensitivity and specificity. Evolutionary biology implies that the vast majority of potentially non-specific sequences are present in the closest distinct relatives of the targets, their neighbors. We have implemented this insight in our software for finding unique genomic regions, Fur. Fur takes as input a set of target and neighbor genomes and returns the regions present in all targets that are absent from all neighbors. The resulting list of regions is highly enriched for diagnostic markers. Fur is based on suffix array algorithms, which makes it fast. However, its original version required memory proportional to the size of the neighborhood. Here we present the new Fur, which requires memory proportional to the longest neighbor sequence. This allows marker discovery from whole genome sequences on consumer-grade hardware. For example, the analysis of 178 target and 1,074 neighbor genomes of Streptococcus pneumoniae took 9 min 16 s and used 11.6 GB of RAM. We applied Fur to 120 diverse bacterial taxa and tested the marker candidates by comparison to the NCBI nt database. We found that the marker candidates had excellent in silico sensitivity and specificity, making them ideal starting material for developing diagnostic genetic markers in vitro.
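
The target/neighbor logic behind Fur can be illustrated with exact k-mers. This is a deliberate simplification and not Fur's algorithm, which works with suffix arrays rather than k-mer sets; the function names are hypothetical, and reverse complements are ignored. Neighbors are subtracted one at a time, so only one neighbor's k-mers are held alongside the candidate set, which loosely echoes the memory behavior described for the new Fur.

```python
def kmers(seq, k):
    """All distinct k-mers of a sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(targets, neighbors, k):
    """k-mers present in every target and absent from every neighbor.

    Requires at least one target sequence.
    """
    candidates = set.intersection(*(kmers(t, k) for t in targets))
    for neighbor in neighbors:  # one neighbor in memory at a time
        candidates -= kmers(neighbor, k)
        if not candidates:
            break
    return candidates


def marker_regions(reference, markers, k):
    """Merge overlapping or adjacent marker k-mer hits in one target into intervals."""
    hits = [i for i in range(len(reference) - k + 1) if reference[i:i + k] in markers]
    intervals = []
    for start in hits:
        end = start + k
        if intervals and start <= intervals[-1][1]:
            intervals[-1][1] = max(intervals[-1][1], end)
        else:
            intervals.append([start, end])
    return [tuple(iv) for iv in intervals]  # half-open (start, end) coordinates


if __name__ == "__main__":
    targets = ["AAACCCGGG", "TTAAACCCGGGT"]  # hypothetical target genomes
    neighbors = ["CCCGGG"]  # hypothetical neighbor genome
    m = unique_kmers(targets, neighbors, 3)
    for s, e in marker_regions(targets[0], m, 3):
        print(s, e, targets[0][s:e])
```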

15:40-15:50
Whole-genome detection and origin identification of orphan genes in plant-parasitic nematodes
Format: In person


Authors List:

  • Ercan Seçkin, Institut Sophia Agrobiotech, INRAE, France
  • Etienne Danchin, Institut Sophia Agrobiotech, INRAE, France
  • Dominique Colinet, Institut Sophia Agrobiotech, INRAE, France
  • Edoardo Sarti, Inria d'Université Côte d'Azur, France

Presentation Overview:

Genes with no known homologs constitute 5% to 30% of the genes in an organism’s genome. These orphan genes have either rapidly diverged from a gene family or emerged de novo from a previously non-coding region. Their detection, origin identification, and structural characterization are challenging, and evidence about the nature of de novo genes seems to be strongly species-dependent. In root-knot nematodes (Meloidogyne), orphan genes have been linked to parasitic functions and are thus of great agronomic interest. Starting from recently sequenced whole genomes of eight Meloidogyne species, we use comparative homology, transcriptomics and proteomics for robust detection of orphan genes. We then rely on ancestral sequence reconstruction strategies and synteny approaches to identify their origin. We find that 19% of all orphan genes are most likely de novo, and 30% divergent. Taking a balanced subset, we perform protein structure prediction with AlphaFold2, ESMFold and OmegaFold, and find that all three structure predictors produce low-confidence predictions. This result does not seem to be caused by increased intrinsic disorder in orphan proteins (which we estimated with AIUPred and flDPnn), but rather by the low similarity between the query orphan sequences and the training sets of the structure predictors. The dataset is thus a challenging, homology-free benchmark for structure, disorder, and emergence prediction.

15:50-16:00
Construction and Analysis of the Moniliophthora roreri pangenome
Format: In person


Authors List:

  • Isabella Gallego, Center for Nuclear Energy in Agriculture, University of São Paulo, Brazil
  • Diego Mauricio Riaño-Pachón, Center for Nuclear Energy in Agriculture, University of São Paulo, Brazil

Presentation Overview:

Moniliophthora roreri, the causal agent of frosty pod rot, is a devastating fungal pathogen affecting cacao production across Latin America. Its broad host range, ecological adaptability, and high pathogenicity underscore the need to understand its genomic diversity to inform disease management strategies. Here, we present a comprehensive pangenome analysis of 24 publicly available M. roreri genomes using two state-of-the-art graph-based methods: Minigraph-Cactus and PGGB.

Graph-based approaches allow us to integrate structural variation and genome-wide sequence diversity into a unified representation. The resulting pangenomes were used to classify genes into core, accessory, and strain-specific categories, revealing genomic features likely associated with adaptation and pathogenicity. Functional annotation was performed with HMMER and PANNZER2, and Gene Ontology terms enriched in each gene category were identified with topGO and summarized with REVIGO, offering insight into biological processes specific to different parts of the genome.

The study also includes a comparative analysis between our graph-based pangenomes and a previously constructed orthology-based version. This evaluation uses metrics such as genome completeness, representation of structural variants, core/accessory gene content, and computational performance. Our findings demonstrate the value of graph-based methods in capturing the genomic complexity of fungal pathogens and provide a foundation for future research into the molecular basis of virulence and host adaptation in M. roreri.
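The core/accessory/strain-specific split can be illustrated with a minimal sketch that starts from a gene presence/absence table, assumed here to have already been extracted from the pangenome graphs (that step is not shown). The function and argument names are hypothetical. Requiring presence in every genome is one common definition of "core"; some pipelines relax it to a high-percentage threshold, exposed here as `core_fraction`.

```python
from typing import Dict, Set


def classify_genes(
    presence: Dict[str, Set[str]], n_genomes: int, core_fraction: float = 1.0
) -> Dict[str, str]:
    """Label each gene family by how many of the n_genomes it occurs in."""
    categories: Dict[str, str] = {}
    for gene, genomes in presence.items():
        k = len(genomes)
        if k == 0:
            continue  # absent from every genome: not part of the pangenome
        if k >= core_fraction * n_genomes:
            categories[gene] = "core"
        elif k == 1:
            categories[gene] = "strain-specific"
        else:
            categories[gene] = "accessory"
    return categories
```

With 24 genomes and the default strict definition, a gene counts as core only if all 24 genomes carry it.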

16:40-17:00
Proceedings Presentation: Recomb-Mix: fast and accurate local ancestry inference
Format: In person


Authors List:

  • Yuan Wei, University of Central Florida, United States
  • Degui Zhi, University of Texas Health Science Center at Houston, United States
  • Shaojie Zhang, University of Central Florida, United States

Presentation Overview:

Motivation: The availability of large genotyped cohorts brings new opportunities for revealing the high-resolution genetic structure of admixed populations via local ancestry inference (LAI), the process of identifying the ancestry of each segment of an individual haplotype. Although current methods achieve high accuracy in standard cases, LAI remains challenging when reference populations are closely related (e.g., intra-continental), when there are many reference populations, or when admixture events are deep in time, all of which are increasingly unavoidable in large biobanks.

Results: In this work, we present a new LAI method, Recomb-Mix. Recomb-Mix integrates elements of existing methods built on the site-based Li and Stephens model and introduces a new graph-collapsing trick that simplifies counting paths with the same ancestry-label readout. Through comprehensive benchmarking on various simulated datasets, we show that Recomb-Mix is more accurate than existing methods across diverse scenarios while remaining competitive in resource efficiency. We expect Recomb-Mix to be a useful method for advancing genetic studies of admixed populations.

Availability and Implementation: The implementation of Recomb-Mix is available at https://github.com/ucfcbb/Recomb-Mix.
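To make the idea of collapsing reference haplotypes concrete, here is a deliberately simplified, hypothetical sketch; it is not Recomb-Mix's actual algorithm, which is in the repository above. At each site, all reference haplotypes that share an ancestry label and an allele are merged into one node, so a path only has to track the ancestry label. A minimum-cost path then pays `mismatch_cost` when no reference of the chosen ancestry carries the query allele and `switch_cost` for each ancestry change. Both costs, the function name, and the 0/1 allele encoding are assumptions.

```python
from typing import Dict, List, Sequence, Set


def infer_local_ancestry(
    query: Sequence[int],
    panel: Sequence[Sequence[int]],
    labels: Sequence[str],
    mismatch_cost: float = 1.0,
    switch_cost: float = 3.0,
) -> List[str]:
    """Label each site of a query haplotype with an ancestry via a min-cost path."""
    n_sites = len(query)
    ancestries = sorted(set(labels))
    # Collapse the panel: per site, the set of alleles each ancestry carries.
    # Every reference haplotype with the same (ancestry, allele) maps to one node.
    nodes: List[Dict[str, Set[int]]] = []
    for j in range(n_sites):
        carried: Dict[str, Set[int]] = {a: set() for a in ancestries}
        for hap, lab in zip(panel, labels):
            carried[lab].add(hap[j])
        nodes.append(carried)

    def emit(j: int, anc: str) -> float:
        # Free if some reference of this ancestry carries the query allele.
        return 0.0 if query[j] in nodes[j][anc] else mismatch_cost

    cost = {a: emit(0, a) for a in ancestries}
    back: List[Dict[str, str]] = []
    for j in range(1, n_sites):
        new_cost, ptr = {}, {}
        for a in ancestries:
            # Best predecessor: stay in ancestry a for free, or switch at a penalty.
            prev = min(ancestries, key=lambda p: cost[p] + (0.0 if p == a else switch_cost))
            new_cost[a] = cost[prev] + (0.0 if prev == a else switch_cost) + emit(j, a)
            ptr[a] = prev
        cost = new_cost
        back.append(ptr)

    # Trace back the optimal ancestry path from the cheapest final state.
    state = min(ancestries, key=lambda a: cost[a])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

After collapsing, the dynamic program scales with the number of ancestry labels rather than the number of reference haplotypes. The price in this sketch is that switching between haplotypes of the same ancestry is free, which a full Li and Stephens model would penalize.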

17:00-17:20
WINDEX: A hierarchical integration of site- and window-based statistics for modeling the footprint of positive selection
Format: In person


Authors List:

  • Hannah Snell, Center for Computational Molecular Biology, Brown University, Providence, RI 02912, United States
  • Scott McCallum, Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, PA 15282, United States
  • Dhruv Raghavan, Department of Computer Science, Brown University, Providence, RI 02912, United States
  • Ritambhara Singh, Center for Computational Molecular Biology, Brown University, Providence, RI 02912, United States
  • Sohini Ramachandran, Center for Computational Molecular Biology, Brown University, Providence, RI 02912, United States
  • Lauren Sugden, Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, PA 15282, United States

Presentation Overview:

In genetic studies, scientists search for mutations that explain changes in phenotype or diversity in populations. Adaptive mutations, which increase in frequency because they confer a fitness benefit, leave behind statistical signals in genetic data that genome-wide scans for selection can reveal. Machine learning techniques have improved the localization of adaptive mutations in genetic samples. However, these methods fail to account for the effect of linkage disequilibrium on localization and miss the opportunity to incorporate statistics at varying resolutions. Leveraging statistics computed at individual sites and across local genetic windows allows us to capture features of positive selection footprints arising from changes in allele frequencies, haplotypes, or site-frequency spectra (SFS). Our proposed method, WINDEX, aims to combine these resolutions of statistics in a hierarchical hidden Markov model architecture to improve the prediction of positively selected loci among hitchhiking signals. WINDEX contains site- and window-dependent latent states corresponding to neutral, linked, and adaptive regions. This structure uses the information provided by both statistical resolutions to make classifications, capturing a broader range of the signals left by a positive selective sweep. On artificial genetic sequences, WINDEX achieves 99% accuracy, and it is currently being tested on canonical sites of positive selection in the human genome. Overall, WINDEX provides the opportunity to incorporate the full range of existing selection statistics to improve localization and understand the footprint of positive selection.
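One way to picture latent neutral, linked, and adaptive states informed by two resolutions is a hidden Markov model whose emissions combine a per-site statistic with a window-level summary of it. The sketch below is a hypothetical, flat three-state HMM decoded with the Viterbi algorithm, not WINDEX's hierarchical architecture. The Gaussian emissions, the independence of the two resolutions, the window mean as the window statistic, and all parameter values are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

STATES = ("neutral", "linked", "adaptive")
Gaussian = Tuple[float, float]  # (mean, standard deviation)


def window_means(site_stats: Sequence[float], half_width: int) -> List[float]:
    """Window-level feature: mean site statistic in a window centred on each site."""
    out = []
    for i in range(len(site_stats)):
        lo, hi = max(0, i - half_width), min(len(site_stats), i + half_width + 1)
        out.append(sum(site_stats[lo:hi]) / (hi - lo))
    return out


def log_normal(x: float, mu: float, sd: float) -> float:
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def viterbi(
    site_stats: Sequence[float],
    win_stats: Sequence[float],
    params: Dict[str, Tuple[Gaussian, Gaussian]],
    log_trans: Dict[str, Dict[str, float]],
) -> List[str]:
    """Most likely state path given site- and window-level evidence."""

    def emit(i: int, s: str) -> float:
        # Site and window statistics are treated as independent given the state.
        (ms, ss), (mw, sw) = params[s]
        return log_normal(site_stats[i], ms, ss) + log_normal(win_stats[i], mw, sw)

    score = {s: math.log(1.0 / len(STATES)) + emit(0, s) for s in STATES}
    back: List[Dict[str, str]] = []
    for i in range(1, len(site_stats)):
        new, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans[p][s])
            new[s] = score[prev] + log_trans[prev][s] + emit(i, s)
            ptr[s] = prev
        score = new
        back.append(ptr)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Whether sites flanking a strong signal are labelled linked rather than neutral depends entirely on the assumed emission and transition parameters.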

17:20-17:30
Position-specific evolution in transcription factor binding sites, and a fast likelihood calculation for the F81 model
Confirmed Presenter: Pavitra Selvakumar, The Institute of Mathematical Sciences, India

Format: In person


Authors List:

  • Pavitra Selvakumar, The Institute of Mathematical Sciences, India
  • Rahul Siddharthan, The Institute of Mathematical Sciences, India

Presentation Overview:

Transcription factor binding sites (TFBS), like other DNA sequences, evolve via mutation and selection related to their function. Models of nucleotide evolution describe DNA evolution via single-nucleotide mutation. A stationary vector of such a model is the long-term distribution of nucleotides, which remains unchanged under the model. Neutrally evolving sites may have uniform stationary vectors, but sites within a TFBS are instead expected to have stationary vectors reflecting the fitness of the various nucleotides at those positions. We introduce 'position-specific stationary vectors' (PSSVs), the collection of stationary vectors at each site in a TFBS locus, analogous to the position weight matrix (PWM) commonly used to describe TFBS. We infer PSSVs for human TFs using two evolutionary models (Felsenstein 1981 and Hasegawa-Kishino-Yano 1985). We find that PSSVs reflect the nucleotide distribution from PWMs, but with reduced specificity. We infer ancestral nucleotide distributions at individual positions and calculate 'conditional PSSVs' conditioned on specific choices of the majority ancestral nucleotide. We find that certain ancestral nucleotides exert strong evolutionary pressure on the neighbouring sequence while others have a negligible effect. Finally, we present a fast likelihood calculation for the F81 model on moderate-sized trees that makes this approach feasible for large-scale studies along these lines.
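As a baseline for the F81 likelihood mentioned above, the sketch below implements the standard Felsenstein pruning algorithm under F81, using the closed-form transition probability P_ij(t) = exp(-βt)·δ_ij + (1 - exp(-βt))·π_j with β = 1/(1 - Σ_k π_k²), which scales time to one expected substitution per unit. It is the textbook per-site computation, not the authors' faster method, which the abstract does not describe; the nested-tuple tree encoding and function names are illustrative assumptions. Because π is the stationary vector in F81, one way to read a PSSV in this setting is as a separate π fitted at each binding-site position.

```python
import math
from typing import List, Sequence

NUC = "ACGT"


def f81_prob(i: int, j: int, t: float, pi: Sequence[float]) -> float:
    """F81 probability that state i becomes state j along a branch of length t."""
    beta = 1.0 / (1.0 - sum(p * p for p in pi))  # 1 expected substitution per unit t
    decay = math.exp(-beta * t)
    return decay * (1.0 if i == j else 0.0) + (1.0 - decay) * pi[j]


def partials(node, pi: Sequence[float]) -> List[float]:
    """Felsenstein pruning: P(data below node | node in state i) for each i.

    A leaf is a nucleotide string such as "A"; an internal node is a tuple
    of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if NUC[i] == node else 0.0 for i in range(4)]
    result = [1.0] * 4
    for child, t in node:
        below = partials(child, pi)
        for i in range(4):
            result[i] *= sum(f81_prob(i, j, t, pi) * below[j] for j in range(4))
    return result


def site_likelihood(tree, pi: Sequence[float]) -> float:
    """Likelihood of one alignment column, with the root state drawn from pi."""
    root = partials(tree, pi)
    return sum(pi[i] * root[i] for i in range(4))
```

For example, `site_likelihood((("A", 0.3), ((("C", 0.2), ("C", 0.1)), 0.4)), pi)` scores one column of a three-leaf tree; summing it over all 64 leaf patterns gives 1, a quick sanity check on the transition probabilities.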

17:30-18:00
Panel: Concluding remarks
Format: In person


Authors List: