
July 22, 2025
11:20-11:30
Introduction
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A

Authors List:

  • Iddo Friedberg

Presentation Overview:

Introduction to the joint session Function and EvolCompGen

July 22, 2025
11:30-12:10
Invited Presentation: Evolution of function in light of gene expression
Confirmed Presenter: Marc Robinson-Rechavi, University of Lausanne, Switzerland
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Marc Robinson-Rechavi, University of Lausanne

Presentation Overview:

One of the fundamental questions of genome evolution is how gene function changes or is constrained, whether between species (orthologs) or inside gene families (paralogs). While computational prediction is making major progress on function in a broad sense, most evolutionary changes concern details that are small in the big picture, yet very significant for organismal function. For example, new organs or new physiological adaptations often come from repurposing genes whose basic molecular function is conserved while taking a novel role. Gene expression provides a unique window into such fine details of gene function. I will present how gene expression of diverse species, bulk and single-cell, is integrated into Bgee; how gene expression can be used to test hypotheses of functional change after duplication (the

July 22, 2025
12:10-12:20
Convergent evolution to similar proteins confounds structure search
Confirmed Presenter: Erik Wright, University of Pittsburgh, United States
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: Live stream

Authors List:

  • Erik Wright, University of Pittsburgh

Presentation Overview:

Advances in protein structure prediction and structural search tools (e.g., FoldSeek and PLMSearch) have enabled large-scale comparison of protein structures. It is now possible to quickly identify structurally similar proteins ("structurlogs"), but it remains unclear whether these similarities reflect homology (common ancestry) or analogy (convergent evolution). In this study, we found that ~2.6% of FoldSeek clusters lack sequence-level support for homology, including about 1% of matches with high TM-score (>= 0.5). The lack of sequence homology could be due to extreme protein divergence or independent evolution to a similar structure. Here, we show that tandem repeats provide strong evidence for the presence of analogous protein structures. Our results suggest analogs infiltrate structure search results and care should be taken when relying on structural similarity alone if homology is desired. This problem may extend beyond repeat proteins to other low complexity folds, and structure search tools could be improved by masking these regions in the same manner as done by sequence search programs.

July 22, 2025
12:20-12:30
Evolution of the Metazoan Protein Domain Toolkit Revealed by a Birth-Death-Gain Model
Confirmed Presenter: Maureen Stolzer, Carnegie Mellon University, United States
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Maureen Stolzer, Carnegie Mellon University
  • Yuting Xiao, Carnegie Mellon University
  • Dannie Durand, Carnegie Mellon University

Presentation Overview:

Domains, sequence fragments that encode protein modules with a distinct structure and function, are the basic building blocks of proteins. The set of domains encoded in the genome serves as the functional toolkit of the species. Here, we use a phylogenetic birth-death-gain model to investigate the evolution of this protein toolkit in metazoa. Given a species tree and the set of protein domain families in each present-day species, this approach estimates the most likely rates of domain origination, duplication, and loss.

Statistical hierarchical clustering of domain family rates reveals sets of domains with similar rate profiles, consistent with groups of domains evolving in concert. Moreover, we find that domains with similar functions tend to have similar rate profiles. Interestingly, domains with functions associated with metazoan innovations, including immune response, cell adhesion, tissue repair, and signal transduction, tend to have the fastest rates. We further infer the expected ancestral domain content and the history of domain family gains, losses, expansions, and contractions on each branch of the species tree. Comparative analysis of these events reveals that a small number of evolutionary strategies, corresponding to toolkit expansion, turnover, specialization, and streamlining, are sufficient to describe the evolution of the metazoan protein domain complement. Thus, the use of a powerful, probabilistic birth-death-gain model reveals a striking harmony between the evolution of domain usage in metazoan proteins and organismal innovation.

July 22, 2025
12:30-12:40
Deep Phylogenetic Reconstruction Reveals Key Functional Drivers in the Evolution of B1/B2 Metallo-β-Lactamases
Confirmed Presenter: Samuel Davis, School of Chemistry and Molecular Biosciences, The University of Queensland
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Samuel Davis, School of Chemistry and Molecular Biosciences
  • Pallav Joshi, School of Chemistry and Molecular Biosciences
  • Ulban Adhikary, School of Chemistry and Molecular Biosciences
  • Julian Zaugg, School of Chemistry and Molecular Biosciences
  • Phil Hugenholtz, School of Chemistry and Molecular Biosciences
  • Marc Morris, School of Chemistry and Molecular Biosciences
  • Gerhard Schenk, School of Chemistry and Molecular Biosciences
  • Mikael Boden, School of Chemistry and Molecular Biosciences

Presentation Overview:

Metallo-β-lactamases (MBLs) comprise a diverse family of antibiotic-degrading enzymes. Despite their growing implication in drug-resistant pathogens, no broadly effective clinical inhibitors against MBLs currently exist. Notably, β-lactam-degrading MBLs appear to have emerged twice from within the broader, catalytically diverse MBL-fold protein superfamily, giving rise to two distinct monophyletic groups: B1/B2 and B3 MBLs.

Comparative analyses have highlighted distinct structural hallmarks of these subgroups, particularly in metal-coordinating residues. However, the precise evolutionary events underlying their emergence remain unclear due to challenges presented by extensive sequence divergence. Understanding the molecular determinants driving the evolution of β-lactamase activity may inform design of broadly effective inhibitors.

We sought to infer the evolutionary features driving the emergence of B1/B2 MBLs via phylogenetics and ancestral reconstruction. To overcome challenges associated with evolutionary analysis at this scale, we developed a phylogenetically aware sequence curation framework centred on iterative profile HMM refinement. This framework was applied over several iterations to construct a comprehensive phylogeny encompassing the B1/B2 MBLs and several other recently diverged clades. The resulting tree represents the most robust hypothesis to date regarding the emergence of B1/B2 MBLs and implies a parsimonious evolutionary history of key features, including variation in active site architecture and insertions and deletions of distinct structural elements.

Ancestral proteins inferred at key internal nodes were experimentally characterised, revealing distinct activity profiles that reflect underlying evolutionary transitions. These findings give rise to testable hypotheses regarding the molecular basis and evolutionary drivers of functional diversification, as well as potential targets for MBL inhibitor design.

July 22, 2025
12:40-12:50
A compendium of human gene functions derived from evolutionary modeling
Confirmed Presenter: Paul D. Thomas, University of Southern California, United States
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Marc Feuermann, SIB Swiss Institute for Bioinformatics
  • Huaiyu Mi, University of Southern California
  • Pascale Gaudet, Swiss Institute of Bioinformatics
  • Anushya Muruganujan, University of Southern California
  • Suzanna Lewis, Lawrence Berkeley National Laboratory
  • Dustin Ebert, University of Southern California
  • Tremayne Mushayahama, University of Southern California
  • Gene Ontology Consortium, Various
  • Paul D. Thomas, University of Southern California

Presentation Overview:

A comprehensive, computable representation of the functional repertoire of all macromolecules encoded within the human genome is a foundational resource for biology and biomedical research. We have recently published a paper (Feuermann et al., Nature 640:146, 2025) describing our initial release of a human gene “functionome,” a comprehensive set of human gene function descriptions using Gene Ontology (GO) terms, supported by experimental evidence. This work involved integration of all applicable experimental GO annotations for human genes and their homologs, using a formal, explicit evolutionary modeling framework. We will review this work and its major findings, and describe subsequent progress on an updated version.

In more detail, we will describe the results of a large, international effort to integrate experimental findings from more than 100,000 publications to create a representation of human gene functions that is as complete and accurate as possible. Specifically, we applied an expert-curated, explicit evolutionary modeling approach to all human protein-coding genes, which integrates available experimental information across families of related genes into models reconstructing the gain and loss of functional characteristics over evolutionary time. The resulting set of integrated functions covers ~82% of human protein-coding genes, and the evolutionary models provide insights into the evolutionary origins of human gene functions. We show that our set of function descriptions can improve the widely used genomic technique of GO enrichment analysis. The experimental evidence for each functional characteristic is recorded, enabling the scientific community to help review and improve the resource, available at https://functionome.geneontology.org.

July 22, 2025
12:50-13:00
pLM in functional annotation: relationship between sequence conservation and embedding similarity
Confirmed Presenter: Ana Rojas, CABD, Spain
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Francisco M. Perez-Canales, CABD-CSIC
  • Ildefonso Cases, CABD-CSIC
  • Gemma Martínez-Redondo, Metazoa Phylogenomics Lab
  • Rosa Fernandez, Metazoa Phylogenomics Lab
  • Ana Rojas, CABD

Presentation Overview:

Functional annotation of protein sequences remains a bottleneck for understanding the biology of both model and non-model organisms, as conventional homology-based tools often fail to assign functions to the majority of newly sequenced genes. We first benchmarked each protein language model (pLM) on well-characterized model organisms, demonstrating superior recovery of functional signals from transcriptomic datasets compared to traditional methods. We then applied our pipeline to annotate ~1,000 animal proteomes, encompassing 23 million genes, and discovered candidate genes involved in gill regeneration in a non-model insect. To elucidate how pLM embeddings relate to primary-sequence conservation, we computed cosine distances between embeddings and aligned sequences to derive percent identity. Statistical analyses (including Pearson correlation, polynomial regression, and quantile regression) revealed complex, non-linear relationships between embedding similarity and sequence identity that vary markedly across models. These findings indicate that pLM embeddings capture orthogonal functional features beyond simple residue conservation. Altogether, our work highlights the power of pLM-based annotation for expanding functional insights in biodiversity projects and underscores the need to interpret embedding distances in light of each model’s unique representational biases.
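The comparison described in this abstract (cosine distance between embeddings versus percent identity from an alignment, then a Pearson correlation) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the embeddings are plain lists of floats rather than output from any particular pLM, and percent identity is assumed here to be computed over aligned columns where neither sequence has a gap, a convention the abstract does not specify.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: gaps are written as '-' and both strings come from the
    same pairwise alignment, so they have equal length.
    """
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

In practice one would compute both quantities for many protein pairs and pass the two resulting lists to `pearson`; a weak or non-linear relationship would then motivate the polynomial and quantile regressions the abstract mentions.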

July 22, 2025
14:00-14:20
Disentangling SARS-CoV-2 Lineage Importations and the Role of NPIs Using Bayesian Phylogeography of 1.8 Million Genomes
Confirmed Presenter: Sama Goliaei, Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Sama Goliaei, Braunschweig Integrated Centre of Systems Biology (BRICS)
  • Mohammad-Hadi Foroughmand-Araabi, Braunschweig Integrated Centre of Systems Biology (BRICS)
  • Aideen Roddy, Braunschweig Integrated Centre of Systems Biology (BRICS)
  • Ariane Weber, Transmission
  • Sanni Översti, Transmission
  • Denise Kühnert, Transmission
  • Alice McHardy, Computational Biology of Infection Research

Presentation Overview:

Nonpharmaceutical interventions (NPIs) were key to limiting SARS-CoV-2 transmission before vaccines, though their effectiveness—especially regarding mask use and socioeconomic trade-offs—remains under discussion. Leveraging a Bayesian phylogeographic framework, we analyzed 1.8 million globally sampled SARS-CoV-2 genomes to quantify lineage importations into Germany during the third pandemic wave (late 2020–early 2021). Across three sampling strategies, we observed a consistent decline in importations following key NPIs, notably the provision of free rapid antigen tests and mandates for surgical/FFP2 mask usage. While mask efficacy has been debated, our data show that upgrading from cloth to medical-grade masks coincided with sharp reductions in importation frequency—particularly in densely populated states.
We introduce a novel metric, the Smoothed Importation Frequency (SIF), and a daily effectiveness measure that allows more precise, real-time assessment of NPIs by smoothing fluctuations in importation data, thus overcoming limitations of previous methods that lacked temporal resolution and clarity. Our findings reveal that major lineage importations clustered around the Christmas holiday period, and spread disproportionately from populous states, identifying these as critical nodes in national transmission dynamics.
These results demonstrate the importance of integrating phylogenetic data with real-world intervention timelines to decode the drivers of pathogen spread. Beyond confirming the effectiveness of masks and rapid testing, our study highlights the notable impact of restricting gatherings and movements, supporting a data-driven, targeted approach to pandemic response. The data suggest that scalable, low-socioeconomic-cost measures like rapid testing and surgical-grade masking, when accessible, may be especially valuable early in outbreaks, when vaccines are not yet available.

July 22, 2025
14:20-14:40
SARS-CoV-2 Intra-Host Evolution in Immuno-Compromised Individuals: A Fractal Perspective on Genome Geometry
Confirmed Presenter: Nicole A. Rogowski, Leiden University Medical Center, Netherlands
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Nicole A. Rogowski, Leiden University Medical Center
  • Kees Mourik, Leiden University Medical Center
  • Nithya Kuttiyarthu Veetil, Leiden University Medical Center
  • Stefan A. Boers, Leiden University Medical Center
  • Anna H.E. Roukens, Leiden University Medical Center
  • Simon P. Jochems, Leiden University Medical Center
  • Louis A.C.M. Kroes, Leiden University Medical Center
  • Igor A. Sidorov, Leiden University Medical Center
  • Jelle J. Goeman, Leiden University Medical Center
  • Jutte J.C. de Vries, Leiden University Medical Center

Presentation Overview:

Studies have associated the punctuated evolution of SARS-CoV-2 variants with prolonged infections and subsequent transmission. We describe the genetic signatures of SARS-CoV-2 intra-host evolution in 10 immuno-compromised (IC) patients and 5 immunocompetent controls, in 55 longitudinal samples. We included two types of IC: induced (immune suppressants) and innate (haematological disease). The mutational profile was analysed between IC types, over time, and in response to treatment (host directed and antiviral).
However, almost all studies on viral evolution consider only a ‘consensus’ sequence for a virus (mutations >50% frequency represented as ambiguous nucleotides), ignoring the diverse viral pool (quasi-species) arising from replication errors. Including the full viral quasi-species profile is essential to understanding how resistance mutations arise. When we built phylogenetic trees for all variants, the high number of ambiguous positions caused tree construction to fail. Here we report a novel approach based on the chaos game that can leverage viral quasi-species and produce phylogenetic trees regardless of ambiguity.
Graphical representations were generated using Chaos Game Representation (CGR), which draws a “walk” to encode genetic information. Each walk has a set of independent mutations, and by compiling thousands of walks for each sample, covering most combinations of mutations, Frequency CGR (FCGR) objects were created. Because each sample is represented by a collection of walks, positional ambiguity and complex mutations can be easily incorporated into the phylogeny (using topology-based methods), which results in closer relationships between patient samples. The same topology-based methods produced a 3D visualization of the genome space, similar to an antigen map, highlighting distinct signatures visible in IC patients.

July 22, 2025
14:40-15:00
Antarctica as a Viral Reservoir: Insights from Comparative Genomics and Metagenomics
Confirmed Presenter: Caroline Martiniuc, UFRJ, Brazil
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Caroline Martiniuc, UFRJ
  • Igor Taveira, UFRJ
  • Fernanda Abreu, UFRJ
  • Anderson Cabral, UFRJ
  • Rodolfo Paranhos, UFRJ
  • Deborah Leite, UTFPR
  • Lucy Seldin, UFRJ
  • Diogo Jurelevicius, UFRJ

Presentation Overview:

Two bioinformatics approaches stand out in the study of viromes in extreme environments: prophage comparative genomics and viral metagenomic analyses. The bacterium Rummeliibacillus stabekisii emerges as an interesting model for investigating extremophilic prophages, as it has been isolated from spacecraft surfaces and Antarctic soils, raising questions about the role of prophages in its environmental resilience. Additionally, Antarctica faces hydrocarbon contamination, making these regions even more hostile. Understanding ecological and metabolic interactions in this context can help elucidate microbial relationships in such environments. For the comparative genomics study, genomes of R. stabekisii from spacecraft surfaces and Antarctic soil were analyzed. PHASTER was used to identify prophages within the genomes, followed by annotation with BLAST. Furthermore, metagenomic analyses were performed on five hydrocarbon-contaminated Antarctic soil samples. The samples were sequenced using Illumina and assembled with MEGAHIT. Viral contigs were identified using VirSorter, and taxonomy was classified with PhaGCN. Viral hosts were assigned based on data from the International Committee on Taxonomy of Viruses (ICTV) and the CHERRY software. Comparative genomic analysis revealed that Antarctic R. stabekisii harbored the highest number of intact prophages, with genes suggesting adaptive advantages and regions acting as hotspots for recombination. In contaminated soils, the class Caudoviricetes exhibited the highest abundance. Most detected viral hosts belonged to hydrocarbon-degrading bacterial genera within the phyla Pseudomonadota and Actinomycetota. Additionally, auxiliary viral metabolic genes associated with nitrogen and phosphorus cycles were identified. Both results reinforce the relevance of viruses as agents of genetic and ecological modulation in Antarctica.

July 22, 2025
15:00-15:10
Computational Genomics and Biosynthetic Potential Analysis of a Dead Sea Penicillium sp.
Confirmed Presenter: Milana Frenkel-Morgenstern, Reichman University, Israel
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Dylan Dsouza, Bar-Ilan University
  • Milana Frenkel-Morgenstern, Reichman University

Presentation Overview:

Extreme environments harbor unique microbial life with biotechnological potential. Here, we characterize a novel Penicillium sp. isolated from the hypersaline Dead Sea, capable of thriving at 70‰ salt concentration. Whole-genome and transcriptome sequencing were performed, followed by de novo assembly and quality assessment using QUAST and BUSCO. Functional annotation of predicted peptides was conducted using InterProScan, UPIMAPI, and BLAST+ with NCBI-RefSeq and UniProtKB, validating spectral data from LC-MS/MS (nanoAcquity coupled with Q Exactive HFX) analyzed via Proteome Discoverer v2.4, SequestHT, and MS Amanda 2.0. Key enzymes in penicillin biosynthesis were confirmed.

Biosynthetic potential was assessed using antiSMASH and dbCAN3, with SignalP 6.0 machine learning predicting secretory proteins. Phylogenetic analysis of single-copy orthologs was performed using OrthoFinder. The genome revealed biosynthetic gene clusters for valuable bioactives, including mellein, lovastatin, sorbicillin, and roquefortine. Extracts from cultures grown in a high-nitrogen medium with phenylacetate and 20% Dead Sea water strongly inhibited E. coli NEB+ STABL.

At the transcript level, Rfam annotation identified THI4 and THI5 riboswitches, with secondary structures predicted via RNAfold and R2DT. Conservation analysis using LocARNA provided insights into regulatory mechanisms.

These findings highlight the computationally driven discovery of biosynthetic pathways and stress-adaptive mechanisms in Penicillium sp., demonstrating its potential for industrial applications in extreme environments.

July 22, 2025
15:10-15:20
Unravelling the pangenome of autotrophic bacteria: Metabolic commonalities, evolutionary relationships, and industrially relevant traits
Confirmed Presenter: Dr. Karan Kumar, Institute of Applied Microbiology, Aachen Biology and Biotechnology
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Dr. Karan Kumar, Institute of Applied Microbiology
  • Tobias B. Alter, Institute of Applied Microbiology
  • Lars M. Blank, Institute of Applied Microbiology

Presentation Overview:

Atmospheric CO₂ fixation by microbial autotrophs presents a sustainable alternative to energy-intensive chemical processes, offering significant potential for biotechnological applications. However, understanding the genetic diversity, evolutionary adaptations, and metabolic capabilities of autotrophic carbon-fixing lineages (ACL) requires a comparative genomic approach. This study employs pangenome analysis to systematically assess the core, accessory, and unique genetic components across diverse ACL bacteria, with a particular focus on the recently revised genus Xanthobacter and the newly proposed Roseixanthobacter. By integrating phylogenetic, functional, and metabolic insights, we aim to elucidate conserved and variable genetic traits that contribute to CO₂ fixation efficiency and industrial relevance. A total of 546 high-quality genomes spanning 121 ACL microbial species were selected for analysis, following rigorous genome quality control measures based on CheckM contamination thresholds, contig limits, and genome size variation criteria. Initial phylogenomic analyses identified 16 microbial genera closely related to Xanthobacter, including Ancylobacter, Azorhizobium, Cupriavidus, Hydrogenophaga, Moorella, and Synechococcus, among others. Genomes were uniformly re-annotated to ensure consistency in gene identification. Pangenome reconstruction, core-genome diversity assessments, orthologous group clustering, and essential metabolic pathway mapping were performed to identify key functional traits enabling inorganic carbon assimilation, H₂ utilization, and N₂ fixation. Among these traits are RuBisCO for CO₂ fixation, hydrogenases for H₂ metabolism, and nitrogenase complexes for converting atmospheric N₂ into bioavailable forms. 
The findings of this study are expected to contribute to metabolic engineering efforts, facilitating the development of optimized microbial strains for sustainable biotechnology applications such as alternative protein production, biofuel production, carbon sequestration, and synthetic biology efforts.

July 22, 2025
15:20-15:30
Spatiotemporal patterns in the human gut dysbiosis contrasted to healthy families
Confirmed Presenter: Falk Hildebrand, Quadram Institute Bioscience, United Kingdom
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Falk Hildebrand, Quadram Institute Bioscience
  • Katarzyna Sidorczuk, Quadram Institute Bioscience
  • Rebecca Ansorge, Quadram Institute Bioscience

Presentation Overview:

The gut microbiome is essential to the wellbeing and health of its human host, yet most studies to date resolve the gut microbial community only at genus or species level. However, we know that two bacterial strains of the same species can differ by more than half their genome and that pathogenicity is encoded at the strain level, not the species level. Therefore, my group develops technologies to track bacterial strains in metagenomic time series and to investigate evolutionary pressures.

Our studies have uncovered the extreme persistence of bacterial strains in individual human hosts (doi: 10.1016/j.chom.2021.05.008). Using strain tracking, we can uncover the colonization of multiple family members, creating a “family-specific microbiome”. In disease, too, we find significant shifts in microbial strains: using a meta-analysis of >5,000 metagenomes, I will show typical strain enrichments associated with IBD and their temporal patterns during episodes of inflammatory flares.

These research lines demonstrate the importance of increasing both taxonomic and genome resolution in microbiome studies to uncover the microbial patterns prevalent in disease and health.

July 22, 2025
15:30-15:40
Marker discovery in the large
Confirmed Presenter: Beatriz Vieira Mourato, Max Planck Institute for Evolutionary Biology, Germany
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Beatriz Vieira Mourato, Max Planck Institute for Evolutionary Biology
  • Ivan Tsers, Max Planck Institute for Evolutionary Biology
  • Svenja Denker, Max Planck Institute for Evolutionary Biology; Lübeck University
  • Fabian Klötzl, none
  • Bernhard Haubold, Max Planck Institute for Evolutionary Biology

Presentation Overview:

Pathogen outbreaks are now routinely tracked by whole genome sequencing. This leads to ever-increasing opportunities for marker discovery beyond the traditional candidate gene approach. Ideal genetic markers are present in all target organisms and nowhere else. Such markers have maximal sensitivity and specificity. Evolutionary biology implies that the vast majority of potentially non-specific sequences are present in the closest distinct relatives of the targets, their neighbors. We have implemented this insight in our software for finding unique genomic regions, Fur. Fur takes as input a set of target and neighbor genomes and returns the regions present in all targets that are absent from all neighbors. The resulting list of regions is highly enriched for diagnostic markers. Fur is based on suffix array algorithms, making it fast. However, its original version required memory proportional to the size of the neighborhood. Here we present the new Fur, which requires memory proportional to the longest neighbor sequence. This allows marker discovery from whole genome sequences on consumer-grade hardware. For example, the analysis of 178 target and 1,074 neighbor genomes of Streptococcus pneumoniae took 9m 16s and used 11.6GB RAM. We applied Fur to 120 diverse bacterial taxa and tested the marker candidates by comparison to nt. We found that the marker candidates had excellent in silico sensitivity and specificity making them ideal starting material for developing diagnostic genetic markers in vitro.
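The selection rule stated in this abstract (keep what is present in all targets and absent from all neighbors) can be illustrated with a toy sketch. This is not how Fur works internally: Fur is described as using suffix array algorithms and returning regions, whereas the sketch below swaps in fixed-length k-mer sets for simplicity. The function names are hypothetical, and reverse complements are ignored.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def marker_kmers(targets, neighbors, k):
    """Return k-mers found in every target and in no neighbor.

    targets and neighbors are lists of genome strings. This set-based
    approach is only a stand-in for the suffix-array method named in
    the abstract and would not scale to real genome collections.
    """
    if not targets:
        return set()
    # Start from k-mers shared by all targets (maximal sensitivity).
    shared = set.intersection(*(kmers(t, k) for t in targets))
    # Remove anything that also occurs in a neighbor (specificity).
    for neighbor in neighbors:
        shared -= kmers(neighbor, k)
    return shared
```

Processing one neighbor at a time, as in the loop above, keeps only a single neighbor's k-mer set in memory at once, which loosely mirrors the memory behaviour the abstract reports for the new version (proportional to the longest neighbor sequence rather than the whole neighborhood).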

July 22, 2025
15:40-15:50
Whole-genome detection and origin identification of orphan genes in plant-parasitic nematodes
Confirmed Presenter: Ercan Seçkin, Institut Sophia Agrobiotech, INRAE
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Ercan Seçkin, Institut Sophia Agrobiotech
  • Etienne Danchin, Institut Sophia Agrobiotech
  • Dominique Colinet, Institut Sophia Agrobiotech
  • Edoardo Sarti, Inria d'Université Côte d'Azur

Presentation Overview:

Genes with no known homologs constitute 5% to 30% of every organism’s genome. These orphan genes have either rapidly diverged from a family or have appeared de novo from a previously non-coding region. Their detection, origin identification, and structural characterization are challenging, and evidence about the nature of de novo genes seems to be strongly species-dependent. In root-knot nematodes (Meloidogyne), orphan genes have been linked to parasitic functions, and are thus of great agronomical interest. Starting from recently sequenced whole genomes of eight species of Meloidogyne, we use comparative homology, transcriptomics and proteomics for robust detection of orphan genes. Then, we rely on ancestral sequence reconstruction strategies and synteny approaches for identifying their origin. We find that 19% of all orphan genes are most likely to be de novo, and 30% divergent. Taking an equilibrated subset, we perform protein structure prediction with AlphaFold2, ESMFold and OmegaFold, and find that all three structure predictors produce low-confidence predictions. This result does not seem to be caused by increased intrinsic disorder in orphan proteins (which we calculated with AIUPred and flDPnn), but rather by the low similarity between the query orphan sequences and the training sets of the structure predictors. The dataset is thus a challenging, homology-free benchmark for structure, disorder, and emergence prediction.

July 22, 2025
15:50-16:00
Construction and Analysis of the Moniliophthora roreri pangenome
Confirmed Presenter: Isabella Gallego, Center for Nuclear Energy in Agriculture, University of São Paulo
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Isabella Gallego, Center for Nuclear Energy in Agriculture
  • Diego Mauricio Riaño-Pachón, Center for Nuclear Energy in Agriculture

Presentation Overview:

Moniliophthora roreri, the causal agent of frosty pod rot, is a devastating fungal pathogen affecting cacao production across Latin America. Its broad host range, ecological adaptability, and high pathogenicity underscore the need to understand its genomic diversity to inform disease management strategies. Here, we present a comprehensive pangenome analysis of 24 publicly available M. roreri genomes using two state-of-the-art graph-based methods: Minigraph-Cactus and PGGB.

Graph-based approaches allow us to integrate structural variation and genome-wide sequence diversity into a unified representation. The resulting pangenomes were used to classify genes into core, accessory, and strain-specific categories, revealing genomic features likely associated with adaptation and pathogenicity. Functional annotation was performed with HMMER and PANNZER2, and enriched Gene Ontology terms were identified for each gene category using the topGO and REVIGO tools, offering insight into biological processes specific to different parts of the genome.

The study also includes a comparative analysis between our graph-based pangenomes and a previously constructed orthology-based version. This evaluation uses metrics such as genome completeness, representation of structural variants, core/accessory gene content, and computational performance. Our findings demonstrate the value of graph-based methods in capturing the genomic complexity of fungal pathogens and provide a foundation for future research into the molecular basis of virulence and host adaptation in M. roreri.
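The core/accessory/strain-specific split mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes a simple presence/absence rule (present in all genomes, in exactly one, or in some intermediate number); the abstract does not state the thresholds the authors used, and the function name and toy data below are hypothetical.

```python
from collections import Counter

def classify_genes(presence, n_genomes):
    """Label each gene family by how many genomes contain it.

    Assumed rule (not taken from the abstract): all genomes -> core,
    exactly one genome -> strain-specific, anything else -> accessory.
    """
    labels = {}
    for gene, genomes in presence.items():
        count = len(set(genomes))  # ignore duplicate hits in the same genome
        if count == n_genomes:
            labels[gene] = "core"
        elif count == 1:
            labels[gene] = "strain-specific"
        else:
            labels[gene] = "accessory"
    return labels

# Hypothetical toy input: gene family -> genomes in which it was found.
presence = {
    "famA": ["g1", "g2", "g3"],
    "famB": ["g1", "g3"],
    "famC": ["g2"],
}
labels = classify_genes(presence, n_genomes=3)
summary = Counter(labels.values())  # number of families per category
```

Real pangenome studies often relax the "all genomes" rule (for example, a soft-core threshold) to tolerate assembly gaps; that choice would replace the equality test above.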

July 22, 2025
16:40-17:00
Proceedings Presentation: Recomb-Mix: fast and accurate local ancestry inference
Confirmed Presenter: Yuan Wei, University of Central Florida, United States
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Yuan Wei, University of Central Florida
  • Degui Zhi, University of Texas Health Science Center at Houston
  • Shaojie Zhang, University of Central Florida

Presentation Overview:

Motivation: The availability of large genotyped cohorts brings new opportunities for revealing the high-resolution genetic structure of admixed populations via local ancestry inference (LAI), the process of identifying the ancestry of each segment of an individual haplotype. Though current methods achieve high accuracy in standard cases, LAI is still challenging when reference populations are more similar (e.g., intra-continental), when there are many reference populations, or when the admixture events are deep in time, all of which are increasingly unavoidable in large biobanks.

Results: In this work, we present a new LAI method, Recomb-Mix. Recomb-Mix integrates elements of existing methods built on the site-based Li and Stephens model and introduces a new graph-collapsing trick that simplifies counting paths with the same ancestry label readout. Through comprehensive benchmarking on various simulated datasets, we show that Recomb-Mix is more accurate than existing methods in a diverse set of scenarios while being competitive in terms of resource efficiency. We expect that Recomb-Mix will be a useful method for advancing genetics studies of admixed populations.

Availability and Implementation: The implementation of Recomb-Mix is available at https://github.com/ucfcbb/Recomb-Mix.
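For readers unfamiliar with site-based Li and Stephens-style inference, the following Python sketch shows the general idea under simplifying assumptions: the query haplotype is explained as a mosaic of labeled reference haplotypes, with a fixed cost per mismatch and per template switch, and the ancestry labels are read off the cheapest path. This is a generic illustration, not the Recomb-Mix algorithm; the graph-collapsing step is not reproduced, and the function name, cost values, and toy data are hypothetical.

```python
def infer_local_ancestry(query, panel, labels, mismatch_cost=1.0, switch_cost=2.0):
    """Return one ancestry label per site via a minimum-cost copying path."""
    n_sites = len(query)
    n_refs = len(panel)
    # cost[h]: cheapest path ending at the current site while copying reference h
    cost = [mismatch_cost * (panel[h][0] != query[0]) for h in range(n_refs)]
    back = []  # back[s - 1][h]: reference copied at site s - 1 on the best path to (s, h)
    for s in range(1, n_sites):
        best_prev = min(range(n_refs), key=lambda h: cost[h])
        new_cost, pointers = [], []
        for h in range(n_refs):
            stay = cost[h]
            switch = cost[best_prev] + switch_cost
            if stay <= switch:
                prev, base = h, stay
            else:
                prev, base = best_prev, switch
            new_cost.append(base + mismatch_cost * (panel[h][s] != query[s]))
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest path back from the last site to the first.
    h = min(range(n_refs), key=lambda r: cost[r])
    path = [h]
    for pointers in reversed(back):
        h = pointers[h]
        path.append(h)
    path.reverse()
    return [labels[h] for h in path]

# Hypothetical toy panel: one reference haplotype per population.
panel = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
labels = ["POP_A", "POP_B"]
query = [0, 0, 0, 1, 1, 1]
ancestry = infer_local_ancestry(query, panel, labels)
```

Because only the best previous state matters when switching, each site costs time linear in the number of reference haplotypes, which is the usual reason this formulation scales to large panels.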

July 22, 2025
17:00-17:20
WINDEX: A hierarchical integration of site- and window-based statistics for modeling the footprint of positive selection
Confirmed Presenter: Hannah Snell, Center for Computational Molecular Biology, Brown University
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Hannah Snell, Center for Computational Molecular Biology
  • Scott McCallum, Department of Mathematics and Computer Science
  • Dhruv Raghavan, Department of Computer Science
  • Ritambhara Singh, Center for Computational Molecular Biology
  • Sohini Ramachandran, Center for Computational Molecular Biology
  • Lauren Sugden, Department of Mathematics and Computer Science

Presentation Overview:

In genetics studies, scientists search for mutations that explain changes in phenotype or population diversity. Adaptive mutations, or mutations that increase in frequency by conferring a fitness benefit, leave behind statistical signals in genetic data that genome-wide scans for selection can reveal. Computational methods have improved the localization of adaptive mutations in genetic samples using machine learning techniques. However, these methods fail to account for the effect of linkage disequilibrium on localization and miss the opportunity to incorporate statistics at varying resolutions. Leveraging statistics in both individual sites and local genetic windows allows us to capture features of positive selection footprints due to changes in allele frequencies, haplotypes, or site-frequency spectra (SFS). Our proposed method, WINDEX, aims to combine these differing resolutions of statistics with a hierarchical hidden Markov model architecture to improve the prediction of positively selected loci among hitchhiking signals. WINDEX contains site- and window-dependent latent states corresponding to neutral, linked, and adaptive regions. This structure uses the information provided by both statistical resolutions to make classifications, capturing a broader range of signals left by a positive selective sweep. WINDEX shows strong performance with 99% accuracy in artificially generated sequences, and competitive performance against baselines such as SWIF(r) and a multi-layer perceptron (MLP). WINDEX is currently being tested on canonical positive selection sites in the human genome using data from the 1000 Genomes Project. Overall, WINDEX provides the opportunity to incorporate the full range of existing selection statistics to improve localization and understand the footprint of positive selection.
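To make the distinction between site-level and window-level statistics concrete, here is a small Python sketch that computes one of each from a 0/1 haplotype matrix: the derived-allele frequency per site and nucleotide diversity per non-overlapping window. These are standard summary statistics chosen for illustration; the abstract does not list which statistics WINDEX uses, and the function names and toy data are hypothetical.

```python
def site_frequencies(haplotypes):
    """Derived-allele frequency at each site (rows = haplotypes, alleles 0/1)."""
    n = len(haplotypes)
    return [sum(column) / n for column in zip(*haplotypes)]

def window_diversity(haplotypes, window_size):
    """Nucleotide diversity (mean pairwise differences) per non-overlapping window."""
    n = len(haplotypes)
    per_site = []
    for column in zip(*haplotypes):
        k = sum(column)  # number of derived alleles at this site
        # fraction of haplotype pairs that differ at this site
        per_site.append(2.0 * k * (n - k) / (n * (n - 1)))
    return [
        sum(per_site[start:start + window_size])
        for start in range(0, len(per_site), window_size)
    ]

# Hypothetical toy sample: 4 haplotypes, 4 biallelic sites.
haplotypes = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
site_stats = site_frequencies(haplotypes)
window_stats = window_diversity(haplotypes, window_size=2)
```

A hierarchical model such as the one described would consume both resolutions together; this sketch only shows how the two kinds of input differ in granularity.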

July 22, 2025
17:20-17:30
Position-specific evolution in transcription factor binding sites, and a fast likelihood calculation for the F81 model
Confirmed Presenter: Pavitra Selvakumar, The Institute of Mathematical Sciences, India
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List:

  • Pavitra Selvakumar, The Institute of Mathematical Sciences
  • Rahul Siddharthan, The Institute of Mathematical Sciences

Presentation Overview:

Transcription factor binding sites (TFBS), like other DNA sequences, evolve via mutation and selection relating to their function. Models of nucleotide evolution describe DNA evolution via single-nucleotide mutation. A stationary vector of such a model is the long-term distribution of nucleotides, unchanging under the model. Neutrally evolving sites may have uniform stationary vectors, but one expects that sites within a TFBS instead have stationary vectors reflective of the fitness of various nucleotides at those positions. We introduce 'position-specific stationary vectors' (PSSVs), the collection of stationary vectors at each site in a TFBS locus, analogous to the position weight matrix (PWM) commonly used to describe TFBS. We infer PSSVs for human TFs using two evolutionary models (Felsenstein 1981 and Hasegawa-Kishino-Yano 1985). We find that PSSVs reflect the nucleotide distribution from PWMs, but with reduced specificity. We infer ancestral nucleotide distributions at individual positions and calculate 'conditional PSSVs' conditioned on specific choices of majority ancestral nucleotide. We find that certain ancestral nucleotides exert a strong evolutionary pressure on neighbouring sequence while others have a negligible effect. Finally, we present a fast likelihood calculation for the F81 model on moderate-sized trees that makes this approach feasible for large-scale studies along these lines.
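The notion of a stationary vector can be checked numerically for the F81 model, whose transition probabilities have the closed form P_ij(t) = e^(-βt) δ_ij + (1 - e^(-βt)) π_j. The Python sketch below builds that matrix and verifies that π is left unchanged by it. It illustrates the standard F81 formula only, not the authors' fast likelihood calculation; the example stationary vector is hypothetical, and β is scaled so that one unit of t corresponds to one expected substitution per site.

```python
import math

def f81_transition_matrix(pi, t):
    """Transition probabilities P[i][j](t) under F81 with stationary vector pi."""
    # Scale so that one unit of t equals one expected substitution per site.
    beta = 1.0 / (1.0 - sum(p * p for p in pi))
    decay = math.exp(-beta * t)
    n = len(pi)
    return [
        [decay * (i == j) + (1.0 - decay) * pi[j] for j in range(n)]
        for i in range(n)
    ]

# Hypothetical stationary vector for one TFBS position, in the order A, C, G, T.
pi = [0.7, 0.1, 0.1, 0.1]
P = f81_transition_matrix(pi, t=0.5)

# Stationarity check: the row vector pi multiplied by P should give pi again.
pi_after = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
row_sums = [sum(row) for row in P]
```

A position-specific stationary vector, as described in the abstract, would amount to using a different `pi` at each site of the binding-site locus.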

July 22, 2025
17:30-18:00
Panel: Concluding remarks
Track: EvolCompGen: Evolution & Comparative Genomics

Room: 11A
Format: In person

Authors List: Show