Posters - Schedules

Monday, July 24, between 18:00 CEST and 19:00 CEST
Tuesday, July 25, between 18:00 CEST and 19:00 CEST
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 24, between 08:00 CEST and 08:45 CEST
Session A Posters dismantle:
Monday, July 24, at 19:00 CEST
Session B Poster Set-up and Dismantle
Session B Posters set up:
Tuesday, July 25, between 08:00 CEST and 08:45 CEST
Session B Posters dismantle:
Tuesday, July 25, at 19:00 CEST
Wednesday, July 26, between 18:00 CEST and 19:00 CEST
Session C Poster Set-up and Dismantle
Session C Posters set up:
Wednesday, July 26, between 08:00 CEST and 08:45 CEST
Session C Posters dismantle:
Wednesday, July 26, at 19:00 CEST
Virtual
C-218: Comparative genomics study of indigenous cattle for identification of innate immunity related traits using NGS methods.
Track: EvolCompGen
  • Shukrruthi Iyengar, SASTRA Deemed University, India
  • Katari Aakanksha, SASTRA Deemed University, India
  • Ragothaman Yennamalli, SASTRA Deemed University, India
  • Suneel Onteru, National Dairy Research Institute, India
  • Dheer Singh, National Dairy Research Institute, India
  • Menaka Thambiraja, SASTRA Deemed To Be University, India


Presentation Overview:

Indigenous cattle are better adapted to local production environments than exotic cattle because they evolved under specific agro-climatic conditions. This adaptation is attributed to the complex interaction of many multigene families that undergo the birth-and-death process and control the immune system in vertebrates, resulting in the evolution of new genetic systems that enhance fitness, reproduction, and drug resistance. This is verified by a chromosome-by-chromosome, genome-wide comparative analysis between Bos indicus breeds (Nelore, Gir, Kangeyam, Tharparkar, and Sahiwal), an exotic breed (Holstein-Friesian), and a hybrid breed (Karan-Fries). In this project, a chromosome-by-chromosome comparative genomics study of the existing genomes and the above-mentioned breeds was performed to obtain a more accurate map of copy number variations (CNVs). Using the NGS genomic data obtained for the indigenous cattle, we compare the CNVs from the comparative genomic workflow using GSAlign, SyRI, and Manta to identify genes, and their associated loci, that are involved in immunity. We performed the following comparisons: the exotic breed versus the hybrid and the five indigenous breeds, and the hybrid versus the five indigenous breeds. We correlate the outcome with the available QTL data to identify other loci associated with variation in quantitative traits in the phenotype of the breeds.

C-219: Comparative Genomics study of Bos Genome
Track: EvolCompGen
  • Menaka Thambiraja, SASTRA Deemed to be University, India
  • Ragothaman M Yennamalli, SASTRA Deemed to be University, India
  • Shukrruthi K Iyengar, SASTRA Deemed to be University, India
  • Brintha Satishkumar, SASTRA Deemed to be University, India
  • Sai Rohith Kavuru, SASTRA Deemed to be University, India
  • Aakanksha Katari, SASTRA Deemed to be University, India
  • Suneel K Onteru, National Dairy Research Institute, India
  • Kamlesh Kumari Bajwa, National Dairy Research Institute, India
  • Dheer Singh, National Dairy Research Institute, India


Presentation Overview:

Indigenous cattle in India are known for being economical to manage compared with exotic breeds, owing to their evolution under specific agroclimatic conditions. Their adaptation to harsh climatic conditions and their resistance are attributed to the birth-and-death evolution model. One way to identify genomic variations is to catalog copy number variations (CNVs), yet the relationship between CNVs and the innate immunity of indicine cattle has not been in focus. We performed a genome-wide comparative analysis of the existing genomes of B. indicus (Nelore breed), B. indicus (Gir breed), and B. taurus. Using the SyMap, GSAlign, and SyRI tools, we performed a chromosome-by-chromosome analysis of these genomes and identified evolutionary-based sequence variations: 97.39% SNVs, 2.06% insertions, and 0.54% deletions between B. taurus and B. indicus (Nelore breed); 91.5% SNVs, 4.2% insertions, and 4.24% deletions between B. taurus and B. indicus (Gir breed); and 93.01% SNVs, 3.2% insertions, and 3.8% deletions between B. indicus (Nelore breed) and B. indicus (Gir breed). In addition, we studied intrachromosomal variation, which involved comparing autosomes with allosomes. The results identify the key genes and their associated loci involved in innate immunity in each breed.

C-220: L-shaped distribution of the relative substitution rate (c/μ) observed for SARS-CoV-2’s genome, inconsistent with the selectionist theory, the neutral theory and the nearly neutral theory but a near-neutral balanced selection theory
Track: EvolCompGen
  • Chun Wu, Rowan University, United States
  • Nicholas Paradis, Rowan University, United States
  • Phillip Lakernick, Rowan University, United States
  • Mariya Hryb, Rowan University, United States


Presentation Overview:

The COVID-19 pandemic, which has claimed over 6 million lives, is caused by SARS-CoV-2. Understanding the evolutionary nature of this virus is critical to elucidating its origin and to updating vaccines and therapeutics to mitigate the pandemic. Yet all three existing evolutionary theories (the Selectionist Theory/ST, Kimura’s Neutral Theory/KNT and Ohta’s Nearly Neutral Theory/ONNT) fail to explain the evolutionary nature of this virus. In this study, we propose a new hybrid theory between ST and a nearly neutral theory to explain the observed genomic features of this virus: the Near-Neutral Balanced Selection Theory (NNBST). For the first time, NNBST explains the molecular-clock feature of a time-independent GSR through a balanced selection mechanism rather than the neutral mechanism that has been the mainstream belief over the last 60 years. In other words, the higher substitution rates of genomic segments (e.g., genes) under positive selection are balanced out by the lower substitution rates of genomic segments under negative selection, leading to an apparently time-independent GSR under apparently neutral selection. Our relative substitution rate method provides a tool to resolve the long-standing “neutralist-selectionist” controversy. Implications of NNBST for resolving Lewontin’s Paradox are also discussed.

C-221: Robust and platform-independent CNA calling with ASCAT v3
Track: EvolCompGen
  • Tom Lesluyes, The Francis Crick Institute, United Kingdom
  • Maxime Tarabichi, Université Libre de Bruxelles, Belgium
  • Kerstin Haase, Max Delbrück Center for Molecular Medicine, Germany
  • Jonas Demeulemeester, VIB – KU Leuven Center for Cancer Biology, Belgium
  • Peter Van Loo, The University of Texas MD Anderson Cancer Center, United States


Presentation Overview:

Tumour initiation and evolution are fuelled by somatic changes, ranging from single nucleotide variants to whole-genome aberrations. As a key process, copy-number alterations (CNAs) have been an important field of investigation for decades and helped uncover insights in terms of diagnosis, prognosis and treatment. Therefore, obtaining accurate CNA calls is crucial for understanding tumour biology.

In 2010, ASCAT was proposed as an allele-specific CNA caller, resolving tumour purity and ploidy from array data. Since then, we have aimed to extend ASCAT, enabling efficient and robust CNA calling from sequencing data, ranging from small targeted panels to whole genomes. We considered TCGA/ICGC/PCAWG cases with patient-matched SNP6, WES and WGS data for validation and compared agreement between CNA profiles. Also, we propose metrics of interest for quality control of ASCAT results. Such features and other improvements are now available in ASCAT v3.

Furthermore, powerful bioinformatics methods allow delving deeper into tumour biology using sequencing data, but they require accurate purity and CNA estimates. We demonstrate how ASCAT and other tools enable characterising tumour evolution and heterogeneity.

All combined, ASCAT accurately assesses somatic and allele-specific copy-number changes in cancer genomes, making it a central entry point into uncovering key aspects of tumour biology.

C-222: SHERLOG: a full homology-based synteny block detection program
Track: EvolCompGen
  • Daehong Kwon, Konkuk University, South Korea
  • Nayoung Park, Konkuk University, South Korea
  • Suyeon Wy, Konkuk University, South Korea
  • Kisang Kwon, Konkuk University, South Korea
  • Youngbeen Moon, Konkuk University, South Korea
  • Hyeonji Kim, Konkuk University, South Korea
  • Jaebum Kim, Konkuk University, South Korea


Presentation Overview:

Many alignment-based synteny block detection programs have been successfully applied to diverse studies of comparative genomics for searching syntenic relationships between genomes without any gene annotation information. However, most of them focus on searching one-to-one homologous relationships among regions in different genomes. This leads to difficulties in accurate analysis for regions related to genome duplication as well as phased genomes which have multiple copies for each chromosome. We developed a synteny block detection program, called SHERLOG. Given a whole genome alignment among genomes, SHERLOG can find more than one synteny block for a single genomic region if it is homologous to multiple regions in other related genomes. In addition, SHERLOG does not rely on a reference genome and supports phased genome assemblies. In evaluation, SHERLOG showed good performance for diverse sizes of genomes as well as a wide range of resolutions. SHERLOG is freely available at https://github.com/jkimlab/SHERLOG.

C-223: Machine learning enables prediction of metabolic system evolution in bacteria
Track: EvolCompGen
  • Naoki Konno, The University of Tokyo, Japan
  • Wataru Iwasaki, The University of Tokyo, Japan


Presentation Overview:

Evolution prediction is a long-standing goal in evolutionary biology, with potential impacts on strategic pathogen control, genome engineering, and synthetic biology. While laboratory evolution studies have shown the predictability of short-term and sequence-level evolution, that of long-term and system-level evolution has not been systematically examined. Here, we show that the gene content evolution of metabolic systems is generally predictable by applying ancestral gene content reconstruction and machine learning techniques to ~3000 bacterial genomes. Our framework, Evodictor, successfully predicted gene gain and loss evolution at the branches of the reference phylogenetic tree, suggesting that evolutionary pressures and constraints on metabolic systems are universally shared. Investigation of pathway architectures and meta-analysis of metagenomic datasets confirmed that these evolutionary patterns have physiological and ecological bases as functional dependencies among metabolic reactions and bacterial habitat changes. Last, pan-genomic analysis of intraspecies gene content variations proved that even "ongoing" evolution in extant bacterial species is predictable in our framework.

C-224: PAPipe: a pipeline for comprehensive population genetic analysis
Track: EvolCompGen
  • Nayoung Park, Konkuk University, South Korea
  • Hyeonji Kim, Konkuk University, South Korea
  • Jaebum Kim, Konkuk University, South Korea


Presentation Overview:

The availability of population genetic variant data resulting from advances in next-generation sequencing (NGS) technologies has led to the development of various population analysis tools to enhance our understanding of population structure and evolution. Currently available tools for analyzing population genetic variant data generally require different environments, parameters, and input data formats, which is a barrier to their widespread use by researchers not familiar with bioinformatics. To address this problem, we developed PAPipe, an automated and comprehensive pipeline that performs seven widely used population genetic analyses on population NGS data. PAPipe seamlessly interconnects and serializes multiple steps, such as read mapping, genetic variant calling, data filtering, and format conversion, in addition to the seven population genetic analyses: population structure analysis, linkage disequilibrium decay analysis, fixation index analysis, population admixture analysis, principal component analysis, phylogenetic analysis, and pairwise sequentially Markovian coalescent analysis. PAPipe generates extensive results that provide clues and insights, improving user convenience and data usability. PAPipe is intended for use on the Linux operating system, and Docker image files are also provided to reduce the difficulty of setting up the environment.

C-225: Joint copy number and mutation phylogeny reconstruction from single-cell amplicon sequencing data
Track: EvolCompGen
  • Etienne Sollier, DKFZ, Germany
  • Jack Kuipers, ETH Zürich, Switzerland
  • Koichi Takahasi, MD Anderson, United States
  • Niko Beerenwinkel, ETH Zürich, Switzerland
  • Katharina Jahn, FU Berlin, Germany


Presentation Overview:

Reconstructing the history of somatic DNA alterations can help understand the evolution of a tumor and predict its resistance to treatment. Single-cell DNA sequencing (scDNAseq) can be used to investigate clonal heterogeneity and to inform phylogeny reconstruction. However, most existing phylogenetic methods for scDNAseq data are designed either for single nucleotide variants (SNVs) or for large copy number alterations (CNAs), or are not applicable to targeted sequencing. Here, we develop COMPASS, a computational method for inferring the joint phylogeny of SNVs and CNAs from targeted scDNAseq data. COMPASS assigns a likelihood to trees of somatic events based on a probabilistic model and uses a Markov Chain Monte Carlo approach to search for the best tree. It is applicable to targeted sequencing datasets where the coverage is not uniform across regions, and scales to datasets of more than 10,000 cells. We evaluate COMPASS on simulated data and apply it to several datasets including a cohort of 123 patients with acute myeloid leukemia. COMPASS detected clonal CNAs that could be orthogonally validated with bulk data, in addition to subclonal ones that require single-cell resolution, some of which point toward convergent evolution.

C-226: Generative Adversarial Networks for the Simulation of DNA Sequence Evolution
Track: EvolCompGen
  • Sean MacRae, McGill University, Canada
  • Mathieu Blanchette, McGill University, Canada


Presentation Overview:

Motivation: Sequence evolution models are at the heart of bioinformatics. They play a crucial role in many of its fundamental problems, including sequence alignment, phylogenetic inference, and ancestral genome reconstruction. While mutation rates at a sequence position are known to depend on its flanking positions, accurately incorporating these context dependencies into realistic sequence evolution models remains challenging.

Results: We propose the first generative adversarial network (GAN) approach to automatically learn, in an unsupervised manner, the parameters and weights involved in modeling context-dependent DNA sequence evolution. We exploit a long short-term memory network architecture for both the generator and critic, trained within the framework of a conditional Wasserstein GAN with gradient penalty. We show that the model captures contextual sequence information using various small context sizes. Different strategies to stabilize and accelerate training are discussed. We believe these results open the door for the exploration of more complex network architectures that leverage the state-of-the-art in both GAN and natural language processing research.

C-227: Homology-based genome annotation conversion tool
Track: EvolCompGen
  • Jeongmin Oh, Konkuk University, South Korea
  • Nayoung Park, Konkuk University, South Korea
  • Jaebum Kim, Konkuk University, South Korea


Presentation Overview:

Recent advances in sequencing technologies, especially accurate long-read generation, have accelerated the generation and improvement of various genome assemblies. Therefore, converting the well-studied and validated genome annotation of an old assembly to a newly generated one is important. Here, homology between two genome assemblies can be used as an additional clue for resolving ambiguity when matching genome assemblies, leading to an increase in the number of conversions. This motivated us to develop a new method for converting genome annotation between two genome assemblies by utilizing their syntenic relationship. Our method first predicts syntenic regions between two given assemblies. Syntenic regions containing query coordinates in one assembly are then identified, and the target coordinates in the other assembly are obtained using the syntenic information. For the query coordinates remaining unconverted, syntenic regions overlapping with them are detected, and the collinearity information of those syntenic regions is used for additional conversion. The performance of our method was compared with LiftOver using structural variation (SV) annotations obtained from the DGV database. Our method converted 6% more SVs in the human hg19 assembly to the hg38 assembly than LiftOver, and 95% of converted SVs were matched with the DGV data.
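
As a rough illustration of the first conversion step described above (finding the syntenic region that contains a query coordinate and reading off the target coordinate), the following Python sketch maps a position by its offset within a block. The `SyntenyBlock` fields, the `convert` helper, the gap-free linear-offset rule and the example coordinates are all assumptions made for illustration; the abstract does not state how the tool actually computes target coordinates.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class SyntenyBlock:
    """Hypothetical gap-free syntenic block (1-based, inclusive coordinates)."""
    q_chrom: str   # chromosome in the old (query) assembly
    q_start: int
    q_end: int
    t_chrom: str   # chromosome in the new (target) assembly
    t_start: int
    t_end: int
    same_strand: bool = True


def convert(chrom: str, pos: int,
            blocks: Sequence[SyntenyBlock]) -> Optional[Tuple[str, int]]:
    """Map one query position to the target assembly, or None if no block contains it."""
    for b in blocks:
        if b.q_chrom == chrom and b.q_start <= pos <= b.q_end:
            offset = pos - b.q_start
            # Assumption: blocks have equal length and no internal gaps,
            # so the offset carries over directly (mirrored on the opposite strand).
            if b.same_strand:
                return b.t_chrom, b.t_start + offset
            return b.t_chrom, b.t_end - offset
    return None


# Made-up blocks, for illustration only.
blocks = [
    SyntenyBlock("chr1", 1000, 2000, "chr1", 1500, 2500),
    SyntenyBlock("chr2", 500, 900, "chr2", 3000, 3400, same_strand=False),
]
print(convert("chr1", 1200, blocks))
print(convert("chr2", 600, blocks))
print(convert("chr3", 10, blocks))
```

Positions that fall outside every block return `None` here; in the abstract's description, such unconverted coordinates are the ones handled by the second, collinearity-based step, which this sketch does not attempt.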

C-228: Update of ALTS for Inferring Phylogenetic Networks from Gene Trees
Track: EvolCompGen
  • Louxin Zhang, National University of Singapore, Singapore


Presentation Overview:

Phylogenetic trees have been used to model the evolution of species for over 200 years. However, phylogenetic networks are more useful than trees for describing and visualizing recombination, hybridization and horizontal gene transfer events. Since the space of phylogenetic networks is much larger than the space of phylogenetic trees, it is extremely challenging to infer phylogenetic networks from gene trees and from sequence data.

In a recent work (Zhang, Abhari, Colijn and Wu, RECOMB 2023, arXiv:2301.00992), Zhang et al. introduced a parsimonious method named ALTS for inferring phylogenetic networks. It is based on a reduction from the minimum phylogenetic network inference problem to the shortest common supersequence problem when tree-child networks are inferred. Here, an updated version of the ALTS program is presented. The new version is several times faster than the original version and can be applied to any tree set containing up to 70 trees on 50 taxa.
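
For readers unfamiliar with the shortest common supersequence (SCS) problem named above, the Python sketch below solves it for two strings with the textbook dynamic program. This is not the ALTS algorithm: ALTS works with many sequences derived from trees, where SCS is NP-hard in general, and the function names and example strings here are chosen only for illustration.

```python
def shortest_common_supersequence(a: str, b: str) -> str:
    """Return one shortest string that contains both a and b as subsequences."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the SCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n:
                dp[i][j] = m - j
            elif j == m:
                dp[i][j] = n - i
            elif a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = 1 + min(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to rebuild one optimal supersequence.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] <= dp[i][j + 1]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return "".join(out)


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)


scs = shortest_common_supersequence("abac", "cab")
print(scs, len(scs))
```

For two sequences the SCS length equals the sum of their lengths minus the length of their longest common subsequence, which is why the pairwise case is tractable.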

C-229: InParanoiDB 9: Ortholog groups for protein domains and full-length proteins
Track: EvolCompGen
  • Emma Persson, Stockholm University, Sweden
  • Erik Sonnhammer, Stockholm University, Sweden


Presentation Overview:

Prediction of orthologs is an important bioinformatics pursuit, frequently used for inferring protein function and evolutionary analyses. The InParanoid database is a well known resource of ortholog predictions between a wide variety of organisms. Although orthologs have historically been inferred at the level of full-length protein sequences, many proteins consist of several independent protein domains that may be orthologous to domains in other proteins than the full-length protein ortholog. To be able to capture all types of orthologous relations, conventional full-length protein orthologs can be complemented with orthologs inferred at the domain level. We present InParanoiDB 9, covering 640 species, with orthologs for protein domains and full-length proteins. InParanoiDB 9 was built using the faster InParanoid-DIAMOND algorithm, as well as Domainoid and Pfam to infer orthologous domains. InParanoiDB 9 is based on proteomes from 447 eukaryotes, 158 bacteria and 35 archaea, and includes over one billion ortholog groups. A new website has been built for the database, with new search options and visualizations. This release constitutes a major upgrade of the InParanoid database in terms of species coverage, and in the capability to operate on the domain level. InParanoiDB 9 is available at https://inparanoidb.sbc.su.se/.

C-230: A global analysis of somatic selection on different copy number states of driver genes in human tumors
Track: EvolCompGen
  • Elizaveta Besedina, IRB Barcelona, Spain
  • Fran Supek, IRB Barcelona, ICREA, Spain


Presentation Overview:

Detecting selection in genes during cancer evolution is crucial for understanding tumor progression, identifying driver genes and potential therapeutic targets. However, current methods for detecting selection using somatic mutation data face challenges due to mutation rate heterogeneity and confounding by copy number alterations (CNAs).

We present MutMatch, a statistical methodology that estimates variability in selection strength across different conditions and possible interactions, and quantifies its statistical significance. MutMatch stringently controls for local mutation rate heterogeneity, gene dosage confounding, and trinucleotide mutation signatures by deriving a mutation rate baseline from neighboring non-selected genes or from non-constrained regions within the same gene. We applied MutMatch to study selection specific to different CNA states on driver genes in human cancers using large-scale sequencing datasets. We identified positive selection specific to CNA states on driver genes. On the other hand, various oncogenes and essential genes showed signatures of negative selection, which can confound the search for positive selection in other regions of these genes. We characterized the landscape of CNA-dependent selection in cancer genes, identifying at least four independently varying trends. This provides a systematic classification of mechanisms of oncogene activation and tumor suppressor inactivation by combinations of mutation and CNA in human tumors.

C-231: Unraveling the Contributions of Three Distinct Paths of Protein Evolution across the Tree of Life
Track: EvolCompGen
  • Rubén García Domínguez, Centro Andaluz de Biología del Desarrollo - CSIC, Spain
  • Damien Devos, Centro Andaluz de Biología del Desarrollo - CSIC, Spain


Presentation Overview:

Protein evolution involves three distinct paths, which likely involve different molecular mechanisms: the creation of new domains, recombination of existing domains to form new architectures, and the repetition of existing domains leading to tandem repeat proteins (TPRs). Our study investigated the contributions of these three paths to protein evolution across the tree of life (ToL), using Pfam domains and measuring the number of different domains, the recombination rate (pairs of domains/domains), and the proportion of repeat architectures.

Our findings demonstrate significant variations in the relative contributions of these three paths across the ToL. Eukaryotes have expanded all three paths, with notable differences among different phyla. In contrast, Archaea have reduced contributions to all three mechanisms at all levels, despite being expected to have an intermediary position. Moreover, Bacteria exhibit significant differences in their contributions to these three mechanisms. For instance, Proteobacteria have increased their domain repertoire by inventing many novel domains, while Planctomycetes have mostly relied on recombining existing domains. Additionally, Planctomycetes and Cyanobacteria have a greater contribution of repeat proteins than any other prokaryotic phylum.

These results suggest that protein evolution has been differentially explored during organismal divergence, leading to varying contributions of each path across the ToL.
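
The three measurements named in this abstract (number of different domains, recombination rate as pairs of domains per domain, and proportion of repeat architectures) can be sketched in Python as follows. The operational definitions are assumptions: a "pair" is taken to be an unordered pair of different adjacent domains, and a "repeat architecture" is one with the same domain at least twice in a row. The `domain_metrics` function and the placeholder domain names are hypothetical and may differ from what the authors computed.

```python
from typing import Dict, Sequence


def domain_metrics(architectures: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Summarise a proteome given each protein's ordered list of domain IDs."""
    domains = set()
    pairs = set()
    repeat_architectures = 0
    for arch in architectures:
        domains.update(arch)
        has_repeat = False
        for left, right in zip(arch, arch[1:]):
            if left == right:
                has_repeat = True          # assumed definition of a tandem repeat
            else:
                pairs.add(frozenset((left, right)))  # unordered adjacent pair
        repeat_architectures += has_repeat
    n_domains = len(domains)
    n_arch = len(architectures)
    return {
        "n_domains": n_domains,
        "recombination_rate": len(pairs) / n_domains if n_domains else 0.0,
        "repeat_proportion": repeat_architectures / n_arch if n_arch else 0.0,
    }


# Placeholder domain names, not real Pfam accessions.
proteome = [
    ["DomA", "DomB"],
    ["DomB", "DomA", "DomC"],
    ["DomD", "DomD", "DomD"],
    ["DomC"],
]
print(domain_metrics(proteome))
```

Computing these three numbers per phylum would give the kind of comparison the abstract describes, although the normalisation used in the study is not stated here.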

C-232: Evolution and taxonomic distribution of siderophores across the bacterial kingdom
Track: EvolCompGen
  • Bita Pourmohsenin, University of Tuebingen, Interfaculty Institute of Microbiology and Infection Medicine Tübingen (IMIT), Germany
  • Davide Paccagnella, University of Tuebingen, Interfaculty Institute of Microbiology and Infection Medicine Tübingen (IMIT), Germany
  • Nadine Ziemert, University of Tuebingen, Interfaculty Institute of Microbiology and Infection Medicine Tübingen (IMIT), Germany
  • Zach Reitz, Wageningen University, Bioinformatics Group, Netherlands
  • Marnix Medema, Wageningen University, Bioinformatics Group, Netherlands
  • Marta Franco de Benito, IDENER Research & Development, Biotechnology Applications, Spain
  • Manuel Salvador de Lara, IDENER Research & Development, Biotechnology Applications, Spain


Presentation Overview:

Siderophores are small organic molecules with the ability to bind and transport iron, which have applications in various disciplines, including medicine, agriculture, and environmental sciences. Microorganisms such as bacteria produce and secrete siderophores as part of their secondary metabolism to scavenge iron from their environment. Like most bacterial secondary metabolites, siderophores are encoded within biosynthetic gene clusters (BGCs), groups of closely located genes that encode enzymes responsible for their synthesis and regulation.
In order to gain a better understanding of the distribution and evolution of siderophore biosynthesis in the bacterial kingdom, we have collected a comprehensive database of all experimentally validated siderophore-producing BGCs. Thereafter, in an exhaustive search, based on Hidden Markov models, we identified all the potential siderophore producers in the GTDB tree of bacteria and conducted a phylogenetic analysis on various genes involved in the biosynthesis of iron-chelating moieties to unravel their evolutionary history. This approach allows us to identify the diversity of siderophore producers across the bacterial kingdom and trace the evolutionary history of the genes involved.
Moreover, this database can be used for identifying core and accessory genes involved in siderophore biosynthesis, which can be exploited in mix-and-match models to design new-to-nature compounds.

C-233: Clustering diverse Canadian carbapenemase plasmids using MOB-suite tools
Track: EvolCompGen
  • Nicole Lerminiaux, Public Health Agency of Canada, Canada
  • Robyn Mitchell, Public Health Agency of Canada; Canadian Nosocomial Infection Surveillance Program, Canada
  • Jessica Bartoszko, Public Health Agency of Canada; Canadian Nosocomial Infection Surveillance Program, Canada
  • Ian Davis, Nova Scotia Health Authority; Canadian Nosocomial Infection Surveillance Program, Canada
  • Chelsey Ellis, Horizon Health Network; Canadian Nosocomial Infection Surveillance Program, Canada
  • Ken Fakharuddin, Public Health Agency of Canada, Canada
  • Susy Hota, University Health Network; Canadian Nosocomial Infection Surveillance Program, Canada
  • Kevin Katz, North York General Hospital; Canadian Nosocomial Infection Surveillance Program, Canada
  • Pamela Kibsey, Island Health; Canadian Nosocomial Infection Surveillance Program, Canada
  • Jerome Leis, Sunnybrook Health Sciences Centre; Canadian Nosocomial Infection Surveillance Program, Canada
  • Yves Longtin, McGill University Health Centre; Canadian Nosocomial Infection Surveillance Program, Canada
  • Allison McGeer, Mount Sinai Hospital; Canadian Nosocomial Infection Surveillance Program, Canada
  • Jessica Minion, Saskatchewan Health Authority; Canadian Nosocomial Infection Surveillance Program, Canada
  • Michael Mulvey, Public Health Agency of Canada; Canadian Nosocomial Infection Surveillance Program, Canada
  • Sonja Musto, Health Sciences Centre; Canadian Nosocomial Infection Surveillance Program, Canada
  • Ewa Rajda, McGill University Health Centre; Canadian Nosocomial Infection Surveillance Program, Canada
  • Stephanie Smith, Alberta Health Services; Canadian Nosocomial Infection Surveillance Program, Canada
  • Jocelyn Srigley, BC Children's Hospital; Canadian Nosocomial Infection Surveillance Program, Canada
  • Kathryn Suh, The Ottawa Hospital; Canadian Nosocomial Infection Surveillance Program, Canada
  • Nisha Thampi, Children's Hospital of Eastern Ontario; Canadian Nosocomial Infection Surveillance Program, Canada
  • Jen Tomlinson, Health Sciences Centre; Canadian Nosocomial Infection Surveillance Program, Canada
  • Titus Wong, Vancouver Coastal Health; Canadian Nosocomial Infection Surveillance Program, Canada
  • Laura Mataseje, Public Health Agency of Canada; Canadian Nosocomial Infection Surveillance Program, Canada


Presentation Overview:

Plasmids are extra-chromosomal DNA elements in bacteria that can transfer horizontally across strains, species, and genera. This mobility is concerning when plasmids encode genes that promote dissemination of antimicrobial resistance, such as carbapenemases. Plasmids typically have repetitive regions and undergo frequent recombination events, which makes genome assembly and comparative genomics challenging. Here, we used an existing program, MOB-suite [Robertson & Nash 2018, DOI:10.1099/mgen.0.000206], to cluster, compare, and predict diverse plasmids associated with 829 Klebsiella pneumoniae carbapenemase (KPC)-harbouring isolates across Canada. We found that the MOB-cluster default Mash distance clustering threshold resulted in 28 plasmid clusters that broadly reflected incompatibility (Inc) groups. For isolates with an incomplete carbapenemase plasmid assembly (n=635), we reconstructed the carbapenemase plasmids using MOB-recon with a custom plasmid database and predicted that 95% (n=605) of isolates had plasmids that grouped in the 28 clusters. We identified different patterns of carbapenemase mobilization across Canada related to different plasmid clusters, including clonal transmission of IncF plasmids (n=87, 10%) and horizontal transmission of IncL/M (n=142, 17%) and IncN (n=149, 18%) plasmids. MOB-suite effectively grouped plasmids using an approach that was tolerant to plasmid sequence heterogeneity, ultimately allowing us to compare diverse plasmid families and draw biologically meaningful conclusions.
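
To make the idea of clustering by a distance threshold concrete, here is a minimal Python sketch that groups items whose pairwise distance falls at or below a cutoff. It uses single linkage via union-find as an assumed linkage rule; it does not call MOB-suite, and the plasmid names, distances and the 0.06 cutoff are made-up illustrative values rather than MOB-cluster's actual settings or data.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_by_threshold(names: Sequence[str],
                         distances: Dict[Tuple[str, str], float],
                         threshold: float) -> List[List[str]]:
    """Single-linkage clustering: join any two items with distance <= threshold."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        if dist <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical plasmids and pairwise distances, for illustration only.
names = ["p1", "p2", "p3", "p4"]
distances = {
    ("p1", "p2"): 0.01, ("p2", "p3"): 0.04, ("p1", "p3"): 0.08,
    ("p3", "p4"): 0.30, ("p1", "p4"): 0.35, ("p2", "p4"): 0.32,
}
print(cluster_by_threshold(names, distances, threshold=0.06))
```

Under single linkage, `p1` and `p3` end up together through `p2` even though their direct distance exceeds the cutoff; a stricter rule such as complete linkage would split them, which is one reason the choice of threshold and linkage shapes the resulting clusters.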

C-234: Genomic Evolution of the Poxviridae Family
Track: EvolCompGen
  • Christian Zmasek, J. Craig Venter Institute, United States
  • Elliot Lefkowitz, UAB School of Medicine, United States
  • Anna Niewiadomska, J. Craig Venter Institute, United States
  • Richard Scheuermann, J. Craig Venter Institute, United States


Presentation Overview:

Poxviridae is a family of double-stranded DNA viruses with relatively large genomes, usually encoding more than 150 proteins. Poxviridae contain several species that can infect humans and domestic animals, such as Variola virus (smallpox), monkeypox virus, and cowpox virus.

Here we present a systematic phylogenetic and evolutionary study based on protein domain architecture, encompassing the entire proteomes of Poxviridae. The main finding of this work is that the genomic evolution of Poxviridae is complex, with significant duplications, gains, and losses of proteins. In particular, the families of Kelch-like proteins, as well as proteins containing Ankyrin-repeats, exhibit evolutionary histories that are similar to those of eukaryotic proteins in their complex interplay of speciation and gene duplication events. On the other hand, essential proteins, such as DNA polymerase and other proteins of the replication machinery, evolved in a simpler manner, by speciation. Interestingly, proteins that interact with host proteins show the greatest variability between different species.

We used the results of this analysis to develop a novel classification system for Poxviridae proteins. Besides clustering proteins into “Strict Ortholog Groups (SOGs)” of orthologous proteins with shared domain architecture, our scheme also allows the user to quickly infer the taxonomic distribution for any Poxviridae protein.

C-235: Uncovering the Dynamics of CRISPR Array Evolution with a New Maximum Likelihood Approach
Track: EvolCompGen
  • Axel Fehrenbach, University of Tübingen, Germany
  • Alexander Mitrofanov, University of Freiburg, Germany
  • Omer Alkhnbashi, King Fahd University of Petroleum and Minerals, Saudi Arabia
  • Rolf Backofen, University of Freiburg, Germany
  • Franz Baumdicker, University of Tübingen, Germany


Presentation Overview:

The CRISPR-Cas technology has revolutionized gene-editing by allowing precise and efficient editing of DNA sequences. Initially discovered within bacteria and archaea, CRISPR-Cas serves as a powerful immune system that effectively defends against foreign invaders by incorporating short snippets of DNA, called spacers, into the CRISPR array within the cell’s genome. Notably, insertions occur at one end of the CRISPR array, therefore they provide a chronology of foreign invasions. The inserted spacers are utilized to identify and eliminate matching foreign DNA during subsequent invasions. CRISPR arrays rapidly evolve due to spacer insertions and deletions.
Commonly used tools for ancestral reconstruction are unsuitable for CRISPR arrays as they do not consider the insertion order.
We introduce SpacerPlacer, a tool that utilizes probabilistic models of CRISPR array evolution and a maximum-likelihood approach to reconstruct ancestral states of a group of CRISPR arrays while respecting the insertion order.
With SpacerPlacer we analyzed a large database of CRISPR arrays to estimate their evolutionary behavior and compare between CRISPR types and different species. Interestingly, we found that spacer deletions are not more frequent at the back end of the array and that multiple spacers are likely to be lost in blocks rather than exclusively individually.

C-236: SonicParanoid2: fast, accurate and comprehensive orthology inference with machine learning and language models
Track: EvolCompGen
  • Salvatore Cosentino, Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan, Japan
  • Wataru Iwasaki, Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan, Japan


Presentation Overview:

Accurate inference of orthologous genes constitutes a prerequisite for various genomic and evolutionary studies. SonicParanoid is one of the fastest tools for orthology inference; however, its scalability and sensitivity have been hampered by time-consuming all-versus-all alignments and the existence of proteins with complex domain architectures. Here, we report an update of SonicParanoid in which machine learning is used to overcome these two limitations. An AdaBoost classifier reduced execution time for the all-versus-all alignment by up to 42% without negative effects on accuracy. A Doc2Vec neural network model enabled orthology inference at the domain level and increased the number of predicted orthologs by one-third. Evaluation on standardized benchmark datasets and a large dataset of 2,000 MAGs showed that SonicParanoid2 is up to 18X faster and more scalable than other orthology-inference tools, and comparably accurate to well-established methods.

C-237: A deep generative model for deciphering the relationship between intratumor clonal structure and epigenomic heterogeneity
Track: EvolCompGen
  • Taichi Hibi, Division of Systems Biology, Nagoya University Graduate School of Medicine, Japan
  • Yasuhiro Kojima, Laboratory of Computational Life Science, National Cancer Center Research Institute, Japan
  • Teppei Shimamura, Division of Systems Biology, Nagoya University Graduate School of Medicine, Japan


Presentation Overview:

The mitochondrial single-cell assay for transposase-accessible chromatin with sequencing (mtscATAC-seq) allows for simultaneous high-throughput mitochondrial DNA (mtDNA) genotyping and accessible chromatin profiling. This technique facilitates the identification of mtDNA somatic mutations and subclonal heterogeneity within a tumor while providing information on cell state and accessible chromatin for each individual cell. However, current analytical methods deterministically assign each cell to a single subclone based solely on mtDNA somatic mutation information, without considering the potential for stochastic assignment to subclones that may be influenced by epigenetic states. Here, we propose a deep generative model based on the Variational Autoencoder (VAE) to decipher the probabilistic relationships between mtDNA genotype and epigenetic state. This approach uses the chromatin profiles of each cell to infer the epigenetic state and estimate the probability of assignment to subclones based on the epigenetic state and mtDNA genotype. We applied this method to mtscATAC-seq data on 5,610 B cells from a patient with chronic lymphocytic leukemia (CLL) and identified the unique epigenetic signatures of the specific subclone. Our method will allow us to identify malignant subpopulations within subclones and their epigenetic characteristics, which will further advance the study of intratumor heterogeneity.

C-238: UFCG: database of universal fungal core genes and pipeline for genome-wide phylogenetic analysis of fungi
Track: EvolCompGen
  • Dongwook Kim, Seoul National University, South Korea
  • Cameron L.M. Gilchrist, Seoul National University, South Korea
  • Jongsik Chun, Seoul National University, South Korea
  • Martin Steinegger, Seoul National University, South Korea


Presentation Overview:

In phylogenomics, the evolutionary relationships of organisms are studied using their genomic information. A common approach to phylogenomics is to extract related genes from each organism, build a multiple sequence alignment and then reconstruct evolutionary relationships through a phylogenetic tree. Often a set of highly conserved genes occurring in single copy, called core genes, is used for this analysis, as they allow efficient automation within a taxonomic clade. Here we introduce the Universal Fungal Core Genes (UFCG) database and pipeline for genome-wide phylogenetic analysis of fungi. The UFCG database consists of 61 curated fungal marker genes, including a novel set of 41 computationally derived core genes and 20 canonical genes derived from literature, as well as marker gene sequences extracted from publicly available fungal genomes. Furthermore, we provide an easy-to-use, fully automated and open-source pipeline for marker gene extraction, training and phylogenetic tree reconstruction. The UFCG pipeline can identify marker genes from genomic, proteomic and transcriptomic data, while producing phylogenies consistent with those previously reported, and is publicly available together with the UFCG database at https://ufcg.steineggerlab.com.

C-239: Predicting Cancer-Protective Variants using comparative genomics
Track: EvolCompGen
  • Yuval Tabach, The Hebrew University-Hadassah Medical School, Israel
  • Lamis Naddaf, The Hebrew University-Hadassah Medical School, Israel


Presentation Overview:

Cancer is a leading cause of mortality. While much work has been done to identify germline and somatic mutations that increase the risk of cancer, there has been little understanding of genomic variants that reduce cancer risk. Systematically identifying cancer-protecting genetic variations is almost impossible in the human population, because it requires massive genomic and clinical data that are currently very limited. However, there are notable differences in cancer rates among animal species, some of which display almost total immunity to the disease. Using comparative genomics, we identify genetic variants that are distinctive to cancer-resistant rodents and predict cancer risk across mammals. In the human population, we identify 1,000 Resistant Alleles (SNPs) with lower prevalence among cancer patients. We validated two R-Alleles and show reduced cancerous characteristics in cancer cell cultures, with no noticeable phenotypic changes in healthy human cell cultures. Overall, we generated a cross-species map of R-alleles that are correlated with cancer resistance across various species. These findings have significant implications for understanding the evolution of cancer resistance, understanding the genotype-phenotype relationship, and improving cancer risk assessment, diagnosis, prognosis, and the discovery of protective drugs.

C-240: Clade Identification and Understanding Evolutionary Trajectory of Candida auris through Genome Rearrangements
Track: EvolCompGen
  • Pavitra Selvakumar, The Institute of Mathematical Sciences, (HBNI), Chennai, Tamil Nadu, India
  • Rahul Siddharthan, The Institute of Mathematical Sciences, (HBNI), Chennai, Tamil Nadu, India
  • Aswathy Narayanan, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, Karnataka, India
  • Kaustuv Sanyal, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, Karnataka, India


Presentation Overview:

Candida auris, a multidrug-resistant human fungal pathogen, has emerged and evolved as different clades across the globe in the past decade. C. auris clinical strains exhibit clade-specific features associated with virulence and drug resistance. The molecular events leading to the rapid emergence are yet to be understood. Here, chromosomal rearrangements among C. auris clades and related species are investigated with primary focus on centromeres, to understand its evolutionary trajectory. Centromeres, known to be the hotspots of breaks and downstream rearrangements, are identified using a combined approach of chromatin immunoprecipitation and comparative genomic analysis. We find that C. auris and multiple other species in the Clavispora/Candida clade share a conserved small regional GC-poor centromeric landscape that lacks pericentromeres and repeats. A centromere inactivation event has led to karyotypic alterations in the species complex. It is observed that one of the geographical clades, the East Asian Clade, has evolved along a unique trajectory compared to other clades and related species. Consequent to this rapid evolution, recently reported strains are indicating cross identification within the previously defined four distinct geographical clades. A rapid and specific colony PCR-based clade identification system (CLaID) is developed using unique DNA sequence junctions conserved in a clade-specific manner.

C-241: Identification of genes, cell types, and phenotypes under selective pressure across primate evolution
Track: EvolCompGen
  • Kitty Murphy, UK Dementia Research Institute at Imperial College London, United Kingdom
  • Brian Schilder, UK Dementia Research Institute at Imperial College London, United Kingdom
  • Nathan Skene, UK Dementia Research Institute at Imperial College London, United Kingdom


Presentation Overview:

Comparative genomics can help to decipher the evolutionary origins of human traits and disease susceptibility. Using a multiple sequence alignment of 27 primate genomes, we detected genome-wide selective pressures acting at key speciation events across primate evolution. To interpret these genome-wide selection signatures, we performed cell type enrichment tests using single-cell transcriptomic signatures of 100s of different cell types across multiple organ systems. Positive selection in higher order primates is associated with ependymal cells and subventricular zone radial glia-like cells, respectively, both of which have been associated with neurogenesis and cortical size expansion. We also connect positive selection in great apes to perivascular macrophages, a cell type associated with Alzheimer’s disease. Negative selection is associated with neuronal cell types, rather than glial. We also evaluated the enrichment of genes under selective pressure using the Human Phenotype Ontology and genome-wide association studies. Complex trait variants are predominantly enriched within negatively selected genes, whereas in rare disease-associated phenotypes we identify signatures of positive selection. Our study provides a rich resource of genome-wide selective pressures across primate evolution, and links these pressures to cell types, disease risk, and clinically relevant phenotypes, providing insights into the molecular basis of human traits and disease susceptibility.

C-242: GRITIC sheds light on the evolution of copy number gains in genome doubled tumors
Track: EvolCompGen
  • Toby Baker, The Francis Crick Institute, London, United Kingdom
  • Siqi Lai, Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States
  • Stefan Dentro, DKFZ, Heidelberg, Germany
  • Maxime Tarabichi, Institute for Interdisciplinary Research (IRIBHM), Université Libre de Bruxelles, Brussels, Belgium
  • Peter Van Loo, Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States


Presentation Overview:

Tumors frequently have a high degree of copy number instability and often contain genomic regions that have undergone a series of genomic gains resulting in multiple copies of both alleles. This is particularly common in tumors that have undergone a whole genome duplication (WGD).

The relative timing of gains in such complex copy number regions is complicated by the fact that there are multiple plausible evolutionary histories that could give rise to the final copy number state, with the most parsimonious history often assumed. Here we describe a method, GRITIC, that overcomes this problem by inferring the likelihood of all possible routes and associated gain timings that lead to a complex state.

By applying GRITIC to 5718 tumors with a WGD, we measure an average posterior probability of 31.7% on non-parsimonious route histories across complex copy number states. As this was measured with a penalty on non-parsimony, we are likely underestimating the true amount of non-parsimonious evolution in tumor development.

GRITIC allows for a more accurate and complete inference of evolutionary histories in different cancer types and better insights into the early copy number events in genomically unstable tumors.

C-243: A quartet-based approach for inferring phylogenetically informative features from genomic and phenomic data
Track: EvolCompGen
  • Vivian B. Brandenburg, Bioinformatics Group, Ruhr-University Bochum, Germany
  • Ben Luis Hack, Bioinformatics Group, Ruhr-University Bochum, Germany
  • Axel Mosig, Bioinformatics Group, Ruhr-University Bochum, Germany


Presentation Overview:

The quest to infer phylogenetic trees from genomic sequences is at the heart of biology and has been the subject of computational methods developments in the field of molecular evolution over several decades. Building on well-established phylogenetic trees for many domains of life, our present work addresses a different, but related question: Given phenotypic data across a group of organisms which are related along an established phylogenetic tree, how can we infer phenotypic characters in the data that are phylogenetically informative? In the presented work, we introduce a neural network based approach to identify features in genomic or phenotypic data that evolve along a given phylogenetic tree. Our methodological main contribution is a loss function that reinforces learning of tree-compliant features by operating on quartets of taxa. Our approach builds on quartet theory as the well-established foundation of distance-based phylogeny. As our main result, we provide a proof-of-concept for the approach by demonstrating that a neural network, when trained using quartet loss, can learn features from bacterial rRNA sequences that evolve along a known phylogeny of the underlying bacterial species.
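
To make the quartet idea concrete, the sketch below scores one quartet of feature vectors against the four-point condition, which for a tree metric and topology ab|cd requires d(a,b) + d(c,d) <= d(a,c) + d(b,d) = d(a,d) + d(b,c). The abstract does not give the authors' loss function, so the hinge-plus-imbalance form, the Euclidean distance, and the names `quartet_loss` and `margin` are assumptions made for illustration.

```python
import math


def quartet_loss(fa, fb, fc, fd, margin=0.0):
    """Penalty for feature vectors of taxa a, b, c, d under topology ab|cd.

    Zero when the pairwise distances satisfy the four-point condition
    for ab|cd (with the requested margin), positive otherwise.
    """
    s_ab_cd = math.dist(fa, fb) + math.dist(fc, fd)
    s_ac_bd = math.dist(fa, fc) + math.dist(fb, fd)
    s_ad_bc = math.dist(fa, fd) + math.dist(fb, fc)
    # The sum for the true split must be the smallest of the three sums.
    violation = max(0.0, s_ab_cd - min(s_ac_bd, s_ad_bc) + margin)
    # The two remaining sums must be equal for a tree metric.
    imbalance = abs(s_ac_bd - s_ad_bc)
    return violation + imbalance


if __name__ == "__main__":
    a, b, c, d = (0.0,), (1.0,), (10.0,), (11.0,)
    print(quartet_loss(a, b, c, d))  # features agree with ab|cd
    print(quartet_loss(a, c, b, d))  # features contradict ac|bd
```

In a training setting the same expression would be written with a differentiable tensor library and averaged over quartets sampled from the reference tree; plain Python is used here only to keep the example self-contained.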

C-244: Improving genome variation calls from non-human sequencing data using machine learning
Track: EvolCompGen
  • Jeonghoon Choi, Pusan National University, South Korea
  • Bo Zhou, Stanford University, United States
  • Gwanghoon Jung, Pusan National University, South Korea
  • Minsu Kim, Pusan National University, South Korea
  • Donggil Kang, Pusan National University, South Korea
  • Giltae Song, Pusan National University, South Korea


Presentation Overview:

DeepVariant is a pipeline for accurate genome variation detection using convolutional neural networks that incorporate known genotype information [1]. Although the DeepVariant tool is one of the most popular tools for calling genome variation, its performance for non-human genome data remains suboptimal due to some preprocessing steps that rely on population genome datasets such as indel realignment and base recalibration, which are not available for non-human genomes.

To resolve this issue, we propose an approach based on machine learning for filtering out false positive genome variation calls generated by DeepVariant for non-human sequencing data. To mimic non-human genome variation calling situations, we skip the preprocessing steps for genome alignments. To identify false positives among variations called by DeepVariant, we train a model using genome data for which ground-truth variation calls have already been determined, such as HG002 from Genome In A Bottle (GIAB) [2]. For building the model, we apply decision-tree-based ensemble approaches. We evaluate our model using other genomes for which ground-truth variation calls are available. We expect that our model can accelerate genome variation studies of non-human species.

C-245: Efficient homology-based annotation of transposable elements using minimizers
Track: EvolCompGen
  • Laura Natalia González García, Universidad de los Andes, Université de Montpellier, Colombia
  • Daniela Lozano Arce, Universidad de los Andes, Colombia
  • Juan Pablo Londoño, Universidad de los Andes, Colombia
  • Romain Guyot, Université de Montpellier, France
  • Jorge Duitama, Universidad de los Andes, Colombia


Presentation Overview:

Transposable elements (TEs) make up more than half of the genomes of complex plant species and can modulate the expression of neighboring genes, producing significant variability of agronomically relevant traits. The availability of long-read sequencing technologies allows the building of genome assemblies for plant species with large and complex genomes. Unfortunately, TE annotation currently represents a bottleneck in the annotation of genome assemblies. We present a new functionality of the Next-Generation Sequencing Experience Platform (NGSEP) to perform efficient homology-based TE annotation. Sequences in a TE reference library are treated as long reads and mapped to an input genome assembly using minimizers. A hierarchical annotation is then assigned by homology using the annotation of the reference library. We tested the performance of our algorithm on genome assemblies of different plant species, including Arabidopsis thaliana, Oryza sativa, Coffea humblotiana, and Triticum aestivum (bread wheat). Our algorithm outperforms traditional homology-based annotation tools in speed by a factor of three to >20, reducing the annotation time of the T. aestivum genome from months to hours, and recovering up to 80% of TEs annotated with RepeatMasker with a precision of up to 0.95.
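
For readers unfamiliar with minimizers, the following sketch shows the basic idea: from every window of w consecutive k-mers, keep only the smallest one, then use the shared minimizers as anchors between a query and a target. This is a generic illustration rather than NGSEP's implementation; real mappers usually order k-mers by a hash instead of lexicographically, and the function names and parameter values are assumptions.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs chosen as (w, k)-minimizers."""
    kms = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kms) - w + 1):
        window = kms[start:start + w]
        # Leftmost smallest k-mer in the window (lexicographic order).
        offset = min(range(w), key=lambda i: window[i])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


def shared_minimizers(query, target, k, w):
    """Return (query_pos, target_pos) anchors where minimizers match."""
    index = {}
    for pos, km in minimizers(target, k, w):
        index.setdefault(km, []).append(pos)
    hits = []
    for qpos, km in minimizers(query, k, w):
        for tpos in index.get(km, []):
            hits.append((qpos, tpos))
    return hits


if __name__ == "__main__":
    print(minimizers("ACGTTGCA", k=3, w=2))
    print(shared_minimizers("CGTTG", "ACGTTGCA", k=3, w=2))
```

Anchors that share the same offset (target position minus query position) lie on one diagonal and suggest a single collinear match, which is what a mapper would then chain and extend.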

C-246: A Probabilistic Programming Approach to Investigate the Coevolution of Genes and Phenotypes in Birds
Track: EvolCompGen
  • Viktor Senderov, Ecole normale supérieure, France
  • Amaury Lambert, École Normale Supérieure, France
  • Marie Manceau, Collège de France, France
  • Carole Desmarquet, Collège de France, France
  • Caitlyn Jean-Baptiste, Ecole normale supérieure, France
  • Ingrid Lafontaine, Sorbonne Université, France
  • Hélène Morlon, Ecole normale supérieure, France


Presentation Overview:

Understanding the molecular basis of phenotypic evolution is essential, yet quantitative tools for exploring this association are limited. We present progress on a novel phylogenetic tool using TreePPL, a universal probabilistic programming language, for identifying genomic regions coevolving with phenotypes at the macroevolutionary scale. Our Monte-Carlo inference-based computational framework allows the detection of simultaneous evolution between DNA sequences and phenotypes across a phylogeny. Simulations demonstrate the ability to identify simultaneous versus independent evolutionary events.

We apply our framework to study the molecular basis of bird color patterns, using a dataset comprising high-resolution images and measurements of bird color patterns and homologous sequences of pattern formation genes (e.g., agouti). By applying our new phylogenetic tool to these data, we aim to detect simultaneous evolution between DNA sequences and phenotypes and discover associations between development and evolution. This will provide a proof of concept for our phylogenetic approach's ability to detect genes underlying phenotypic differences, with potential applications to other genes and systems. Our approach should offer a valuable tool for the scientific community.

C-247: Vesicular transport proteins in Asgard archaea
Track: EvolCompGen
  • Deepak Yadav, Department of Computational Biology, University of Lausanne, Switzerland
  • Dirk Fasshauer, Department of Computational Biology, University of Lausanne, Switzerland


Presentation Overview:

A eukaryotic cell is defined by its membrane-bound organelles. They help define the distinct compartments, which perform different metabolic roles. The exchange of metabolites between different compartments is controlled by vesicular transport. This transport must be selective to maintain proper functioning of a eukaryotic cell. The selectivity is maintained by several proteins, such as SNARE, Rab and Arf proteins. The recent discovery of Asgard archaea has bridged the gap between prokaryotes and eukaryotes. Asgard archaea code for a range of Eukaryotic Signature Proteins (ESPs), including proteins involved in vesicle transport. Here, we study the phylogenetic relationship of Arf-like proteins of Asgard archaea and eukaryotic Arf proteins. Arf-like proteins in Asgard archaea were extracted using hidden Markov models (HMMs) of eukaryotic Arf proteins. The clustering results show that Asgard proteins do not follow the same classification as eukaryotes. The phylogenetic analysis illustrates that Asgard clades are distant from eukaryotic clades. The clades of Arf-like proteins in Asgard archaea display sub-patterns, which are unique to them. Overall, our results align with the theory of Asgard archaea as a “bridge” between eukaryotes and archaea. In future work, the project will be extended to other archaeal superphyla, such as TACK, which also contain a few ESPs.

C-248: Multiple RNA tree Robinson-Foulds Phylogeny
Track: EvolCompGen
  • Yoann Anselmetti, University of Sherbrooke, Canada
  • Aïda Ouangraoua, University of Sherbrooke, Canada


Presentation Overview:

Over the last three decades, many methods have been developed to predict the secondary structure of ncRNAs and build accurate ncRNA multiple sequence alignments accounting for their secondary structure. But until now, only a few algorithms and methods have been designed to study the evolution of ncRNA secondary structure, and none of them allows reconstruction of the complete evolutionary history of the secondary structures of an ncRNA family.
In this talk, we consider the Small Parsimony and the Large Parsimony problems for families of ncRNAs whose secondary structures are represented as trees. For these two optimization problems, we have designed heuristic solutions under the Robinson-Foulds (RF) tree metric model. We study the theoretical complexity of the problems under the RF distance model, as well as the tree edit distance model, and provide efficient algorithmic solutions for the two problems under the two tree distance models. The study of the evolution of ncRNA structures has the potential to lead to interesting insights for therapeutic targeting of ncRNAs based on the comparison of their structures involved in metabolic pathways in different species, thus combining genomic, transcriptomic and metabolomic information.
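
As background for the distance model named above, the sketch below computes the classical Robinson-Foulds distance between two rooted, leaf-labeled trees as the number of clades found in one tree but not the other. It illustrates the metric only, not the authors' parsimony algorithms; the nested-tuple tree encoding and the function names are assumptions, and trees that encode RNA secondary structures may be labeled differently.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf labels) of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf is any non-tuple label
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def rf_distance(t1, t2):
    """Robinson-Foulds distance: size of the symmetric difference of clade sets."""
    return len(clades(t1) ^ clades(t2))


if __name__ == "__main__":
    t1 = (("a", "b"), ("c", "d"))
    t2 = (("a", "c"), ("b", "d"))
    print(rf_distance(t1, t2))
```

In the Small Parsimony setting the tree shape relating the family members is fixed and ancestral structures are chosen to minimize the summed distance along its branches, whereas Large Parsimony also searches over that tree shape.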

C-250: A2TEA: Identifying trait-specific evolutionary adaptations
Track: EvolCompGen
  • Florian Boecker, Crop Bioinformatics Uni Bonn, Germany
  • Heiko Schoof, University of Bonn, Germany
  • Tyll Stöcker, University of Bonn, Germany
  • Carolin Uebermuth-Feldhaus, University of Bonn, Germany


Presentation Overview:

We explore genetic diversity and uncover evolutionary adaptation to stresses by exploiting genome comparisons and RNA-Seq data sets from stress experiments in plants.
Expanded gene families that are stress-responsive based on differential expression analysis could hint at species or clade-specific adaptation traits, making these gene families especially interesting candidates for follow-up tolerance studies and crop improvement.
To address the integration, transformation, filtering and visualization of such cross-species omics data, we developed A2TEA: Automated Assessment of Trait-specific Evolutionary Adaptations, a Snakemake workflow for detecting adaptation footprints in silico. It functions as a one-stop processing pipeline, integrating phylogeny, expression, and protein function analysis. The pipeline is accompanied by an R Shiny Web Application that allows exploring, highlighting, and exporting the results interactively. This allows the user to formulate hypotheses regarding the genomic adaptations of one or a subset of the investigated species to a given stress.
While our research focus is on crops, the pipeline is completely independent of the underlying species and can be used with any set of species.

C-251: Deciphering Genetic Heterogeneity through Multi-sample Analysis of Glioblastoma
Track: EvolCompGen
  • Aaron Gillmor, University of Calgary, Canada
  • Heewon Seo, University of Calgary, Canada
  • Marco Gallo, University of Calgary, Canada
  • Sorana Morrissy, University of Calgary, Canada


Presentation Overview:

Glioblastoma is an aggressive and heterogeneous brain tumor with a poor prognosis. In this study, we have developed novel methods to reconstruct the evolutionary history of glioblastoma using multiple samples from individual patients. We collected a diverse set of samples, generated mutational input, and used the phylogenetic reconstruction tool Pairtree to infer the phylogenetic architecture of glioblastoma. We analyzed the phylogenetic output within each sample and patient, examining the number of clones, branch points, overall diversity, and functional pathways. We found that multiple samples were necessary to accurately assess tumor evolution. We identified positive correlations between the number of clones, branch points, and samples. Recurrent tumors exhibited increased clonal diversity compared to primary tumors, and branching phylogenies were associated with longer survival. Using linked reads, we validated 88% of clones and 78% of branches that were inferred by Pairtree. Additionally, linked reads allowed us to identify the existence of a novel branch in five patients. Functional analysis identified recurring pathways that drive tumor progression, such as microtubule modification, ATP-related pathways and mitotic spindle formation, among others. Overall, this study highlights the importance of studying genetic heterogeneity to improve our understanding of glioblastoma.

C-252: Improved interpretability of bacterial genome-wide associations using gene cluster centric k-mers
Track: EvolCompGen
  • Hannes Neubauer, Twincore/Hannover Medical School (MHH), Germany
  • Marco Galardini, Twincore/Hannover Medical School (MHH), Germany


Presentation Overview:

The wide adoption of bacterial genome sequencing and encoding both core and accessory genome variation using k-mers has allowed bacterial genome-wide association studies (GWAS) to identify genetic variants associated with relevant phenotypes such as those linked to infection. Significant limitations still remain as far as the interpretation of association results is concerned, which affects the wider adoption of GWAS methods on microbial datasets. We have developed a simple computational method (panfeed) that explicitly links each k-mer to its gene cluster at base resolution level, which allows us to avoid biases introduced by a global de Bruijn graph as well as more easily map and annotate associated variants. We tested panfeed on two independent datasets, correctly identifying previously characterized causal variants, which demonstrates the precision of the method, as well as its scalable performance. panfeed is a command line tool written in the Python programming language and available at https://github.com/microbial-pangenomes-lab/panfeed.

C-253: Fast and performant pipeline for coevolutionary analysis of eukaryotic genes
Track: EvolCompGen
  • Giulia Sassi, University of Parma-Department of Chemistry, Life Sciences and Environmental Sustainability, Italy
  • Carlo De Rito, University of Parma-Department of Chemistry, Life Sciences and Environmental Sustainability, Italy
  • Riccardo Percudani, University of Parma-Department of Chemistry, Life Sciences and Environmental Sustainability, Italy


Presentation Overview:

Coevolution can be used to predict gene function, as in the case of phylogenetic profiling (PP) methods. PP delineates, with binary (presence/absence) vectors, the evolutionary association among genes. As an alternative to assessing global similarity between profiles, we have recently described the co-transition (cotr) analysis as a method to score and determine the significance of correlated transitions between gene pairs across phylogenetically ordered genomes (https://doi.org/10.1073/pnas.2218329120). Cotr analysis can find coevolutionary associations even among genes with low profile similarity. We propose an extended procedure as a first step to investigate the influence of orthologous group (OG) construction on the coevolutionary analysis. The process consists of: 1) EukProt species subselection, 2) Broccoli orthology inference, 3) ClustalΩ intra-OG multiple sequence alignment (MSA), 4) hmmbuild for HMM construction from each MSA, 5) hmmsearch to compare each HMM against OrthoDB sequences and to recover orthologs in 1929 species, 6) PP building and metric analysis through cotr analysis. Our in-depth pipeline is able to build the OGs from scratch and assign significant coevolutionary scores (adjusted P-values < 10⁻³) to 36,541 co-transitions between gene pairs in a manageable time. This analysis revealed novel coevolutionary associations and testable gene functions.
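
The notion of a co-transition can be sketched as follows: each presence/absence profile is converted into a vector of gains and losses between adjacent genomes in the phylogenetic ordering, and transitions that coincide between two genes are counted. This is a simplified illustration only; the scoring and significance testing of the published cotr method are described in the cited paper and are not reproduced here, and the function names are assumptions.

```python
def transitions(profile):
    """Gain (+1), loss (-1) or no change (0) between adjacent ordered genomes."""
    return [b - a for a, b in zip(profile, profile[1:])]


def cotransitions(p, q):
    """Count coinciding transitions of two profiles.

    Returns (concordant, discordant): both genes gained or both lost at the
    same step, versus one gained while the other was lost.
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same ordered genomes")
    concordant = discordant = 0
    for tp, tq in zip(transitions(p), transitions(q)):
        if tp and tq:
            if tp == tq:
                concordant += 1
            else:
                discordant += 1
    return concordant, discordant


if __name__ == "__main__":
    gene_a = [1, 1, 0, 0, 1, 1, 0]
    gene_b = [1, 1, 0, 0, 1, 0, 0]
    print(cotransitions(gene_a, gene_b))
```

Because only the steps where both profiles change are counted, two genes can score well here even when their overall profiles differ at many positions, which is the property the abstract highlights.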

C-254: msyd: Identifying whole-genome population synteny
Track: EvolCompGen
  • Leon Rauschning, Ludwig Maximilian University and Technical University of Munich, Germany
  • Manish Goel, Ludwig Maximilian University and Max Planck Institute for Plant Breeding Research, Germany
  • Korbinian Schneeberger, Ludwig Maximilian University and Max Planck Institute for Plant Breeding Research, Germany


Presentation Overview:

Synteny is a core concept in molecular genetics, but is frequently not integrated into population genetic, phylogenetic and functional genomic analyses, despite the need to ensure that the loci being compared correspond to one another and are able to recombine.
Identifying synteny across multiple genomes is currently done on a per-gene basis, discarding the information present in nongenic regions of the genome.
Additional challenges for this approach are paralogs and undiscovered structural variation.
We present msyd, a software tool and Python library for merging pairwise synteny callsets generated by SyRI from pairwise whole-genome alignments into a population-level multisynteny callset, taking into account structural variation and covering the entire genome.
msyd can also aggregate VCF files to identify and merge corresponding SNPs and indels, and filter for variants syntenic across many samples for meaningful comparison.
Using msyd, we identify multisynteny in the A. thaliana population dataset published by Jiao and Schneeberger (2020) and the Human Genome Structural Variation Consortium dataset and identify variants for further genomic analyses.

C-255: Elucidating the genomic and transcriptomic landscapes of primary and metastatic Paediatric Medulloblastoma
Track: EvolCompGen
  • Ana Isabel Castillo Orozco, Research Institute of the McGill University Health Center, Canada
  • Masoomeh Aghababazadeh, Research Institute of the McGill University Health Center, Canada
  • Marjan Khatami, Research Institute of the McGill University Health Center, Canada
  • Niusha Khazaei, Research Institute of the McGill University Health Center, Canada
  • Geoffroy Louis Yvon Danieau, Research Institute of the McGill University Health Center, Canada
  • Livia Garzia, Research Institute of the McGill University Health Center, Canada


Presentation Overview: Show

Medulloblastoma (MB) is the most common pediatric brain tumor and is highly aggressive. It arises mainly in the cerebellum and can metastasize to the leptomeningeal space, a condition known as Leptomeningeal Disease (LMD). The presence of metastatic dissemination is a universal predictor of poor outcome among MB patients. Although LMD represents a major clinical challenge, its molecular mechanisms remain poorly characterized. Recent research has shown that primary tumors and metastases diverge dramatically. Our work has focused on establishing MB Patient-Derived Xenografts (PDXs) that faithfully replicate primary and metastatic compartments. We have performed comparative genomic analyses to profile the intertumoral LMD heterogeneity and to identify genetic drivers/pathways that sustain metastatic development. Our results show profound differences in gene expression between primary and metastatic Medulloblastoma, with various signaling pathways enriched across LMD models. We have also identified differentially expressed genes (DEG), with few gene sets shared in more than one PDX model. These findings were concordant with differential expression analysis (DEA) results and single-cell atlas data from Gene Expression Omnibus (GEO) datasets for breast and lung cancer metastatic to the leptomeninges. Our recent findings are currently being validated using functional genomics approaches. We aim to use this information to treat or prevent LMD effectively.

C-256: Genome-scale compression-based phylogeny estimation: An improved approach that uses the physicochemical properties of amino acids.
Track: EvolCompGen
  • Edward Braun, University of Florida, United States


Presentation Overview: Show

Phylogenomic analyses face several fundamental challenges. First, they should use methods that address sources of bias common to many phylogenetic datasets. Second, they should be robust to multiple sequence alignment error. Bias in phylogenetic estimation can reflect issues like long-branch attraction, but a major source of bias in phylogenomics is discordance among gene trees due to processes like incomplete lineage sorting (ILS). Distance-based phylogenetic methods can be a consistent estimator of the species tree under these conditions, raising the possibility that alignment-free distance methods could solve the challenges associated with both ILS and multiple sequence alignment error. Alignment-free approaches that calculate distances using conditional Kolmogorov complexity of the genome (or proteome) of one organism given the genome (or proteome) of another organism have been suggested but have received limited use in empirical studies. This reflects the fact that: 1) models of sequence evolution cannot be incorporated into the methods; and 2) it is difficult to estimate clade support using these methods. Herein, I provide a way to incorporate information about patterns of protein sequence evolution and estimate clade support similar to the bootstrap. The utility of this modified alignment-free distance method was demonstrated using empirical phylogenies of mammals and birds.
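
Kolmogorov complexity is not computable, so compression-based distances are usually approximated with a real compressor. The sketch below uses the normalized compression distance (NCD) with zlib as a generic stand-in; the abstract does not state which compressor or distance formula the method uses, and the simulated sequences are invented purely for illustration.

```python
import random
import zlib

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input, a crude proxy for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Simulated data (assumption): one random 500-residue sequence, a close
# relative with 10 substituted positions, and an unrelated random sequence.
rng = random.Random(0)
ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(500))
relative_residues = list(ancestor)
for pos in rng.sample(range(500), 10):
    relative_residues[pos] = rng.choice(AMINO_ACIDS)
relative = "".join(relative_residues)
unrelated = "".join(rng.choice(AMINO_ACIDS) for _ in range(500))

d_close = ncd(ancestor.encode(), relative.encode())
d_far = ncd(ancestor.encode(), unrelated.encode())
print(f"close pair: {d_close:.3f}, unrelated pair: {d_far:.3f}")
```

A matrix of such pairwise distances could then be passed to any distance-based tree builder; how amino acid properties and clade support enter the method is specific to the poster and is not reproduced here.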

C-257: PhyClone: Accurate Bayesian reconstruction of cancer phylogenies from bulk sequencing
Track: EvolCompGen
  • Emilia Hurtado, The University of British Columbia, Canada
  • Alexandre Bouchard-Côté, The University of British Columbia, Canada
  • Andrew Roth, The University of British Columbia, Canada


Presentation Overview: Show

Cancer is driven by somatic mutations that result in genomically distinct sub-populations of cells called clones. Identifying the clonal composition of tumours and understanding the evolutionary relationships between clones is crucial in cancer studies. Previous methods have limitations in inferring the phylogeny and capturing the uncertainty in mutational clustering from bulk DNA sequencing data.
Leveraging the clonal population deconvolution model of PyClone, we present an accurate, efficient, and robust method for constructing clonal phylogenies — PhyClone. It uses a novel non-parametric Bayesian prior called the Forest Structured Chinese Restaurant Process (FSCRP) to capture the underlying distribution of clusters and tree topologies. A Particle Gibbs sampler based on a novel auxiliary variable construction is used to fit PhyClone. Furthermore, through outlier modelling, PhyClone is robust to violations of the infinite sites assumption common to cancer datasets.
We demonstrate the performance of PhyClone on simulated and real-world datasets and show that it outperforms previous methods in terms of accuracy and scalability. PhyClone accurately clusters mutations into clonal groups and reconstructs their phylogenetic relationships, remaining accurate when mutations which violate the infinite sites assumption are present. PhyClone thereby presents a scalable, accurate, and robust solution to inferring clonal phylogenies from bulk sequencing data.

C-258: Probing domain architecture design using language models
Track: EvolCompGen
  • Xiaoyue Cui, Carnegie Mellon University, United States
  • Maureen Stolzer, Carnegie Mellon University, United States
  • Dannie Durand, Carnegie Mellon University, United States


Presentation Overview: Show

Multidomain proteins are mosaics of structural or functional modules, called domains. The architecture of a multidomain protein - that is, its domain composition in N- to C-terminal order - is intimately related to its function, with each module playing a distinct functional role. For example, in cell signaling proteins, distinct domains are responsible for recognition and response to a stimulus. Multidomain architectures evolve via gain and loss of domain-encoding segments. This evolutionary exploration of domain architecture composition underlies the protein diversity seen in nature.
We exploit sophisticated machine learning algorithms for natural language processing, combined with a rapidly expanding repertoire of domain architecture data, to develop a framework for investigating the forces that govern this process. We represent domain architectures as vectors in a multidimensional space by applying various information retrieval and natural language processing techniques. This system provides a basis for exploratory analysis using visualization with a nonlinear dimensionality reduction method. We further provide quantitative measures that support rigorous comparison of sets of embedded domain architectures. This framework has many applications, including investigating taxonomic differences in the domain architecture complement, identifying domain "synonyms" with similar functional roles, and exploring substructure of the world of domain architectures.
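
One simple way to picture "architectures as vectors" is a bag-of-domains TF-IDF weighting compared by cosine similarity. This is only one of many information retrieval techniques and is not necessarily among those used in the poster; the protein names and domain compositions below are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(architectures: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Map each architecture to a sparse TF-IDF vector over its domains."""
    n = len(architectures)
    doc_freq: Counter = Counter()
    for domains in architectures.values():
        doc_freq.update(set(domains))
    vectors = {}
    for name, domains in architectures.items():
        term_freq = Counter(domains)
        vectors[name] = {
            d: count * (math.log(n / doc_freq[d]) + 1.0)
            for d, count in term_freq.items()
        }
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(d, 0.0) for d, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


# Hypothetical architectures, listed in N- to C-terminal order.
architectures = {
    "protein_1": ["SH3", "SH2", "Pkinase"],
    "protein_2": ["SH2", "SH3", "Pkinase"],  # same domains, different order
    "protein_3": ["PDZ", "PDZ", "Guanylate_kin"],
}
vecs = tfidf_vectors(architectures)
print(cosine(vecs["protein_1"], vecs["protein_2"]))
print(cosine(vecs["protein_1"], vecs["protein_3"]))
```

A bag-of-domains vector ignores domain order, so protein_1 and protein_2 are indistinguishable here; since architecture is defined by N- to C-terminal order, an order-aware embedding would be needed to separate them.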

C-259: An extended super-reconciliation model with synteny cuts and transfers through unsampled or extinct lineages
Track: EvolCompGen
  • Mattéo Delabre, University of Montreal, Canada
  • Yoann Anselmetti, University of Sherbrooke, Canada
  • Nadia El-Mabrouk, University of Montreal, Canada


Presentation Overview: Show

The gene tree-species tree reconciliation framework enables the inference of evolutionary histories for gene families. Various extensions of this model have been proposed to infer the histories of gene syntenies, one of which is super-reconciliation, where a synteny tree is reconciled with a species tree.

In this work, we investigate an extended model for super-reconciliation. In addition to segmental duplications, horizontal transfers, and losses, our model accounts for synteny splits and gene gains. We explicitly model the possibility of transient transfers going through unsampled or lost lineages.

We examine the combinatorial properties of this extended model and its associated parsimony optimization problem, and introduce a polynomial-time optimization algorithm. We evaluate the algorithm’s performance against other state-of-the-art approaches using simulated datasets.

Finally, we apply our algorithm to studying the evolution of CRISPR-Cas systems. We discuss the challenges and solutions for constructing a synteny tree from sequence data and for defining event costs, two key steps necessary to running the algorithm.

C-260: Zoonosis Prediction Using Language Models
Track: EvolCompGen
  • Blessy Antony, Virginia Polytechnic Institute and State University, United States
  • Jie Bu, Virginia Polytechnic Institute and State University, United States
  • Andrew Chan, Virginia Polytechnic Institute and State University, United States
  • Anuj Karpatne, Virginia Polytechnic Institute and State University, United States
  • T. M. Murali, Virginia Polytechnic Institute and State University, United States


Presentation Overview: Show

Zoonoses are diseases that are transmitted from non-human animals to humans through the evolution of the disease-causing pathogens. Identifying which species may be infected by a novel virus is an important first step in predicting and preventing the outbreak of an infectious disease in animal and human populations. In this study, we propose a computational framework to understand the zoonotic potential of viruses. We trained a Transformer-based model on viral protein sequences to learn the language of their constituent amino acids. For this purpose, we used a collection of protein sequences from a diverse set of viruses. Further, we used one-dimensional convolution to gather local neighborhood features in viral sequences. We evaluated the performance of the proposed model in the challenging multi-class classification setting of predicting the animal hosts of a given virus sequence. The Transformer-based model yielded substantially higher AUPRC scores compared to standard machine learning classification algorithms. In ongoing research, we are developing interpretations of model results to discover the genetic mutations that may drive viral zoonoses.

C-261: Bladder Cancer Evolution: Population genetics-based modeling of Whole-Organ Mapping data
Track: EvolCompGen
  • Paweł Kuś, Department of Systems Biology and Engineering, Silesian University of Technology, Gliwice, Poland, Poland
  • Huiqin Chen, Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA, United States
  • Peng Wei, Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA, United States
  • Bogdan Czerniak, Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA, United States
  • Marek Kimmel, Department of Statistics and Bioengineering, Rice University, Houston, TX, USA, United States


Presentation Overview: Show

The widespread use of DNA sequencing experiments has enabled the development of methods to analyze cancer evolution. Population genetics-based approaches, such as MOBSTER, rely on the assumptions of exponential tumor growth and constant mutation rate to estimate the evolutionary dynamics parameters, including selection coefficients and mutation rates. Under these conditions, the frequencies of neutral mutations follow a power-law distribution with an exponent equal to 2, from which tumor mutation rates can be inferred.

We analyzed the DNA sequencing data from two bladder cancer specimens subjected to whole-organ mapping, including regions with different disease stages. Using our new R package cevomod, we fitted the power-law models to the Whole Exome Sequencing data. We observed increased mutation rates and early clonal expansions in normal urothelium samples, supporting a field-effect origin of bladder cancer. Mutation rates increased with progression toward urothelial cancer, violating the assumptions of popular models. Power-law exponents diverged from the expected value of 2 in many samples, which we mathematically linked to the non-constant tumor mutation rate.
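
To make the exponent-2 expectation concrete: if the density of neutral mutations at frequency f is proportional to f^-2, the number of mutations with frequency at least f is linear in 1/f, and in the neutral-evolution literature the slope of that line is commonly interpreted as an effective mutation rate. The sketch below checks this on synthetic frequencies; it is written in Python rather than R, is not the cevomod implementation, and all parameter values are assumptions.

```python
from typing import List, Sequence


def simulate_neutral_vafs(n: int, f_min: float, f_max: float) -> List[float]:
    """Deterministic quantiles of a density proportional to f**-2 on [f_min, f_max]."""
    inv_min, inv_max = 1.0 / f_min, 1.0 / f_max
    # Under an f**-2 density, 1/f is uniformly distributed on [1/f_max, 1/f_min].
    return [
        1.0 / (inv_min - ((i + 0.5) / n) * (inv_min - inv_max))
        for i in range(n)
    ]


def cumulative_counts(vafs: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Number of mutations with frequency at or above each threshold."""
    return [sum(1 for v in vafs if v >= t) for t in thresholds]


def least_squares_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Assumed toy parameters: 5000 neutral mutations with frequencies in [0.05, 0.5].
vafs = simulate_neutral_vafs(5000, 0.05, 0.5)
inverse_f = [float(k) for k in range(3, 20)]
thresholds = [1.0 / k for k in inverse_f]
counts = cumulative_counts(vafs, thresholds)
slope = least_squares_slope(inverse_f, counts)
expected = 5000 / (1 / 0.05 - 1 / 0.5)
print(f"fitted slope: {slope:.1f}, expected: {expected:.1f}")
```

When the mutation rate changes during growth, as reported above, the cumulative counts stop being linear in 1/f, and a fitted exponent other than 2 describes the data better.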

This work was co-funded by the European Social Fund grant POWR.03.02.00-00-I029 (PK), NCI Genitourinary Bladder SPORE grant P50CA 91846 (HC, PW, BC), and Polish NCN grant 2021/41/B/NZ2/04134 (MK).

C-262: From Genomics to Proteomics: A Bioinformatics Toolkit for Improved Prokaryotes Classification
Track: EvolCompGen
  • Jihyeon Kim, Seoul National University, South Korea
  • Stephanie Kim, Seoul National University, South Korea
  • Martin Steinegger, Seoul National University, South Korea


Presentation Overview: Show

Addressing the pressing need for improved bacterial taxonomy above the genus level, we’ve created a methodological toolkit that integrates established bioinformatics measures and novel structural analysis. We used amino acid identity (AAI) and percentage of conserved proteins (POCP) for sequence-based analysis. Additionally, we applied a novel structure-based orthology local distance difference test (orthoLDDT) implemented in our Foldseek protein structure aligner.

Our methodological toolkit comprises (1) POCPCAL for efficient POCP calculation using MMseqs2 and (2) a reciprocal best hit (RBH) Foldseek module for orthoLDDT computation. We tested this approach on genomes and proteomes of five key phyla found in human gut and skin microbiomes, with UBCG2 serving as a reference for phylogenomic relationship inference. The performance on order level was evaluated through quartet distances, and the results showed 90%, 91%, and 89% with EzAAI, POCP, and orthoLDDT, respectively.
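
For readers unfamiliar with POCP, the sketch below follows the definition from the original POCP proposal as commonly cited (Qin et al., 2014): a protein counts as conserved if it has a hit with e-value below 1e-5, identity above 40%, and an aligned region covering more than 50% of the query, and POCP = 100 x (C1 + C2) / (T1 + T2). The Hit record and the example hits are hypothetical, and this is not the POCPCAL interface.

```python
from typing import Iterable, NamedTuple, Set


class Hit(NamedTuple):
    query: str             # query protein identifier
    identity: float        # percent sequence identity
    query_coverage: float  # aligned fraction of the query (0 to 1)
    evalue: float


def conserved_queries(
    hits: Iterable[Hit],
    min_identity: float = 40.0,
    min_coverage: float = 0.5,
    max_evalue: float = 1e-5,
) -> Set[str]:
    """Query proteins with at least one hit passing all three thresholds."""
    return {
        h.query
        for h in hits
        if h.identity > min_identity
        and h.query_coverage > min_coverage
        and h.evalue < max_evalue
    }


def pocp(c1: int, c2: int, t1: int, t2: int) -> float:
    """Percentage of conserved proteins between two genomes."""
    return 100.0 * (c1 + c2) / (t1 + t2)


# Hypothetical search results in both directions; each genome has 4 proteins.
a_vs_b = [
    Hit("a1", 85.0, 0.95, 1e-50),
    Hit("a2", 45.0, 0.70, 1e-12),
    Hit("a3", 35.0, 0.90, 1e-8),   # fails the identity threshold
]
b_vs_a = [
    Hit("b1", 85.0, 0.93, 1e-48),
    Hit("b2", 46.0, 0.40, 1e-10),  # fails the coverage threshold
    Hit("b3", 60.0, 0.80, 1e-20),
]
c1, c2 = len(conserved_queries(a_vs_b)), len(conserved_queries(b_vs_a))
print(pocp(c1, c2, t1=4, t2=4))
```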

Our study presents novel tools and a new measure for bacterial delineation, which promise to significantly enhance a wide spectrum of microbiome research.

C-263: Comparison of Stacks and a custom pipeline for RADSeq analysis
Track: EvolCompGen
  • Enora Geslain, KU Leuven, Belgium
  • Alvaro Cortes Calabuig, Genomics Core Leuven, Belgium
  • Sarah Maes, ILVO, Belgium
  • Gregory Maes, Nonacus Ltd, Belgium
  • Filip Volckaert, KU Leuven, Belgium


Presentation Overview: Show

Restriction-site Associated DNA sequencing (RADSeq) is a technique for scanning the genomes of organisms without sequencing them entirely. It is commonly used to identify thousands of Single Nucleotide Polymorphisms (SNPs) randomly distributed across the genome for applications in population genomics. Different tools exist to search for SNPs in RADSeq data. Here, we compare a custom pipeline using GBSx for demultiplexing with Stacks, a well-established RADSeq pipeline developed by Rochette and Catchen. We also compare two different mappers: Bowtie2 and BWA. We observed differences between the two mappers as well as between the two pipelines. Furthermore, the PCAs revealed a possible lane effect in the Bowtie2 datasets. In conclusion, Bowtie2 seems to perform better for mapping, but it might require more filtering to remove the lane effect.

C-264: Scalable and Precise Lineage Tree Inference from Single-Cell Variant Calls: Introducing the Scelestial Algorithm
Track: EvolCompGen
  • Mohammad Hadi Foroughmand Araabi, Helmholtz Center for Infection Research, Braunschweig Integrated Center of Systems Biology, Germany
  • Sama Goliaei, Helmholtz Center for Infection Research, Braunschweig Integrated Center of Systems Biology, Germany
  • Alice McHardy, Helmholtz Center for Infection Research, Braunschweig Integrated Center of Systems Biology, Germany


Presentation Overview: Show

We introduce Scelestial, a computationally lightweight and accurate method for reconstructing phylogenetic trees from single-cell variant calls. Scelestial leverages a Steiner tree algorithm adapted for single-cell data with missing values, enabling efficient and scalable lineage tree reconstruction. Evaluation on diverse benchmark datasets, including comparisons with state-of-the-art methods such as BitPhylogeny, OncoNEM, SCITE, SASC, SCIPhI, SiFit, and SiCloneFit, demonstrated Scelestial's superior performance in reconstructing lineage tree topology. Application to real single-cell cancer data confirms Scelestial's ability to effectively separate cancer cells from normal cells in lineage trees. Moreover, Scelestial exhibits significantly faster run times compared to other methods, making it well-suited for large-scale studies and multi-dataset meta-analyses. These findings underscore the value of Scelestial in advancing our understanding of tumor evolution and its potential for furthering computational biology research.

C-265: Investigation of potential divergence of cell type functions between human and mouse
Track: EvolCompGen
  • Hirofumi Kariyayama, Doctoral Program in Medical Sciences, University of Tsukuba, Japan
  • Haruka Ozaki, Institute of medicine, University of Tsukuba, Japan


Presentation Overview: Show

Mice are one of the most commonly used model animals in clinical research. While these experimental systems are expected to mimic the biological phenomena and pathological conditions that occur in humans, findings are not always transferable. Recent advances in gene expression measurement technology have made it possible to perform comprehensive gene expression analysis at the single-cell level. Using scRNA-seq data, we expected to capture differences between the two species at higher resolution. In this study, we classified cell clusters by integrating scRNA-seq data from 9 organs of humans and mice. We searched for genes differentially expressed between the two species in each cell cluster. Each cell cluster contained between 20 and 1,369 differentially expressed genes. Next, we used disease ontology to determine which diseases are associated with the differentially expressed genes. As a result, we found a total of 537 diseases associated with 75 cell clusters. In particular, diseases with a high percentage of genes differentially expressed between the two species were associated with ciliated cells and fibroblasts of the trachea. The catalog of differential cell types found in our study could help in assessing the extent to which the conditions of health and disease in humans can be mirrored in mice.

C-266: Cancer origin tracing and timing in two high risk prostate cancers using multisample whole genome analysis: potential clinical value
Track: EvolCompGen
  • Jenni Kesäniemi, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland
  • G. Steven Bova, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland
  • Peter Van Loo, Department of Genetics, The University of Texas MD Anderson Cancer Center, United States
  • David C. Wedge, Manchester Cancer Research Centre, Division of Cancer Sciences, University of Manchester, United Kingdom
  • Matti Nykter, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland
  • Tapio Visakorpi, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland
  • Teuvo L. Tammela, Tampere University Hospital, TAYS Cancer Center, Department of Urology, Finland
  • Teemu Murtola, Tampere University Hospital, TAYS Cancer Center, Department of Urology, Finland
  • Antti Kaipia, Tampere University Hospital, TAYS Cancer Center, Department of Urology, Finland
  • Jarno Riikonen, Tampere University Hospital, TAYS Cancer Center, Department of Urology, Finland
  • Irina Rinta-Kiikka, Imaging Centre, Department of Radiology, Tampere University Hospital, Finland
  • Paula Kujala, Fimlab Laboratories, Department of Pathology, Tampere University Hospital, Finland
  • Tiia Nikupaavola, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland
  • Anssi Nurminen, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland
  • Hanna Rauhala, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland
  • Juho Jasu, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland
  • Matti Kankainen, Institute for Molecular Medicine Finland, University of Helsinki, Finland
  • Antti Koskenalho, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland
  • Kerstin Haase, Charité – Universitätsmedizin Berlin, ECRC Experimental and Clinical Research Center, Germany
  • Teemu Tolonen, Fimlab Laboratories, Department of Pathology, Tampere University Hospital, Finland
  • Naser Ansari-Pour, MRC Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, University of Oxford, United Kingdom
  • Tom Lesluyes, The Francis Crick Institute, United Kingdom
  • Gunilla Högnäs, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland
  • Sinja Taavitsainen, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland
  • Serafiina Jaatinen, Faculty of Medicine and Health Technology, Prostate Cancer Research Center, Tampere University and Tays Cancer Center, Finland


Presentation Overview: Show

Prostate cancer (PrCa) genomic heterogeneity causes resistance to current therapies. Heterogeneity can be deciphered using evolutionary analyses, but whether they provide unique added value in the research and clinical domains remains an open question.

We analyzed 22 whole genome-sequenced sites from two men (GP5 and GP12) with high-risk PrCa using evolutionary reconstruction tools and spatio-evolutionary models. Probability models were used to trace spatial and chronological origins of the primary tumor and metastases, chart their genetic drivers, and distinguish metastatic and non-metastatic subclones.

In patient GP5, CDK12 inactivation was among the first mutations in tumorigenesis, leading to a tandem duplicator phenotype and initiating the cancer around age 50, followed by rapid cancer evolution after age 57, and metastasis around age 59, five years prior to prostatectomy. In patient GP12, accelerated cancer progression was detected after age 54, and metastasis occurred around age 56, three years prior to prostatectomy. Multiple metastasis-originating events were identified in each patient and tracked anatomically.

In this pilot, metastatic subclone content analysis appears to considerably enhance identification of key drivers and the potential impact of evolutionary analysis on therapy selection appears positive. Extending this approach to larger cohorts could add substantial biological insight and clinically relevant value.

C-267: Forecasting High Risk SARS-CoV-2 Variants: A Hybrid Modeling Approach
Track: EvolCompGen
  • Meghan Kane, Max Planck Institute for Molecular Genetics, Germany
  • Zhihao Shao, Max Planck Institute for Molecular Genetics, Germany
  • Prabhav Kalaghatgi, Max Planck Institute for Molecular Genetics, Germany
  • Martin Vingron, Max Planck Institute for Molecular Genetics, Germany


Presentation Overview: Show

SARS-CoV-2 imposes a substantial disease burden worldwide, and its impact and prevalence fluctuate due to complex factors such as immune escape mutations and epidemiological dynamics. Insights into emerging high risk variants can guide key stakeholders such as public health authorities and vaccine manufacturers. We are taking a hybrid modeling approach encompassing epidemiological, evolutionary, and immunological components to forecast which variants are likely to become high risk and why.

We trained a model to predict whether a variant will be high or low risk in a future time interval given a past time interval, using Nextstrain sequence data covering the whole pandemic (6.9 million sequences). Metadata such as Pangolin lineages were used to construct our time series training data, in which they were transformed to capture their relative frequency and trajectory. Key phase changes were analyzed, e.g. the growth of AY.103, BA.2.12.1, and BA.2. For example, given the three months preceding BA.2.12.1's dominance, our model yielded 91.7% accuracy in forecasting BA.2.12.1's dominance for the subsequent month. To improve our approach's robustness, we are augmenting it further with evolution and immunology-driven models.
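
The transformation from lineage labels to relative frequency and trajectory could look roughly like the sketch below. The weekly binning, the use of first differences as the "trajectory", and the counts themselves are assumptions for illustration and are not taken from the authors' pipeline.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def relative_frequencies(
    records: Iterable[Tuple[int, str]]
) -> Dict[int, Dict[str, float]]:
    """Convert (week, lineage) records into per-week lineage frequencies."""
    per_week: Dict[int, Counter] = defaultdict(Counter)
    for week, lineage in records:
        per_week[week][lineage] += 1
    freqs = {}
    for week in sorted(per_week):
        total = sum(per_week[week].values())
        freqs[week] = {lin: n / total for lin, n in per_week[week].items()}
    return freqs


def trajectory(freqs: Dict[int, Dict[str, float]], lineage: str) -> List[float]:
    """Week-over-week change in the relative frequency of one lineage."""
    series = [freqs[week].get(lineage, 0.0) for week in sorted(freqs)]
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Invented counts: BA.2.12.1 rising against BA.2 over three weeks.
records = (
    [(1, "BA.2")] * 8 + [(1, "BA.2.12.1")] * 2
    + [(2, "BA.2")] * 6 + [(2, "BA.2.12.1")] * 4
    + [(3, "BA.2")] * 3 + [(3, "BA.2.12.1")] * 7
)
freqs = relative_frequencies(records)
print([freqs[w]["BA.2.12.1"] for w in sorted(freqs)])
print(trajectory(freqs, "BA.2.12.1"))
```

Features of this kind from a past window could then serve as inputs to a classifier that labels a lineage as high or low risk in the following window.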

C-268: A fully differentiable approach to Bayesian phylogenetic inference
Track: EvolCompGen
  • Takahiro Mimori, Waseda University, Japan
  • Michiaki Hamada, Waseda University, AIST-Waseda CBBD-OIL, Japan


Presentation Overview: Show

Bayesian phylogenetic inference, which quantifies the uncertainty of evolutionary trees based on a probabilistic model and molecular data of species, is a complex problem involving discrete tree topologies and continuous branch length variables. While Bayesian phylogenetic inference has been almost exclusively approached by Markov-chain Monte Carlo (MCMC) methods, which rely on random local movement between phylogenetic trees, a recent series of variational Bayesian (VB) approaches is a promising alternative for efficient and scalable inference beyond MCMC, using the gradient of the model evidence to move approximate tree distributions. However, owing to the combinatorially large number of possible binary tree topologies, the state-of-the-art VB method still relies on a reasonable preselection of candidate topologies before inference. Here, we propose a novel scheme to represent the tree topology distribution in a continuous space; on this basis we develop a fully differentiable approach, which we name GeoPhy, to perform VB phylogenetic inference without the preselection of candidates. We show that GeoPhy consistently outperforms the other preselection-free VB approaches in terms of the closeness of marginal log-likelihood estimates to the gold standard obtained with long-run MCMC, a promising step towards efficiency and scalability in phylogenetics.

C-269: Bayesian Phylogeographic Analysis Reveals the Effect of Nonpharmaceutical Interventions on SARS-CoV-2 Lineage Importations and Dissemination during the Third Wave of the Pandemic in Germany
Track: EvolCompGen
  • Sama Goliaei, Helmholtz Center for Infection Research, Braunschweig Integrated Center of Systems Biology, Germany
  • Mohammad Hadi Foroughmand Araabi, Helmholtz Center for Infection Research, Braunschweig Integrated Center of Systems Biology, Germany
  • Alice McHardy, Helmholtz Center for Infection Research, Braunschweig Integrated Center of Systems Biology, Germany


Presentation Overview: Show

During the initial years of the global SARS-CoV-2 pandemic, the world faced severe morbidity and mortality. This study focuses on the third wave of the pandemic in Germany and investigates the importation of SARS-CoV-2 lineages using viral genomes obtained through systematic surveillance. A Bayesian phylogeographic approach was employed to analyze the genomic data. Importations of SARS-CoV-2 lineages rose and peaked shortly after the Christmas holiday. Most lineages emerged initially in the three states with the highest population and case numbers and subsequently spread throughout the country. Furthermore, information on nationwide nonpharmaceutical interventions (NPIs) was gathered, and their effect on the importation and dissemination of lineages within the country was assessed. NPIs including the availability of free rapid tests, strengthening of facial mask regulations, and internal movement restrictions were most effective across the evaluated subsampling methods. The pronounced effect of free rapid tests suggests their importance for future pandemic preparedness efforts, given their minimal negative socioeconomic effects compared to other interventions.

C-270: From Protein Embeddings to Phylogeny: Exploring Evolution in Space
Track: EvolCompGen
  • Adel Schmucklermann, Technical University of Munich (TUM), Germany
  • Alexander Fastner, Technical University of Munich (TUM), Germany
  • Kyra Erckert, Technical University of Munich (TUM), Germany
  • Tobias Olenyi, Technical University of Munich (TUM), Germany
  • Tobias Senoner, Technical University of Munich (TUM), Germany
  • Ivan Koludarov, Technical University of Munich (TUM), Germany
  • Burkhard Rost, Technical University of Munich (TUM), Germany


Presentation Overview: Show

We explored the use of protein sequence embeddings to construct phylogenetic trees, aiming to reduce the time required compared to traditional methods. By plotting the correlation between sequence similarity and embeddings, we found that specific dimensions of the embeddings correlated with the sequence similarity more than others, confirming that the embeddings contain evolutionary information. To filter out the noise and extract more information, we employed PCA and t-SNE, as well as a variational auto-encoder (VAE), to reduce embedding dimensions. Our analysis showed that while the embeddings contain some evolutionary information, applying PCA, t-SNE, or VAE on the dataset did not improve the quality of constructed trees. Constructing trees from non-trained embeddings resulted in the closest tree to the reference tree. Moreover, as in traditional phylogenetics, the presence of a proper outgroup is crucial for constructing a solid tree.