July 15, 2024
10:40-11:00
Proceedings Presentation: Median and Small Parsimony Problems on RNA trees
Confirmed Presenter: Bertrand Marchand, Université de Sherbrooke, Canada
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Lars Arvestad


Authors List:

  • Bertrand Marchand, Université de Sherbrooke
  • Yoann Anselmetti, University of Sherbrooke
  • Manuel Lafond, Université de Sherbrooke
  • Aida Ouangraoua, Université de Sherbrooke

Presentation Overview:

Motivation:
Non-coding RNAs (ncRNAs) express their functions by adopting molecular structures. Specifically, RNA secondary structures serve as a relatively stable intermediate step before tertiary structures, offering a reliable signature of molecular function. Consequently, within an RNA functional family, secondary structures are generally more evolutionarily conserved than sequences. Conversely, homologous RNA families grouped within an RNA clan share ancestors but typically exhibit structural differences. Inferring the evolution of RNA structures within RNA families and clans is crucial for gaining insights into functional adaptations over time and providing clues about the Ancient RNA World Hypothesis.
Results:
We introduce the median problem and the small parsimony problem for ncRNA families, where secondary structures are represented as leaf-labelled trees. We utilize the Robinson-Foulds (RF) tree distance, which corresponds to a specific edit distance between RNA trees, and a new metric called the Internal-Leafset (IL) distance. While the RF tree distance compares sets of leaves descending from internal nodes of two RNA trees, the IL distance compares the collection of leaf-children of internal nodes. The latter is better at capturing differences in structural elements of RNAs than the RF distance, which is more focused on base pairs. We study the theoretical complexity of the median problem and the small parsimony problem under the three distance metrics and various biologically-relevant constraints, and we present polynomial-time maximum parsimony algorithms for solving some versions of the problems. Our algorithms are applied to ncRNA families from the RFAM database, illustrating their practical utility.
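As a rough illustration of the two distances contrasted in this abstract, the sketch below computes both on small leaf-labelled trees. It follows only the abstract's one-sentence descriptions: RF compares the sets of leaves descending from internal nodes, and IL compares the collections of leaf-children of internal nodes. The nested-tuple tree encoding, the function names, and the choice to count differences as a multiset symmetric difference are assumptions made for this example and are not taken from the authors' paper or software.

```python
from collections import Counter

# Assumed encoding: a leaf is a string, an internal node is a tuple of children.

def leaves(node):
    """Return the frozenset of leaf labels below (or equal to) a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def internal_nodes(node):
    """Yield every internal node of the tree, root included."""
    if isinstance(node, str):
        return
    yield node
    for child in node:
        yield from internal_nodes(child)

def multiset_symmetric_difference(c1, c2):
    """Number of elements that appear in one Counter but not the other."""
    return sum(((c1 - c2) + (c2 - c1)).values())

def rf_distance(t1, t2):
    """RF-style distance: compare descendant leaf sets of internal nodes."""
    c1 = Counter(leaves(n) for n in internal_nodes(t1))
    c2 = Counter(leaves(n) for n in internal_nodes(t2))
    return multiset_symmetric_difference(c1, c2)

def il_distance(t1, t2):
    """IL-style distance: compare sets of leaf-children of internal nodes."""
    def leaf_children(n):
        return frozenset(c for c in n if isinstance(c, str))
    c1 = Counter(leaf_children(n) for n in internal_nodes(t1))
    c2 = Counter(leaf_children(n) for n in internal_nodes(t2))
    return multiset_symmetric_difference(c1, c2)

# Contracting one internal edge changes RF by 1 but IL by 3 in this toy case.
t3 = ("a", ("b", ("c", "d")))
t4 = ("a", "b", ("c", "d"))
print(rf_distance(t3, t4), il_distance(t3, t4))  # prints: 1 3
```

In this toy example, moving a single leaf to a different parent affects several leaf-children sets at once, which is one way to see why the two distances can rank the same pair of trees differently.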

July 15, 2024
11:10-11:20
Inferring transcript phylogenies based on precomputed groups of conserved transcripts
Confirmed Presenter: Wend Yam Donald Davy Ouedraogo, Université de Sherbrooke, Canada
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Lars Arvestad


Authors List:

  • Wend Yam Donald Davy Ouedraogo, Université de Sherbrooke
  • Aida Ouangraoua, Université de Sherbrooke

Presentation Overview:

Alternative splicing (AS) is a mechanism in eukaryotic gene expression by which different combinations of introns are spliced to produce distinct transcript isoforms from a gene. Recent studies have highlighted that the transcript isoforms of human genes are often conserved in orthologous genes from various species. The conserved transcripts are referred to as transcript orthologs, and the identification of transcript ortholog groups provides valuable insights for studying their functions. Exploring the evolutionary histories of transcripts enhances our understanding of their protein functions and their origins. It also allows us to better understand the role of alternative splicing in transcript evolution.
In previous work (DOI: 10.1007/978-3-031-36911-7_2), we addressed the problem of inferring orthology and paralogy relations at the transcript level. In this work, we focus on the reconstruction of transcript evolutionary histories. We present a progressive supertree construction algorithm that relies on a dynamic programming approach to infer a transcript phylogeny based on precomputed clusters of orthologous transcripts.
We applied our algorithm to transcripts from simulated gene families, as well as to two case studies involving the transcripts of real gene families, specifically the TAF6 and PAX6 gene families from the Ensembl-Compara database. The results align with those of previous studies aimed at reconstructing transcript phylogenies, while improving the computing time. The results also show that accurate transcript phylogenies can be obtained by first accurately inferring the pairwise homology relationships among transcripts and then using these relationships to compute a phylogeny that agrees with them.

July 15, 2024
11:20-11:40
A Representation for Phylogenetic Trees and Networks
Confirmed Presenter: Louxin Zhang, National University of Singapore, Singapore
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Lars Arvestad


Authors List:

  • Cedric Chauve, Simon Fraser University
  • Caroline Colijn, Simon Fraser University
  • Louxin Zhang, National University of Singapore

Presentation Overview:

Good representations for phylogenetic trees and networks are important for human-computer interfaces and for implementing scalable heuristic methods for inferring the evolution of genes, genomes and species. We present a new representation for phylogenetic trees. It maps every binary tree on n taxa to a string of taxa in which each taxon appears exactly twice. The new representation is i) shorter than the Newick format, ii) bijective in the space of phylogenetic trees and iii) easy to recover tree edges from. Using this new format, we introduce a tree operation that makes it possible to traverse tree space in at most 2n steps, and a new metric for tree comparison that is computable in linear time and correlates with the Subtree Prune and Regraft distance better than the Robinson-Foulds distance does. The new representation can be further generalized to the so-called tree-child networks.

July 15, 2024
11:40-12:00
Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES
Confirmed Presenter: Anshu Gupta, Department of Computer Science and Engineering, University of California San Diego
Track: EvolCompGen

Room: 518
Format: Live Stream
Moderator(s): Lars Arvestad


Authors List:

  • Anshu Gupta, Department of Computer Science and Engineering
  • Siavash Mirarab, Department of Electrical and Computer Engineering
  • Yatish Turakhia, Department of Electrical and Computer Engineering

Presentation Overview:

Species tree inference is crucial in advancing our understanding of evolutionary relationships of life on Earth and has immense significance for diverse biological and medical applications. Extensive genome sequencing efforts are currently in progress across a broad spectrum of life forms, unraveling intricate branching patterns within the tree of life. However, estimating species trees starting from raw genome sequences is quite challenging, and the current cutting-edge methodologies require a series of error-prone steps involving gene annotations, orthology inference, and accounting for gene tree discordances, which are neither entirely automated nor standardized and require substantial human intervention. Therefore, we present ROADIES, a novel pipeline for species tree inference from raw genome assemblies that is fully automated, easy to use, scalable, free from reference bias, and provides flexibility to adjust the tradeoff between accuracy and runtime. The ROADIES pipeline eliminates the need to align whole genomes, choose a single reference species, or pre-select loci such as functional genes found using cumbersome annotation steps. Moreover, it leverages recent advances in phylogenetic inference to allow multi-copy genes, eliminating the need to detect orthology. Using genomic datasets released from large-scale sequencing consortia (Birds 10K Genome Project, Zoonomia) across three diverse life forms (placental mammals, pomace flies, and birds), ROADIES infers species trees that are comparable in quality with the state-of-the-art approaches while achieving >100x speedup compared to the conventional pipelines. ROADIES supports various modes of operation and is expected to improve the accuracy, speed, scalability, and reproducibility of phylogenomic analyses.

July 15, 2024
12:00-12:20
Generalized c/µ Ratio Test for Detecting Molecular Adaptation: Beyond the conventional Ka/Ks Ratio test without Assuming Synonymous Site Neutrality or Limitation to Translated Regions
Confirmed Presenter: Chun Wu, Rowan University, United States
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Lars Arvestad


Authors List:

  • Chun Wu, Rowan University
  • Nicholas Paradis, Rowan University

Presentation Overview:

The 60-year debate in evolutionary biology over "neutralist-selectionist" views demands a robust method to measure fitness changes due to mutations, yet the Ka/Ks ratio test, despite its prominence, has significant limitations. This test, which assesses fitness changes in the genome's Translated Region (TR) based on non-synonymous (Ka) and synonymous (Ks) substitution rates, presupposes the neutrality of synonymous mutations, a notion increasingly challenged by evidence highlighting their non-neutral impacts in replication, transcription, and translation processes. Our previous work (Comp in Bio and Med 153 (2023) 106522) introduced the relative substitution rate c/µ test (c: a mean value of Ka and Ks; µ: mutation rate) as a versatile alternative, offering a broader application without assuming synonymous site neutrality. This paper derives a general equation linking c/µ with Ka/µ, Ks/µ, and Ka/Ks, demonstrating c/µ's superior accuracy in quantifying fitness changes across both TR and UTR. Through a comparative analysis of the c/µ and Ka/Ks tests across 10 genes, 11 UTRs, and significant SARS-CoV-2 mutations, using three independent genomic datasets from December 2019 to July 2021, we validate our molecular adaptation predictions with activity data from existing literature. Our findings advocate for the c/µ test as a more effective tool than the traditional Ka/Ks test, potentially resolving the longstanding debate in evolutionary biology by accommodating non-neutral effects at synonymous sites and extending applicability beyond the TR. This method was applied to over 2000 viruses with at least 50 genome sequences; the preliminary results will be discussed.

July 15, 2024
12:00-12:20
AlphaHOGs, a protein structure-based reference classification to improve orthology inference
Confirmed Presenter: Christophe Dessimoz, University of Lausanne, Switzerland
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Lars Arvestad


Authors List:

  • Stefano Pascarelli, University of Lausanne
  • Christophe Dessimoz, University of Lausanne

Presentation Overview:

The increasing availability of genomic sequences is driving forward our understanding of the diverse life forms on Earth. However, the ability to generalize organism-specific knowledge is limited by how well we relate genomes to each other. Genomes can be compared in terms of orthologous genes — genes of different species that derive from a single gene in the last common ancestor. Currently, the majority of orthology prediction software is based on the amino acid sequence, the most abundant information about proteins. However, the sequence signal of distantly related genes is weak, distributed in saturated positions, and confounded by evolutionary forces. In this work, I show how the more conserved protein 3D-structure can improve orthology prediction. I devised a method that combines sequence k-mers with k-mers generated from a local structural alphabet. The enriched sets of k-mers can be used to generate a reference classification of proteins into Hierarchical Orthologous Groups (HOGs) — a coarse-grained representation of protein families. The structure-informed reference HOGs, here named AlphaHOGs, can be exploited to infer orthology in thousands of proteomes, by using the recently developed software FastOMA. As a test case, we reconstruct the ancestral genome of the first multicellular animal with an unprecedented resolution, paving the way to higher-level analyses such as the ancestral gene content and protein interaction network, potentially shedding light on the current uncertainties about the origin of the animal lineage.

July 15, 2024
14:20-14:40
Proceedings Presentation: Joint inference of cell lineage and mitochondrial evolution from single-cell sequencing data
Confirmed Presenter: Viola Chen, Princeton University, United States
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Giltae Song


Authors List:

  • Palash Sashittal, University of Illinois at Urbana-Champaign
  • Viola Chen, Princeton University
  • Amey Pasarkar, Princeton University
  • Ben Raphael, Princeton University

Presentation Overview:

Eukaryotic cells contain organelles called mitochondria that have their own genome. Most cells contain thousands of mitochondria which replicate, even in non-dividing cells, by means of a relatively error-prone process resulting in somatic mutations in their genome. Because of the higher mutation rate compared to the nuclear genome, mitochondrial mutations have been used to track cellular lineage, particularly using single-cell sequencing that measures mitochondrial mutations in individual cells. However, existing methods to infer the cell lineage tree from mitochondrial mutations do not model heteroplasmy, which is the presence of multiple mitochondrial clones with distinct sets of mutations in an individual cell. Single-cell sequencing data thus provides a mixture of the mitochondrial clones in individual cells, with the ancestral relationships between these clones described by a mitochondrial clone tree that must be concordant with the cell lineage tree. We formalize the problem of inferring a concordant pair of a mitochondrial clone tree and a cell lineage tree from single-cell sequencing data as the NESTED PERFECT PHYLOGENY MIXTURE (NPPM) problem. We derive an algorithm, MERLIN, to solve the NPPM problem. We show on simulated data that MERLIN outperforms existing methods that model neither mitochondrial heteroplasmy nor the concordance between the mitochondrial clone tree and the cell lineage tree. We use MERLIN to analyze single-cell whole genome sequencing data of 5220 cells of a gastric cancer cell line and show that MERLIN infers a more biologically plausible cell lineage tree and mitochondrial clone tree compared to existing methods.

July 15, 2024
14:50-15:00
Tracking tumorigenesis and the transition state through copy number variation-based pseudotime
Confirmed Presenter: Jonghyun Lee, National Cancer Center, South Korea
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Giltae Song


Authors List:

  • Jonghyun Lee, National Cancer Center
  • Najung Lim, National Cancer Center
  • Dongkwan Shin, National Cancer Center

Presentation Overview:

Driver mutations for different cancer types are extensively categorized. However, there appears to be no fixed number of driver mutations that guarantee the transition from normal to cancer cells. Therefore, we hypothesized that there are ranges of genetic alterations where both normal and cancer cells coexist, where two types of cells share similar genetic backgrounds while exhibiting vastly different phenotypes. By leveraging the technical advances of single-cell sequencing that captures the characteristics of thousands of cells, we sought to identify cells that belong to the transition state, where the tumor and normal cells share similar genetic alterations.
One of the well-established factors regarding the genetic changes during tumorigenesis is the accumulation of aneuploidy. Copy number variation (CNV) inference algorithms such as CopyKat and SCEVAN deduce the aneuploidy from single-cell expression data. We utilized the accumulation of CNV to construct a pseudotime that describes the genetic changes during tumorigenesis.
We found that there is indeed genetic background overlap between the tumor and the normal epithelial cells of breast cancer, in both patient data and a mouse model. Cells within the transition state appear to share CNV events, which are represented by an NJ-based tree. These transition cells are also located between tumor and normal cell clusters in expression space. This result demonstrates that sufficient sampling can identify pre-malignant normal cells with mutation profiles similar to those of tumor cells, which may aid in early detection and prevention of oncogenesis.
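To make the idea of a CNV-based pseudotime more concrete, here is a minimal sketch that orders cells by how much copy-number change they have accumulated. The abstract does not state how the pseudotime is actually constructed, so everything below is an assumption for illustration only: the per-cell integer copy-number profiles (as might be derived from the output of a CNV inference tool), the neutral copy number of 2, the "fraction of altered bins" burden, and the min-max scaling to the range 0 to 1.

```python
def cnv_burden(profile, neutral=2):
    """Fraction of genomic bins whose copy number differs from the neutral state."""
    return sum(1 for cn in profile if cn != neutral) / len(profile)

def cnv_pseudotime(profiles, neutral=2):
    """Map each cell to a pseudotime in [0, 1] by min-max scaling its CNV burden.

    `profiles` is a dict: cell id -> list of integer copy numbers per bin
    (a hypothetical input format chosen for this sketch).
    """
    burdens = {cell: cnv_burden(p, neutral) for cell, p in profiles.items()}
    lo, hi = min(burdens.values()), max(burdens.values())
    span = hi - lo
    # If every cell has the same burden, place them all at pseudotime 0.
    return {cell: (b - lo) / span if span else 0.0 for cell, b in burdens.items()}

# Toy example: three cells with increasing numbers of altered bins.
profiles = {
    "normal": [2, 2, 2, 2],
    "transition": [2, 3, 2, 2],
    "tumor": [1, 3, 4, 2],
}
pseudotime = cnv_pseudotime(profiles)
for cell in sorted(pseudotime, key=pseudotime.get):
    print(cell, round(pseudotime[cell], 3))
```

Under these assumptions, cells in a transition state would fall at intermediate pseudotime values between the normal and tumor clusters.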

July 15, 2024
15:00-15:20
SPICE: Probabilistic reconstruction of copy-number evolution in cancer
Confirmed Presenter: Abigail Bunkum, University College London Cancer Institute, United Kingdom
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Giltae Song


Authors List:

  • Abigail Bunkum, University College London Cancer Institute
  • Simone Zaccaria, University College London Cancer Institute

Presentation Overview:

Somatic copy number alterations (SCNAs) are frequent genetic alterations that accumulate in tumour cells during cancer evolution and amplify or delete large genomic regions. SCNAs are implicated in driving cancer progression, providing cancer cells with the ability to metastasise or resist treatment. Therefore, cancer sequencing studies aim to reconstruct the evolutionary history of SCNAs to investigate their role in cancer progression. Whilst several related phylogenetic methods have been introduced, these methods rely on the reconstruction of a single tumour phylogeny explaining SCNA evolution, discarding the innate uncertainty of this complex problem. In fact, modelling SCNA evolution is challenging and many different explanations for SCNA evolution are equally plausible. Therefore, reconstructing a single phylogeny might hinder the ability to accurately characterise SCNA evolution.

In this work, we introduce SPICE (Subclone Probability Inference of Copy-number Evolution), a novel algorithm that enumerates equally plausible explanations of SCNA evolution, enabling the estimation of the probabilities of SCNA events. We show, using a novel, realistic simulation framework, that SPICE outperforms previous methods on simulated datasets by combining multiple inferred phylogenies. To highlight the impact of our method, we applied SPICE to 49 bulk samples from metastatic prostate cancers to detect the presence of well-known driver cancer genes that appear to be recurrently affected by similar events in the same tumour, providing evidence for parallel evolution. Finally, we leverage information regarding the uncertainty of inferred phylogenetic topologies to identify novel metastatic migration patterns and characterise the probability of migrations between different tumour sites.

July 15, 2024
15:20-15:40
Uncovering Cancer's Fitness Landscape
Confirmed Presenter: Meaghan Parks, Case Western Reserve University, United States
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Giltae Song


Authors List:

  • Meaghan Parks, Case Western Reserve University

Presentation Overview:

CRISPR-based genome editing technologies have enabled massively-parallel genomic screens, such as DepMap, a Broad Consortium effort to catalog gene knockouts in cancer cell lines. These projects find that the growth effects of a mutation depend heavily on the background genotype of a cell. Evolutionary theory has studied the effects of background genotype on mutations for generations and has uncovered general patterns across the tree of life. These patterns found in evolving populations have culminated in a ‘Geometric Model’ of adaptation that has successfully predicted the effects of novel combinations of mutations in yeast and E. coli. This model could in principle be applied to DepMap and other massively-parallel genomic screens to learn genotype to phenotype to fitness mappings and potentially predict the evolution of a population. Fitting this model to large-scale real data, however, is challenging because the model infers a latent (hidden) space of phenotypes with mathematical symmetries which confuse regression methods. Here, we present a methodology for fitting a Geometric Model of adaptation to large-scale genomic screens that is guaranteed to converge to a single inferred background genotype for any mutant. This methodology eliminates rotational, translational, and permutation symmetries in the inferred phenotype space and successfully reconstructs genotype to phenotype to fitness mappings of simulated cancer cell line knockout data. This makes comprehensive quantitative models of genotype to phenotype to fitness mappings possible in a multitude of diseases, which in turn will allow us to infer phenotypic complexity and predict treatment response.

July 15, 2024
15:40-16:00
Measuring pseudogenes' kinship to unravel overlooked evolutionary patterns
Confirmed Presenter: Valeriia Vasylieva, Sherbrooke University, Canada
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Giltae Song


Authors List:

  • Valeriia Vasylieva, Sherbrooke University
  • Marie A. Brunet, Sherbrooke University

Presentation Overview:

Pseudogenes are defined as copies of protein-coding genes that have lost their ability to encode proteins and are functionless elements of our genomes. Yet, thousands are transcribed, and hundreds encode proteins. These discoveries question the definition of pseudogenes and their roles in our genome. To unravel the contribution of pseudogenes to the evolution of our genomes, we need to understand their origin. However, identification of the parental transcript and parental gene of pseudogenes is complex and currently incomplete. The PsiCube database is the most up-to-date reference for pseudogene-parental gene annotation in humans, yet it only references parental genes for 48% (8,225) of the pseudogenes currently annotated in Ensembl (17,349).
Here, we present a method based on the Mash distance, commonly used in metagenomics approaches, to measure kinships between transcripts of pseudogenes and protein-coding genes. Our strategy outperforms PsiCube, without any significant biases for sequence length, complexity, or GC content. We applied our method to unravel the evolutionary relationships between GAPDH and its pseudogenes. Mash distance was able to confidently separate unrelated sequences from the GAPDH paralog and from the parental GAPDH. Interestingly, our methodology highlighted pseudogenes with other pseudogenes as their closest related sequence. We expanded our Mash distance analysis to the whole human genome and identified pseudogenes arising from other pseudogenes amongst many gene families, including in loci associated with diseases.
Our work highlights an overlooked mechanism of gene evolution where pseudogenes can arise from existing pseudogenes and contribute to the diversity and evolution of our genomes.
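For readers unfamiliar with the Mash distance mentioned in this abstract, the sketch below shows how it is derived from the Jaccard index j of two k-mer sets, using the standard Mash formula D = -(1/k) ln(2j / (1 + j)). This is a simplified illustration and not the authors' pipeline: it computes the Jaccard index exactly on full k-mer sets instead of estimating it from MinHash sketches as the Mash tool does, it ignores reverse-complement (canonical) k-mers, and the k value and toy sequences are assumptions chosen for the example.

```python
import math

def kmers(seq, k):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """Jaccard index of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mash_distance(seq1, seq2, k=21):
    """Mash distance from an exact Jaccard index: D = -(1/k) * ln(2j / (1 + j)).

    Returns 1.0 when the two sequences share no k-mers.
    """
    j = jaccard(kmers(seq1, k), kmers(seq2, k))
    if j == 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)

# Toy sequences differing by one substitution; k is kept small for illustration.
print(mash_distance("ACGTTGCA", "ACGTAGCA", k=3))
```

In a setting like the one described, each pseudogene transcript would be compared against candidate protein-coding transcripts, and the candidate with the smallest distance would be taken as the closest relative.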

July 15, 2024
15:40-16:00
Pseudogenes in plasmid genomes reveal past transitions in plasmid mobility
Confirmed Presenter: Dustin Hanke, Kiel University, Germany
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Giltae Song


Authors List:

  • Dustin Hanke, Kiel University
  • Yiqing Wang, Kiel University
  • Tal Dagan, Kiel University

Presentation Overview:

Evidence for gene non-functionalization due to mutational processes is found in genomes in the form of pseudogenes. Pseudogenes are known to be rare in prokaryote chromosomes, with the exception of lineages that underwent an extreme genome reduction (e.g., obligatory symbionts). Much less is known about the frequency of pseudogenes in prokaryotic plasmids; those are genetic elements that can transfer between cells and may encode beneficial traits for their host. Non-functionalization of plasmid-encoded genes may alter the plasmid characteristics, e.g., mobility, or their effect on the host. Analyzing 10,832 prokaryotic genomes, we find that plasmid genomes are characterized by threefold-higher pseudogene density compared to chromosomes. The majority of plasmid pseudogenes correspond to deteriorated transposable elements. A detailed analysis of enterobacterial plasmids furthermore reveals frequent gene non-functionalization events associated with the loss of plasmid self-transmissibility. Reconstructing the evolution of closely related plasmids reveals that non-functionalization of the conjugation machinery led to the emergence of non-mobilizable plasmid types. Examples are virulence plasmids in Escherichia and Salmonella. Our study highlights non-functionalization of core plasmid mobility functions as one route for the evolution of domesticated plasmids. Pseudogenes in plasmids supply insights into past transitions in plasmid mobility that are akin to transitions in bacterial lifestyle.

July 15, 2024
16:40-17:00
Long range segmentation of prokaryotic genomes by gene age and functionality
Confirmed Presenter: Yuri Wolf, NCBI/NLM/NIH, United States
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Edward Braun


Authors List:

  • Yuri Wolf, NCBI/NLM/NIH
  • Ilya Schurov, Radboud University
  • Kira Makarova, NCBI/NLM/NIH
  • Mikhail Katsnelson, Radboud University
  • Eugene Koonin, NCBI/NLM/NIH

Presentation Overview:

Bacterial and archaeal genomes encompass numerous operons that typically consist of two to five genes. On larger scales, however, gene order is poorly conserved through the evolution of prokaryotes. Nevertheless, non-random localization of different classes of genes on prokaryotic chromosomes could reflect important functional and evolutionary constraints. We explored the patterns of genomic localization of evolutionarily conserved (ancient) and variable (young) genes across the diversity of bacteria and archaea. Nearly all bacterial and archaeal chromosomes were found to encompass large segments of 100-300 kilobases that were significantly enriched in either ancient or young genes. Similar clustering of genes with lethal knockout phenotype (essential genes) was observed as well. Mathematical modeling of genome evolution suggests that this long-range gene clustering in prokaryotic chromosomes reflects perpetual genome rearrangement driven by a combination of selective and neutral processes rather than evolutionary conservation.

July 15, 2024
17:00-17:20
The evolution of antibiotic resistance islands occurs within the framework of plasmid lineages
Confirmed Presenter: Yiqing Wang, Institute of General Microbiology, Kiel University
Track: EvolCompGen

Room: 518
Format: Live Stream
Moderator(s): Edward Braun


Authors List:

  • Yiqing Wang, Institute of General Microbiology
  • Tal Dagan, Institute of General Microbiology

Presentation Overview:

Bacterial pathogens carrying multidrug resistance (MDR) plasmids are a major threat to human health. The acquisition of antibiotic resistance genes (ARGs) in plasmids is often facilitated by mobile genetic elements that copy or translocate ARGs between DNA molecules. The agglomeration of mobile elements in plasmids generates resistance islands comprising multiple ARGs. However, whether the emergence of resistance islands is restricted to specific MDR plasmid lineages remains understudied. Here we show that the agglomeration of ARGs in resistance islands is biased towards specific large plasmid lineages. Analyzing 6,784 plasmids in 2,441 Escherichia, Salmonella, and Klebsiella isolates, we quantify that 84% of the ARGs in MDR plasmids are found in resistance islands. We furthermore observe the rapid evolution of ARG combinations in resistance islands. Most regions identified as resistance islands are shared among closely related plasmids but rarely among distantly related plasmids. Our results suggest the presence of barriers to the dissemination of ARGs between plasmid lineages, which are related to plasmid genetic properties, host range, and the plasmid evolutionary history. The agglomeration of ARGs in plasmids is attributed to the workings of mobile genetic elements that operate within the framework of existing plasmid lineages.

July 15, 2024
17:00-17:20
Elucidating the Co-Evolution and Genetic Diversity of Acquired Phototrophy in Marine Worm Convolutriloba longifissura
Confirmed Presenter: Adena Collens, Invertebrate Zoology, National Museum of Natural History
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Edward Braun


Authors List:

  • Adena Collens, Invertebrate Zoology
  • Allen Collins, Invertebrate Zoology

Presentation Overview:

While instances of acquired phototrophy can be found across the eukaryotic tree of life, much about the evolution and maintenance of these endosymbiotic interactions remains unknown. Marine acoel worms are one such group that host photosynthetic algae (Tetraselmis sp.) within their tissues. For example, past work shows that unfed Convolutriloba retrogemma with endosymbiotic Tetraselmis algae lose less biomass in the light than in the dark, suggesting a transfer of photosynthates to the host. However, the mechanisms and likely benefits to host and alga have yet to be described. Further, genetic diversity of the alga T. convolutae (associated with acoel Symsagittifera roscoffensis) is minimal even across diverse host genotypes and geographies, suggesting an intimate, long-term coevolution between acoel worm hosts and their algae.
Our study centers on the Convolutriloba longifissura-Tetraselmis sp. photosymbiosis, a case for which even less is known. We present the first genetic characterization of the intercellular algae using low-pass whole genome sequencing data. Using short-read Illumina data, we assembled and annotated organellar genomes from both the acoel worm and the Tetraselmis alga, as well as nuclear ribosomal repeats from the acoel worm. We conducted phylogenetic analyses using several assembled markers to elucidate the relationships of C. longifissura and Tetraselmis sp. to existing sequencing data from acoel worms and green algae, respectively. In light of these findings, we intend to expand to the comparative analysis of transcriptome data to illuminate possible indicators, mechanisms, and interactions of this likely acquired phototrophic interaction.

July 15, 2024
17:20-17:40
Quality assessment of gene repertoires with OMArk
Confirmed Presenter: Yannis Nevers, Université de Lausanne, Switzerland
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Edward Braun


Authors List:

  • Yannis Nevers, Université de Lausanne
  • Alex Warwick Vesztrocy, Université de Lausanne
  • Victor Rossier, Université de Lausanne
  • Clément Marie Train, Université de Lausanne
  • Adrian Altenhoff, ETH Zurich
  • Christophe Dessimoz, Université de Lausanne
  • Natasha Glover, Swiss Institute of Bioinformatics

Presentation Overview:

The number and diversity of new genomes being sequenced across the world open the door to large-scale comparative genomics. Reliably ensuring the quality of the protein-coding gene repertoire derived from these data before including them in an analysis is therefore becoming more critical. State-of-the-art genome annotation assessment tools measure some aspects of this quality but are blind to errors such as gene over-prediction or contamination.
We developed OMArk, a method that relies on fast alignment comparisons to precomputed gene families from the OMA orthology database. By identifying differences between a gene annotation and the typical gene repertoires of closely related species, OMArk assesses not only the completeness but also the consistency of the gene repertoire as a whole. This includes classification of genes with no relatives in closely related species, with dubious gene models, or resulting from contamination. Through this global assessment, OMArk helps point out flaws in any given annotation.
We validated OMArk’s performance on simulated data, then performed an analysis of over 8,000 proteomes from public reference databases (UniProt, Ensembl, RefSeq, etc.). We identified and confirmed cases of contamination in multiple proteomes, characterized the improvements in gene repertoire quality resulting from improvements in genome assemblies, and found evidence of systematic errors induced by annotation pipelines in certain datasets. OMArk is available on GitHub (https://github.com/DessimozLab/OMArk), as a Python package on PyPI, as a bioconda package, and as an interactive online webserver at https://omark.omabrowser.org.
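
To make the distinction between completeness and consistency in the abstract above concrete, here is a toy sketch in Python. It is not OMArk's algorithm or API: the function name, the family identifiers, and the two reference sets are all hypothetical, and the real tool derives its categories from alignments against OMA gene families. The sketch only illustrates that completeness asks "which expected families were found?" while consistency asks "how many annotated proteins fit the lineage?".

```python
def assess_repertoire(assigned_families, conserved_families, lineage_families):
    """Toy completeness/consistency summary for one annotated proteome.

    assigned_families: dict mapping protein ID -> gene family ID, or None
        when the protein could not be placed in any family (hypothetical input).
    conserved_families: set of family IDs expected in every species of the clade.
    lineage_families: set of all family IDs known from the lineage.
    """
    # Completeness: fraction of expected conserved families that were found.
    found = {fam for fam in assigned_families.values() if fam is not None}
    completeness = len(found & conserved_families) / len(conserved_families)

    # Consistency: fraction of proteins placed in a family known from the lineage.
    n = len(assigned_families)
    consistent = sum(1 for fam in assigned_families.values() if fam in lineage_families)
    unplaced = sum(1 for fam in assigned_families.values() if fam is None)
    inconsistent = n - consistent - unplaced  # placed, but outside the lineage
    return {
        "completeness": completeness,
        "consistent": consistent / n,
        "inconsistent": inconsistent / n,
        "unplaced": unplaced / n,
    }


# Hypothetical proteome: five proteins, one placed outside the lineage (F9),
# one not placed at all.
proteome = {"p1": "F1", "p2": "F2", "p3": "F9", "p4": None, "p5": "F1"}
report = assess_repertoire(
    proteome,
    conserved_families={"F1", "F2", "F3", "F4"},
    lineage_families={"F1", "F2", "F3", "F4", "F5"},
)
print(report)
```

In this toy setting, a protein placed in a family outside the lineage is only a candidate for contamination or a dubious gene model; telling those apart requires the taxonomic and gene-structure evidence that the abstract attributes to OMArk itself.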

July 15, 2024
17:40-18:00
Leveraging machine learning to predict antimicrobial resistance in ESKAPE pathogens
Confirmed Presenter: Ethan Wolfe, Department of Pathobiology and Diagnostic Investigation, Michigan State University
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Edward Braun


Authors List:

  • Jacob Krol, Department of Biomedical Informatics
  • Ethan Wolfe, Department of Pathobiology and Diagnostic Investigation
  • Evan Brenner, Department of Biomedical Informatics
  • Keenan Manpearl, Department of Biomedical Informatics
  • Joseph Burke, Department of Pathobiology and Diagnostic Investigation
  • Charmie Vang, Department of Biomedical Informatics
  • Vignesh Sridhar, Department of Pathobiology and Diagnostic Investigation
  • Jill Bilodeau

Presentation Overview:

Since the clinical introduction of antibiotics in the 1940s, antimicrobial resistance (AMR) has become an increasingly dire threat to global public health. Pathogens acquire AMR much faster than we discover new drugs, warranting new methods to better understand the molecular underpinnings of AMR. Traditional approaches for detecting AMR in novel bacterial strains require time-consuming, labor-intensive assays. Here, we introduce a machine learning approach to identify AMR-associated features. We focus on six highly drug-resistant bacterial pathogens responsible for most nosocomial infections: the “ESKAPE” pathogens. We use all NCBI-PGAP-annotated ESKAPE genomes with known AMR phenotype data from the Bacterial and Viral Bioinformatics Resource Center (BV-BRC). Then, for all complete and WGS genomes for each ESKAPE species, we cluster similar genes and construct pangenomes with Panaroo. To uncover the molecular mechanisms behind drug-/drug family-specific resistance and cross-resistance, we train logistic regression and random forest models on our pangenomes, which include antibiotic resistance/susceptibility labels per genome. The models are tested rigorously to yield ranked lists of AMR-associated genes and protein domains. In addition to recapitulating known AMR genes, our models have identified novel candidates for individual and cross-resistance mechanisms that await experimental validation. Our holistic approach promises thorough, reliable prediction of existing or developing resistance in newly identified pathogen genomes, along with mechanistic molecular contributors of resistance.
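
The modeling step described above (training a classifier on pangenome gene presence/absence with resistance labels, then ranking genes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the gene names and the six-genome matrix are made-up toy data, the logistic regression is a small pure-Python implementation rather than a library call, and ranking by absolute coefficient is just one plausible way to obtain a ranked gene list.

```python
import math


def train_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Fit L2-regularized logistic regression by batch gradient descent.

    X: list of rows of 0/1 gene presence values; y: list of 0/1 labels
    (1 = resistant). Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of resistance
            err = p - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradient over genomes; penalize weights but not the bias.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_genes(genes, weights):
    """Order genes by absolute coefficient, largest first."""
    return sorted(zip(genes, weights), key=lambda gw: abs(gw[1]), reverse=True)


# Toy presence/absence matrix (rows = genomes, columns = gene clusters).
# Gene names are hypothetical; geneA tracks the resistance label exactly.
genes = ["geneA", "geneB", "geneC"]
X = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(X, y)
for gene, weight in rank_genes(genes, weights):
    print(f"{gene}\t{weight:+.3f}")
```

On real pangenomes, a coefficient ranking like this is confounded by population structure and by genes that co-occur on the same mobile element, which is one reason candidates from such models still need experimental validation, as the abstract notes.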

July 15, 2024
17:40-18:00
Predicting pathogen preferences and host adaptation by leveraging microbial genomics and machine learning
Confirmed Presenter: Evan Brenner, University of Colorado Anschutz Medical Campus, United States
Track: EvolCompGen

Room: 518
Format: In Person
Moderator(s): Edward Braun


Authors List:

  • Evan Brenner, University of Colorado Anschutz Medical Campus
  • Janani Ravi, University of Colorado Anschutz Medical Campus

Presentation Overview:

Most emerging infectious diseases (EIDs) of humans originate in animals and are transmitted through zoonotic spillover events. However, the genetic determinants underlying host adaptation or host switching are often unclear. We hypothesize that genomic markers of pathogen adaptation to different hosts are detectable and can yield valuable insights into EID pathobiology. Publicly available databases provide access to millions of bacterial and viral genomes with metadata, including their hosts of origin. To leverage these, we are training machine learning models that associate pathogen genetic elements (e.g., genes, k-mers) with host labels. Our models are simple and interpretable (e.g., decision trees), run with reasonable computational requirements, and have been tested on a sampling of phylogenetically distinct bacterial and viral pathogens.

Our preliminary results show high predictive performance for bacterial and viral pathogens, and top-ranked features in these models often pinpoint genomic elements that are 1) associated with horizontal gene transfer elements, and 2) shown in prior literature to play biologically relevant roles in host adaptation. We will expand these models to new species, build more complex models that incorporate additional levels of genomic information (e.g., protein domains), and begin testing performance across species or genera rather than solely within them. These advances offer promise in assessing threats to different host populations posed by new EIDs.
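
As an illustration of how k-mer features can be tied to host labels with an interpretable model, the sketch below builds k-mer presence sets and finds the single best split of a decision tree (a depth-1 "stump") using Gini impurity. It is an assumption-laden toy, not the presenters' code: the sequences, host labels, and function names are invented for the example, and a real analysis would use whole genomes, a much larger k, and a full tree learner.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of k-mers present in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_kmer_split(sequences, hosts, k):
    """Find the k-mer whose presence/absence best separates the host labels.

    Returns (kmer, weighted_gini). Ties are broken in favor of the
    alphabetically first k-mer.
    """
    feature_sets = [kmers(s, k) for s in sequences]
    vocabulary = sorted(set().union(*feature_sets))
    n = len(hosts)
    best = None
    for kmer in vocabulary:
        present = [h for fs, h in zip(feature_sets, hosts) if kmer in fs]
        absent = [h for fs, h in zip(feature_sets, hosts) if kmer not in fs]
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / n
        if best is None or score < best[1]:
            best = (kmer, score)
    return best


# Made-up sequences and host labels, for illustration only.
sequences = ["ACGTTGCA", "ACGTAGCA", "TTGACCGA", "CTGACCTA"]
hosts = ["human", "human", "avian", "avian"]

kmer, impurity = best_kmer_split(sequences, hosts, k=3)
print(kmer, impurity)
```

A split like this is easy to inspect, since the selected k-mer can be mapped back to a genomic location, which matches the abstract's emphasis on simple, interpretable models.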