Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES
Confirmed Presenter: Anshu Gupta, Department of Computer Science and Engineering, University of California San Diego
Track: EvolCompGen
Room: 518
Format: Live Stream
Moderator(s): Lars Arvestad
Authors List:
- Anshu Gupta, Department of Computer Science and Engineering
- Siavash Mirarab, Department of Electrical and Computer Engineering
- Yatish Turakhia, Department of Electrical and Computer Engineering
Presentation Overview:
Species tree inference is crucial to understanding the evolutionary relationships of life on Earth and has immense significance for diverse biological and medical applications. Extensive genome sequencing efforts are in progress across a broad spectrum of life forms, unraveling intricate branching patterns within the tree of life. However, estimating species trees from raw genome sequences is challenging. Current state-of-the-art methods require a series of error-prone steps, including gene annotation, orthology inference, and accounting for gene tree discordance, that are neither fully automated nor standardized and that demand substantial human intervention. We therefore present ROADIES, a novel pipeline for species tree inference from raw genome assemblies that is fully automated, easy to use, scalable, and free from reference bias, and that lets users adjust the tradeoff between accuracy and runtime. The ROADIES pipeline eliminates the need to align whole genomes, choose a single reference species, or pre-select loci such as functional genes identified through cumbersome annotation steps. Moreover, it leverages recent advances in phylogenetic inference to allow multi-copy genes, eliminating the need to detect orthology. Using genomic datasets released by large-scale sequencing consortia (Birds 10K Genome Project, Zoonomia) across three diverse life forms (placental mammals, pomace flies, and birds), ROADIES infers species trees that are comparable in quality to those from state-of-the-art approaches while achieving a >100x speedup over conventional pipelines. ROADIES supports various modes of operation and is expected to improve the accuracy, speed, scalability, and reproducibility of phylogenomic analyses.