Join us for our upcoming ISCBacademy Webinars. Check back regularly for updates.
June 12, 2020 at 11:00AM EDT!
Species tree reconstruction is complicated by effects of incomplete lineage sorting, commonly modeled by the multi-species coalescent model (MSC). While there has been substantial progress in developing methods that estimate a species tree given a collection of gene trees, less attention has been paid to fast and accurate methods of quantifying support. In this article, we propose a fast algorithm to compute quartet-based support for each branch of a given species tree with regard to a given set of gene trees. We then show how the quartet support can be used in the context of the MSC to compute (1) the local posterior probability (PP) that the branch is in the species tree and (2) the length of the branch in coalescent units. We evaluate the precision and recall of the local PP on a wide set of simulated and biological datasets, and show that it has very high precision and improved recall compared with multi-locus bootstrapping. The estimated branch lengths are highly accurate when gene tree estimation error is low, but are underestimated when gene tree estimation error increases. Computation of both the branch length and local PP is implemented as new features in ASTRAL.
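As a rough illustration of how quartet support relates to branch length in coalescent units, the sketch below uses the standard MSC result that gene-tree quartets around an internal branch of length d match the species-tree topology with probability 1 - (2/3)exp(-d). Inverting that relation gives a simple branch length estimate from observed quartet counts. The function names are hypothetical and this is not ASTRAL's code; ASTRAL's handling of edge cases and its local PP calculation are not reproduced here.

```python
import math


def quartet_probability(branch_length):
    """MSC probability that a gene-tree quartet matches the species-tree
    topology around an internal branch of the given length (coalescent units)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-branch_length)


def branch_length_from_support(matching, total):
    """Invert the relation above to estimate a branch length from quartet counts.

    matching: number of gene-tree quartets agreeing with the species-tree branch
    total:    number of gene-tree quartets considered around that branch
    Assumption for this sketch: a frequency at or below 1/3 maps to length 0,
    and a frequency of exactly 1 maps to infinity.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    freq = matching / total
    if freq <= 1.0 / 3.0:
        return 0.0
    if freq >= 1.0:
        return math.inf
    return -math.log(1.5 * (1.0 - freq))
```

For example, `branch_length_from_support(800, 1000)` estimates the length of a branch that 80% of the relevant quartets support. When gene trees contain estimation error, the observed frequency moves toward 1/3, so an estimate of this kind shrinks, which is consistent with the underestimation described in the abstract.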
June 23, 2020 at 11:00AM EDT!
Engineering gene and protein sequences with defined functional properties is a major goal of synthetic biology. Rational design of gene enhancers, splice sites, 3’-end regulatory sequences and more has the potential to greatly accelerate the fields of nanotechnology and medical therapeutics. Deep neural network models, together with gradient ascent optimization, show promise for sequence design. The optimized sequences can, however, get stuck in local optima, have low diversity, and be computationally very costly to generate at scale. In the first part of this talk, I will present our work on using gradient-based methods to design regulatory sequences of Alternative Polyadenylation (APA), a post-transcriptional mechanism where multiple polyadenylation signals (PAS) in the mRNA compete for cleavage. Given a deep neural network trained on a massively parallel reporter assay of APA variants, we forward-engineer new functional polyadenylation signals with precisely defined cleavage and isoform distributions. In the second part of this talk, I will discuss how we extend this design framework using a class of generative neural networks called deep exploration networks (DENs). By penalizing any two generated patterns based on similarity, DENs learn to jointly maximize fitness and diversity. DENs can be used to design transcription factor binding sites, splice sequences and functional proteins. In the context of APA, we used DENs to engineer PAS with more than 10-fold higher selection odds than the best gradient ascent-generated patterns.
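The gradient ascent idea can be sketched on a toy problem. The example below relaxes each sequence position to a softmax over per-base logits and climbs the gradient of a fitness score. A hypothetical per-position weight matrix stands in for the trained neural network, so this is an assumption-laden illustration and not the speaker's model or pipeline. Because the toy scorer is linear it has no local optima, so it does not reproduce the failure modes mentioned above.

```python
import math

ALPHABET = "ACGT"


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def design_sequence(weights, steps=200, lr=1.0):
    """Gradient ascent on a relaxed (softmax) sequence representation.

    weights[i][b] is the toy fitness contribution of base b at position i.
    Fitness is sum_i sum_b p[i][b] * weights[i][b], with p[i] = softmax(logits[i]).
    """
    logits = [[0.0] * len(ALPHABET) for _ in weights]
    for _ in range(steps):
        for row_logits, row_w in zip(logits, weights):
            p = softmax(row_logits)
            expected = sum(pi * wi for pi, wi in zip(p, row_w))
            # d(fitness)/d(logit_a) = p_a * (w_a - expected) for a softmax row
            for a in range(len(ALPHABET)):
                row_logits[a] += lr * p[a] * (row_w[a] - expected)
    # Discretize: take the most probable base at each position
    return "".join(
        ALPHABET[max(range(len(ALPHABET)), key=row.__getitem__)] for row in logits
    )
```

Calling `design_sequence([[1, 0, 0, 0], [0, 0, 0, 1], [0, 2, 0, 0]])` returns `"ATC"`, the sequence that picks the highest-weight base at each position. With a real network as the scorer, the gradient would come from backpropagation rather than the closed-form expression used here.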
RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference
by Alexey Kozlov
September 30, 2020 at 11:00AM EDT!
Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets.
We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQ-TREE, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-TREE shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric.
IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
by Minh Bui
Large phylogenomics data sets require fast tree inference methods, especially for maximum-likelihood (ML) phylogenies. Fast programs exist, but because they rely on heuristics to find optimal trees, it is not clear whether the best tree is found. Thus, there is a need for additional approaches that employ different search strategies to find ML trees and that are at the same time as fast as currently available ML programs. We show that a combination of hill-climbing approaches and a stochastic perturbation method can be time-efficiently implemented. If we allow the same CPU time as RAxML and PhyML, then our software IQ-TREE found higher likelihoods for between 62.2% and 87.1% of the studied alignments, thus efficiently exploring the tree-space. If we use the IQ-TREE stopping rule, RAxML and PhyML are faster in 75.7% and 47.1% of the DNA alignments and 42.2% and 100% of the protein alignments, respectively. However, the proportion of alignments for which IQ-TREE obtains higher likelihoods then improves to 73.3-97.1%. IQ-TREE is freely available at http://www.cibiv.at/software/iqtree.
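The general pattern of combining hill-climbing with stochastic perturbation can be sketched on a toy problem. In the example below, integers stand in for tree topologies and a two-peak function stands in for the likelihood; all names are hypothetical, and this is not IQ-TREE's algorithm, only a minimal illustration of why random perturbation helps a greedy search escape a local optimum.

```python
import random


def hill_climb(x, score, neighbors):
    """Greedy ascent: move to the best neighbor until none improves the score."""
    while True:
        best = max(neighbors(x), key=score)
        if score(best) <= score(x):
            return x
        x = best


def perturbed_search(start, score, neighbors, perturb, rounds=100, seed=0):
    """Alternate random perturbation with hill-climbing, keeping the best optimum."""
    rng = random.Random(seed)
    best = hill_climb(start, score, neighbors)
    for _ in range(rounds):
        candidate = hill_climb(perturb(best, rng), score, neighbors)
        if score(candidate) > score(best):
            best = candidate
    return best


def toy_score(x):
    """Toy landscape on 0..99: local peak at 20 (height 5), global peak at 70 (height 10)."""
    return max(5 - abs(x - 20), 10 - abs(x - 70))


def toy_neighbors(x):
    """Neighbors are one step left or right, clipped to the range 0..99."""
    return [max(0, x - 1), min(99, x + 1)]


def toy_perturb(x, rng):
    """Random jump of up to 30 steps in either direction, clipped to 0..99."""
    return min(99, max(0, x + rng.randint(-30, 30)))
```

Starting from 0, `hill_climb(0, toy_score, toy_neighbors)` stops at the local peak 20. `perturbed_search(0, toy_score, toy_neighbors, toy_perturb)` never returns a worse result than that, and with enough rounds it is very likely to reach the global peak at 70.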