To view previous webinars use the links below
2020 Webinars | 2021 Webinars | 2022 Webinars
ISCBacademy is an online webinar series including the ISCB COSI, COVID webinars, Indigenous Voices and practical tutorials. We aim to inspire, connect, and communicate the science while providing hands-on experience in accessing and using newly developed bioinformatics tools and ensuring best practices for rigour and reproducibility.
January 10, 2023
Leveraging Microbiome Data in the Era of Precision Medicine
Our DREAM challenge was to predict (a) preterm or (b) early preterm birth from 9 publicly available studies of the vaginal microbiome representing 3578 samples from 1268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on 2 novel datasets. From 318 participants we received 148 and 121 submissions for our prediction tasks with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87 respectively.
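The abstract above reports bootstrapped AUROC scores without describing the procedure. The following is a minimal sketch of one common way to compute such a score, assuming binary labels (1 = preterm, 0 = term) and a mean over resamples drawn with replacement. The function names, the toy data and the number of resamples are illustrative assumptions, not the challenge's actual evaluation code.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrapped_auroc(labels, scores, n_boot=1000, seed=0):
    """Mean AUROC over n_boot resamples drawn with replacement (assumed scheme)."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if len(set(resampled_labels)) < 2:
            continue  # skip resamples that lost one of the two classes
        values.append(auroc(resampled_labels, [scores[i] for i in idx]))
    return sum(values) / len(values)


# Hypothetical toy data: two term (0) and two preterm (1) samples.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
print(bootstrapped_auroc(toy_labels, toy_scores, n_boot=200))
```

Averaging over resamples gives a score that is less sensitive to the particular composition of the validation set than a single AUROC computed once.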
Random forest model accurately predicts early preterm labor
About 10% of births worldwide are preterm (delivery before 37 weeks), and 2% of births worldwide are early preterm (delivery before 32 weeks). For the Preterm Birth Microbiome DREAM Challenge we used 3,578 vaginal microbiome samples from 1,268 individuals to predict both preterm and early preterm birth. We explore the use of a generative adversarial network (GAN) for preterm birth prediction as a way to train models on more data, but find that a basic random forest model trained on real relative abundances, diversity metrics, community state types, race, and collect week outperforms both a random forest and a support vector machine trained on generated relative abundance data. For early preterm birth prediction, we employ a basic random forest model and find that the most important features include diversity statistics, collect week, race, community state types, and many phylotypes. However, few of these features show a significant difference between early preterm and post-32-week samples, indicating that individual features on their own are not good predictors of early preterm birth. When tested on the validation dataset for the challenge, our early preterm birth prediction model had an AUC ROC of 0.87, an AUC PRC of 0.44, and an accuracy of 0.91.
Prediction model construction of early-preterm birth via vaginal microbiomes based on ensemble learning approach
Preterm birth (PTB), including early preterm birth, is estimated to affect 15 million births worldwide each year. PTB is a great concern as it is one of the leading causes of neonatal mortality, and inflammation of the vaginal microbiome is known as the major cause of PTB. Because of the complexity of the vaginal microbial environment in pregnancy, it is necessary to accurately predict (early) preterm birth using computational approaches based on microbiome characteristics and meta-information. In this Preterm Birth Prediction Microbiome DREAM Challenge, we constructed prediction models with selected features, handling the highly sparse and similar data points in the given raw data. We applied the minimum redundancy maximum relevance method to select relevant features. Then, various machine learning models were tested and combined into ensemble models to avoid overfitting and optimize performance. The constructed prediction models achieved AUCs of 0.635 and 0.841 for tasks 1 and 2, respectively.
Hosted by:
January 17, 2023
Elucidating the design principles of regulatory networks driving cellular decision-making is of fundamental importance in mapping and controlling cell-fate. Despite their size and complexity, large regulatory networks often lead to a limited number of phenotypes. How this canalization is achieved remains largely elusive. Here, we investigated multiple different networks governing cellular plasticity during cancer metastasis, and identified a latent design principle in their topology that limits their phenotypic repertoire – the presence of two “teams” of nodes engaging in a mutually inhibitory feedback loop. These “teams” are specific to these networks and directly shape the phenotypic landscape and consequently the cell-fate trajectories. Our analysis reveals that network topology alone can contain information about the phenotypic distributions it can lead to, thus obviating the need to simulate them. We present experimental evidence of such “teams” in transcriptomic datasets across many contexts (cancer cell plasticity in breast cancer, melanoma, lung cancer etc.). Overall, we propose these “teams” as a network design principle that can drive cell-fate canalization in diverse decision-making processes.
Hosted by:
January 24, 2023
Analyses of disease progression patterns in multimorbid patients typically try to find systematic patterns of risk factors, diseases and complications. Such analyses are complicated by the fact that certain risk factors can also present as complications, thus representing “promiscuous” diseases that appear in quite different contexts. Another problem is that similar outcomes can be caused by different mechanisms, or mixed etiologies, that can be difficult to disentangle longitudinally. Using population-wide registry data from Denmark (7-10 M patients) we construct disease and prescription trajectories that reflect these situations. However, compared to classical disease registries, electronic patient records contain even deeper information, for example in the clinical narratives. The talk will describe how the text in EHRs can add to the construction of disease trajectories and reflect temporal patterns that are not coded in structured form as conventional registry data are.
Hosted by:
January 31, 2023
In vivo studies of human metabolism are encumbered with serious ethical and technical issues. Only very few metabolic parameters can be directly measured, and many parameters, such as metabolic fluxes, are hard or impossible to assess experimentally.
Inferring the response of a biological system to external or internal perturbations from the properties and interactions of its constituting molecules is a central goal of systems biology. For metabolic systems, reaching this goal requires the establishment of mathematical models enabling the computation of metabolite concentrations and fluxes at given external conditions (nutrients and hormones), levels of metabolic enzymes, and the system’s history (e.g. current filling of nutrient stores).
Classical approaches include biostatistical methods (gene set enrichment) and stoichiometric models (such as flux balance analysis) but neglect most of the known regulatory mechanisms. We develop comprehensive biochemistry-based kinetic models of the central metabolism for different organ systems taking into account the regulation of enzyme activities by their reactants, allosteric effectors, and hormone-dependent phosphorylation.
The models have their utility in basic research, medicine, and pharmacology. Using proteomics data to scale maximal enzyme activities, the models help to investigate alterations in the metabolic functions of tissues or cells. Applications include genetic manipulations, diseases, and pharmacological treatments.
Our computational models have reached a high level of maturity making them suitable as screening platforms on an individual basis enabling a mechanistic understanding of functional mitochondrial and metabolic alterations. In the future, we will use our platform to translate findings from animal studies by conducting virtual clinical trials applying the observed modes of action to real-world proteomic data of a well-characterized patient cohort. Furthermore, we aim to couple multiple organs to better understand whole-body metabolism.
Hosted by:
February 14, 2023
Although the functions of proteins and nucleic acids are determined by the 3D structures that they fold into, only a subset of residues is directly involved in the mechanism of a particular function. Such crucial residues form substructural 3D arrangements that are conserved as motifs and take part in molecular interactions such as binding sites and catalytic mechanisms, as well as in the maintenance of specific folds or domains. Recently, protein structure prediction algorithms such as AlphaFold have computed highly accurate 3D structure models of protein sequences available in the UniProt database, resulting in more than 200 million models. The deposition rate of experimentally determined structures into the Protein Data Bank has also increased, and the archive surpassed the 200,000-entry mark in January 2023. Our research group has developed tools and resources that allow 3D substructures in protein and RNA molecules to be searched and compared. These capabilities allow for the functional annotation of existing structures as well as new experimentally determined or computationally generated structures with unknown functions, thus providing further insights into the extent of diversity or conservation of functional mechanisms. This can in turn lead to knowledge regarding a wider repertoire of functions that use known molecular mechanisms and usher in a new era of structure-similarity-driven function annotation, beyond the sequence-similarity-based function annotation that has been in use for decades. These substructure searching tools are available at http://mfrlab.org/grafss/.
Hosted by:
February 27, 2023
Analysis of a protein's sequence and three-dimensional structure can provide valuable insight into its function and biological significance. This tutorial session aims to explore protein sequence and structure retrieval databases, analysis resources and tools. The session will include an introduction and hands-on training in exploring protein sequence (UniProt) and structural (Protein Data Bank (PDB)) databases, together with the analysis tools available in these databases. A hands-on session will cover protein sequence similarity and homology through pairwise sequence alignment. Protein secondary structure prediction will be conducted using PSIPRED and TMHMM. Protein structural modeling and analysis will be done using SWISS-MODEL and PyMOL. All the software and databases used in the tutorial session are web browser-based, free, open source, compatible with Windows or Mac OS X, and require no installation, except PyMOL, which needs to be installed. A license for the educational version of PyMOL is available free of charge.
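As a rough illustration of the pairwise sequence alignment step mentioned above, here is a minimal sketch of a global (Needleman-Wunsch style) alignment score. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1) chosen only for illustration. Real protein alignments normally use a substitution matrix and affine gap penalties, and the web tools used in the tutorial may work differently.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b.

    Uses dynamic programming and keeps only two rows of the score
    table in memory. The scoring parameters are illustrative defaults.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


# Hypothetical short peptide fragments, used only as toy input.
print(global_alignment_score("MKTAYIAK", "MKTAYIAK"))
print(global_alignment_score("MKTAYIAK", "MKTYIAK"))
```

Identical sequences score their full length under this scheme, and each insertion, deletion or substitution lowers the score, which is the intuition behind using alignment scores as a proxy for similarity.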
Hosted by:
March 2, 2023
Small molecular metabolites are a fundamental component of biological processes. Bio-ontologies such as ChEBI describe and classify biologically relevant metabolites with information about their molecular structures, metadata such as names and identifiers, and biological activities. However, the number of metabolites in living systems far exceeds the size of available ontologies, and manual expert curation limits the speed at which knowledge resources can grow. Ontology-informed machine learning provides a promising approach to automatically extend chemical ontologies by using the information contained in the ontology to train a model that is able to automatically classify new metabolites in the ontology. In addition, we have seen that a model that was pre-trained on this ontology classification task was able to perform better on a subsequent toxicity prediction task than a model without pre-training. This shows that ontologies and machine learning can work together to create a virtuous cycle of knowledge capture and discovery.
Hosted by:
March 7, 2023
Over the years, scientific 3D visualization technology has matured to the extent where large and complex datasets can be displayed at interactive framerates, shifting the bottleneck in visualization from rendering to intuitive and effective interaction. In this talk, I will introduce my research in solving challenges related to biological data featuring several specific characteristics: they are multi-scale, multi-instance, three-dimensional, and incredibly dense. I will summarize my doctoral research work aimed at visualization for science communication. Afterward, I'll introduce our recent work on web-based visualization of 3D chromatin structures modeled from Hi-C data.
Hosted by:
March 14, 2023
So often, a patient’s role in research has been limited to the donor of data and receiver of care, limiting the possibilities in between for participatory methodologies and open science collaboration.
This webinar will be a case study of the patient-led research model. We will explore how the model was formed under the context of those with COVID-19 developing their own research capacity to study Long COVID, or Post-Acute Sequelae of COVID (PASC). We will present the levels of patient-involvement, data ownership and access principles, and accommodations for disabled contributors from our community. We will further discuss how patient-generated data can guide research direction and inform research hypotheses, while patients’ lived experiences, experiments and own research can enrich a learning health system. We will suggest ways to incorporate patient-led research models into open science innovations.
The talk will be delivered by a patient-researcher, co-founder and technologist at the Patient-Led Research Collaborative, a multi-disciplinary, patient-run organization dedicated to placing patient voices at the forefront of Long COVID research.
Hosted by:
March 21, 2023
Scientific literature grows very fast. One of the first studies of scientific literature production was conducted by de Solla Price, who used publication data collected over 100 years (1862–1961) to calculate a doubling time. The results showed a doubling time of 13.5 years for the scientific corpus, corresponding to a 5.1% annual growth rate (de Solla Price, 1965). The development of technologies created favourable conditions for scientific literature production, which made scientific information more accessible and introduced new challenges.
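The relationship between an annual growth rate and a doubling time can be checked with a short calculation. The sketch below assumes simple exponential growth and shows two common conventions; which convention underlies the cited figures is not stated in the abstract, so both are given.

```python
import math


def doubling_time_continuous(rate):
    """Doubling time in years if growth compounds continuously: ln(2) / r."""
    return math.log(2) / rate


def doubling_time_annual(rate):
    """Doubling time in years if growth compounds once a year: ln(2) / ln(1 + r)."""
    return math.log(2) / math.log(1 + rate)


growth_rate = 0.051  # the 5.1% annual growth rate quoted above
print(round(doubling_time_continuous(growth_rate), 2))  # 13.59
print(round(doubling_time_annual(growth_rate), 2))      # 13.93
```

Under continuous compounding, a 5.1% rate gives roughly 13.6 years, close to the 13.5 years quoted above; annual compounding gives a slightly longer doubling time.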
Our research focuses on the biomedical domain, which is one of the largest and most rapidly developing. Accessibility of biomedical literature through databases such as Medline (Medline, 2021) and research activity in biomedicine creates an opportunity to use natural language processing (NLP) techniques.
We implemented an information theory-based statistical approach and compared it with modern transformers on a relevant practical task ‒ classifying biomedical papers related to Drug-Induced Liver Injury (DILI) as part of the CAMDA 2022 Challenge 1. DILI is a clinically significant condition and is one reason for drug registration failures. Scientific literature is the primary source of information related to DILI. Thus collecting and processing vast amounts of biomedical literature can help pharma companies, research organizations, and regulators to find relevant information.
Hosted by:
March 31, 2023 at 2:00 PM UTC
Over the past couple of decades, immunotherapy has been widely adopted as an alternative treatment for a variety of cancers. Studying immune cells in the tumour microenvironment, such as macrophages, T cells and B cells, among others, can help to unravel the mystery of differential outcomes of immunotherapy treatments. Gene expression profiling can help to identify the patterns of genes expressed in major immune cells among cohorts of patients at different stages of cancer and so generate new biological hypotheses. Statistical approaches applied to scRNA sequencing data can facilitate the identification of highly variable genes and their expression in immune cells. The tutorial will be divided into three parts: comparing the popular annotation tools, applying dimensionality reduction techniques for multi-stage downstream analysis of scRNA data, and extracting crucial insights from immune cell populations and subpopulations. Throughout the tutorial we will follow the Seurat pipeline, version 4.0.
Hosted by:
April 11, 2023 at 9:30 AM EDT
Diverse organisms change their genomes during evolution, and prediction of those changes is a long-standing problem. While recent laboratory evolution studies have shown the predictability of short-term and sequence-level evolution, that of long-term and system-level evolution has not been systematically examined. Here, we show that the gene content evolution of metabolic systems is generally predictable by applying ancestral gene content reconstruction and machine learning techniques to ~3000 bacterial species. Our computational framework, Evodictor, successfully predicted gene gain and loss at the branches of the reference phylogenetic tree, suggesting that evolutionary pressures and constraints on metabolic systems are universally shared. Investigation of pathway architectures and meta-analysis of metagenomic datasets confirmed that these evolutionary patterns have physiological and ecological bases in the form of functional dependencies among metabolic reactions and bacterial habitat changes. Furthermore, pan-genomic analysis of intraspecies gene content variations showed that even “ongoing” evolution in extant bacterial species is predictable in our framework. I herein present our findings on the predictability of biological system evolution, and discuss future perspectives on the versatility of the Evodictor concept for extracting evolutionary rules from growing datasets of genetic/phenotypic traits and megaphylogenies.
Hosted by: