ISCBacademy is an online webinar series that includes the ISCB COSI and COVID webinars, Indigenous Voices and practical tutorials. We aim to inspire, connect, and communicate the science, providing hands-on experience in accessing and using newly developed bioinformatics tools and promoting best practices for rigour and reproducibility.
January 24, 2024
We live in a microbial world estimated to contain more than a million species, and yet humanity’s adversarial relationship with microbes is shaped by a small fraction of pathogenic species and the pervasive use of antimicrobial agents. Efforts to eradicate microbes often have limited success, with disinfected environments being rapidly recolonized, and antibiotic treatment increasingly selecting for resistant pathogens. The global rise in antimicrobial resistance (AMR) rates for common pathogens (e.g. ESKAPE) is recognized as a pre-eminent threat to healthcare systems. As the range of effective antibiotics shrinks we approach a tipping point where no antibiotic works for a pathogen, putting at risk the lives of millions of vulnerable patients in hospitals worldwide. Already >1 million deaths/year are attributed to AMR, and by 2050 the UN projects that AMR will be responsible for more deaths every year than all cancers (>10 million deaths/year).
We need new approaches to track the transmission of antibiotic resistance across microbes and to understand how we can leverage ecological functions to reduce AMR reservoirs. We propose that the emerging field of genome-resolved metagenomics aided by long-read sequencing [1] can transform our ability to do microbial surveillance, and we showcase its application in tracking pathogens through hospital environments [2] as well as the gut microbiome [3]. In order to decipher how microbial communities assemble and can provide colonization resistance against pathogens, we have developed new microbiome modelling approaches that can provide mechanistic insights based on high-throughput metagenomic datasets [4, 5]. Together with other data mining approaches [6], we are now leveraging these to understand how microbiomes recover from the impact of antibiotics and how new classes of biotherapeutics can be developed to prevent the spread of antimicrobial resistant pathogens.
1. Bertrand D et al. Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nature Biotechnology 2019 Aug;37(8):937-944
2. Chng KR et al. Cartography of opportunistic pathogens and antibiotic resistance genes in a tertiary hospital environment. Nature Medicine 2020 Jun;26(6):941-951
3. Kang JTL et al. Long-term ecological and evolutionary dynamics in the gut microbiomes of carbapenemase-producing Enterobacteriaceae colonized subjects. Nature Microbiology 2022 Oct;7(10):1516-1524
4. Li C et al. An expectation-maximization algorithm enables accurate ecological modeling using longitudinal microbiome sequencing data. Microbiome 2019 Aug 22;7(1):118
5. Li C et al. BEEM-Static: Accurate inference of ecological interactions from cross-sectional microbiome data. PLoS Computational Biology 2021 Sep 8;17(9):e1009343
6. Chng KR et al. Metagenome-wide association analysis identifies microbial determinants of post-antibiotic ecological recovery in the gut. Nature Ecology & Evolution 2020 Sep;4(9):1256-1267. doi: 10.1038/s41559-020-1236-0
January 30, 2024
Translation is a fundamental process in all biological kingdoms. The initiation of translation, which marks the outset of protein synthesis, is a highly regulated and crucial step. This initiation hinges on the recognition of the start codon by a scanning ribosome. Contrary to the conventional representation, more than half of mRNA molecules contain one or more upstream AUGs (uAUGs) before the primary AUG (mAUG). The presence of these uAUGs provides potential alternative sites for the ribosome to initiate translation before it reaches the mAUG. This diversity in start codon presence raises an intriguing question: How do eukaryotes dynamically select the appropriate start codon to initiate translation, especially when confronted with varying environmental conditions?
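To make the notion of upstream AUGs concrete, here is a minimal Python sketch that lists every AUG found 5' of an annotated main start codon and reports whether it shares the main reading frame. The function name, the toy sequence and the assumption that the mAUG position is already known are all illustrative; they are not taken from the talk.

```python
def find_upstream_augs(mrna: str, main_aug_index: int) -> list[dict]:
    """Return every AUG located 5' of the main start codon.

    mrna is read 5'->3'; main_aug_index is the 0-based position of the
    annotated main AUG (assumed to be known, e.g. from an annotation file).
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    if seq[main_aug_index:main_aug_index + 3] != "AUG":
        raise ValueError("main_aug_index does not point at an AUG")

    hits = []
    pos = seq.find("AUG")
    while pos != -1 and pos < main_aug_index:
        hits.append({
            "position": pos,
            # In frame if the distance to the mAUG is a multiple of three.
            "in_frame_with_main": (main_aug_index - pos) % 3 == 0,
        })
        pos = seq.find("AUG", pos + 1)
    return hits


# Hypothetical toy transcript: the main AUG sits at index 16.
toy_mrna = "GGAUGCCAUGAAACCCAUGGCUUAA"
upstream = find_upstream_augs(toy_mrna, main_aug_index=16)
for hit in upstream:
    print(hit)
```

A scan like this only enumerates candidate start sites; it says nothing about which one a ribosome actually uses under a given condition, which is the question the talk addresses.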
February 16, 2024
The FAIR Principles provide guidance on how to improve the Findability, Accessibility, Interoperability, and Reusability of digital objects. Since the publication of the principles in 2016, several workflows have been proposed to support the process of making resources FAIR (FAIRification). These workflows define steps such as identifying FAIRification objectives, semantic modelling of (meta)data, hosting and assessment of FAIR data. However, to respect the uniqueness of different communities, both the principles and the available workflows have been deliberately designed to remain agnostic in terms of standards, tools, and related implementation choices. While this flexibility is beneficial, it also poses challenges for those wishing to carry out their own FAIRification, especially for beginners. For instance, the question “Is there a checklist I can follow during FAIRification?” has been frequently asked by newcomers when consulting with FAIR experts, implying a need for simple and detailed guidance. Furthermore, based on previous experiences with FAIRification projects, we have found that preparing for FAIRification (e.g., identifying the FAIRification objectives) and designing the semantic (meta)data models to make resources FAIR are the most challenging FAIRification steps. This talk focuses on research results towards providing detailed guidance for these two crucial steps.
February 20, 2024
Systems biology has been widely used to study signalling in immune cells. Multiscale modelling approaches have provided insight into how cellular signalling leads to distinct cell fates, which control the immune response. Less attention has been paid to what happens to these networks, cell fates and patient outcomes when signalling is impacted by mutations. B cell lymphoma is a highly heterogeneous disease and treatment progress has been challenged by patient-to-patient variability. The Mitchell lab is asking whether systems biology models can enable us to overcome this patient-to-patient heterogeneity and get the right drugs to the right patients. Through combined computational modelling and experimental work, we found that mutations cause “crossed wires” within molecular signalling that result in tumour cells misinterpreting their microenvironment. We found that when mutations impact multiple signalling networks that control multiple cell fates, the resultant changes in cellular proliferation can be greater than expected. We find that by combining DNA sequencing data with ordinary differential equation models we can create heterogeneous populations of virtual patients. Within these patients, we computationally identify a new subgroup of patients who have co-occurring dysregulation of their cell cycle and apoptosis. We find the perturbed signalling within these patients results in dismal outcomes (progression-free survival). We need new treatment approaches for these patients. By simulating the impact of inhibitors within these molecular networks we find we can predict which inhibitors are most effective in each lymphoma cell population. Validating these predictions in the lab demonstrates how computational systems biology approaches are unlocking a personalized medicine approach to getting the right drugs to the right patients.
March 5, 2024
The Ersilia Open-Source Initiative is a non-profit organisation whose mission is to equip laboratories, clinics and universities in low- and middle-income countries (LMICs) with artificial intelligence (AI) tools for infectious disease research. Our goal is to strengthen research capacity in the countries where these diseases are predominant, supporting in-country drug discovery pipelines for neglected and infectious diseases. Since our foundation in 2020, we have collaborated with several institutions in the Global South as well as international consortiums. In this introductory talk we will present our computational approach and infrastructure, including the Ersilia Model Hub, a unified platform offering ready-to-use AI models to researchers worldwide, and how we have used it across multiple projects, offering a perspective on how AI/ML can transform drug discovery and contribute towards a more egalitarian world of biomedical research.
April 3, 2024
Energetic local frustration offers a biophysical perspective to interpret the effects of sequence variability within protein families [1]. Here we present FrustraEvo [2], a novel methodology to analyze local frustration patterns within protein families that allows us to uncover constraints related to stability and function, and identify differential frustration patterns in families with a common ancestry. We have analyzed these signals in very well studied cases such as the PDZ, SH3, alpha and beta globin, and RAS families. Recent advances in protein structure prediction make it possible to analyze a vast majority of the protein space. An automatic and unsupervised proteome-wide analysis of the SARS-CoV-2 virus demonstrates the potential of our approach to enhance our understanding of the natural phenotypic diversity of protein families beyond single protein instances. We have applied our method to modify biophysical properties of natural proteins based on their family properties, as well as to perform unsupervised analysis of large datasets to shed light on the physicochemical signatures of poorly characterized proteins such as those of emergent pathogens. Our approach will be valuable to explore the dynamic emergence of functional patterns in protein families.
[1] Freiberger MI. et al. Local Energetic Frustration Conservation in Protein Families and Superfamilies. Nature Comms 2023. https://www.nature.com/articles/s41467-023-43801-2
[2] Parra RG & Freiberger MI et al. FrustraEvo: A Web Server To Localize And Quantify The Conservation Of Local Energetic Frustration In Protein Families. bioRxiv 2023. https://doi.org/10.1101/2023.11.29.569273
April 25, 2024
The horizontal movement of genes is a crucial driver in the evolution of viral and bacterial pathogens. It enables pathogens to, for example, make large jumps in fitness space, adapt to new host species, or gain novel genes, such as acquiring plasmids carrying determinants for antibiotic resistance. Phylogenetic methods are often used to reconstruct evolutionary events but mostly assume that a phylogenetic tree can describe the shared evolutionary history of pathogens. This assumption is challenged when genes move horizontally, necessitating the use of phylogenetic networks instead.
In this talk, I will first present recent work on inferring phylogenetic networks using a Markov chain Monte Carlo approach. This approach models the horizontal movement of genes using coalescent models, allowing us to quantify reassortment, recombination, or plasmid transfer rates. I will then showcase multiple applications of phylogenetic network inference. First, I will demonstrate how we can use the coalescent with reassortment to infer reassortment rates across different influenza viruses. Next, I will discuss how phylogenetic network inference allows us to infer the complex evolutionary history of human coronaviruses, including MERS and SARS-like viruses such as SARS-CoV-1 and SARS-CoV-2. Lastly, I will present work on reconstructing the gain and loss of small plasmids and the recent dissemination of a multidrug-resistance plasmid between Shigella sonnei and Shigella flexneri lineages. This work reveals multiple independent events and a steady growth in prevalence since 2010, and quantifies the rates at which different plasmids move between bacterial lineages.
April 29, 2024
The advent of high-dimensional multimodal omics technologies has revolutionized the landscape of drug discovery and clinical development by enabling a comprehensive understanding of biological systems, be it in oncology or chronic diseases. In early drug discovery, varied forms of omics such as genomics, transcriptomics, epigenomics and proteomics, alongside clinical phenotypic measurements, facilitate the identification of known and novel targets by elucidating disease mechanisms at an unprecedented molecular level. As early potential drug targets progress to late-stage clinical development, multi-omics approaches can often provide critical insights into drug efficacy, safety, and patient stratification, thereby enhancing the precision of therapeutic interventions. Integration of omics and clinical data with advanced statistical and machine learning models is also increasingly becoming a part of the drug discovery and development value chain to predict potential adverse events, dosing regimens and mechanisms of action. Ultimately, the application of omics technologies throughout the drug development pipeline promises to accelerate the delivery of personalized medicine and improve patient outcomes.
May 21, 2024
Chemical exposures exert a significant impact on individual and public health, yet no unifying view exists on how diverse chemical compounds may interfere with biological processes and contribute to disease risk. Here, we adopt a network-based approach to construct a comprehensive map connecting 9,887 exposures through their shared genetic impact. This map can be used to define classes of exposures that affect the same biomolecular processes, even if they are chemically distinct. We found that exposures target specific modules within the human interactome of protein-protein interactions and that their harmfulness is related to their interactome connectivity. A systematic comparison between the interactome modules affected by exposures and disease-associated modules suggested that their interactome proximity can be used to predict exposure-disease relationships. We have validated our predictions through nationwide disease prevalence data. As a case study, we discuss the potential health implications of Endrin, a pesticide prevalent in Italian agricultural soil. Taken together, our study provides a blueprint for the systematic investigation of the pathobiological impact of chemical exposures ranging from the molecular to the population level.
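The abstract does not say which proximity measure was used, so the following Python sketch assumes one common choice: the average shortest-path distance from each exposure target to its nearest disease gene. The toy interactome, the protein names and the function names are hypothetical and serve only to show the arithmetic.

```python
from collections import deque


def shortest_distances(graph, source):
    """Breadth-first search: hop distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closest_proximity(graph, exposure_targets, disease_genes):
    """Mean distance from each exposure target to its nearest disease gene."""
    total = 0
    for target in exposure_targets:
        dist = shortest_distances(graph, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if not reachable:
            raise ValueError(f"{target} cannot reach any disease gene")
        total += min(reachable)
    return total / len(exposure_targets)


# Hypothetical undirected toy interactome (adjacency lists).
toy_interactome = {
    "P1": ["P2"],
    "P2": ["P1", "P3", "P6"],
    "P3": ["P2", "P4"],
    "P4": ["P3", "P5"],
    "P5": ["P4"],
    "P6": ["P2"],
}
exposure = {"P1", "P6"}
near_disease = {"P2", "P3"}
far_disease = {"P5"}

near_score = closest_proximity(toy_interactome, exposure, near_disease)
far_score = closest_proximity(toy_interactome, exposure, far_disease)
print(near_score, far_score)
```

A smaller score means the exposure targets sit closer to the disease module. In practice such scores are usually compared against randomized gene sets before being interpreted, a step omitted here.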
September 5, 2024
In this tutorial, we present IntelliGenes, a novel Artificial Intelligence (AI) and machine learning (ML) pipeline to discover biomarkers significant in disease prediction with high accuracy. IntelliGenes is based on a novel approach that combines conventional statistical techniques with cutting-edge AI/ML algorithms using multi-genomic, clinical, and demographic data. By integrating these approaches, we outperformed single algorithms, resulting in enhanced accuracy, deeper insights, and more precise predictions, essential for personalized early disease-risk detection in individuals. IntelliGenes introduces a new metric, the Intelligent Gene (I-Gene) score, to measure the importance of individual biomarkers for the prediction of complex traits. I-Gene scores can be used to generate I-Gene profiles of individuals to comprehend the intricacies of the ML used in disease prediction. IntelliGenes is a user-friendly, portable, cross-platform application, compatible with Microsoft Windows, macOS, and UNIX operating systems. IntelliGenes not only holds the potential for personalized early detection of common and rare diseases in individuals, but also opens avenues for broader research using novel ML methodologies, ultimately leading to personalized interventions and novel treatment targets. We are proud to share that IntelliGenes is the first peer-reviewed, published AI/ML pipeline for biomarker discovery and predictive analysis using integrated clinical and multi-genomic profiles. It was recently published in the journal Bioinformatics by Oxford University Press and the International Society for Computational Biology (ISCB) [PMID: 38096588; DOI: 10.1093/bioinformatics/btad755].
November 4, 2024
As macromolecular structures available through the Protein Data Bank (PDB) archive continue to grow in complexity and size, traditional text data formats like PDBx/mmCIF and the legacy PDB file format are becoming increasingly inefficient for transfer and parsing. To support scalable data analysis, binary formats and compression techniques are now essential.
Join our one-hour workshop to future-proof your data analysis with BinaryCIF, a fully interchangeable yet drastically more efficient flavor of the PDBx/mmCIF format. BinaryCIF not only boosts storage efficiency, but also substantially improves parsing speed, making it ideal for large-scale analyses. BinaryCIF is supported by resources such as RCSB PDB, PDBe, and AlphaFold DB.
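To illustrate why a binary encoding can be much more compact than text, the Python sketch below applies delta and run-length encoding, two of the column encodings listed in the BinaryCIF specification, to a toy column of consecutive atom identifiers. It is not the RCSB parser: the function names and data are hypothetical, and real BinaryCIF files additionally wrap the encoded columns in a MessagePack container.

```python
def delta_encode(values):
    """Store the first value as origin, then differences between neighbours."""
    origin = values[0]
    deltas = [0] + [cur - prev for prev, cur in zip(values, values[1:])]
    return origin, deltas


def delta_decode(origin, deltas):
    out, current = [], origin
    for d in deltas:
        current += d
        out.append(current)
    return out


def run_length_encode(values):
    """Collapse repeats into a flat [value, count, value, count, ...] list."""
    pairs = []
    for v in values:
        if pairs and pairs[-2] == v:
            pairs[-1] += 1          # same value as the current run
        else:
            pairs.extend([v, 1])    # start a new run
    return pairs


def run_length_decode(pairs):
    out = []
    for i in range(0, len(pairs), 2):
        out.extend([pairs[i]] * pairs[i + 1])
    return out


# Toy column: atom identifiers 1..10, as they might appear in an atom table.
atom_ids = list(range(1, 11))
origin, deltas = delta_encode(atom_ids)
packed = run_length_encode(deltas)
print(origin, packed)  # ten integers reduced to an origin plus four numbers

restored = delta_decode(origin, run_length_decode(packed))
```

Columns in structure files are often monotonic or highly repetitive, which is why chaining simple encodings like these pays off before any general-purpose compression is applied.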
This webinar will benefit bioinformaticians, data scientists, and structural biologists who want to:
• Understand the basics of the PDBx/mmCIF schema
• Access BinaryCIF files and related APIs on RCSB.org
• Programmatically consume BinaryCIF data and convert between formats
• Compute archive-wide statistics across the entire PDB
• Gain hands-on experience with our Python parser
An institutional email address for registration is preferred.
You will receive confirmation and a Zoom link by email before the event.
November 7, 2024
This tutorial provides an introduction to genome-wide association study (GWAS) and genotype imputation, key methodologies in the field of computational genomics.
GWASes have revolutionized our investigation of the genetic basis of Mendelian and complex diseases, as well as of human traits, by systematically examining millions of genetic variants across the entire genome. In recent years, GWASes have been empowered by genotype imputation, a statistical technique that accurately infers untyped genotypes and dramatically increases the number of genomic markers to test. Despite the use of GWAS and genotype imputation in everyday genomic workflows, limited resources are available for understanding the principles of genetic association analysis and performing genomic scans on imputed data.
The tutorial covers the fundamental principles of GWAS and genotype imputation in the context of human genetics. It offers a comprehensive overview of association analysis on genetic data, which is effective not only on human datasets: genetic association tests are applicable and transferable to any other organism in which genomic positions are tested against a phenotype of interest to unravel the underlying genetic causes associated with a specific trait.
The tutorial addresses the following topics:
• introduction to genetic data from genotyping and sequencing technologies;
• the principal statistical methodologies to perform GWAS;
• quality controls and filtering options, on both variants (e.g., minor allele frequency) and samples (e.g., relatedness and ancestry);
• introduction to genotype imputation: main methods, popular tools and reference panels;
• solving the problem of missingness in genetic data with genotype imputation;
• performing a GWAS on typed and imputed data.
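As a rough illustration of two of the topics above, the Python sketch below computes a minor allele frequency for a variant-level quality filter and runs a basic allelic chi-square association test on a single variant. It is plain arithmetic on hypothetical toy genotypes, not the tutorial's pipeline: the function names, the 0/1/2 genotype coding, the use of None for missing calls and the 0.01 threshold are all assumptions made for the example.

```python
import math


def allele_counts(genotypes):
    """Count (alternate, reference) alleles; genotypes are 0/1/2 or None."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    ref = 2 * len(called) - alt
    return alt, ref


def minor_allele_frequency(genotypes):
    alt, ref = allele_counts(genotypes)
    if alt + ref == 0:
        raise ValueError("no called genotypes for this variant")
    alt_freq = alt / (alt + ref)
    return min(alt_freq, 1.0 - alt_freq)


def allelic_chi_square(case_genotypes, control_genotypes):
    """Chi-square test (1 d.f.) on the 2x2 table of allele counts."""
    a, b = allele_counts(case_genotypes)      # cases: alt, ref
    c, d = allele_counts(control_genotypes)   # controls: alt, ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the allele table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy data for one variant (None marks a missing call).
cases = [2, 1, 1, 2, 1, 2]
controls = [0, 0, 1, 0, 1, None, 0]

MAF_THRESHOLD = 0.01  # assumed cut-off, chosen only for this example
maf = minor_allele_frequency(cases + controls)
passes_qc = maf >= MAF_THRESHOLD
chi2, p_value = allelic_chi_square(cases, controls)
print(f"MAF={maf:.3f} passes_qc={passes_qc} chi2={chi2:.2f} p={p_value:.4f}")
```

A real GWAS repeats such a test across millions of variants, usually with regression models that adjust for covariates such as ancestry, and applies a multiple-testing threshold. Dedicated tools handle those steps far more efficiently than a hand-written loop.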
Through a step-by-step approach, participants will gain both theoretical and practical insights into the implementation of a pipeline to perform GWASes with genotype imputation. Alongside the theoretical explanation, attendees will have the chance to take part in three hands-on sessions, covering:
• quality control and filtering of genetic data;
• data handling and genotype imputation;
• GWAS and results visualization.
The hands-on sessions will be performed on a publicly available and widely used genetic dataset from the 1000 Genomes Project. The whole workflow will be carried out using state-of-the-art toolsets that are recognized by the scientific community and, above all, easily understandable and accessible to beginners, such as plink and bcftools for quality controls and association studies and the Michigan and TOPMed imputation servers for genotype imputation. Data visualization will be performed in the R statistical environment.
By the end of this tutorial, attendees will have a general overview of the main concepts of genome-wide association studies and genotype imputation methodologies, and will be better equipped to undertake basic genetic association studies that unravel the genetic underpinnings of diverse phenotypes, either in human samples or in other organisms.
December 6, 2024
As the availability of genomic data from across the tree of life has increased, the extent of heterogeneity in phylogenomic datasets has become increasingly evident. The diverse processes shaping genomic variation necessitate increasingly complex models that cannot always be accommodated in standard likelihood or Bayesian frameworks. In light of this heterogeneity, machine learning has emerged as a particularly promising approach. First, I’ll describe our applications of supervised machine learning to infer phylogenetic relationships and demographic histories. While promising, these approaches rely on the use of data simulated under the models of interest to train machine learning algorithms. When the models used to simulate these data do not include processes important in shaping genetic variation in our focal systems, it leads to a mismatch between training data and empirical data. Our results indicate that such model violations can mislead inferences of introgression. However, domain adaptation approaches aim to overcome this limitation of supervised machine learning. Using domain adaptation, we demonstrate that accurate inferences of introgression are possible, even in the presence of complex processes not modelled in the training data.