Attention presenters: please review the Speaker Information Page.
Schedule subject to change
All times listed are in SAST
Monday, April 14th
17:15-17:30
Welcome and Opening Remarks
Format: In person


17:30-18:30
Invited Presentation: Translating computational biology to the clinic
Format: In person

Moderator(s): Alan Christoffels


Authors List:

  • Chandra Verma

Presentation Overview:

Structural biology complemented with physics-based models has been immensely successful in understanding the links between biomolecular structure, dynamics and function. This has resulted in biotechnological and therapeutic developments, especially with the recent boost from machine learning/AI-inspired methods. This pipeline is now being extended to the clinic; examples and future directions towards informing clinical decision-making will be discussed.

Tuesday, April 15th
9:00-10:00
Invited Presentation: The South African experience of sequencing respiratory and meningitis pathogens: from vaccine-preventable pathogens to antimicrobial resistance to outbreaks
Format: In person

Moderator(s): Faisal M. Fadlelmola


Authors List:

  • Anne Gottberg

Presentation Overview:

This lecture will cover our experience in sequencing viral and bacterial causes of pneumonia and meningitis from routine surveillance platforms. It will also include outbreak investigations, including South Africa’s recent diphtheria outbreak. Viral infections currently being monitored include influenza, with a potential influenza pandemic in mind, and respiratory syncytial virus (RSV), as maternal vaccines and monoclonal antibody interventions to prevent RSV in infants come to Africa. In addition, updated data from surveillance for SARS-CoV-2 variants will be presented. Antimicrobial resistance (AMR) surveillance covers pneumococcus, where AMR is driven by both vaccine-serotype and non-vaccine-serotype lineages, and diphtheria, where high-level resistance to first-line antimicrobials (beta-lactams and macrolides) has recently been reported.
Data from whole-genome sequencing of these pathogens will be presented, with a focus on current global and local priorities for each. Key considerations will be highlighted, from selecting representative samples of cultures or specimens before sequencing to ensuring that treatment and clinical outcome metadata are captured for analysis. Quality control of sequence data will be discussed, including the correlation of phenotypic testing results with findings from genomic analyses.
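Comparisons between phenotypic testing and genomic predictions are often summarised as categorical agreement plus error rates. A minimal sketch, assuming hypothetical per-isolate resistant ("R") / susceptible ("S") calls from both sources; the isolate names and calls are made up, and the error definitions (very major error: phenotypically resistant but predicted susceptible; major error: the reverse) follow common AST-comparison usage rather than anything stated in the talk.

```python
from dataclasses import dataclass


@dataclass
class Concordance:
    agreement: float         # fraction of shared isolates with matching calls
    very_major_error: float  # phenotype R but genotype S, over phenotypically R isolates
    major_error: float       # phenotype S but genotype R, over phenotypically S isolates


def concordance(phenotype: dict[str, str], genotype: dict[str, str]) -> Concordance:
    """Compare 'R'/'S' calls for the isolates present in both dictionaries."""
    shared = sorted(set(phenotype) & set(genotype))
    if not shared:
        raise ValueError("no isolates in common")
    agree = sum(phenotype[i] == genotype[i] for i in shared)
    pheno_r = [i for i in shared if phenotype[i] == "R"]
    pheno_s = [i for i in shared if phenotype[i] == "S"]
    vme = sum(genotype[i] == "S" for i in pheno_r)
    me = sum(genotype[i] == "R" for i in pheno_s)
    return Concordance(
        agreement=agree / len(shared),
        very_major_error=vme / len(pheno_r) if pheno_r else 0.0,
        major_error=me / len(pheno_s) if pheno_s else 0.0,
    )


# Made-up calls for five hypothetical isolates.
pheno = {"iso1": "R", "iso2": "S", "iso3": "R", "iso4": "S", "iso5": "S"}
geno = {"iso1": "R", "iso2": "S", "iso3": "S", "iso4": "R", "iso5": "S"}
result = concordance(pheno, geno)
print(result)
```

Very major errors are usually weighted as the more serious failure, because a missed resistance call could lead to ineffective treatment.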

10:30-10:45
Early Biomarkers of Long Covid (PASC)
Confirmed Presenter: Malika Aid Boudries, Harvard School of Medicine, United States

Format: In Person

Moderator(s): Ruben Cloete


Authors List:

  • Malika Aid Boudries, Harvard School of Medicine, United States
  • Malika Aid, CVVR, United States

Presentation Overview:

Long Covid, or Post-Acute Sequelae of COVID-19 (PASC), encompasses a range of chronic symptoms that persist after acute SARS-CoV-2 infection. The proposed mechanisms underlying Long Covid include factors such as persistent viral presence, reactivation of latent viruses, tissue damage, immune system dysregulation, and inflammatory responses. In a study involving 142 participants—spanning uninfected controls, acutely infected individuals, convalescent controls, and Long Covid patients—integrative bioinformatics and machine learning analyses were applied to immunologic, virologic, transcriptomic, and proteomic data. This approach revealed that Long Covid patients exhibited chronic immune activation, upregulated proinflammatory pathways (e.g., JAK-STAT and IL-6), and metabolic and T cell exhaustion signatures, differentiating them from convalescent patients six months post-infection.
Advanced computational tools, particularly machine learning models, played a pivotal role in identifying these biological patterns and in nominating biomarkers such as plasma IL-6R levels for Long Covid diagnosis. Leveraging these tools, we also identified novel therapeutic approaches for Long Covid, such as JAK inhibitors, which are now under clinical investigation (NCT06597396). This integration of bioinformatics and machine learning not only accelerates discovery but also refines our understanding of complex disease mechanisms, underscoring their importance in shaping future medical research.

10:45-11:00
Augmented kurtosis-based projection pursuit: a novel, advanced machine learning approach for multi-omics data analysis and integration
Confirmed Presenter: Tobias Karakach, Dalhousie University, Faculty of Medicine, Department of Pharmacology, Canada

Format: In Person


Authors List:

  • Tobias Karakach, Dalhousie University, Faculty of Medicine, Department of Pharmacology, Canada
  • Fabian Bong, Dalhousie University, Faculty of Computer Science, Canada
  • Nithya Ramakrishnan, Institute of Bioinformatics and Applied Biotechnology (IBAB), India
  • Karla Valenzuela, Dalhousie University, Faculty of Medicine, Department of Pharmacology, Canada
  • Peter Wentzell, Dalhousie University, Faculty of Science, Department of Chemistry, Canada
  • Jasmine Barra, Dalhousie University, Faculty of Medicine, Department of Microbiology and Immunology, Canada

Presentation Overview:

Due to the heterogeneity of multi-omics data, extracting information from them remains a challenge. Although some solutions have been proposed, most cannot overcome the large linear dynamic range associated with these data, while others require large biological effect sizes to produce meaningful models. Here, we (a) perform a comprehensive benchmarking of multi-omics data analysis tools, and (b) introduce kurtosis-based projection pursuit analysis augmented with classification and regression trees (kPPA-CART) as a robust, easy-to-implement approach to modelling multi-omics data derived from next-generation sequencing (NGS) and mass spectrometry (MS). Most available methods for unsupervised multi-omics integration cannot model low-intensity (low-count) features and instead focus on highly variable (dominant) ones. Although low-count features, such as genes involved in signaling and non-coding RNAs (ncRNAs), are associated with high analytical uncertainty, they can have significant biological impact when perturbed.

Methodologically, kPPA is an “unsupervised” data exploration approach that finds patterns in input data without a priori knowledge of class membership. The output of kPPA consists of projections of the original samples onto “interesting” directions, which, when plotted against each other, show clustering of (dis)similar samples. We augment kPPA’s clustering with classification and regression trees (CART), which take cluster identities derived from k-means clustering as input to perform a quasi-supervised classification and determine feature importance.
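As a rough illustration of the projection index only (not the authors' kPPA-CART implementation or R package), the sketch below scans unit directions in two dimensions and keeps the one whose projected scores have the lowest kurtosis. A strongly bimodal projection, which is what well-separated clusters produce, has kurtosis near 1, while Gaussian scores give about 3. The brute-force angle scan and the synthetic two-group data are assumptions made for brevity; real multi-omics data would need a proper high-dimensional optimiser, followed by the k-means and CART steps described above.

```python
import math
import random


def kurtosis(values: list[float]) -> float:
    """Non-excess kurtosis m4 / m2**2: about 3 for Gaussian data, near 1 for two tight clusters."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    m2 = sum(c * c for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    return m4 / (m2 * m2)


def min_kurtosis_direction(points: list[tuple[float, float]],
                           steps: int = 180) -> tuple[float, float]:
    """Scan unit vectors (cos t, sin t) for t in [0, pi) and return (t, kurtosis)
    for the projection with the lowest kurtosis."""
    best_angle, best_k = 0.0, math.inf
    for s in range(steps):
        t = math.pi * s / steps
        scores = [x * math.cos(t) + y * math.sin(t) for x, y in points]
        k = kurtosis(scores)
        if k < best_k:
            best_angle, best_k = t, k
    return best_angle, best_k


# Synthetic data: two groups separated along x, with a larger but unstructured spread along y.
rng = random.Random(0)
points = [(centre + rng.gauss(0, 0.5), rng.gauss(0, 4.0))
          for centre in (-3.0, 3.0) for _ in range(100)]
angle, k = min_kurtosis_direction(points)
print(f"best direction: {math.degrees(angle):.1f} degrees, kurtosis {k:.2f}")
```

On data like this the selected direction lies close to the x-axis, which separates the two groups, even though the y-axis carries more variance; a variance-based projection such as the first principal component would tend to favour the y-axis instead.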

Using ground-truth data, we demonstrate that kPPA-CART is superior at inferring biological significance from low-intensity features. Moreover, when effect sizes (expected biological differences between conditions) are small, we show that kPPA-CART recovers important biological information better than available approaches. To provide biological context, we re-analyzed prominent breast cancer (BC) data from The Cancer Genome Atlas (TCGA) and show that kPPA-CART identifies novel gene transcripts that classify BC into Basal, Her2, Luminal A, and Luminal B subtypes better than the original PAM50 panel. We validate these genes with an external dataset and show that the top kPPA-CART gene panel is associated with poor overall survival in patients with BC in whom these genes are dysregulated. Finally, we provide an R package and an online implementation of kPPA-CART.

11:00-11:15
Using explainable machine learning to enhance breast cancer biomarker discovery in precision medicine
Confirmed Presenter: Ouso Daniel, Centre for Research Training in Genomics Data Science, University College Dublin, Kenya

Format: In Person


Authors List:

  • Ouso Daniel, Centre for Research Training in Genomics Data Science, University College Dublin, Kenya
  • Annabelle Nwaokorie, Accenture Labs Bioinnovation, Ireland
  • Luca Costabello, Accenture Labs Bioinnovation, Ireland

Presentation Overview:

Breast cancer remains the leading cause of cancer mortality in women despite available interventions, and information on its major genetic drivers is incomplete. We aimed to identify and quantify the impact of the critical genes in breast cancer (BC) pathology for tailored patient management. Numerous signal transduction networks (STNs) in BC have cross-cutting associations, yet survival studies and management interventions have often considered only a few STNs or genes, missing the global perspective integral to understanding BC. We hypothesised that integrating STN information across the major BC networks can improve disease understanding and find application in precision medicine. We included all known major BC pathology STNs to maximise the disease heterogeneity captured when identifying critical genes. A bi-directional Kaplan-Meier (KM) survival scan with log-rank statistics was used to triage genes by their expression patterns and select a statistically significant subset of all pathway genes. We then evaluated the triaged genes, together with clinical features, by modelling overall survival (OS) using Cox proportional hazards (CPH) regression, with the best model reaching 79.2% accuracy. SHapley Additive exPlanations (SHAP) then quantified feature contributions to the model's overall survival risk (OSR) predictions. The result is a relevance-ranked list of the 28 most impactful genes, drawn from three gene sets corresponding to the different expression patterns. The top three genes per category were validated through literature and databases. Among them were relatively less-studied but potentially critical genes in BC pathology, such as DKK4 and KREMEN1, both of which belong to families that negatively regulate the growth-promoting Wnt/β-catenin pathway. Including all known major networks captured a broader scope of BC heterogeneity.
Ultimately, we demonstrated important implications for BC clinical management by providing a quick, intuitive, and robust overview for patient monitoring in potential healthcare applications.
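One plausible reading of the bi-directional KM scan described above is sketched below. For a single gene, patients are split into high- and low-expression groups at a few candidate cutoffs, each split is scored with a two-group log-rank test, and the sign of observed-minus-expected deaths in the high group records which direction is associated with worse survival. The quantile cutoffs, the function names and the toy survival data are assumptions for illustration, not the authors' pipeline, and the later Cox and SHAP steps are not shown.

```python
import math


def logrank(times, events, in_group):
    """Two-group log-rank test.

    times: follow-up times; events: 1 for death, 0 for censored;
    in_group: True for patients in the group of interest.
    Returns (chi-square statistic, observed minus expected deaths in that group).
    """
    data = list(zip(times, events, in_group))
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)               # at risk overall
        n1 = sum(1 for tt, _, g in data if tt >= t and g)        # at risk in group
        d = sum(1 for tt, e, _ in data if tt == t and e)         # deaths at time t
        d1 = sum(1 for tt, e, g in data if tt == t and e and g)  # group deaths at t
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    return chi2, o_minus_e


def scan_cutoffs(expression, times, events, quantiles=(0.25, 0.5, 0.75)):
    """Hypothetical helper: try high-vs-low splits at several expression quantiles and
    return (cutoff, chi2, p_value, direction) for the split with the largest statistic."""
    ordered = sorted(expression)
    best = None
    for q in quantiles:
        cutoff = ordered[int(q * (len(ordered) - 1))]
        high = [x > cutoff for x in expression]
        if all(high) or not any(high):
            continue
        chi2, o_minus_e = logrank(times, events, high)
        p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 degree of freedom
        direction = "high_worse" if o_minus_e > 0 else "low_worse"
        if best is None or chi2 > best[1]:
            best = (cutoff, chi2, p_value, direction)
    return best


# Toy data for eight hypothetical patients: higher expression, earlier death.
expression = [1, 2, 3, 4, 5, 6, 7, 8]
times = [10, 12, 9, 11, 2, 3, 4, 5]
events = [0, 0, 1, 0, 1, 1, 1, 1]
best = scan_cutoffs(expression, times, events)
print(best)
```

Because several cutoffs are tried per gene, the reported p-values are optimistic; a real scan would correct for multiple testing across cutoffs and genes.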

11:15-11:30
Leveraging Artificial Intelligence for Predicting Human-Viral Protein-Protein Interactions: A Benchmarking Study to Address Key Challenges
Confirmed Presenter: Chaima Hkimi, Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR20IPT09), Pasteur Institute of Tunis, Tunisia

Format: In Person


Authors List:

  • Chaima Hkimi, Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR20IPT09), Pasteur Institute of Tunis, Tunisia
  • Selim Kamoun, Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR20IPT09), Pasteur Institute of Tunis, Tunisia
  • Oussema Khamessi, Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR20IPT09), Pasteur Institute of Tunis, Tunisia
  • Kais Ghedira, Laboratory of Bioinformatics, Biomathematics and Biostatistics (LR20IPT09), Pasteur Institute of Tunis, Tunisia

Presentation Overview:

Introduction: Viral infections pose significant global health challenges, with human-viral protein-protein interactions (HV-PPIs) playing a central role in infection mechanisms and host immune responses. While experimental methods for studying HV-PPIs are resource-intensive, computational approaches, particularly machine learning (ML), offer scalable and efficient alternatives.
Methodology: Here, we present a benchmarking study evaluating the performance of various ML models in predicting HV-PPIs, focusing on three viruses: West Nile Virus (taxon ID: 11082), HIV-1 (taxon ID: 11676), and SARS-CoV-2 (taxon ID: 2697049). We curated positive and negative interaction datasets from six public databases and employed five sequence-based feature encoding methods to represent protein sequences. Six ML classifiers, including SVM and RF, were trained and evaluated using metrics such as accuracy and F1-score.
Results: Our results reveal that dataset imbalance significantly impacts model performance, with balanced datasets (1:1 positive-to-negative ratio) yielding more reliable predictions, emphasizing the value of techniques like SMOTE for handling imbalanced real-world data. Encoding methods significantly influence outcomes, with pseudo-amino acid composition (PAAC) (type I), quasi-sequence-order (QSO), and conjoint-triad (CT) encodings showing better generalization for taxon ID 11676 (HIV-1). Overfitting was observed in models such as GBM, particularly for specific taxon IDs, underscoring the need for practices such as limiting tree depth and tuning hyperparameters. The primary goal of HV-PPI models is to identify novel interactions. In this study, the SVM model using combination-set features identified 333 human-SARS-CoV-2 interactions, including 75 shared with experimental studies and 82 newly predicted ones. Although SARS-CoV-2 interacts with various host receptors, including ACE2, NRP-1, AXL, CD147, and heparan sulfate, as well as host proteases such as FURIN, TMPRSS2, and cathepsins, our interactome revealed potential interactions between the spike (S) protein and TLR4, suggesting a role in antiviral immunity. Additionally, TRIM7 was predicted to interact with NSP12 and NSP7, possibly targeting them for ubiquitination and degradation, which could suppress viral replication. Another key finding was the predicted interaction between ACTN4 and ORF6, which may counteract the antiviral effects of ACTN4-NSP12 binding and facilitate immune suppression and viral replication.
Conclusion: These findings highlight the potential of ML in uncovering new HV-PPIs, offering insights into viral pathogenesis and therapeutic targets. However, challenges such as overfitting and small dataset sizes underscore the need for further refinement of ML models and exploration of alternative learning approaches to enhance predictive accuracy and generalizability.
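The conjoint-triad (CT) encoding named in the results groups the 20 standard amino acids into 7 classes and counts every run of three consecutive residues by its class triple, giving 7³ = 343 features per protein. The sketch below uses the 7-class grouping of the original conjoint-triad method; returning raw counts (no normalisation) and concatenating host and viral vectors into a single pair feature are simplifying assumptions, not necessarily the settings used in this study.

```python
from itertools import product

# 7-class amino-acid grouping used by the conjoint-triad method.
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CT_CLASSES) for aa in group}
# Index every possible class triple (0,0,0) ... (6,6,6): 343 feature positions.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=3))}


def conjoint_triad(sequence: str) -> list[int]:
    """Count each window of three consecutive residues by its class triple.
    Windows containing a non-standard residue (e.g. X) are skipped."""
    counts = [0] * len(TRIAD_INDEX)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    for i in range(len(classes) - 2):
        window = tuple(classes[i:i + 3])
        if None not in window:
            counts[TRIAD_INDEX[window]] += 1
    return counts


# Placeholder sequences for illustration, not real host or viral proteins.
human_seq = "MKTAYIAKQR"
viral_seq = "MSDNGPQNQR"
pair_features = conjoint_triad(human_seq) + conjoint_triad(viral_seq)
print(len(pair_features))  # 686
```

A list of such 686-element pair vectors, labelled as interacting or non-interacting, is the kind of input a classifier such as an SVM or random forest would be trained on.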

11:30-11:45
Machine Learning for Hypertension Genetics in African Populations
Confirmed Presenter: Peace Bassey Osim, University of Calabar, Nigeria

Format: In Person


Authors List:

  • Peace Bassey Osim, University of Calabar, Nigeria
  • Blessing Ekpenyong, University of Calabar, Nigeria
  • Anita Yemi-Odae Nelson, University of Calabar, Nigeria
  • Bede Anwan, University of Calabar, Nigeria
  • Mary E. Kooffreh, University of Calabar, Nigeria

Presentation Overview:

Hypertension is a significant public health concern worldwide, affecting over 1.4 billion people and contributing to cardiovascular diseases, stroke, and kidney failure (World Health Organization, 2021). In Africa, the prevalence of hypertension has been increasing, with estimates suggesting that nearly 40% of adults are hypertensive, largely due to genetic predispositions and environmental factors (Adeloye et al., 2021). Despite its growing burden, the genetic underpinnings of hypertension remain poorly understood, especially in African populations that are underrepresented in global genomic studies.
This study leverages machine learning algorithms to identify genetic variants associated with hypertension in African populations. We analyzed genomic data from 1,000 African individuals diagnosed with hypertension and 1,000 normotensive controls. Genotyping was conducted using the Illumina OmniExpress array, capturing approximately 700,000 single nucleotide polymorphisms (SNPs). Various machine learning models, including random forest, support vector machine (SVM), and gradient boosting, were implemented to identify key genetic variants predictive of hypertension.
Our results demonstrate that machine learning models effectively predict hypertension risk based on genetic information, with the random forest model achieving the highest classification accuracy of 85.2%, outperforming both gradient boosting (82.7%) and SVM (79.5%). Notably, the analysis identified several hypertension-associated variants, particularly within the NOS3, AGT, and ACE genes, which have well-established roles in blood pressure regulation. These findings underscore the utility of artificial intelligence in detecting complex genetic patterns that contribute to hypertension susceptibility.
The study highlights the potential of integrating machine learning with genomic research to enhance disease risk prediction and inform personalized medicine strategies tailored to African populations. Unlike traditional genome-wide association studies (GWAS), which primarily focus on linear associations, machine learning algorithms can capture complex, nonlinear interactions among genetic variants, enabling more robust disease modeling. The clinical implications of this research suggest that incorporating machine learning-driven genetic risk assessment into public health frameworks could improve hypertension prevention and treatment strategies, particularly in resource-limited settings.
However, further research is necessary to validate these findings using larger, more diverse datasets and functional analyses of the identified variants. Future studies should explore how environmental factors interact with genetic predispositions to influence hypertension risk and evaluate the translational potential of these predictive models in clinical settings. Additionally, the inclusion of multi-omics data, such as transcriptomic and epigenomic profiles, may further enhance the accuracy of hypertension risk prediction.
Overall, this study underscores the transformative role of artificial intelligence in genomic medicine and emphasizes the need for increased representation of African populations in genetic research. By leveraging machine learning approaches, researchers can uncover novel genetic markers of hypertension and contribute to the development of targeted therapeutic interventions that address the unique genetic architecture of African populations.

11:45-12:00
Integrating multi-omics datasets with machine learning algorithms in developing clinical decision support systems for cancer management
Confirmed Presenter: Itunuoluwa Isewon, Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria

Format: In Person


Authors List:

  • Itunuoluwa Isewon, Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  • Emmanuel Alagbe, Department of Computer and Information Sciences, Covenant University, P.M.B. 1023, Ota, Ogun State, Nigeria
  • Solomon Rotimi, Department of Biochemistry, Covenant University, P.M.B. 1023, Ota, Ogun State, Nigeria
  • Jelili Oyelade, Department of Computer and Information Sciences, Covenant University, P.M.B. 1023, Ota, Ogun State, Nigeria

Presentation Overview:

Multi-omics strategies hold great promise for disease prognosis and diagnosis, offering a more comprehensive understanding of biological systems than single-omics approaches. By integrating multiple layers of biological information, multi-omics analyses enable better identification of disease mechanisms, biomarker discovery, and personalized treatment strategies. Machine learning (ML) algorithms are increasingly applied to these datasets to extract meaningful insights, improve disease detection, predict treatment responses, and identify biomarkers indicating susceptibility to diseases. However, despite the growing interest in integrating multi-omics and ML, there has been no systematic investigation of how different combinations of omics datasets affect ML model performance in clinical decision support systems.

This study explores the integration of ML algorithms with multi-omics datasets to predict prostate cancer (PCa) treatment outcomes and biochemical recurrence (BCR) using The Cancer Genome Atlas (TCGA) dataset. We evaluated the predictive performance of nine ML algorithms across 63 possible omics combinations, incorporating six omics data types: single nucleotide variation (SNV), copy number variation (CNV), DNA methylation, RNA sequencing (RNA-seq), microRNA sequencing (miRNA-seq), and reverse-phase protein array (RPPA) datasets. To rank these models and omics combinations, we developed a multi-criteria decision scoring system based on key performance metrics such as accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC).
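The count of 63 combinations follows from taking every non-empty subset of the six omics layers, 2⁶ - 1 = 63. The sketch below enumerates them and pairs them with a simple weighted-mean score over metrics such as accuracy, F1-score and AUC-ROC. The equal default weights and the function names are assumptions, since the abstract does not specify how its multi-criteria scoring system weights the metrics.

```python
from itertools import combinations
from typing import Optional

OMICS = ["SNV", "CNV", "DNA methylation", "RNA-seq", "miRNA-seq", "RPPA"]


def all_combinations(layers: list[str]) -> list[tuple[str, ...]]:
    """Every non-empty subset of the omics layers (2**6 - 1 = 63 for six layers)."""
    return [combo for size in range(1, len(layers) + 1)
            for combo in combinations(layers, size)]


def composite_score(metrics: dict[str, float],
                    weights: Optional[dict[str, float]] = None) -> float:
    """Hypothetical multi-criteria score: weighted mean of metrics on a 0-1 scale.
    Equal weights are used when none are given."""
    if weights is None:
        weights = {name: 1.0 for name in metrics}
    total = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total


combos = all_combinations(OMICS)
print(len(combos))  # 63
print(composite_score({"accuracy": 0.8, "f1": 0.7, "auc_roc": 0.9}))
```

Ranking then amounts to sorting each (algorithm, omics combination) pair by its score; with nine algorithms that is 9 × 63 = 567 pairs per prediction task, before counting feature selection versus feature extraction variants.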

Our results demonstrate that selective multi-omics integration outperforms indiscriminate aggregation. For PCa treatment outcome prediction, the best-performing combinations were CNV + SNV + DNA methylation + miRNA-seq, SNV + DNA methylation, and CNV + DNA methylation + RPPA. For BCR prediction, SNV + DNA methylation ranked highest, followed by SNV + DNA methylation + miRNA-seq and CNV + SNV + DNA methylation + miRNA-seq + RPPA. Notably, while multi-omics generally improved ML model performance compared to single-omics, the combination of all six omics datasets did not yield the best predictive power. Instead, targeted integration of specific omics types proved more effective. The XGBoost (XGB) algorithm consistently outperformed other ML models across both tasks. Feature selection (FS) using elastic-net penalized regression yielded superior results compared to feature extraction (FE) via autoencoders.

To validate our methodology, we applied the same ML framework to TCGA breast cancer (BRCA) multi-omics datasets for PAM50 subtyping. The best-performing omics combinations for BRCA were SNV + DNA methylation + miRNA-seq, CNV + SNV + DNA methylation + RPPA, and SNV + DNA methylation + miRNA-seq + RPPA. Notably, SNV + DNA methylation alone ranked 30th, reinforcing the importance of carefully integrating complementary omics layers. The BRCA validation confirmed that while multi-omics strategies enhance predictive power, dataset size and phenotype balance play crucial roles. XGBoost emerged as the best-performing algorithm, followed by gradient boosting and support vector machines.

In conclusion, this study provides a large-scale investigation into multi-omics data integration with ML for precision oncology. It highlights the need for careful omics selection rather than arbitrary multi-omics aggregation and underscores the importance of addressing class imbalance and feature representation challenges in clinical ML applications. Our findings contribute to the development of more reliable and interpretable AI-driven clinical decision support systems for cancer management.

12:00-12:15
AI in Regulatory Genomics: The Role of Silencer Variants in Human Diseases
Format: In person


Authors List:

  • Di Huang, National Institutes of Health (NIH), United States
  • Ivan Ovcharenko, National Institutes of Health (NIH), United States

Presentation Overview:

Although disease-causal genetic variants have been found within silencer sequences, we still lack a comprehensive analysis of the association of silencers with diseases. Here, we profiled GWAS variants in 2.8 million candidate silencers across 97 human samples derived from a diverse panel of tissues and developmental time points, using deep learning models.

We show that candidate silencers exhibit strong enrichment in disease-associated variants, and several diseases display a much stronger association with silencer variants than with enhancer variants. Close to 52% of candidate silencers cluster, forming silencer-rich loci, and in the loci of the Parkinson's disease hallmark genes TRIM31 and MAL, the associated SNPs densely populate clustered candidate silencers rather than enhancers, displaying an overall twofold enrichment in silencers versus enhancers. The disruption of apoptosis in neuronal cells is associated with both schizophrenia and bipolar disorder and can largely be attributed to variants within candidate silencers. Our model permits a mechanistic explanation of causative SNP effects by identifying altered binding of tissue-specific repressors and activators, validated with 70% directional concordance using SNP-SELEX. Narrowing the focus to individual silencer variants, experimental data confirm the role of the rs62055708 SNP in Parkinson's disease, rs2535629 in schizophrenia, and rs6207121 in type 1 diabetes.

In summary, our results indicate that advances in deep learning models for discovering disease-causal variants within candidate silencers effectively “double” the number of functionally characterized GWAS variants. This provides a basis for explaining mechanisms of action and designing novel diagnostics and therapeutics.

12:15-12:30
Identifying genetic biomarkers of dilated cardiomyopathy using whole exome sequencing data from Southern African patients
Confirmed Presenter: Phelelani T. Mpangase, University of the Witwatersrand, South Africa

Format: In Person


Authors List:

  • Phelelani T. Mpangase, University of the Witwatersrand, South Africa
  • Minenhle P. Mayisela, University of the Witwatersrand, South Africa
  • Dineo Mpanya, University of the Witwatersrand, South Africa
  • Megan Shuey, Vanderbilt University, United States
  • Roy Zent, Vanderbilt University, United States
  • Quinn Wells, University of the Witwatersrand, United States
  • Nqoba Tsabedze, University of the Witwatersrand, South Africa

Presentation Overview:

The underlying genetic architecture of dilated cardiomyopathy (DCM) in Southern Africa has not been described despite the high prevalence of this condition in patients residing in the region. The availability of multiple “omics” techniques for genomic sequencing, including whole exome sequencing (WES), at reduced cost is slowly enabling the study of many diseases affecting African populations in under-resourced settings. This study aimed to determine the underlying genetic aetiology of DCM in patients from Southern Africa using WES. A cohort of 100 unrelated patients with heart failure of unknown origin was recruited from Charlotte Maxeke Johannesburg Academic Hospital (CMJAH) and subjected to WES. Participants were aged 16 to 77 years (mean 47 years); 67% were male and 92% identified as black. The median left ventricular ejection fraction was 26.5% (interquartile range: 16–37%), and late gadolinium enhancement was visualised in 42% of participants. Variant calling was carried out on the WES data following the Genome Analysis Toolkit (GATK) Best Practices for WES data analysis, and the resulting variants were annotated using Ensembl’s Variant Effect Predictor (VEP). Through various bioinformatics techniques, combined with genetics- and clinically guided interpretation, we identified and prioritised several variants in the BAG3, FLNC, DSP, MYH7 and TTN genes with potential roles in the pathogenicity of DCM. This study not only presents potential DCM causal variants but also lays the foundation for WES data analysis workflows in similar studies seeking to determine the underlying genetic aetiology of diseases in under-resourced African settings.

12:30-12:45
GEMINI: A Breakthrough System for Robust Gene Regulatory Network Discovery, Enabling the Application of GRNs to Industrial Level Genetic Engineering
Confirmed Presenter: Ridhi Gutta, Academies of Loudoun, United States

Format: Live Stream


Authors List:

  • Ridhi Gutta, Academies of Loudoun, United States

Presentation Overview:

To resolve crucial global issues, the widespread application of genetic engineering at an industrial level is key. Effective genetic engineering at an industrial scale hinges on precise cellular control of the microorganism at hand. However, most synthetically engineered strains fail at the industrial level due to disruptions in gene regulation. This stems from a lack of understanding and use of gene regulatory networks (GRNs), which control cellular processes and metabolism. Research shows that effective manipulation of host GRNs and effective introduction of synthetic GRNs can significantly improve product yield and functionality. However, current GRN inference tools are extremely slow, inaccurate, and incompatible with industrial-scale processes; as a result, no complete expression-based GRN exists for any commonly used organism, limiting the use of GRNs as a practical tool in industrial genetic engineering. This research proposes a novel computational system, GEMINI, to enable fast and efficient GRN inference for integration into industrial-scale pipelines. GEMINI consists of two main parts. First, I create a novel information-theoretic algorithm that replaces traditional sequential inference and calculation methods, ensuring compatibility with parallel processing. Second, I integrate a novel graph neural network (GNN) architecture based on spectral convolution that bypasses intensive eigenvalue computation and efficiently learns global and local regulatory structures. On the DREAM4 and DREAM5 in silico benchmarks, GEMINI outperforms all industry-leading methods in AUROC and AUPRC, achieving a nearly 300% increase in AUPRC over the industry-leading method, GENIE3. When applied to a real biological E. coli dataset, GEMINI not only recovered 98% of existing interactions but also discovered 468 novel candidate interactions, which were validated against the literature.
Thus, GEMINI was able to construct the most complete expression-based GRN of E. coli to date, providing a novel biological blueprint for genetic engineers at the industrial level. GEMINI removes reliance on expensive computing equipment and, for the first time, enables fast and accurate GRN inference, opening doors to more efficient gene expression control and metabolic pathway manipulation for more effective industrial genetic engineering.
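The AUPRC comparison reported above scores a ranked list of predicted regulatory edges against a gold standard. The sketch below computes average precision, one common estimator of the area under the precision-recall curve; the gene names are placeholders, and the DREAM challenges use their own scoring scripts, which may interpolate the curve differently.

```python
def average_precision(ranked_edges, gold_edges):
    """Average precision of a ranked list of (regulator, target) edges.

    Gold-standard edges that never appear in the ranking still count in the
    denominator, so an incomplete ranking is penalised. Edges are assumed unique.
    """
    gold = set(gold_edges)
    if not gold:
        raise ValueError("gold standard is empty")
    hits, precision_sum = 0, 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / len(gold)


# Hypothetical genes and edges, highest inference score first.
ranked = [("g1", "g2"), ("g1", "g3"), ("g4", "g2"), ("g3", "g5")]
gold = {("g1", "g2"), ("g4", "g2"), ("g2", "g5")}
ap = average_precision(ranked, gold)
print(round(ap, 3))  # 0.556
```

Because true edges are rare in a genome-scale network, AUPRC is far more sensitive than AUROC to how many false positives sit near the top of the ranking, which is why it is commonly reported alongside AUROC for GRN inference.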

12:45-13:00
African Bioinformatics Institute Engagement Session
Format: In person


Authors List:

  • African Institute

Presentation Overview:

In this session, we will provide a brief overview of the ABI and its current status, followed by Q&A and open discussion.

13:30-14:30
Poster Presentation
Format: In person


14:45-15:00
Emergence of novel mosaic G9P[6] rotaviruses through multiple intragenogroup reassortment events post vaccine introduction in Blantyre Malawi
Confirmed Presenter: Chimwemwe Mhango, Malawi-Liverpool-Wellcome Programme, Malawi

Format: In Person


Authors List:

  • Chimwemwe Mhango, Malawi-Liverpool-Wellcome Programme, Malawi
  • End Chinyama, Malawi-Liverpool-Wellcome Programme, Malawi
  • Ernest Matambo, Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, UK, Malawi
  • Landilani Gauti, Malawi University of Science and Technology, Malawi
  • Flywell Kawonga, Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, UK, Malawi
  • Benjamin Kumwenda, School of Life Sciences and Allied Health Professions, Kamuzu University of Health Sciences, Blantyre, Malawi
  • Arox Kamng'Ona, School of Life Sciences and Allied Health Professions, Kamuzu University of Health Sciences, Blantyre, Malawi
  • Celeste Donato, The Peter Doherty Institute for Infection and Immunity, 792 Elizabeth Street, Melbourne, Victoria, Australia, Malawi
  • Khuzwayo Jere, Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, UK, Malawi

Presentation Overview:

Background: Rotavirus remains a leading cause of severe gastroenteritis in children under five, particularly in low- and middle-income countries (LMICs). In Malawi, G9P[6] strains re-emerged in 2017, five years after the introduction of Rotarix rotavirus vaccine, necessitating an in-depth investigation of their genetic diversity, evolutionary origins, and public health implications.

Methods: Using whole-genome sequencing (WGS), we assigned complete genotype constellations and employed phylogeographic and phylogenetic network analyses to trace the evolutionary pathways of G9P[6] strains (n=11) from 2017 to 2022.

Findings: The re-emergent G9P[6] strains carried a DS-1-like G9-P[6]-I2-R2-C2-M2-A2-N2-T2-E2-H2 genotype constellation. Phylogeographic analysis of the VP7 gene revealed monophyletic clustering with contemporary G9P[6] strains from Mozambique. Phylogenetic network analysis demonstrated high genetic similarity between the inner capsid and non-structural genes of the G9P[6] strains and those of previously circulating Malawian G2P[4], G2P[6], G3P[4], and G3P[6] strains. Time-resolved phylogenies dated the most recent common ancestor of the inner capsid and non-structural genes to 2009–2015. Evolutionary analysis suggested lineage spillover events associated with the VP6 segment.

Conclusion: This study elucidates, for the first time in Malawi, the role of reassortment and zoonotic transmission in the re-emergence of G9P[6] strains. These findings highlight the evolutionary dynamics of rotaviruses and the need for continuous genomic surveillance. Given the limited heterotypic protection provided by the Rotarix (G1P[8]) vaccine, tailored vaccination strategies and ongoing vaccine effectiveness studies are critical for addressing the emergence of novel rotavirus strains and improving vaccine performance in LMICs.
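The constellation notation above assigns one genotype per genome segment: G = VP7, P = VP4, I = VP6, R = VP1, C = VP2, M = VP3, A = NSP1, N = NSP2, T = NSP3, E = NSP4, H = NSP5. As a minimal illustrative sketch (not part of the authors' workflow; function names are hypothetical), the nine backbone genotypes of a constellation string can be classified as Wa-like (all genogroup 1) or DS-1-like (all genogroup 2). Other genogroups, such as AU-1-like, are simply reported as "mixed/other" here.

```python
import re

# Backbone segments in constellation order:
# I = VP6, R = VP1, C = VP2, M = VP3, A = NSP1, N = NSP2, T = NSP3, E = NSP4, H = NSP5
BACKBONE = ["I", "R", "C", "M", "A", "N", "T", "E", "H"]

# One token per segment, e.g. "G9", "P[6]", "I2".
TOKEN = re.compile(r"([A-Z])(\[?\d+\]?)")


def parse_constellation(text: str) -> dict[str, str]:
    """Map each segment letter to its genotype, e.g. {'G': '9', 'P': '[6]', ...}."""
    genotypes = {}
    for token in text.split("-"):
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised genotype token: {token!r}")
        genotypes[match.group(1)] = match.group(2)
    return genotypes


def backbone_type(text: str) -> str:
    """Classify the nine backbone genotypes as Wa-like, DS-1-like or mixed/other."""
    genotypes = parse_constellation(text)
    missing = [seg for seg in BACKBONE if seg not in genotypes]
    if missing:
        raise ValueError(f"constellation lacks segments: {missing}")
    backbone = {genotypes[seg] for seg in BACKBONE}
    if backbone == {"1"}:
        return "Wa-like"
    if backbone == {"2"}:
        return "DS-1-like"
    return "mixed/other"


print(backbone_type("G9-P[6]-I2-R2-C2-M2-A2-N2-T2-E2-H2"))  # DS-1-like
```

Because the check looks only at genogroup membership, an intragenogroup reassortant (genogroup-2 segments swapped between strains) still classifies as DS-1-like, whereas a strain mixing genogroup-1 and genogroup-2 backbone segments returns "mixed/other".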

15:00-15:15
Association analyses reveal susceptibility variants linked to Parkinson's disease in the South African population
Confirmed Presenter: Kathryn Step, Division of Molecular Biology and Human Genetics, Stellenbosch University, South Africa

Format: In Person


Authors List:

  • Kathryn Step, Division of Molecular Biology and Human Genetics, Stellenbosch University, South Africa
  • Thiago Peixoto Leal, Genomic Medicine, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, United States
  • Emily Waldo, Genomic Medicine, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, United States
  • Lusanda Madula, Division of Molecular Biology and Human Genetics, Stellenbosch University, South Africa
  • Yolandi Swart, Division of Molecular Biology and Human Genetics, Stellenbosch University, South Africa
  • Carlos F. Hernández, Universidad del Desarrollo, Centro de Genética y Genómica, Facultad de Medicina Clínica Alemana, Santiago, Chile
  • Sara Bandres-Ciga, Center for Alzheimer's and Related Dementias (CARD), National Institutes of Health, Bethesda, MD, United States
  • Jonggeol Jeffrey Kim, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, United States
  • Ignacio F. Mata, Genomic Medicine, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, United States
  • Soraya Bardien, Division of Molecular Biology and Human Genetics, Stellenbosch University, South Africa

Presentation Overview:

Parkinson's disease (PD) is a neurodegenerative disorder with a complex etiology that includes a strong genetic component (1), and it is characterized by a wide range of motor and non-motor symptoms (2). The burden of PD is increasing rapidly within aging Sub-Saharan African populations, where it ranks as the 11th/12th most prevalent nervous system disorder (3). Despite this rise in prevalence, African populations remain underrepresented in PD genetic research. Allele frequencies vary across populations due to factors such as natural selection, genetic drift, and differing environmental and pathogen exposures. These differences can help identify population-specific disease risk variants in admixed individuals while simultaneously uncovering risk variants relevant to multiple populations (4).

Genome-wide association studies (GWAS) have successfully identified susceptibility variants linked to PD (5,6). However, most of these studies have focused on European cohorts, with few including diverse ancestries. Using genotyped and imputed data from 1,516 South African participants, we conducted a GWAS with SAIGE, which models the genetic relationship matrix as a random effect and therefore allows related individuals to be included. We also inferred global and local ancestry for the cohort to better understand genetic admixture in the South African population and to further investigate the GWAS results. Our GWAS findings were replicated in a Latin American cohort. Ancestry inference showed the South African cohort to be five-way admixed, with European (EUR; 56%), African (AFR; 18.8%), indigenous Khoe-San Nama (NAMA; 13%), South Asian (SAS; 6.9%), and Malaysian (MAL; 5.2%) ancestries. The GWAS identified one variant reaching genome-wide significance and 351 variants reaching suggestive significance; 14 of these replicated in the Latin American cohort. In the local ancestry window containing the top GWAS hit, 86.7% of variant carriers were inferred to have AFR, 11% NAMA, and 2.2% MAL ancestry, and no carriers had inferred EUR or SAS ancestry. This suggests that the variant is ancestry-specific and highlights the value of including populations previously underrepresented in PD genetic research to reveal novel susceptibility variants. Our findings contribute to a global understanding of the complex genetic etiology of PD.

1. Trevisan, L. et al. Genetics in Parkinson’s disease, state-of-the-art and future perspectives. Br. Med. Bull. 149, 60–71 (2024).
2. Armstrong, M. J. & Okun, M. S. Diagnosis and treatment of Parkinson disease: A review. JAMA 323, 548–560 (2020).
3. GBD 2021 Nervous System Disorders Collaborators. Global, regional, and national burden of disorders affecting the nervous system, 1990-2021: a systematic analysis for the Global Burden of Disease Study 2021. Lancet Neurol. 23, 344–381 (2024).
4. Swart, Y. et al. Local ancestry adjusted allelic association analysis robustly captures tuberculosis susceptibility loci. Front. Genet. 12, 716558 (2021).
5. Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).
6. Kim, J. J. et al. Multi-ancestry genome-wide association meta-analysis of Parkinson’s disease. Nat. Genet. 56, 27–36 (2024).
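The abstract reports one genome-wide significant and 351 suggestive variants but does not state the cut-offs used. The sketch below assumes the conventional thresholds (p < 5×10^-8 for genome-wide significance and p < 1×10^-5 for suggestive significance), which may differ from the authors' choices. It only bins association p-values into exclusive tiers and is not part of SAIGE; the p-values are invented.

```python
# Assumed thresholds: the abstract does not state which cut-offs were used.
GENOME_WIDE = 5e-8
SUGGESTIVE = 1e-5


def significance_tier(p: float) -> str:
    """Bin a single association p-value into an exclusive reporting tier."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value out of range: {p}")
    if p < GENOME_WIDE:
        return "genome-wide"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not significant"


def count_tiers(pvalues):
    """Count how many p-values fall into each tier."""
    counts = {"genome-wide": 0, "suggestive": 0, "not significant": 0}
    for p in pvalues:
        counts[significance_tier(p)] += 1
    return counts


# Invented p-values, for illustration only.
print(count_tiers([3e-9, 2e-6, 8e-6, 0.04]))
# {'genome-wide': 1, 'suggestive': 2, 'not significant': 1}
```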

15:45-16:00
Biphasic Middle East respiratory syndrome coronavirus incidence in dromedary camels in Northern Kenya, 2022 - 2023
Confirmed Presenter: Brian Ogoti, University of Nairobi, Center of Epidemiological Modelling and Analysis, Kenya

Format: In Person


Authors List:

  • Brian Ogoti, University of Nairobi, Center of Epidemiological Modelling and Analysis, Kenya
  • Victor Riitho, University of Nairobi, Center of Epidemiological Modelling and Analysis, Kenya
  • Johanna Wildemann, Charité – Universitätsmedizin Berlin, Germany
  • Nyamai Mutono, University of Nairobi, Center of Epidemiological Modelling and Analysis, Kenya
  • Julia Tesch, Charité – Universitätsmedizin Berlin, Germany
  • Jordi Rodon, Charité – Universitätsmedizin Berlin, Germany
  • Kaneemozhe Harichandran, Charité – Universitätsmedizin Berlin, Germany
  • Jackson Emanuel, Charité – Universitätsmedizin Berlin, Germany
  • Elisabeth Möncke-Buchner, Charité – Universitätsmedizin Berlin, Germany
  • Stella Kiambi, Food and Agriculture Organization, Kenya
  • Julius Oyugi, University of Nairobi Tropical and Infectious Diseases, Kenya
  • Marianne Mureithi, University of Nairobi, Kenya
  • Victor Corman, Charité – Universitätsmedizin Berlin, Germany
  • Christian Drosten, Charité – Universitätsmedizin Berlin, Germany
  • Samuel Thumbi, University of Nairobi, Center of Epidemiological Modelling and Analysis, Kenya
  • Marcel Müller, Charité – Universitätsmedizin Berlin, Germany

Presentation Overview:

Introduction
Middle East respiratory syndrome coronavirus (MERS-CoV) is endemic in dromedary camels in the Arabian Peninsula and Africa, with comparably high seroprevalence (>75%). High camel population density and the loss of maternal antibodies in farmed camel calves are linked to acute MERS-CoV outbreaks. Investigating MERS-CoV outbreak patterns in nomadic camels is challenging because of limited infrastructure in remote, resource-restricted camel migration regions.

Study Objective
We performed a continuous 12-month study at an abattoir hub for nomadic camels in Northern Kenya, investigating MERS-CoV incidence in migrating camels and determining the genomic diversity of contemporary MERS-CoV variants.

Methods
From September 2022 to September 2023, we collected nasal swabs from 10–15 camels on 4–5 days per week at the main abattoir in Isiolo County, Kenya, sampling 2711 camels in total. Samples were tested for MERS-CoV RNA using UpE and ORF1a RT-qPCR assays. Genomic diversity was assessed by Illumina next-generation sequencing (NGS) and ORF1ab domain assembly for RNA samples with >1×10^6 genome copies/mL.

Results
MERS-CoV RNA was detected in 36/2711 (1.3%) nasal swabs. Incidence was biphasic, with detection peaks in the first week of October 2022 (7/60, 11.7%) and the first week of February 2023 (7/58, 12.1%). The cumulative RNA positivity rate was higher in September–October 2022 (19/381, 5.0%) than in January–March 2023 (17/727, 2.3%). ORF1ab sequences were obtained for 9 of the 36 RNA-positive samples and used for phylogenetic analysis. The sequences formed a clade distinct from other Clade C viruses but clustered with Clade C2.2, which is mostly prevalent in East Africa. The 9 ORF1ab sequences were highly similar to one another (>99.93% nucleotide identity) and shared 99.75–99.78% nucleotide identity with their closest MERS-CoV relative, identified in Akaki, Ethiopia, in 2019.

Conclusion
The biphasic MERS-CoV incidence in nomadic camels may be linked to seasonal factors, such as the biannual alternation of wet and dry seasons in Northern Kenya. Notably, camel calves are born primarily during the two wet seasons, and maternal antibody loss coincides with the two observed MERS-CoV RNA detection peaks. Phylogenetic analysis suggests that we identified at least 3 MERS-CoV clusters over 3 different weeks in dromedaries originating from different locations.
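The positivity percentages in the Results follow directly from the reported counts. This Python sketch recomputes them and, as an addition that the abstract does not report, attaches a Wilson score 95% interval to show how much uncertainty comes with small weekly denominators such as 7/60. The labels and helper names are illustrative.

```python
import math


def positivity(positive: int, tested: int) -> float:
    """Percentage of tested samples that were RT-qPCR positive."""
    if tested <= 0 or not 0 <= positive <= tested:
        raise ValueError("require tested > 0 and 0 <= positive <= tested")
    return 100.0 * positive / tested


def wilson_interval(positive: int, tested: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion, returned in percent."""
    p = positivity(positive, tested) / 100.0
    denom = 1.0 + z**2 / tested
    centre = (p + z**2 / (2 * tested)) / denom
    half = z * math.sqrt(p * (1 - p) / tested + z**2 / (4 * tested**2)) / denom
    return max(0.0, 100 * (centre - half)), min(100.0, 100 * (centre + half))


# Counts as reported in the Results above.
COUNTS = {
    "overall, Sep 2022-Sep 2023": (36, 2711),
    "peak week, Oct 2022": (7, 60),
    "peak week, Feb 2023": (7, 58),
    "Sep-Oct 2022": (19, 381),
    "Jan-Mar 2023": (17, 727),
}

for label, (pos, n) in COUNTS.items():
    low, high = wilson_interval(pos, n)
    print(f"{label}: {positivity(pos, n):.1f}% (95% CI {low:.1f}-{high:.1f}%)")
```

The Wilson interval is used here rather than the simple normal approximation because it behaves sensibly for small counts and proportions near zero.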

16:15-16:30
PREMEDIT: A Centralized Platform for Genetic Diseases and Therapeutic Solutions in Tunisia
Confirmed Presenter: Nessrine Mezzi, Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, Tunis El Manar University, Tunisia

Format: In Person


Authors List:

  • Nessrine Mezzi, Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, Tunis El Manar University, Tunisia
  • Imen Abdallah, Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, Tunis El Manar University, Tunisia
  • Maroua Louati, Technologies et Services de l’Information, Technoriat, Tunisia
  • Nejla Abassi, Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, Tunis El Manar University, Tunisia
  • Abdelwaheb Sifi, MO2 Advisory & Invest, Tunisia
  • Ridha Mrad, Department of Congenital and Hereditary Diseases, Charles Nicolle Hospital, Tunis, Tunisia
  • Mediha Trabelsi, Department of Congenital and Hereditary Diseases, Charles Nicolle Hospital, Tunis, Tunisia
  • Lilia Romdhane, Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis / Faculty of Sciences of Bizerte, Tunisia

Presentation Overview:

Genetic diseases (GDs) pose a significant public health challenge, as they are a leading cause of morbidity and premature death. In Tunisia, 589 GDs have been reported, more than 60% of which are autosomal recessive. For 40% of these diseases, the molecular etiology is unknown, highlighting the urgent need for advanced genetic research and diagnostic tools. Addressing this gap is crucial for improving patient outcomes and guiding therapeutic interventions. A combination of manual and AI-assisted text mining of the literature is used to collect complex genetic data on GDs in Tunisia while ensuring data integrity. Python and R scripts are employed for data validation and biological database queries, and bioinformatic approaches, including AI, are being used for in-silico drug (re)discovery. Cutting-edge technologies support the development of the PREMEDIT platform. To date, more than 600 GDs have been identified in Tunisian patients, with approximately 1,000 mutations, representing the most comprehensive mutatome of the Tunisian population. Data analyses revealed the scarcity of epidemiological data and treatments for rare GDs. The genetic, epidemiological, and pharmaceutical data have been integrated into a centralized platform: PREMEDIT. By consolidating comprehensive data on genetic mutations and their correlation with specific treatments, PREMEDIT aims to enhance diagnosis and tailor therapeutic strategies for the Tunisian population. The integration of AI not only refines data accuracy but also enables efficient identification of complex genetic patterns, allowing the platform to provide more precise diagnostic and therapeutic recommendations. This platform serves as a crucial resource for healthcare professionals, researchers, and policymakers, bridging the gap between genetic research and clinical practice. PREMEDIT will contribute to innovation across biomedical communities as well as pharmaceutical companies, improving the quality of life for patients.
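The abstract mentions Python and R scripts for data validation without describing them. As a hypothetical sketch only, a record-level validator for a mutation catalogue might look like the following. The record fields, the inheritance codes, and the deliberately simple regular expressions (which accept only HGNC-style symbols and single-nucleotide HGVS cDNA substitutions) are assumptions, not the PREMEDIT schema.

```python
import re
from dataclasses import dataclass

# Hypothetical controlled vocabulary for inheritance mode.
INHERITANCE_MODES = {"AR", "AD", "XLR", "XLD", "MT"}
# Simplified patterns: HGNC-style gene symbol, and an HGVS cDNA single-nucleotide
# substitution, optionally intronic (e.g. c.100A>G, c.100+1G>A). Indels are rejected.
GENE_SYMBOL = re.compile(r"[A-Z0-9][A-Za-z0-9-]*")
CDNA_SUBSTITUTION = re.compile(r"c\.\d+(?:[+-]\d+)?[ACGT]>[ACGT]")


@dataclass
class MutationRecord:
    disease: str
    gene: str
    variant: str
    inheritance: str


def validate(record: MutationRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if not record.disease.strip():
        problems.append("missing disease name")
    if not GENE_SYMBOL.fullmatch(record.gene):
        problems.append(f"suspicious gene symbol: {record.gene!r}")
    if not CDNA_SUBSTITUTION.fullmatch(record.variant):
        problems.append(f"not a simple cDNA substitution: {record.variant!r}")
    if record.inheritance not in INHERITANCE_MODES:
        problems.append(f"unknown inheritance mode: {record.inheritance!r}")
    return problems


print(validate(MutationRecord("Example disease", "GENE1", "c.100A>G", "AR")))  # []
```

Returning a list of problems rather than raising on the first error lets a curation script report every issue in a record at once, which suits batch review of literature-mined entries.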

17:00-17:15
Integrating Multiple Learning Strategies into the PHA4GE Wastewater Surveillance Bioinformatics Course
Confirmed Presenter: Keaghan Brown, Public Health Alliance for Genomic Epidemiology, South Africa

Format: In Person

Moderator(s): Faisal M. Fadlelmola


Authors List:

  • Keaghan Brown, Public Health Alliance for Genomic Epidemiology, South Africa
  • Farzaana Diedericks, Public Health Alliance for Genomic Epidemiology, South Africa
  • Tracey Calvert-Joshua, Public Health Alliance for Genomic Epidemiology, South Africa

Presentation Overview:

One of the main objectives of the Public Health Alliance for Genomic Epidemiology (PHA4GE) is to equip those working in the public health sector with the skills and knowledge needed to respond efficiently and effectively to disease outbreaks. A key component of our courses is the inclusion of domain experts from diverse communities, whose insights and experiences enrich the learning process and ensure that the content is relevant to the target group. In our wastewater surveillance bioinformatics course, participants are presented with an outbreak scenario that mimics forming a response to a real-world event, with the overarching aim of applying the acquired skills and knowledge to build a wastewater surveillance health dashboard.

Foundational topics, such as introductions to wastewater surveillance epidemiology and wastewater surveillance bioinformatics, use a pedagogical approach (Bansal et al., 2020) in which lessons are broken down into short, modular presentations. A combined direct learning and cooperative learning model (Guzmán and Payá, 2020) allows participants to engage rapidly with the material, grasp concepts such as public health and community impact, and develop fundamental analytical skills. Andragogical strategies (Bansal et al., 2020), such as outbreak case studies and discussions interspersed between modules, further enhance the peer learning process (Tullis and Goldstone, 2020) by promoting teamwork and idea sharing through the application of acquired knowledge.

The main learning approach involves following an avatar who must solve day-to-day bioinformatics problems. This creates an interactive and playful learning experience in which tasks become progressively more challenging, combining narrative and game-based learning approaches (Breien and Wasson, 2021).

For the final project, participants engage in a heutagogical exercise in which they build a dashboard in pairs, drawing on all their accumulated skills and knowledge. Participants must decide which of the principles, techniques, and tools learned throughout the course are relevant to the assignment. Heutagogy is also a key component of online curricula, empowering participants to take increasing responsibility for their own learning (Mwinkaar and Lonibe, 2024). This develops technical proficiency while fostering self-directed learning and problem-solving, and it reinforces the effective application of bioinformatics solutions in wastewater surveillance.

These are just some of the learning approaches that combine diverse teaching strategies and models, making the course accessible to people with different learning styles, varying attention spans, and time constraints. The course’s modular design allows focused exploration of topics relevant to public health practitioners and academics, and the training program is structured so that participants systematically and progressively develop expertise in public health and computational techniques.

17:15-17:30
Surveillance Capacity Building through Pathogen Genomics and Bioinformatics Training Across Africa
Confirmed Presenter: Siddiqah George, NGS Academy for the Africa CDC, South Africa

Format: In Person


Authors List:

  • Siddiqah George, NGS Academy for the Africa CDC, South Africa
  • Kirsty Lee Garson, NGS Academy for the Africa CDC, South Africa
  • Tony Yiqun Li, NGS Academy for the Africa CDC, South Africa
  • Perceval Maturure, NGS Academy for the Africa CDC, South Africa
  • Nicola Mulder, NGS Academy for the Africa CDC, South Africa

Presentation Overview:

Abstract: The recent emergence and re-emergence of infectious diseases in Africa highlight the critical need for robust pathogen genomic surveillance systems across the continent. Effective surveillance depends on comprehensive training and capacity development in pathogen genomics and bioinformatics, because rapid public health responses to disease outbreaks rely on continuously strengthening these skills. To ensure quality and consistency in training, developing and implementing a standardised curriculum is crucial, enabling uniform skill-building and knowledge dissemination across diverse regions.
Over the past four years, we have delivered hybrid training in pathogen genomic surveillance and bioinformatics to over 290 participants from 36 African countries. These initiatives, tailored to diverse personas in national public health institutions, leveraged trainers and facilitators from across the continent to address varying competency levels. We have also developed and implemented resources to support our training initiatives, including a user-friendly helpdesk ticketing system, a robust trainer database, and intuitive websites hosting training materials. These tools work jointly to ensure that training and related resources are widely accessible, while also providing participants with support and engagement opportunities long after receiving training.
To ensure consistency in the training of public health staff in Africa, a standardised pathogen genomics surveillance training curriculum has been developed. The curriculum is designed to serve as a comprehensive resource for trainers, encompassing content that ranges from foundational courses in generic, wet-lab, and bioinformatics topics to advanced pathogen-specific courses that include tailored genomic surveillance workflows. The next step is implementing this curriculum in future training initiatives across African public health institutes. Additionally, we are exploring the integration of AI in pathogen genomics curriculum development and training.
Our training efforts have highlighted the need for ongoing training and capacity building in pathogen genomic surveillance in Africa. A standardised curriculum can help address this need and facilitate consistent skills development and collaboration across the continent’s public health institutes. Implementing this curriculum and exploring AI-driven training and decision-making will enhance preparedness for future disease outbreaks and public health responses.

17:30-17:45
Ethics and governance of AI-powered genomics
Confirmed Presenter: Tendayi Mutangadura, University of Cape Town, South Africa

Format: In Person


Authors List:

  • Tendayi Mutangadura, University of Cape Town, South Africa
  • Nchangwi Munung, University of Cape Town, South Africa
  • Nicola Mulder, University of Cape Town, South Africa

Presentation Overview:

The integration of artificial intelligence (AI) into genomics promises substantial advances in personalised medicine, disease prediction, and gene editing, but it also presents critical ethical and governance challenges. This study explores these challenges by addressing three research questions: (1) What are the primary ethical concerns related to AI applications in genomics, including privacy, consent, and bias? (2) How are current governance structures addressing, or failing to address, these issues? (3) How can effective governance frameworks be established to ensure the responsible, equitable, and transparent use of AI in this field? Using a mixed-methods approach that includes a systematic literature review, expert interviews, and case analysis, the study examines the ethical risks and governance gaps in AI-driven genomic research. Findings indicate significant concerns around data privacy, potential misuse of genetic information, and the exacerbation of existing health disparities due to biased data and algorithms. In addition, existing regulatory frameworks lack sufficient guidelines on algorithmic accountability, data ownership, and inclusive representation within genomic datasets. The study concludes by recommending a multi-stakeholder governance model that emphasizes transparency, fairness, and adaptability. This framework would involve guidelines for data handling, bias mitigation, and global collaboration among governments, the private sector, and global health organizations, and it provides actionable steps for establishing ethical oversight in the evolving landscape of AI-driven genomics. These recommendations aim to enhance public trust and ensure that AI’s role in genomics aligns with ethical standards that protect individual rights and foster equitable health outcomes.

17:45-18:00
Developing an information system for integrating clinical and genomic infectious disease data in Tanzania
Confirmed Presenter: Melkiory Beti, Kilimanjaro Clinical Research Institute (KCRI), Tanzania

Format: In Person


Authors List:

  • Melkiory Beti, Kilimanjaro Clinical Research Institute (KCRI), Tanzania
  • Patrick Kimu, Kilimanjaro Clinical Research Institute (KCRI), Tanzania
  • Boaz Wadugu, Kilimanjaro Clinical Research Institute (KCRI), Tanzania
  • Willfred Senyoni, University of Dar es Salaam, Tanzania
  • Tolbert Sonda, Kilimanjaro Clinical Research Institute (KCRI), Tanzania

Presentation Overview:

Background
Infectious diseases continue to pose significant public health problems in low- and middle-income countries such as Tanzania, where integrating clinical and genomic data is important for better disease diagnosis and surveillance. However, existing health information systems mostly operate as separate data sources, limiting the ability to connect clinical data with genomic data for patient diagnosis and infectious disease control. To tackle these challenges, we developed an integrated information system that combines clinical data collected in a customized District Health Information System 2 (DHIS2) with genomic data generated by Nanopore sequencing. The system aims to help clinicians and laboratory scientists identify multiple pathogens from a single patient sample and to help public health researchers view infectious disease patterns.
Methods
Clinical data, including patient demographics and symptoms such as fever and diarrhoea, were collected from healthcare facilities using a customised DHIS2, an open-source platform widely used for health data collection in Tanzania. R scripts securely fetched the clinical data through the DHIS2 API and integrated them with genomic results produced by the cgetools bioinformatics pipeline. This pipeline uses tools such as KmerFinder for pathogen identification, supporting the detection of diverse pathogens from a single sample. R's Shiny web framework was used to build an interactive web interface that lets users search by patient ID to view detailed clinical data alongside genomic data showing the identified pathogens.
Results
The system successfully processed and integrated 21 datasets, linking clinical information with genomic results. The datasets included key clinical variables such as symptoms (e.g., fever and diarrhoea), gender, and region of origin, while the linked genomic data showed the pathogens identified in patient samples. The system provides an interactive web interface in which users can search by patient ID to view detailed clinical records alongside genomic data, as well as interactive visualisations, including bar graphs showing trends in pathogen occurrence across Tanzanian regions, enabling epidemiological monitoring and outbreak detection.
Discussion
The system offers several insights into the potential of clinical-genomic data integration for infectious disease control. Using an open-source health information system such as DHIS2 demonstrated the feasibility of leveraging existing digital health data collection software to enhance data integration in healthcare. In addition, the cgetools pipeline proved effective in identifying multiple pathogens from a single sample. The integration process also shows promise for supporting real-time clinical decision-making. The interactive visualisation tools provided valuable information on pathogen distribution patterns, underscoring their role in outbreak detection and response.
Conclusion
By integrating clinical data from DHIS2 with genomic sequencing outputs, this system offers a powerful tool for infectious disease surveillance in Tanzania. It supports the identification of different pathogens, enabling timely diagnosis and supporting infectious disease control. The system’s flexible, scalable design makes it suitable for applications in infectious disease management across healthcare settings.
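The core integration step, joining DHIS2 clinical records with pipeline pathogen calls by patient ID and tallying pathogens per region for bar graphs, can be sketched as follows. The authors implemented this in R with Shiny; this Python version is only an illustration. The field names (`patient_id`, `region`, `species`) are hypothetical, the DHIS2 API fetch is omitted, and the records are invented example data.

```python
from collections import defaultdict


def integrate(clinical_records, genomic_hits):
    """Attach identified pathogens to each clinical record, keyed by patient ID.

    Hits for patient IDs without a clinical record are ignored in this sketch.
    """
    pathogens = defaultdict(set)
    for hit in genomic_hits:
        pathogens[hit["patient_id"]].add(hit["species"])
    merged = {}
    for record in clinical_records:
        pid = record["patient_id"]
        merged[pid] = {**record, "pathogens": sorted(pathogens.get(pid, set()))}
    return merged


def pathogen_counts_by_region(merged):
    """Tally pathogen detections per region, e.g. as input for a bar graph."""
    counts = defaultdict(lambda: defaultdict(int))
    for record in merged.values():
        for species in record["pathogens"]:
            counts[record["region"]][species] += 1
    return {region: dict(per_species) for region, per_species in counts.items()}


# Invented example records (not study data).
clinical = [
    {"patient_id": "P001", "region": "Kilimanjaro", "fever": True, "diarrhoea": False},
    {"patient_id": "P002", "region": "Arusha", "fever": True, "diarrhoea": True},
]
hits = [
    {"patient_id": "P001", "species": "Escherichia coli"},
    {"patient_id": "P001", "species": "Klebsiella pneumoniae"},
    {"patient_id": "P002", "species": "Escherichia coli"},
]

merged = integrate(clinical, hits)
print(merged["P001"]["pathogens"])  # ['Escherichia coli', 'Klebsiella pneumoniae']
print(pathogen_counts_by_region(merged))
```

Keying the merged result by patient ID makes the search-by-ID view a single dictionary lookup, and collecting species in a set means one sample yielding several pathogens (or duplicate calls) is represented once per species.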

Wednesday, April 16th
8:30-9:00
ASBCB Town Hall
Format: In person


9:00-9:15
textToKnowledgeGraph: Generation of Molecular Interaction Knowledge Graphs Using Large Language Models for Exploration in Cytoscape
Confirmed Presenter: Favour James, Department of Electronic and Electrical Engineering, Obafemi Awolowo University, Ile-Ife, Osun, Nigeria

Format: Live Stream

Moderator(s): Dominique Anderson


Authors List:

  • Favour James, Department of Electronic and Electrical Engineering, Obafemi Awolowo University, Ile-Ife, Osun, Nigeria
  • Christopher Churas, Department of Medicine, University of California San Diego, La Jolla, CA, United States
  • Trey Ideker, Department of Medicine, University of California San Diego, La Jolla, CA, United States
  • Dexter Pratt, Department of Medicine, University of California San Diego, La Jolla, CA, United States
  • Augustin Luna, National Library of Medicine and National Cancer Institute, Bethesda, MD, United States

Presentation Overview:

Motivation
Knowledge graphs (KGs) are powerful tools for structuring and analyzing biological information due to their ability to intuitively represent data and improve query performance across heterogeneous datasets. However, constructing KGs from unstructured scientific literature remains challenging due to the high cost and expertise required for manual curation. Prior works have explored text-mining techniques to automate this process but have limitations that impact their ability to capture complex biological interactions fully.
Traditional text-mining methods struggle with understanding context across sentences. Additionally, these methods lack expert-level background knowledge, making it difficult to infer relationships that require awareness of biological concepts indirectly described in the text.
Large Language Models (LLMs) present an opportunity to overcome these challenges. LLMs are trained on large amounts of diverse biological literature, equipping them with contextual knowledge that enables more accurate extraction. Additionally, LLMs can process the entirety of an article’s text, capturing relationships across several sections rather than analyzing sentences in isolation; this allows for more precise extraction.

Results
To address these challenges, we present textToKnowledgeGraph (https://pypi.org/project/texttoknowledgegraph), an artificial intelligence (AI) tool that uses LLMs to extract interactions from individual publications directly into Biological Expression Language (BEL). BEL was chosen for its compact and detailed representation of biological relationships, which allows structured, computationally accessible encoding. The tool provides two usage modes: (1) a Python package usable from the command line or within other projects, and (2) an interactive application within Cytoscape Web that simplifies extraction and online exploration. In the text processing pipeline, we use LangChain with GPT-4o for information extraction, with a predefined schema implemented in Pydantic to ensure structured outputs for BEL generation. The extracted BEL statements are output in CX2 format, enabling visualization and exploration within the Cytoscape ecosystem. The ndex2 package is used for CX2 conversion and to support optional storage and sharing of extracted networks on NDEx. This initial version of textToKnowledgeGraph supports only the extraction of interactions into BEL; future updates will enable greater customization, making the tool adaptable to broader applications.
To evaluate the accuracy of extracted interactions, we applied textToKnowledgeGraph to various published articles. The extracted interactions were manually reviewed by BEL experts, ensuring the biological accuracy and completeness of captured relationships. Finally, we present a use case example in which a topic-specific BEL knowledge graph provides relevant information to augment queries to an LLM using a technique known as Graph Retrieval Augmented Generation (Graph RAG).
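A predefined extraction schema, as described for textToKnowledgeGraph, constrains the LLM to emit structured objects that can then be rendered as BEL. The tool itself uses Pydantic with LangChain; the stdlib `dataclasses` sketch below only illustrates the idea with a hypothetical schema restricted to protein-abundance terms, `p(...)`, and a small subset of BEL relationships. It does not reproduce the tool's actual classes or its ndex2/CX2 conversion, and the example interaction is illustrative only.

```python
from dataclasses import dataclass

# Small subset of BEL 2.x relationship names accepted by this sketch.
BEL_RELATIONS = {"increases", "decreases", "directlyIncreases", "directlyDecreases", "association"}


@dataclass(frozen=True)
class ProteinTerm:
    namespace: str  # e.g. "HGNC"
    name: str       # e.g. "EGFR"

    def to_bel(self) -> str:
        return f"p({self.namespace}:{self.name})"


@dataclass(frozen=True)
class Interaction:
    subject: ProteinTerm
    relation: str
    target: ProteinTerm
    evidence: str  # supporting sentence quoted from the article

    def __post_init__(self):
        # Reject relations outside the schema (e.g. ones an LLM might invent).
        if self.relation not in BEL_RELATIONS:
            raise ValueError(f"unsupported BEL relation: {self.relation!r}")

    def to_bel(self) -> str:
        return f"{self.subject.to_bel()} {self.relation} {self.target.to_bel()}"


def to_graph(interactions):
    """Return (nodes, edges) suitable for loading into a network viewer."""
    nodes = sorted({term.to_bel() for i in interactions for term in (i.subject, i.target)})
    edges = [(i.subject.to_bel(), i.relation, i.target.to_bel()) for i in interactions]
    return nodes, edges


example = Interaction(
    ProteinTerm("HGNC", "EGFR"), "increases", ProteinTerm("HGNC", "MAPK1"),
    evidence="Illustrative sentence, not taken from a real article.",
)
print(example.to_bel())  # p(HGNC:EGFR) increases p(HGNC:MAPK1)
```

Validating the relation at construction time mirrors the role of a schema in structured LLM output: malformed statements fail early instead of propagating into the knowledge graph.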

9:15-9:30
Prelude - Building Research Data Platforms Incrementally: A Guide for Teams of All Sizes
Confirmed Presenter: Mitchell Shiell, Ontario Institute for Cancer Research (OICR), Canada

Format: In Person


Authors List:

  • Mitchell Shiell, Ontario Institute for Cancer Research (OICR), Canada
  • Jon Eubank, Ontario Institute for Cancer Research (OICR), Canada
  • Justin Richardsson, Ontario Institute for Cancer Research (OICR), Canada
  • Leonardo Rivera, Ontario Institute for Cancer Research (OICR), Canada
  • Brandon Chan, Ontario Institute for Cancer Research (OICR), Canada
  • Robin Haw, Ontario Institute for Cancer Research (OICR), Canada
  • Lincoln Stein, Ontario Institute for Cancer Research, Canada
  • Melanie Courtot, Ontario Institute for Cancer Research, Canada
  • Overture Team, Ontario Institute for Cancer Research (OICR), Canada

Presentation Overview:

Large-scale data platforms enable researchers and the public to access, manage and study massive amounts of genomics data. While small research teams can generate these massive datasets, they often struggle to build the platforms needed for transparent and reproducible FAIR data management and sharing.

We built Overture, a suite of reusable, open-source software for developing reliable data management systems quickly, flexibly and at multiple scales. Overture underpins many large-scale international data platforms, including ICGC-ARGO, which aims to store genomic and clinical data for over 100,000 participants, and VirusSeq, which hosts data for over 500,000 pathogen genomes. Behind these platforms are large organizations with large teams that plan, develop and deploy the Overture suite with relative ease. For smaller research groups, however, this can be prohibitively demanding. How can we help them build data platforms more efficiently and with fewer resources?

We address this challenge with Prelude, a tool that lets teams build their data platforms incrementally in systematic phases. Prelude targets a specific obstacle to platform adoption: the high technical overhead and configuration burden of the planning and development stages. Working through data portal development in phased steps lets teams verify requirements through hands-on testing, which provides clear insight into user workflows, data needs, and overall platform fit.

Prelude guides teams through three progressive phases of data platform development. Each phase builds upon the previous one's foundation and can be deployed locally with a single command:

- Phase one focuses on data exploration and theming, enabling teams to visualize and search their tabular data through a customizable portal UI;
- Phase two expands capabilities to enable tabular data management and validation with persistent storage;
- Phase three adds file management and object storage.

Prelude also includes configuration generation services that validate a team's data and transform it into key configuration files, greatly reducing the time spent on tedious manual configuration.
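As an illustration of what configuration generation can involve, the sketch below validates a CSV header and infers an Elasticsearch index mapping from the column values. It assumes an Elasticsearch-backed search portal (Overture's Arranger builds on Elasticsearch) and `generate_mapping` is a hypothetical helper, not Prelude's generator; real generated configurations cover more files and options.

```python
import csv
import io
from typing import Callable, Dict, List

def _all_parse(values: List[str], parse: Callable[[str], object]) -> bool:
    """True if every value can be parsed by `parse` (e.g. int or float)."""
    try:
        for v in values:
            parse(v)
    except ValueError:
        return False
    return True

def infer_type(values: List[str]) -> str:
    """Guess an Elasticsearch field type from a column's non-empty values."""
    present = [v for v in values if v.strip()]
    if present and _all_parse(present, int):
        return "integer"
    if present and _all_parse(present, float):
        return "float"
    return "keyword"  # exact-match text, suitable for facets and filters

def generate_mapping(csv_text: str) -> Dict:
    """Validate the header, then build {"mappings": {"properties": ...}}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames
    if not fields:
        raise ValueError("CSV has no header row")
    if any(not f.strip() for f in fields) or len(set(fields)) != len(fields):
        raise ValueError("header names must be non-empty and unique")
    rows = list(reader)
    properties = {f: {"type": infer_type([row.get(f) or "" for row in rows])} for f in fields}
    return {"mappings": {"properties": properties}}
```

Failing fast on empty or duplicated headers is the "validate" half of the step; type inference is the "transform" half that replaces hand-written mapping files.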

With early adopters reporting significant reductions in configuration time, Prelude is enabling teams to transition through initial planning and development stages efficiently. Looking ahead, we are focused on enabling teams to independently transition to production settings. We are sharing this work to gather community feedback on our approach and learn from others' experiences. Prelude represents a practical step toward making data platform development accessible to research teams with limited resources, reducing initial barriers so teams can do more with less.

9:30-9:45
The African Population Ontology (AfPO): Building a Framework for representing African Populations
Confirmed Presenter: Melek Chaouch, Laboratory of BioInformatics, bioMathematics and bioStatistics (BIMS) Institut Pasteur de Tunis, Tunisia

Format: In Person


Authors List:

  • Melek Chaouch, Laboratory of BioInformatics, bioMathematics and bioStatistics (BIMS) Institut Pasteur de Tunis, Tunisia
  • Anita R. Caron, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
  • Afrigen-D The African Population Ontology Project, AfriGen-D, Tunisia
  • Nicola Mulder, Computational Biology Faculty of Health Sciences University of Cape Town, South Africa
  • Danielle Welter, Luxembourg National Data Service, Luxembourg
  • Alia Benkahla, Laboratory of BioInformatics, bioMathematics and bioStatistics (BIMS) Institut Pasteur de Tunis, Tunisia

Presentation Overview:

Africa is woven together by population movements, ethno-linguistic diversity, and a unique genetic heritage. Recently, a number of genetics/genomics projects have arisen on the continent, such as those developed within the Human Heredity and Health in Africa (H3Africa) initiative. Given the continent's substantial ethnic diversity, researchers face difficulties in defining, in a standardized format, a population or unit that represents a social group in Africa. Here we developed the African Population Ontology (AfPO), a framework that structures this knowledge in a harmonized and standardized way to describe African populations and sub-populations. We used publicly available data on African populations and their demography, geographic localization, spoken languages and genetic background. The WebProtégé platform was used to design and implement the ontology, with country of origin and population selected as descriptive classes. AfPO was validated by the OBO Foundry community and is available on both GitHub and the EMBL-EBI Ontology Lookup Service (OLS). AfPO enables the annotation of African population groups and brings together accumulated knowledge about existing populations and their genetic fingerprints in a standardized format; it can be used to comprehensively annotate African participants in research studies. It can also describe participants of past studies, by mapping them to population identifiers or synonyms. The resulting ontology is essential to the study of the history of African populations and their genetics, and is therefore invaluable in addressing public health issues, promoting cultural preservation and fostering a more nuanced appreciation of Africa's unique place in human history.
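Mapping past-study participants to population identifiers or synonyms can be sketched as a label index over a small class hierarchy (country of origin above population). The term IDs (`EX:0001` and so on), the example terms and the helper names below are hypothetical placeholders, not real AfPO identifiers; normalization strips accents and case so that spelling variants resolve to the same term.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class Term:
    id: str
    label: str
    parent: Optional[str] = None          # e.g. population -> country -> continent
    synonyms: Set[str] = field(default_factory=set)

# Hypothetical placeholder terms; not actual AfPO identifiers.
TERMS: Dict[str, Term] = {t.id: t for t in [
    Term("EX:0001", "Africa"),
    Term("EX:0002", "Nigeria", parent="EX:0001"),
    Term("EX:0003", "Yoruba", parent="EX:0002", synonyms={"Yorùbá"}),
    Term("EX:0004", "South Africa", parent="EX:0001"),
    Term("EX:0005", "Zulu", parent="EX:0004", synonyms={"amaZulu"}),
]}

def normalize(text: str) -> str:
    """Drop diacritics and case so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()

def build_index(terms: Dict[str, Term]) -> Dict[str, str]:
    """Map every normalized label and synonym to its term ID."""
    index: Dict[str, str] = {}
    for term in terms.values():
        for name in {term.label, *term.synonyms}:
            index[normalize(name)] = term.id
    return index

def map_label(index: Dict[str, str], text: str) -> Optional[str]:
    return index.get(normalize(text))

def lineage(terms: Dict[str, Term], term_id: str) -> List[str]:
    """Labels from a term up to the root of the hierarchy."""
    path = []
    current: Optional[str] = term_id
    while current is not None:
        path.append(terms[current].label)
        current = terms[current].parent
    return path
```

Walking the `parent` links means a participant annotated only with a population term can still be grouped by country or continent in downstream analyses.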

9:45-10:00
Prostruc: An Open-source Tool for 3D Structure Prediction using Homology Modeling
Confirmed Presenter: Olaitan Awe, African Society for Bioinformatics and Computational Biology, Cape Town, South Africa

Format: In person


Authors List:

  • Shivani Pawar, Department of Biotechnology and Bioinformatics, Deogiri College, Aurangabad, Maharashtra, India
  • Wilson S.K. Banini, Department of Theoretical and Applied Biology, Kwame Nkrumah University of Science and Technology, Ghana
  • Shamsuddeen Musa, Faculty of Health Sciences, Department of Public Health, National Open University of Nigeria, Nigeria
  • Toheeb Jumah, School of Collective Intelligence, University Mohammed VI Polytechnic, Rabat, Morocco
  • Nigel Dolling, Department of Parasitology, Noguchi Memorial Institute for Medical Research, University of Ghana, Legon, Ghana
  • Abdulwasiu Tiamiyu, School of Collective Intelligence, University Mohammed VI Polytechnic, Rabat, Morocco
  • Olaitan Awe, African Society for Bioinformatics and Computational Biology, Cape Town, South Africa

Presentation Overview:

Introduction: Homology modeling is a widely used computational technique that predicts the three-dimensional (3D) structure of a protein from known template structures, exploiting evolutionary relationships to provide structural insights critical for understanding protein function, interactions, and potential therapeutic targets. However, existing tools often require significant expertise and computational resources, presenting a barrier for many researchers.
Methods: Prostruc is a Python-based homology modeling tool designed to simplify protein structure prediction through an intuitive, automated pipeline. Integrating Biopython for sequence alignment, BLAST for template identification, and ProMod3 for structure generation, Prostruc streamlines complex workflows into a user-friendly interface. Researchers can input protein sequences, identify homologous templates from databases such as the Protein Data Bank (PDB), and generate high-quality 3D structures with minimal computational expertise. Prostruc implements a two-stage validation process: first, it uses TM-align for structural comparison, assessing root-mean-square deviation (RMSD) and TM-scores against reference models; second, it estimates model quality with QMEANDisCo to screen for accurate models.
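The two quantities in the first validation stage can be written down directly. This sketch assumes the two structures are already residue-aligned and superposed (TM-align itself searches for the alignment and superposition that maximize TM-score) and uses the standard TM-score scale d0 = 1.24·(L - 15)^(1/3) - 1.8, with a 0.5 Å floor for short chains as in TM-align. Function names are illustrative, not Prostruc's API.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between paired, already-superposed atoms (e.g. C-alpha)."""
    if not model or len(model) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(total / len(model))

def tm_d0(length: int) -> float:
    """TM-score distance scale in angstroms; 0.5 floor for short chains."""
    if length <= 21:
        return 0.5
    return max(1.24 * (length - 15) ** (1 / 3) - 1.8, 0.5)

def tm_score(model: Sequence[Coord], reference: Sequence[Coord],
             target_length: Optional[int] = None) -> float:
    """TM-score for a fixed residue pairing, normalized by the target length."""
    length = target_length or len(reference)
    d0 = tm_d0(length)
    per_residue = (1.0 / (1.0 + (math.dist(a, b) / d0) ** 2) for a, b in zip(model, reference))
    return sum(per_residue) / length
```

Unlike RMSD, each residue's TM-score contribution is bounded by 1, so a few badly placed loops cannot dominate the score, and the length-dependent d0 makes scores comparable across proteins of different sizes.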
Results: The top five models are selected based on these metrics and provided to the user. Prostruc stands out by offering scalability, flexibility, and ease of use. It is accessible via a cloud-based web interface or as a Python package for local use, ensuring adaptability across research environments. Benchmarking against existing tools such as SWISS-MODEL, I-TASSER and Phyre2 demonstrates Prostruc's competitive performance in structural accuracy and job runtime, while its open-source nature encourages community-driven innovation.
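How the metrics are combined to pick the top five models is not specified here. One plausible rule, assumed purely for illustration, is a lexicographic ranking on the QMEANDisCo global score, then TM-score, then RMSD (lower is better); the `Model` record and its field names are hypothetical.

```python
from typing import List, NamedTuple

class Model(NamedTuple):
    name: str
    qmean_disco: float  # estimated global quality, higher is better
    tm_score: float     # agreement with reference, higher is better
    rmsd: float         # angstroms, lower is better

def top_models(models: List[Model], k: int = 5) -> List[Model]:
    """Rank by QMEANDisCo, break ties with TM-score then RMSD; keep the best k."""
    return sorted(models, key=lambda m: (-m.qmean_disco, -m.tm_score, m.rmsd))[:k]
```

A lexicographic key avoids choosing arbitrary weights for a combined score; a weighted sum would be the alternative if the metrics should trade off against each other.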
Discussion: Prostruc is positioned as a significant advancement in homology modeling, making high-quality protein structure prediction more accessible to the scientific community.

10:30-10:45
Reference Genome and Pangenome Construction of Wild Spotted Hyenas (Crocuta crocuta) from the Kruger National Park
Confirmed Presenter: Ansia van Coller, SAMRC Genomics Platform, South Africa

Format: In Person


Authors List:

  • Ansia van Coller, SAMRC Genomics Platform, South Africa
  • Brigitte Glanzmann, SAMRC Genomics Platform & Stellenbosch University Division of Molecular Biology and Human Genetics, South Africa
  • Nadia Carstens, SAMRC Genomics Platform & WITS Department of Human Genetics, South Africa
  • Victoria Cole, SAMRC Genomics Platform, South Africa
  • Craig Kinnear, SAMRC Genomics Platform & Stellenbosch University Division of Molecular Biology and Human Genetics, South Africa
  • Tanya Kerr, Stellenbosch University Division of Molecular Biology and Human Genetics, SAMRC Centre for Tuberculosis Research, South Africa
  • Giovanni Ghielmetti, Stellenbosch University Division of Molecular Biology and Human Genetics, SAMRC Centre for Tuberculosis Research, South Africa
  • Wynand Goosen, Stellenbosch University Division of Molecular Biology and Human Genetics, SAMRC Centre for Tuberculosis Research, South Africa
  • Michele Miller, Stellenbosch University Division of Molecular Biology and Human Genetics, SAMRC Centre for Tuberculosis Research, South Africa

Presentation Overview:

The spotted hyena (Crocuta crocuta) is a highly social carnivore with complex behavioural and ecological functions, making it an important model for studying genetic diversity, adaptation, and evolution. However, previous draft genomes for C. crocuta have been incomplete and derived from captive individuals, limiting insights into natural genetic variation. Here, we present a high-quality de novo genome assembly and the first draft pangenome of wild spotted hyenas sampled within the Kruger National Park in South Africa.

Using Oxford Nanopore Technologies (ONT) long-read sequencing, we generated a reference genome for a male individual, achieving a total assembly size of 2.39 Gb with a scaffold N50 of 19.6 Mb. Assembly completeness, assessed with BUSCO, revealed 98.5% completeness against the Mammalia_odb10 database and 98.2% against the Carnivora_odb10 database, confirming a high-quality assembly. This assembly is more contiguous and complete than previously published hyena genomes, which had lower N50 values and only 95% completeness. Additionally, our assembly is derived from a wild individual, providing a more ecologically relevant reference compared to those from captive specimens.
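Scaffold N50, the contiguity figure quoted above, is the length L such that scaffolds of length at least L together contain at least half of the assembled bases. A minimal Python implementation (the function name is illustrative):

```python
from typing import Iterable

def n50(lengths: Iterable[int]) -> int:
    """Walk from the longest sequence down; return the length at which
    the running total first reaches half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise ValueError("no sequence lengths given")
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the total is 54, and 10 + 9 + 8 = 27 is the first running sum to reach half of it.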

To investigate population-level diversity, we sequenced ten additional free-ranging individuals using MGI short-read sequencing at depths of 32X (two individuals) and 10X (eight individuals). This revealed approximately 4 million SNPs and 1 million INDELs across individuals. A draft pangenome was constructed using the PanGenome Graph Builder (PGGB), incorporating sequences from all individuals and capturing both conserved genomic regions and variation potentially associated with immune function, behaviour, and environmental adaptation. The draft pangenome comprises ~2.47 Gb, with 35.2 million nodes, 48.4 million edges, and 159,060 paths, providing a foundational resource for future comparative and population genomic studies.
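The node, edge and path counts above are the kind of summary that can be read straight off a graph in GFA format, which PGGB writes. The sketch below assumes GFA 1 text, counting S lines as nodes, L lines as edges, and P (or GFA 1.1 W) lines as paths, and sums segment lengths as the graph length; it is an illustration rather than the authors' pipeline, and tools such as odgi provide production-grade equivalents.

```python
from typing import Dict, Iterable

def gfa_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Count nodes (S), edges (L) and paths (P/W) and total segment length in GFA text."""
    stats = {"nodes": 0, "edges": 0, "paths": 0, "length_bp": 0}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record = fields[0]
        if record == "S":
            stats["nodes"] += 1
            seq = fields[2]
            if seq == "*":
                # Sequence omitted: fall back to the optional LN:i:<length> tag.
                tag = next((f for f in fields[3:] if f.startswith("LN:i:")), "LN:i:0")
                stats["length_bp"] += int(tag[5:])
            else:
                stats["length_bp"] += len(seq)
        elif record == "L":
            stats["edges"] += 1
        elif record in ("P", "W"):
            stats["paths"] += 1
    return stats
```

Because the function accepts any iterable of lines, it can stream an open file handle, so large graphs are summarized without loading them into memory.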

Our findings reveal fine-scale population structure and structural variants, including insertions, deletions, and duplications, that are often missed by reference-based approaches. Future efforts will focus on further refining the pangenome, with Hi-C as a potential strategy to enhance chromosome-level scaffolding.

This study represents a significant advancement in the genomic resources available for C. crocuta, offering the first wild-derived reference genome and draft pangenome for the species. These resources contribute to a deeper understanding of spotted hyena genetic diversity and evolution, with implications for conservation genetics, behavioural ecology, and comparative genomics.

10:45-11:00
An assessment of the genomic structural variation landscape in Sub-Saharan African populations
Confirmed Presenter: Scott Hazelhurst, University of the Witwatersrand, South Africa

Format: In Person


Authors List:

  • Zane Lombard, University of the Witwatersrand, South Africa
  • Scott Hazelhurst, University of the Witwatersrand, South Africa
  • Emma Wiener, University of the Witwatersrand, South Africa
  • Laura Cottino, University of the Witwatersrand, South Africa
  • Gerrit Botha, University of Cape Town, South Africa
  • Oscar Nyangiri, Makerere University, Uganda
  • Harry Noyes, University of Liverpool, United Kingdom
  • Annette MacLeod, University of Glasgow, United Kingdom
  • David Jakubosky, University of California San Diego, United States
  • Clement Adebamowo, University of Maryland, United States
  • Philip Awadalla, Ontario Institute for Cancer Research. University of Toronto., Canada
  • Guida Landoure, University of Sciences, Techniques and Technology of Bamako, Mali
  • Mogomotsi Matshaba, Botswana-Baylor Children’s Clinical Center of Excellence. Baylor College of Medicine, Botswana
  • Enock Matovu, Makerere University, Uganda
  • Michele Ramsay, University of the Witwatersrand, South Africa
  • Gustave Simo, University of Dschang Dschang, Cameroon
  • Martin Simuunza, University of Zambia, Zambia
  • Caroline Tiemessen, National Institute for Communicable Diseases, National Health Laboratory Services. University of the Witwatersrand, South Africa
  • Ambroise Wonkam, Johns Hopkins University, United States
  • Venesa Sahibdeen, University of the Witwatersrand, South Africa
  • Amanda Krause, National Health Laboratory Service. University of the Witwatersrand, South Africa

Presentation Overview:

Structural variants (SVs) contribute significantly to human genomic diversity and are implicated in both common and rare diseases. As with most genomic data in the public domain, SV datasets derived from African populations are underrepresented, leaving a critical gap in our understanding of global genomic diversity. To address this underrepresentation, this H3Africa collaboration analysed 1,091 high-coverage African whole genomes for structural variants, including 546 previously unanalysed genomes.

We used an ensemble approach to detect SVs in whole-genome sequencing data, combining five SV detection tools and merging the jointly called datasets with SURVIVOR. This conservative methodology identified 67,795 structural variants across the genome, affecting 10,421 gene regions. By subtype, 75% of SVs were deletions, 19% duplications, 4% insertions and 2% inversions, although these proportions reflect the detection biases of the algorithms.
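The ensemble merging step can be approximated as follows: calls of the same type on the same chromosome whose start and end breakpoints each lie within a maximum distance are clustered, and a cluster is kept only if enough distinct callers support it. This is a simplified stand-in for SURVIVOR's merge (which also considers strand and minimum SV size), and the 1,000 bp distance and two-caller support thresholds are assumptions, not the study's parameters.

```python
from typing import List, NamedTuple, Tuple

class SV(NamedTuple):
    caller: str
    chrom: str
    svtype: str  # e.g. "DEL", "DUP", "INS", "INV"
    start: int
    end: int

def merge_calls(calls: List[SV], max_dist: int = 1000,
                min_callers: int = 2) -> List[Tuple[str, str, int, int, List[str]]]:
    """Cluster concordant calls, then keep clusters supported by >= min_callers tools."""
    clusters: List[List[SV]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start)):
        for cluster in clusters:
            rep = cluster[0]  # compare against the cluster's first call
            if (rep.chrom == call.chrom and rep.svtype == call.svtype
                    and abs(rep.start - call.start) <= max_dist
                    and abs(rep.end - call.end) <= max_dist):
                cluster.append(call)
                break
        else:
            clusters.append([call])  # no concordant cluster: start a new one
    merged = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        if len(callers) >= min_callers:
            merged.append((cluster[0].chrom, cluster[0].svtype,
                           min(c.start for c in cluster), max(c.end for c in cluster), callers))
    return merged
```

Requiring support from several callers is what makes the approach conservative: it filters caller-specific false positives at the cost of discarding SVs that only one algorithm can detect.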

Novelty was substantial: 10% of the SVs were observed for the first time in this cohort of African individuals. Variants were distributed throughout the genome, with 42% occurring in introns, 4% in coding regions and 53% in intergenic regions. Size distribution analysis showed that a third of the detected SVs are over 800 bp in length. We observed a higher proportion of common variants (17% occurring at >10% frequency) than previously reported in non-African populations, potentially a distinctive feature of African structural variant patterns.

The potential functional impact of the detected SVs was assessed according to ACMG/AMP classification guidelines. The majority of SVs (68%) were classified as variants of uncertain significance. A small proportion (0.2%) were classified as likely pathogenic, and only 15 pathogenic variants were identified. Of the latter, the majority (60%) were known African variants previously linked to disease. The variants described as pathogenic for the first time (5/15) require further investigation.

This study highlights the technical challenges in SV research, including computational intensity and the limitations of short-read sequencing technologies. Different detection algorithms showed complementary strengths across various SV types and sizes, reinforcing the value of ensemble approaches despite their computational demands. Our work provides a valuable resource for population genetics and health-related research, addressing the critical need for high-quality baseline data on structural variant diversity in African populations. This dataset will enhance interpretation of potentially pathogenic variants and improve our understanding of genetic diseases in understudied populations, contributing to more equitable genomic medicine.

11:00-11:15
SeqWord Motif Mapper: Unlocking Bacterial Epigenetic Insights
Confirmed Presenter: Oleg Reva, Centre for Bioinformatics and Computational Biology; University of Pretoria, South Africa

Format: In Person


Authors List:

  • Christophe Lefebvre, Centre for Bioinformatics and Computational Biology; University of Pretoria, South Africa
  • Rian Pierneef, Centre for Bioinformatics and Computational Biology; University of Pretoria, South Africa
  • Oleg Reva, Centre for Bioinformatics and Computational Biology; University of Pretoria, South Africa

Presentation Overview:

The SeqWord Motif Mapper (SWMM) is a newly developed tool designed to streamline the identification and visualization of complex patterns of epigenetic modifications in bacterial genomes using data obtained through single-molecule real-time (SMRT) sequencing technologies. Bacterial epigenetics, particularly through methylation, plays a crucial role in regulating processes such as gene expression, chromosome replication, symbiont-host interactions, and defense mechanisms against phages. However, there is a lack of computational tools for the detection, comparison, and visualization of patterns of epigenetically modified bases in bacterial genomes. SWMM addresses these challenges by providing a robust statistical framework and interactive visualization capabilities. Implemented in Python 3, the software utilizes input data from standard SMRT analysis pipelines, including GFF annotation files and reference genomes in GenBank format. The tool integrates advanced genomic analyses, such as motif distribution mapping and statistical assessment of the distribution of modified bases and motifs across coding, non-coding, and promoter regions; core and horizontally acquired regions; chromosomes and plasmids; leading and lagging replichores; and regions with alternative base composition. Its visualization outputs include circular and dot-plot representations, accompanied by statistical validation in both graphical and text formats. Applications of the tool have already yielded significant insights into epigenetic regulation mechanisms within various bacterial species, including motifs linked to antibiotic resistance and stress response [1-5].
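Mapping a methylation motif onto a genome starts with locating every occurrence on both strands, including degenerate IUPAC positions. The sketch below is a generic motif scanner written for illustration, not SWMM's code, and the function names are hypothetical. Reverse-strand hits are found by searching for the reverse complement and are reported in forward-strand coordinates.

```python
import re
from typing import List, Tuple

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# IUPAC complements: R<->Y, K<->M, B<->V, D<->H; S, W and N are self-complementary.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def motif_pattern(motif: str):
    # The zero-width lookahead lets overlapping occurrences all be reported.
    return re.compile("(?=(" + "".join(IUPAC[base] for base in motif.upper()) + "))")

def find_motif(sequence: str, motif: str) -> List[Tuple[int, str]]:
    """Return sorted (0-based start, strand) hits in forward-strand coordinates."""
    seq = sequence.upper()
    rev_comp = motif.upper().translate(COMPLEMENT)[::-1]
    hits = set()
    for pattern, strand in ((motif_pattern(motif), "+"), (motif_pattern(rev_comp), "-")):
        for match in pattern.finditer(seq):
            hits.add((match.start(), strand))
    return sorted(hits)
```

Palindromic motifs such as GATC are reported on both strands at the same position, which matches how methylation at such sites is usually observed on both strands; the per-region statistics described above would then be computed over these hit positions.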
SWMM can be deployed both locally and as a web application, making it accessible to users with varying levels of bioinformatics expertise. By offering a user-friendly interface and compatibility with multiple operating systems, it enables scalable and reproducible research. The program is freely available on GitHub (https://github.com/chrilef/BactEpiGenPro) and can also be accessed as a web application at http://begp.bi.up.ac.za. This tool represents a critical advancement in bacterial epigenetics, with promising implications for understanding bacterial adaptation, pathogenicity, and gene regulation in both clinical and environmental contexts.

References:
1. Reva ON, La Cono V, Crisafi F, et al. Interplay of intracellular and trans-cellular DNA methylation in natural archaeal consortia. Environ Microbiol Rep. 2024;16(2):e13258. doi: 10.1111/1758-2229.13258.
2. Korotetskiy IS, Shilov SV, Kuznetsova T, et al. Analysis of Whole-Genome Sequences of Pathogenic Gram-Positive and Gram-Negative Isolates from the Same Hospital Environment to Investigate Common Evolutionary Trends Associated with Horizontal Gene Exchange, Mutations and DNA Methylation Patterning. Microorganisms. 2023;11(2):323. doi: 10.3390/microorganisms11020323.
3. Korotetskiy IS, Jumagaziyeva AB, Shilov SV, et al. Transcriptomics and methylomics study on the effect of iodine-containing drug FS-1 on Escherichia coli ATCC BAA-196. Future Microbiol. 2021;16:1063-1085. doi: 10.2217/fmb-2020-0184.
4. Reva ON, Korotetskiy IS, Joubert M, et al. The Effect of Iodine-Containing Nano-Micelles, FS-1, on Antibiotic Resistance, Gene Expression and Epigenetic Modifications in the Genome of Multidrug Resistant MRSA Strain Staphylococcus aureus ATCC BAA-39. Front Microbiol. 2020;11:581660. doi: 10.3389/fmicb.2020.581660.
5. Reva ON, Swanevelder DZH, Mwita LA, et al. Genetic, Epigenetic and Phenotypic Diversity of Four Bacillus velezensis Strains Used for Plant Protection or as Probiotics. Front Microbiol. 2019;10:2610. doi: 10.3389/fmicb.2019.02610.

11:15-11:30
Making models work: Matching Human in vitro models to deliver precision medicine
Confirmed Presenter: Winston Hide, Harvard Medical School, United States

Format: In Person


Authors List:

  • Pourya Naderi, Harvard Medical School, United States
  • Sang Su Kwak, Harvard Medical School, United States
  • Mehdi Jorfi, Harvard Medical School, United States
  • Weiming Xia, Boston University, United States
  • Rudolph Tanzi, Harvard Medical School, United States
  • Doo Kim, Harvard Medical School, United States
  • Winston Hide, Harvard Medical School, United States

Presentation Overview:

Objectives
Alzheimer’s disease (AD) interventions that succeed in preclinical models often fail in human trials. While preclinical models offer insights into AD mechanisms, there is no systematic approach to verify whether preclinical target mechanisms retain therapeutic relevance in humans. Bridging this preclinical-to-clinical translational gap would accelerate therapeutic development by establishing whether failures are due to testing ineffective drugs, targeting the wrong mechanism, or relying on unrepresentative models.

Methods
We have developed a novel bioinformatics platform, Integrative Pathway Activity Analysis (IPAA), that maps pathway activity from omics data. IPAA quantifies the degree to which disease functions in models match those in human brains and prioritizes targetable pathways in the most representative models. We assessed the mechanistic similarities between the transcriptomes of three AD brain regions and multiple 2D/3D human AD cellular models to define targetable functions. We performed phosphoproteomic analysis and compared pathway activity changes with the transcriptomic findings. Top pathways were evaluated pharmacologically for their impact on AD pathology in 3D models.
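The abstract does not describe IPAA's algorithm, so the following is only a hypothetical illustration of the general idea of comparing models with human tissue at the pathway level. Given per-pathway dysregulation scores (for example, signed activity changes between disease and control), models are ranked by their Pearson correlation with a human brain profile, and pathways dysregulated in the same direction in both are reported as shared. The scores, the threshold and the function names are assumptions.

```python
import math
from typing import Dict, List, Set, Tuple

Profile = Dict[str, float]  # pathway name -> signed dysregulation score

def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    if n < 2:
        raise ValueError("need at least two shared pathways")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant profile; correlation undefined")
    return cov / (sx * sy)

def rank_models(human: Profile, models: Dict[str, Profile]) -> List[Tuple[str, float]]:
    """Rank models by correlation with the human profile over shared pathways."""
    ranked = []
    for name, profile in models.items():
        shared = sorted(set(human) & set(profile))
        r = pearson([human[p] for p in shared], [profile[p] for p in shared])
        ranked.append((name, r))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

def shared_dysregulated(human: Profile, model: Profile, threshold: float = 1.0) -> Set[str]:
    """Pathways strongly dysregulated in both, in the same direction."""
    return {p for p in set(human) & set(model)
            if abs(human[p]) >= threshold and abs(model[p]) >= threshold
            and (human[p] > 0) == (model[p] > 0)}
```

Correlation summarizes overall model fidelity, while the shared-pathway set is what would be carried forward as candidate targets for pharmacological testing.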

Results
IPAA found a high correlation of pathway dysregulation between brain regions (r=0.84, temporal cortex and parahippocampal gyrus), suggesting IPAA’s ability to detect conserved AD functions. IPAA found 83 dysregulated transcriptomic pathways shared between AD brains and a 3D model with a high Amyloid-beta (Aβ) 42/40 ratio. Shared dysregulated pathways included p38 MAPK, YAP1/TAZ, E-cadherin, CDC20, and APC/C, which were confirmed at the protein level. Elevated active p38 MAPK was observed in the 3D models, human AD brains, and 5XFAD mice, localized to presynaptic dystrophic neurites. Phosphoproteomic analysis confirmed an increase in p38 MAPK substrate phosphorylation driven by Aβ42 accumulation. Targeting p38 MAPK with a clinical p38α/β MAPK inhibitor (Losmapimod), which has not been tested for AD, significantly reduced Aβ-induced tau pathology, Aβ accumulation, neuronal loss, and microglial activation in 3D models and human microglia. We further found that MAPK-activated protein kinase 2 (MK2) plays crucial roles in mediating Aβ-induced tau pathology.

Conclusions
IPAA enables rapid preclinical assessment of target pathways with confidence for impact on AD pathology prior to clinical trials. Our findings highlight the critical role of protein kinase networks, particularly the p38 MAPK-MK2 axis, in driving AD pathology in humans.

11:30-11:45
Afrigen-D Imputation Service: A Comprehensive Platform for African-Specific Genotype Imputation and Polygenic Risk Score Calculation
Confirmed Presenter: Mamana Mbiyavanga, Afrigen-D, University of Cape Town, South Africa

Format: In Person


Authors List:

  • Mamana Mbiyavanga, Afrigen-D, University of Cape Town, South Africa
  • Nicola Mulder, Afrigen-D, University of Cape Town, South Africa
  • Ayton Meintjes, Afrigen-D, University of Cape Town, South Africa

Presentation Overview:

Over the past decade, the Human Heredity and Health in Africa (H3Africa) initiative has driven the development of genomic research for human health in Africa through its bioinformatics network (H3ABioNet). Through collaborative efforts, H3ABioNet has established robust frameworks for data processing, quality control, and imputation pipelines specifically optimized for African populations. An African genotype imputation service with a comprehensive reference panel is indispensable for accurate genetic analyses tailored to the continent's diverse genetic landscape.

We developed an imputation platform (Afrigen-D Imputation Service, https://impute.afrigen-d.org) that leverages the high-quality H3Africa reference panel, comprising 8,894 high-coverage haplotypes from 48 populations worldwide, of which 50% are of African ancestry. The service implements established guidelines and workflows while addressing data privacy challenges by keeping genetic data within continental boundaries. It reuses the validated software stack and workflow architecture of the Michigan Imputation Server and the TOPMed Imputation Server, ensuring methodological consistency and standardized imputation procedures. This makes it possible to combine genotype data imputed against multiple reference panels. Additionally, the platform integrates an HLA reference panel and polygenic score (PGS) calculation, enabling automated, standardized computation of genetic risk scores from imputed genotypes.
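At its core, a polygenic score is a weighted sum of effect-allele dosages, PGS = Σ β_i · d_i. The sketch below assumes imputed dosages are stored per variant as (REF, ALT, ALT dosage) and that weights follow the effect-allele / effect-weight convention of PGS Catalog scoring files; variants that are missing or whose alleles do not match are skipped. It is not the Afrigen-D implementation, and the type aliases are hypothetical.

```python
from typing import Dict, Tuple

Genotypes = Dict[str, Tuple[str, str, float]]  # variant id -> (REF, ALT, ALT dosage 0..2)
Weights = Dict[str, Tuple[str, float]]         # variant id -> (effect allele, effect weight)

def polygenic_score(genotypes: Genotypes, weights: Weights) -> Tuple[float, int]:
    """Sum effect weight x effect-allele dosage; return (score, number of variants used)."""
    score, used = 0.0, 0
    for variant, (effect_allele, beta) in weights.items():
        if variant not in genotypes:
            continue  # not imputed in this sample
        ref, alt, alt_dosage = genotypes[variant]
        if effect_allele == alt:
            dosage = alt_dosage
        elif effect_allele == ref:
            dosage = 2.0 - alt_dosage  # weight refers to the reference allele
        else:
            continue  # allele mismatch (e.g. strand issue): skip rather than guess
        score += beta * dosage
        used += 1
    return score, used
```

Returning the number of variants used alongside the score makes it easy to flag samples where poor overlap with the scoring file would make the score unreliable.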

The Afrigen-D Imputation Service facilitates efficient genotype imputation through a user-friendly interface, requiring minimal computational expertise and resources. The platform provides comprehensive preprocessing utilities for automated quality control and data preparation, adhering to established bioinformatics standards. Integration with population-specific reference panels and polygenic scoring capabilities provides a robust foundation for investigating complex diseases and genetic traits in African populations. Through ongoing development and community collaboration, this resource contributes significantly to advancing our understanding of African genetic diversity and its implications for health outcomes.

11:45-12:00
Shared Genetic Architecture between Suicidality and Subcortical Brain Volume: A Genome-Wide Association Study
Format: In person


Authors List:

  • Joel Defo, University of Cape Town, Cameroon
  • Raj Ramesar, University of Cape Town, South Africa

Presentation Overview:

In recent years, suicidality has become a serious public health issue. Neuroimaging studies of suicidal individuals, together with post-mortem studies of brain tissue from suicide victims, have pointed to volumetric brain abnormalities with possible pathological and etiological relevance. Although there have been advances in understanding the genetic underpinnings of suicidality, the genetic architecture shared between suicidality and subcortical brain volume is poorly understood. Using Genome-Wide Association Study (GWAS) summary statistics, we aimed to explore this shared genetic architecture. We obtained summary statistics for suicidal behaviour, namely Suicide Attempts (n = 50,264), Ever Self-Harmed (n = 117,733) and Thoughts of Life Not Worth Living (n = 117,291) from the UK Biobank, as well as Suicide or other intentional self-harm (n = 342,499) from the FinnGen Biobank. Summary-level data for seven subcortical brain volumes and intracranial volume were sourced from the ENIGMA2 study. Linkage disequilibrium score regression was used to estimate the genetic relationship between suicidality and subcortical brain volume, and Genomic Structural Equation Modelling was used to identify common factor patterns among the traits. The Genomic Structural Equation Modelling results then guided a series of GWAS meta-analyses at the variant and gene/sub-network levels. We detected a nominal genetic correlation between the FinnGen suicide phenotype and intracranial volume, as well as a common genetic factor split into two groups: Suicide Attempts, Ever Self-Harmed and Thoughts of Life Not Worth Living from the UK Biobank on one side, and Suicide from FinnGen, intracranial volume and the subcortical brain volumes on the other.
Network, pathway and Gene Ontology analysis of the joint sets of disorders uncovered enriched pathway/biological processes connected to the blood-brain barrier/permeability.
Furthermore, our findings indicate that the presence and severity of suicidality are associated with an inflammatory signature detectable in both blood and brain tissues. This suggests a biological continuity underlying suicidality, potentially pointing to a common heritability. These results support the role of brain and peripheral blood inflammation in suicide risk. These findings hold promise for developing targeted interventions and personalized treatment strategies to mitigate the risk of suicidality in vulnerable individuals.
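LD score regression rests on the expectation that a SNP's association statistic grows with its LD score ℓ_j. For one trait, E[χ²_j] ≈ N·h²·ℓ_j/M + N·a + 1; for two traits, E[z1_j·z2_j] ≈ √(N1·N2)·ρ_g·ℓ_j/M plus an intercept reflecting sample overlap, and the genetic correlation is r_g = ρ_g/√(h²1·h²2). The sketch below fits these as ordinary least-squares regressions. Real LDSC additionally uses regression weights and a block jackknife for standard errors, so this is a simplified illustration with hypothetical function names, not the analysis pipeline used here.

```python
import math
from typing import List, Tuple

def ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Unweighted least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def ldsc_h2(chi2: List[float], ld_scores: List[float], n: float, m: float) -> Tuple[float, float]:
    """SNP heritability from the slope (slope = N*h2/M); intercept ~1 absent confounding."""
    slope, intercept = ols(ld_scores, chi2)
    return slope * m / n, intercept

def ldsc_rg(z1: List[float], z2: List[float], ld_scores: List[float],
            n1: float, n2: float, m: float, h2_1: float, h2_2: float) -> float:
    """Genetic correlation from cross-trait LDSC (slope = sqrt(N1*N2)*rho_g/M)."""
    products = [a * b for a, b in zip(z1, z2)]
    slope, _ = ols(ld_scores, products)
    genetic_covariance = slope * m / math.sqrt(n1 * n2)
    return genetic_covariance / math.sqrt(h2_1 * h2_2)
```

Because confounding such as population stratification inflates statistics roughly uniformly across LD scores, it shifts the intercept rather than the slope, which is why the slope can be read as a heritability or genetic covariance signal.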

12:00-12:15
First Continental African Genome-Wide Association Study Identifies Novel Genetic Loci Associated with Blood Urea Nitrogen and Kidney Function
Confirmed Presenter: Gloria Kirabo, Makerere University, Uganda

Format: In Person


Authors List:

  • Gloria Kirabo, Makerere University, Uganda
  • Opeyemi Soremekun, Helmholtz Munich, Germany
  • Segun Fatumo, Queen Mary University of London, United Kingdom

Presentation Overview:

Chronic kidney disease (CKD) is a critical global health concern with high mortality rates and severe complications, particularly in Africa, yet its underlying molecular mechanisms remain poorly understood. We conducted a genome-wide association study (GWAS) of blood urea nitrogen (BUN) levels, a key biomarker of kidney function, in 5,910 Ugandan participants to identify single nucleotide polymorphisms (SNPs) associated with CKD risk. The analysis identified 13 SNPs reaching a suggestive significance threshold (p < 5×10⁻⁷), which LD clumping refined to five independent lead SNPs. Notably, rs73309776 in the GALNT6 gene suggests potential pathways linking breast cancer and kidney function, while rs145326389, an intronic variant in LOC105374218, is associated with traits related to the RAAS pathway and blood pressure regulation. rs142038911, a synonymous variant in TRIM11, TRIM17, and LOC124904537, may play a role in regulating serum creatinine and protein binding, both relevant to kidney disease. Bayesian fine mapping highlighted rs1286795408 on chromosome 7 as a strong candidate, with a posterior probability of 84% within a 99% credible set, warranting further investigation. Functional annotation with MAGMA and GTEx pointed to gene expression in the pituitary gland and kidney medulla, although these signals did not reach statistical significance. Replication in European, East Asian, and Latin American populations validated associations with genes such as HOXD11, BCAS3, and TFCP2L1, which are involved in kidney development and function, highlighting genetic factors shared across ancestries. Rigorous quality control, including filtering on Hardy-Weinberg equilibrium, sex discrepancies, and minor allele frequency, supported the robustness of the results.
This study, the first GWAS of BUN in a continental African population, underscores the importance of inclusive genetic research and contributes to understanding CKD's genetic underpinnings, paving the way for precision medicine and potential targeted treatments for underrepresented populations.
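The abstract reports a posterior probability within a 99% credible set but does not name the fine-mapping software. As a generic illustration only, the sketch below assumes a single causal variant per locus with equal priors, normalises hypothetical per-SNP log Bayes factors into posterior probabilities, and then takes the smallest top-ranked set of SNPs whose cumulative posterior reaches 0.99. The SNP names and Bayes factors are invented placeholders, not study data.

```python
import math


def posterior_probabilities(log_bf: dict[str, float]) -> dict[str, float]:
    """Turn per-SNP log Bayes factors into posterior probabilities.

    Assumes exactly one causal variant in the locus and an equal prior for
    every SNP, so each posterior is proportional to that SNP's Bayes factor.
    """
    max_log = max(log_bf.values())  # subtract the maximum for numerical stability
    weights = {snp: math.exp(v - max_log) for snp, v in log_bf.items()}
    total = sum(weights.values())
    return {snp: w / total for snp, w in weights.items()}


def credible_set(posteriors: dict[str, float], level: float = 0.99) -> list[str]:
    """Smallest set of top-ranked SNPs whose cumulative posterior reaches `level`."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for snp, prob in ranked:
        selected.append(snp)
        cumulative += prob
        if cumulative >= level:
            break
    return selected


# Hypothetical locus with Bayes factors 84, 12, 3.5 and 0.5 (placeholder SNP names).
log_bfs = {"snp_a": math.log(84), "snp_b": math.log(12),
           "snp_c": math.log(3.5), "snp_d": math.log(0.5)}
pp = posterior_probabilities(log_bfs)
print(round(pp["snp_a"], 2), credible_set(pp))
```

Real fine-mapping tools also model multiple causal variants and LD; this sketch only shows how a credible set is assembled once posteriors are available.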

12:15-12:30
Menopause-related changes in the gut microbiome and their association with cardiometabolic diseases in women from four sub-Saharan African countries
Confirmed Presenter: Phehello Chauke, Sydney Brenner Institute for Molecular Biosciences, University of the Witwatersrand, South Africa

Format: In Person


Authors List:

  • Phehello Chauke, Sydney Brenner Institute for Molecular Biosciences, University of the Witwatersrand, South Africa
  • Luicer Olubayo, Sydney Brenner Institute for Molecular Biosciences, University of the Witwatersrand, South Africa
  • Dylan Maghini, Sydney Brenner Institute for Molecular Biosciences and Stanford University Department of Hematology, United States
  • Scott Hazelhurst, Sydney Brenner Institute for Molecular Biosciences, University of the Witwatersrand, South Africa

Presentation Overview:

Background: The menopausal transition has been associated with changes in the gut microbiome (GM). This compositional shift is likely related to changing hormone levels during menopause: the GM harbours bacterial taxa that can deconjugate estrogen and other sex hormones, allowing their reabsorption. Estrogen, in turn, maintains gut homeostasis by influencing intestinal barrier function and microbial composition. The altered GM composition that accompanies declining hormone levels may be partly responsible for the onset of menopause-related health conditions, including cardiometabolic diseases (CMDs). However, it is unclear which microbiome features are associated with menopause-related health outcomes, particularly in African populations. This study is the first investigation of menopause-related changes in the GM and their association with CMDs in African women.

Aim and objectives: This study investigated compositional differences in the GM between pre- and postmenopausal women in sub-Saharan Africa and their association with CMDs by characterising alpha and beta microbial diversity patterns, identifying differentially abundant bacterial taxa between menopausal groups and determining how these microbial taxa may be linked to CMDs.

Methods: This cross-sectional analysis included 1,801 women from Burkina Faso, Ghana, Kenya and South Africa who were selected from wave 2 of the Africa Wits-INDEPTH Partnership for Genomic Studies (AWI-Gen). Shotgun metagenomic sequencing was performed on DNA extracted from faecal samples using Illumina technology. The metagenomic reads underwent quality control and alignment, followed by taxonomic profiling. Microbial diversity and composition were compared between menopausal groups using the inverse Simpson index and Bray-Curtis dissimilarity, respectively. Linear discriminant analysis effect size (LEfSe) was used to identify differentially abundant taxa between menopausal groups.

Results: CMD status was a stronger determinant of gut microbial diversity than menopausal status. Women with CMDs showed significantly lower microbial diversity regardless of menopausal status. Geographic location also significantly influenced GM composition, with substantial variation across study sites. Taxonomically, premenopausal women were enriched for beneficial short-chain fatty acid-producing bacteria, including Faecalibacterium prausnitzii, Bacteroides fragilis, and Prevotella, whereas postmenopausal women harboured both beneficial (Ruminococcus champanellensis) and potentially harmful species (Collinsella bouchesdurhonensis).

Conclusions: In contrast with prior research in non-African populations, our findings suggest that, rather than menopause directly altering the GM and thereby increasing CMD risk, the higher prevalence of CMDs in postmenopausal women may be driving the observed microbiome changes. Geographic location emerged as another significant determinant, highlighting the importance of regional factors in shaping microbial communities. Although we observed distinct taxonomic differences between pre- and postmenopausal women, these patterns varied by location and included both beneficial and potentially harmful bacteria in postmenopausal women. This study underscores the complex interplay between hormonal status, geographic factors, and metabolic health, emphasizing the need for population-specific approaches to women's health research and to clinical interventions targeting the gut-hormone-metabolism axis in African women.

12:30-12:45
The African Pangenome Reference Graph Project
Confirmed Presenter: Mohammed Farahat, Computational Biology Division, IDM, University of Cape Town, South Africa

Format: In person


Authors List:

  • Mohammed Farahat, Computational Biology Division, IDM, University of Cape Town, South Africa
  • Shaun Aron, Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, South Africa
  • Nicola Mulder, Computational Biology Division, IDM, University of Cape Town, South Africa

Presentation Overview:

The conventional human reference genome, though essential for variant calling, lacks the genetic diversity needed to represent global populations, particularly African populations, which are highly genetically diverse. Using a linear reference introduces both reference and allele bias, obscuring population-specific insights. To address this, we are constructing an African Pangenome Reference Graph, enabled by advances in long-read sequencing and graph-based reference models.
Our project leverages ~60x PacBio HiFi sequencing data from 27 individuals across Burkina Faso, Kenya, and South Africa, capturing a substantial proportion of Africa's genetic diversity. These data allow us to build a pangenome graph that more accurately represents African genomes. Unlike a traditional linear reference, the pangenome graph integrates diverse sequence paths, improving variant calling for both single nucleotide and structural variants. The goal is to improve exploration of African-specific genetic variation and enhance variant discovery in related populations.
To achieve this, we developed workflows for generating high-quality de novo assemblies and pangenome graphs using both a reference-free algorithm (PanGenome Graph Builder, PGGB) and a reference-derived one (Minigraph-Cactus). A third workflow, under development, will extract and analyze African variation within specific regions and call variants using the graph as a reference. Preliminary analysis of an earlier 30x coverage dataset yielded high-quality assemblies with contig N50s of 31-49 Mb; the current dataset has 60x coverage.
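Contig N50, the assembly-contiguity metric quoted above, is the length L such that contigs of length at least L together cover at least half of the total assembled sequence. A minimal sketch with made-up contig lengths:

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50: the shortest contig among the longest contigs that
    together cover at least 50% of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= half_total:
            return length
    raise ValueError("no contigs supplied")


# Illustrative contig lengths in Mb (not from the project's assemblies).
print(n50([49, 40, 31, 12, 8, 5, 3, 2]))
```

Because N50 is relative to the total assembly length, it is most meaningful when comparing assemblies of the same genome.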
Supported by H3ABioNet and the eLwazi Consortium, the African pangenome graph provides a valuable resource advancing population-specific genomics. Initial applications include comparing variants called using the African graph versus linear and global references, investigating complex regions, and benchmarking graph-based variant calling.
As part of our efforts to advance pangenome research and analysis, we hosted the Human Pangenome Bring Your Own Data (BYOD) Workshop in October 2024 in collaboration with the eLwazi Open Data Science Platform. During this hands-on workshop, participants explored methods for variant calling, graph-based analysis, and personalized genome graph creation, comparing results between linear reference and pangenome-based approaches.
A second phase will generate assemblies from seven samples from the Democratic Republic of the Congo, expanding the resource. This collaborative initiative is a crucial step toward a more inclusive genomic reference, enabling equitable genomic studies across African and global populations.

12:45-13:00
African Bioinformatics Institute Engagement Session
Format: In person


Authors List:

  • African Institute

Presentation Overview:

In this session, we will provide a brief overview of the ABI and its current status, followed by Q&A and open discussion.

13:30-14:30
Poster Presentations
Format: In person

14:30-14:45
Computational algorithms to identify mechanism-centric biomarkers of treatment response in cancer
Confirmed Presenter: Antonina Mitrofanova, Rutgers University, Rutgers Health, SHP, United States

Format: Live Stream


Authors List:

  • Sukanya Panja, Rutgers University, Rutgers Health, SHP, United States
  • Mihai Ioan Truica, Northwestern University Feinberg School of Medicine
  • Christina Yu, Rutgers University, Rutgers Health, SHP
  • Vamshi Saggurthi, Rutgers University, Rutgers Health, SHP
  • Michael W. Craige, Rutgers University, Rutgers Health, SHP
  • Katie Whitehead, Rutgers University, Rutgers Health, SHP
  • Mayra Tuiche, Rutgers University, Rutgers Health, SHP
  • Aymen Al-Saadi, Rutgers School of Engineering
  • Riddhi Vyas, Rutgers University, Rutgers Health, SHP
  • Shridar Ganesan, Rutgers Cancer Institute of New Jersey
  • Suril Gohel, Rutgers University, Rutgers Health, SHP
  • Frederick Coffman, Rutgers University, Rutgers Health, SHP
  • James S. Parrott, Rutgers University, Rutgers Health, SHP
  • Songhua Quan, Northwestern University Feinberg School of Medicine
  • Shantenu Jha, Rutgers School of Engineering
  • Isaac Kim, Rutgers Cancer Institute of New Jersey
  • Edward Schaeffer, Northwestern University Feinberg School of Medicine
  • Vishal Kothari, Northwestern University Feinberg School of Medicine
  • Sarki A. Abdulkadir, Northwestern University Feinberg School of Medicine
  • Antonina Mitrofanova, Rutgers University, Rutgers Health, SHP, United States

Presentation Overview:

We have developed TR-2-PATH, a novel computational algorithm that reconstructs a first-of-its-kind mechanism-centric regulatory network connecting molecular pathways to their upstream transcriptional regulatory programs, and prioritizes them as markers of therapeutic resistance in cancer. Such a network offers a new way to identify biomarkers that are mechanism-centric, rather than based on individual genes or alterations, and to uncover functional interactions and valuable therapeutic targets. As a proof of concept, we applied TR-2-PATH to metastatic castration-resistant prostate cancer (mCRPC). The network-mining step addressed a knowledge gap concerning multicollinearity among upstream transcriptional regulators (TRs) and identified TR groups that collaborate to regulate downstream pathways. Interrogating this network with signatures of resistance to Enzalutamide, a second-generation androgen receptor inhibitor commonly administered to patients with mCRPC, identified a collaboration between the NME2 TR program and MYC molecular pathways as a biomarker of primary resistance to Enzalutamide. In vitro and in vivo experimental validation confirmed the cooperation of these mechanisms and demonstrated that targeting them jointly not only prevents resistance to Enzalutamide but also re-sensitizes Enzalutamide-resistant tumors in vivo, allowing Enzalutamide to work longer. We propose MYC and NME2 as markers to identify patients at risk of Enzalutamide resistance and as effective therapeutic targets for patients in whom Enzalutamide has failed. Our algorithm is generalizable and could be applied to a multitude of biologically and clinically important questions, including (but not limited to) therapeutic resistance, metastatic progression, and tumor heterogeneity and plasticity across cancer types and in other diseases. TR-2-PATH was published in Nature Communications in 2024.
We are now expanding this algorithm to include regulatory relationships with long non-coding RNAs.

14:45-15:00
Acquisition and persistence of extended-spectrum beta-lactamase (ESBL) and carbapenem-resistant (CRE) Escherichia coli carriage in hospitalized Kenyan children
Confirmed Presenter: Caroline Tigoi, KEMRI/Wellcome Trust, Kenya

Format: In person


Authors List:

  • Caroline Tigoi, KEMRI/Wellcome Trust, Kenya
  • James Berkley, KEMRI/Wellcome Trust, Kenya
  • Nicole Stoesser, Nuffield Department of Medicine, Oxford University, Oxford, United Kingdom

Presentation Overview:

Introduction: Antimicrobial resistance (AMR) and the lack of new drugs pose a serious public health threat. Carriage of antimicrobial-resistant bacteria may be an important driver of inpatient and post-discharge mortality risk in low- and middle-income countries (LMICs), even when treatment guidelines are followed. ESBL and CRE are important proxies for broad multi-class resistance, which spreads on mobile genetic elements that promote intra- and inter-species horizontal gene transfer in hospitals and communities. We hypothesise that intestinal colonisation and carriage are a possible means of AMR transmission and a precursor to invasive disease.
Methods: This prospective cohort study enrolled children admitted to three Kenyan hospitals, who were followed for six months after discharge, together with well community controls. Detailed demographic, clinical, and antimicrobial use data were collected along with blood and rectal swab cultures. We carried out short- and long-read whole-genome sequencing of 486 E. coli isolates to detect AMR and virulence genes and to assess genetic relatedness at the gene, mobile genetic element, and strain levels through core-genome phylogeny.
Results: Of the 804 inpatient participants, 291 (36%) carried ESBL-producing Enterobacterales (ESBL-E) at admission, 447/630 (71%) at discharge, 199/455 (44%) at day 45, 152/457 (33%) at day 90 and 120/452 (27%) at day 180 post-discharge. The baseline ESBL-E carriage prevalence among healthy community participants was 65/404 (16%). Acquisition of ESBL carriage in hospital was associated with prior hospitalization, prior antibiotic use, prolonged hospital stay and the classes of antimicrobials used; it was also associated with post-discharge death or readmission after adjusting for potential confounders. Carbapenemase-producing Enterobacterales (CPE) carriage was seen at the Nairobi site in up to 26 (6%) participants, and in 4 (8%) during readmission. E. coli isolates were diverse across pathotypes: 12 of the 14 E. coli phylogroups identified globally were present, including those associated with invasive disease (D3, B1, B2 and D1). Sequence types (STs) linked to invasive disease, such as ST131, ST410 and ST38, were also identified, and STs were concordant between invasive and carriage isolates. Multiple virulence genes and AMR genes spanning all antibiotic classes were identified; the leading ESBL gene was blaCTX-M-15 and the leading carbapenemase gene was blaNDM-5.
Conclusions: Significant AMR acquisition occurred before and during hospitalisation, and carriage took more than six months to return to community levels. Carriage and invasive STs were similar. To inform infection control, further genomic studies and antimicrobial trials should monitor changes across the whole microbiome and estimate the invasiveness of these STs and phylogroups.
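The carriage percentages in the Results can be re-derived from the reported counts. The sketch below does so and, as an illustrative addition not reported in the abstract, attaches a 95% Wilson score interval to each proportion; the time-point labels follow the text, but the code is not the authors' analysis.

```python
import math

# (carriers, participants tested) as reported in the Results.
CARRIAGE = {
    "admission": (291, 804),
    "discharge": (447, 630),
    "day 45": (199, 455),
    "day 90": (152, 457),
    "day 180": (120, 452),
    "community baseline": (65, 404),
}


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


for label, (carriers, tested) in CARRIAGE.items():
    low, high = wilson_interval(carriers, tested)
    print(f"{label}: {100 * carriers / tested:.0f}% (95% CI {100 * low:.1f}-{100 * high:.1f}%)")
```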

15:00-15:15
Leveraging Data Balancing and Chemical Encoding Strategies for Robust AI-Based Drug Discovery Pipeline
Confirmed Presenter: Ons Masmoudi, Laboratory of Molecular Epidemiology and Experimental Pathology, Institut Pasteur de Tunis, Tunisia

Format: In Person


Authors List:

  • Ons Masmoudi, Laboratory of Molecular Epidemiology and Experimental Pathology, Institut Pasteur de Tunis, Tunisia
  • Afef Abdelkrim, Research Laboratory Smart Electricity & ICT, National Engineering School of Carthage, University of Carthage, Tunisia
  • Emna Harigua-Souiai, Laboratory of Molecular Epidemiology and Experimental Pathology, Institut Pasteur de Tunis, Tunisia

Presentation Overview:

Artificial intelligence (AI) has emerged as a revolutionary approach in drug discovery, driven by the increasing availability of large datasets for training AI models to predict the properties and potential biological activities of chemical compounds. An AI-driven framework essentially consists of three main components: the dataset, the combination of chemical encoding system and model, and the prediction task. The present work introduces an AI-based ligand-based drug design approach that optimizes each component of such a pipeline to provide robust tools for predicting the activity of chemical compounds against various diseases.
In this study, we investigated the impact of class imbalance on the performance of various classifiers in predicting the biological activity of chemical compounds. We trained two machine learning models, four graph-based models, and two pre-trained models on highly imbalanced bioassay datasets. To address the class imbalance, we first applied two oversampling methods, random oversampling (ROS) and SMOTE, and two undersampling methods, random undersampling (RUS) and NearMiss. We then proposed a novel strategy, K-Ratio Undersampling, which builds on RUS to create three specific imbalance ratios (1:50, 1:25, and 1:10) for each dataset. The impact of these ratios on model performance was evaluated using F1 scores. To assess the robustness of our models, we conducted external validation on unseen data. Finally, we analysed the content of each dataset to better understand the factors behind the models' misclassifications.
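K-Ratio Undersampling is described only at a high level. Assuming that actives are the minority class and that a ratio of 1:k means k inactives are retained per active, a RUS-based sketch might look like the following; the function names and the fixed random seed are hypothetical choices, and F1 uses the standard precision/recall definition.

```python
import random


def k_ratio_undersample(labels: list[int], k: int, seed: int = 0) -> list[int]:
    """Return sorted indices that keep every active (label 1) and a random
    subset of inactives (label 0) so that actives:inactives is at most 1:k."""
    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]
    n_inactive = min(len(inactives), k * len(actives))
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    return sorted(actives + rng.sample(inactives, n_inactive))


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 = harmonic mean of precision and recall for the active (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy dataset: 5 actives among 505 compounds (about 1:100), resampled at the three ratios.
labels = [1] * 5 + [0] * 500
for k in (50, 25, 10):
    kept = k_ratio_undersample(labels, k)
    print(f"1:{k} ->", len(kept), "compounds,", sum(labels[i] for i in kept), "actives")
```

Resampling should be applied to the training split only, so that validation and external test sets keep their natural class ratios.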
Across all simulations, comparison of the classical resampling techniques revealed that RUS outperforms ROS across various evaluation metrics, supporting our hypothesis that reducing majority class instances through undersampling improves model performance. Investigating the impact of the different imbalance ratios on the ML and DL models, we showed that moderate imbalance ratios (1:25 to 1:10) significantly enhanced model performance, achieving higher F1 scores than previous results. The top-performing model for each dataset was then optimized through hyperparameter tuning.
External validation confirmed that the 10-RUS configuration achieved the best balance between true positive and false positive rates, although no single model performed optimally on all datasets. The HIV dataset was particularly challenging: analysis of the similarity between active and inactive compounds through a chemical space network showed that high similarity between the two classes reduced predictive accuracy.
Our findings highlight the importance of optimizing both the chemical content of the data and the class imbalance to improve model performance in predicting the biological activity of chemical compounds.

15:15-15:30
Autophosphorylation and Ca2+-binding alter the conformational landscape of Plasmodium falciparum Ca2+ dependent protein kinase 1, impacting stability and ligand binding: contextualising PTM using computational modelling
Confirmed Presenter: Charmaine Chido Matimba, University of the Witwatersrand, Zimbabwe

Format: In Person


Authors List:

  • Charmaine Chido Matimba, University of the Witwatersrand, Zimbabwe
  • Ikechukwu Achilonu, University of the Witwatersrand, South Africa

Presentation Overview:

Malaria remains a significant global public health issue, causing over 600,000 deaths annually. One promising research direction is disrupting crucial pathways, such as the cell signalling mechanisms that enable malaria parasites to grow and survive. By targeting these pathways, scientists aim to develop a new generation of antimalarial drugs that can effectively address drug resistance and improve treatment options for vulnerable populations worldwide. Post-translational modifications (PTMs), such as phosphorylation, can significantly change protein structures. This poses challenges for computational approaches like computer-aided drug design (CADD), where even slight structural changes can affect ligand binding and functionality. This study investigates how autophosphorylation and Ca2+ binding influence the conformational dynamics of PfCDPK1 using computational modelling, mainly molecular dynamics simulations (MDS) and high-throughput virtual screening (HTVS). By analysing changes in protein dynamics, the research may reveal important insights into the druggability of protein kinases, facilitating the design of more effective drugs. The results showed notable variations in the dynamic behaviour of the four systems, with or without the ligand (BKI-1294), based on metrics such as Cα-RMSD, Cα-RMSF, radius of gyration (RoG), and ligand properties. The findings suggest that Ca2+ binding alone induces conformational changes in the protein over time, whereas Ca2+ binding combined with autophosphorylation enhances structural stability. Phosphorylation alone, by contrast, leads to significant structural deviations, with statistically significant differences observed among all systems. Phosphorylation, particularly autophosphorylation, and Ca2+ binding may therefore reshape the conformational landscape of CDPK1. Such structural changes could influence the enzyme's function, including substrate binding and allosteric inhibition.
Ultimately, this study elucidated how these modifications affect the structure and function of PfCDPK1, providing insights into the molecular mechanisms that regulate enzyme activity and calcium homeostasis in Plasmodium falciparum.
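The three structural metrics mentioned above have simple definitions. The NumPy sketch below assumes trajectory frames have already been superposed on a reference (for example by a least-squares fit in an MD analysis package), a step not shown here, and uses uniform masses for the radius of gyration unless masses are supplied. The coordinates are toy values, not simulation output.

```python
from typing import Optional

import numpy as np


def rmsd(frame: np.ndarray, reference: np.ndarray) -> float:
    """Cα-RMSD between two already-superposed (N, 3) coordinate arrays."""
    return float(np.sqrt(np.mean(np.sum((frame - reference) ** 2, axis=1))))


def rmsf(trajectory: np.ndarray) -> np.ndarray:
    """Per-atom RMSF over a superposed (T, N, 3) trajectory, about the mean structure."""
    mean_structure = trajectory.mean(axis=0)
    squared_dev = np.sum((trajectory - mean_structure) ** 2, axis=2)  # shape (T, N)
    return np.sqrt(squared_dev.mean(axis=0))


def radius_of_gyration(coords: np.ndarray, masses: Optional[np.ndarray] = None) -> float:
    """Mass-weighted radius of gyration of an (N, 3) structure (uniform masses by default)."""
    if masses is None:
        masses = np.ones(len(coords))
    centre = np.average(coords, axis=0, weights=masses)
    squared_dist = np.sum((coords - centre) ** 2, axis=1)
    return float(np.sqrt(np.average(squared_dist, weights=masses)))


# Toy two-frame trajectory of three "Cα atoms", used as-is (no superposition step).
frame0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
frame1 = frame0 + np.array([0.0, 1.0, 0.0])
trajectory = np.stack([frame0, frame1])
print(rmsd(frame1, frame0), rmsf(trajectory), radius_of_gyration(frame0))
```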

15:30-15:45
Automated Molecular Docking (AMD) Platform for Ligand-Protein Interaction Studies (LPIS) in Cancers
Format: In person


Authors List:

  • Rihab Mahjoub, Pasteur Institute of Tunis, Tunisia / Institut de Biotechnologie de Sidi Thabet, Université de la Manouba, Tunisia
  • Ghada Mahjoub, Pasteur Institute of Tunis, Tunisia / Institut de Biotechnologie de Sidi Thabet, Université de la Manouba, Tunisia
  • Oussema Khamessi, Pasteur Institute of Tunis, Tunisia / Institut de Biotechnologie de Sidi Thabet, Université de la Manouba, Tunisia
  • Kais Ghedira, Pasteur Institute of Tunis, Tunisia

Presentation Overview:

Cancers pose significant challenges due to their complex molecular mechanisms and resistance to conventional therapies. Bioinformatics techniques such as molecular docking and molecular dynamics offer an unprecedented opportunity to accelerate the identification of cancer therapeutic targets and drug design. By developing explainable and integrated platforms, it is possible to meet the growing need for innovation in oncology while reducing the time and costs associated with developing new treatments. Herein, we present an integrated and automated molecular docking platform designed to study ligand-protein interactions in cancers. The platform streamlines the drug discovery process, using Python scripts to automate the preparation of protein and ligand structures, perform docking with the Vina library, and visualize the results. The workflow includes OncoligandDB, a structured database that centralizes information on more than 100 anticancer ligands, organized in tables classified by cancer type. OncoligandDB includes details such as the commercial name of each product, year of production, SMILES structure, and direct download links to PDB-format files, facilitating the docking process. The platform also features DockSmart, an intuitive web interface with a direct link to OncoligandDB, enabling users to easily access and use the PDB files for docking simulations. DockSmart generates affinity scores and RMSD values for analysis, offering a comprehensive tool for researchers. DockSmart's results were validated by comparing docking outcomes with the established tool SeamDock, demonstrating comparable or superior performance in terms of docking scores and computational efficiency.
DockSmart's modular design, user-friendliness, and automation significantly reduce manual intervention, improve reproducibility, and accelerate the discovery of potential therapeutic candidates. Future work includes integrating molecular dynamics simulations and advanced AI tools such as graph neural networks (GNNs), a class of deep learning models that can be used to identify interaction points between protein and ligand graphs. In its current version, DockSmart highlights the potential of bioinformatics to advance cancer research and drug development, offering a powerful tool for researchers in oncology and bioinformatics.
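OncoligandDB is described as tables of anticancer ligands classified by cancer type, with commercial name, production year, SMILES and PDB download links. Its actual schema is not given; the sketch below is one hypothetical way to model those fields with Python's built-in sqlite3 module, using a single table with a cancer_type column rather than one table per cancer type. All row values are placeholders, not real database entries.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS ligand (
    id              INTEGER PRIMARY KEY,
    cancer_type     TEXT NOT NULL,
    commercial_name TEXT NOT NULL,
    production_year INTEGER,
    smiles          TEXT NOT NULL,
    pdb_url         TEXT          -- link to a downloadable PDB-format file
);
CREATE INDEX IF NOT EXISTS idx_ligand_cancer ON ligand (cancer_type);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and ensure the hypothetical schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_ligand(conn: sqlite3.Connection, cancer_type: str, name: str,
               year: Optional[int], smiles: str, pdb_url: Optional[str]) -> None:
    """Insert one ligand record using a parameterised query."""
    conn.execute(
        "INSERT INTO ligand (cancer_type, commercial_name, production_year, smiles, pdb_url) "
        "VALUES (?, ?, ?, ?, ?)",
        (cancer_type, name, year, smiles, pdb_url),
    )


def ligands_for(conn: sqlite3.Connection, cancer_type: str) -> list[tuple]:
    """All ligands recorded for one cancer type, e.g. to queue them for docking."""
    return conn.execute(
        "SELECT commercial_name, smiles, pdb_url FROM ligand "
        "WHERE cancer_type = ? ORDER BY commercial_name",
        (cancer_type,),
    ).fetchall()


# Placeholder rows: names, SMILES strings and URLs are hypothetical.
conn = open_db()
add_ligand(conn, "breast", "DrugB", 2010, "CCO", "https://example.org/drugB.pdb")
add_ligand(conn, "breast", "DrugA", 2005, "CCN", None)
add_ligand(conn, "lung", "DrugC", 2015, "CCC", None)
print(ligands_for(conn, "breast"))
```

A single indexed table keeps cross-cancer queries simple; one table per cancer type, as the abstract's wording may imply, would also work.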

15:45-16:00
SAHMI and PRISM identify the tumor microbiome and link it to cellular programs and immunity in cancer
Format: In person


Authors List:

  • Subhajyoti De, Rutgers Cancer Institute, Rutgers University, United States
  • Bassel Ghaddar, Rutgers Cancer Institute, Rutgers University, United States
  • Martin Blaser, Rutgers University, United States

Presentation Overview:

Microorganisms are detected in multiple cancer types, including in putatively sterile organs, but the contexts in which they influence oncogenesis or anti-tumor responses in humans remain unclear. Despite increasing research into the human microbiome, a number of basic questions remain unanswered, including questions about its size, distribution, and presence in various human tissues, exemplified by recent controversies around the fetal microbiome and the cancer microbiome. We developed single-cell analysis of host-microbiome interactions (SAHMI), a computational pipeline to recover and denoise microbial signals from single-cell sequencing of host tissues. More recently, we developed a companion framework, PRISM, a computational approach for precise microorganism identification and decontamination from low-biomass sequencing data. Using these resources, we identified rich microbiomes in gastrointestinal tract tumors and identified bacteria in a subset of pancreatic tumors that are associated with altered glycoproteomes, more extensive smoking histories, and higher tumor recurrence risk. We found relatively sparse microbes in other cancer types that arise in more sterile environments, which we show may reflect differing sequencing parameters. Overall, these resources provide applicable guidelines that do not replace gold-standard controls but enable higher-confidence analyses and reveal tumor-associated microorganisms with potential molecular and clinical significance.

References:
1. Ghaddar B et al. Tumor microbiome links cellular programs and immunity in pancreatic cancer. Cancer Cell. 2022 Oct 10;40(10):1240-1253.e5. PMID: 36220074.
2. Ghaddar B, Blaser MJ, De S. Denoising sparse microbial signals from single-cell sequencing of mammalian host tissues. Nat Comput Sci. 2023 Sep;3(9):741-747. PMID: 37946872.
3. Ghaddar B, Blaser MJ, De S. Revisiting the cancer microbiome using PRISM. (submitted)

16:00-16:15
The Gene catalogue and functional analysis of the gut microbiome of lions in Etosha National Park
Confirmed Presenter: Carl Belger, School of Animal, Plant and Environmental Sciences, University of the Witwatersrand, Johannesburg, South Africa

Format: In Person


Authors List:

  • Carl Belger, School of Animal, Plant and Environmental Sciences, University of the Witwatersrand, Johannesburg, South Africa
  • Robyn Hetem, School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  • Scott Hazelhurst, Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
  • Dylan Maghini, Department of Medicine (Hematology), Stanford University, Stanford, CA, USA
  • Jakob Wirbel, Department of Medicine (Hematology), Stanford University, Stanford, CA, USA
  • Ansia van Coller, South African Medical Research Council SAMRC, South Africa
  • Nadia Carstens, South African Medical Research Council SAMRC, South Africa

Presentation Overview:

The gut microbiome is important for the health of all animals, yet very little is known about the microbiome of large carnivores, and of lions in particular. This study presents the first comprehensive microbiome classification of African lions (Panthera leo melanochaita). We used shotgun metagenomic data (Illumina short reads) from DNA extracted from faecal samples of 20 lions from Etosha National Park, Namibia. Three of the lions were sampled twice, in different seasons. In addition, 10 of the samples were sent for long-read sequencing (Oxford Nanopore). Our findings illuminate potential connections between gut microbiome composition and social structure, diet, and pack hunting in carnivores, with potential implications for wildlife conservation and veterinary medicine.

We discovered distinct microbial profiles in the African lion gut, dominated by the genera Bacteroides and Phocaeicola. In particular, we note similar abundances of Bacteroides in other pack-hunting carnivores such as black-backed jackals, wolves and dholes. Solitary hunters such as cheetahs, on the other hand, have a relatively low abundance of Bacteroides. The high abundance of this genus may result from frequent interaction (and therefore bacterial transmission) among pack hunters compared with solitary carnivores. Alternatively, Bacteroides abundance could be attributed to differences in diet: solitary hunters consume the most nutritious portions of prey immediately, while pack hunters usually distribute resources based on social hierarchies.

Moreover, we linked pregnancy and inflammation to the gut microbiome of female lions via the genus Fusobacterium. This genus is highly abundant in postnatal pigs and is also linked to gut inflammation in humans and pigs, suggesting that lionesses may experience similar gut inflammation during and after pregnancy.

For comparison, we analyzed three additional samples from Asiatic lions (Panthera leo leo) collected in previous research in India and found that the two subspecies had similar abundances of bacterial phyla but differed in bacterial genera and species. We attribute the similarities in bacterial phyla to common evolutionary ancestry, and the differences in bacterial genera to allopatric separation causing minor changes in bacterial composition over time.

Finally, a large proportion of DNA in the lion gut was unclassified, representing new species of microorganisms not present in current databases. We reconstructed 272 metagenome-assembled genomes (MAGs), the majority of which represent new species that will expand current knowledge. The identification of novel microbial species highlights the importance of expanding microbial databases and the need for further research into host-microbe interactions in wildlife conservation contexts. In future research, we plan to leverage long-read data to supplement databases and improve microbial classification.

16:15-16:30
Metagenomic profiling of the gut microbiome in African populations: The AWI-Gen 2 Microbiome Study
Confirmed Presenter: Luicer Anne Ingasia Olubayo, University of the Witwatersrand, South Africa

Format: In Person


Authors List:

  • Dylan G. Maghini, Stanford University, United States
  • Ovokeraye H. Oduaran, University of the Witwatersrand, South Africa
  • Luicer Anne Ingasia Olubayo, University of the Witwatersrand, South Africa
  • Jane A. Cook, Stanford University, United States
  • Natalie Smyth, University of the Witwatersrand, South Africa
  • Carl W. Belger, University of the Witwatersrand, South Africa
  • Furahini Tluway, University of the Witwatersrand, South Africa
  • Michelé Ramsay, University of the Witwatersrand, South Africa
  • Jakob Wirbel, Stanford University, United States
  • Ami Bhatt, Stanford University, United States
  • Scott Hazelhurst, University of the Witwatersrand, South Africa

Presentation Overview:

Background: Despite its critical role in human health, the gut microbiome remains understudied in underrepresented populations, particularly in low- and middle-income countries. Large-scale gut microbiome research has historically focused on high-income, industrialized populations, limiting our understanding of microbial diversity, adaptation, and health implications across different environmental and lifestyle contexts. The AWI-Gen 2 Microbiome Project addresses this gap by investigating how geography, industrialization, lifestyle, and health status shape gut microbiome diversity in six study sites across Burkina Faso, Ghana, Kenya, and South Africa.

Methods: A total of 1,801 women aged 41–84 years were enrolled from rural, semi-rural, and urban communities spanning distinct environmental and socioeconomic settings. Shotgun metagenomic sequencing was performed to generate high-resolution taxonomic and functional profiles of gut microbial communities. Metagenome-assembled genomes (MAGs) were reconstructed to expand microbial reference catalogues and uncover novel species. Statistical analyses assessed associations between microbiome composition, dietary patterns, antibiotic use, and disease states, including HIV infection.

Results: Geography was the primary driver of microbiome composition, with distinct microbial transitions observed along an industrialization gradient. Rural populations exhibited higher microbial diversity, with a notable enrichment of Treponema species, while urban populations showed reduced Treponema and Cryptobacteroides abundance alongside a relative increase in Bifidobacterium species. Nairobi’s informal settlements exhibited a unique hybrid microbiome signature, reflecting a mix of rural and urban microbial traits, challenging conventional rural–urban microbiome models. The study significantly expanded global microbial reference datasets, identifying 1,005 novel bacterial species and 40,135 previously uncharacterized viral genomes. The absence of Treponema succinifaciens in urban populations correlated with higher antibiotic exposure and lower dietary fiber intake, suggesting that antimicrobial-driven microbiome shifts may be occurring in transitioning populations. Additionally, a distinct HIV-associated microbiome signature was characterized, featuring taxa not previously linked to HIV in high-income cohorts, including Dysosmobacter welbionis and Enterocloster species. These findings underscore the need for population-specific microbiome research to better understand host-microbiome interactions in infectious diseases.
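Statements such as "higher microbial diversity" usually refer to an alpha-diversity index computed per sample. The abstract does not say which index was used; this minimal sketch assumes the Shannon index, H' = -Σ p_i ln p_i over taxon proportions p_i, and uses made-up read counts purely to show how an even community scores higher than one dominated by a single taxon.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i), skipping taxa with zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per taxon for two toy samples (not study data).
even_sample = [250, 250, 250, 250]
skewed_sample = [970, 10, 10, 10, 0]

if __name__ == "__main__":
    print(f"even:   H' = {shannon_index(even_sample):.3f}")  # equals ln(4)
    print(f"skewed: H' = {shannon_index(skewed_sample):.3f}")
```

In a real analysis the index would be computed for each participant and compared between groups with an appropriate statistical test; that step is omitted here.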

Conclusion: This study provides critical insights into the diversity and adaptation of the gut microbiome in African populations, challenging existing models of industrialization-driven microbial shifts. By leveraging shotgun metagenomics, this work contributes to a more representative and equitable global microbiome atlas, expanding the known diversity of bacterial and viral species. These findings highlight the need for inclusive microbiome research that reflects diverse global populations and informs precision medicine approaches. Beyond advancing microbiome science, this study prioritizes community engagement, participant education, and the dissemination of findings. Future work will integrate participant feedback and explore the implications of microbiome shifts for public health. Ongoing research will investigate longitudinal microbiome dynamics and microbiome-host interactions, while planned follow-up analyses will assess microbiome stability over time in previously sampled participants.

16:45-17:45
Invited Presentation: RNA structure prediction: existing approaches, strengths, limitations, and computational challenges
Confirmed Presenter: Aïda Ouangraoua, University of Sherbrooke, Canada

Format: In Person


Authors List:

  • Aïda Ouangraoua, University of Sherbrooke, Canada

Presentation Overview:

RNA structure prediction is an important and challenging problem in computational biology, because the three-dimensional structure of an RNA molecule is intrinsically related to its function. Accurate predictions help us understand gene expression regulation and disease mechanisms, and support the development of RNA-based therapeutics. This presentation will focus on the algorithms that have been developed to predict RNA secondary and tertiary structures from sequence data. We will discuss classical approaches, such as dynamic programming and energy minimization, as well as more recent approaches that rely on machine learning and deep learning models. We will also discuss the integration of multiple data sources, such as experimental structures, to enhance prediction accuracy. Finally, we will explore the strengths, limitations, and computational challenges of each category of methods.
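The classical dynamic-programming idea can be illustrated with the Nussinov algorithm, the simplest secondary-structure DP. It maximizes the number of nested base pairs rather than minimizing free energy; energy-minimization methods such as the Zuker algorithm keep a similar recursive structure but score loops with nearest-neighbour free-energy parameters. In this sketch the allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop length of 3 are assumptions.

```python
# Allowed base pairs: Watson-Crick (A-U, G-C) plus G-U wobble (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max number of nested pairs, dot-bracket structure) for an RNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by a hairpin.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = max pairs in seq[i..j]; spans too short for a hairpin stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:  # interval too short to contain any pair
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Each cell depends only on shorter subsequences, so filling the table by increasing span costs O(n³) time and O(n²) memory, which illustrates why long RNAs are computationally demanding even before tertiary structure is considered.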