Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


YBS 2021 | May 23, 2021 | Virtual Event

Detailed Agenda

Schedule subject to change
All times listed are in CAT
Tuesday, June 8th
Keynote: Studying African Genetic Diversity and Applications in Health
  • Nicola Mulder, University of Cape Town, South Africa

Presentation Overview:

African populations are known to harbour the greatest genetic diversity on earth. African genomes have been shaped by many factors, including infectious diseases, diet and exposure to unique environmental conditions. Studying African genetic diversity has the potential to improve our understanding of the genetic basis of many diseases, not only in Africa but also in global populations. However, data from African populations are still relatively sparse, and the tools and skills for analyzing the data have been lacking on the continent. The H3Africa consortium, established more than 8 years ago, has changed the landscape of African genomics through the generation of large datasets to study the genetic and environmental basis of diseases. H3ABioNet, the consortium’s pan-African bioinformatics network, has helped to build the capacity and tools to enable analysis of the data on the continent. This talk will discuss some of the challenges in studying African genomics, how these are being overcome, and some of the new tools being developed to interrogate the data. Analysis of an H3Africa dataset of whole genome sequences will be presented, along with a discussion of the potential health applications.

Pathogen COSI Speaker
  • Placide Mbala, METABIOTA, Democratic Republic of the Congo
A novel database dedicated to drug discovery research against infectious diseases: Leishmaniases as a paradigm
  • Emna Harigua-Souiai, Institut Pasteur de Tunis, Tunisia
  • Yosser Zina Abdelkrim, Institut Pasteur de Tunis, Tunisia
  • Rafeh Oualha, Institut Pasteur de Tunis, Tunisia
  • Sara Hamdi, Institut Pasteur de Tunis, Tunisia
  • Souiai Oussama, Institut Pasteur de Tunis, Tunisia
  • Ikram Guizani, Institut Pasteur de Tunis, Tunisia

Presentation Overview:

Drug discovery research is a complex field with a high attrition rate, and it suffers from limited data availability. The low-data problem is even worse for Neglected Tropical Diseases (NTDs). Herein, we focus on Leishmaniases.
We conducted an extensive literature search to retrieve data on molecules with validated effects against Leishmania species. Data on biochemical and biological validation, parasite species, and the pharmacokinetic and pharmacodynamic properties of the molecules were included, and chemical structures were encoded as SMILES. From 753 scientific publications, we collected data on 1145 molecules that have reached different stages of validation (in vitro, in cellulo, in vivo) against one or more Leishmania species, with either unknown or known molecular target(s). Using RDKit, we then calculated a range of molecular descriptors and fingerprints for the anti-Leishmania molecules and used them to train machine learning (ML) algorithms to predict further anti-Leishmania effectors. The best performers were Random Forests and Support Vector Machines, with 97% and 95% accuracy, respectively. We used these ML algorithms to predict which FDA-approved drugs are most likely to have anti-Leishmania effects. Fourteen of these molecules are currently being tested for their effect on Leishmania promastigote viability. In addition, a database was implemented to store this dataset and future datasets on other infectious diseases. A web application is under development to provide access to this novel database.
The anti-Leishmania dataset is of high interest to the scientific community focused on drug discovery against Leishmaniases and other infectious diseases. It will be open access for wider outreach and impact.
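The modelling step described above (fingerprints computed from SMILES, then Random Forest and SVM classifiers used to rank candidate drugs) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' pipeline: the fingerprints are synthetic random bit vectors standing in for RDKit output, the helper `synthetic_fingerprints` and all hyperparameters are hypothetical, and the accuracies it produces say nothing about the 97% and 95% reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_BITS = 256  # hypothetical fingerprint length

def synthetic_fingerprints(n, active):
    """HYPOTHETICAL stand-in for RDKit fingerprints: sparse random bit vectors,
    where 'active' molecules share enriched bits in the first 20 positions."""
    X = (rng.random((n, N_BITS)) < 0.1).astype(int)
    if active:
        X[:, :20] = (rng.random((n, 20)) < 0.7).astype(int)
    return X

# 200 "active" and 200 "inactive" molecules with binary labels.
X = np.vstack([synthetic_fingerprints(200, True), synthetic_fingerprints(200, False)])
y = np.array([1] * 200 + [0] * 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": SVC(kernel="rbf", probability=True, random_state=0),
}
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Score a hypothetical candidate library (e.g. approved drugs) and rank it by
# the Random Forest's predicted probability of activity, highest first.
candidates = synthetic_fingerprints(10, False)
scores = models["random_forest"].predict_proba(candidates)[:, 1]
ranking = np.argsort(scores)[::-1]
print(accuracies, ranking[:3])
```

Ranking by predicted probability rather than by hard class labels is what lets a screen of approved drugs be cut down to a short list, such as the fourteen molecules now under experimental validation.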

Within-host microevolution of Streptococcus pneumoniae during natural colonisation
  • Chrispin Chaguza, Wellcome Sanger Institute, United Kingdom
  • Madikay Senghore, MRC Unit The Gambia at LSHTM, Gambia
  • Ebrima Bojang, MRC Unit The Gambia at LSHTM, Gambia
  • Rebecca Gladstone, Wellcome Sanger Institute, United Kingdom
  • Stephanie Lo, Wellcome Sanger Institute, United Kingdom
  • Peggy-Estelle Tientcheu, MRC Unit The Gambia at LSHTM, Gambia
  • Rowan Bancroft, MRC Unit The Gambia at LSHTM, Gambia
  • Archibald Worwui, MRC Unit The Gambia at LSHTM, Gambia
  • Ebenezer Foster-Nyarko, MRC Unit The Gambia at LSHTM, Gambia
  • Fatima Ceesay, MRC Unit The Gambia at LSHTM, Gambia
  • Catherine Okoi, MRC Unit The Gambia at LSHTM, Gambia
  • Lesley McGee, Centers for Disease Control and Prevention, United States
  • Keith Klugman, Emory University, United States
  • Robert Breiman, Emory University, United States
  • Michael Barer, University of Leicester, United Kingdom
  • Richard Adegbola, RAMBICON Immunisation & Global Health Consulting, Nigeria
  • Martin Antonio, MRC Unit The Gambia at LSHTM, Gambia
  • Stephen Bentley, Wellcome Sanger Institute, United Kingdom
  • Brenda Kwambana-Adams, MRC Unit The Gambia at LSHTM, United Kingdom

Presentation Overview:

Genomic evolution, transmission and pathogenesis of Streptococcus pneumoniae, an opportunistic human-adapted pathogen, are driven principally by nasopharyngeal carriage. However, little is known about genomic changes during natural colonisation. Here, we use whole-genome sequencing to investigate within-host microevolution of naturally carried pneumococci in ninety-eight infants sampled intensively and sequentially from birth until twelve months of age in a high-carriage African setting. We show that neutral evolution, together with nucleotide substitution rates up to forty-fold faster than those observed over longer timescales in S. pneumoniae and other bacteria, drives high within-host pneumococcal genetic diversity. Some highly divergent co-existing strain variants emerge during colonisation episodes through real-time intra-host homologous recombination, while the rest are co-transmitted or acquired independently during multiple colonisation episodes. Genic and intergenic parallel evolution occurs particularly in antibiotic resistance, immune evasion and epithelial adhesion genes. Our findings suggest that within-host microevolution is rapid and adaptive during natural colonisation.
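The forty-fold comparison above rests on expressing substitution rates per site per year. The basic arithmetic can be sketched as below; all numbers are hypothetical, the function name is illustrative, and the simple SNPs-over-(sites × time) ratio is an assumption made for clarity (the abstract does not describe the study's rate-estimation method, and molecular-clock models are commonly used for this in practice).

```python
def substitution_rate(snps, sites, years):
    """Substitutions per site per year between an earlier isolate and a later
    isolate assumed to descend directly from it (simple ratio estimate)."""
    if sites <= 0 or years <= 0:
        raise ValueError("sites and elapsed years must be positive")
    return snps / (sites * years)

# HYPOTHETICAL values: 10 SNPs between isolates sampled 0.5 years apart from the
# same infant, over a 2,000,000 bp alignment.
within_host = substitution_rate(snps=10, sites=2_000_000, years=0.5)
long_term = 1.0e-6  # HYPOTHETICAL reference rate over longer timescales
fold_change = within_host / long_term
print(f"{within_host:.1e} substitutions/site/year, {fold_change:.0f}-fold the reference")
```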

Using Deep Learning in Lyme Disease Diagnosis
  • Tejaswi Koduru, Thomas Jefferson High School for Science and Technology, United States

Presentation Overview:

Untreated Lyme disease can lead to neurological, cardiac, and dermatological complications. Rapid identification of the erythema migrans (EM) rash, a characteristic symptom of Lyme disease, is therefore crucial for early diagnosis and treatment. In this study, we use deep learning frameworks, including TensorFlow and Keras, to build deep convolutional neural networks (DCNNs) that detect acute Lyme disease from images of erythema migrans. This study uses a custom database of EM images of varying quality to train a DCNN that classifies images as EM rashes vs non-EM rashes. Images from publicly available sources were mined to create an initial database. Duplicate images were then removed automatically, followed by a thorough examination of all images by a clinician. The resulting database was combined with images of confounding rashes and normal skin, giving a total of 683 images. This database was then used to create a DCNN with 93% accuracy in classifying rash images as EM vs non-EM. Finally, the model was converted into a web and mobile application to enable rapid detection of EM rashes by both patients and clinicians. This tool could be used for patient prescreening prior to treatment and lead to a lower mortality rate from Lyme disease.
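The abstract does not say how duplicate images were detected. One common option, assumed here purely for illustration, is perceptual hashing: each image is reduced to a small grid of block averages, each block is compared with the grid mean, and images whose hashes differ in only a few bits are treated as duplicates. The NumPy sketch below works on grayscale arrays; `average_hash`, `remove_near_duplicates` and the `max_distance=5` threshold are hypothetical choices, not the study's method.

```python
import numpy as np

def average_hash(image, size=8):
    """Average hash of a 2-D grayscale image: block-average the image down to a
    size x size grid, then set a bit for every block brighter than the grid mean."""
    h, w = image.shape
    if h < size or w < size:
        raise ValueError("image is smaller than the hash grid")
    # Crop so both dimensions divide evenly by `size`, then average each block.
    cropped = image[: h - h % size, : w - w % size].astype(float)
    bh, bw = cropped.shape[0] // size, cropped.shape[1] // size
    small = cropped.reshape(size, bh, size, bw).mean(axis=(1, 3))
    return (small > small.mean()).ravel()

def remove_near_duplicates(images, max_distance=5):
    """Return indices of images to keep: an image is dropped if its hash is
    within `max_distance` differing bits of an image already kept."""
    kept, kept_hashes = [], []
    for idx, img in enumerate(images):
        img_hash = average_hash(img)
        if all(np.count_nonzero(img_hash != other) > max_distance
               for other in kept_hashes):
            kept.append(idx)
            kept_hashes.append(img_hash)
    return kept

rng = np.random.default_rng(0)
original = rng.integers(0, 200, size=(64, 64))
brighter_copy = original + 10  # same picture, uniformly brightened
unrelated = rng.integers(0, 200, size=(64, 64))
print(remove_near_duplicates([original, brighter_copy, unrelated]))
```

Because the hash records only which blocks are brighter than average, a uniformly brightened copy produces an identical hash and is dropped, while an unrelated image is kept; a clinician review, as described above, would still be needed to catch near-duplicates that hashing misses.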

Population Genomics COSI Speaker: H3-Africa consortium studies: genomics resources and insights
  • Ananyo Choudhury, University of the Witwatersrand, South Africa

Presentation Overview:

I plan to introduce some of the key genomics resources and insights that have been generated by the H3-Africa consortium. This will include snapshots of the H3-Africa genotyping array, imputation facility, whole genome and exome sequences, and large-scale genotype-array datasets. I will demonstrate with a couple of examples how such resources are enhancing our understanding of population and disease genetics at the continental level as well as at the level of individual countries/regions.

MicroRNA childhood Cancer Catalog (M3Cs): a resource for translational bioinformatics toward health informatics in Pediatric Cancer
  • Wafaa Rashed, Children's Cancer Hospital Egypt 57357, Egypt
  • Fatima Adel, Children's Cancer Hospital Egypt 57357, Egypt
  • Mohamed Rezk, Armed Forces College of Medicine-Egypt, Egypt
  • Lina Basiouny, Children's Cancer Hospital Egypt 57357, Egypt
  • Ahmed Rezk, Armed Forces College of Medicine-Egypt, Egypt
  • Ahmed Abdel-Razek, Armed Forces College of Medicine-Egypt, Egypt

Presentation Overview:

Background: MicroRNA childhood Cancer Catalog (M3Cs) is a high-quality curated collection of all published miRNA research studies on 16 pediatric cancer diseases. The scope of M3Cs was based on two approaches: data-driven clinical significance and data-driven human pediatric cell line models. Following the translational bioinformatics spectrum, the aim of this platform is to bring miRNA research into clinical significance, both in pediatric cancer patient care and in drug discovery, toward health informatics in childhood cancer. Methods: M3Cs was developed in three phases. Phase 1, Literature Mining: external database search and screening. Phase 2, Data Processing, in three steps: (1) data extraction, (2) data curation and annotation, (3) web development. Phase 3, Publishing: shinyapps.io by RStudio was used as the web interface for deploying M3Cs, which is now online at https://m3cs.shinyapps.io/M3Cs/. Results: M3Cs screened 6542 studies from PubMed published from 2002 to the end of 2020. For the data-driven clinical significance approach, 538 miRNAs from 268 publications were reported in the clinical domain, while 7 miRNAs from 5 publications were reported in the clinical and drug domain. For the data-driven human pediatric cell line models approach, 538 miRNAs from 1268 publications were reported in the cell line domain, while 211 miRNAs from 177 publications were reported in the cell line and drug domain. Conclusion: M3Cs fills a gap by applying the general translational bioinformatics (TBI) pathway to move data-driven research toward data-driven clinical care and/or hypothesis generation. The aggregated, well-curated data in M3Cs will enable healthcare stakeholders to incorporate miRNAs into clinical policy.

Online learning methods for Next Generation Sequencing in Clinical Bioinformatics
  • Michael Cornell, University of Manchester, United Kingdom
  • Angela Davies, University of Manchester, United Kingdom
  • Andrew Brass, University of Manchester, United Kingdom
  • Andrew Devereau, University of Manchester, United Kingdom

Presentation Overview:

Over the last decade, the development of Next Generation Sequencing (NGS) technologies has enabled an explosion in clinical genetic testing capability. Whereas Sanger sequencing typically allowed testing of one or two genes, NGS enables large gene panels, whole exomes and whole genomes to be sequenced.
The new tests have massively increased the quantities of data that have to be analysed and stored, and have led to the development of a new clinical career path: the Clinical Bioinformatician. However, there is a worldwide shortage of bioinformaticians, creating a bottleneck in the delivery of clinical tests.
To address this shortage, The University of Manchester in the UK has developed training courses for Clinical Bioinformaticians. We initially developed an MSc to underpin the UK National Health Service Scientist Training Programme (STP) for Clinical Bioinformatics. The success of this course led us to extend training beyond the UK via online learning, through a Massive Open Online Course (MOOC) and a Postgraduate Certificate (PgCert) in Clinical Bioinformatics. The PgCert covers NGS and the development of bioinformatics pipelines, along with variant classification, health informatics and programming, and will soon begin its third iteration. During the global pandemic, these online approaches have proved effective for delivering our normally on-campus STP programme remotely.
Here we discuss the methods and technologies that we have used to overcome the challenges encountered in teaching students how to analyse NGS data using online learning methods.

Meta-analysis and multivariate GWAS in 75000 individuals of African origin identify novel blood lipids loci
  • Jeffrey Shaffer, Department of Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, United States
  • Sounkou Toure, African Center of Excellence in Bioinformatics, University of Sciences, Techniques and Technologies of Bamako, Mali
  • Cheickna Cissé, African Center of Excellence in Bioinformatics, University of Sciences, Techniques and Technologies of Bamako, Mali
  • Mamadou Welle, African Center of Excellence in Bioinformatics, University of Sciences, Techniques and Technologies of Bamako, Mali
  • Aboubacrine Mahamane Toure, Faculty of Sciences and Techniques, University of Sciences, Techniques and Technologies of Bamako, Mali
  • Seydou Doumbia, Faculty of Medicine and Odonto-stomatology, University of Sciences, Techniques and Technologies of Bamako, Mali
  • Tinashe Chikowore, Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa
  • Segun Fatumo, Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, United Kingdom

Presentation Overview:

Despite an observed lack of transferability of genetic loci between Eurasian and African cohorts, participants of European ancestry remain dominant in genome-wide association studies (GWAS). This is particularly true for loci associated with dyslipidemia, the dysregulation of blood lipids such as total cholesterol (TC), triglycerides (TG), and low- and high-density lipoprotein cholesterol (LDL, HDL), a major risk factor for cardiovascular disease (CVD). Differences in allele frequencies and linkage disequilibrium (LD), admixture effects, natural selection and gene-environment interplay are possible explanations for the poor generalization between ancestry groups. Irrespective of its causes, this lack of generalization means that most dyslipidemia risk loci in African ancestry populations remain to be identified. This study meta-analyzed HDL, LDL, TC and TG GWAS summary statistics from the African Partnership for Chronic Disease Research, comprising cohorts from Uganda, South Africa, Ghana, Nigeria and Kenya, together with African Americans from the Million Veteran Program and African ancestry participants of UK Biobank (N ≈ 75,000), to identify lipid loci, and explored heterogeneity between cohorts using Bayesian colocalization. The meta-analysis results were fine-mapped with the multi-trait fine-mapping method FLASHFM and jointly analyzed using Multi-Trait Analysis of GWAS (MTAG) to identify additional signals. In total, 107 distinct loci were identified, of which 15 were novel; 8 of these novel loci were found with MTAG alone. Colocalization identified 3 regions with a posterior probability of distinct causal variants > 80%, and multivariate fine mapping produced smaller credible set intervals. These results confirm the potential for new discoveries using African ancestry cohorts and multivariate methods.
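The abstract does not specify the meta-analysis model. A common choice for combining per-cohort GWAS summary statistics, assumed here for illustration only, is the inverse-variance-weighted fixed-effect model: each cohort's effect estimate is weighted by 1/SE², the pooled standard error is the square root of 1 over the summed weights, and a two-sided p-value follows from the normal approximation. The sketch applies it to a single variant with hypothetical effect sizes; the function name is illustrative.

```python
import math

def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    Each cohort's effect estimate is weighted by 1/SE^2; returns the pooled
    effect, its standard error and a two-sided normal-approximation p-value.
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return pooled, pooled_se, p_value

# HYPOTHETICAL per-cohort effect sizes (e.g. lipid change per allele) and SEs.
betas = [0.10, 0.12, 0.08]
ses = [0.02, 0.03, 0.04]
pooled, pooled_se, p_value = fixed_effect_meta(betas, ses)
print(f"beta={pooled:.4f}, se={pooled_se:.4f}, p={p_value:.2e}")
```

A fixed-effect model assumes one shared true effect across cohorts; the heterogeneity between cohorts that the study probes with colocalization is exactly what this simple pooling does not capture.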

Wednesday, June 9th
Metaomics COSI Speaker: Metagenomic analysis as new approach for monitoring African freshwater microbiome
  • Sara Ettamimi, University Sidi Mohamed, Morocco

Presentation Overview:

Water quality in Africa is an important public health issue. Many studies have been conducted on the African continent to investigate microbial pathogens affecting human health. While traditional microbiological freshwater tests focus on detecting specific bacterial indicators, metagenomic approaches based on total DNA sequencing appear to be a good alternative for accessing the whole freshwater microbiome and its interactions with environmental parameters in situ.

Impact of delivery mode on gut microbiota of Tunisian newborns
  • Oussema Souiai, Laboratory of Bioinformatics, bioMathematics and bioStatistics, Institut Pasteur de Tunis, Tunisia
  • Mariem Hanachi, Laboratory of Bioinformatics, bioMathematics and bioStatistics, Institut Pasteur de Tunis, Tunisia
  • Olfa Maghrbi, Laboratory of Transmission, Control and Immunobiology of Infections, Institut Pasteur de Tunis, Tunis, Tunisia
  • Haifa Bichiou, Laboratory of medical parasitology, biotechnology and biomolecules, Institut Pasteur de Tunis, Tunisia
  • Meriam Belghith, Laboratory of Transmission, Control and Immunobiology of Infections, Institut Pasteur de Tunis, Tunis, Tunisia
  • Mariem Baouendi, Laboratory of Bioinformatics, bioMathematics and bioStatistics, Institut Pasteur de Tunis, Tunisia
  • Lamia Guizani-Tabbane, Laboratory of medical parasitology, biotechnology and biomolecules, Institut Pasteur de Tunis, Tunisia
  • Alia Benkahla, Laboratory of Bioinformatics, bioMathematics and bioStatistics, Institut Pasteur de Tunis, Tunisia

Presentation Overview:

Microbiota colonization is a lifelong, fluctuating process closely related to an individual's health status. The composition of the newborn gut microbiota can be shaped by multiple factors, including the mode of delivery.
Herein, we aim to investigate the impact of the mode of delivery on the diversity and dynamics of early gut bacterial communities of Tunisian newborns.

Meconium samples were collected from 5 neonates born by unlabored cesarean section (CS) and 5 vaginally delivered (VD) neonates at Day 0, Day 15 and Day 30. Metadata on the mothers and newborns were also collected. Shotgun sequencing was performed, and the data were analyzed with suitable bioinformatics pipelines.

Analysis of bacterial diversity showed that the major changes in bacterial community structure occur over time: both groups showed increased bacterial diversity in neonatal fecal samples collected at D15 and D30 compared with D0.
While the gut microbiota of CS neonates was colonized by Actinobacteria, Proteobacteria and Firmicutes, the microbiota of VD neonates additionally contained the Bacteroidetes phylum. Interestingly, CS newborns showed enrichment in opportunistic pathogens commonly found in the hospital environment.

Although no overall difference was observed between CS and VD newborns, our analysis detected differentially abundant key taxa related to health status.
To gain a broader picture of diversity, we intend to explore the diversity of bacteriophages and their interplay with their host bacteria in the gut microbiota ecosystem.
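The increase in diversity between D0, D15 and D30 described above can be illustrated with the Shannon index, H' = -Σ pᵢ ln pᵢ, computed from per-taxon relative abundances. The abstract does not name the diversity metric used, so Shannon diversity is an assumption here, and the read counts below are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# HYPOTHETICAL read counts for four phyla in one neonate at three time points.
samples = {
    "D0": [900, 80, 20, 0],
    "D15": [500, 300, 150, 50],
    "D30": [300, 300, 250, 150],
}
diversity = {day: shannon_index(counts) for day, counts in samples.items()}
for day, h in diversity.items():
    print(f"{day}: H' = {h:.3f}")
```

Higher values mean reads are spread more evenly across more taxa; with four phyla, H' cannot exceed ln 4 ≈ 1.386.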

New neural network classification method for individual ancestry prediction from SNP data
  • Harouna Soumare, University of Tunis El Manar, Tunisia
  • Sadok Rezgui, Adagos, Tunisia
  • Nabil Gmati, Imam Abdulrahman Bin Faisal University, Saudi Arabia
  • Alia Benkahla, Institut Pasteur de Tunis, Tunisia

Presentation Overview:

Artificial Neural Network (ANN) algorithms have been widely used to analyse genomic data. Single Nucleotide Polymorphisms (SNPs) are the most common genetic variations in the human genome; they have been shown to be involved in many genetic diseases and can be used to predict their development. Developing ANNs that can handle this type of data would therefore be of great value in medicine. However, the high dimensionality of genomic data and the limited number of available samples can make the learning task very difficult. In this work, we propose a new classification neural network based on perturbation of the input matrix. To address the problem of dimensionality, the model is trained in three steps, followed by a test phase: (1) use SVD to reduce the dimension of the input data, (2) train a Multi-Layer Perceptron (MLP) to perform the classification task, and (3) perturb the SVD projection matrix so as to minimize the training loss function. In the test phase, the test set is multiplied by the perturbed projection matrix to evaluate the performance of the classifier. The proposed method was evaluated on data from individuals of different ancestral origins, and the experimental results demonstrate its effectiveness. Achieving up to 96.23% classification accuracy, this approach outperforms previous deep learning approaches evaluated on the same dataset.
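The three training steps and the test phase described above can be sketched in NumPy as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the perturbation in step (3) is implemented here as plain gradient descent on the projection matrix with the MLP weights frozen, the network has a single tanh hidden layer, and the synthetic data, the `make_data` helper and all hyperparameters (k=10, learning rates, step counts) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward(Z, params):
    """One-hidden-layer MLP: tanh hidden units, softmax output."""
    W1, b1, W2, b2 = params
    H = np.tanh(Z @ W1 + b1)
    return H, softmax(H @ W2 + b2)

def loss(X, y, P, params):
    """Mean cross-entropy of the MLP applied to the projected data X @ P."""
    _, probs = forward(X @ P, params)
    return -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))

def fit(X, y, n_classes, k=10, hidden=16, lr=0.05, lr_p=0.001,
        mlp_steps=1000, perturb_steps=200):
    n = len(y)
    Y = np.eye(n_classes)[y]  # one-hot labels
    # (1) SVD of the training matrix: the top-k right singular vectors
    #     form the projection matrix (p x k).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P_svd = Vt[:k].T.copy()
    W1 = rng.normal(0, 0.1, (k, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, n_classes)); b2 = np.zeros(n_classes)
    # (2) Train the MLP on Z = X @ P_svd by full-batch gradient descent.
    Z = X @ P_svd
    for _ in range(mlp_steps):
        H, probs = forward(Z, (W1, b1, W2, b2))
        d_logits = (probs - Y) / n                # dLoss/dlogits
        d_pre = (d_logits @ W2.T) * (1 - H ** 2)  # dLoss/d(Z @ W1 + b1)
        W2 -= lr * H.T @ d_logits
        b2 -= lr * d_logits.sum(axis=0)
        W1 -= lr * Z.T @ d_pre
        b1 -= lr * d_pre.sum(axis=0)
    params = (W1, b1, W2, b2)
    # (3) Freeze the MLP and perturb P by gradient descent on the training
    #     loss (ASSUMED update rule): dLoss/dP = X^T (d_pre @ W1^T).
    P = P_svd.copy()
    for _ in range(perturb_steps):
        H, probs = forward(X @ P, params)
        d_pre = (((probs - Y) / n) @ W2.T) * (1 - H ** 2)
        P -= lr_p * X.T @ (d_pre @ W1.T)
    return P_svd, P, params

def predict(X, P, params):
    """Test phase: project with the perturbed matrix, then classify."""
    return forward(X @ P, params)[1].argmax(axis=1)

def make_data(n_per_class, p=50, n_classes=3, shift=1.5):
    """HYPOTHETICAL stand-in for genotype features: class c is shifted
    by `shift` on its own block of 5 features."""
    X, y = [], []
    for c in range(n_classes):
        block = rng.normal(0, 1, (n_per_class, p))
        block[:, 5 * c:5 * c + 5] += shift
        X.append(block)
        y += [c] * n_per_class
    return np.vstack(X), np.array(y)

X_train, y_train = make_data(60)
X_test, y_test = make_data(40)
P_svd, P, params = fit(X_train, y_train, n_classes=3)
accuracy = np.mean(predict(X_test, P, params) == y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Freezing the MLP in step (3) means only the projection adapts to the training loss; the perturbed matrix keeps the p × k shape of the SVD projection, so the test set is projected exactly as the training data were.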

Combining expression evidence with predicted structures to enrich gene model repertoire in eukaryotes using FINDER – a fully automated gene annotator
  • Sagnik Banerjee, Iowa State University, United States
  • Priyanka Bhandary, Iowa State University, United States
  • Margaret Woodhouse, Iowa State University, United States
  • Taner Sen, Iowa State University, United States
  • Roger Wise, USDA-ARS, Iowa State University, United States
  • Carson Andorf, Iowa State University, United States

Presentation Overview:

Gene annotation in eukaryotes is a non-trivial task that requires meticulous analysis of expression data. The presence of sequence repeats in genomes adds to this complexity, as do overlapping genes. Currently available software annotates genomes using full-length cDNA or a database of splice junctions, which makes it susceptible to errors in the input. We present FINDER, which automates downloading expression data from NCBI, optimizes read alignment, assembles transcripts and performs gene prediction. We configured FINDER to apply statistical changepoint detection to read coverage data, which led to the discovery of ~6000 additional overlapping genes on the same strand in Arabidopsis thaliana. FINDER integrates prediction results from BRAKER2 with assemblies constructed from expression data to approach the goal of exhaustive genome annotation. FINDER accurately reconstructed 22,198 and 25,156 transcripts in Arabidopsis thaliana and Zea mays, respectively, about 4000 more transcripts than BRAKER2, MAKER2 and PASA. FINDER also performed better within specific transcript categories, such as transcripts with micro-exons and overlapping transcripts. In addition to generating accurate genes in model plants, FINDER outperformed other methods in the non-model cereal barley. FINDER further reports transcripts and recognizes genes that are expressed under specific conditions or in particular tissue types. The pipeline scores genes as high or low confidence based on the available evidence. It can process eukaryotic genomes of all sizes and requires no manual supervision, making it ideal for bench researchers with limited experience in handling computational tools.
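FINDER's changepoint configuration is not described in detail here. As an illustration of the general idea only, the sketch below assumes a simple mean-shift model: binary segmentation splits a per-base coverage profile wherever the reduction in within-segment squared error exceeds a penalty, so a sharp drop in coverage between two same-strand transcripts becomes a candidate boundary. The functions `sse`, `best_split` and `changepoints`, the penalty and `min_size` are hypothetical.

```python
def sse(segment):
    """Sum of squared deviations from the segment mean."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)

def best_split(coverage, start, end, min_size):
    """Best single split of coverage[start:end] under a mean-shift model.

    Returns (gain, index), where gain is the drop in total SSE obtained by
    splitting at index; index is None if no split improves the fit."""
    whole = sse(coverage[start:end])
    best_gain, best_idx = 0.0, None
    for i in range(start + min_size, end - min_size + 1):
        gain = whole - sse(coverage[start:i]) - sse(coverage[i:end])
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_gain, best_idx

def changepoints(coverage, penalty, min_size=5):
    """Binary segmentation: split recursively while the gain exceeds `penalty`."""
    found, stack = [], [(0, len(coverage))]
    while stack:
        start, end = stack.pop()
        if end - start < 2 * min_size:
            continue
        gain, idx = best_split(coverage, start, end, min_size)
        if idx is not None and gain > penalty:
            found.append(idx)
            stack.extend([(start, idx), (idx, end)])
    return sorted(found)

# HYPOTHETICAL per-base coverage: a highly expressed transcript followed by a
# weaker one on the same strand.
coverage = [40, 42, 38] * 10 + [10, 12, 8] * 10
boundaries = changepoints(coverage, penalty=100)
print(boundaries)  # → [30]
```

The penalty controls sensitivity: small fluctuations within each transcript reduce the squared error by less than 100 here, so only the drop between the two expression levels is reported.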

Structuralbio COSI Speaker: Understanding the effects of mutations on protein drug targets of pathogens for rational drug research and development
  • Ozlem Tastan Bishop, Rhodes University, South Africa

Presentation Overview:

Computational drug discovery has been used successfully for drug design. However, the impact of evolutionary mutations in pathogens, including those linked to drug resistance, has mostly not been taken into account. This talk focuses on understanding the effects of variations in drug targets for designing novel therapies, via three examples: 1. HIV protease drug resistance (PMID: 30560871); 2. Resistance mutations of Mycobacterium tuberculosis pyrazinamidase (PMID: 32489525); 3. Impact of mutations on SARS-CoV-2 Mpro (PMID: 32853525).

Virtual screening of three SARS-CoV-2 proteins to identify novel inhibitors
  • Ruben Cloete, University of the Western Cape, South Africa

Presentation Overview:

COVID-19 infections, caused by the novel SARS-CoV-2, are on the rise and warrant the identification of novel drugs to treat and prevent further infections. We investigated three SARS-CoV-2 proteins, namely two experimentally resolved structures, PDB IDs 6m71 (RNA polymerase) and 6w02 (Non-structural phosphate polyprotein, nsp3), and a homology model of the Nucleocapsid phosphate protein based on template 6m3m, by performing molecular docking, pharmacophore modelling and simulation studies of protein-drug complexes and drug interactions. The compound databases screened were the Prestwick Chemical Library (PCL), FDA-approved drugs via DSSTox, African natural products (AfroDb) and the Maybridge database, with the aim of identifying novel putative inhibitors. Pharmacophore modelling using docked complexes of known antiviral inhibitors (Remdesivir, Danoprevir and Darunavir) and the substrate adenosine-5-diphosphoribose (APR) allowed the identification of 269 compounds from the PCL, ZINC and Maybridge databases. Molecular docking identified eight compounds with higher binding affinity than the known antiviral drugs, and binding free energy calculations confirmed four compounds with strong binding affinity for these proteins. Interaction analysis also indicated a higher number of interactions for these four compounds than for the known antivirals Remdesivir, Danoprevir and Darunavir. We propose these four compounds, ZINC03831623, ZINC01047180, ZINC04368856 and ZINC08397657, for in vitro and in vivo experimental testing against the novel coronavirus.

Structural Bioinformatics Analysis of CYP2D6 pharmacogenetic variation relevant to Sub-Saharan African populations
  • Blessing Sitabule, Sydney Brenner Institute For Molecular Bioscience (University of the Witwatersrand), South Africa
  • Houcemeddine Othman, Sydney Brenner Institute For Molecular Bioscience, South Africa
  • Scott Hazelhurst, School of Electrical and Information Engineering (University of the Witwatersrand), South Africa

Presentation Overview:

Pharmacogenomics is a field of study that associates variation in genes involved in drug metabolism with drug response. CYP2D6 is an important gene in drug metabolism. Different studies have identified a number of variations in the CYP2D6 gene in African populations with potential functional impact.
This study aims to gain insight into how missense variants found in sub-Saharan African populations may impact the functionality of the CYP2D6 enzyme, with a focus on a drug commonly used in Africa.
Fifty missense variants found in sub-Saharan African populations were selected from the H3A/Wits GSK projects and the PharmVar, PharmGKB and gnomAD databases. Missense variants were prioritized for molecular dynamics using the Structural Workflow for Annotating ADME gene Targets (SWAAT) to identify variants with a potentially significant impact on drug metabolism. PDB entry 3TBG was selected as the template for modelling. Topology generation for the mutant and wild-type enzymes, followed by minimisation, heating and equilibration of the system, was performed prior to the molecular dynamics runs.
Preliminary Results:
Two missense variants (Y355C and R365H) were prioritised based on the SWAAT results. These variants were predicted to have a destabilising effect based on a machine learning prediction and a ∆∆G > 1.0.
Next steps:
These variants, together with other variants selected from the star alleles *17, *29, *73 and *74, are being assessed through molecular dynamics to determine their potential impact on the functionality of the enzyme. The findings of this study will provide an understanding of how missense variants may affect the functionality of CYP2D6.

Genome-wide Analysis of Cytochrome P450 Genes of Xylaria sp. FL1777 for Bioremediation: Annotation and Evolutionary Relationships
  • Wadzani Palnam Dauda, Federal University, Gashua, Nigeria
  • Elkanah Glen, Department of Biochemistry, Federal University Lokoja, Kogi State, Nigeria
  • Peter Abraham, Department of Horticulture, Federal College of Horticulture, Dadin Kowa, Gombe, Nigeria
  • Charles Oluwaseun Adetunji, Department of Microbiology, Edo University Iyamho, PMB 04, Auchi, Edo State, Nigeria
  • Daji Morumda, Department of Microbiology, Federal University Wukari, Taraba State, Nigeria
  • Shittu Emmanuel Abraham, Department of Agronomy, Bayero University Kano, Nigeria
  • Grace Peter Wabba, Department of Biochemistry, Ahmadu Bello University, Zaria, Nigeria
  • Israel Ogra Ogwuche, Department of Biochemistry, Ahmadu Bello University, Zaria, Nigeria
  • Mawuli Kwamla Azameti, Division of Molecular Biology and Biotechnology, Indian Agricultural Research Institute, New Delhi, India

Presentation Overview:

Bioremediation using fungi has been identified as a sustainable biotechnological technique, mainly through the application of fungal enzymes for the effective removal of environmental pollutants by mineralization or absorption. Cytochrome P450s (CYPs) constitute one of the largest gene superfamilies in eukaryotes, with varying intra- and inter-species complexity. They use molecular oxygen to alter a substrate's configuration, which is a major mechanism through which they carry out their toxicological, physiological and ecological roles. Xylariaceae is one of the largest and most diverse families of filamentous Ascomycota, with significant roles as saprotrophs of litter, dung, wood and soil. A genome-wide annotation analysis of Xylaria sp. was carried out to understand the bioremediation potential of its CYPs. Evolutionary analysis divided the 214 CYPs into 15 clades, and the genes were classified into 47 clans and 86 families. Gene structure analysis identified 10 conserved motifs using the MEME suite, and intron-exon analysis showed highly dynamic intron-exon organization. Predicted subcellular localization placed most of the genes in the endoplasmic reticulum. This study provides an opportunity to further explore Xylaria sp. and its application in bioremediation through the degradation of environmental pollutants.

Thursday, June 10th
SysAdmin COSI Speaker: Handling 500,000 SARS-CoV-2 samples or: How I learned to stop worrying and love the storage
  • Radoslaw Poplawski, University of Birmingham, United Kingdom

Presentation Overview:

At the beginning of the COVID-19 pandemic, the UK launched a nationwide genome sequencing effort: the COVID-19 Genomics UK Consortium. To handle the deluge of incoming data (over 10,000 samples per week), a bespoke system had to be developed. In this talk, I’ll go over the infrastructure we set up, including storage, compute and supporting services, and highlight the quirks and challenges we encountered over the last year.

Delivering a federated infrastructure for transcontinental human data exchange and analysis
  • Nicola Mulder, University of Cape Town, South Africa
  • Mamana Mbiyavanga, University of Cape Town, South Africa

Presentation Overview:

Human cohort studies and national healthcare initiatives are now generating large biomolecular datasets. Researchers and clinicians require access to these datasets if we are to fully realise their potential to positively impact human health. It is now evident that a centralised approach to data storage, sharing and analysis is no longer practical, both for technical reasons (data harmonisation, sharing and analysis) and for ethical, legal and social reasons. The approach of CINECA (Common Infrastructure for National Cohorts in Europe, Canada and Africa) is to develop a federated cloud-based infrastructure that makes population-scale genomic and biomolecular data accessible to scientists across international borders.

CINECA has assembled a virtual cohort of 1.4 million individuals from diverse cohorts and projects such as the EGA, CanDIG and H3Africa, leveraging existing investments in human cohort studies. CINECA has already contributed to harmonising metadata using open global standards such as the Global Alliance for Genomics and Health (GA4GH) standards and FAIR principles, which will drive variant and sample discovery in transcontinental virtual cohorts.

The H3ABioNet project is deploying a research infrastructure using the federated solutions proposed by CINECA. H3Africa datasets served through the EGA are mapped to the GA4GH Data Use Ontology (DUO) to increase their FAIRness. Several H3Africa datasets will be served through the EGA Beacon under controlled access, and eventually through an H3Africa Beacon that uses GA4GH discovery standards, making them discoverable by the scientific community. The GA4GH AAI and Passport specifications are being implemented across H3ABioNet services to enable more secure and interoperable federated data sharing and analysis, and ultimately to deliver benefits to patients.
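To make the discovery model concrete, here is a minimal sketch of the yes/no question a GA4GH Beacon answers ("does any dataset I may see contain this allele?"), with controlled access reduced to a set of dataset IDs the caller is authorised for. The `Variant` fields loosely mirror Beacon v1 query parameters, but the classes, the dataset ID `h3a_demo` and the authorisation model are hypothetical simplifications, not the Beacon API or real GA4GH Passport handling.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    """A single allele; field names loosely follow Beacon v1 query parameters."""
    reference_name: str
    start: int            # 0-based position (assumption for this sketch)
    reference_bases: str
    alternate_bases: str


@dataclass
class ToyBeacon:
    """Hypothetical in-memory beacon: dataset ID -> set of variants."""
    datasets: dict = field(default_factory=dict)

    def add_dataset(self, dataset_id: str, variants) -> None:
        self.datasets[dataset_id] = set(variants)

    def exists(self, query: Variant, authorised: set) -> bool:
        """Answer only yes/no, and only over datasets the caller may access."""
        return any(
            query in variants
            for dataset_id, variants in self.datasets.items()
            if dataset_id in authorised
        )


beacon = ToyBeacon()
beacon.add_dataset("h3a_demo", [Variant("1", 12344, "A", "G")])  # hypothetical dataset
print(beacon.exists(Variant("1", 12344, "A", "G"), {"h3a_demo"}))  # True
print(beacon.exists(Variant("1", 12344, "A", "G"), set()))         # False
```

Returning only a boolean, and only over authorised datasets, is what lets a beacon advertise that data exist without exposing individual-level records.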

The role of death in the rise of eukaryotes: the case for archaeal caspase homologs
  • So Ri La, Evolutionary Studies Institute, South Africa
  • Andrew Ndhlovu, Evolutionary Genomics Group, University of Stellenbosch, South Africa
  • Pierre Durand, Evolutionary Studies Institute, South Africa

Presentation Overview:

Eukaryogenesis is the evolutionary process that led to the origin of eukaryotic cells. The most widely accepted hypothesis is that eukaryotes originated from an endosymbiosis between an archaeon and an α-proteobacterium. However, this endosymbiosis would have created a levels-of-selection conflict arising from selfish behavior between the partners. Programmed cell death (PCD), a genetically controlled process of cell suicide, has been hypothesized to be one of the mediators of this conflict. Caspases are members of the C14 peptidase family; they are essential to the PCD pathway and are assumed to have originated from the α-proteobacterium via horizontal gene transfer. Caspase-like homologs have been hard to identify in Archaea, and empirical evidence of archaeal PCD is lacking, raising questions about how caspases and PCD evolved during eukaryogenesis. Using Hidden Markov Model profile searches of protein sequence databases, I investigated caspase-like proteases in Archaea and found 230 sequences, classified as type I metacaspases (16.43%) and orthocaspases (83.57%), represented in all major superphyla of Archaea. Maximum-likelihood phylogenetic analyses in MEGA X revealed a scattered distribution across deep-branching bacteria and archaea. The tree topology does not match the prokaryotic 16S rRNA phylogeny, suggesting either common ancestry, possibly dating back to the Last Universal Common Ancestor, or massive horizontal gene transfer. Pfam domain analysis identified PCD-associated protein domains within the caspase-like homologs, highlighting the potential role of these enzymes in archaeal PCD. This work provides evidence supporting the hypothesis that caspase-like homologs evolved earlier than previously thought and acted as conflict mediators during eukaryogenesis, and supports the original sin hypothesis of PCD evolution.
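The classification step can be sketched as follows, assuming the profile searches were run with HMMER's `hmmsearch --tblout` and that each hit is assigned to whichever profile scores it highest. The profile names `metacaspase_I` and `orthocaspase`, the E-value cutoff and the toy rows are hypothetical placeholders, not the study's profiles, thresholds or data.

```python
from collections import Counter


def best_profile_per_target(tblout_lines, max_evalue=1e-5):
    """Assign each target sequence to the profile HMM with the highest bit score.

    Expects hmmsearch --tblout rows: column 1 = target name, 3 = query (profile)
    name, 5 = full-sequence E-value, 6 = full-sequence bit score.
    """
    best = {}  # target -> (profile, score)
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.split()
        target, profile = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue > max_evalue:
            continue  # not significant under the chosen threshold
        if target not in best or score > best[target][1]:
            best[target] = (profile, score)
    return {target: profile for target, (profile, _) in best.items()}


def class_percentages(assignments):
    """Percentage of assigned sequences falling into each profile class."""
    counts = Counter(assignments.values())
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


rows = [  # hypothetical tblout rows
    "# target name  accession  query name  accession  E-value  score ...",
    "seqA - metacaspase_I - 1e-30 105.2 0.1 1e-29 104.0 0.1 1.0 1 0 0 1 1 1 1 -",
    "seqA - orthocaspase  - 1e-10  40.0 0.1 1e-09  39.0 0.1 1.0 1 0 0 1 1 1 1 -",
    "seqB - orthocaspase  - 1e-20  80.0 0.1 1e-19  79.0 0.1 1.0 1 0 0 1 1 1 1 -",
    "seqC - orthocaspase  - 0.5     5.0 0.1 0.5     5.0 0.1 1.0 1 0 0 1 1 1 1 -",
]
hits = best_profile_per_target(rows)
print(hits)                     # {'seqA': 'metacaspase_I', 'seqB': 'orthocaspase'}
print(class_percentages(hits))  # {'metacaspase_I': 50.0, 'orthocaspase': 50.0}
```

Taking the best-scoring profile per sequence avoids counting a protein twice when it matches both families.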

Tackling the Moving Mutant Target: Taxifolin Derivatives as Novel Putative E571K Exportin-1 Inhibitors for KRAS-mutant Lung Adenocarcinoma Therapy
  • Temidayo Adigun, University of Ilorin, Nigeria

Presentation Overview:

Background: The missense, single-nucleotide, gain-of-function E571K mutation of the exportin-1 (XPO1) gene is a critical driver of hard-to-treat KRAS-mutant lung adenocarcinoma and of its associated mortality. Given the limited potency and safety of current inhibitors of both wild-type and mutated XPO1, developing effective selective inhibitors of nuclear export that specifically target E571K-mutated exportin-1 is a viable strategy for improving the prognosis of this condition.
Aim: To identify, through molecular modelling, safe and clinically acceptable putative antagonists of E571K-mutated exportin-1 among the bioactive compounds found in various parts of Juglans mandshurica.
Methods: The bioactive compounds were passed through a compendium of drug-likeness and lead-likeness filter workflows, and the resulting compounds were docked into the active site of E571K exportin-1 using AutoDock Vina in PyRx to establish their binding affinities and interaction profiles. The evolutionary algorithm of the Osiris Property Explorer DataWarrior software, together with a lead-likeness filter, was then used to generate novel, non-promiscuous analogues of the lead compound with better putative selectivity and clinical acceptability as E571K exportin-1 antagonists.
Results: This study identifies taxifolin as a putatively effective, lead-like E571K exportin-1 inhibitor with high potential to qualify for clinical evaluation, although it shows a high tendency toward promiscuity in high-throughput screening. Evolutionary derivation of novel analogues of taxifolin, however, yielded putatively non-promiscuous, non-toxic and lead-like E571K exportin-1 antagonists with high synthetic accessibility and clinical developability, which merit evaluation as a treatment strategy for drug-resistant KRAS-mutant lung adenocarcinoma.
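A filtering stage of this kind can be sketched as below. The abstract does not list the exact filters used, so this assumes Lipinski's Rule of Five (at most one violation allowed) plus a simplified lead-likeness cut-off of MW 250 to 350 Da and logP at most 3.5; the compounds and their descriptor values are hypothetical, and in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical values below)."""
    name: str
    mol_weight: float   # Da
    logp: float         # octanol/water partition coefficient
    h_donors: int
    h_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count Rule-of-Five violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    return sum([d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10])


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    # The Rule of Five is commonly applied as "at most one violation".
    return lipinski_violations(d) <= max_violations


def is_lead_like(d: Descriptors) -> bool:
    # Simplified lead-likeness cut-offs (assumption): 250-350 Da and logP <= 3.5.
    return 250 <= d.mol_weight <= 350 and d.logp <= 3.5


library = [
    Descriptors("compound_A", 300.0, 1.0, 5, 7),
    Descriptors("compound_B", 520.0, 4.0, 2, 8),
    Descriptors("compound_C", 610.0, 6.2, 6, 12),
]
for_docking = [d.name for d in library if is_drug_like(d) and is_lead_like(d)]
print(for_docking)  # ['compound_A']
```

Here `compound_B` is still drug-like (one violation) but too heavy to be lead-like, which illustrates why the two filters are applied together before docking.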

Agricultural Bioinformatics COSI Speaker: The genome landscape of African cattle
  • Olivier Hanotte, University of Nottingham, United Kingdom

Presentation Overview:

The increasing availability of whole-genome sequences of African cattle, combined with new bioinformatics analyses, is opening new doors to understanding the origins of African cattle diversity, its evolution, and the local adaptation of African cattle to environmental challenges. The findings have major implications for the design of new breeding programs that combine productivity improvement with environmental sustainability in the increasingly worrying context of climate change. This talk will summarise the latest research breakthroughs from the past ten years.

Genomics- and Machine Learning-accelerated discovery of biocontrol bacteria
  • Matthew Biggs, AgBiome, United States
  • Kelly Craig, AgBiome, United States
  • Esther Gachango, AgBiome, United States
  • David Ingham, AgBiome, United States
  • Mathias Twizeyimana, AgBiome, United States

Presentation Overview:

Machine learning can accelerate discovery by making screening more efficient. We developed a novel machine-learning workflow to identify genomic features associated with the fungicidal activity of bacteria, and used those features to discover additional bacteria with the desired activity. We applied our workflow to discover solutions to two important fungal diseases: Sorghum Anthracnose and Black Sigatoka of bananas. These diseases are problematic worldwide, with a particularly devastating impact on smallholder farmers in Sub-Saharan Africa. We screened a total of 1,227 bacterial isolates for antifungal activity against these pathogens using detached-leaf assays and identified 72 taxonomically diverse isolates with robust activity against one or both pathogens. We identified biosynthetic gene clusters associated with activity against each pathogen. Machine learning tripled the discovery rate of our screen and led to the discovery of a taxonomic group in which fungicidal activity had not previously been reported. This work highlights the wealth of biocontrol mechanisms available in the microbial world for managing fungal pathogens, creates opportunities for future characterization of novel fungicidal mechanisms, and provides a set of genomic features and models for discovering additional bacterial isolates active against these two pathogens. Finally, our workflow generalizes to any discovery effort in which genomic information is available to guide candidate selection.
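As a simple stand-in for the feature-association step (not the authors' machine-learning workflow, which the abstract does not detail), one could rank genomic features such as biosynthetic gene clusters by a one-sided Fisher's exact test of whether a feature is over-represented among active isolates. The feature names and counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(active_with, inactive_with, active_without, inactive_without):
    """One-sided Fisher's exact (hypergeometric) p-value that a feature is
    over-represented among active isolates: P(X >= active_with), where X is the
    number of active isolates among those carrying the feature, under random
    assignment of activity labels."""
    n_total = active_with + inactive_with + active_without + inactive_without
    n_active = active_with + active_without   # isolates with the activity
    n_with = active_with + inactive_with      # isolates carrying the feature
    numerator = sum(
        comb(n_active, k) * comb(n_total - n_active, n_with - k)
        for k in range(active_with, min(n_active, n_with) + 1)
    )
    # Integer arithmetic until this final division keeps the result exact-ish.
    return numerator / comb(n_total, n_with)


# Hypothetical 2x2 counts: (active with, inactive with, active without, inactive without)
features = {
    "bgc_1": (5, 0, 0, 5),
    "bgc_2": (2, 3, 3, 2),
}
ranked = sorted(features, key=lambda f: enrichment_p_value(*features[f]))
print(ranked)  # ['bgc_1', 'bgc_2']
print(enrichment_p_value(*features["bgc_1"]))  # 1/252, about 0.00397
```

With many candidate features, a real analysis would also correct these p-values for multiple testing before selecting features to guide the next round of screening.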

MinHash k-mer sketching highlights allopolyploid subgenome sequence differentiation
  • Gillian Reynolds, Montana State University, United States
  • Veronika Strnadova-Neeley, Montana State University, United States
  • Jennifer Lachowiec, Montana State University, United States

Presentation Overview:

Characterizing intra- and inter-sequence variation between subgenomes and across species underpins the methods used to uncover the structure, function and evolution of polyploid crop genomes. However, classic methods for sequence characterization either scale quadratically or perform poorly on divergent genomes. In this work, we explore the linear-time MinHash k-mer sketching strategy to rapidly and efficiently determine the relationships between chromosomal sequences in several agronomically important allopolyploid species. We find that, for nearly all allopolyploid species studied, there exists a range of k-mer sizes for which MinHash sketching produces results in line with current subgenome assignments of chromosomes. For some species that range is large and the subgenomic sequences cluster fairly stably across the sampled k-mer spectrum; for others the range is small and the clusters are highly unstable. No such subgenomic clustering of chromosomes was observed for autopolyploid genomes, although, interestingly, both polyploid types appear to share a critical k-mer size at which chromosome sequence similarity begins to be discoverable. Further, we observe correct clustering of chromosomes with their progenitors, demonstrating the accuracy of the MinHash sketch approach. MinHash sketching therefore shows promise for a range of applications in agricultural genomics, including rapidly determining inter- and intra-subgenomic sequence similarity, assigning sequences to their subgenome of origin, identifying subgenomic progenitors and elucidating genome ploidy type. Such discoveries are critical for characterizing the pool of genetic variation available for variety development and for furthering our understanding of polyploid evolution.
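The core comparison can be sketched as a Mash-style bottom-s MinHash: hash every canonical k-mer, keep the `size` smallest hashes, and estimate Jaccard similarity from the bottom-s sketch of the union. This is a minimal illustration, not the authors' pipeline; the choice of canonical k-mers, BLAKE2b hashing and the sketch size are assumptions.

```python
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set:
    """All canonical k-mers (lexicographic min of a k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            rc = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers


def hash64(kmer: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_sketch(seq: str, k: int, size: int) -> list:
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes."""
    return sorted({hash64(km) for km in canonical_kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a: list, sketch_b: list, size: int) -> float:
    """Estimate Jaccard similarity from two bottom-s sketches."""
    a, b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(a | b)[:size]  # bottom-s sketch of the union
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in a and h in b)
    return shared / len(union_bottom)


# Hypothetical toy "chromosomes"; scanning k mimics sampling the k-mer spectrum.
chr_a = "ACGTACGGTACCTTAGGCATCGATCGGATCCA"
chr_b = chr_a[:20] + "TTTTGGGGCCCCAAAA"
for k in (5, 9, 13):
    print(k, jaccard_estimate(bottom_sketch(chr_a, k, 100), bottom_sketch(chr_b, k, 100), 100))
```

Comparing two sketches costs time that depends only on the sketch size, not on chromosome length, which is what avoids the quadratic all-against-all cost. For real chromosomes, a streaming version would keep only the smallest hashes in a heap rather than materialising every k-mer as this sketch does.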

  • Temitayo Olagunju, University of Ibadan / IITA Ibadan, Nigeria
  • Angela Makolo, University of Ibadan, Ibadan, Nigeria
  • Andreas Gisel, International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria / ITB, NRC, Bari, Italy

Presentation Overview:

Perturbation of gene regulatory networks (GRNs) is important for understanding the molecular mechanisms underlying observed phenotypes. However, gene knock-out experiments in the laboratory can study only a few genes at once, which limits the study of complex traits, such as disease resistance, that are controlled by many genes. Next-generation sequencing (NGS) combined with network modeling opens up the possibility of large-scale GRN perturbation studies.
In this work, we developed a perturbation simulation model of small RNA-mediated GRNs for host-pathogen interaction studies. We applied the model to NGS data from five cassava genotypes with clearly different levels of resistance to Cassava Mosaic Disease (CMD) in order to elucidate the molecular mechanism of CMD resistance. Genome-wide small RNA profiling and differential expression analysis were used to identify CMD-responsive small RNAs, whose gene targets were then predicted. The small RNA-target data, together with gene co-expression data, were used to construct GRN models. The regulatory action of small RNAs on their gene targets observed in vivo was used as the criterion for selecting the set of nodes to perturb, and network density was used as a measure of network robustness for each perturbation.
The results showed a correlation between network dynamics and the biological significance of the perturbations. In some networks, perturbing genes involved in the biotic stress response decreased network robustness. The model presented here can thus be applied to large-scale studies of the molecular mechanisms of disease resistance, subject to further interpretation by biologists.
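The robustness measure can be illustrated with a small sketch: perturb a network by knocking out a set of nodes (removing them and every edge that touches them) and compare network density before and after. It assumes all edges are treated as directed and self-loops are excluded; the node names and toy network are hypothetical, and this is not the authors' simulation model.

```python
def density(nodes: set, edges: set) -> float:
    """Density of a directed graph without self-loops: |E| / (|V| * (|V| - 1))."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def perturb(nodes: set, edges: set, removed: set):
    """Knock out nodes: drop them and every edge touching them."""
    kept_nodes = nodes - removed
    kept_edges = {(u, v) for (u, v) in edges if u in kept_nodes and v in kept_nodes}
    return kept_nodes, kept_edges


# Hypothetical toy network: one small RNA regulating three targets,
# plus one gene-gene link standing in for a co-expression edge.
nodes = {"sRNA_1", "gene_1", "gene_2", "gene_3"}
edges = {("sRNA_1", "gene_1"), ("sRNA_1", "gene_2"), ("sRNA_1", "gene_3"), ("gene_1", "gene_2")}

baseline = density(nodes, edges)                     # 4 / 12
after = density(*perturb(nodes, edges, {"sRNA_1"}))  # 1 / 6
print(f"{baseline:.3f} -> {after:.3f}")              # 0.333 -> 0.167
```

A drop in density after knocking out a node set is read here as reduced robustness; repeating this for each candidate perturbation set gives a per-perturbation score that can be compared across networks.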

Poster Flash Talks

International Society for Computational Biology
525-K East Market Street, RM 330
Leesburg, VA, USA 20176