Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


YBS 2021 | May 23, 2021 | Virtual Event

Virtual Viewing Hall


ISCB-Africa ASBCB Presentations

Presentation 01: mashphage: a fast and accurate application of mash distance for whole-genome scale clustering of actinobacteriophages

Keywords: comparative genomics bacteriophage alignment-free methods mash distance
  • Anicet Ebou, Bioinformatic team, Institut National Polytechnique Felix Houphouet-Boigny, Cote d'Ivoire
  • Dominique Koua, Bioinformatic team, Institut National Polytechnique Felix Houphouet-Boigny, Cote d'Ivoire

Short Abstract: The clustering of phage genomes into predefined clusters has been a standard method in phage comparative genomics since the comparative genomic analysis of 60 mycobacteriophages by Hatfull et al. in 2010. Since then, multiple tools have been used to cluster newly determined phage genomes. These tools rely either on alignment-based methods, which are time-consuming, or on alignment-free methods that can be improved. Here we present mashphage, a tool for the rapid clustering of actinobacteriophage genomes, following the established cluster scheme, based on the Mash distance. Testing mashphage on a dataset of 430 complete phage genomes showed a sensitivity of 99.45% and a specificity of 88.88%. When applied to a subset of the phage genomes of Pope et al., mashphage correctly assigned phage genomes at the reported percentage in under one minute. mashphage is available as a Python 3 command-line tool, executable on any operating system, at https://github.com/Ebedthan/mashphage.
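
The Mash distance mentioned in this abstract can be illustrated with a minimal Python sketch. It is not taken from mashphage: the function names are hypothetical, the hash function (BLAKE2b) is a stand-in for the one used by the Mash tool, and k-mers are not canonicalised across strands. It assumes the commonly cited form of the distance, D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard index estimated from bottom-s MinHash sketches.

```python
import hashlib
import math


def kmer_hashes(seq, k):
    """Hash every k-mer of seq to a 64-bit integer (simplified: not strand-aware)."""
    seq = seq.upper()
    return {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }


def sketch(seq, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest k-mer hashes, sorted."""
    return sorted(kmer_hashes(seq, k))[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate the Jaccard index from the `size` smallest hashes of the union."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def mash_distance(jaccard, k=21):
    """Convert a Jaccard estimate into a distance clamped to [0, 1]."""
    if jaccard <= 0.0:
        return 1.0
    d = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return min(1.0, max(0.0, d))


# Toy usage with a short made-up sequence and a small k.
genome_a = "ACGTACGTTGCAAGGCTTAACCGGTATCGATCG"
genome_b = genome_a
j = jaccard_estimate(sketch(genome_a, k=5), sketch(genome_b, k=5))
print(mash_distance(j, k=5))
```

In a clustering setting, a new genome would presumably be assigned to the cluster of its nearest reference under such a distance; how mashphage does this in detail is not described in the abstract.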

Presentation 02: SASCRiP: Sequencing Analysis of Single-Cell RNA in Python - A pre-processing workflow for UMI count-based scRNA-seq data

Keywords: single-cell RNA sequencing data pre-processing Python modular
  • Darisia Moonsamy, School of Molecular and Cell Biology, University of the Witwatersrand, South Africa
  • Nikki Gentle, School of Molecular and Cell Biology, University of the Witwatersrand, South Africa

Short Abstract: Single-cell RNA sequencing (scRNA-seq) is increasingly being used to study the transcriptome at single-cell resolution. However, extensive data pre-processing and quality control steps are required to ensure raw sequencing data are suitable for downstream analysis. At present, there are no available Python packages able to execute and visualise all the steps necessary for data pre-processing. Here we present SASCRiP - Sequencing Analysis of Single-Cell RNA in Python. SASCRiP is a Python package developed to combine features of the widely used BUStools and Seurat tools, to provide a flexible, integrative single-cell pre-processing workflow for unique molecular identifier (UMI) count-based scRNA-seq data. SASCRiP consists of a series of parameterised wrapper functions that can be customised to accommodate all UMI count-based datasets. All quality control steps for scRNA-seq gene expression analysis can then be executed and visualised within SASCRiP, thus eliminating the need to combine multiple tools for data pre-processing. Here we demonstrate the utility of the SASCRiP workflow using three UMI count-based datasets derived from peripheral blood mononuclear cells (PBMCs), showing how our modular design allows data obtained using different sequencing chemistries to be seamlessly processed, as well as how quality control metrics can be visualised using custom plotting functions within SASCRiP. Clustering analysis was then performed using Seurat (3.2.0) to demonstrate the effectiveness of the pre-processing procedures for subsequent downstream analyses. In conclusion, SASCRiP’s modular design provides a highly flexible, yet integrative workflow for preparing unprocessed UMI count-based scRNA-seq data for subsequent downstream analyses.

Presentation 03: Impact of gene annotation choice on the quantification of RNA-seq data

Keywords: RNA-seq Bioinformatics Genomics Annotation Genome
  • David Chisanga, Olivia Newton-John Cancer Research Institute, Australia
  • Wei Shi, Olivia Newton-John Cancer Research Institute, Australia

Short Abstract: RNA sequencing is currently the method of choice for genome-wide profiling of gene expression. A popular approach to quantify expression levels of genes from RNA-seq data is to map reads to a reference genome and then count mapped reads to each gene. Gene annotation data, which include chromosomal coordinates of exons for tens of thousands of genes, are required for this quantification process. There are several major sources of gene annotations that can be used for quantification, such as Ensembl and RefSeq databases. However, there is very little understanding of the effect that the choice of annotation has on the accuracy of gene expression quantification in an RNA-seq analysis. In this paper, we present results from our comparison of Ensembl and RefSeq human annotations on their impact on gene expression quantification using a benchmark RNA-seq dataset generated by the SEquencing Quality Control (SEQC) consortium. We show that the use of RefSeq gene annotation models led to better quantification accuracy, based on the correlation with ground truths including expression data from >800 real-time PCR validated genes, known titration ratios of gene expression and microarray expression data. We also found that the recent expansion of the RefSeq annotation has led to a decrease in its annotation accuracy. Finally, we demonstrated that the RNA-seq quantification differences observed between different annotations were not affected by the use of different normalization methods.
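
The quantification step described in this abstract (counting mapped reads against exon coordinates) can be sketched in a few lines of Python. This is a toy illustration, not the authors' pipeline: the function name, gene names, coordinates and the rule of discarding reads that overlap more than one gene are all assumptions made for the example.

```python
from collections import Counter


def count_reads(reads, annotation):
    """Count reads per gene.

    reads: list of (chrom, start, end) tuples, half-open coordinates.
    annotation: dict mapping gene name -> list of (chrom, start, end) exons.
    A read is assigned to a gene if it overlaps any exon of that gene;
    reads overlapping no gene or several genes are left uncounted.
    """
    counts = Counter()
    for chrom, r_start, r_end in reads:
        hits = {
            gene
            for gene, exons in annotation.items()
            for e_chrom, e_start, e_end in exons
            if e_chrom == chrom and r_start < e_end and e_start < r_end
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return counts


# Two hypothetical annotations that differ by one extra exon in GENE1.
annotation_a = {"GENE1": [("chr1", 100, 200)], "GENE2": [("chr1", 500, 600)]}
annotation_b = {
    "GENE1": [("chr1", 100, 200), ("chr1", 300, 400)],
    "GENE2": [("chr1", 500, 600)],
}
reads = [("chr1", 150, 200), ("chr1", 320, 370), ("chr1", 550, 600), ("chr1", 700, 750)]

counts_a = count_reads(reads, annotation_a)
counts_b = count_reads(reads, annotation_b)
print(dict(counts_a), dict(counts_b))
```

The same set of mapped reads yields different counts for GENE1 under the two annotations, which is the kind of effect the comparison in the abstract measures at genome scale.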

Presentation 04: StellarPGx: A Nextflow Pipeline for Calling Star Alleles in Cytochrome P450 Genes

Keywords: Pharmacogenomics Cytochrome P450 genes Alleles Genotype Phenotype Haplotype Bioinformatics Algorithms Software
  • David Twesigomwe, University of the Witwatersrand, Johannesburg, South Africa
  • Britt Drögemöller, University of Manitoba, Winnipeg, Canada
  • Galen Wright, University of Manitoba, Winnipeg, Canada
  • Azra Siddiqui, University of the Witwatersrand, Johannesburg, South Africa
  • Jorge Da Rocha, University of the Witwatersrand, Johannesburg, South Africa
  • Zane Lombard, National Health Laboratory Service & University of the Witwatersrand, South Africa
  • Scott Hazelhurst, University of the Witwatersrand, Johannesburg, South Africa

Short Abstract: Genotype-guided therapy promotes drug efficacy and safety. However, accurately calling star alleles (haplotypes) in cytochrome P450 (CYP) genes, which encode over 80% of drug-metabolising enzymes, is challenging. Notably, CYP2D6, CYP2B6 and CYP2A6, which have neighbouring pseudogenes, present short-read alignment difficulties, high polymorphism and complex structural variations. We present StellarPGx, a Nextflow pipeline for accurately genotyping CYP genes by leveraging genome graph-based variant detection and combinatorial star allele assignments. StellarPGx has been validated using 109 whole genome sequence samples for which the Genetic Testing Reference Material Coordination Program (GeT-RM) has recently provided consensus truth CYP2D6 alleles. StellarPGx had the highest CYP2D6 genotype concordance with GeT-RM (99%) compared to existing callers, namely Cyrius (98%), Aldy (82%) and Stargazer (84%). The implementation of StellarPGx using Nextflow, Docker and Singularity facilitates its portability, reproducibility and scalability on various user platforms. StellarPGx is publicly available from https://github.com/SBIMB/StellarPGx.

Presentation 05: Comparative analyses of the diversity of the intestinal mycobiota of the newborn

Keywords: Metagenomics Shotgun Gut mycobiota Fungi Newborn baby
  • Oussema Souiai, Pasteur Institute of Tunis, Tunisia
  • Ichrak Ghouili, Pasteur Institute of Tunis, Tunisia
  • Mariem Hanachi, Pasteur Institute of Tunis, Tunisia
  • Alia Ben Kahla, Pasteur Institute of Tunis, Tunisia

Short Abstract: The intestinal microbiota is composed of bacteria, archaea, viruses and fungi. Most microbiome studies concern only the bacterial microbiota, and fungi have been largely neglected. Although they represent about 0.1% of the adult gut microbiota, various cases of dysbiosis are related to the overrepresentation of fungal species. Until recently, the majority of mycobiota analyses have relied on culture, which is less sensitive than sequencing approaches. Taxonomic characterization of fungi with ITS regions is still difficult because the available reference databases are incomplete. In this study, we were interested in describing the taxonomic profile of the intestinal mycobiota and assessing the impact of delivery and lactation mode on the composition of these metagenomes during the first month of life. Sequencing data from stool samples of 20 newborns were downloaded from the SRA. Quality control was performed. Sequences were merged, filtered, assembled, and classified by taxonomy using an in-house pipeline. Statistical analysis was performed using various R scripts. Fungi represent 4.38% of the newborn gut microbiota and fall into 2 phyla: Ascomycota and Basidiomycota. In total, 54 genera were identified, with a high prevalence of Candida (12.24%), Fusarium (7.62%), Pyricularia (6.11%) and Naumovozyma (4.93%). No significant difference was observed between the mycobiota of vaginal delivery and that of cesarean delivery. The same trend was observed when comparing the composition at D4 and D21. The mycobiota seems to be influenced by the mode of feeding in the whole cohort. Further research is needed to address the methodological challenges of generating data on this mycobiome.

Presentation 06: Single-cell landscape of nuclear configuration and gene expression during stem cell differentiation and X inactivation

Keywords: multiomics multimodal genomics single-cell genomics nuclear architecture differentiation early developmental
  • Giancarlo Bonora, Genome Sciences, University of Washington, United States
  • Vijay Ramani, Genome Sciences, University of Washington, United States
  • Ritambhara Singh, Brown University, United States
  • He Fang, Department of Laboratory Medicine and Pathology, University of Washington, United States
  • Dana Jackson, Genome Sciences, University of Washington, United States
  • Sanjay Srivatsan, Genome Sciences, University of Washington, United States
  • Ruolan Qiu, Genome Sciences, University of Washington, United States
  • Choli Lee, Genome Sciences, University of Washington, United States
  • Cole Trapnell, Genome Sciences, University of Washington, United States
  • Jay Shendure, Genome Sciences, University of Washington, United States
  • Zhijun Duan, Institute for Stem Cell and Regenerative Medicine, University of Washington, United States
  • Xinxian Deng, Department of Laboratory Medicine and Pathology, University of Washington, United States
  • William Stafford Noble, Genome Sciences, University of Washington, United States
  • Christine Disteche, Department of Laboratory Medicine and Pathology, University of Washington, United States

Short Abstract: Mammalian development is associated with extensive changes in gene expression, chromatin accessibility, and nuclear structure. Here, we follow such changes associated with mouse embryonic stem cell differentiation and X-chromosome inactivation (XCI) by integrating, for the first time, allele-specific data from these three modalities obtained by high-throughput single-cell RNA-seq, ATAC-seq, and Hi-C. Allele-specific single-cell Hi-C shows that the inactive X chromosome has a unique profile in differentiated cells that have undergone XCI, and that this profile is lost at mitosis. Differentiation of embryonic stem cells collected at different time points shows that the onset of XCI is associated with global changes in 3D conformation. Based on trajectory analyses, three distinct nuclear structure states are detected, reflecting discrete and profound simultaneous changes not only to the structure of the X chromosomes but also to that of autosomes during differentiation. Single-cell RNA-seq and ATAC-seq show evidence of a delay in female versus male cells, due to the presence of two active X chromosomes at early stages of differentiation. The onset of the inactive X-specific structure in single cells occurs later than gene silencing, consistent with the idea that chromatin compaction is a late event of X inactivation. Novel computational approaches allow for the effective alignment of single-cell gene expression, chromatin accessibility, and 3D chromosome structure. Our study reveals that long-range structural changes to chromosomes appear as discrete events, unlike progressive changes in gene expression and chromatin accessibility.

Presentation 07: Identifying a Biomarker for Systemic Sclerosis Using Existing Genomic Data

Keywords: Systemic Sclerosis Biomarker Microarray Skin Lungs Blood CCL2 COL18A1 COL4A1 COL5A2 PECAM1 JUN CDH5 IL7R MYLIP WSB1
  • Joyeeta Dutta, University of Toledo, United States
  • Sadik Khuder, University of Toledo, United States

Short Abstract: Systemic Sclerosis (SSc) is a complex autoimmune disease characterized by dysregulation of the immune system, a characteristic vasculopathy, and a severe and often progressive fibrotic process that varies widely in extent. Its pathogenesis is unclear. Discovery of validated biomarkers is crucial for the diagnosis, disease classification, and evaluation of organ involvement and therapeutic response in SSc. We aimed to identify a biomarker for SSc using the publicly available microarray profiles of skin, lungs and blood from SSc patients and healthy controls. We sought common gene expression patterns associated with SSc in skin, lungs, and blood. All data were downloaded from the Gene Expression Omnibus (GEO) database. The top 1000 differentially expressed genes (DEGs) from each dataset, and the DEGs shared among datasets of each tissue type, were identified using R. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed. The protein-protein interaction (PPI) network was constructed and hub genes were identified. A total of 29, 134, and 29 significant DEGs were shared in the skin, lungs, and blood datasets, respectively. Four statistically significant genes (CCL2, COL18A1, COL4A1, and COL5A2) from skin, five (PECAM1, JUN, CDH5, IL7R, and WT1) from lungs, and three (MYLIP, UBE2D1, and WSB1) from blood were identified as hub genes with potential clinical implications for the diagnosis or treatment of SSc. In conclusion, this study identified a number of significant DEGs and 12 hub genes that may serve as important biomarkers for early detection and treatment of SSc.
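
The step of finding DEGs shared among the datasets of one tissue type amounts to a set intersection. The authors report doing this in R; the Python sketch below only illustrates the idea, with a hypothetical function name and placeholder gene identifiers rather than real results.

```python
def shared_degs(deg_lists):
    """Return the genes present in every per-dataset DEG list."""
    gene_sets = [set(genes) for genes in deg_lists]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Placeholder identifiers standing in for the top DEGs of three datasets
# from the same tissue type.
dataset_1 = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
dataset_2 = ["GENE_B", "GENE_C", "GENE_E"]
dataset_3 = ["GENE_C", "GENE_B", "GENE_F"]

common = shared_degs([dataset_1, dataset_2, dataset_3])
print(sorted(common))
```

The shared genes would then feed the enrichment and PPI network steps that the abstract describes.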

Presentation 08: Genetic markers associated with anemia in individuals with sickle cell disease in Tanzania

Keywords: Sickle cell disease Genome Wide Association Studies (GWAS) Anemia
  • Liberata Mwita, Muhimbili University of Health and Allied Sciences, Tanzania
  • Upendo Masamu, Muhimbili University of Health and Allied Sciences, Tanzania
  • Raphael Sangeda, Muhimbili University of Health and Allied Sciences, Tanzania
  • David Dynerman, Public Health Company, United States

Short Abstract: Sickle cell disease (SCD) is a genetic disease and a global health problem that is particularly common among people whose ancestors came from sub-Saharan Africa. All individuals with SCD experience anemia, which increases morbidity and mortality. This research aims to identify genetic variants associated with anemia in individuals with SCD. In the long term, this will contribute to efforts to improve life expectancy by quickly identifying single nucleotide polymorphisms (SNPs) related to anemia in SCD and enabling better prediction of the severity of anemia that an individual will experience, which will in turn enable better preventive treatment. Quality control (QC) of the genome-wide association study (GWAS) data available for our cohort and the association analysis between anemia and the genotype data were performed using PLINK and will be presented. Preliminary results show that the SNPs associated with anemia are located on chromosomes 3, 7 and 12. The details of the SNPs and genes linked to anemia in individuals with SCD will be presented. Imputation and replication of the GWAS data are in progress.

Presentation 09: Phylogenetic and structural analysis of the PX domain containing PhosphoInositide (PI) binding proteins in Kinetoplastida

Keywords: Phosphoinositide binding proteins PX domain Kinetoplastida Phylogenetics
  • Marina V. Petsana, University of Thessaly, Dept. of Computer Science and Biomedical Informatics, Greece
  • Ahmed F. Roumia, University of Thessaly, Dept. of Computer Science and Biomedical Informatics, Greece
  • Pantelis G. Bagos, University of Thessaly, Dept. of Computer Science and Biomedical Informatics, Greece
  • Haralabia Boleti, Hellenic Pasteur Institute, Greece
  • Georgia G. Braliou, University of Thessaly, Dept. of Computer Science and Biomedical Informatics, Greece

Short Abstract: Phosphoinositide (PI) binding proteins play key roles in membrane trafficking by regulating phosphoinositide signaling and metabolism. A characteristic feature of these proteins is the PX (phox) domain, which interacts directly with the PIs of cellular membranes. Although these proteins are well characterized in higher eukaryotes, little is known about their homologues in protozoan parasites. Since parasites undergo transition stages involving multiple membrane changes until they finally become infectious, PI-binding proteins emerge as promising pharmacological targets to combat parasitic diseases. Using the pHMM of the PX domain (PF00787) from Pfam v. 32.1, the Kinetoplastida proteomes from UniProt (Release 2020_04) and the HMMER v. 3.3.1 tool in a Linux environment, we retrieved 137 PX domain-containing proteins from 33 Kinetoplastida proteomes and the domain architectures for 131 of them. Sequence alignment and phylogenetic analysis of these PI-binding proteins using the ClustalW and iTOL tools helped us understand their evolutionary relationships and define subfamilies. Clustering according to domain architecture supported the existence of four main protein groups that, besides PX, contain Vps5/BAR or different protein kinase domains. Moreover, comparison of the sequence motifs of the Kinetoplastida defining feature (core residues, within helices α1 through α2 of the PX domain, that bind PtdIns(3)P) with those of humans revealed high conservation of this structure. Our approach provides an efficient integrated methodology with evolutionary and structural insights to understand parasites’ transition stages so as to rationally design strategies for novel modulators of their life cycle with therapeutic potential.

Presentation 10: Refining a methodology for metaproteomic microbiome research

Keywords: Metaproteomics Mycobiome Mass Spectrometry Protein databases Taxonomic classification
  • Tamlyn Gangiah, University of Cape Town, South Africa
  • Matthys Potgieter, University of Cape Town, South Africa
  • Lindi Masson, Burnet Institute, Australia
  • Nicola Mulder, University of Cape Town, South Africa

Short Abstract: The microbiome comprises a myriad of different microorganisms, which creates a challenge for the taxonomic classification of mass spectrometry data. Accurate peptide identification requires representative sequence databases, which can significantly improve taxonomic and functional annotation results. However, mycobiome research is relatively new, and fungal databases are still in their infancy, with less than 1% of fungal species represented. Accordingly, metaproteomic methods are not optimized for fungal research. Thus, an improved method is needed to increase the sensitivity of analyses and obtain the maximum number of fungal hits during taxonomic classification. Our proposed approach to this challenge involved the manual curation of a fungal pan proteome database. We validated our Pan Proteome database against the results of four other databases, including a manually curated fungal Proteome sequence database, the UniProt Swiss-Prot database, and two databases created using the MetaNovo software pipeline. The Pan Proteome database achieved a comparable MS/MS identification rate with a high percentage of identified peptides shared in common, indicating high sensitivity and specificity using this method. Taxonomic analysis with the Pan Proteome database yielded more fungal assignments and Candida assignments than the other databases. Therefore, the following search strategy is proposed to increase the number of identified fungal peptides: use as a database a selection of fungal pan proteomes known to be present in the microbiome of interest, concatenated to a MetaNovo FASTA file. This approach targets the problem of impaired peptide identification when the search sequence space becomes too large.

International Society for Computational Biology
525-K East Market Street, RM 330
Leesburg, VA, USA 20176
