ISCB Africa ASBCB Conference on Bioinformatics 2017

SESSION 1: H3ABioNet highlights
Oral Presentation Abstracts


Variants calling optimization: parameter sweep of the GATK best practices pipeline

Presenter:
Azza Ahmed
University of Khartoum

Additional authors:

Gloria Rendon
University of Illinois Urbana-Champaign
Institute for Genomic Biology

Liudmila Sergeevna Mainzer
University of Illinois Urbana-Champaign
Institute for Genomic Biology, National Center for Supercomputing Applications

Victor Jongeneel
University of Illinois Urbana-Champaign
Institute for Genomic Biology

Faisal M. Fadlelmola
University of Khartoum
Centre for Bioinformatics and Systems Biology

The advent of massively parallel sequencing technologies (Next Generation Sequencing, NGS) has changed the landscape of human genetics. Whole Exome Sequencing (WES) is the branch of NGS that focuses on the exonic regions of eukaryotic genomes. Exomes are of interest because they help us understand high-penetrance allelic variation and its relationship to phenotype.

Variant calling is the study of differences in genomic sequence between samples of interest and the reference genome, with the purpose of aiding the understanding of disease (or phenotype) mechanisms and ultimately designing optimal treatment targets (i.e. personalized medicine). Typically, this involves many wet lab assays and procedures for preparing the biological samples, followed by intensive computational processing with many software tools. Errors can creep into the analysis at any of these stages.

When variant calling is carried out in large cohorts, errors are exacerbated, and the variants observed at the level of the individual are lost in joint genotyping. Besides experimental wet lab errors, the called variants are subject to biases due to the choice of software, the configuration of the analysis pipeline, and the individual parameters of each tool used. Also intended as a pilot phase of a collaboration with Mayo Clinic in Florida, USA, this talk provides insights into the effect of parameter configurations in a variant calling pipeline following GATK best practices, from a mathematical point of view and with experimental results, with the objective of identifying optimization targets in such a setup. Computational challenges relating to running the pipeline in this context are also highlighted.
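
As an illustration of what a parameter sweep involves, the sketch below enumerates pipeline configurations from a small grid. It is not the authors' code: the parameter names and values are hypothetical placeholders rather than GATK options, and the functions only generate configurations without running any tool. It contrasts a full-factorial sweep with a one-at-a-time sweep around a default configuration.

```python
from itertools import product

# Hypothetical parameter grid: the names and values below are placeholders,
# not settings taken from the talk or from the GATK documentation.
PARAMETER_GRID = {
    "aligner_seed_length": [19, 23, 27],
    "min_base_quality": [10, 20],
    "min_call_confidence": [10.0, 30.0],
}


def sweep(grid):
    """Yield one dict per combination of parameter values (full factorial)."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def one_at_a_time(grid, defaults):
    """Yield configurations that change a single parameter from the defaults."""
    for name in sorted(grid):
        for value in grid[name]:
            if value != defaults[name]:
                yield {**defaults, name: value}


if __name__ == "__main__":
    # Assumption for the example: the first value of each list is the default.
    defaults = {name: values[0] for name, values in PARAMETER_GRID.items()}
    full = list(sweep(PARAMETER_GRID))
    single = list(one_at_a_time(PARAMETER_GRID, defaults))
    print(len(full), "full-factorial runs;", len(single), "one-at-a-time runs")
```

The full-factorial count grows multiplicatively with each added parameter, which is one reason the computational cost of such sweeps becomes a concern in practice.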


Comparative analysis of existing pipelines for assessment of arbuscular mycorrhizal fungal biodiversity in natural and commercial Rooibos (Aspalathus linearis) and Honeybush (Cyclopia intermedia) soil samples

Presenter:
Herna de Wit
Rhodes University

A mutually beneficial association exists between the fungal phylum Glomeromycota and the roots of higher plants. Arbuscular mycorrhizal (AM) fungi perform various ecological functions that contribute to the host plant's fitness, in exchange for host photosynthetic carbon. AM fungi are unculturable and, although their many benefits have been well documented, their biodiversity has not been studied in detail.

Two types of plant hosts were studied: Rooibos (Aspalathus linearis) and Honeybush (Cyclopia intermedia). Both are popular teas in South Africa and have a growing worldwide market due to their medicinal properties. Soil samples were collected from both natural and commercial plant populations, and ribosomal (18S) sequences were generated using 454 pyrosequencing at Rhodes University. A suitable pipeline had to be identified to process the sequence data and determine changes in AM fungal biodiversity.

As AM fungi are sensitive to many agricultural practices, a higher diversity and abundance of AM fungi was expected in natural than in commercial soil samples. None of the bioinformatics pipelines was able to provide suitable AM fungal biodiversity data, due to the lack of available databases. A suitable pipeline for effectively analysing 18S AM fungal data needs to be designed in order to further investigate changes in the biodiversity of these important soil fungi.


A Metagenomic Approach to the Characterization of the Microbiomes of Sickle Cell Disease-associated Leg Ulcers

Presenter:
Deborah Fasesan
National Biotechnology Development Agency

Additional authors:

Jessica Holmes
University of Illinois at Urbana-Champaign
High Performance Biological Computing Group

Jenny Zadeh
University of Illinois at Urbana-Champaign
High Performance Biological Computing Group

Ayodele Ogunkeyede
University of Ilorin
Department of Surgery

Gladys Falusi
University of Ibadan
Institute for Advanced Medical Research and Training

Victor Jongeneel
University of Illinois at Urbana-Champaign
High Performance Biological Computing Group

Oyekanmi Nash
National Biotechnology Development Agency
Centre for Genomics Research and Innovation

Scientific improvements in microbial evaluation have led to more accurate analysis of microbiomes. Such improvements include metagenomics, a field which explores microorganisms in their natural habitat without the need for conventional plate cultivation. Microbiome studies involve the use of 16S rRNA analysis for bacterial composition and can also include examination of ITS regions to determine the fungal community within an environment.

In this study, the microbiomes of leg ulcers of sickle cell disease subjects were characterized in relation to the outcome of compression therapy (CT). DNA was extracted from wound samples and amplified using appropriate bacterial and fungal primers for selected regions (the V3-V5 and V4 segments of the bacterial 16S rRNA gene and the ITS 3-4 region of the fungal genomes), and libraries were constructed for paired-end sequencing. The generated sequences were then subjected to microbiome analysis using the metagenomic pipeline QIIME (Quantitative Insights Into Microbial Ecology).

Bacterial taxonomic classification indicated a decline in the occurrence of some pathogenic genera after four weeks of CT. Although Stenotrophomonas, Bradyrhizobium and Burkholderia were seen in samples collected after treatment, results showed a significant reduction in wound size after treatment. Fungal taxonomic classification revealed the dominance of Aspergillus, Candida and Wickerhamomyces, along with plant-like material in samples collected before treatment. This study highlights the possible positive contribution of CT to the healing of sickle cell leg ulcers. The association between these microorganisms and the treatment method applied, as well as the effect of combining antibiotics with CT on sickle cell leg ulcers, needs further investigation.


A Uganda innovation hub for deep analytics, disease surveillance, discovery science and translation

In recent years, health data has seen a dramatic increase in volume as well as in complexity and diversity. Mining this vast pool of complex health data has the potential, among other things, to provide better insights into disease outbreak patterns, inform therapy target discovery and support health policy by improving our understanding of health and disease at a population level. The generation of data resources within the continent has created the need for initiatives that expand expertise in health data management and analytics, to ensure that Africa can fully benefit from health data.

The MRC/UVRI Uganda Medical Informatics Centre (UMIC) was created as an initiative to achieve broader development goals by harnessing the data and analytics revolution in Africa. In this context, the UMIC was set up to capitalise on the potential benefits of large-scale health data by combining expertise in genomics, analytics, High Performance Computing (HPC) and DevOps with the routine use of high-end HPC resources capable of handling the current complexity and size of health care data in Africa.

Under this framework the UMIC supports major Pan-African and local research initiatives in medical genomics and bioinformatics such as The Genome Diversity in Africa Project (GDAP), TrypanoGEN (TPG), The PANGEA-HIV Consortium, The International AIDS Vaccine Initiative and The MRC/UVRI Uganda Research Unit on AIDS (MRC/UVRI).

As Global Health paradigms begin to shift, moving the focus from treatment to prevention, the UMIC intends to provide the tools to impact both foreign as well as domestic policy development.


Integrative analysis of cervical cancer multi-omics data: Towards novel biomarkers discovery

Presenter:
Somia Mohammed
Centre for Bioinformatics & Systems Biology

Additional authors:

Faisal Fadlelmola
Centre for Bioinformatics & Systems Biology (CBSB)

Alia Benkahla
Institut Pasteur de Tunis
bioMathematics and bioStatistics (BIMS)

Mohssin Abdalla
Faculty of Mathematical Sciences
Computer Science

Cervical cancer is the second most common type of cancer in women worldwide. Due to poor access to screening and treatment services, more than 90% of deaths occur in women living in low- and middle-income countries. Although a diagnostic tool (the Pap test) is widely available, cervical cancer incidence remains high worldwide, especially in developing countries. This is attributed largely to the sensitivity limitations of the Pap test and to the unavailability of testing in developing countries.

In this study, multi-omics data analysis was evaluated by integrating array-CGH (aCGH), SNP array and gene expression data in cervical cancer. A meta-analysis approach was adopted, whereby cervical cancer multi-omics datasets (aCGH, gene expression and SNP array) from the Gene Expression Omnibus (GEO) were selected to set up the analysis. Each dataset was analyzed separately in R. The datasets were subjected to statistical tests (e.g. ANOVA and t-test).

Chromosomal regions showed significant amplification in most of the samples on chromosomes 1, 3 and 19, whereas losses were found on chromosomes 4, 11 and 13. The aCGH data are searched for genes with a known regulatory role whose copy number is altered in the samples. The matched transcriptomics data are then examined to see whether a gene's altered copy number is associated with a concurrent change in the gene's expression, thus adding weight to the argument that the gene may be contributing to cervical cancer. The list of genes that show a strong aCGH/expression association is checked against various public databases (e.g. OMIM) for further in silico analysis and validation.
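
The copy number/expression matching step described above can be sketched as follows. This is a minimal illustration, not the authors' R analysis: it assumes Pearson correlation as the measure of association, an arbitrary threshold of 0.8, and hypothetical gene names and values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def concordant_genes(copy_number, expression, threshold=0.8):
    """Return genes whose copy number and expression correlate at or above threshold.

    Both arguments map gene name -> list of per-sample values, with samples
    in the same order in both tables (matched samples). The threshold is an
    illustrative assumption, not a value from the study.
    """
    hits = {}
    for gene in copy_number.keys() & expression.keys():
        r = pearson(copy_number[gene], expression[gene])
        if r >= threshold:
            hits[gene] = r
    return hits


if __name__ == "__main__":
    # Made-up toy data for two hypothetical genes across five matched samples.
    copy_number = {"GENE_A": [2, 3, 4, 2, 5], "GENE_B": [2, 2, 3, 4, 2]}
    expression = {
        "GENE_A": [1.0, 1.6, 2.1, 0.9, 2.8],
        "GENE_B": [1.5, 0.4, 1.1, 0.6, 1.3],
    }
    for gene, r in concordant_genes(copy_number, expression).items():
        print(gene, round(r, 3))
```

Only genes present in both tables are compared, which mirrors the requirement that the aCGH and transcriptomics data come from matched samples.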


DREAM of Malaria: an open ecosystem to predict drug sensitivity in malaria parasites

Presenter:
Amel Ghouila
Institut Pasteur de Tunis

Additional authors:

Katrina Button-Simons
University of Notre Dame
Eck Institute for Global Health

Jean-Baka Domelevo Entfellner
University of the Western Cape
South African National Bioinformatics Institute

Sumir Panji
University of Cape Town
Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine

Geoffrey Siwo
IBM Research Africa
Johannesburg Lab

Sage Davis
University of Notre Dame
Eck Institute for Global Health

Faisal Fadlelmola
University of Khartoum
Centre for Bioinformatics and Systems Biology, Faculty of Science

The DREAM of Malaria Hackathon Participants:

Michael Ferdig
University of Notre Dame
Eck Institute for Global Health

Nicola Mulder
University of Cape Town
Computational Biology Division
Department of Integrative Biomedical Sciences
Institute of Infectious Disease and Molecular Medicine

Collaborative analysis challenges such as DREAM have brought considerable computational expertise to cancer and autoimmune disease research, showing that aggregating results across methods outperforms single methods. Unfortunately, even interdisciplinary collaborations often lack the resources necessary to employ multiple complex analyses. To encourage individual labs to create datasets for challenges, H3ABioNet, IBM Africa and the Ferdig lab (University of Notre Dame) teamed up to conduct a hackathon exploring whether the transcriptomic profile can be used to predict drug sensitivity in malaria parasites. The hackathon had two goals: to prepare data for analysis and to determine whether the dataset contains sufficient signal to predict drug susceptibility.

Twenty-three researchers and Ph.D. students from across Africa participated in the hackathon in September 2016. Three teams were formed with complementary areas of expertise and diverse skills. Initial exploration of the data identified unexpected issues. One team identified eight outliers originating from the same microarray, which were then re-hybridized, correcting the issue. Another team demonstrated that genome-wide differences existed between and among isolates and that intra-erythrocytic developmental cycle (IDC) staging was a significant source of biological variation.

Teams designed regression models (ensemble partial least squares, regularized regression, random forests, etc.) to predict drug sensitivity from gene expression features and covariates. Some predictive models gave promising results (e.g. training MSE = 0.072 and test MSE = 0.21, with 32 features selected). Overall, these results demonstrate that it is challenging but possible to extract signal from this dataset to predict the drug sensitivity of malaria isolates.
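
To make the training versus test MSE comparison concrete, the sketch below fits a one-feature regularized (ridge) regression and evaluates it on held-out data. It is an illustration only, not one of the hackathon models: the data are made up, a single feature stands in for the selected expression features, and the penalty value is an arbitrary assumption.

```python
def fit_ridge_1d(xs, ys, penalty):
    """Fit y = intercept + slope * x with an L2 penalty on the slope only."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)  # penalty = 0 gives ordinary least squares
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mse(model, xs, ys):
    """Mean squared error of a fitted (intercept, slope) model on a dataset."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Made-up values: x is a hypothetical expression feature, y a drug response.
    train_x, train_y = [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.1, 2.9]
    test_x, test_y = [4.0, 5.0], [4.2, 4.8]
    model = fit_ridge_1d(train_x, train_y, penalty=0.5)
    print("training MSE:", mse(model, train_x, train_y))
    print("test MSE:", mse(model, test_x, test_y))
```

A test MSE noticeably larger than the training MSE, as in the figures reported above, is the usual sign that a model fits the training isolates better than it generalizes to unseen ones.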


Designing a course model for distance-based online bioinformatics training in Africa: the H3ABioNet experience

Presenter:
Shaun Aron
University of the Witwatersrand
Sydney Brenner Institute for Molecular Bioscience

Additional authors:

Kim Gurwitz
University of Cape Town
Computational Biology Division

Sumir Panji
University of Cape Town
Computational Biology Division

Suresh Maslamoney
University of Cape Town
Computational Biology Division

Pedro L. Fernandes
Instituto Gulbenkian de Ciência
Bioinformatics

David P. Judge
Independent Bioinformatics Training Specialist
Bioinformatics

Amel Ghouila
Institut Pasteur de Tunis
Laboratory Transmission, Control and Immunobiology of Infections

Jean-Baka Domelevo Entfellner
University of the Western Cape
South African National Bioinformatics Institute/MRC Unit for Bioinformatics Capacity Development

Fatma Z. Guerfali
Institut Pasteur de Tunis
Laboratory Transmission, Control and Immunobiology of Infections

Colleen Saunders
University of the Western Cape
South African National Bioinformatics Institute/MRC Unit for Bioinformatics Capacity Development

Ahmed Mansour
Zagazig University
Genetics Department

Samson P. Salifu
Kwame Nkrumah University of Science and Technology
Department of Biochemistry and Biotechnology/Kumasi Centre for Collaborative Research

Rehab Ahmed
University of Khartoum/Future University of Sudan
Centre for Bioinformatics and Systems Biology

Ruben Cloete
University of the Western Cape
South African National Bioinformatics Institute/MRC Unit for Bioinformatics Capacity Development

Jonathan Kayondo
Uganda Virus Research Institute
Entomology

A distance-based Introduction to Bioinformatics course is run by H3ABioNet, a pan-African bioinformatics network for H3Africa. In its second iteration this year, the free-of-charge, 3-month, skills-based course teaches the basics of various bioinformatics analyses. To increase access across Africa, it makes use of multiple education delivery methods: face-to-face learning, distance learning and open educational resources. Participants attend local classrooms face-to-face (27 classrooms hosting roughly 600 participants in total across 12 African countries in 2017), where they receive additional support through interaction with volunteer teaching assistants and with peers. During these face-to-face sessions, classrooms watch open-access, pre-recorded and downloaded lectures prepared by expert African bioinformatics trainers. Classrooms also sign in to a virtual classroom that links them all to each other and to the trainer during biweekly contact sessions. Further, course participants and volunteer local staff engage via online forums hosted on the course management platform. Additional features of the course this year, developed out of an extensive review of last year's iteration, include: staff training at local classrooms; promoting previous-year attendees to trainee teaching assistants; consolidation sessions; and encouraging engagement within and across classrooms as well as with local bioinformatics communities.

