ISCB Africa ASBCB Conference on Bioinformatics 2017

SESSION 5: Bioinformatics of human genetics and population studies
Oral Presentation Abstracts

Structural and functional effects of nucleotide variation on the Tuberculosis drug metabolizing enzyme human arylamine N-acetyltransferase 1

Ruben Cloete
University of the Western Cape

Additional authors:

Wisdom Akurugu
University of the Western Cape, SANBI

Cedric Werely
Stellenbosch University
Division of Molecular Biology and Human Genetics

Paul van Helden
Stellenbosch University
Division of Molecular Biology and Human Genetics

The human arylamine N-acetyltransferase 1 (NAT1) enzyme plays a vital role in determining the duration of action of amine-containing drugs such as para-aminobenzoic acid (PABA) by influencing the balance between detoxification and metabolic activation of these drugs. Recently, four novel single nucleotide polymorphisms (SNPs) were identified within a South African mixed ancestry population. We modeled the effects of these SNPs on the protein structure using molecular dynamics simulations and stability predictions. These indicated that the protein structures containing E264K and V231G are less thermodynamically stable, while the N245I change has a stabilizing effect. Principal component analysis indicated that the two destabilizing substitutions (E264K and V231G) occupied fewer conformational clusters of folded states than the wild type (WT), which may affect protein function. Furthermore, two of the four novel SNPs, those resulting in the V231G and N245I substitutions, were predicted by both the SIFT and PolyPhen-2 algorithms to affect NAT1 protein function, while the two SNPs resulting in the R242M and E264K substitutions showed contradictory results in the SIFT and PolyPhen-2 analyses. In conclusion, structural methods verified that two non-synonymous substitutions (E264K and V231G) can destabilize the protein structure, in agreement with mCSM predictions; these variants should therefore be tested experimentally for NAT1 activity. These findings could inform a strategy of incorporating genotypic data (i.e., functional SNP alleles) with phenotypic information (slow or fast acetylator) to better prescribe effective treatment using drugs metabolized by NAT1.
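
The comparison of SIFT and PolyPhen-2 calls described above can be sketched as a small consensus check. This is only an illustration, not the authors' pipeline: the variant names and scores are hypothetical placeholders, the SIFT cutoff of 0.05 is the commonly used default rather than a value stated in the abstract, and PolyPhen-2 is represented only by its categorical label.

```python
# Minimal sketch: flag variants on which two predictors agree or disagree.
# All inputs below are hypothetical placeholders, not results from the study.

SIFT_CUTOFF = 0.05  # assumption: commonly used default; lower scores = deleterious

def sift_is_damaging(score, cutoff=SIFT_CUTOFF):
    """SIFT scores range from 0 to 1; scores at or below the cutoff count as damaging."""
    return score <= cutoff

def polyphen_is_damaging(label):
    """Treat both 'possibly' and 'probably' damaging labels as damaging."""
    return label.lower() in {"possibly damaging", "probably damaging"}

def consensus(sift_score, polyphen_label):
    """Return 'damaging', 'benign', or 'conflicting' for one variant."""
    sift = sift_is_damaging(sift_score)
    polyphen = polyphen_is_damaging(polyphen_label)
    if sift and polyphen:
        return "damaging"
    if not sift and not polyphen:
        return "benign"
    return "conflicting"

# Hypothetical example inputs: variant -> (SIFT score, PolyPhen-2 label)
predictions = {
    "variant_1": (0.01, "probably damaging"),
    "variant_2": (0.40, "benign"),
    "variant_3": (0.02, "benign"),
}

for name, (score, label) in predictions.items():
    print(name, consensus(score, label))
```

Variants reported as "conflicting" are the ones for which a structure-based assessment, such as the stability predictions in the abstract, adds the most information.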

Chemical Descriptors Feature Selection Method Using Genetic Algorithms For Building Efficient QSAR Models

Mahmoud ElHefnawy
National Research Center

Additional authors:

Nada Elhussieny
Nile University
Center for informatics

Mohamed Fares
National Research Center, Hydrobiology

A quantitative structure-activity relationship (QSAR) model is a machine learning model used in drug discovery to predict the activity of new compounds from experimental data on other compounds. It is typically built from a selected set of structural features called chemical descriptors; however, the total number of descriptors can be too large to use in an efficient model. The aim of this work is to explore different methods for eliminating descriptors that may be redundant and retaining only those found to be strongly correlated with the attribute to be estimated. In this paper we used a genetic algorithm to identify the best subset of descriptors and hence reduce the computational time spent estimating activity from irrelevant input attributes. For model prediction and validation we used a support vector machine (SVM), where the chemical descriptors are the inputs (Xs) and P is the categorized predicted activity. The dataset had 189 features and 119 observations. The SVM was trained on about 81 observations, and the rest were used for validation. The model achieved 71% accuracy, which can be regarded as acceptable given the size of the dataset and the features available.
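
The descriptor selection described above can be sketched as a small genetic algorithm over boolean descriptor masks. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function names and GA settings are hypothetical, and a nearest-centroid classifier stands in for the SVM so that the example runs without third-party packages.

```python
import random

def make_toy_data(n_samples=60, n_features=12, informative=(0, 1, 2), seed=0):
    """Hypothetical descriptor matrix: only the `informative` columns carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # shift informative descriptors for class 1
        X.append(row)
        y.append(label)
    return X, y

def make_fitness(X, y, n_train, penalty=0.01):
    """Fitness = validation accuracy of a nearest-centroid classifier
    (a stand-in for the SVM) minus a small penalty per selected descriptor."""
    train, valid = range(n_train), range(n_train, len(X))

    def fitness(mask):
        cols = [j for j, keep in enumerate(mask) if keep]
        if not cols:
            return 0.0
        centroids = {}
        for label in (0, 1):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        correct = 0
        for i in valid:
            dist = {label: sum((X[i][j] - c) ** 2
                               for j, c in zip(cols, centroids[label]))
                    for label in (0, 1)}
            correct += (0 if dist[0] <= dist[1] else 1) == y[i]
        return correct / len(valid) - penalty * len(cols)

    return fitness

def genetic_feature_selection(fitness, n_features, pop_size=20, generations=15,
                              mutation_rate=0.05, seed=0):
    """Evolve boolean masks with truncation selection, one-point crossover,
    bit-flip mutation and elitism. The full descriptor set is seeded into the
    population, so the result is never worse than using every descriptor."""
    rng = random.Random(seed)
    population = [[True] * n_features]
    population += [[rng.random() < 0.5 for _ in range(n_features)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]
            children.append([(not g) if rng.random() < mutation_rate else g
                             for g in child])
        population = children
    return max(population, key=fitness)

X, y = make_toy_data()
fitness = make_fitness(X, y, n_train=40)
best_mask = genetic_feature_selection(fitness, n_features=12)
print("selected descriptors:", [j for j, keep in enumerate(best_mask) if keep])
```

To mirror the abstract more closely, the fitness function could instead return the cross-validated accuracy of an SVM, for example scikit-learn's `SVC` scored with `cross_val_score`. Because the masks here are chosen using the validation samples, an unbiased accuracy estimate would need a separate held-out test set.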

Application of a prognostic correlate of TB disease risk in a screen-and-treat clinical trial to prevent TB disease

Stanley Kimbung
South African Tuberculosis Vaccine Initiative and Institute of Infectious Disease and Molecular Medicine, Division of Immunology, Department of Pathology, University of Cape Town

Additional authors:

Adam Penn-Nicholson
South African Tuberculosis Vaccine Initiative and Institute of Infectious Disease and Molecular Medicine
Division of Immunology
Department of Pathology
University of Cape Town

Kate Hadley
South African Tuberculosis Vaccine Initiative and Institute of Infectious Disease and Molecular Medicine
Division of Immunology, Department of Pathology
University of Cape Town

Chris Hikuam
South African Tuberculosis Vaccine Initiative and Institute of Infectious Disease and Molecular Medicine, Division of Immunology, Department of Pathology
University of Cape Town

Mark Hatherill
South African Tuberculosis Vaccine Initiative and Institute of Infectious Disease and Molecular Medicine, Division of Immunology, Department of Pathology
University of Cape Town

Thomas Scriba
South African Tuberculosis Vaccine Initiative and Institute of Infectious Disease and Molecular Medicine, Division of Immunology, Department of Pathology, University of Cape Town

There is an urgent need for a test that can identify persons at highest risk of tuberculosis disease amongst the large reservoir of people with latent Mycobacterium tuberculosis infection. We previously discovered and validated a whole blood transcriptomic signature (consisting of 58 primer-probes detecting 10 reference transcripts and 48 transcripts representing 16 interferon response genes) that prospectively identifies risk of incident tuberculosis disease amongst HIV-uninfected persons, up to one year before disease onset.

Here we aimed to optimize the PCR assay and improve assay throughput. The signature was trimmed to conveniently accommodate the primer-probes in duplicate in a typical 96.96 Fluidigm gene expression platform. Computation of the signature score from PCR cycle threshold results was incorporated into an automated workflow with extensive quality control steps, including data integrity, concordance between qRT-PCR replicates and detection of genomic DNA.
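
One of the quality control steps named above, concordance between qRT-PCR replicates, can be sketched together with a standard delta-Ct normalization against reference transcripts. This is a hedged illustration only: the abstract does not describe how the signature score itself is computed, and the 0.5-cycle concordance threshold, the function names and the transcript identifiers are all hypothetical.

```python
from statistics import mean

MAX_REPLICATE_DELTA = 0.5  # hypothetical concordance threshold, in Ct units

def qc_replicates(ct_a, ct_b, max_delta=MAX_REPLICATE_DELTA):
    """Return the mean Ct of two replicates, or None if either is missing
    or the replicates disagree by more than `max_delta` cycles."""
    if ct_a is None or ct_b is None:
        return None
    if abs(ct_a - ct_b) > max_delta:
        return None
    return (ct_a + ct_b) / 2.0

def delta_ct(sample, reference_ids):
    """Normalize target transcripts against the mean Ct of the reference transcripts.

    `sample` maps transcript id -> (Ct replicate 1, Ct replicate 2).
    Returns (delta-Ct per passing target transcript, list of failed transcript ids).
    """
    passed, failed = {}, []
    for transcript, (ct_a, ct_b) in sample.items():
        ct = qc_replicates(ct_a, ct_b)
        if ct is None:
            failed.append(transcript)
        else:
            passed[transcript] = ct
    references = [passed[t] for t in reference_ids if t in passed]
    if not references:
        raise ValueError("no reference transcript passed replicate QC")
    reference_mean = mean(references)
    deltas = {t: ct - reference_mean
              for t, ct in passed.items() if t not in reference_ids}
    return deltas, failed

# Hypothetical sample: two reference transcripts and two target transcripts.
sample = {
    "REF1": (20.0, 20.2),
    "REF2": (22.0, 21.8),
    "TARGET1": (25.0, 25.1),
    "TARGET2": (24.0, 26.0),  # replicates disagree, so this transcript fails QC
}
deltas, failed = delta_ct(sample, reference_ids={"REF1", "REF2"})
print(deltas, failed)
```

The other checks mentioned in the text, data integrity and detection of genomic DNA, are not covered by this sketch.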

The performance of the reduced correlate of risk (COR) signature, comprising 48 primer-probes representing 11 IFN response genes, was equivalent (AUC 0.835; 95% CI 0.77-0.83) to the original 58 primer-probe version (AUC 0.831; 95% CI 0.77-0.83). The automated workflow effectively excluded deviant assays that did not meet pre-defined quality thresholds and enhanced robustness at the level of assay batch, individual sample or transcript.

We have successfully applied the optimized TB risk signature to >5000 participant samples in the CORTIS clinical trial (Correlate of Risk Targeted Intervention Study), which aims to evaluate diagnostic and prognostic COR performance and to test the efficacy of short-course preventive therapy for TB disease. Trial results will be available in 2018.

Chemoinformatics and in-silico drug development workflow: Enhancing "hits" to "drug candidates" optimization in drug discovery

Samuel Egieyeh
University of the Western Cape

In drug discovery, the journey from “hits” to “drug candidates” may be tedious, long and expensive. A high quality “drug candidate” must exhibit a balance of many properties: potency, physicochemical properties and safety. Therefore, “hit” to “lead” to “drug candidate” optimization requires rational selection of compounds with the highest chance of success and implementation of multi-parameter optimization methods. Here we present two types of in silico drug development workflows that can characterize hits from bioactivity screening, conduct rational hit selection and prioritization, and multi-parameter hit to lead optimization.

The first workflow consists of a series of nodes compiled and executed in KNIME (Konstanz Information Miner), a platform with a user-friendly graphical interface. The second workflow consists of Python scripts in an interactive Jupyter notebook. The Python libraries used are all open-source and freely downloadable: chemoinformatics toolkits (e.g. RDKit) and machine-learning and data-analysis toolkits (e.g. scikit-learn, pandas, NumPy and SciPy).

These workflows were used to analyze antimalarial hit compounds from high-throughput screens (HTS) available in public bioactivity databases. The outputs from the workflows include a prioritized ‘hit’ list for a confirmation assay, unique molecular scaffolds for virtual compound enumeration, structure-activity relationships, drug target prediction and binding/activity efficiency metrics for “hit to lead” optimization.

Conclusions

The chemoinformatics and in-silico drug development workflows provide an opportunity for researchers in drug discovery to analyze and mine useful data from their in-vitro experiments in order to make rational and viable drug design decisions.
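
The binding/activity efficiency metrics mentioned among the workflow outputs can be illustrated with two widely used definitions: ligand efficiency (LE, approximately 1.37 × pIC50 per heavy atom, in kcal/mol) and lipophilic ligand efficiency (LLE = pIC50 − logP). The sketch below is an assumption about which metrics are meant, not a description of the presented workflows, and the hit records are hypothetical.

```python
import math

def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_nm * 1e-9)

def ligand_efficiency(pic50, heavy_atoms):
    """Approximate LE in kcal/mol per heavy atom (1.37 ~ 2.303 * R * T near 300 K)."""
    return 1.37 * pic50 / heavy_atoms

def lipophilic_ligand_efficiency(pic50, logp):
    """LLE (also called LipE): potency corrected for lipophilicity."""
    return pic50 - logp

def rank_hits(hits):
    """Annotate each hit with pIC50, LE and LLE, then sort by LLE, best first."""
    annotated = []
    for hit in hits:
        pic50 = pic50_from_ic50_nm(hit["ic50_nm"])
        annotated.append({
            **hit,
            "pic50": pic50,
            "le": ligand_efficiency(pic50, hit["heavy_atoms"]),
            "lle": lipophilic_ligand_efficiency(pic50, hit["logp"]),
        })
    return sorted(annotated, key=lambda h: h["lle"], reverse=True)

# Hypothetical hits; in practice heavy-atom counts and logP would be computed
# from structures with a chemoinformatics toolkit.
hits = [
    {"id": "hit_A", "ic50_nm": 100.0, "heavy_atoms": 25, "logp": 3.0},
    {"id": "hit_B", "ic50_nm": 10.0, "heavy_atoms": 40, "logp": 5.5},
]
for hit in rank_hits(hits):
    print(hit["id"], round(hit["pic50"], 2), round(hit["le"], 3), round(hit["lle"], 2))
```

In this toy example the less potent but smaller and less lipophilic hit_A ranks first, which is the kind of trade-off multi-parameter optimization is meant to expose. With RDKit, the inputs could be obtained from `Descriptors.HeavyAtomCount` and `Crippen.MolLogP`.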

Common sequence variants affect molecular function more than rare variants

Burkhard Rost
Technical University of Munich

Yannick Mahlich
TUM Informatics

Jonas Reeb
TUM Informatics

Maria Schelling
TUM Informatics

Max Hecht
TUM Informatics

Tjaart Andries Petrus De Beer
Basel University

Yana Bromberg
Rutgers University

Any two unrelated individuals differ by about 10,000 single amino acid variants (SAVs). Do these impact molecular function? Experiments cannot answer this question comprehensively, while state-of-the-art prediction methods can. We predicted the functional impacts of SAVs within humans and of variants between humans and other species. Several surprising results stood out. Firstly, four methods (CADD, PolyPhen-2, SIFT, and SNAP2) agreed within 10 percentage points on the percentage of rare SAVs predicted to have an effect. However, they differed substantially for the common SAVs: SNAP2 predicted, on average, more effect for common than for rare SAVs. Given the large ExAC data sets sampling 60,706 individuals, the differences were extremely significant (p-value < 2.2e-16). We provided evidence that SNAP2 might be closer to reality for common SAVs than the other methods, due to its different focus in development. Secondly, we predicted significantly higher fractions of SAVs with effect between healthy individuals than between species; the difference increased for more distantly related species. The same trends were maintained for subsets of only housekeeping proteins and when moving from exomes of 1,000 to 60,000 individuals. SAVs frozen at speciation might maintain protein function, while many variants within a species might bring about crucial changes, for better or worse.
