QUANTITATIVE STRUCTURE-TOXICITY RELATIONSHIP (QSTR) STUDY OF POLYCHLORINATED DIBENZOFURANS USING QUANTUM CHEMICAL DESCRIPTORS
Sabitu Babatunde Olasupo * 1, Adamu Uzairu 2 and Balarabe Sagagi 1
Department of Chemistry 1, Kano University of Science and Technology, Wudil Kano, Nigeria.
Department of Chemistry 2, Ahmadu Bello University Zaria, Nigeria.
ABSTRACT: One of the important aspects of modern toxicology research is the prediction of the toxicity of environmental pollutants from their molecular structure. A Quantitative Structure-Toxicity Relationship (QSTR) study was applied to a dataset of 35 polychlorinated dibenzofurans (PCDFs) in order to investigate the relationship between the toxicities of the compounds and their structures. The molecular descriptors were obtained at the Density Functional Theory (DFT) B3LYP/6-31G* level of calculation. The QSTR model was built using the Genetic Function Algorithm (GFA) method. The model with the best statistical significance (N = 24, Friedman LOF = 0.361, R2 = 0.963, R2adj = 0.955, R2cv = 0.889, R2pred = 0.8286, P95% < 0.05) was selected. The accuracy of the model was evaluated through leave-one-out (LOO) cross-validation, external validation using test set molecules, Y-randomization and applicability domain techniques. The results are expected to be useful in predicting the toxicity of these chemicals and of other congeneric compounds that fall within the model's applicability domain, and in identifying the structural features responsible for that toxicity.
Keywords: GFA, DFT, PCDFs, QSTR, Toxicity, Descriptors.
INTRODUCTION: Quantitative risk assessment is becoming increasingly important in modern society and is slowly being incorporated into the legislation of different countries. For instance, the European Union (EU) has introduced the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) program for the assessment of the human and environmental risk of all chemicals that are produced or imported in amounts greater than 1 ton per year.
One of the important aspects of modern toxicology research is the prediction of toxicity of environmental pollutants from their molecular structure. The potential toxicity of compounds could be assessed on the basis of a wide variety of physicochemical and biological properties 1. These physicochemical and biological properties of molecules constitute their molecular descriptors.
Polychlorinated dibenzofurans (PCDFs) are polychlorinated aromatic compounds that represent a group of environmental contaminants known for their ubiquitous distribution, resistance to biological and chemical degradation, high toxicity and bioaccumulation 2. They can have a significant impact on the health and well-being of humans and animals 2.
The health effects of long exposure to these compounds include liver enlargement and lesions, immunotoxicity, a wasting syndrome, spleen atrophy, carcinogenesis, endocrine disruption and, in extreme cases, death 3. In addition, several persistent organic pollutants (e.g. PCDFs) are suspected to contribute to the increasing prevalence and risk of type 2 diabetes 4.
Polychlorinated dibenzofurans (PCDFs) are mainly formed as by-products of various industrial processes and of the incomplete combustion of wastes, such as medical or municipal waste incineration, including the burning of many materials that contain chlorine and polychlorinated chemicals 5-6.
Quantitative structure-activity relationship (QSAR/QSTR) modelling, an important area of chemometrics, has been the subject of a series of investigations 7. In order to ensure a safer environment and quicker estimation of the environmental behaviour of PCDFs, quantitative structure-toxicity relationship (QSTR) models, which correlate and predict toxicity data of compounds from their molecular structural descriptors, have been developed over the years. They provide a valuable approach to research into the toxicity of compounds without necessarily embarking on conventional experiments that are laborious, time consuming and expensive. QSTR has been widely applied to evaluate and predict the toxicity of chemicals 8. Previous studies have shown that reliable QSTR models are not only applied to predict toxicity and provide basic data for risk assessment, but are also used to explain toxicity mechanisms 9. The alternative hypothesis of this study is:
The magnitude of the observed toxicity, log (1/EC50), of polychlorinated dibenzofurans (PCDFs) is a direct function of the empirical properties or theoretical parameters that make up the descriptors of the total chemical structure of the compounds under investigation. The null hypothesis of this research is:
The observed toxicity, log (1/EC50), of polychlorinated dibenzofurans (PCDFs) is independent of the descriptors of their total chemical structures. The aim of this study is to build robust and rational Genetic Function Approximation (GFA) based QSTR models to predict the toxicity of PCDFs by exploring the correlations between the experimental log (1/EC50) of the compounds and their calculated molecular descriptors. It is envisaged that the information in this study will provide a fast, economical, more environmentally friendly and less time-consuming technique for assessing the toxicity of PCDFs and of other related toxic polychlorinated aromatic chemicals and organic pollutants that could endanger our environment.
MATERIALS AND METHODS: The materials used in this study include a Dell® computer system (Intel Pentium, 4.80 GHz processor, 8 GB RAM) running the Microsoft Windows 7 Ultimate operating system, Spartan 14 V.1.1.0, ChemDraw Ultra V.12.0, the PaDEL-Descriptor toolkit, Microsoft Office Excel 2013, Materials Studio (modeling and simulation software) version 7.0, and Dataset Division GUI v 1.2 software. The steps followed in the QSTR study are presented in the flowchart in Fig. 1.
FIG. 1: STEPS INVOKED IN THE QSAR STUDY
Data collection: A data set of 35 polychlorinated dibenzofurans (PCDFs) used for the QSTR analysis was selected from the literature 10. The chemical structures and experimental log (1/EC50) values of the studied compounds are presented in Table 1.
TABLE 1: CHEMICAL STRUCTURES AND EXPERIMENTAL LOG (1/EC50) VALUES OF (PCDFS)
Molecular optimization and descriptors calculation: Optimization is the process of finding the equilibrium or lowest energy geometry of molecules. The chemical structure of each compound was drawn with ChemDraw Ultra (version 12.02) 11 and subsequently imported into the Wavefunction program Spartan '14 (version 1.2.2) 12 for structural minimization. The geometries of all 35 PCDFs were optimized by means of Density Functional Theory (DFT) using the B3LYP level of theory and 6-31G* as the basis set. The molecular descriptors were calculated using the PaDEL-Descriptor toolkit and the Spartan '14 software. Molecular descriptors are numerical values that describe properties of molecules and are obtained from a well-defined algorithm or experimental procedure 13. Various 0D, 1D, 2D and 3D descriptors were calculated, and the most significant ones were identified using the Genetic Function Approximation (GFA) algorithm.
Data set division into training and test set: The training set comprises the molecules used in model development, while the test set is made up of molecules not used in building the model; these are used in the external validation of the model, i.e. the evaluation of its predictive ability. Dataset Division GUI v 1.2 software was used to divide the data set of the studied compounds into a training set of 24 PCDFs (about 70%) and a prediction set (test set) of 11 PCDFs (about 30%).
Genetic function algorithm and model building: In this study, the Genetic Function Approximation (GFA) algorithm was employed to build the models. GFA is a search method, based on the principles of Darwinian evolution, for finding exact or approximate solutions to optimization and search problems 14. A distinctive feature of GFA is that it generates a population of equations rather than a single equation, as most other statistical methods do. The range of variation in this population gives added information on the quality of fit and the importance of the descriptors 15. The fitness function used to estimate the quality of the models was Friedman's Lack of Fit (LOF), reported alongside the leave-one-out cross-validated correlation coefficient (Q2LOO). LOF is calculated by equation 2.1:
$$\mathrm{LOF} = \frac{\mathrm{LSE}}{\left(1 - \frac{c + d \cdot p}{M}\right)^{2}} \qquad (2.1)$$
where c is the number of basis functions, d is the smoothing parameter, M is the number of samples in the training set, LSE is the least square error and p is the number of features contained in all basis functions 16.
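To make the role of each quantity concrete, the short sketch below evaluates Friedman's lack-of-fit measure in its commonly quoted form, LOF = LSE / (1 - (c + d·p)/M)^2. This form is assumed here rather than taken from the software used in the study, and the function name and all input numbers are hypothetical illustrations, not values from this work.

```python
def friedman_lof(lse, c, d, p, m):
    """Friedman lack-of-fit in its commonly quoted form (assumed here).

    lse: least square error of the model
    c:   number of basis functions
    d:   smoothing parameter
    p:   number of features contained in all basis functions
    m:   number of samples in the training set
    """
    penalty = 1.0 - (c + d * p) / m
    if penalty <= 0:
        raise ValueError("model is too complex for the number of samples")
    return lse / penalty ** 2


# Hypothetical inputs: the same error, but twice as many terms in the second model.
lof_small = friedman_lof(lse=0.20, c=4, d=0.5, p=4, m=24)
lof_large = friedman_lof(lse=0.20, c=8, d=0.5, p=8, m=24)
print(lof_small, lof_large)
```

For equal LSE, the model with more terms receives the larger LOF, which is how the measure discourages over-fitting during the genetic search.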
Validation of developed model: The predictive ability of the developed QSTR model was evaluated using both internal and external statistical validation parameters. The validation parameters were compared with the minimum recommended values for a generally acceptable QSAR/QSTR model proposed by Ravinchandran et al 17, shown in Table 2.
TABLE 2: VALIDATION PARAMETERS FOR A GENERALLY ACCEPTABLE QSAR MODEL
|S. no.||Symbol||Name||Value|
|1||R2||Coefficient of determination||≥ 0.6|
|2||Q2||Cross-validation coefficient||> 0.5|
|3||R2 pred.||Coefficient of determination for external test set||≥ 0.6|
|4||R2 adj||Adjusted squared correlation coefficient||> 0.5|
|5||P (95%)||P-value at the 95% confidence level||≤ 0.05|
|6||N ext. test set||Minimum number of external test set compounds||≥ 5|
|7||R2 – Q2||Difference between R2 and Q2||≤ 0.3|
Source: Ravinchandran et al 17.
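The thresholds in Table 2 can be checked mechanically. The sketch below, whose function and key names are hypothetical, applies them to the statistics reported in this paper for the selected model (R2 = 0.963, R2adj = 0.955, R2cv = 0.889, R2pred = 0.8286, 11 test compounds). It also recomputes R2adj from R2 with the usual formula for 24 training compounds and 4 descriptors; that the authors used this formula is an assumption. The P (95%) criterion is left out because the paper reports it only as "< 0.05".

```python
def adjusted_r2(r2, n, p):
    """Usual adjusted R2 for n samples and p descriptors (assumed formula)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def check_model(stats):
    """Return a pass/fail flag for each numeric criterion of Table 2."""
    return {
        "R2 >= 0.6": stats["r2"] >= 0.6,
        "Q2 > 0.5": stats["q2"] > 0.5,
        "R2pred >= 0.6": stats["r2_pred"] >= 0.6,
        "R2adj > 0.5": stats["r2_adj"] > 0.5,
        "N ext >= 5": stats["n_test"] >= 5,
        "R2 - Q2 <= 0.3": stats["r2"] - stats["q2"] <= 0.3,
    }


# Values reported in the paper for the selected model.
reported = {"r2": 0.963, "q2": 0.889, "r2_pred": 0.8286,
            "r2_adj": 0.955, "n_test": 11}

results = check_model(reported)
r2_adj_recomputed = adjusted_r2(reported["r2"], n=24, p=4)

for criterion, passed in results.items():
    print(f"{criterion}: {'pass' if passed else 'fail'}")
print("recomputed R2adj:", round(r2_adj_recomputed, 3))
```

The recomputed R2adj rounds to the reported 0.955, which suggests the reported statistics are internally consistent.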
Applicability Domain (AD): The model was further validated by applying the Williams plot, the plot of the standardized residuals versus the leverage, as shown in Fig. 4. This was used to visualize the applicability domain (AD) 8. Leverage indicates a compound's distance from the centroid of X. The leverage of a compound in the original space is defined as:
$$h_i = x_i^{T} \left(X^{T} X\right)^{-1} x_i$$
where $x_i$ is the descriptor vector of the considered compound and X is the descriptor matrix derived from the training set descriptor values.
The warning leverage (h*) is defined as:
where n is the number of training compounds and p is the number of predictor variables.
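As a minimal sketch of the leverage calculation described above, the code below computes h = x^T (X^T X)^-1 x for each compound using only the Python standard library. The descriptor values are invented for illustration, the function names are hypothetical, and the warning threshold is passed in by the caller rather than derived, so the value used in the example is also hypothetical.

```python
def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Bring the largest remaining entry of this column onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]  # zero here would mean X^T X is singular
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def leverages(x_train, x_query):
    """Leverage of each query compound relative to the training matrix."""
    p = len(x_train[0])
    xtx = [[sum(row[i] * row[j] for row in x_train) for j in range(p)]
           for i in range(p)]
    return [sum(xi * zi for xi, zi in zip(x, solve(xtx, x))) for x in x_query]


def outside_domain(h_values, h_star):
    """Indices of compounds whose leverage exceeds the warning value."""
    return [i for i, h in enumerate(h_values) if h > h_star]


# Hypothetical training matrix: 6 compounds, 2 descriptors.
x_train = [[0.9, 0.2], [1.0, 0.4], [1.1, 0.5],
           [1.0, 0.6], [0.9, 0.8], [1.2, 2.5]]

h_train = leverages(x_train, x_train)
flagged = outside_domain(h_train, h_star=0.9)  # hypothetical threshold
print([round(h, 3) for h in h_train], flagged)
```

The last compound lies far from the others in descriptor space and therefore carries the highest leverage. The training-set leverages sum to the number of descriptors, which is a convenient sanity check on any implementation.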
TABLE 5: MODELS GENERATED BY GFA
|S. no.||Equation||Definition of terms|
|1.||Y = 67.283 * X273 + 30.319 * X388 + 71.368 * X646 - 31.859 * X850||X273 : MATS2e; X388 : SpDiam_Dzv; X646 : SHBa; X850 : XLogP|
|2.||Y = 70.468 * X273 + 70.901 * X646 + 27.833 * X810 - 25.951 * X850||X646 : SHBa; X810 : SpDiam_D|
|3.||Y = 12.864 * X55 + 55.475 * X68 + 44.384 * X273 - 21.059 * X665||X55 : ATS5i; X68 : AATS0m; X273 : MATS2e; X665 : maxwHBa|
|4.||Y = 404.661 * X109 + 49.204 * X273 + 415.328 * X431 - 17.126 * X665||X109 : AATS5i; X273 : MATS2e; X431 : SM1_Dzi; X665 : maxwHBa|
|5.||Y = 39.983631205 * X97 + 4.055892341 * X235 + 43.096234880 * X273 - 19.022551018 * X665||X97 : AATS2p; X273 : MATS2e; X665 : maxwHBa|
TABLE 6: STATISTICAL PARAMETERS OF THE BEST MODEL
Based on statistical significance, model 1 was chosen as the best model.
TABLE 7: A BRIEF DESCRIPTION OF THE SELECTED DESCRIPTORS OF THE BEST MODEL-1
|Descriptor||Regression coefficient||Description||Descriptor Class|
|MATS2e||67.283||Moran autocorrelation - lag 2 / weighted by Sanderson electronegativities||Autocorrelation Descriptor|
|SpDiam_Dzv||30.319||Spectral diameter from Barysz matrix / weighted by van der Waals volumes||Barysz Matrix Descriptor|
|SHBa||71.368||Sum of E-States for (strong) hydrogen bond acceptors||Electrotopological State Atom Type Descriptor|
TABLE 8: COMPARISON OF EXPERIMENTAL LOG (1/EC50) AND PREDICTED LOG (1/EC50) OF TRAINING SET MOLECULES BY MODEL 1
|Predicted log (1/EC50)||Residual log (1/EC50)|
FIG. 2: PLOT OF PREDICTED (TRAINING & TEST SETS) VERSUS THE OBSERVED LOG (1/EC50) VALUES
FIG. 3: SCATTER PLOT OF THE EXPERIMENTAL AND PREDICTED LOG (1/EC50) OF TEST SET MOLECULES
FIG. 4: WILLIAMS PLOT OF THE MODEL 1
TABLE 9: THE RESULTS OF Y-RANDOMIZATION OF THE TRAINING SET
TABLE 10: RANDOM MODELS PARAMETERS
|Average R :||0.370|
|Average R^2 :||0.162|
|Average Q^2 :||-16.107|
DISCUSSIONS: Model 1 is the best GFA-derived QSTR model for predicting the log (1/EC50) of PCDFs. Its statistics conform to the standards shown in Table 2: N = 24, Friedman LOF = 0.361, R2 = 0.963, R2adj = 0.955, R2cv = 0.889, R2pred = 0.8286, P95% < 0.05. This confirms the robustness of the model. Fig. 2 shows the predicted versus the observed log (1/EC50) values of the training and test set molecules. The high linearity of this plot indicates sound agreement between the experimental and predicted values, indicative of the high internal accuracy of the model. Likewise, Fig. 3 shows the experimental and predicted log (1/EC50) values of the test set molecules.
The high linearity of the plot is indicative of the excellent external predictive power of the model. The comparison of the experimental and predicted log (1/EC50) of the compounds is presented in Table 8. The predictive ability of model 1 is evidenced by the low residual values observed in the table. The P-value of the optimized model at the 95% confidence level is < 0.05. This shows that the alternative hypothesis, that the magnitude of the observed toxicity of PCDFs is a direct function of the descriptors of their total chemical structures, takes preference over the null hypothesis, which states otherwise.
The statistical significance of the relationship between the toxicity of PCDFs and their molecular descriptors was further demonstrated by the Y-randomization procedure. The results of the Y-randomization test and the parameters of the random models are shown in Tables 9 and 10 respectively. The low R2 and Q2 values obtained show that the optimized model is robust and was not obtained through chance correlation. The fact that the cR2p value of the model is > 0.5, as reported in Table 10, is good confirmation that the model is robust and very reliable 18.
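The cR2p criterion is commonly computed as cR2p = R × sqrt(R2 - Rr2), where Rr2 is the average R2 of the randomized models. Assuming that definition, the sketch below applies it to the R2 of the selected model (0.963) and the average random-model R2 listed in Table 10 (0.162). The function name is hypothetical, and the result is an illustration that need not equal the value the authors obtained.

```python
import math


def c_r2p(r2, r2_random_mean):
    """cR2p = R * sqrt(R2 - mean R2 of the randomized models) (assumed form)."""
    return math.sqrt(r2) * math.sqrt(r2 - r2_random_mean)


# R2 of the selected model and the average R2 of the random models (Table 10).
value = c_r2p(0.963, 0.162)
print(round(value, 3), value > 0.5)
```

Under this definition the criterion stays well above the 0.5 threshold whenever the randomized models explain little of the variance, as is the case here.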
Since model 1 cannot predict the toxicity of all compounds in the universe, its applicability domain was determined using the Williams plot shown in Fig. 4. All the compounds in the test set fall inside the domain of the GFA model (the warning leverage h* = 0.40). Only two compounds in the training set have a leverage higher than the warning h* value, as shown in the plot; these can be regarded as structural outliers. This implies that the model can be successfully applied to this series of polychlorinated dibenzofurans.
Significance of the descriptors in the model 1: The positive coefficients of the descriptors MATS2e, SpDiam_Dzv and SHBa reveal that the toxicity of PCDFs increases as the values of these descriptors increase. Thus, the higher the values of these descriptors in a PCDF, the greater the toxicity of the molecule, and vice versa. The negative coefficient of the XLogP descriptor is an indication that the value of this descriptor varies inversely with the toxicity of PCDFs. The percentage contributions of the descriptors to the model are 33.5% (MATS2e), 15.1% (SpDiam_Dzv), 35.5% (SHBa) and 15.9% (XLogP). Judging from these contributions, the MATS2e and SHBa descriptors predominantly influence the observed toxicity of PCDFs.
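The percentage contributions quoted above are reproduced when each is taken as the absolute regression coefficient of model 1 divided by the sum of the absolute coefficients. That this is how the authors computed them is an inference from the numbers, not something the paper states; the function name below is hypothetical.

```python
# Regression coefficients of model 1 as listed in Table 5.
coefficients = {
    "MATS2e": 67.283,
    "SpDiam_Dzv": 30.319,
    "SHBa": 71.368,
    "XLogP": -31.859,
}


def percent_contribution(coefs):
    """Share of each |coefficient| in the sum of |coefficients|, in percent."""
    total = sum(abs(v) for v in coefs.values())
    return {name: 100.0 * abs(v) / total for name, v in coefs.items()}


contrib = percent_contribution(coefficients)
for name, share in contrib.items():
    print(f"{name}: {share:.1f}%")
```

A share computed this way depends on how the descriptors are scaled, so it is best read as a ranking within this model rather than as an absolute measure of importance.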
MATS2e (Moran autocorrelation, lag 2, weighted by Sanderson electronegativities) is a descriptor of the electronegativity of the molecules. Its positive coefficient in the model reveals that the toxicity of PCDFs varies directly with the value of this descriptor in the molecule. This is in consonance with the findings of Emilia et al. (2011) 19, in which the observed toxicity of aromatic nitro-derivatives was influenced by a descriptor of molecular electronegativity, X1.
SpDiam_Dzv (spectral diameter from Barysz matrix weighted by van der Waals volumes) is a descriptor of molecular size. Its positive coefficient in the model reveals that the toxicity of PCDFs varies directly with the value of this descriptor in the molecule. This is in agreement with the findings of Falandysz et al. (2001) 20, in which the van der Waals volume (a size descriptor) of chloronaphthalenes had a pronounced influence on the observed toxicity of the molecules. Also in agreement is the result of the QSTR modelling by Hassan et al. (2016) 21, in which ETA-dAlpha-B (a measure of the electronic features of the molecules relative to molecular size) was found to influence the toxicity of the studied dioxins. The increase in toxicity with increasing molecular size may be due to the molecule being largely confined to the plasma compartment, because its large size affects its distribution throughout the body.
XLogP is a descriptor of the lipophilicity of molecules. Its negative coefficient in the model is an indication that the value of this descriptor varies inversely with the toxicity of the molecules.
SHBa (sum of E-States for (strong) hydrogen bond acceptors), as the name implies, is a descriptor of the hydrogen bond acceptor ability of a molecule. Its positive coefficient in the model reveals that the toxicity of PCDFs varies directly with the value of this descriptor in the molecule. This is in agreement with the findings of Lipinski et al. (2001) 22 and Van de Waterbeemd and Gifford (2003) 23. The increase in the toxicity of PCDFs with increasing values of hydrogen bond acceptor descriptors may be due to this property promoting interaction of the toxic molecules with biological macromolecules such as enzymes or cellular receptors.
CONCLUSION: In this study, QSTR modelling of the toxicity of 35 polychlorinated dibenzofurans (PCDFs), aimed at exploring the structural features responsible for their toxicity, was successfully performed using the Genetic Function Approximation (GFA) approach at the B3LYP level of theory with 6-31G* as the basis set. The observed log (1/EC50) of the PCDFs was found to be predominantly influenced by the MATS2e, SpDiam_Dzv, SHBa and XLogP descriptors. The robustness, reliability, stability and applicability of the QSTR model were established by internal and external validation techniques (N = 24, Friedman LOF = 0.361, R2 = 0.963, R2adj = 0.955, R2cv = 0.889, R2pred = 0.8286, P95% < 0.05).
It is believed that the information in this model will provide a fast, economical and more environmentally friendly technique for assessing the toxicity of PCDFs and of other related toxic polychlorinated aromatic chemicals and pollutants, which are constantly being released into our environment owing to ever-increasing industrial activities and incomplete combustion in processes such as medical and domestic waste incineration.
ACKNOWLEDGEMENT: The authors appreciate the support from Mrs Olasupo Rasheedat Titilayo. We also acknowledge the valuable assistance of Mr David Ebuka Arthur, Samuel Hassan, Philip Ameji and Pharm. Olaniran Olakunle Daniel.
CONFLICT OF INTEREST: There are no conflicts of interest.
REFERENCES:
- C. Basak, G.D. Grunwald, B.D. Gute, K. Balasubramanian, D. Opitz, J. Chem. Inf. Comput. Sci. 2013, 40, 885.
- Tysklind, K. Lundgreen, C. Rappe, L. Eriksson, M. Jonsson, Environ toxicol chem. 2013, 12, 659.
- Safe, L. Safe, M. Mullin, Polychlorinated Biphenyls (PCBs): Springer-Verlag: Berlin. 2015,1.
- Ohura, M. Morita, R. Kuruto-Niwa, T. Amagai, H. Sakakibara, K. Shimo, Environ. Toxicol. 2010, 25,180.
- Chu, M. Zheng, X. Xu: Characterization of the combustion products of polyethylene, Chemosphere. 2015, 39, 1497-1512
- M. Hay, L.L. Aylward: Dioxin risks in perspective: past, present and future, Regul. Toxicol. Pharmacol. 2015, 37, 202-217.
- X. Li, C. Liu, L. Zhang, Y. Liping, J. Zhao, H. Wu; Environmental toxicology and pharmacology. 2016, 32, 278-485.
- Netzeva TI, Worth AP, Aldenberg T, Benigni R, Cronin MTD, Gramatica P, Jaworska JS, Kahn S, Klopman G, Marchant CA, Myatt G, Nikolova-Jeliazkova N, Patlewicz GY, Perkins R, Roberts DW, Schultz TW, Stanton DT, van de Sandt JJM, Tong WD, Veith G, Yang CH, Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. The report and recommendations of ECVAM Workshop 52. Atla-Alternatives to Laboratory Animals. 2015, 33, 155-173.
- Ashek, C. Lee , H. Park , S.J. Cho, 3D QSAR studies of dioxins and dioxin-like compounds using CoMFA and CoMSIA Chemosphere. 2013, 65,521-529.
- G. Mekenyan, D.J. Veith, G.T.A Ankley, QSAR evaluation of Ah receptor binding of halogenated aromatic xenobiotics. Environ. Health Perspect. 2013, 104, 1302-1310.
- CS Chem3D Ultra, CambridgeSoft Corporation, Cambridge, USA, 2014.
- Wavefunction, Inc. Spartan '14 version 1.1.2, Irvine, California, USA. 2013.
- P. Ameji, A. Uzairu, S. O. Idris. J. Comput. Methods Mol. Des. 2015, 5, 120.
- Holland, University of Michigan Press, Adaptation in Natural and Artificial systems, 1975/1992.
- W Wu, C. Zhang, W. ling, Q. chen, X. Guo, Y. Qian, PLOS ONE, 2015, 10, 3.
- R Kunal; P P Roy; S Paul; I Mitra, Molecules. 2009 , 14, 1660-1701
- Ravinchandran, V.; Rajak, H.; Jain, A.; Sivadasan, S.; Varghese, C.P.; Kishore-Agrawal, P. Int J. of Design and Discovery. 2011, 2, 511-519
- Ambure, R.B. Aher, A. Gajewicz, T. Puzyn, K. Roy, Chemometrics and Intelligent Laboratory Systems. 2015, 147, 1-13.
- Emilia Amzoiu, Paul Gabriel Anoaica, Costinel I. Lepădatu. QSAR Study of Toxicity of Aromatic nitro derivatives using the electronegativity of Omo/Umo States as Fingerprint Descriptors. Revue Roumaine De Chimie. 2011, 56(7), 711-716.
- Falandysz J., T. Puzyn, B. Szymanowska, M. Kawano, M. Markuszewski, R. Kaliszan, P. Skurski, J. Błażejowski, T. Wakimoto, Thermodynamic and Physico-Chemical Descriptors of Chloronaphthalenes: An Attempt to Select Features Explaining Environmental Behaviour and Specific Toxic Effects of These Compounds. Polish Journal of Environmental Studies. 2001, 10(4), 217-235.
- Hassan Samuel; Adamu Uzairu; Paul Andrew Mamza; Okunola Oluwale Joshua. Genetic Functional Algorithm Prediction of Toxicity of some Polychlorinated Dioxins using DFT and Semi-empirical Calculated Molecular Descriptors. International Journal of Pharma Sciences and Research, 2016, 7(3); 114-125
- Lipinski, C.A., Lombardo, F., Dominy, B.W. and Feeney, P.J., Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv DrugDeliv Rev. 2001, 46, 3-26.
- Van de Waterbeemd, H. and Gifford, E., ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Discov. 2003, 2, 192-204.
How to cite this article:
Olasupo SB, Uzairu A and Sagagi B: Quantitative structure-toxicity relationship (QSTR) study of polychlorinated dibenzofurans using quantum chemical descriptors. Int J Life Sci Rev. 2016; 2(12): 175-86. doi: 10.13040/IJPSR.0975-8232.IJLSR.2(12).175-86.
All © 2015 are reserved by International Journal of Life Sciences and Review. This Journal licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
20 August, 2016
15 November, 2016
11 December, 2016
31 December, 2016