1. Introduction
Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), remains one of the most devastating infectious diseases and still carries high mortality [1]. Despite the availability of treatments, the World Health Organization (WHO) reported the alarming figures of 7 million new cases and 1.5 million deaths for 2018 [1]. The massive resurgence of multidrug- and extensively drug-resistant TB, together with the high susceptibility of HIV-infected persons to the disease, is a major current concern and creates an urgent demand for new and more effective antitubercular drugs [2,3,4,5]. One approach to this end is repurposing, i.e., the identification of new therapeutic uses for molecules that were already approved to treat a specific disease or were previously synthesized but found no clinical application [6]. This is the case of the fluoroquinolones gatifloxacin and moxifloxacin, marketed in 1999 for the treatment of respiratory tract infections, which are presently the most valuable second-line anti-TB agents according to the WHO guidelines [7].
Cinnamic acid derivatives (CAD) have a century-old history as antitubercular agents [8]. However, this family of compounds was never fully explored for its antimicrobial activity against Mtb, and it was not until a decade ago that studies were conducted to find novel CAD active against tuberculosis [8,9,10,11,12,13,14,15,16]. Notably, trans-cinnamic acid was found to be bacteriostatic at 200 μg/mL against Mycobacterium smegmatis [9] and was reported to show synergism with some first-line antitubercular agents [9,10,12]. As the cinnamoyl scaffold is a privileged pharmacophore in medicinal chemistry, CAD have also attracted much attention in recent years due to their antitumoral, antioxidative, and antimalarial properties [17,18,19]. In this context, and building on our experience in establishing biologically relevant quantitative structure–activity relationships (QSAR) [20,21,22,23], we set up a QSAR strategy to: (i) derive a statistically significant model to describe the antitubercular activity of CAD towards wild-type (wt) Mtb; and (ii) identify the most relevant properties that have a substantial effect on the antitubercular activity of those derivatives.
QSAR analysis is usually based on the assumption that compounds with similar structures are expected to exhibit similar properties and, therefore, that changes in chemical structure are likely to be accompanied by proportional changes in biological activity. Although it is now widely known that this congeneric principle is not as universal as initially thought [24,25,26], it has remained the basis of many computational QSAR studies, from the time when Hansch established the very first QSAR model to predict chemical solubility up to the present day [27]. Owing to the current explosive growth of experimental data, originating mainly from high-throughput screening campaigns, several QSAR methodologies have been called upon to establish models involving large and complex data sets [28,29,30].
Differences among the various QSAR approaches depend mostly on the descriptors used to characterize the molecules and on the methods used to establish relationships between input descriptor values and biological activities. The choice of a particular method depends mainly on the nature of the problem being addressed and on the final purpose of the analysis [22,23]. Linear methods are usually used if the main objective is to rationalize and/or interpret a given biological behavior, while nonlinear methods are more commonly employed if the main purpose is to accurately predict a property. However, nonlinear methods are prone to overfitting, which occurs when the number of descriptors (ranging from hundreds to thousands) is much greater than the number of samples in the dataset (fewer than a hundred compounds is common). In this context, as we were handling a small modeling dataset, we chose multiple linear regression (MLR) analysis, which is one of the most used linear methods to build QSAR models and has been widely and successfully applied in the field of Medicinal Chemistry [31]. Additionally, MLR has the advantage of being easily interpretable, allowing a direct link between a given biological response and the set of molecular features, encoded by the descriptors, which are responsible for that response. In this work, we describe the construction and validation of an MLR-based model for the antitubercular activity of a set of CAD, together with the analysis of the model’s descriptors.
2. Results and Discussion
A dataset of 54 CAD with known MIC values for the Mtb H37Rv strain was retrieved from ChEMBL [32]. Two different research groups performed the in vitro experiments [10,11,14,16]. However, a QSAR dataset should include biological activity values for all compounds, preferably measured using the same experimental methodology. As this was not the case, and as the quality of the input data has a large influence on the quality of the QSAR model, the set with the higher number of compounds was selected in this study [11,16]. Thus, the final data set comprised 29 compounds covering a wide range of MIC values, from 0.26 to 1560.59 μM. A pool of 33 molecular descriptors (energetic, geometrical, structural, physicochemical, electronic) was generated using MMPro+, except for cLogP, which was calculated with ChemDraw (see experimental section for details). The data set was then split into a training set (22 compounds) and an independent test set (7 compounds) as indicated in Table 1. Compounds were selected in such a way that the chemical domain in the two sets was not too dissimilar and that both the training and the test sets spanned, separately, the entire descriptor space occupied by both sets (Supplementary Materials Table S1). The training set was used to derive the model, whereas the test set was used to evaluate the predictive ability of the generated model.
To establish a relationship between the MIC value and the molecular characteristics of the training set compounds, we derived standard MLR-based QSAR models, testing all combinations of the 33 descriptors and retaining or disregarding descriptors according to rigorous statistical criteria. The intercorrelation matrix among descriptors was always checked; descriptors were considered not intercorrelated, and therefore non-redundant, if r2 between any two descriptors was below 0.5 and R2 of one against all others was below 0.8 [21,22,33]. Suspicious points were initially spotted by inspection of a plot of Ycalc vs. Yexp and then confirmed as outliers according to two criteria: the conventional measure |Ycalc − Yexp| > 2 SD, where SD stands for standard deviation of the fit, and a more refined measure known as the Cook’s distance (see experimental section for details) [22]. The identified outliers were compounds 10, 17, and 28, as seen in Table 1. The occurrence of outliers can happen for many reasons, such as: i) an error in the reported MIC value or in one or several derived descriptors’ values; ii) a mechanism of action different from that of the majority of the data set points; or iii) a non-representative sampling design, among others. Still, no plausible explanation could be assigned for the outlier behavior of compounds 10, 17, and 28. The best model, found by a forward stepwise procedure, upon removal of these three compounds from the training set, is shown in Table 2.
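The redundancy check described above (pairwise r2 below 0.5 and R2 of one descriptor against all others below 0.8) can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch: the function name `intercorrelation_report` and the toy matrices are hypothetical, and the authors' own procedure may differ in detail.

```python
import numpy as np

def intercorrelation_report(X, pair_max=0.5, multi_max=0.8):
    """Check descriptor redundancy for an (n compounds x k descriptors) matrix."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    # Squared Pearson correlation between every pair of descriptor columns.
    r2_pairs = np.corrcoef(X, rowvar=False) ** 2
    max_pair = max(r2_pairs[i, j] for i in range(k) for j in range(i + 1, k))
    # R2 of each descriptor regressed (with intercept) on all the others.
    r2_multi = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2_multi.append(1.0 - (resid @ resid) / ss_tot)
    max_multi = max(r2_multi)
    ok = bool(max_pair < pair_max and max_multi < multi_max)
    return max_pair, max_multi, ok

# Toy example (not the paper's descriptors): three mutually orthogonal columns,
# then the same matrix with a redundant fourth column added.
x1 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
x2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
x3 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
independent = np.column_stack([x1, x2, x3])
redundant = np.column_stack([x1, x2, x3, x1 + x2])  # last column is a linear combination
```

The orthogonal set passes both thresholds, whereas the set with the linear-combination column is rejected by the one-against-all criterion.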
The model’s robustness (evaluated with training set compounds) was duly assessed, and the best found model fulfilled all the recommended criteria for internal validation (Table 3) such as a determination coefficient (R2) higher than 0.6, a leave-one-out (LOO) cross validation correlation coefficient (Q2LOO) higher than 0.6, the F-test value (F = 35) significant at 99% with its corresponding tabulated value, a small value for the standard deviation of the fit (SD = 0.357), and a significance level (SL) of each adjusted parameter higher than 95% [21,29,34,35]. In order to remove any possibility of attributing the quality of the statistics of these models to a chance correlation between the response variable and the descriptors, a Y-randomization test was performed on the developed QSAR model (Supplementary Materials Table S2). We observed a significant decrease in the quality of the randomized models when compared to the original non-randomized one, and therefore it seemed there was no chance correlation, as corroborated by the value of cR2p, quite above the 0.5 threshold value [36]. Chance correlation was also assessed by applying the Q under influence of K (QUIK) rule, a technique that measures the total correlation of a set of variables and that allows the rejection of models with high predictor collinearity, as proposed by Todeschini [37]. Thus, according to the QUIK rule, our model was not due to chance correlation, as the xy correlation was higher than the x correlation (Supplementary Materials Table S2). Finally, the absence of intercorrelation between descriptors in the best model (Supplementary Materials Table S3) also indicated that the quality of the statistics was not due to collinearity among descriptors.
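A Y-randomization test of the kind mentioned above can be sketched as follows. The sketch assumes 30 shuffles (the number quoted for Table S2) and uses the commonly cited definition cR2p = R × sqrt(R2 − mean randomized R2); the source only states the 0.5 threshold, so that formula, the helper names, and the synthetic data are assumptions rather than the authors' implementation.

```python
import numpy as np

def r2_of_fit(X, y):
    """R2 of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_shuffles=30, seed=0):
    """Compare the real model's R2 with R2 values obtained after shuffling y."""
    rng = np.random.default_rng(seed)
    r2 = r2_of_fit(X, y)
    r2_random = np.array([r2_of_fit(X, rng.permutation(y)) for _ in range(n_shuffles)])
    # cR2p (assumed definition): R * sqrt(R2 - mean randomized R2).
    c_r2p = np.sqrt(r2) * np.sqrt(max(r2 - r2_random.mean(), 0.0))
    return r2, r2_random.mean(), c_r2p

# Synthetic data only: 19 "compounds", 4 "descriptors", exactly linear response.
rng_demo = np.random.default_rng(42)
X_demo = rng_demo.normal(size=(19, 4))
y_demo = 1.0 + X_demo @ np.array([1.0, -2.0, 0.5, 3.0])
r2_real, r2_rand_mean, c_r2p = y_randomization(X_demo, y_demo)
```

A real model is expected to give a much higher R2 than the average of the shuffled fits; a cR2p value above 0.5 is the threshold quoted in the text.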
Internal validation methods, as the ones mentioned above, are very useful to assess whether a model is stable and robust, and whether overfitting occurs. However, it is more and more commonly accepted that the predictive power of a QSAR model should be evaluated by verifying if the model is able to predict the behavior of chemicals not used on the training set. For that purpose, the predictive ability of the best-found model was analyzed using the test set. The external predictivity was confirmed as the established QSAR model fulfilled all the following recommended “classic” criteria for external validation (Table 3): Q2ext > 0.5, R2 > 0.6; (R2 – R02)/R2 < 0.1; 0.85 < m < 1.15, where R02 is the test set’s regression determination coefficient that goes through the origin, and m is the slope of the regression between the predicted and the experimental values. The parameter rm2, proposed by Roy and Paul [29,38,39], was also used to assess external validation of the model (Table 3). This stricter parameter penalizes a model for large differences between predicted and experimental values of the test set compounds not accounted for by Q2ext, being an indicator of good external predictivity if greater than 0.65. External validation was also performed by using the very demanding concordance correlation coefficient (CCC) [40,41]. This coefficient measures both accuracy (how far the regression line deviates from the concordance line) and precision (how far the observations are from the fitting line) between experimental and predicted values (Table 3), and a minimum value of 0.85 is required as an indicator of good predictive ability. Finally, a scatter plot of predicted vs. experimental values was also obtained, as recent studies have recommended the visual inspection of these plots as important complementary indicators of model predictivity [22,41]. 
The scatter plot for the best-found model is represented in Figure 1 showing that no systematic deviations from the ideal line were observed.
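As a quick arithmetic check, the thresholds quoted above can be applied to the test-set statistics reported in Table 3. The snippet below is a minimal sketch: the dictionary keys are hypothetical names, the Q2 entry of the test row is assumed to be Q2ext, and the slope m is omitted because its value is not listed in Table 3.

```python
# Test-set statistics as reported in Table 3 of this article.
table3_test = {"R2": 0.920, "R0_2": 0.913, "Q2_ext": 0.933, "rm2_avg": 0.879, "CCC": 0.953}

def external_validation_checks(s):
    """Evaluate the thresholds quoted in the text; returns {criterion: bool}."""
    return {
        "Q2_ext > 0.5": s["Q2_ext"] > 0.5,
        "R2 > 0.6": s["R2"] > 0.6,
        "(R2 - R0_2)/R2 < 0.1": (s["R2"] - s["R0_2"]) / s["R2"] < 0.1,
        "rm2_avg > 0.65": s["rm2_avg"] > 0.65,
        "CCC > 0.85": s["CCC"] > 0.85,
    }

checks = external_validation_checks(table3_test)
# Relative gap between R2 and R0_2: (0.920 - 0.913) / 0.920, about 0.0076.
relative_gap = (table3_test["R2"] - table3_test["R0_2"]) / table3_test["R2"]
```

With the reported values, every listed criterion evaluates to true, and the relative gap between R2 and R02 is well below the 0.1 limit.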
A close analysis of Table 2 reveals that the activity of the studied CAD against the wt Mtb H37Rv strain did not depend on their energetic, steric, or physicochemical features. Descriptors belonging to these classes were found not to contribute to the modeled log (1/MIC) values. Additionally, lipophilicity, as measured by cLogP, did not seem significant to explain the antitubercular activity of this family of compounds. On the other hand, geometrical and electronic properties came out as very effective in explaining the biological activity of these derivatives. Indeed, the best-found model included two geometrical descriptors (angles a1 and a3 as depicted in Table 2), both of which favored activity, suggesting that sp2 hybridization for these sets of atoms is preferred over sp3. The model also comprises two properties related to the ability to permeate membranes, polar surface area (PSA) and the Hansen polarity parameter (HansPol). The latter represents the energy from dipolar intermolecular interactions and contributes negatively to the antitubercular activity of the CAD. Conversely, PSA, which corresponds to the surface sum over all polar atoms (primarily oxygen and nitrogen, including their attached hydrogens), contributes to enhanced activity, with an average of 71.3 Å² for training set and 73.2 Å² for test set compounds, thus indicating that compounds that are good at permeating membranes are preferred. However, both geometrical descriptors have a relatively higher impact than the two electronic descriptors on the activity of these compounds. Although the cinnamic skeleton has been considered an interesting scaffold for the development of novel antimicrobials, little is known about its mechanism of antimicrobial action. Therefore, no clear relation between a possible mode of action of this family of compounds and the model’s descriptors can be made. 
Still, the results clearly suggest that both penetration through cell membrane and adequate geometrical properties to bind to their target are crucial for the biological activity of CAD.
An additional important aspect to take into consideration is the applicability domain (AD) of the built MLR model, which is crucial to ensure that its predictions are reliable. Thus, to assess the AD of the QSAR model obtained, two different methods were considered: i) the leverage approach (Figure 2) [42], and ii) the range of the individual descriptors (Supplementary Materials Table S1). The leverage value, h, provides a measure of the distance of a molecule from the training set’s centroid. A “cautionary leverage”, h*, is usually set to 3p/N, where N is the number of molecules in the training set and p the number of model descriptors plus one [42]. Thus, plotting the standardized residuals, SR, as a function of the leverage values (Williams plot) allows for a graphical assessment of the AD, enabling the detection of influential points, i.e., compounds structurally distant from training set compounds (h > h*), and of response outliers (SR beyond ± 3 units). In a Williams plot, the AD is defined by the area bounded by SR = ± 3 and the threshold h* value. By analyzing Figure 2, we can observe that the built MLR model performed well in terms of AD. In fact, training and test set compounds lie within ± 3 SR units, indicating the absence of any outlier. Additionally, there are no significant influential points in the training set, since the leverage values of all its compounds are smaller than the cut-off value h*. Three compounds from the test set (12, 23, and 24) had h values higher than the warning value h*, thus falling somewhat outside the model AD. Still, the model accurately predicted these compounds (SR within ± 1.2 units); such points are called “good influential points” [43].
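The leverage approach can be illustrated with a short NumPy sketch. It assumes that the hat matrix is built from the descriptor matrix with an added intercept column (consistent with p being the number of descriptors plus one), and it takes N = 19 training compounds (Table 3) and the four descriptors named in the text; the function names and the synthetic descriptor values are hypothetical.

```python
import numpy as np

def warning_leverage(n_train, n_descriptors):
    """h* = 3p/N with p = number of descriptors + 1."""
    return 3.0 * (n_descriptors + 1) / n_train

def leverages(X_train, X_query):
    """h_i = x_i^T (X^T X)^-1 x_i, with an intercept column added to X."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)

h_star = warning_leverage(n_train=19, n_descriptors=4)  # 3 * 5 / 19

# Synthetic descriptor values, for illustration only.
rng_lev = np.random.default_rng(0)
X_train = rng_lev.normal(size=(19, 4))
X_far = 10.0 * np.ones((1, 4))          # a query point far from the training centroid
h_train = leverages(X_train, X_train)   # training leverages sum to p
h_far = leverages(X_train, X_far)       # expected to exceed h*
```

Under these assumptions the warning leverage is 15/19, i.e., about 0.79; a compound with h above this value lies outside the structural domain of the training set even if its residual is small.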
In summary, a set of 29 CAD was investigated to relate their antitubercular activity values to their molecular structure. Using MLR analysis, a stable and predictive QSAR model with good statistical results was developed. The main descriptors involved in the model were related to geometrical and electronic CAD properties. However, more in-depth studies regarding the mechanism of action of the antitubercular activity of CAD should be performed in order to further explore the relationship between QSAR model descriptors and the physicochemical properties of the surface of bacterial cells. Still, the physicochemical meaning of the descriptors of the proposed model will be helpful for rational structural modifications on this class of compounds in order to design better antitubercular agents.
3. Materials and Methods
3.1. Data Set Preparation and Descriptors Calculation
The data set consisted of 29 cinnamic acid derivatives retrieved from ChEMBL [32] with known MIC values for Mtb H37Rv strain (Supplementary Materials Table S4) [11,16]. MIC values were converted to the pMIC scale (–log MIC). MarvinSketch was used for molecules’ construction and the dominant protonation states of molecules were calculated using the Major Microspecies Plugin, MarvinSketch 16.2.1, ChemAxon [44, 45, 46, 47].
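The pMIC conversion is a one-line transformation; the sketch below applies it to the extremes of the reported MIC range. It assumes a base-10 logarithm and keeps the values in μM, since the text does not state whether concentrations were first converted to molar units (doing so would simply add 6 to every value). The function name `p_mic` is hypothetical.

```python
import math

def p_mic(mic):
    """Return -log10(MIC) for a MIC value in whatever concentration unit is used."""
    if mic <= 0:
        raise ValueError("MIC must be positive")
    return -math.log10(mic)

# Extremes of the reported MIC range, kept in micromolar as an assumption.
p_mic_most_active = p_mic(0.26)
p_mic_least_active = p_mic(1560.59)
```

On this scale, more active compounds (lower MIC) receive higher pMIC values, which is the convention needed for a regression on log (1/MIC).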
A pool of 33 molecular descriptors (Supplementary Materials Table S4) was generated using Molecular Modeling Pro Plus software [44]. Each compound was first submitted to a molecular structure optimization by MM2, a molecular mechanics method incorporated in the software. ChemDraw was used to calculate cLogP values [48].
To determine a relationship between the molecular descriptors of the selected compounds and their respective biological activity, we performed a standard MLR of the type
Y = XA + ζ,
where ζ is an n × 1 residuals vector whose elements are assumed to be independent normal random variables with mean zero and known variance σ2, X is a known n × k matrix of molecular descriptors, A is a k × 1 vector of adjusted parameters, and Y is an n × 1 vector of the response variable related, in this case, to the biological activity. For this purpose, we used the Microsoft Excel Data Analysis add-in and several statistical validation tests to ensure the trustworthiness of the analyses.
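For readers who prefer a scripted alternative to the spreadsheet add-in, an ordinary least-squares fit of this type can be sketched with NumPy as below. This is a stand-in, not the authors' workflow: it assumes that the design matrix includes an intercept column, that SD is the residual standard deviation with n − k degrees of freedom, and that F is the usual overall regression statistic; the function name `fit_mlr` and the toy data are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit of y = b0 + X b; returns coefficients and fit statistics."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])   # intercept column + descriptors
    n, k = A.shape                              # k counts the intercept as a parameter
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    sd = np.sqrt(ss_res / (n - k))              # standard deviation of the fit
    r2 = 1.0 - ss_res / ss_tot
    f_stat = ((ss_tot - ss_res) / (k - 1)) / (ss_res / (n - k))
    return coef, sd, r2, f_stat

# Toy data: two "descriptors" on a balanced design plus a small deterministic residual
# that is orthogonal to the design, so the true coefficients are recovered exactly.
x1 = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
noise = 0.1 * np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
y_toy = 2.0 + 3.0 * x1 - 1.0 * x2 + noise
coef, sd, r2, f_stat = fit_mlr(np.column_stack([x1, x2]), y_toy)
```

In this toy case the fitted coefficients are 2, 3, and −1, the residual sum of squares is 0.08, and the F statistic is 2500, which can be verified by hand from the balanced design.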
3.2. Outlier Search
The decision to consider a given point as an outlier was made according to two criteria: Cook’s distance and the more conventional measure |Ycalc − Yexp| > 2 SD, where SD stands for standard deviation of the fit.
Cook’s distance, Di, is a measure of the influence of a suspicious point (outlier) in the results of a certain regression and is given by [45,46]
$$D_i = \frac{\sum \left(\hat{Y} - \hat{Y}_i\right)^2}{k\,\sigma^2},$$
where $\hat{Y}$ and $\hat{Y}_i$ are the n × 1 vectors of the predicted observations for the entire data set and for the data set without the ith observation, respectively (the sum runs over the n elements of their difference), and k is the number of parameters adjusted by the linear model with a variance σ2. The specific criterion used to exclude a supposed outlier was Di > 4/(n – k – 1), where n is the number of experimental points.
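Cook's distance, as defined above, can be computed by explicitly refitting the model without each point. The NumPy sketch below assumes that k counts the intercept as an adjusted parameter and that σ2 is estimated by the residual variance of the full fit; the function name and the toy data (a straight line with one perturbed end point) are hypothetical.

```python
import numpy as np

def cooks_distances(X, y):
    """Cook's distance for every point, by refitting the model without that point."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    n, k = A.shape
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef
    s2 = np.sum((y - y_hat) ** 2) / (n - k)      # residual variance of the full fit
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef_i, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        y_hat_i = A @ coef_i                     # predictions for all n points
        d[i] = np.sum((y_hat - y_hat_i) ** 2) / (k * s2)
    return d

# Toy data: exact line y = 1 + 2x with the last point shifted upwards by 5 units.
x_cook = np.arange(11.0)
y_cook = 1.0 + 2.0 * x_cook
y_cook[10] += 5.0
d_cook = cooks_distances(x_cook, y_cook)
threshold = 4.0 / (len(y_cook) - 2 - 1)          # 4/(n - k - 1) with n = 11, k = 2
flagged = np.where(d_cook > threshold)[0]
```

In this toy example only the perturbed point exceeds the 4/(n − k − 1) cut-off. The same perturbation applied to the central point would not be flagged, which illustrates why the text combines Cook's distance with the residual-based 2 SD criterion.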
3.3. Internal Validation
The data set was divided into training (22 compounds) and test (7 compounds) sets with similar degrees of variability. In order to make an internal validation of the data, we applied the leave-one-out (LOO) approach to the training set [22,34,35] as follows:
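$$Q^2 = 1 - \frac{\sum_{i=1}^{\text{training}} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{\text{training}} \left(y_i - \bar{y}_i\right)^2},$$
where $y_i$, $\hat{y}_i$, and $\bar{y}_i$ are the measured, predicted, and averaged (over the whole data set) values of the dependent variable, respectively, and $Q^2$ is a cross-validated correlation coefficient.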
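A leave-one-out Q2 of this form can be sketched as follows: each compound is removed in turn, the model is refitted on the remaining ones, and the removed compound is predicted. The sketch assumes an intercept in the model and uses the mean of all y values passed to the function in the denominator; the function name and the toy data are hypothetical.

```python
import numpy as np

def q2_loo(X, y):
    """Leave-one-out cross-validated Q2 for a linear model with intercept."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    n = len(y)
    press = 0.0                                   # predictive residual sum of squares
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2        # error on the left-out point
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Toy data: a straight line with a small alternating deviation.
x_loo = np.arange(11.0)
y_loo = 1.0 + 2.0 * x_loo + 0.5 * (-1.0) ** np.arange(11)
q2_demo = q2_loo(x_loo, y_loo)
```

For a least-squares model, Q2 computed this way never exceeds the R2 of the full fit, so a small gap between the two is usually read as a sign of robustness.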
We also considered traditional statistical criteria such as the determination coefficient, R2, the standard deviation, SD, the F statistic, and the significance level, SL, of each adjusted parameter (parameters were kept if SL > 95%) and tested the intercorrelations among all descriptors included in each regression.
3.4. External Validation
The test set was used for external validation, and the predictive ability of the model was assessed by an external Q2ext parameter defined as [22,34,35]:
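$$Q_{\text{ext}}^2 = 1 - \frac{\sum_{i=1}^{\text{test}} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{\text{test}} \left(y_i - \bar{y}_{\text{training}}\right)^2},$$
where $y_i$ and $\hat{y}_i$ are the experimental and predicted (over the test set) values, respectively, and $\bar{y}_{\text{training}}$ is the averaged value of the dependent variable for the training set.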
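The external Q2 can be computed directly from the test-set values and the training-set mean. The short pure-Python sketch below uses made-up numbers only; the function name `q2_ext` is hypothetical.

```python
def q2_ext(y_test, y_pred, y_train_mean):
    """External Q2: test-set prediction errors relative to deviations from the training mean."""
    ss_pred = sum((yt - yp) ** 2 for yt, yp in zip(y_test, y_pred))
    ss_ref = sum((yt - y_train_mean) ** 2 for yt in y_test)
    return 1.0 - ss_pred / ss_ref

# Made-up values: squared errors 0.01 + 0.01 + 0.04 = 0.06, reference sum 1 + 0 + 1 = 2.
q2_demo_ext = q2_ext([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], y_train_mean=2.0)
```

Here the result is 1 − 0.06/2 = 0.97. Because the denominator uses the training-set mean, the same predictions give a different value if the test set is centered far from the training set.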
To further assess the predictive capability of the established QSAR model, we also computed three measures of fit, namely the average error (AE), the absolute average error (AAE), and the root-mean square error (RMSE). Additionally, we determined Roy’s parameters [38,39,40,41], $r_m^2$ and $\overline{r_m^2}$, and the concordance correlation coefficient (CCC) [40,41]. The former two criteria were calculated according to the following formulas:
$$r_m^2 = R^2\left(1 - \sqrt{R^2 - R_0^2}\right), \qquad \overline{r_m^2} = \frac{r_m^2 + r'^2_m}{2},$$
where $R^2$ and $R_0^2$ are the determination coefficients of the regression function, calculated using the experimental and the predicted data of the prediction set, without ($R^2$) or with ($R_0^2$) forcing the regression to pass through the origin of the axes. $r_m^2$ is calculated using the experimental values on the ordinate axis, and $r'^2_m$ using them on the abscissa. The latter criterion, CCC, is obtained by
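Roy's parameters can be sketched in pure Python as below. The sketch assumes that R2 is the squared Pearson correlation between experimental and predicted values, that R02 is obtained from a least-squares line through the origin, and it takes the absolute value inside the square root as a numerical guard; all function names are hypothetical. As a consistency check, the Table 3 test-set values R2 = 0.920 and R02 = 0.913 give rm2 ≈ 0.843, in line with the reported average (0.879) and difference (0.070).

```python
import math

def r2_pearson(x, y):
    """Squared Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def r2_through_origin(x, y):
    """Determination coefficient of the line y = slope * x (x on the abscissa)."""
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    my = sum(y) / len(y)
    ss_res = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

def rm2(r2, r02):
    """r_m^2 = R2 * (1 - sqrt(|R2 - R0_2|)); abs() is a guard added in this sketch."""
    return r2 * (1.0 - math.sqrt(abs(r2 - r02)))

def roy_parameters(y_exp, y_pred):
    """Return (average r_m^2, absolute difference) over the two axis assignments."""
    r2 = r2_pearson(y_exp, y_pred)
    rm = rm2(r2, r2_through_origin(y_pred, y_exp))        # experimental on the ordinate
    rm_prime = rm2(r2, r2_through_origin(y_exp, y_pred))  # experimental on the abscissa
    return (rm + rm_prime) / 2.0, abs(rm - rm_prime)

rm2_table3 = rm2(0.920, 0.913)   # about 0.843
```

The average penalizes models whose predictions correlate well with experiment but deviate from the identity line, which the plain R2 does not detect.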
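$$\mathrm{CCC} = \frac{2\sum_{i=1}^{n_{\text{test}}} \left(Y_i - \bar{Y}\right)\left(\hat{Y}_i - \bar{\hat{Y}}\right)}{\sum_{i=1}^{n_{\text{test}}} \left(Y_i - \bar{Y}\right)^2 + \sum_{i=1}^{n_{\text{test}}} \left(\hat{Y}_i - \bar{\hat{Y}}\right)^2 + n_{\text{test}} \left(\bar{Y} - \bar{\hat{Y}}\right)^2},$$
where $Y_i$ and $\hat{Y}_i$ stand for the abscissa and ordinate values of the plot of experimental vs. predicted values (or the opposite, which makes no difference), $n_{\text{test}}$ is the number of compounds in the test set, and $\bar{Y}$ and $\bar{\hat{Y}}$ correspond to the averages of experimental and predicted values, respectively.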
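The concordance correlation coefficient can be sketched in a few lines of pure Python. The sketch uses the standard form of the coefficient, in which the last term of the denominator is the squared difference between the two means; the function name `ccc` and the numbers are made up for illustration.

```python
def ccc(y_exp, y_pred):
    """Concordance correlation coefficient between experimental and predicted values."""
    n = len(y_exp)
    mean_exp = sum(y_exp) / n
    mean_pred = sum(y_pred) / n
    cross = sum((a - mean_exp) * (b - mean_pred) for a, b in zip(y_exp, y_pred))
    ss_exp = sum((a - mean_exp) ** 2 for a in y_exp)
    ss_pred = sum((b - mean_pred) ** 2 for b in y_pred)
    return 2.0 * cross / (ss_exp + ss_pred + n * (mean_exp - mean_pred) ** 2)

ccc_perfect = ccc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # identical values
ccc_shifted = ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])   # same trend, constant offset of 1
```

The shifted case gives 4/7 even though the Pearson correlation is 1, which shows how the coefficient penalizes systematic deviation from the concordance line as well as scatter.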
Figure 1. Log(1/MIC)pred vs. log(1/MIC)exp according to the best built QSAR model.
Figure 2. Williams plot for the built model representing the leverage values for the training and test set compounds.
Cpd | R1 | R2 | R3 | R4 | MIC H37Rv (μM) ² | Cpd | R1 | R2 | R3 | R4 | MIC H37Rv (μM) ² |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | [1] | H | farnesyl | H | 1.28 | *16 | [5] | H | isopentenyl | H | 168.23 |
2 | [1] | H | isopentenyl | H | 95.97 | 17 ¹ | [5] | OCH3 | H | H | 23.78 |
3 | [1] | H | methyl | H | 225.52 | *18 | [6] | H | methyl | H | 950.00 |
*4 | [1] | OCH3 | H | OCH3 | 384.16 | 19 | [7] | H | isopentenyl | H | 2.30 |
5 | [1] | OCH3 | H | H | 423.21 | 20 | [7] | H | CF3 | H | 1.10 |
6 | [1] | H | H | H | 237.44 | 21 | [7] | H | CF3CH2 | H | 2.20 |
7 | [2] | OCH3 | H | H | 27.94 | 22 | [7] | H | geranyl | H | 1.90 |
8 | [2] | H | H | H | 31.21 | *23 | [7] | H | ethyl | H | 1.30 |
9 | [3] | H | geranyl | H | 0.26 | *24 | [8] | H | isopentenyl | H | 21.00 |
10 ¹ | [3] | H | isopentenyl | H | 199.11 | 25 | [8] | H | CF3CH2 | H | 20.00 |
11 | [4] | H | isopentenyl | H | 51.88 | 26 | [8] | H | ethyl | H | 12.00 |
*12 | [4] | H | methyl | H | 247.75 | 27 | [8] | H | CF3 | H | 21.00 |
13 | [4] | OCH3 | methyl | H | 439.65 | 28 ¹ | [8] | H | geranyl | H | 72.00 |
14 | [5] | H | methyl | H | 1560.59 | 29 | [8] | H | methyl | H | 50.00 |
*15 | [5] | H | geranyl | H | 72.30 | | | | | | |
¹ Compound identified as outlier. ² MIC values were retrieved from references [11,16]. * Test set compounds.
Set | N ¹ | SD ² | $R^2$ ³ | F ⁴ | $R_0^2$ ⁵ | AE ⁶ | AAE ⁷ | RMSE ⁸ | $Q^2$ ⁹ | $\overline{r_m^2}$ ¹⁰ | $\Delta r_m^2$ ¹¹ | CCC ¹² |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Training | 19 | 0.357 | 0.909 | 35 | - | - | - | - | 0.930 | - | - | - |
Test | 7 | 0.297 | 0.920 | 58 | 0.913 | 0.100 | 0.260 | 0.294 | 0.933 | 0.879 | 0.070 | 0.953 |
¹ Number of compounds. ² Standard deviation of fit. ³ Determination coefficient. ⁴ The F statistic. ⁵ Determination coefficient of regression through the origin. ⁶ Average error. ⁷ Absolute average error. ⁸ Root-mean square error. ⁹ Cross-validation correlation coefficient. ¹⁰ Average value between observed vs. predicted and predicted vs. observed Roy’s parameter, $r_m^2$, for the test set. ¹¹ Absolute difference between observed vs. predicted and predicted vs. observed Roy’s parameter, $r_m^2$, for the test set. ¹² Concordance correlation coefficient.
Supplementary Materials
The following are available online at https://www.mdpi.com/1420-3049/25/3/456/s1, Table S1: Range of variability of descriptors for the training and test sets, Table S2: Results of the Y-randomization (30 shuffles) and the QUIK rule for the best model, Table S3: Intercorrelation matrix between any two descriptors and between one descriptor and a linear combination of all other descriptors, Table S4: Dataset compounds in SMILES format and respective MIC values (μM) and values of descriptors (values not normalized).
Author Contributions
Conceptualization, C.T., C.V., J.R.B.G., P.G., and F.M.; methodology, C.T., C.V., and F.M.; validation, C.T., C.V., and F.M.; investigation, C.T.; data curation, C.T.; writing-original draft preparation, C.T., J.R.B.G., P.G., and F.M.; writing-review and editing, C.T., P.G, and F.M.; visualization, C.T., C.V., and F.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by Fundação para a Ciência e Tecnologia (FCT), Portugal, grants UID/QUI/50006/2019, PTDC/BTM-SAL/29786/2017, PTDC/QUI/67933/2006, and PTDC/MED-QUI/29036/2017.
Acknowledgments
The authors thank Fundação para a Ciência e Tecnologia (FCT, Portugal) for funding through grants UID/QUI/50006/2019, PTDC/BTM-SAL/29786/2017, PTDC/QUI/67933/2006, and PTDC/MED-QUI/29036/2017.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. World Health Organization. Global Tuberculosis Report. 2019. Available online: https://www.who.Int/tb/global-report-2019 (accessed on 3 December 2019).
2. Brigden, G.; Hewison, C.; Varaine, F. New developments in the treatment of drug-resistant tuberculosis: Clinical utility of bedaquiline and delamanid. Infect. Drug. Resist. 2015, 8, 367-378.
3. Gunther, G. Multidrug-resistant and extensively drug-resistant tuberculosis: A review of current concepts and future challenges. Clin. Med. 2014, 14, 279-285.
4. Pawlowski, A.; Jansson, M.; Skold, M.; Rottenberg, M.E.; Kallenius, G. Tuberculosis and HIV co-infection. PLoS Pathog. 2012, 8, e1002464.
5. Zumla, A.; Chakaya, J.; Centis, R.; D'Ambrosio, L.; Mwaba, P.; Bates, M.; Kapata, N.; Nyirenda, T.; Chanda, D.; Mfinanga, S.; et al. Tuberculosis treatment and management--an update on treatment regimens, trials, new drugs, and adjunct therapies. Lancet Respir. Med. 2015, 3, 220-234.
6. Maitra, A.; Bates, S.; Kolvekar, T.; Devarajan, P.V.; Guzman, J.D.; Bhakta, S. Repurposing-a ray of hope in tackling extensively drug resistance in tuberculosis. Int. J. Infect. Dis. 2015, 32, 50-55.
7. Pranger, A.D.; van der Werf, T.S.; Kosterink, J.G.W.; Alffenaar, J.W.C. The role of fluoroquinolones in the treatment of tuberculosis in 2019. Drugs 2019, 79, 161-171.
8. De, P.; Bedos-Belval, F.; Vanucci-Bacque, C.; Baltas, M. Cinnamic acid derivatives in tuberculosis, malaria and cardiovascular diseases - a review. Curr. Org. Chem. 2012, 16, 747-768.
9. Asif, M.; Mohd, I. Synthetic methods and pharmacological potential of some cinnamic acid analogues particularly against convulsions. Prog. Chem. Biochem. Res. 2019, 2, 192-210.
10. Bairwa, R.; Kakwani, M.; Tawari, N.R.; Lalchandani, J.; Ray, M.K.; Rajan, M.G.; Degani, M.S. Novel molecular hybrids of cinnamic acids and guanylhydrazones as potential antitubercular agents. Bioorg. Med. Chem. Lett. 2010, 20, 1623-1625.
11. De, P.; Koumba Yoya, G.; Constant, P.; Bedos-Belval, F.; Duran, H.; Saffon, N.; Daffe, M.; Baltas, M. Design, synthesis, and biological evaluation of new cinnamic derivatives as antituberculosis agents. J. Med. Chem. 2011, 54, 1449-1461.
12. Eedara, B.B.; Tucker, I.G.; Zujovic, Z.D.; Rades, T.; Price, J.R.; Das, S.C. Crystalline adduct of moxifloxacin with trans-cinnamic acid to reduce the aqueous solubility and dissolution rate for improved residence time in the lungs. Eur. J. Pharm. Sci. 2019.
13. Guzman, J.D. Natural cinnamic acids, synthetic derivatives and hybrids with antimicrobial activity. Molecules 2014, 19, 19292-19349.
14. Kakwani, M.D.; Suryavanshi, P.; Ray, M.; Rajan, M.G.; Majee, S.; Samad, A.; Devarajan, P.; Degani, M.S. Design, synthesis and antimycobacterial activity of cinnamide derivatives: A molecular hybridization approach. Bioorg. Med. Chem. Lett. 2011, 21, 1997-1999.
15. Liu, Q.; Liu, Z.; Sun, C.; Shao, M.; Ma, J.; Wei, X.; Zhang, T.; Li, W.; Ju, J. Discovery and biosynthesis of atrovimycin, an antitubercular and antifungal cyclodepsipeptide featuring vicinal-dihydroxylated cinnamic acyl chain. Org. Lett. 2019, 21, 2634-2638.
16. Yoya, G.K.; Bedos-Belval, F.; Constant, P.; Duran, H.; Daffe, M.; Baltas, M. Synthesis and evaluation of a novel series of pseudo-cinnamic derivatives as antituberculosis agents. Bioorg. Med. Chem. Lett. 2009, 19, 341-343.
17. Chung, H.S.; Shin, J.C. Characterization of antioxidant alkaloids and phenolic acids from anthocyanin-pigmented rice (Oryza sativa cv. Heugjinjubyeo). Food Chem. 2007, 104, 1670-1677.
18. De, P.; Baltas, M.; Bedos-Belval, F. Cinnamic acid derivatives as anticancer agents-a review. Curr. Med. Chem. 2011, 18, 1672-1703.
19. Teixeira, C.; Vale, N.; Perez, B.; Gomes, A.; Gomes, J.R.; Gomes, P. "Recycling" classical drugs for malaria. Chem. Rev. 2014, 114, 11164-11220.
20. Kovalishyn, V.; Aires-de-Sousa, J.; Ventura, C.; Elvas Leitão, R.; Martins, F. QSAR modeling of antitubercular activity of diverse organic compounds. Chemom. Intell. Lab. Syst. 2011, 107, 69-74.
21. Martins, F.; Santos, S.; Ventura, C.; Elvas-Leitao, R.; Santos, L.; Vitorino, S.; Reis, M.; Miranda, V.; Correia, H.F.; Aires-de-Sousa, J.; et al. Design, synthesis and biological evaluation of novel isoniazid derivatives with potent antitubercular activity. Eur. J. Med. Chem. 2014, 81, 119-138.
22. Martins, F.; Ventura, C.; Santos, S.; Viveiros, M. QSAR based design of new antitubercular compounds: Improved isoniazid derivatives against multidrug-resistant TB. Curr. Pharm. Des. 2014, 20, 4427-4454.
23. Ventura, C.; Latino, D.A.; Martins, F. Comparison of multiple linear regressions and neural networks based QSAR models for the design of new antitubercular compounds. Eur. J. Med. Chem. 2013, 70, 831-845.
24. Dimova, D.; Stumpfe, D.; Bajorath, J. Method for the evaluation of structure-activity relationship information associated with coordinated activity cliffs. J. Med. Chem. 2014, 57, 6553-6563.
25. Maggiora, G.M. On outliers and activity cliffs--why QSAR often disappoints. J. Chem. Inf. Model. 2006, 46, 1535.
26. Nikolova, N.; Jaworska, J. Approaches to measure chemical similarity - a review. QSAR Comb. Sci. 2003, 22, 1006-1026.
27. Hansch, C.; Fujita, T. P-σ-π analysis. A method for the correlation of biological activity and chemical structure. J. Am. Chem. Soc. 1964, 86, 1616-1626.
28. Butkiewicz, M.; Lowe, E.W., Jr.; Mueller, R.; Mendenhall, J.L.; Teixeira, P.L.; Weaver, C.D.; Meiler, J. Benchmarking ligand-based virtual high-throughput screening with the PubChem database. Molecules 2013, 18, 735-756.
29. Cherkasov, A.; Muratov, E.N.; Fourches, D.; Varnek, A.; Baskin, I.I.; Cronin, M.; Dearden, J.; Gramatica, P.; Martin, Y.C.; Todeschini, R.; et al. QSAR modeling: Where have you been? Where are you going to? J. Med. Chem. 2014, 57, 4977-5010.
30. Ekins, S.; Freundlich, J.S.; Reynolds, R.C. Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. J. Chem. Inf. Model. 2014, 54, 2157-2165.
31. van de Waterbeemd, H.; Rose, S. Quantitative approaches to structure-activity relationships. In The Practice of Medicinal Chemistry, 3rd ed.; Wermuth, C.G., Ed.; Academic Press: New York, NY, USA, 2008; Chapter 23, pp. 491-513.
32. Bento, A.P.; Gaulton, A.; Hersey, A.; Bellis, L.J.; Chambers, J.; Davies, M.; Kruger, F.A.; Light, Y.; Mak, L.; McGlinchey, S.; et al. The ChEMBL bioactivity database: An update. Nucleic Acids Res. 2014, 42, D1083-D1090.
33. Livingstone, D. Data pre-treatment and variable selection. In A Practical Guide to Scientific Data Analysis; Livingstone, D., Ed.; Wiley: Chichester, UK, 2009; pp. 57-73.
34. Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graph. Model. 2002, 20, 269-276.
35. Tropsha, A.; Gramatica, P.; Gombar, V.K. The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb. Sci. 2003, 22, 69-77.
36. Mitra, I.; Saha, A.; Roy, K. Exploring quantitative structure-activity relationship studies of antioxidant phenolic compounds obtained from traditional Chinese medicinal plants. Mol. Simul. 2010, 36, 1067-1079.
37. Todeschini, R.; Consonni, V.; Mauri, A.; Pavan, M. Detecting "bad" regression models: Multicriteria fitness functions in regression analysis. Anal. Chim. Acta 2004, 515, 199-208.
38. Pratim Roy, P.; Paul, S.; Mitra, I.; Roy, K. On two novel parameters for validation of predictive QSAR models. Molecules 2009, 14, 1660-1701.
39. Roy, K.; Mitra, I.; Kar, S.; Ojha, P.K.; Das, R.N.; Kabir, H. Comparative studies on some metrics for external validation of QSPR models. J. Chem. Inf. Model. 2012, 52, 396-408.
40. Chirico, N.; Gramatica, P. Real external predictivity of QSAR models: How to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. J. Chem. Inf. Model. 2011, 51, 2320-2335.
41. Chirico, N.; Gramatica, P. Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. J. Chem. Inf. Model. 2012, 52, 2044-2058.
42. Netzeva, T.I.; Worth, A.; Aldenberg, T.; Benigni, R.; Cronin, M.T.; Gramatica, P.; Jaworska, J.S.; Kahn, S.; Klopman, G.; Marchant, C.A.; et al. Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. The report and recommendations of ecvam workshop 52. Altern. Lab. Anim. 2005, 33, 155-173.
43. Jaworska, J.; Nikolova-Jeliazkova, N.; Aldenberg, T. Qsar applicabilty domain estimation by projection of the training set descriptor space: A review. Altern. Lab. Anim. 2005, 33, 445-459.
44. Molecular modeling pro plus, version 6.2.5. Available online: www.chemistry-software.com.
45. Dı́az-Garcı́a, J.A.; González-Farı́as, G. A note on the cook's distance. J. Stat. Plan. Inference 2004, 120, 119-136.
46. Militino, A.F.; Palacios, M.B.; Ugarte, M.D. Outliers detection in multivariate spatial linear models. J. Stat. Plan. Inference 2006, 136, 125-146.
47. ChemAxon - Software Solutions and Services for Chemistry & Biology. Available online: https://www.chemaxon.com (accessed on 17 January 2020).
48. ChemDraw - Chemical Communication Software. Available online: https://www.perkinelmer.com/category/chemdraw (accessed on 17 January 2020).
Cátia Teixeira 1,*, Cristina Ventura 2, José R. B. Gomes 3, Paula Gomes 1 and Filomena Martins 4,*
1 LAQV-REQUIMTE, Departamento de Química e Bioquímica da Faculdade de Ciências da Universidade do Porto, P-4169-007 Porto, Portugal
2 Instituto Superior de Educação e Ciências, P-1750-142 Lisboa, Portugal
3 CICECO, Departamento de Química, Universidade de Aveiro, P-3810-193 Aveiro, Portugal
4 Centro de Química e Bioquímica (CQB), Centro de Química Estrutural (CQE), Faculdade de Ciências da Universidade de Lisboa, P-1749-016 Lisboa, Portugal
*Authors to whom correspondence should be addressed.
© 2020. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Tuberculosis, caused by Mycobacterium tuberculosis (Mtb), remains one of the top ten causes of death worldwide and the leading cause of mortality from a single infectious agent. The upsurge of multi- and extensively drug-resistant tuberculosis cases creates an urgent need for new and more effective antitubercular drugs. As the cinnamoyl scaffold is a privileged and important pharmacophore in medicinal chemistry, several studies have sought novel cinnamic acid derivatives (CAD) potentially active against tuberculosis. In this context, we set up a quantitative structure–activity relationship (QSAR) strategy to: (i) derive, through multiple linear regression analysis, a statistically significant model describing the antitubercular activity of CAD towards wild-type Mtb; and (ii) identify the properties with the greatest impact on the antitubercular behavior of those derivatives. The best model found involved only geometrical and electronic CAD-related properties and withstood strict internal and external validation procedures. The physicochemical information encoded by the identified descriptors can be used to propose specific structural modifications for designing better antitubercular CAD.