Linear models are not always able to sufficiently capture the structure of a dataset. Sometimes, combining predictors in a non-parametric method, such as deep neural networks (DNNs), yields a more flexible modeling of the response variables in the predictions. Furthermore, standard statistical classification or regression approaches are inefficient when dealing with additional complexity, such as high-dimensional problems, which usually suffer from multicollinearity. To confront these cases, penalized non-parametric methods are very useful. This paper proposes two heuristic approaches and implements new shrinkage penalized cost functions in the DNN, based on the elastic-net penalty function concept. In other words, some new methods are proposed via the development of shrinkage penalized DNNs, such as
1. Introduction
When linear models are insufficient to describe the structure of a dataset, additional flexibility, such as non-linear models, should be considered to counter such difficulties. Neural networks (NNs), viewed as non-parametric regression models, are a beneficial approach and have been applied in many situations, such as clustering, classification, and regression. Neural networks were created as a computer model of the human brain and are meant to automate complicated tasks [1]. Despite its potential, there are some shortcomings to the NN's brain-model-based concept. There are many contentious philosophical issues regarding how an algorithmic processor may duplicate some of the activities of the brain, and the brain model of linked neurons is in any case oversimplified [2].
“Perceptron”, a model of a neuron, is a basic building block of a neural network. With respect to Figure 1a, the output $y$ is determined from an input vector $x$:
$$y = f\Big(w_0 + \sum_{j=1}^{p} w_j x_j\Big) \qquad (1)$$
where $f$ is called the activation function. The functions that are applied to the outputs are known as activation functions. The subsequent layer, Figure 1d, uses the neurons of the previous layer’s output as an input. The $w_j$ are weights, and the NN learns the weights from the data. Based on Figure 1b,c, it is possible to extend a typical simple neural network to a deep neural network, or DNN. A DNN is, put simply, a multi-layer neural network that has two or more hidden layers. Adding more layers, and more neurons per layer, increases how specialized the model becomes to the training data, but decreases the performance on the test dataset [3,4]. The data are used for estimating the weights of each neuron in each layer; in other words, the data are used to estimate the weights associated with each connection between neurons in adjacent layers. These weights, along with biases, determine how much influence one neuron’s output has on another neuron’s input, essentially learning the relationships within the data; this is also known as training the network in the NN technique [5]. The weights, $w$, are chosen to minimize an error-measuring criterion, such as:
$$E = \sum_{i=1}^{m}\big(y_i - \hat{y}_i\big)^2 \qquad (2)$$
where $y$ is the observed output and $\hat{y}$ is the predicted output. For categorical responses (categorical is the general term for a single-choice or multiple-choice response), a different criterion would be more appropriate. Similar to regression and smoothing splines, a penalty function is employed to fit a more stable model by “weight decay” minimization, as shown in Equation (3). Instead of only minimizing E, a concept similar to the one applied in “ridge regression” can be used [1,6]:
$$E + \lambda \sum_{j} w_j^{2} \qquad (3)$$
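To make Equations (1)–(3) concrete, the following R sketch (our own illustration; the sigmoid choice for the activation $f$ and the toy data are assumptions, not taken from the paper) computes a single perceptron output, the squared-error criterion, and its weight-decay version.

```r
# Perceptron output for one input vector x, Equation (1):
# y_hat = f(w0 + sum_j w_j * x_j), with a sigmoid activation f (assumed).
sigmoid <- function(z) 1 / (1 + exp(-z))

perceptron <- function(x, w, w0) {
  sigmoid(w0 + sum(w * x))
}

# Squared-error criterion of Equation (2) and its weight-decay
# (ridge-type) version of Equation (3) for a small dataset X, y.
weight_decay_error <- function(X, y, w, w0, lambda) {
  y_hat <- apply(X, 1, perceptron, w = w, w0 = w0)
  E <- sum((y - y_hat)^2)            # Equation (2)
  E + lambda * sum(w^2)              # Equation (3): E + lambda * sum(w_j^2)
}

# Toy usage with simulated data (assumed values, for illustration only)
set.seed(1)
X <- matrix(rnorm(20), nrow = 5)     # 5 observations, 4 inputs
y <- rbinom(5, 1, 0.5)
w <- rnorm(4); w0 <- 0
weight_decay_error(X, y, w, w0, lambda = 0.1)
```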
NNs have certain significant shortcomings in comparison to competing statistical models [7]. In contrast to statistical models, where parameters frequently have some interpretation, NNs have parameters that are uninterpretable. It is also possible to assert that there are no standard errors, because NNs are not founded on a probability model that captures structure and variation. NNs are thus typically good for prediction but, to some extent, poor for interpretation [7]. Furthermore, an NN can easily lead to an overfitted model, producing overly optimistic forecasts, if careful supervision is not exercised. If NNs are applied appropriately, they can be considered a good tool in the toolbox that can perform better than some of their statistical counterparts for particular tasks [7,8]. There are many possible ways to handle neural networks, such as using max-norm constraints, dropout, and so on, but we have to consider that a famous method such as dropout is not a “shrinkage penalized regularization method” and is only a method for overfitting prevention [4]. This brief explanation clarifies that the dropout strategy is not founded on the shrinkage penalization approach. During training, dropout is applied by keeping a neuron’s activation with a certain probability (a hyperparameter) or deactivating it otherwise. This means that certain neurons may be absent during training, resulting in dropout. The network remains unimpeded and exhibits enhanced accuracy despite the lack of specific information, which reduces the network’s excessive dependence on any one neuron or a limited subset of neurons [4,9]. Compared to statistical approaches, where the burden of developing an appropriate sampling scheme might occasionally impede or even block progress, NNs are extremely effective for large, complicated datasets [10]. However, NNs do not have an adequate statistical theory for shrinkage penalization (regularization) and model selection when dealing with large-scale datasets. Hence, the algorithm of the NN is extended here using the blended-penalty-function property, which produces new shrinkage penalized models for DNNs. As an instance, by using the $L_1/L_2$ ratio theory, a number of shrinkage penalized DNN structures are developed based on the elastic-net (Enet) concept. It is worth noting that the elastic-net penalty is used for generalized linear models (GLMs) as a linear combination of the ridge and Lasso penalties; its blended generalization, based on the ridge and bridge penalties, includes a number of special cases, such as the ridge, Lasso, and bridge penalties [11,12].
Still, there are many motivations for developing and extending different aspects of the DNN. Farrell and his colleagues established nonasymptotic bounds for deep nets (novel nonasymptotic high-probability bounds for deep feedforward neural nets) for a general class of nonparametric regression-type loss functions, which includes least squares, logistic regression, and other generalized linear models as special cases. They then applied their theory to develop semiparametric inference, focusing on causal parameters for concreteness, and demonstrated the effectiveness of deep learning with an empirical application to direct mail marketing [13].
Kurisu et al. have developed a general theory for adaptive nonparametric estimation of the mean function of a nonstationary and nonlinear time series model using deep neural networks (DNNs) [14].
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space in which to model functions by neural networks, as the curse of dimensionality (CoD) cannot be evaded even when trying to approximate a single ReLU neuron [15,16]. Liu and his colleagues have studied a suitable function space for over-parameterized two-layer neural networks with bounded norms (e.g., the path norm, the Barron norm) from the perspective of sample complexity and generalization properties [17].
Shrestha et al. investigated how to increase the accuracy and reduce the processing time of image classification in a deep learning architecture by using elastic-net regularization for feature selection; their proposed system consists of a convolutional neural network (CNN) whose classification and prediction accuracy is enhanced by elastic-net regularization [18].
Zhang et al. have shown that despite the massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family or to the regularization techniques used during training. Through extensive systematic experiments, they have shown how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, their experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data [19].
Many researchers are interested in the automatic detection of binary responses, such as (health/disease) or (0/1) cases, utilizing machine learning (ML) methods [20,21,22]. Conventional ML techniques have been employed in a number of research works to classify data based on microbiome samples [21,22]. However, these techniques have a number of drawbacks, including poor accuracy and the requirement of manual feature selection [23,24]. In contrast to conventional ML algorithms that depend on manual feature selection methods, DNN-based methods have an inbuilt mechanism for feature extraction [25]. In comparison to conventional ML algorithms, DNN techniques perform better in the classification domain [26]. As a result, there has been a move toward using DNN techniques to increase the classification accuracy of microbiome data [27].
In Section 2, we develop the DNN’s algorithm by extending the concept of shrinkage penalization based on the elastic-net as a blended penalty function. A simulation for a classification study using our developed method is presented in Section 3. In addition, Section 3 illustrates the application of the developed penalized DNN to a real compositional high-dimensional classification problem, “microbiome” data. Afterward, the results of DNN’s classification-based approach are compared to a non-parametric classification tree, “GUIDE”. GUIDE stands for Generalized, Unbiased, Interaction Detection and Estimation [28]. Section 4 deals with conclusions, future works, and activities, including the possible ways to extend the proposed methods for more improvement and extension.
2. Regularization of DNN
Let us begin by addressing overfitting as we move into the main body of this section. Suppose that we have a training dataset and a testing dataset. Now, it is possible that after training the model for a certain time, the decision boundary fits this dataset so well that it captures almost all the points in it. If we then try to make a prediction on the testing dataset with this model, we can see that it will not perform well there. Thus, we will have high training accuracy but low testing accuracy. This condition is called “overfitting”. What we ideally want is a smooth curve that fits both the training dataset and the testing dataset well. Therefore, we use “regularization”. In the application of NNs, highly complex non-linearity can result from having a very deep neural network, Figure 1c, which has many hidden layers, each with many neurons. Hence, all the complex connections between neurons will create a highly complex non-linear curve. So, if we want to somehow increase the linearity to obtain a smoother curve, then we will want a slightly smaller number of neurons. Thus, if we can somehow get rid of certain neurons in the neural network, or rather nullify the effect of certain neurons in the hidden layers, then we can increase the linearity and thus obtain a smoother curve that fits properly. In fact, we want a fitted line in the middle ground between highly complex and highly linear curves.
2.1. Extending the Concept of Shrinkage Penalization in the GLMs to DNNs
We think it is very useful to highlight our purpose for extending two shrinkage methods at the heart of a DNN, which is solving grouped-selection issues via the elastic-net and ridge&bridge methods. Although the Lasso has shown success in many situations, it has some limitations. Consider the following three scenarios.
(a). In the case of $p > n$, the Lasso selects at most n variables before it saturates, because of the nature of the convex optimization problem. This seems to be a limiting feature for a variable selection method. Moreover, the Lasso is not well defined unless the bound on the $L_1$-norm of the coefficients is smaller than a certain value.
(b). If there is a group of variables among which the pairwise correlations are very high, then the Lasso tends to select only one variable from the group and does not care which one is selected.
(c). For usual $n > p$ situations, if there are high correlations between predictors, it has been empirically observed that the prediction performance of the Lasso is dominated by ridge regression [29].
Scenarios (a) and (b) make the Lasso an inappropriate variable selection method in some situations. Segal et al. illustrate these points by considering the gene selection problem in microarray data analysis. A typical microarray dataset has many thousands of predictors (genes) and often fewer than 100 samples. For those genes sharing the same biological “pathway”, the correlations between them can be high [30]. They think of those genes as forming a group. The ideal gene selection method should be able to do two things: eliminate the trivial genes, and automatically include whole groups in the model once one gene among them is selected (“grouped selection”). For this kind of $p \gg n$ and grouped-variables situation, the Lasso is not the ideal method, because it can only select at most n variables out of p candidates [31], and it lacks the ability to reveal the grouping information. As for prediction performance, scenario (c) is not rare in regression problems, so it is possible to further strengthen the prediction power of the Lasso. Their goal was to find a new method that works as well as the Lasso whenever the Lasso does best and that can fix the problems highlighted above, i.e., it should mimic the ideal variable selection method in scenarios (a) and (b), especially with microarray data, and it should deliver better prediction performance than the Lasso in scenario (c).
The idea of regularizing GLMs is employed for the DNN algorithm in this research. This extension is carried out using the special cases $L_1$ (the least absolute shrinkage and selection operator, Lasso, or $L_1$ regularization or penalty [29]) and $L_2$ (ridge regression, or $L_2$ regularization or penalty [29]) of the elastic-net penalization. After that, we develop this motivation for the more general ridge and bridge blended penalty functions. The developed penalized equations are applied in the cost function of the DNN algorithm. Before dealing with the extension of the penalized DNN equation, we show how the $L_1/L_2$ ratio theory can be seen as a pairing of $L_1$ and $L_2$ as specific examples of elastic-net penalties.
2.2. Empirical Extension of the Application of the Ratio Theory
In this section, regarding the $L_1/L_2$ ratio theory [32], it is shown that it is possible to recover the sparsest solution exactly by minimizing the ratio and difference of the $L_1$ and $L_2$ norms, thereby establishing the origin of their sparsity-promoting property. As is known, the elastic-net model is obtained from the linear combination of ridge and Lasso through the penalty term $\lambda_1\|\beta\|_1 + \lambda_2\|\beta\|_2^2$; the ridge and Lasso are thus special cases of the elastic-net. So, regarding Theorem 3.2 and Lemma 3.2 of [32], one of the combinations of them is the ratio $L_1/L_2$, which is considerable and determinant. Since, in the modeling of a high-dimensional dataset via two such models, the total numbers of coefficients in both models are the same, it is possible to map a one-to-one function between the coefficients of the two models and then check their ratio (irrespective of whether the value of a specific coefficient in each model is zero or not). Therefore, based on the ratio $L_1/L_2$, one set of coefficients for an elastic-net can be created. In other words, the optimal ratio is obtained based on the accuracy of the models in terms of their mean square errors (MSEs). As a result, the outcomes are locally optimized using the MSEs of the special cases of the elastic-net ($L_1$ and $L_2$). On the other hand, any ratio of two components can be considered as a type of combination of those components. So, if we study the ratio of ridge and Lasso as a type of combination of two components, then we can consider the ratio of the coefficients of $L_1$ (Lasso) to the coefficients of $L_2$ (ridge). So, regarding Theorem 3.2 and Lemma 3.3 of [32], one of the combinations of $L_1$ and $L_2$ can be written as the coefficient-wise ratio [32]:

$$r_j = \frac{\hat{\beta}_j^{L_1}}{\hat{\beta}_j^{L_2}}, \qquad j = 1, \ldots, p,$$
where the coefficients of the $L_2$ (ridge) fit are in the denominator of the ratio and hence none of them are zero, while the coefficients of the numerator belong to the $L_1$ (Lasso) fit, which has a large number of zero coefficients. Similarly, we can extend this logic to the elastic-net, because the modified elastic-net (ME-net) is a type of elastic-net [6,12]. The special cases of this blended family of penalties are the ridge, Lasso, bridge, and elastic-net penalties [33,34]. A specific set of regression coefficients, depending on the type of penalty within this blended family, is given by

$$\hat{\beta} = \arg\min_{\beta}\Big\{\ell(\beta) + \lambda\Big[\alpha\sum_{j=1}^{p}|\beta_j|^{\gamma} + \frac{1-\alpha}{2}\sum_{j=1}^{p}\beta_j^{2}\Big]\Big\},$$

where $\ell(\beta)$ is the model loss, $0 \le \alpha \le 1$, and $0 < \gamma \le 2$ [12,33,34]. In fact, Equation (I) uses two penalty terms, which is equivalent to the equation used by [12]; setting Equations (I) and (II) equal to each other and then taking the logarithm of both sides yields the relationship between the two sets of coefficients. Now, with respect to Figure 2 and Figure 3, there is an intuitive and geometrical basis for drawing a one-to-one function between the $L_1$ and $L_2$ penalties (each imaginary line drawn parallel to the vertical axis touches only one point of the graphs of $L_1$ and $L_2$), whereas the graph of the elastic-net settles somewhere between the $L_1$ and $L_2$ penalties with the properties of being monotonic and invertible, so the rule of a one-to-one function is still valid [29,35]. In other words, the elastic-net penalty function, regarding the definition of its penalty function, has unique responses for its unknown coefficients.
Now, based on the explanation above, instead of using $L_1$ and $L_2$ separately in the computations, we use only the ratio $L_1/L_2$ in the development of our deep neural network algorithm. The simplicity of the procedure, which has roughly the same accuracy as using both individual penalties, and its calculation speed are two advantages of the ratio approach. All the R code is written step by step following the mathematical development of the different DNN methods, both in the following sections and in the appendix. The basic mathematics of DNNs are thoroughly provided in Appendix A.
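As a minimal illustration of this ratio idea (our own sketch, not the authors' code), one can fit ridge and Lasso logistic regressions with the glmnet package, map the two coefficient vectors one-to-one, and inspect their coefficient-wise ratio; the data and tuning choices below are assumptions for demonstration only.

```r
# Fit ridge (alpha = 0) and Lasso (alpha = 1) with glmnet, then compare
# the coefficients one-to-one, as described in Section 2.2.
library(glmnet)

set.seed(1)
n <- 100; p <- 50
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

ridge <- cv.glmnet(x, y, family = "binomial", alpha = 0)
lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1)

b_ridge <- as.numeric(coef(ridge, s = "lambda.min"))[-1]  # drop intercept
b_lasso <- as.numeric(coef(lasso, s = "lambda.min"))[-1]

# Coefficient-wise ratio: the ridge coefficients (denominator) are never
# exactly zero, while many Lasso coefficients (numerator) are zero.
ratio <- b_lasso / b_ridge
summary(ratio)
```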
It is time to examine the regularization technique at this point. We are aware that the primary goal of our model is to decrease the value of the cost function,

$$J = \frac{1}{m}\sum_{i=1}^{m} L\big(\hat{y}^{(i)}, y^{(i)}\big),$$

in which m is the number of observations in a dataset. If we add the term $\frac{\lambda}{2m}\sum_{l}\|W^{[l]}\|_{2}^{2}$ to our cost function, then it will create a nullifying effect on certain parameters. In fact, the additional term attempts to reduce the value of the cost function by requiring our model to change (lower) the values of the weights, $W$, as well. As a result, some of the weights carry out the nullification process, since they become close to zero. Here, $\lambda$, as a regularization parameter or tuner, can be considered as governing the proportion of linearity and non-linearity in our model. Therefore, increasing the value of $\lambda$ will force our model to reduce the values of the weights even more, and so it increases the linearity even more. Now, if we instead use the modulus (absolute value) of the weights in the penalty term of the cost function, we will have $L_1$ regularization,

$$J_{L_1} = \frac{1}{m}\sum_{i=1}^{m} L\big(\hat{y}^{(i)}, y^{(i)}\big) + \frac{\lambda}{m}\sum_{l}\big\|W^{[l]}\big\|_{1}.$$
Again, this added term will force some weights to tend to 0. So, we can write a general equation for a regularized DNN (here with the $L_2$ penalty) as below:

$$J_{\text{reg}} = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log\hat{y}^{(i)} + \big(1 - y^{(i)}\big)\log\big(1 - \hat{y}^{(i)}\big)\Big] + \frac{\lambda}{2m}\sum_{l=1}^{L}\big\|W^{[l]}\big\|_{F}^{2}, \qquad (4)$$

where m is the number of observations, n is the number of dimensions, and l indexes the layers. The norm of the weight matrix for any layer can be considered as follows:

$$\big\|W^{[l]}\big\|_{F}^{2} = \sum_{i=1}^{n^{[l]}}\sum_{j=1}^{n^{[l-1]}}\big(w_{ij}^{[l]}\big)^{2}. \qquad (5)$$

Then, we have to calculate the gradient of the penalized cost for each layer:

$$dW^{[l]} = \underbrace{\frac{\partial J}{\partial W^{[l]}}}_{\text{back-propagation term}} + \frac{\lambda}{m}\,W^{[l]}. \qquad (6)$$
It is worth noting that if we add any term to our cost function, we will also need to change our back-propagation equations, because in the back-propagation algorithm we compute the gradients $dW^{[l]}$, so the derivative of this added penalty term must be included as well. The derivative of the penalty term with respect to $W^{[l]}$ is equal to $\frac{\lambda}{m}W^{[l]}$, because the rest of the weights, except $W^{[l]}$, are considered as constant values. Thus, the entire derivative of this term is $\frac{\lambda}{m}W^{[l]}$, where $W^{[l]}$ is a matrix containing all the weight parameters of that layer.
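The following R sketch summarizes this sub-section under the conventions stated above (binary cross-entropy loss, penalty scaling $\lambda/(2m)$ for $L_2$ and $\lambda/m$ for $L_1$, gradient contribution $(\lambda/m)W^{[l]}$); it is an illustrative reconstruction, not the authors' published code, and `W` is assumed to be a list of layer weight matrices.

```r
# Binary cross-entropy part of the cost (see also Appendix A).
cross_entropy <- function(y, y_hat) {
  -mean(y * log(y_hat) + (1 - y) * log(1 - y_hat))
}

# L2-regularized cost of Equation (4) over all layer weight matrices W.
cost_L2 <- function(y, y_hat, W, lambda) {
  m <- length(y)
  penalty <- (lambda / (2 * m)) * sum(sapply(W, function(w) sum(w^2)))
  cross_entropy(y, y_hat) + penalty
}

# L1-regularized variant described in the text.
cost_L1 <- function(y, y_hat, W, lambda) {
  m <- length(y)
  penalty <- (lambda / m) * sum(sapply(W, function(w) sum(abs(w))))
  cross_entropy(y, y_hat) + penalty
}

# Extra gradient contribution of the L2 penalty for layer l, to be added
# to the usual back-propagation gradient dW[[l]] (Equation (6)).
grad_L2_term <- function(W_l, lambda, m) (lambda / m) * W_l
```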
2.3. Constructing Two Heuristics DNN Approaches Based on the Shrinkage Penalized Methods
In Section 2.2, we illustrated how we employ the idea of the elastic-net penalty function in GLMs in order to develop the algorithms of the DNN. We have created two heuristic approaches for the DNN based on practical statistical evidence, including current shrinkage penalized methods such as the combination of ridge and Lasso (elastic-net) and ridge&bridge. We defined a new penalty term in the structure of the DNN with respect to the restrictions on the hyper-parameters in the penalized equation of the elastic-net. The important issue to consider is that the coefficients, $\beta$, in the GLM's penalty terms play the role of the weights, $W$, in the structure of the DNN. Additionally, when defining the cost function of the DNN, it is required to embed the penalty terms in the back-propagation section in addition to including them in the cost function.
Now, in order to introduce an extension of the DNN's algorithm and generalize its shrinkage penalized structures to other circumstances, such as the elastic-net and ridge&bridge penalties, we need to develop Equation (5) and prepare it for use with the variously developed DNN algorithms. In this regard, the general form of the elastic-net (en) cost function is defined as below:
$$J_{\text{en}} = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log\hat{y}^{(i)} + \big(1-y^{(i)}\big)\log\big(1-\hat{y}^{(i)}\big)\Big] + \frac{\lambda}{m}\sum_{l=1}^{L}\Big[\alpha\big\|W^{[l]}\big\|_{1} + \frac{1-\alpha}{2}\big\|W^{[l]}\big\|_{F}^{2}\Big] \qquad (7)$$
where m is the number of observations, n is the number of dimensions, l indexes the layers, $\lambda$ is the size of the penalty or tuning parameter, and $\alpha$ determines the type of penalty in the DNN's structure, $0 \le \alpha \le 1$. It is revealed that $\alpha = 1$ is equivalent to the $L_1$ regularization and $\alpha = 0$ is equivalent to the $L_2$ regularization. Therefore, the derivative of the elastic-net cost function must be calculated and then embedded in the back-propagation section to account for the effect of the added regularization term. Similarly to the calculated derivatives of the cost functions in Section 2.2, the derivative can be written as:

$$\frac{\partial J_{\text{en}}}{\partial W^{[l]}} = \underbrace{\frac{\partial J}{\partial W^{[l]}}}_{\text{back-propagation term}} + \frac{\lambda}{m}\Big[\alpha\,\operatorname{sign}\big(W^{[l]}\big) + (1-\alpha)\,W^{[l]}\Big] \qquad (8)$$
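A hedged R sketch of Equations (7) and (8), under the standard elastic-net parameterization assumed above (mixing weight $\alpha$, tuning parameter $\lambda$), follows; `W` is again taken to be a list of layer weight matrices.

```r
# Elastic-net penalty of Equation (7) summed over all layers.
enet_penalty <- function(W, lambda, alpha, m) {
  (lambda / m) * sum(sapply(W, function(w)
    alpha * sum(abs(w)) + ((1 - alpha) / 2) * sum(w^2)))
}

# Sub-gradient of the penalty with respect to one layer's weights W_l,
# to be added to the back-propagation gradient dW[[l]] (Equation (8)).
enet_grad_term <- function(W_l, lambda, alpha, m) {
  (lambda / m) * (alpha * sign(W_l) + (1 - alpha) * W_l)
}
```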
So, we can extend the regularization term to the elastic-net. We have now reached the final stage of the DNN regularization process, which is based on the elastic-net structure and represents one example of the general "ridge&bridge" regularization in GLMs. With respect to Equations (5) and (6), the cost function of the general elastic-net (ridge&bridge) family is defined as below:
$$J_{\text{rb}} = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log\hat{y}^{(i)} + \big(1-y^{(i)}\big)\log\big(1-\hat{y}^{(i)}\big)\Big] + \frac{\lambda}{m}\sum_{l=1}^{L}\Big[\alpha\sum_{i,j}\big|w_{ij}^{[l]}\big|^{\gamma} + \frac{1-\alpha}{2}\big\|W^{[l]}\big\|_{F}^{2}\Big] \qquad (9)$$
where the restriction on $\gamma$, $0 < \gamma \le 2$, determines the different types of ridge&bridge shrinkage regularizations in DNN structures. The derivative of the cost function of Equation (A23) with respect to the weights can be written as follows:

$$\frac{\partial J_{\text{rb}}}{\partial W^{[l]}} = \underbrace{\frac{\partial J}{\partial W^{[l]}}}_{\text{back-propagation term}} + \frac{\lambda}{m}\Big[\alpha\,\gamma\,\big|W^{[l]}\big|^{\gamma-1}\odot\operatorname{sign}\big(W^{[l]}\big) + (1-\alpha)\,W^{[l]}\Big] \qquad (10)$$
By using Equation (10) in the cost function of the shrinkage-regularized DNN structure and embedding Equation (A1) in its back-propagation section, we developed and generalized the shrinkage penalization algorithm of the DNN. Based on this development and generalization, we can present a wide range of penalized DNN models that take advantage of the general GLM penalization.
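Analogously, a sketch of the blended ridge&bridge penalty of Equations (9) and (10) can be written as below; the exact parameterization (a single $\lambda$ with mixing weight $\alpha$ and bridge exponent $\gamma$) is our assumption for illustration.

```r
# Blended ridge&bridge penalty: lambda * [alpha * sum |w|^gamma
#                                         + (1 - alpha)/2 * sum w^2];
# gamma = 1 recovers the elastic-net and alpha = 1 the pure bridge penalty.
rb_penalty <- function(W, lambda, alpha, gamma, m) {
  (lambda / m) * sum(sapply(W, function(w)
    alpha * sum(abs(w)^gamma) + ((1 - alpha) / 2) * sum(w^2)))
}

# Gradient contribution for one layer (valid for gamma > 1; for gamma <= 1
# the bridge term is non-differentiable at zero, and a sub-gradient or
# thresholding step would be needed instead).
rb_grad_term <- function(W_l, lambda, alpha, gamma, m) {
  (lambda / m) * (alpha * gamma * abs(W_l)^(gamma - 1) * sign(W_l) +
                  (1 - alpha) * W_l)
}
```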
3. Microbiome Data
Operational Taxonomic Units (OTUs) are the basic unit used in numerical taxonomy. A typical microbiome dataset has thousands of OTUs and a small number of samples. These units may refer to an individual, species, genus, or class. The taxonomic units utilized in numerical approaches are invariably not equivalent to the formal taxonomic units [36,37]. Much of the challenge in analyzing microbiome data stems from the fact that they involve, either explicitly or implicitly, quantities of a relational nature. As such, measurements are typically both high-dimensional and dependent. Additionally, such data are often substantial in quantity, and thus computational tractability is generally an issue not far from the surface when developing and using statistical methods and models in this area. High dimensionality, non-normal distribution, and spurious correlation, among others, even make non-parametric methods invalid, and hence, analysis and interpretation of microbiome phenomena are very problematic [38,39].
Typically, there are four general characteristics of microbiome data. First of all, they are compositional, which means that the sum of the percentages of all bacteria in each sample is equal to, or almost equal to, 1. Secondly, microbiome datasets are high-dimensional. Third, they are overdispersed, meaning that their variances are much larger than their means. Finally, microbiome datasets are often sparse, with many zeros. Because of this combination of unique characteristics, the analysis of microbiome datasets is very challenging [40].
3.1. Simulation Study for Microbiome Data
In this sub-section, we individually particularize the elastic-net equation for the penalized logistic regression model. This particularization is important, since employing Algorithm 1 will make the calculation and programming easier when implementing the model. As is well known, a binary logistic regression model is used to model the probability of certain events [41]. In the case of a binary response variable, the linear relationship between the predictor variables and the response variable is explained by a logit model. Note that a logistic regression model with success probability $\pi_i = P(y_i = 1 \mid x_i)$ is explained by:

$$\log\frac{\pi_i}{1-\pi_i} = x_i^{\top}\beta,$$
where $x_i$ is considered as the predictor variable, $\beta$ is an unknown vector of regression coefficients, and $y_i$ denotes a binary response variable with parameter $\pi_i$. Based on [12], the log-likelihood of the penalized logistic regression model with the elastic-net penalty can be written as follows:

$$\ell_{p}(\beta) = \sum_{i=1}^{m}\Big[y_i\log\pi_i + (1 - y_i)\log(1 - \pi_i)\Big] - \lambda\Big[\alpha\|\beta\|_{1} + \frac{1-\alpha}{2}\|\beta\|_{2}^{2}\Big].$$

| Algorithm 1 Extendable algorithm of the neural network. |
| 1: Initialize the weights randomly. 2: Forward propagation with respect to the suitable number of hidden layers. 3: Find the value of the cost function. 4: Backward propagation. 5: Repeat Steps 2, 3, and 4 many times until the cost function reaches its minimum. |
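As an illustration of Algorithm 1, the following R sketch trains a single-hidden-layer binary classifier with sigmoid activations, a binary cross-entropy cost, and an $L_2$ penalty by plain gradient descent; the layer size, learning rate, and number of iterations are illustrative choices, not the settings used in the paper.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

train_dnn <- function(X, y, n_hidden = 10, lambda = 0.01,
                      lr = 0.1, n_iter = 2000) {
  m <- nrow(X); p <- ncol(X)
  set.seed(1)                                   # Step 1: random weights
  W1 <- matrix(rnorm(p * n_hidden, sd = 0.1), p, n_hidden); b1 <- rep(0, n_hidden)
  W2 <- matrix(rnorm(n_hidden, sd = 0.1), n_hidden, 1);     b2 <- 0
  for (it in seq_len(n_iter)) {
    # Step 2: forward propagation
    A1 <- sigmoid(sweep(X %*% W1, 2, b1, "+"))
    y_hat <- as.vector(sigmoid(A1 %*% W2 + b2))
    # Step 3: penalized cost (binary cross-entropy + L2 penalty)
    cost <- -mean(y * log(y_hat) + (1 - y) * log(1 - y_hat)) +
      (lambda / (2 * m)) * (sum(W1^2) + sum(W2^2))
    # Step 4: backward propagation with the penalty term added to dW
    dZ2 <- matrix(y_hat - y, ncol = 1)
    dW2 <- t(A1) %*% dZ2 / m + (lambda / m) * W2
    db2 <- mean(dZ2)
    dZ1 <- (dZ2 %*% t(W2)) * A1 * (1 - A1)
    dW1 <- t(X) %*% dZ1 / m + (lambda / m) * W1
    db1 <- colMeans(dZ1)
    # Step 5: repeat the gradient-descent updates
    W1 <- W1 - lr * dW1; b1 <- b1 - lr * db1
    W2 <- W2 - lr * dW2; b2 <- b2 - lr * db2
  }
  list(W1 = W1, b1 = b1, W2 = W2, b2 = b2, cost = cost)
}
```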
Since the microbiome simulation scenario is somewhat complex and requires many steps to generate simulated microbiome data, we simply cite the references we followed [40,42,43]. The method used to deal with zeros in compositional data is an important step in the preparation of the simulated microbiome dataset [44,45,46,47]. We used the "zCompositions" package in R 4.1.1 to solve the zero problem.
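For concreteness, a minimal sketch of the zero-replacement step with the zCompositions package is given below; the GBM (Geometric Bayesian-multiplicative) method and the proportion output are illustrative choices, since the exact settings used in the paper are not reported here.

```r
# Zero replacement for compositional count data with zCompositions.
library(zCompositions)

set.seed(1)
counts <- matrix(rpois(20 * 50, lambda = 2), nrow = 20)  # toy OTU counts
colnames(counts) <- paste0("OTU", seq_len(ncol(counts)))

# Bayesian-multiplicative replacement of count zeros, returning
# strictly positive compositions that sum to 1 within each sample.
comp <- cmultRepl(counts, label = 0, method = "GBM", output = "prop")
```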
We could generate a “compositional high-dimensional simulated microbiome dataset” with the property of a normal distribution. However, the actual microbiome dataset has no response or “y” variable. So, based on the nature of microbiome datasets, we can extract the response variable from the data itself (extracting a hidden response variable) [38,39]. In clinical studies, the natural effect of the microbiome can usually determine opposite cases, such as high/low or healthy/diseased; hence, we defined two clusters for them. We applied the “Ward-D2” clustering algorithm to the incomplete real microbiome dataset and binned the results of the clustering (0/1) as a dependent variable, “y”, added to the dataset (we used the same process for generating the simulated one). In this study, in order to evaluate the developed algorithm, a simulated dataset with 200 observations and 4000 dimensions (explanatory variables) was generated. The simulated dataset was organized to have binary response variables (y = 0 or y = 1). With regard to the nature of microbiome data, we now put all of the aforementioned strategies into practice to generate compositional, high-dimensional data.
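The response-extraction step described above can be sketched in R as follows (toy compositional data; the clr transform is used here as the Aitchison-type move from the simplex to Euclidean space).

```r
clr <- function(x) log(x) - rowMeans(log(x))    # centred log-ratio transform

set.seed(1)
comp <- matrix(rgamma(20 * 50, shape = 1), nrow = 20)
comp <- comp / rowSums(comp)                    # toy compositional data (rows sum to 1)

z  <- clr(as.matrix(comp))                      # move from simplex to Euclidean space
hc <- hclust(dist(z), method = "ward.D2")       # Ward-D2 hierarchical clustering
y  <- cutree(hc, k = 2) - 1                     # two clusters, recoded as a 0/1 response
table(y)
```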
The simulation has 200 (150 + 50) observations for 4000 dimensions, which is carried out as below:
The first class of the second simulated dataset has 150 observations for 4000 dimensions, with its variables generated from three groups of Poisson-based correlation structures.
The second class of the second simulated dataset includes 50 observations for 4000 dimensions, with its variables likewise generated from three groups of Poisson-based correlation structures.
Every Poisson distribution parameter, $\lambda$, has been assigned randomly. The Aitchison transformation method is then used to normalize each of these non-normal simulated datasets. After that, the simulated data were fed into the general and penalized DNNs using the methodology outlined for the development of the penalized DNN models. Namely, five DNN models were generated: the general DNN and the ridge-, Lasso-, elastic-net-, and bridge-penalized DNNs. This is repeated on 30 training and testing datasets. All penalized DNNs are evaluated by taking the mean of the following performance metrics: prediction accuracy and sensitivity, together with their confidence intervals (see Table 1 and Table 2).
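A rough, self-contained R sketch of this simulation recipe is shown below; the Poisson parameter ranges and the simple (uncorrelated) count generation are our assumptions, since the exact correlation structures are not reproduced here.

```r
set.seed(1)
p <- 4000
lambda1 <- runif(p, 1, 5)        # randomly assigned Poisson parameters, first class
lambda2 <- runif(p, 1, 5)        # randomly assigned Poisson parameters, second class

class1 <- t(replicate(150, rpois(p, lambda1)))   # 150 observations
class2 <- t(replicate(50,  rpois(p, lambda2)))   #  50 observations
counts <- rbind(class1, class2) + 1              # +1 avoids log(0) in this toy version

comp <- counts / rowSums(counts)                 # close the data to a composition
z    <- log(comp) - rowMeans(log(comp))          # Aitchison (clr) normalization
y    <- rep(c(0, 1), c(150, 50))                 # binary class labels
```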
3.2. Classification of Simulated Microbiome Data Based on the Elastic-Net Penalization Using DNN
In this sub-section, we classified one kind of compositional high-dimensional simulated data, microbiome data, with 200 observations and 4000 dimensions. In general, Figure 4 and Figure 5 show the data-transferring process inside the deep neural network algorithm in use, as well as the fundamental template for this algorithm. We can state concisely that the simulated OTU data are first placed in the input vector; after applying numerous operations to the data, the results are eventually extracted. The results of the different special cases of the elastic-net, namely the ridge ($L_2$), Lasso ($L_1$), elastic-net, and bridge penalties, in our developed DNN are illustrated in Table 1 (also see [12]). The general DNN algorithm is developed based on the different regularization methods; hence, for each desired model, the related penalized hyperparameters, such as $\lambda$, $\alpha$, and $\gamma$, are added to its penalty function. Therefore, as with parametric regularization methods for classification, the different values of the hyperparameters have their own effect on the prediction accuracy of the penalized DNN as a non-parametric classification method.
Now, with respect to the previous paragraph, we can explain Table 1 more clearly. As observed in Table 1, aside from the general DNN, there are four penalized DNN models, namely the ridge-, Lasso-, elastic-net-, and bridge-penalized DNNs. Additionally, we employed GUIDE as a second non-parametric binary classifier in order to have a strong rival when comparing its outcomes to those of the general and penalized binary DNNs. In order to perform these methods, we first applied all available DNNs to the entire simulated dataset. Next, we divided the simulated dataset into training and testing datasets. Prediction accuracy and model sensitivity are the evaluation criteria for the proposed approaches. In Table 1, besides the general DNN model, the prediction accuracy, sensitivity, and value of the cost function for the whole, training, and testing simulated datasets are listed for each type of penalized DNN model. As seen in Table 1, across all training and testing simulated datasets, the overall prediction accuracy of the penalized DNNs is higher than that of the general DNN model.
The best-performing penalized DNN has higher prediction accuracy and sensitivity than the general DNN and the other penalized DNNs on the whole dataset. As observed in Table 1, the remaining penalized DNNs follow in the order of prediction accuracy and sensitivity reported there.
In the training simulated OTU dataset, again, the prediction accuracy and sensitivity of the best penalized model are larger than those of the general and other penalized DNN models. The general DNN has the lowest prediction accuracy among all models for the training dataset. In contrast to the whole and testing datasets, in the training dataset the values of prediction accuracy and sensitivity of the DNNs decrease slightly, but the ordering of the models is the same as for the whole dataset.
In the testing simulated OTU dataset, the trend of the prediction accuracy and sensitivity is not exactly the same as their ordering in the whole and training datasets; some penalized models overtake others. One reason might be the compositional type of the dataset (OTU data), and another reason might be the reduced number of observations in the testing dataset. The best penalized model again has the highest values of prediction accuracy and sensitivity. As observed, the general DNN algorithm has the lowest prediction accuracy and sensitivity for the testing simulated OTU data in comparison with the penalized DNN models.
The last column of Table 1 shows the value of the cost function for both the general and penalized DNN algorithms. We see that the values of prediction accuracy and sensitivity can be larger, but at the expense of increasing the cost function in the algorithm. So, during the implementation process, if the cost function is not too large, then to increase the model accuracy and sensitivity (and possibly other desired criteria), all of the regularization processes in the penalized DNN models can be employed. The results of implementing the GUIDE method on the same simulated OTU data (whole, training, and testing) are shown in Table 1 as well. As seen, there is stiff competition between the DNN algorithms and GUIDE based on the different criteria. It is worth noting that GUIDE has a larger prediction accuracy than the general DNN on the whole and training datasets. However, the prediction accuracy of all penalized DNNs is higher than that of GUIDE, although the sensitivity of GUIDE in the whole and training datasets is larger than that of some of the penalized DNN algorithms. The prediction accuracy and sensitivity of GUIDE are less than those of the DNNs on the testing datasets.
3.3. Classification of Simulated Microbiome Data with GUIDE
GUIDE is an algorithm for the construction of classification and regression trees and forests [48,49]. As we know, in a linear model the predictors, $x$, are combined in a linear scheme to represent their effect on the response. Sometimes greater flexibility is needed, since this linearity is insufficient to reflect the structure of the data. Because they combine the predictors in a non-parametric way, models like additive models, trees, and neural networks can fit a more flexible model of the response on the predictors than linear techniques [48]. In this study, we applied the GUIDE algorithm to the identical simulated and real microbiome datasets in order to have a strong competitor for our proposed non-parametric method, i.e., the penalized DNN models. Figure 6 shows how the GUIDE algorithm is applied to the whole, training, and testing datasets of simulated microbiome data, which are indexed by (a), (b), and (c), respectively. The number of dimensions is 4000, and the numbers of observations for the whole, training, and testing datasets are 200, 140, and 60, respectively.
Based on the results of Table 1, it can be seen that there is stiff competition in prediction accuracy and sensitivity between the developed penalized DNN models. In the implementation of the penalized DNNs on the whole simulated dataset, the best penalized model has the highest prediction accuracy, whereas the same implementation using GUIDE on the whole dataset gives a lower prediction accuracy. The same trend is observed for the training and testing simulated OTU datasets in both approaches, the penalized DNNs and GUIDE. For instance, the best penalized model is more accurate than the other penalized DNN models. As seen in Table 1, all the penalized DNN models have larger accuracy than the general DNN model.
We now deal with the GUIDE results for these simulated OTU data. For the whole, training, and testing simulated OTU datasets, Figure 6 shows three possible OTU splittings by GUIDE. Note that GUIDE applied pruning by k-fold cross-validation, and the selected tree is based on the mean of the CV estimates (the results of the splitting are summarized in Table 1). As can be seen, the prediction accuracy of GUIDE for the whole simulated dataset is larger than for its training and testing subsets. It is important to notice that the GUIDE results fall between the general DNN and the penalized DNN models. An interesting detail in Figure 6c is that the splitting of the testing dataset starts from a different OTU than the splitting for the whole simulated dataset. The reason for this change in the starting split variable might be the smaller sample size of the simulated testing dataset (see Figure 6a–c).
3.4. Classification of Real Microbiome Data Based on the Elastic-Net Penalization Using DNN
The performance of the applied penalized DNN models is demonstrated in this section using an actual compositional high-dimensional dataset, microbiome (OTU) data (the source of the data can be found in [50]). A classification with the GUIDE method is then run on this dataset as a comparison. The applied OTU dataset includes 675 samples and 6696 different OTUs as predictors, which is considered a compositional ultra-high-dimensional case. As explained in Section 2, the development of the DNN algorithm based on a new penalization equation (the elastic-net equation) enables us to apply the OTU data through different rival penalized DNN models. Because there is no response variable in this OTU dataset, it must be created. We solve this early-stage challenge using the OTU distance-measurement attribute described in Section 3.1 (also see [23,24]). In terms of the geometry of the OTU data, we must first move the data from the simplex space to Euclidean space; after that, we cluster the data into two sub-clusters. Finally, we can specify the response variable with regard to which of the two clusters each sample belongs. We can now test the binary classification of our built models using a new dataset that has binary responses. In summary, we processed an ultra-high-dimensional compositional dataset using a variety of statistical techniques in order to create a binary response variable, following Section 3.1.
As mentioned, the focus of this research is the extension of the DNN using the "elastic-net" approach. Therefore, to display the properties of the various elastic-net-type penalties, such as the ridge, Lasso, elastic-net, and bridge, different hyperparameters corresponding to the type of penalized DNN must be included in the DNN algorithm. Subsequently, the real OTU dataset is subjected to the expanded DNN algorithms. Additionally, a comparison is made with GUIDE. The evaluation results of the various penalized DNN models are displayed in Table 2.
Table 2 shows that the general and penalized DNN models have been applied to the whole, training, and testing datasets. The whole OTU dataset is divided into two parts, one for training and one for testing. The results for prediction accuracy and sensitivity in Table 2 for the whole dataset are broadly similar to Table 1. Regarding Table 2, all of the penalized models plus the general DNN have the same prediction accuracy and sensitivity, with the exception of the two best DNN models for prediction accuracy and sensitivity, respectively. In terms of prediction accuracy and sensitivity, the penalized DNN models show an improvement over the general DNN model in the training dataset. Although the results of the penalized DNN models for the training dataset demonstrate a fierce rivalry amongst them, one model predicts outcomes with the highest prediction accuracy and sensitivity when compared with the other regularized models. Regarding sensitivity and prediction accuracy, Table 2 shows that all penalized DNN models produce better results for the testing dataset in comparison with the general DNN. As seen in Table 2, their performances are significantly superior to those of the general DNN model in both prediction accuracy and model sensitivity. The last column in Table 2 displays the cost function for both the general and penalized DNN models on the real OTU data, alongside the prediction accuracy and sensitivity (similar to the simulation study in Section 3.2).
The outcomes of the developed penalized DNN models were compared to the outcome of applying the same dataset to the classification tree with the GUIDE method for evaluation purposes. The comparison results can be explained in two parts: the first part is (GUIDE vs. the general DNN), and the second part is (GUIDE vs. the penalized DNNs). In the first part, for the whole dataset, the general DNN outperforms GUIDE in terms of prediction accuracy, while GUIDE has greater sensitivity. For the training and testing datasets, the same trend is observable, although with different percentages (see Table 2). In the second part, the penalized DNN methods make better predictions than the classification tree with GUIDE. However, GUIDE has slightly better sensitivity in the training and testing datasets (see Table 2).
3.5. Classification of Real Microbiome Data with GUIDE
In this section, we first go over how the data are prepared for the classification tree with the GUIDE method, and then we deal with the outcomes of the method. Preparing data for the GUIDE method is a challenging problem, particularly when dealing with high- and ultra-high-dimensional datasets. In other words, making and preparing the appropriate datasets requires extreme precision when writing code in R, followed by proper code writing for the GUIDE method itself. Code writing becomes increasingly difficult as compositional high-dimensional collections become more complicated.
In order to achieve the best binary classification, Figure 7 illustrates which OTUs are highly significant and should be proposed for splitting the compositional high-dimensional dataset. The OTU indexed by 4154 is the best split variable for binary classification among the 6996 OTUs. Furthermore, it can be observed from the GUIDE classification tree outcomes for the actual dataset in Table 2 that this non-parametric approach can effectively rival the general DNN and its developed penalized versions. Put differently, the classification tree with GUIDE can be viewed as an assessment tool and a formidable competitor to our proposed methods. Using the general DNN technique and the GUIDE findings in Table 2, we find that they are very competitive with each other for OTU dataset classification, but not with the developed penalized DNNs. Stated differently, the two non-parametric approaches (the general DNN and GUIDE) produce nearly identical results for both the simulated and actual OTU datasets, which means that their abilities for classifying OTU datasets are close to each other.
4. Conclusions and Future Works
In this paper, we developed a deep neural network method with respect to the notion of shrinkage GLM penalization strategies. Also, the ratio of the $L_1$ and $L_2$ norms has been used empirically as a criterion for combining $L_1$ and $L_2$ in order to extend the concept and application of DNNs. The elastic-net penalty, as a linear combination of the ridge and Lasso penalties (and, more generally, of the ridge and bridge penalties), is a shrinkage regularization and penalization method. The curse of dimensionality and multicollinearity in large-scale datasets can be effectively addressed with the elastic-net technique. After understanding the idea behind the elastic-net technique, we applied it to the determinative core of the deep neural network algorithm. Therefore, we increased the performance of the DNN algorithm in comparison with its general (classical) case. This improvement was confirmed through a simulation study and a real microbiome dataset via a non-parametric shrinkage penalized classification approach (shrinkage penalized DNN classification).
Furthermore, the outcomes of using a classification tree with the “GUIDE” method on the same simulated and real microbiome data are very competitive. The outcomes of a competition between two distinct non-parametric techniques (general and developed shrinkage penalized DNN models vs. the GUIDE method) show that the GUIDE method is extremely competitive with the general DNN. Still, as compared to the GUIDE, the classification results favor the developed shrinkage penalized DNN models. As a result, the researchers may choose a non-parametric scheme for their model building and prediction based on factors including time, budget, and required accuracy.
Many research directions can be proposed in relation to the conducted research. In order to generalize the findings and outcomes, more simulation and real-world studies on larger datasets (both including and excluding compositional high-dimensional datasets) would be useful. Working on functions of the hyperparameters of the shrinkage penalized methods in order to extend them in the DNN algorithm can be considered an interesting new line of research (the shrinkage regularization process can include the generalization of hyperparameters), particularly developing the DNN algorithm's theoretical foundations first and then its implementation. Also, further investigation into the implementation of the blended penalized models in comparison with the $L_1$ and $L_2$ penalties, with an eye toward (compositional) high-dimensional datasets, would be a worthwhile avenue for future research. Finally, examining the variation in the cost functions across all versions of the developed DNN models could be a future study focus.
Conceptualization, R.M.Y.; Methodology, M.B. and N.A.H.; Software, M.B.; Validation, M.R. and N.A.H.; Formal analysis, M.B. and R.M.Y.; Investigation, M.B., S.B.M. and N.A.H.; Resources, N.A.H.; Data curation, M.B.; Writing—original draft, M.B.; Writing—review & editing, M.B., S.B.M., M.R. and N.A.H.; Visualization, M.R.; Supervision, S.B.M. and R.M.Y.; Project administration, N.A.H.; Funding acquisition, S.B.M., M.R. and N.A.H. All authors have read and agreed to the published version of the manuscript.
Data is contained within the article. Further inquiries can be directed to the corresponding author.
The authors declare no conflicts of interest.
Figure 1 Forwarding process: (a) Perceptron, (b) NN, (c) DNN, and (d) DNN calculations.
Figure 2 Two-dimensional plots. The blue line displays the behavior of the ridge penalty. The red line displays the behavior of the Lasso penalty. The combination of these two lines provides the basis for the elastic-net penalty. In other words, the ratio of the effects of the ridge and the Lasso (a combination of the effects) introduces an elastic-net penalty. The Lasso or the
Figure 3 Two-dimensional plots. Generalization of the ridge and Lasso penalties. The dotted red line is the Lasso penalty, and the rest are ridge lines whose combination (the effect of their ratio) introduces the ridge&bridge penalty.
Figure 4 Illustration of general structure of shrinkage penalized deep neural network. Regarding the implementation of the elastic-net penalty function, each layer receives its own penalty individually for the node nullifying and shrinkage process (OTU (Operational Taxonomic Unit) data is a fundamental way to represent microbiome data). Note that, because of implementing penalized functions in the structure of the DNN, the number of each selected OTU in each layer may differ from the other layer. So as seen, in the input layer, the number of OTUs is “n”; in the first layer, the number of selected OTUs after the penalization process is “
Figure 5 General presentation of a developed penalized DNN model that is used to classify microbiome data.
Figure 6 OTU (Operational Taxonomic Unit) data is a fundamental way to represent microbiome data. (a) Classification tree for predicting OTU-Class using estimated priors and unit misclassification costs. Tree constructed with 200 observations. The maximum number of split levels is 10 and the minimum node sample size is 5. At each split, an observation goes to the left branch if and only if the condition is satisfied. Predicted classes and sample sizes printed below terminal nodes; class sample proportions for OTU = 0 and 1 beside nodes. The second-best split variable at root node is
Figure 7 Classification tree by “GUIDE” for predicting “OTUs” using estimated priors and unit mis-classification costs. OTU (Operational Taxonomic Unit) data is a fundamental way to represent microbiome data. The tree is constructed with 675 observations. The maximum number of split levels is equal to 10 and the minimum node sample size is equal to 6. At each split, an observation goes to the left branch if and only if the condition is satisfied. Predicted classes and sample sizes printed below terminal nodes; class sample proportions for “Classes = 1 and 2” beside nodes. The second-best split variable at root node is
Table 1 Simulated compositional high-dimensional data.
| Type of model | Prediction accuracy (whole) | Sensitivity (whole) | Prediction accuracy (training) | Sensitivity (training) | Prediction accuracy (testing) | Sensitivity (testing) |
|---|---|---|---|---|---|---|
| | 80 | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | 82 | | 80 | | | |
| | | | | | | |
| | | | 82 | | | |
| | | | | | | |
| | 84 | | | | | |
| | | | | | | |
| GUIDE | 81 | 82 | | | | |
| | | | | | |
The performance of the
Table 2 Real compositional high-dimensional data.
| Type of model | Prediction accuracy (whole) | Sensitivity (whole) | Prediction accuracy (training) | Sensitivity (training) | Prediction accuracy (testing) | Sensitivity (testing) |
|---|---|---|---|---|---|---|
| | | | | | | |
| | | | | |||
| | | | | | | |
| | | | | |||
| | | | | | | |
| | | | | |||
| | | | | | | |
| | | | | |||
| | | | | | | |
| | | | | |||
| GUIDE | | | | | | |
| | | | |
The performance of the
Appendix A
Developing the NN Algorithm into DNN Algorithms
Here, we focus on the back-propagation process to demonstrate how to generalize the related penalty equations of
It is necessary to take into account that our direct development is dependent on two steps of the back-propagation equation when creating the regularization of DNN. So, in light of the equations for back-propagation, changes for development will be made. For instance, in a DNN structure (see
As can be found, we have to derive the terms
Figure A1 A deep neural network with two hidden layers. The first layer is the input data, and the last layer predicts the 2 response variables. The last node in each input layer (+1) represents the bias term. Here the number of layers
As we are using the sigmoid activation function, our cost function is given by the following logistic regression cost, or binary cross-entropy, function:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log\hat{y}^{(i)} + \big(1 - y^{(i)}\big)\log\big(1 - \hat{y}^{(i)}\big)\Big].$$
Now, with respect to Equations (
Now, for taking the derivative of
So, if we write
Finally, just the last term is left,
Now, with respect to Equations (
Similarly, we can find
The first two terms are nothing but
For simplicity, we write
Now we move to the “next layer” and calculate
So, to complete it, we are implementing the chain rule in this way:
As it was calculated before, the first two terms are
Likewise, before and with respect to equation of
Now, it is time to move another layer backward and calculate
Now, with respect to Equation (
Again, the under-brace term in equation above can be called
Up to now, because of space restrictions, we have tried to show how to extend a DNN algorithm with two hidden layers.
In the following, as an example of the extension of hidden layers in the DNN algorithm, we show the structure of its extension for seven hidden layers. So, with respect to the dimensions, the related chains for seven hidden layers are as below:
It is worth noting that, when working with high-dimensional datasets that require more hidden layers, there are numerous significant hardware limitations during the implementation process, in addition to the complexity of the programming and calculations. As a result, the DNN algorithm in this research was restricted to three hidden layers.
As it was mentioned before, we extract the basics of the development of DNN algorithms based on two hidden layers. If there are multiple numbers of hidden layers, then we just need to continue repeating these processes. Only the
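To close the appendix, the following self-contained R sketch implements the two-hidden-layer case discussed above: forward propagation with sigmoid activations, a binary cross-entropy cost, backward propagation by the chain rule, and an elastic-net gradient term added to each layer's weight gradient. The layer sizes, learning rate, and penalty settings are illustrative assumptions, not the configurations used in the paper.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

set.seed(1)
m <- 100; p <- 20
X <- matrix(rnorm(m * p), m, p)
y <- rbinom(m, 1, plogis(X[, 1] - X[, 2]))

h1 <- 8; h2 <- 4; lambda <- 0.01; alpha <- 0.5; lr <- 0.5
W1 <- matrix(rnorm(p * h1, sd = 0.1), p, h1);  b1 <- rep(0, h1)
W2 <- matrix(rnorm(h1 * h2, sd = 0.1), h1, h2); b2 <- rep(0, h2)
W3 <- matrix(rnorm(h2, sd = 0.1), h2, 1);       b3 <- 0

enet_grad <- function(W) (lambda / m) * (alpha * sign(W) + (1 - alpha) * W)

for (it in 1:2000) {
  ## forward propagation through the two hidden layers
  A1 <- sigmoid(sweep(X  %*% W1, 2, b1, "+"))
  A2 <- sigmoid(sweep(A1 %*% W2, 2, b2, "+"))
  y_hat <- as.vector(sigmoid(A2 %*% W3 + b3))

  ## backward propagation (chain rule, layer by layer), with the
  ## elastic-net gradient term added to each dW
  dZ3 <- matrix(y_hat - y, ncol = 1)
  dW3 <- t(A2) %*% dZ3 / m + enet_grad(W3); db3 <- mean(dZ3)
  dZ2 <- (dZ3 %*% t(W3)) * A2 * (1 - A2)
  dW2 <- t(A1) %*% dZ2 / m + enet_grad(W2); db2 <- colMeans(dZ2)
  dZ1 <- (dZ2 %*% t(W2)) * A1 * (1 - A1)
  dW1 <- t(X)  %*% dZ1 / m + enet_grad(W1); db1 <- colMeans(dZ1)

  ## gradient-descent updates
  W3 <- W3 - lr * dW3; b3 <- b3 - lr * db3
  W2 <- W2 - lr * dW2; b2 <- b2 - lr * db2
  W1 <- W1 - lr * dW1; b1 <- b1 - lr * db1
}
mean((y_hat > 0.5) == y)   # training accuracy of the toy example
```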
1. Faraway, J.J. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models; Chapman and Hall: New York, NY, USA, 2016.
2. Penrose, R. The emperor’s new mind: Concerning computers, minds, and the laws of physics. Behav. Brain Sci.; 1990; 13, pp. 643-655. [DOI: https://dx.doi.org/10.1017/S0140525X00080675]
3. Ciaburro, G.; Venkateswaran, B. Neural Networks with R: Smart Models Using CNN, RNN, Deep Learning, and Artificial Intelligence Principles; Packt Publishing Ltd.: Birmingham, UK, 2017.
4. Liu, Y.H.; Maldonado, P. R Deep Learning Projects: Master the Techniques to Design and Develop Neural Network Models in R; Packt Publishing Ltd.: Birmingham, UK, 2018.
5. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw.; 1989; 2, pp. 359-366. [DOI: https://dx.doi.org/10.1016/0893-6080(89)90020-8]
6. Roozbeh, M.; Maanavi, M.; Mohamed, N.A. Penalized least squares optimization problem for high-dimensional data. Int. J. Nonlinear Anal. Appl.; 2023; 14, pp. 245-250.
7. De-Julián-Ortiz, J.V.; Pogliani, L.; Besalú, E. Modeling properties with artificial neural networks and multilinear least-squares regression: Advantages and drawbacks of the two methods. Appl. Sci.; 2018; 8, 1094. [DOI: https://dx.doi.org/10.3390/app8071094]
8. Hammerstrom, D. Working with neural networks. IEEE Spectr.; 1993; 30, pp. 46-53. [DOI: https://dx.doi.org/10.1109/6.222230]
9. Salehin, I.; Kang, D.K. A review on dropout regularization approaches for deep neural networks within the scholarly domain. Electronics; 2023; 12, 3106. [DOI: https://dx.doi.org/10.3390/electronics12143106]
10. Hertz, J.A. Introduction to the Theory of Neural Computation; Chapman and Hall: New York, NY, USA, 2018.
11. Fan, J.; Li, R. Statistical challenges with high dimensionality: Feature selection in knowledge discovery. arXiv; 2006; arXiv: math/0602133
12. Huang, J.; Breheny, P.; Lee, S.; Ma, S.; Zhang, C. The Mnet method for variable selection. Stat. Sin.; 2016; 26, pp. 903-923. [DOI: https://dx.doi.org/10.5705/ss.202014.0011]
13. Farrell, M.; Liang, T.; Misra, S. Deep neural networks for estimation and inference. Econometrica; 2021; 89, pp. 181-213. [DOI: https://dx.doi.org/10.3982/ECTA16901]
14. Kurisu, D.; Fukami, R.; Koike, Y. Adaptive deep learning for nonlinear time series models. Bernoulli; 2025; 31, pp. 240-270. [DOI: https://dx.doi.org/10.3150/24-BEJ1726]
15. Bach, F. Breaking the curse of dimensionality with convex neural networks. J. Mach. Learn. Res.; 2017; 18, pp. 1-53.
16. Celentano, L.; Basin, M.V. Optimal estimator design for LTI systems with bounded noises, disturbances, and nonlinearities. Circuits Syst. Signal Process.; 2021; 40, pp. 3266-3285. [DOI: https://dx.doi.org/10.1007/s00034-020-01635-z]
17. Liu, F.; Dadi, L.; Cevher, V. Learning with norm constrained, over-parameterized, two-layer neural networks. J. Mach. Learn. Res.; 2024; 25, pp. 1-42.
18. Shrestha, K.; Alsadoon, O.H.; Alsadoon, A.; Rashid, T.A.; Ali, R.S.; Prasad, P.W.C.; Jerew, O.D. A novel solution of an elastic net regularisation for dementia knowledge discovery using deep learning. J. Exp. Theor. Artif. Intell.; 2023; 35, pp. 807-829. [DOI: https://dx.doi.org/10.1080/0952813X.2021.1970237]
19. Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding deep learning requires rethinking generalization. arXiv; 2016; arXiv: 1611.03530[DOI: https://dx.doi.org/10.1145/3446776]
20. Mulenga, M.; Kareem, S.A.; Sabri, A.Q.; Seera, M.; Govind, S.; Samudi, C.; Saharuddin, M.B. Feature extension of gut microbiome data for deep neural network-based colorectal cancer classification. IEEE Access; 2021; 9, pp. 23565-23578. [DOI: https://dx.doi.org/10.1109/ACCESS.2021.3050838]
21. Namkung, J. Machine learning methods for microbiome studies. J. Microbiol.; 2020; 58, pp. 206-216. [DOI: https://dx.doi.org/10.1007/s12275-020-0066-8]
22. Topçuoğlu, B.D.; Lesniak, N.A.; Ruffin, M.T.; Wiens, J.; Schloss, P.D. A framework for effective application of machine learning to microbiome-based classification problems. mBio; 2020; 11, 10-1128. [DOI: https://dx.doi.org/10.1128/mBio.00434-20]
23. Anwar, S.M.; Majid, M.; Qayyum, A.; Awais, M.; Alnowami, M.; Khan, M.K. Medical image analysis using convolutional neural networks: A review. J. Med. Syst.; 2018; 42, 226. [DOI: https://dx.doi.org/10.1007/s10916-018-1088-1]
24. Lo, C.; Marculescu, R. MetaNN: Accurate classification of host phenotypes from metagenomic data using neural networks. BMC Bioinform.; 2019; 20, 314. [DOI: https://dx.doi.org/10.1186/s12859-019-2833-2]
25. Reiman, D.; Metwally, A.; Dai, Y. Using convolutional neural networks to explore the microbiome. Proceedings of the 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); Jeju, Republic of Korea, 11–15 July 2017; pp. 4269-4272. [DOI: https://dx.doi.org/10.1109/EMBC.2017.8037799]
26. Arabameri, A.; Asemani, D.; Teymourpour, P. Detection of colorectal carcinoma based on microbiota analysis using generalized regression neural networks and nonlinear feature selection. IEEE/ACM Trans. Comput. Biol. Bioinform.; 2018; 17, pp. 547-557. [DOI: https://dx.doi.org/10.1109/TCBB.2018.2870124]
27. Fiannaca, A.; La Paglia, L.; La Rosa, M.; Lo Bosco, G.; Renda, G.; Rizzo, R.; Gaglio, S.; Urso, A. Deep learning models for bacteria taxonomic classification of metagenomic data. BMC Bioinform.; 2018; 19, pp. 61-76. [DOI: https://dx.doi.org/10.1186/s12859-018-2182-6]
28. Loh, W.Y. Classification and Regression Trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov.; 2011; 1, pp. 4-23. [DOI: https://dx.doi.org/10.1002/widm.8]
29. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. Stat. Methodol.; 1996; 58, pp. 267-288. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1996.tb02080.x]
30. Segal, M.; Dahlquist, K.; Conklin, B. Regression approach for microarray data analysis. J. Comput. Biol.; 2003; 10, pp. 961-980. [DOI: https://dx.doi.org/10.1089/106652703322756177] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/14980020]
31. Efron, B.; Hastie, T.; Johnstone, I.R. Least angle regression. Ann. Statist.; 2004; 32, pp. 407-499. [DOI: https://dx.doi.org/10.1214/009053604000000067]
32. Yin, P.; Esser, E.; Xin, J. Ratio and difference of L1 and L2 norms and sparse representation with coherent dictionaries. Commun. Inf. Syst.; 2014; 14, pp. 87-109. [DOI: https://dx.doi.org/10.4310/CIS.2014.v14.n2.a2]
33. Frank, L.E.; Friedman, J.H. A statistical view of some chemometrics regression tools. Technometrics; 1993; 35, pp. 109-135. [DOI: https://dx.doi.org/10.1080/00401706.1993.10485033]
34. Fan, J.; Lv, J. A selective overview of variable selection in high dimensional feature space. Stat. Sin.; 2010; 20, 101.
35. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. Stat. Methodol.; 2005; 67, pp. 301-320. [DOI: https://dx.doi.org/10.1111/j.1467-9868.2005.00503.x]
36. Tyler, A.D.; Smith, M.I.; Silverberg, M.S. Analyzing the human microbiome: A “How To” guide for physicians. Am. J. Gastroenterol.; 2014; 109, pp. 983-993. [DOI: https://dx.doi.org/10.1038/ajg.2014.73]
37. Sender, R.; Fuchs, S.; Milo, R. Revised estimates for the number of human and bacteria cells in the body. PLoS Biol.; 2016; 14, e1002533. [DOI: https://dx.doi.org/10.1371/journal.pbio.1002533] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27541692]
38. Bharti, R.; Grimm, D.G. Current challenges and best-practice protocols for microbiome analysis. Briefings Bioinform.; 2021; 22, pp. 178-193. [DOI: https://dx.doi.org/10.1093/bib/bbz155]
39. Qian, X.B.; Chen, T.; Xu, Y.P.; Chen, L.; Sun, F.X.; Lu, M.P.; Liu, Y.X. A guide to human microbiome research: Study design, sample collection, and bioinformatics analysis. Chin. Med. J.; 2020; 133, pp. 1844-1855. [DOI: https://dx.doi.org/10.1097/CM9.0000000000000871]
40. Xia, Y.; Sun, J.; Chen, D.G. Statistical Analysis of Microbiome Data with R; Springer: Singapore, 2018.
41. Wang, Q.Q.; Yu, S.C.; Qi, X.; Hu, Y.; Zheng, W.J.; Shi, J.X.; Yao, H.Y. Overview of logistic regression model analysis and application. Chin. J. Prev. Med.; 2019; 53, pp. 955-960.
42. Aitchison, J. The statistical analysis of compositional data. J. R. Stat. Soc. Ser. Methodol.; 1982; 44, pp. 139-160. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1982.tb01195.x]
43. Aitchison, J.; Barceló-Vidal, C.; Martín-Fernández, J.A.; Pawlowsky-Glahn, V. Logratio Analysis and Compositional Distance. Math. Geol.; 2000; 32, pp. 271-275. [DOI: https://dx.doi.org/10.1023/A:1007529726302]
44. Martín-Fernández, J.A.; Barceló-Vidal, C.; Pawlowsky-Glahn, V. Dealing with zeros and missing values in compositional datasets using nonparametric imputation. Math. Geol.; 2003; 35, pp. 253-278. [DOI: https://dx.doi.org/10.1023/A:1023866030544]
45. Martín-Fernández, J.A.; Palarea-Albaladejo, J.; Olea, R.A. Dealing with Zeros, Compositional Data Analysis: Theory and Applications; John Wiley and Sons: New York, NY, USA, 2011.
46. Martín-Fernández, J.A.; Hron, K.; Templ, M.; Filzmoser, P.; Palarea-Albaladejo, J. Model-based replacement of rounded zeros in compositional data: Classical and robust approaches. Comput. Stat. Data Anal.; 2012; 56, pp. 2688-2704. [DOI: https://dx.doi.org/10.1016/j.csda.2012.02.012]
47. Martín-Fernández, J.A.; Hron, K.; Templ, M.; Filzmoser, P.; Palarea-Albaladejo, J. Bayesian-multiplicative treatment of count zeros in compositional datasets. Stat. Model.; 2015; 15, pp. 134-158. [DOI: https://dx.doi.org/10.1177/1471082X14535524]
48. Loh, W.Y. Fifty years of classification and regression trees. Int. Stat. Rev.; 2014; 82, pp. 329-348. [DOI: https://dx.doi.org/10.1111/insr.12016]
49. Loh, W.Y.; Zhou, P. The GUIDE Approach to Subgroup Identification, Design and Analysis of Subgroups with Biopharmaceutical Applications; Springer: Cham, Switzerland, 2020; pp. 147-165.
50. Turnbaugh, P.J.; Ridaura, V.K.; Faith, J.J.; Rey, F.E.; Knight, R.; Gordon, J. The effect of diet on the human gut microbiome: A metagenomic analysis in humanized gnotobiotic mice. Sci. Transl. Med.; 2009; 1, 6ra14. [DOI: https://dx.doi.org/10.1126/scitranslmed.3000322] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20368178]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).