1. Introduction
Semantic segmentation involves labeling image pixels to distinguish between objects of interest and the image background; see Figure 1. This task is challenging since the analyzed images present variations that can add noise to the segmentation process.
Image segmentation has numerous applications in computer vision, including analyzing medical images for diagnostic purposes [1,2]. Many methodologies have been devised to address this issue [3]; however, contemporary research has demonstrated a marked predilection for deploying Convolutional Neural Networks (CNNs) [4]. This is primarily because CNNs have been shown to yield optimal performance across a diverse array of applications. Nevertheless, CNNs are often characterized as “black boxes” due to the complexity of their internal processes, making it challenging to comprehend their decision-making processes [5]. This limitation is particularly problematic in domains where explainability is paramount, such as medicine.
Convolutional Decision Trees (CDTs) [6] present a viable alternative to CNNs for image segmentation. This method involves the construction of a multivariate decision tree, where a convolution kernel defines each condition on the tree nodes. These kernels are learned in a supervised manner by solving an optimization problem that aims to maximize the classification accuracy of the pixels in the segmentation of grayscale images.
Several techniques have been proposed to induce CDTs, considering different approaches to maximize the accuracy in classifying the image pixels for the segmentation task. The original approach uses an analytical process to maximize the information gain function on each node of the tree [6]. Consequently, it is a local greedy search that partitions the data (pixels) on each node of the CDT, but it can result in overfitting during the learning process. To conduct a global search approach, in [7], the capabilities of the differential evolution (DE) algorithm are analyzed to identify competitive convolution kernels to induce CDTs that maximize the F1-score in the classification task. The DE algorithm is a metaheuristic search strategy to solve optimization problems by exploring the problem domain with several parameters and stochastic elements that must be adapted to the specific problem [8]. In [7], each individual in the DE algorithm represents all the kernels of a CDT, so a set of CDTs are obtained to select the one with the best F1-score.
Both techniques consider CDTs with kernels of the same size in the tree’s nodes. In [9], a local search is performed using the DE algorithm to identify the optimal convolution kernel size and the convolution kernel that maximizes the F1-score on each node of the CDT. So, this technique obtains a unique CDT with convolutional kernels of different sizes.
Since the DE algorithm requires several parameters that must be adapted to the specific dataset (the images in the training set) employed to induce a CDT, in [10], the SHADE algorithm is used instead of the traditional DE algorithm. SHADE saves the values of the successful parameters that guide the optimization process in a historical memory H, so the best values are considered during the DE process. This approach facilitates comparable or superior results without requiring the user to establish two parameters (the scale factor and the crossover rate) of the evolutionary process.
It is important to note that all the aforementioned methods require substantial computational time and memory during the induction process. This is due to the need to classify each image pixel in the training set to evaluate the fitness function. This classification is essential regardless of the method employed, be it an analytical approach or a search with differential evolution. Additionally, the nature of the search, whether local or global, influences the evaluation process: for local searches, evaluations are conducted for each node kernel, while, for global searches, evaluations are performed on entire candidate CDTs.
For this reason, in [11], two techniques for selecting a representative sample of pixels from an image for the model’s training process are proposed: raw selection and median selection. The results in the cited work demonstrate that these techniques reduce the computational time required for CDT induction while maintaining or enhancing the precision of the resulting segmentation.
Despite the advancements mentioned above, previous studies have indicated that numerous segmentation databases comprise color images, necessitating their conversion to grayscale to execute CDT induction processes. This conversion procedure often results in the loss of critical information. Consequently, this work proposes a comprehensive enhancement to the induction process of a CDT, considering the use of color images within the learning process and applying the median selection technique to reduce the computational cost that this generalization may entail.
So, the primary objective of this project is to implement a color image segmentation algorithm based on CDTs using the SHADE algorithm, operating under the hypothesis that it is possible to characterize the kernels corresponding to the CDT with the SHADE process by considering chromosomes as vectors that represent the three channels of the convolution kernels of the CDT nodes. The novel aspect of this project lies in its potential to broaden the scope of knowledge about semantic segmentation and CDT-based learning, by leveraging color images and three-channel tree structures, in contrast to the prevailing practice of grayscale image utilization.
The performance of the proposed model was evaluated using the “Weizmann Horse dataset” and “Digital Retinal Images for Vessel Extraction” (DRIVE) dataset, resulting in favorable outcomes regarding the segmentation performance of the induced CDTs, and a significant reduction in the processing time and memory required for the induction process when compared with previous works [6,7,9,10,11]. The findings of this study underscore the significance of incorporating color characterization into the segmentation process for CDTs. Moreover, the proposal facilitates the comprehension of the methodology employed for image pixel classification given the intuitive nature of the graphical representation of decision trees compared to the approach used by CNNs.
The remainder of the paper is structured into four sections. Section 2 describes the DE algorithm and highlights the most relevant characteristics of Convolutional Decision Trees. This section also presents the details of the median selection technique employed to reduce computational time and memory, as well as the proposal for the induction of CDTs with color images. Section 3 is dedicated to the experiments and the results obtained. Finally, Sections 4 and 5 offer a detailed discussion, the conclusions, and future work.
2. Materials and Methods
In this section, the main subjects of the project are presented: the differential evolution algorithm and SHADE. Also, a description of the Convolutional Decision Trees (CDTs) and the methodology proposed for CDT induction using SHADE on color images is presented.
2.1. Differential Evolution Algorithm and SHADE
The Differential Evolution (DE) algorithm is a stochastic search strategy for optimization problems [12]. In the DE process, a competitive solution for a given fitness function is found by perturbing a population of feasible solutions through an iterative process composed of three operators: crossover, mutation, and selection. There are several DE strategies [8,13], but the DE/rand/1/bin strategy is described in detail next.
For a given fitness function f, a population of feasible solutions (individuals) is randomly generated. Then, a mutant vector v_i is obtained for each target vector x_i in the population with the mutation operator described in Equation (1), where x_{r_1}, x_{r_2}, and x_{r_3} are distinct individuals randomly selected from the population, and F is the scale factor defined by the user.

$$ v_i = x_{r_1} + F \cdot \left( x_{r_2} - x_{r_3} \right) \qquad (1) $$
The mutant vector v_i and its corresponding target vector x_i are used as parents to generate a trial vector u_i, called the offspring, by selecting between the parents' coordinates with the process described in Equation (2). If a random number rand_j is less than a crossover rate CR, defined by the user, or the coordinate position j corresponds to one previously determined by chance (j = j_rand), the component u_{i,j} takes the value from v_{i,j}. Otherwise, it takes the value from x_{i,j}.

$$ u_{i,j} = \begin{cases} v_{i,j}, & \text{if } rand_j < CR \ \text{ or } \ j = j_{rand} \\ x_{i,j}, & \text{otherwise} \end{cases} \qquad (2) $$
Once the trial vector is generated, a binary tournament between it and the corresponding target vector is performed to determine what vector survives to the next generation, selecting the one with the best fitness. If the trial vector has the best fitness, it replaces the corresponding target vector in the next generation. Otherwise, the target vector remains in the population, and the trial vector is discarded.
This process is performed $G_{max}$ times (generations), so the population improves or remains the same with respect to fitness, but never worsens; see Figure 2.
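To make the operators concrete, the following Python sketch implements the DE/rand/1/bin loop just described. It is a minimal illustration, not the implementation used in this work: the function and parameter names are ours, and the fixed F and CR values are precisely the parameters that SHADE, discussed below, adapts automatically.

```python
import numpy as np

def de_rand_1_bin(fitness, dim, low, high, NP=50, F=0.5, CR=0.9, G_max=100, seed=0):
    """Minimal DE/rand/1/bin sketch (maximization, as with the F1-score).

    fitness: callable mapping a vector of length dim to a scalar.
    NP, F, CR, G_max: the user-defined parameters described in the text.
    """
    rng = np.random.default_rng(seed)
    pop = rng.uniform(low, high, size=(NP, dim))   # initial random population
    fit = np.array([fitness(x) for x in pop])
    for _ in range(G_max):                         # generations
        for i in range(NP):
            # Mutation, Equation (1): three distinct individuals, none equal to i
            r1, r2, r3 = rng.choice([r for r in range(NP) if r != i], size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])
            # Binomial crossover, Equation (2)
            j_rand = rng.integers(dim)
            mask = rng.random(dim) < CR
            mask[j_rand] = True                    # at least one coordinate comes from v
            u = np.where(mask, v, pop[i])
            # Selection: binary tournament between trial and target vectors
            fu = fitness(u)
            if fu >= fit[i]:
                pop[i], fit[i] = u, fu
    best = int(np.argmax(fit))
    return pop[best], fit[best]
```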
In the DE algorithm, the parameters NP (population size), F, CR, and $G_{max}$ are problem-dependent and significantly influence the performance of the process [14]. Consequently, it is imperative to calibrate these parameters when implementing the algorithm in real-world scenarios to achieve optimal outcomes. For these reasons, numerous self-adaptive mechanisms to adjust these parameters have been the focus of academic study, including JADE [15], SHADE [16], and L-SHADE [17].
The present study focuses on the Success-History-Based Adaptive Differential Evolution (SHADE) algorithm. SHADE is a technique that regulates the DE algorithm parameters CR and F through an adaptive parameter control mechanism. This technique uses a historical memory of size H with successful parameters to modify and adapt the values during the search process; see Figure 2. Further details about SHADE can be found in [16].
2.2. Convolutional Decision Trees
The Convolutional Decision Tree (CDT) is a supervised machine learning model employed for image segmentation [6]. It has the graphical structure of an oblique (multivariate) decision tree, which is easy to interpret [9,10].
The input of a CDT is a digital image composed of pixels with specific positions and hues. Prior to this work, the images used to induce a CDT were restricted to grayscale images: two-dimensional arrays where each pixel has a single integer value between 0 and 255 (the range of values that an 8-bit number can represent); see Figure 3.
Convolution kernels represent the primary components of CDTs. They are square arrays of discrete numbers, known as weights, that function as feature extractors through the convolution operation they execute. The convolution process entails transferring a kernel over an image, thereby integrating the information from both sets. This integration is achieved by applying a dot product between the kernel values and the corresponding values of the image region the kernel covers. The resultant pixel value is derived from this product in conjunction with the bias associated with the filter, thus generating an output feature map; see Figure 4. This process is repeated until the convolution kernel has passed through all pixels of the input image, at which point the full convolution is complete. The result is a filtered version of the image.
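As an illustration, the following is a direct (unoptimized) Python sketch of this convolution, computing the feature map for all positions where the kernel fits entirely inside the image:

```python
import numpy as np

def convolve2d_valid(image, kernel, bias=0.0):
    """Slide a square kernel over a grayscale image: at each position,
    take the dot product between the kernel and the covered image region,
    then add the bias. Returns the resulting feature map."""
    s = kernel.shape[0]
    h, w = image.shape
    out = np.zeros((h - s + 1, w - s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + s, j:j + s]
            out[i, j] = float(np.sum(patch * kernel)) + bias
    return out
```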
The CDT employs the values obtained in the feature map to classify the pixels in the image according to a predefined criterion, called the predicate. It is a function with two output values, 0 or 1, depending on whether the value in the feature map is less than or equal to zero or greater than zero, respectively. This is represented in Equation (3), where w denotes the convolution kernel and x is the information of a pixel and its neighboring pixels, which together form an array of the same size as the kernel.

$$ h(x) = \begin{cases} 0, & \text{if } w \cdot x \le 0 \\ 1, & \text{if } w \cdot x > 0 \end{cases} \qquad (3) $$
In this manner, each node of the CDT is represented by a convolutional kernel that partitions the data into two subsets of pixels. The predicate value of a pixel serves as a directive, guiding it toward the subsequent branch of the tree: pixels with a value of 0 go to the left branch, and pixels with a value of 1 go to the right branch. Consequently, the kernels within a CDT execute successive classifications of each pixel until a leaf node is reached, as illustrated in Figure 5. The value obtained in a leaf node of a CDT is the label or class assigned to the pixel.
The values of the weights of these kernels are learned in a supervised manner. The initial method proposed for the CDT induction is a local greedy search [6]. To enhance performance, the methods proposed in [7,9] use the DE algorithm to induce a CDT. However, these methods have parameters that must be defined by the user and directly affect the model’s performance. So, the use of the SHADE algorithm is proposed in [10] to auto-calibrate the parameters in the DE process.
Another challenge that all these proposals must deal with is the computational cost of the learning process in terms of time and memory, so, in [11], two techniques were proposed (raw and median selection). With these techniques, a representative subset of pixels in the images of the training set is selected, speeding up the learning process while maintaining, and even improving, the performance of the CDT.
In this study, an adaptation of the SHADE-CDT technique will be employed for the induction of a CDT that enables the segmentation of color images, using the median selection technique to reduce the computational cost of the process. The following sections will provide a concise description of these subjects.
2.2.1. SHADE-CDT Technique for CDT Induction
The SHADE-CDT technique performs a global search to induce a CDT [10]. It uses the variant of the differential evolution called SHADE algorithm. This variant tunes the values for the parameters of the crossover and mutation operators in the differential evolution process, described in Section 2.1, by adding a memory of successful parameters.
In the SHADE-CDT technique, each individual in the population corresponds to a CDT, so individuals are vectors of size $(s^2 + 1)(2^d - 1)$ holding the possible weights for the kernels in the internal nodes of the CDT, obtained randomly between −255 and 255. In Figure 6, an example of a node kernel encoding for a CDT of depth d = 3 with kernels of size s = 3 is shown. In this case, each kernel node has $3^2 + 1 = 10$ values (including the bias), and there are $2^3 - 1 = 7$ nodes, so each individual for the SHADE-CDT technique is a vector with 70 values randomly obtained between −255 and 255.
The DE algorithm employs the F1-score metric to evaluate the accuracy of a model's label assignment. This is achieved by comparing the predicted labels with the actual labels of the instances. To calculate the F1-score, it is necessary to obtain the labels assigned to each pixel of the images in the training dataset by each individual of the population (representing a CDT) and compare them with the actual labels of the pixels. To achieve this, the images in the training dataset are preprocessed to obtain pixel representations as coded vectors. The encoded vector for a pixel is formed by the values of the pixels in the neighborhood of size $s \times s$ surrounding it. The length of this vector depends on the kernel size s provided by the user, with a value of 1 added as a bias term; see Figure 7.
After encoding the pixel information and the kernel structure as vectors of equivalent size, a dot product is performed between them. The resulting value is then processed by an activation function. For the particular case of the SHADE-CDT method, the sigmoid function, defined in Equation (4), is employed.
$$ \sigma(z) = \frac{1}{1 + e^{-z}} \qquad (4) $$
This function returns a value between 0 and 1. This value undergoes a classification step in which the label 1 is assigned to values greater than 0.5 and the label 0 to values less than or equal to 0.5. The tree node to which the instance must go is then decided based on this classification step, considering the label 0 for the branch on the left side of the node and 1 for the branch on the right side. This procedure is repeated until a leaf node is reached, at which point a label is assigned to the instance, as illustrated in Figure 8.
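The traversal just described can be sketched as follows, assuming for illustration that the internal-node kernels are stored in breadth-first (heap) order and that the leaf labels are provided:

```python
import numpy as np

def sigmoid(z):
    """Activation function of Equation (4)."""
    return 1.0 / (1.0 + np.exp(-z))

def classify_pixel(x, kernels, leaf_labels, depth):
    """Route an encoded pixel instance x (neighborhood values plus a
    trailing 1 for the bias) down the CDT and return its leaf label.

    kernels: 2**depth - 1 weight vectors in breadth-first (heap) order.
    leaf_labels: the 2**depth class labels of the leaves, left to right.
    """
    node = 0                                      # root of the tree
    for _ in range(depth):
        value = sigmoid(np.dot(kernels[node], x))
        # label 0 -> left child, label 1 -> right child (heap indexing)
        node = 2 * node + (2 if value > 0.5 else 1)
    return leaf_labels[node - (2 ** depth - 1)]
```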
Subsequent to the acquisition of all the labels for all the pixels in the images in the training dataset, the F1-score metric, also known as the Dice coefficient, is employed to evaluate the fitness of the corresponding CDT. The F1-score is calculated with Equation (5), where TP is the total of true positive cases, FP is the total of false positives, and FN is the total of false negatives.

$$ F1 = \frac{2\,TP}{2\,TP + FP + FN} \qquad (5) $$
An F1-score value approaching zero indicates poor segmentation, whereas a value approaching one indicates good segmentation. This process is performed on each individual in the population under consideration in the SHADE process to identify the individual (CDT) with the highest F1-score after multiple generations.
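For reference, Equation (5) transcribes directly to code (a sketch over flat binary label sequences):

```python
def f1_score(pred, true):
    """F1-score / Dice coefficient of Equation (5) for binary labels."""
    tp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, true) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# e.g., f1_score([1, 1, 0, 0], [1, 0, 0, 1]) == 0.5
```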
As is evident from the process, it involves a substantial amount of computational time and memory. For this reason, two techniques were proposed in [11] to address this issue: the raw selection and the median selection techniques. Given the results obtained with these methods, the median selection technique is employed in this work, since considering color images directly affects the amount of information taken into account in the learning process.
Also, it is important to mention that, for this method, five parameters are user-defined. For the SHADE-CDT algorithm: the population size (NP), the number of generations ($G_{max}$), and the memory size (H). For the CDT structure: the size (s) of the kernels and the depth (d) of the tree.
Further details on the SHADE-CDT method can be found in [10].
2.2.2. Median Selection Technique for CDT Induction
The median selection technique involves the selection of samples of representative pixels from each of the two classes to be segmented in the images of the training set [11]. With the median selection technique, a partition of the dataset is performed to obtain the median vector with the representative information of each subset in the partition.
In this technique, a proportion value p is needed. This value indicates the proportion of the total image information sampled for the induction process. So, if the user establishes p = 0.2, the process will find representative information equivalent to only 20% of the total information. For example, in Figure 9, a grayscale image and its ground truth are presented. Remember that the ground truth contains the labels for each pixel of the original image, considering two classes: 0 (black) and 1 (white). This image contains a total of 390 pixels, 82 of which are classified as class 1 (white) and 308 as class 0 (black). Assuming p = 0.2 for the median technique, the image information goes through the corresponding process, which extracts representative information as if the image were composed of at most 78 pixels, equivalent to 20% of the total (0.2 × 390 = 78).
A distinctive feature of this process is preserving the proportion between the classes. In the example of Figure 9, approximately 21% of the pixels in the original image are designated as class 1, while 79% are labeled as class 0. To maintain this ratio, the technique generates representative pixels equivalent to 20% of each class, i.e., 16 pixels for class 1 and 61 pixels for class 0 (0.2 × 82 ≈ 16 and 0.2 × 308 ≈ 61, rounding down), consequently maintaining the 21%/79% proportion between classes, with a total of 77 representative pixels for this image.
The generation of pixels is achieved through a partitioning technique, which divides each class into a predefined number of subsets of the same size whenever possible. The number of subsets is equal to the number of representative pixels desired for each class. In the example, class 1 has 82 pixels, but, with the technique at p = 0.2, only 16 pixels of this class are required, so the pixels of this class are partitioned into 16 disjoint subsets: 15 subsets of size 5 and 1 subset of size 7 (15 × 5 + 7 = 82). The pixels in each subset are randomly selected from the pixels of class 1; since a partition divides a set into a family of subsets in which every element of the original set appears in exactly one subset and all the subsets together contain all the members of the original set, once a pixel is selected for a subset, it cannot appear in any other. This process is repeated for the class 0 pixels, retaining information for only 61 of the 308 pixels originally present. Consequently, the partition of this class consists of 61 disjoint subsets: 60 subsets of size 5 and 1 subset of size 8 (60 × 5 + 8 = 308). As in the other class, the pixels within each subset are randomly selected from the pixels of class 0.
Following this partitioning, a total of 77 subsets are obtained: 16 of class 1 and 61 of class 0. For each subset, the vectors associated with each of its pixels in the grayscale image are obtained, considering the kernel size s of the CDT to be induced, as illustrated in Figure 9. Once these vectors are acquired, the representative vector of the subset is formed by the coordinate-wise median of their values, as depicted in the right image of Figure 9. These median vectors inherit the class of the subset from which they were derived.
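The procedure can be condensed into a short Python sketch (illustrative only; here `vectors` holds the per-pixel encoded vectors, and `np.array_split` makes the subsets as equal in size as possible, so the exact subset sizes may differ slightly from the worked example above):

```python
import numpy as np

def median_selection(vectors, labels, proportion, rng=None):
    """Sketch of the median selection technique.

    vectors: (N, L) array of per-pixel encoded vectors (neighborhood + bias).
    labels:  (N,) array of pixel classes (0 or 1).
    proportion: fraction p of representative vectors to keep per class.
    Returns the reduced training set (median vectors and their labels).
    """
    rng = rng or np.random.default_rng()
    reps, rep_labels = [], []
    for c in (0, 1):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)                          # random disjoint partition
        n_subsets = max(1, int(proportion * len(idx)))
        for subset in np.array_split(idx, n_subsets):
            # the coordinate-wise median is the representative vector
            reps.append(np.median(vectors[subset], axis=0))
            rep_labels.append(c)
    return np.array(reps), np.array(rep_labels)
```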
Subsequent to the completion of the process of obtaining representative vectors of each image in the training set, the new training set for the induction process of a CDT will consist of all these vectors. The pseudocode of the median selection technique is presented in Algorithm A1 of Appendix B.
As demonstrated in the experiments performed in [11], this technique reduces the computational time required to induce a CDT on grayscale images, owing to the decrease in the number of training instances. In addition, it has been shown to improve the quality of the segmentation results by minimizing the risk of overfitting in the learning process. For these reasons, the median selection technique is used in this work to induce CDTs using color images (with three color channels). Previously, this generalization to color images was not considered because the process triples the number of elements in the training dataset. The median selection technique, however, allows datasets with color images to undergo the CDT induction process without the loss of critical information that conversion to grayscale entails.
The following sections describe images in color spaces and the proposed process to induce CDTs in three color channels.
2.3. Color Images and Color Spaces
For this work, the images used as input are color images. Usually, the RGB color space is used to represent color in digital images. The RGB color space is an additive model wherein colors are created through the combination of three primary colors: red (R), green (G), and blue (B). The RGB model can thus be conceptualized as a cube whose coordinate axes correspond to R, G, and B; see Figure 10a. All RGB values are constrained to the interval [0, 255], so each possible color corresponds to a unique point in the RGB cube with coordinates in the space [0, 255] × [0, 255] × [0, 255]. For instance, the black and white colors are located at (0, 0, 0) and (255, 255, 255), respectively, in the RGB color space. This indicates that black represents the absence of the three primary colors, while white is the full combination of the three.
The RGB color space can be converted to other color spaces, such as L*a*b* and HSV, to obtain different color representations; see Figure 10. These alternative representations can help to extract color features and patterns [18]. For this reason, the following sections will briefly describe the CIE L*a*b* and HSV color spaces.
2.3.1. CIE L*a*b* Color Space
The CIE L*a*b* color space is a mathematical transformation of the colorimetric system in which the numerical differences between colors consistently correspond to the visual perception of the human eye [19,20]. The CIE L*a*b* space is three-dimensional. The vertical axis represents the lightness L*, with values ranging from 0 (black) to 100 (white). The primary color axes extend from red to green (a* axis) and from yellow to blue (b* axis), as illustrated in Figure 10b. On the a* axis, a positive number indicates a redder color, and a negative number indicates a greener color. On the b* axis, a positive number indicates a yellower color, and a negative number indicates a bluer color.
Consequently, each possible color corresponds to a unique point in this three-dimensional space. For example, the colors black and white are located at the coordinates (0, 0, 0) and (100, 0, 0), respectively, and the primary colors red, green, and blue are located (approximately) at the coordinates (53.24, 80.09, 67.2), (87.73, −86.18, 83.18), and (32.3, 79.19, −107.86), respectively.
2.3.2. HSV Color Space
In the HSV color space, color information is represented through three components: hue (H), saturation (S), and value (V). This model is commonly depicted as an inverted pyramid, where the vertical axis represents the value (V); the horizontal distance from the V axis corresponds to the saturation (S); and the angle around the V axis defines the hue (H) [21]. Refer to Figure 10c for a visual representation. In this color space, the H component indicates the color as an angular dimension. The black color in the HSV space is located at the apex of the pyramid, while white occupies the center of the base. Within the base, the three fundamental colors (red, green, and blue) and their respective combinations are distributed.
So, each possible color corresponds to a unique point in the HSV space with coordinates in [0, 360) × [0, 100] × [0, 100]. For example, the colors white and black have the coordinate values (0, 0, 100) and (0, 0, 0), respectively. The fundamental colors are located at the base of the pyramid at positions (0, 100, 100) for red, (120, 100, 100) for green, and (240, 100, 100) for blue.
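For reference, these conversions are available in standard libraries, e.g., scikit-image; note that this particular library scales H, S, and V to [0, 1] instead of degrees and percentages:

```python
import numpy as np
from skimage import color

rgb = np.random.rand(8, 8, 3)   # RGB image with values scaled to [0, 1]
lab = color.rgb2lab(rgb)        # L* in [0, 100]; a* and b* are signed
hsv = color.rgb2hsv(rgb)        # H, S, V each scaled to [0, 1] here
```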
2.3.3. Representation of Color Images
Images in the previously described color spaces are structured in three dimensions since the color value of each pixel is described as a vector with three coordinates, as in Equation (6).

$$ C(i, j) = \big( C_1(i, j),\; C_2(i, j),\; C_3(i, j) \big) \qquad (6) $$

Thus, a composition by channels can be used to describe the structure of a color image C, considering a representation by three matrices of the same size as C, denoted by C_1, C_2, and C_3; see Equation (7).

$$ C = \left( C_1, C_2, C_3 \right) \qquad (7) $$

These matrices have as element in position (i, j) the corresponding color channel value of the pixel at the same position in the original image, as shown in Figure 11 and in Equation (8).

$$ C_k(i, j) = \text{value of channel } k \text{ of the pixel } (i, j) \text{ of } C, \quad k \in \{1, 2, 3\} \qquad (8) $$
Thus, any color image can be represented by three two-dimensional matrices that display the values of the corresponding coordinate channels at the corresponding pixel locations.
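In code, this channel decomposition is simply an indexing operation over the third axis; a minimal NumPy illustration of Equations (7) and (8):

```python
import numpy as np

C = np.random.randint(0, 256, size=(6, 6, 3))    # an illustrative color image
C1, C2, C3 = C[:, :, 0], C[:, :, 1], C[:, :, 2]  # the three channel matrices
assert C1[2, 3] == C[2, 3, 0]   # element (i, j) of C1 is channel 1 of pixel (i, j)
```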
The subsequent section presents a detailed description of the convolutional process on images with three channels. This provides a structured framework for the proposal outlined in this work.
2.4. Convolution Process on Images with Three Channels of Color
In the convolution process, it is essential that the dimensions of the convolutional kernel be matched to the image’s dimensions, ensuring proper coverage.
For images with three color channels, such as RGB, CIE L*a*b*, and HSV, the convolutional kernels must be represented as arrays of three dimensions (height, width, and depth). In Figure 12, a 3 × 3 × 3 kernel performs the convolution process on an image with three color channels.
In this scenario, the convolution process is performed for each depth layer, where each dimension of the image representing a color channel is subjected to the convolution process using the corresponding kernel dimension. Subsequently, three convolution processes are performed, whose results, in conjunction with the value of the bias assigned to the kernel, are aggregated to obtain the value of the output pixel. Notably, although the convolution process is executed on a three-dimensional image, the result will have only one dimension. An illustration of this process can be found in Figure 13.
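A sketch of this three-channel convolution, mirroring the single-channel version in Section 2.2 but also summing over the depth axis:

```python
import numpy as np

def convolve3d_valid(image, kernel, bias=0.0):
    """Three-channel convolution as in Figure 13: one 2D convolution per
    color channel, summed together with a single bias, yielding a
    one-channel feature map. image: (h, w, 3); kernel: (s, s, 3)."""
    s = kernel.shape[0]
    h, w, _ = image.shape
    out = np.zeros((h - s + 1, w - s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + s, j:j + s, :]
            out[i, j] = float(np.sum(patch * kernel)) + bias
    return out
```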
2.5. Proposed Model: CDT Induction Using SHADE with Color Images (Three-Channel SHADE-CDT)
In this work, the three-channel SHADE-CDT search strategy is proposed. This model employs the SHADE algorithm to find the kernels that serve as conditions in the nodes of a CDT, with the particularity that these kernels are designed for the convolution process in color spaces; that is, they have three channels.
In this model, the CDT structure is defined by two parameters: the size of the kernels, denoted by s, and the depth of the tree, denoted as d. The objective is to perform a global search to maximize the classification accuracy of the CDT, measured with the F1-score metric.
In the three-channel model, the pixels in the images and the kernels for the CDT are encoded separately for each channel. To encode a pixel (instance) of the image, the hues of the pixels in the neighborhood of size $s \times s$ surrounding it are used in each channel. Similarly, the weights of a node kernel of the CDT in each channel are encoded as a vector of size $s^2$. These coding processes are illustrated in Figure 14.
In the SHADE process, each individual in the population is represented by a vector of real values that correspond to the weights of the kernels of the CDT to be found; see Figure 15. The length of this vector depends on the kernel size s and the depth d of the CDT under consideration. For each node, a kernel of size $s \times s$ with three channels requires $3s^2$ values, along with a single value for the bias of the convolution process (see Figure 13 for a description of the convolution process on a three-color-channel image). This sum of values must then be multiplied by the number of kernels in the CDT, which is calculated as $2^d - 1$. So, for the SHADE process, each individual in the population is a vector of real values of size $(3s^2 + 1)(2^d - 1)$. These values are randomly generated between −255 and 255.
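The resulting encoding length can be computed directly; a small helper (the function name is illustrative):

```python
def individual_length(s, d, channels=3):
    """Length of a SHADE individual encoding a CDT: channels * s**2 kernel
    weights plus 1 bias per internal node, and 2**d - 1 internal nodes."""
    return (channels * s * s + 1) * (2 ** d - 1)

# e.g., kernels of size s = 3 and depth d = 3: (3 * 9 + 1) * 7 = 196 values
```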
A perceptron structure is employed to ascertain the appropriate branch of the tree to pursue during the classification of an instance. To accomplish this, the final pixel (instance) encoding is a single vector with the encodings of the three channels appended, incorporating a value of 1 for the bias; see Figure 16.
In the perceptron structure, the dot product between the encoded instance (as in Figure 16) and the vector corresponding to the encoding of the kernel (as in Figure 15) is passed through the sigmoid activation function, defined by Equation (4). This function returns a value between 0 and 1, and, as in the SHADE-CDT method, if this value is smaller than or equal to 0.5, the label 0 is assigned; if the value is higher than 0.5, the label 1 is assigned. The tree node to which the instance should be directed is then determined based on this label, with label 0 assigned to the node on the left side and label 1 assigned to the node on the right side. This procedure is repeated until a leaf node is reached, at which point a label is assigned to the instance (pixel).
After the acquisition of all the labels for the pixels in the images of the training dataset, the F1-score metric is utilized to evaluate the fitness of the corresponding CDT.
The pseudocode describing the process for determining the fitness value of an individual using the three-channel method can be found in Algorithm A2 of Appendix B.
As previously indicated, the one-dimensional (single-channel) process necessitates considerable computational time and memory even when considering only grayscale images. When color images are incorporated, the learning process triples the amount of information due to the convolution performed in each color channel. Consequently, the median selection technique is a powerful tool employed in the three-channel method proposed in this study. This technique reduces the information used during the learning process by selecting a sample of representative pixels, as Section 2.2.2 outlines; in the three-channel version, the associated vector for each pixel is obtained as depicted in Figure 16.
Also, to reduce the computational time required for the learning process of the three-channel method, another modification considered was the utilization of a memory of previously evaluated individuals for the SHADE algorithm. In this memory, the individuals of the initial population and their fitness values are stored; in each subsequent generation, only the newly generated individuals resulting from the mutation, crossover, and selection operators are evaluated, replacing the defeated individuals. This process is repeated for subsequent generations until the DE algorithm is complete, as illustrated in Figure 17. This modification reduces the number of individual evaluations and the time required to complete the induction process, even when considering the three color channels.
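This memory can be sketched as a simple cache keyed by the individual's weight vector; the following is an illustrative implementation, not necessarily the paper's exact data structure:

```python
import numpy as np

class FitnessMemory:
    """Store the fitness of every evaluated individual so that surviving
    individuals are never re-evaluated; only newly generated trial
    vectors trigger a (costly) CDT evaluation."""

    def __init__(self, fitness_fn):
        self.fitness_fn = fitness_fn
        self.cache = {}

    def evaluate(self, individual):
        key = individual.tobytes()   # hashable key for a NumPy vector
        if key not in self.cache:
            self.cache[key] = self.fitness_fn(individual)
        return self.cache[key]
```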
The pseudocode of the proposed model for CDT induction using SHADE with color images (the three-channel SHADE-CDT method) is presented in Algorithm A3 of Appendix B.
It is important to acknowledge the critical role of the median selection technique in reducing the information used during the learning process and of the proposed memory in reducing individual evaluations. These approaches enabled the execution of the experiments described in the following section, implemented in Matlab R2023b on a computer with the same characteristics as the one used in [10,11]; see Table 1. This facilitates a comparative analysis of the outcomes, showing that the proposed methodology reduces the memory and time of the CDT induction process compared to previous works while improving the F1-scores in the segmentation task by incorporating color images instead of grayscale images.
3. Experiments and Results
This section describes the experiments performed to analyze the performance of the CDT induction process for the segmentation of color images proposed in this work. The images from the “Weizmann Horse dataset” [22] are used to compare the results of the proposal across the different color spaces described in Section 2.3. Once the best color space for induction is identified, the “Digital Retinal Images for Vessel Extraction” (DRIVE) dataset is employed to compare the induction time and precision of the proposal against previous works.
3.1. Comparison of Color Spaces with the Weizmann Horse Dataset
The Weizmann Horse dataset was used in experiments to compare the three color spaces under consideration (RGB, CIE L*a*b*, and HSV). This dataset comprises 327 manually segmented images of horses, which were downscaled from their original size for the experimental procedures. The images were divided into training and test sets, with 33 and 295 images, respectively. Subsequently, controlled experiments were conducted to compare the segmentations obtained in each color space. These experiments considered the following parameters: 100 generations, a population of 100 individuals, and a memory of size 100 for the SHADE algorithm; depths ranging from 1 to 5 and kernel sizes of 3, 5, 7, 9, and 11 were considered for the CDT structure.
The results obtained for the segmentation of the images in the test dataset employing the three-channel SHADE-CDT method with the median selection technique, with p = 0.3, are presented in Table 2. This value was selected after performing controlled experiments considering proportions of 10%, 20%, 30%, 40%, and 50%, with the best results obtained at 30%, so this value was maintained for the subsequent experiments. The training and test datasets were the same in these experiments to ensure consistency in the comparison between the color spaces and the different configurations of tree depth and kernel size in the structure of the CDTs.
To perform more in-depth experiments, the CDT structure with the highest F1-score value by kernel size was used. For this reason, a graphical representation of the information in Table 2 is presented in Figure 18, where the subfigures are line plots in two dimensions considering the depth of the CDT and their corresponding F1-scores, fixing the kernel size in each subfigure.
As demonstrated in Figure 18, the comparison lines facilitate the identification of the depth value d and the color space corresponding to the highest F1-score for each kernel size s. In Figure 18a, the maximum F1-score attained by a CDT with a kernel size of 3 is achieved when the depth is 3 in the RGB color space. Figure 18b illustrates that the highest F1-score is obtained by a CDT with a kernel size of 5 when the depth is 4 in the CIE L*a*b* color space. Furthermore, Figure 18c–e demonstrate that the best F1-score outcomes achieved with the CDTs employing fixed kernel sizes 7, 9, and 11, respectively, are attained when the trees have a depth of 2 in the CIE L*a*b* color space.
Five experiments were conducted to induce a CDT with each of the structures previously enumerated and for each color space, so, in total, 75 experiments were performed (25 for each color space). For these experiments, the same training and test sets were used to ensure a consistent comparison with the median selection technique.
In Table 3, the minimum, maximum, and mean ± standard deviation of the image segmentation precision, measured by the F1-score, of the images in the test set are presented.
The Shapiro–Wilk statistical test was employed for each CDT configuration within each color space to verify the normal distribution of the F1-score results obtained from the five executions. The corresponding p-values are also presented in Table 3.
In Figure 19, the boxplots of each color space are presented for each analyzed configuration.
The following observations can be made with reference to Figure 19:
- In Figure 19a–c, the values obtained with the CIE L*a*b* color space show less variability than those of the other two color spaces, as the corresponding boxes are comparatively short, whereas the boxes for the RGB and HSV color spaces are comparatively tall.
- In Figure 19c,d, the values obtained with the CIE L*a*b* color space again show less variability; however, in these instances, the boxes corresponding to the RGB and CIE L*a*b* color spaces do not differ greatly.
- In Figure 19a,b, two values are positioned on the upper whiskers of the boxplots, corresponding to the CIE L*a*b* and RGB color spaces, respectively. This suggests the presence of outliers, although these values lie above the median F1-score.
- Conversely, in Figure 19b,d,e, the outliers fall below the median F1-scores, appearing in the lower whiskers of the RGB, RGB and CIE L*a*b*, and HSV color spaces, respectively. Even in these cases, the comparative superiority of the RGB and CIE L*a*b* color spaces over the HSV color space is evident.

Overall, the boxplots demonstrate that the CIE L*a*b* color space consistently yields the highest mean F1-score values with minimal variability. Nevertheless, a one-way ANOVA test was conducted to ascertain whether the disparities among the means of the various color spaces are significant.
Subsequently, the one-way ANOVA test was performed to ascertain whether the differences between the means of the color spaces are statistically significant for each CDT structure. The p-values for these tests are also presented in Table 3. As these p-values were lower than the pre-determined significance level of α = 0.05, it can be concluded that the differences observed between the means of some of the methods are indeed statistically significant.
Tukey’s Honestly Significant Difference Procedure was employed as a post hoc multiple comparison test to ascertain which color spaces’ means are distinct on each CDT structure since the ANOVA test rejected the null hypothesis that all the method means are equal. The test results are illustrated in Figure 20. Additionally, a spider graph is provided in Figure 21 to facilitate a visual representation of the performance on each color space. This graph considers the three analyzed color spaces (RGB, CIE L*a*b*, and HSV) and the different CDT structures outlined in Table 3. The value of 1 indicates that the color space has obtained the best segmentation result in the corresponding CDT structure. Conversely, the value of 0 indicates that the mean F1-score obtained by the color space is not the best in the corresponding CDT structure.
The following observations can be made with reference to Figure 20 and Figure 21:
- For the CDT structures with kernel sizes 3, 5, and 9 in Figure 20a, Figure 20b, and Figure 20d, respectively, the differences between the RGB and CIE L*a*b* color spaces are not significant. In Figure 21, the RGB and CIE L*a*b* color spaces have a value of 1 in the corresponding CDT structures, indicating that the best segmentation results are achieved in these color spaces.
- For the CDT structures with kernel sizes of 7 and 11, significant differences between the three color spaces are presented in Figure 20c and Figure 20e, respectively. The spider graph in Figure 21 shows that the best segmentation results for those CDT structures are obtained in the CIE L*a*b* color space.
- In Figure 20, the means for the RGB and CIE L*a*b* color spaces are always significantly different from those of the HSV color space, since the lines corresponding to the HSV color space are positioned further to the left on the horizontal axis and do not overlap with the lines corresponding to the other color spaces.
- These figures demonstrate the comparative superiority of the CIE L*a*b* color space: in all the subfigures of Figure 20, the lines corresponding to the CIE L*a*b* color space are positioned further to the right on the horizontal axis, indicating that the optimal segmentation outcomes are attained within this color space. This is reflected in Figure 21, where the CIE L*a*b* color space has a value of 1 in all the CDT structures.
These results demonstrate that a CDT induced with the information of the images in the CIE L*a*b* color space produces superior segmentation results in comparison to a CDT induced with information in other color spaces.
Single-Channel SHADE-CDT vs. Three-Channel SHADE-CDT
In this section, a comparison is conducted between the results obtained with the single-channel SHADE-CDT technique and those obtained with the proposed three-channel method.
The best results reported in [10] per kernel size for the single-channel technique are presented in Table 4. These results were documented following a single execution of the technique. Given the absence of computational reduction approaches for the induction process, the volume of information for the training process was substantial. Consequently, multiple experiments with different kernel sizes and depths were not feasible, even though only grayscale images were considered.
In the same Table 4, the best results with the three-channel method are reported. These values were obtained from Table 3, considering the CIE L*a*b* color space with the corresponding CDT structure.
In both methods, the median selection technique was employed with a proportion of 0.3, and, for the SHADE algorithm, 100 individuals and a memory size of 100 were used. For the number of generations, 200 were considered for the single-channel approach, while 100 were considered for the three-channel method.
In Figure 22, a visual representation of the information in Table 4 is presented. The comparison between the F1-score and induction time results of each method (single-channel and three-channel) is presented as a line plot, indicating the kernel size considered at each point.
As indicated in Table 4 and Figure 22, the precision values obtained with the single-channel technique are lower than those obtained with the three-channel method. In Table 4, the F1-scores in column 3 are less than 0.53, while the scores in column 5 are greater than 0.56, with some exceeding 0.6. This is corroborated by Figure 22, which illustrates that the points representing the single-channel results are positioned further to the left along the horizontal axis (the F1-score) than the points for the three-channel method.
Concerning the induction time, none of the experiments employing the three-channel technique exceeded 24 min of execution, whereas the single-channel method reported execution times of over 48 min for identical structures. This observation is corroborated by Figure 22, since the points representing the single-channel method are located higher along the vertical axis (the induction time) than the points for the three-channel method.
It is important to note that the median selection technique and the modification of the SHADE process for the three-channel technique significantly reduced the learning time and memory in experiments that use information from three color channels for induction. As shown in Figure 22, the induction time for the single-channel method appears to grow exponentially, whereas, for the three-channel method, the induction time is less variable and depends entirely on the number of new individuals generated in the SHADE algorithm.
It is also noteworthy that the SHADE process involved 200 generations with the single-channel technique and only 100 generations with the three-channel method. This discrepancy precludes a precise comparison of induction times between the methods. In the following section, an accurate comparison of times is performed in the experiments conducted with the DRIVE dataset, considering only the CIE L*a*b* color space.
It should be noted that experiments with kernels of size 11 are not documented in [10]. This is likely because the computer specifications described in Table 1 were inadequate for executing the CDT induction, or because of the experimental time required for these structures, even though only grayscale images were considered. In contrast, the three-channel method, in conjunction with the median selection technique, has yielded favorable outcomes for CDTs with kernels of size 11 and color images, requiring a small amount of time for the induction process.
3.2. Comparison of Induction Time and Precision with the DRIVE Dataset Considering the CIE L*a*b* Color Space
In this section, the DRIVE (“Digital Retinal Images for Vessel Extraction”) dataset was employed to analyze the induction times and performance of several CDT structures with the three-channel technique. This dataset comprises 20 retinal images with their corresponding segmentation masks, or ground truths. The selection of these images was guided by the findings reported in [11], which detailed the segmentation outcomes obtained by applying the single-channel method with the median selection technique to induce CDTs with this dataset.
These experiments replicated the characteristics described in [11]. Consequently, the images were resized to the same reduced resolution, and the same images were used for the training and test sets as in [11], with proportions of 70% and 30%, respectively. The parameter settings for the experiments are presented in Table 5. In this work, these values were applied to the three-channel technique, considering 10 independent executions, enabling a comparison of the computational time of the induction when employing the median selection technique.
In Table 6, the outcomes of the image segmentation process utilizing both methodologies are presented. The single-channel results, with the full training data and with the median selection technique, were obtained from [11].
As demonstrated in Table 6, for a given CDT configuration (kernel size and depth), the time required to induce a CDT using the median selection technique, whether with images in a single color channel (grayscale images for the single-channel technique) or in three color channels (color images for the three-channel technique), is consistently less than that of the induction process incorporating all the information of the grayscale images in the training set.
It is important to mention that, although the three-channel technique triples the length of the individuals and kernel weights for the SHADE algorithm, the induction time is not tripled compared to the single-channel method. This is attributable to the incorporation of the memory of previously evaluated individuals, which reduces the number of evaluations of the fitness function.
Furthermore, experimentation with the three-channel technique incorporating all the information of the color images in the training set was not feasible with the available computer. Consequently, a direct comparison between the three-channel technique with the full training data and with the median selection technique, as performed in [11] for grayscale images, was not possible.
As previously indicated, these experiments were conducted to analyze the computational time required for these processes, not the precision, given that the parameter values taken from [11] were not ideal for CDT induction with the three-channel technique; initially, the structure employed in [11] was used only to compare the times of both techniques. Then, to evaluate the precision of the CDTs induced with the three-channel technique, several experiments were performed considering different parameters for the SHADE algorithm, the CDT structure, and the proportion for the median selection technique. The best results were obtained with CDTs induced with the parameters presented in Table 7.
A total of ten independent experiments were conducted to induce CDTs using the three-channel technique with the parameter values given in Table 7. The minimum, maximum, and mean ± standard deviation of the F1-scores obtained are presented in Table 8. The results obtained in [11] with the single-channel technique using the parameter values in Table 5 are also reported in the same table to facilitate the comparison. The p-values obtained from the Shapiro–Wilk statistical test are included in both cases.
The Shapiro–Wilk statistical test was performed to verify the normal distribution of the F1-score results obtained in the ten executions of each of the two methods (single-channel and three-channel). The p-values indicate that the F1-scores of at least one of the techniques are not normally distributed, so, to identify whether the samples of the two methods exhibit significant differences, the non-parametric Mann–Whitney U test was conducted, obtaining a p-value of 0.0017. Since this p-value is lower than the significance level α = 0.05, it is concluded that the difference between the two methods is significant.
So, it is accurate to assert that the three-channel method, employing the median selection technique, yields superior outcomes compared to the single-channel method with the same technique when the induction parameters are calibrated. This assertion is substantiated by the median F1-scores of 0.6520 and 0.6392 for the three-channel and single-channel methods, respectively, and by the means in Table 8.
Regarding temporal requirements, the three-channel method necessitated a greater investment of time than the single-channel approach. This is because the process incorporates a population of 200 individuals and 350 generations in the former, as opposed to the 100 individuals and 200 generations employed in the latter. This suggests a compromise between the temporal demands of the processes and their performance. Nevertheless, the induction times reported in this work are shorter than those documented in [6,7], where the process required at least 12 h of execution. This represents a noteworthy enhancement of the CDT induction process.
4. Discussion
As reported in previous studies, the induction of CDTs for the segmentation task employs grayscale images and analytical or evolutionary processes performing local or global searches, respectively, to obtain the kernels of the nodes of the CDTs [6,7,9,10,11]. The present study explores the integration of color images into the induction process of a CDT.
In this direction, three color spaces were employed in the initial stages of the investigation to establish the most effective channels for characterizing the colors of the images in the induction process. A comparison of the RGB, CIE L*a*b*, and HSV color spaces, three of the most relevant color spaces, was undertaken, and it was found that the CIE L*a*b* color space rendered the best results in all the CDT induction experiments considered (see Table 2 and Table 3). Consequently, this color space was utilized to compare the outcomes obtained through the conventional induction of CDTs with grayscale images in a single channel and the proposed approach with color images and CDTs with three channels.
This study considered two types of analysis: one to analyze the execution time and the other to analyze the segmentation task’s precision.
Regarding temporal considerations, the proposed methodology employs the median selection technique and a memory of previously evaluated individuals for the SHADE algorithm in the learning process. This modification reduces the number of individual evaluations and the time required to complete the induction process (see Table 6). The induction times reported in this study (with a maximum of 131.54 min for the induction process) are shorter than those previously reported in other studies (which involved 12 h of experimentation with lower F1-scores).
Regarding precision, the proposal enhances the F1-scores obtained for the Weizmann Horse and DRIVE datasets when considering the CIE L*a*b* color space and the appropriate parameter settings for the SHADE algorithm in the induction process (see Table 4 and Table 8). These enhancements signify a substantial and noteworthy advancement in the CDT induction processes.
Regarding the so-called explainability of the CDTs, the structures proposed in this work retain the same interpretability as the classical single-channel CDT structures. In the segmentation process, the information of each pixel of the image to be segmented passes through the classification process determined by the convolution on each node of the CDT, and its final label is obtained when the pixel reaches a leaf node. Thus, the classification of a pixel with label 0 or 1 can be directly observed. It is also possible to characterize the pixels that reach the same leaf node of the CDT or to analyze why the F1-score values are high or low in different images. Appendix A shows some examples of analyses performed using the inherent explanatory power of the CDTs.
5. Conclusions and Future Work
This work proposes implementing and analyzing a color image segmentation algorithm based on CDTs using differential evolution. In this proposal, the SHADE algorithm was utilized in the learning process, where multiple individuals (CDTs) were randomly proposed and evolved to ascertain the weights of the CDT kernels, which were structured with three channels. The final CDT is the individual with the best F1-score in the training dataset.
The proposal further considers an analysis of three color spaces to identify the one with the optimal structure for CDT induction in three color channels, with the CIE L*a*b* color space demonstrating the most favorable results.
The results demonstrated the significance of considering the three color channels in the induction of CDTs, resulting in enhanced segmentation outcomes. The implementation of the median selection technique, in conjunction with the memory of previously evaluated individuals, exhibited a significant contribution, as evidenced by a substantial reduction in the computational time required for the induction process.
This study demonstrated that color characterization is a critical factor in enhancing the efficacy of the segmentation task performed with the induced CDTs since the precision increased by at least 8% between the experiments with grayscale and color images.
Future work will examine other color spaces, including L*C*h*, which have exhibited promising outcomes in prior studies investigating color. Moreover, restricting the analysis to two channels and two-dimensional structures could be a viable approach to reduce computational time; consequently, a thorough investigation into the two most representative channels of each color space is warranted. Another approach that merits consideration is translating the color information to an optimal color space where the pixels of the different classes (0 and 1) are separated by techniques such as Linear Discriminant Analysis. In addition, comparing the performance of the CDT with other segmentation techniques that employ CNNs is proposed.
Conceptualization, A.-L.L.-L., H.-G.A.-M., and E.M.-M.; Data curation, A.-L.L.-L.; Formal analysis, A.-L.L.-L.; Investigation, A.-L.L.-L.; Methodology, A.-L.L.-L., H.-G.A.-M., and E.M.-M.; Resources, A.-L.L.-L., H.-G.A.-M., and E.M.-M.; Software, A.-L.L.-L.; Supervision, H.-G.A.-M. and E.M.-M.; Validation, H.-G.A.-M. and E.M.-M.; Visualization, A.-L.L.-L.; Writing—original draft, A.-L.L.-L.; Writing—review and editing, A.-L.L.-L., H.-G.A.-M., and E.M.-M. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data derived from public domain resources: the Weizmann Horse dataset from Kaggle at
Acknowledgments: The first author acknowledges the Secretaría de Ciencia, Humanidades, Tecnología e Innovación (SECIHTI) of Mexico for the support provided through scholarship 712182, which was awarded for postdoctoral studies at the Artificial Intelligence Research Institute at the University of Veracruz.
Conflicts of Interest: The authors declare no conflicts of interest.
The following abbreviations are used in this manuscript:
CNN | Convolutional Neural Network
CDT | Convolutional Decision Tree
DE | Differential Evolution
SHADE | Success-History-Based Adaptive Differential Evolution
Figure 1 Semantic segmentation technique applied to retinal vessel detection.
Figure 2 DE algorithm with SHADE.
Figure 3 Grayscale image.
Figure 4 Convolution process on a 6 × 6 image with a 3 × 3 kernel.
Figure 5 Convolutional Decision Tree with kernels of size 3 and depth 2.
Figure 6 Encodings of a kernel and an individual for the SHADE-CDT technique to induce a CDT of depth 3 with kernels of size 3.
Figure 7 Coding of a pixel-associated instance.
Figure 8 SHADE-CDT process employing the codification of the pixel-associated instance (x) and the codification of the convolutional kernels (w) in the nodes of a CDT. The dot product between x and w is evaluated in the sigmoid function to obtain a value between 0 and 1. In this example, this evaluation is greater than 0.5, so the label assigned for the root node, denoted as
Figure 9 Grayscale image and its ground truth on the left, and a representation of the obtention of a representative vector for the analyzed subset
Figure 10 Color spaces analyzed in this work: (a) RGB. (b) CIE L*a*b*. (c) HSV.
Figure 11 Composition of a color image by channels.
Figure 12
Figure 13 Convolution process on an image with three color channels with a kernel of size
Figure 14 Encodings of a kernel and an instance by color channel.
Figure 15 Individual considered in the SHADE process for the induction of a CDT with kernel size
Figure 16 Information of the pixel
Figure 17 The memory implemented for the
Figure 18 Line plots in two dimensions considering the depth d of the CDTs and their corresponding F1-scores, fixing the kernel size s. (a) CDTs with
Figure 19 Boxplots of the F1-scores obtained in five experiments in each color space and in each of the following CDT structures: (a) CDTs with
Figure 20 Multiple comparison test between the color spaces analyzed in each of the following CDT structures: (a) CDTs with
Figure 21 Spider graph of the results obtained with the three analyzed color spaces (RGB, CIE L*a*b*, and HSV) and the different CDT structures (kernel size s and depth d). A value of 1 indicates that the color space obtained the best segmentation result in the corresponding CDT structure; a value of 0 indicates the contrary case.
Figure 22 Plot line to compare the F1-scores and the induction time results with each method (
Computer specifications.
Operating System | Windows 11 Pro 23H2
RAM | 64 GB
Processor | AMD Ryzen 5 5600G
Processor speed | 3.90 GHz
Results for the Weizmann Horse dataset segmentation with the
s | d | RGB F1-Score | RGB Accuracy | RGB Time (min) | L*a*b* F1-Score | L*a*b* Accuracy | L*a*b* Time (min) | HSV F1-Score | HSV Accuracy | HSV Time (min)
---|---|---|---|---|---|---|---|---|---|---
3 | 1 | 0.5415 | 0.7632 | 6.39 | 0.5466 | 0.7653 | 6.18 | 0.3996 | 0.7383 | 6.64 |
3 | 2 | 0.5346 | 0.7509 | 10.6 | 0.548 | 0.7645 | 10.44 | 0.5239 | 0.7675 | 10.82 |
3 | 3 | 0.5718 | 0.773 | 14.23 | 0.5654 | 0.7664 | 14.27 | 0.5143 | 0.7692 | 14.36 |
3 | 4 | 0.5534 | 0.7492 | 18.08 | 0.5689 | 0.7692 | 18.25 | 0.5353 | 0.7623 | 18.7 |
3 | 5 | 0.5664 | 0.7669 | 22.39 | 0.5664 | 0.7648 | 24.62 | 0.5455 | 0.7689 | 22.16 |
5 | 1 | 0.5562 | 0.7702 | 7.26 | 0.5567 | 0.7647 | 7.16 | 0.4087 | 0.7331 | 6.65 |
5 | 2 | 0.5544 | 0.7694 | 11.22 | 0.581 | 0.76 | 11.37 | 0.4966 | 0.7457 | 11.72 |
5 | 3 | 0.5769 | 0.7484 | 15.41 | 0.5686 | 0.7528 | 15.68 | 0.5152 | 0.7484 | 15.81 |
5 | 4 | 0.5793 | 0.7502 | 19.58 | 0.593 | 0.7778 | 20.13 | 0.5359 | 0.7486 | 19.33 |
5 | 5 | 0.5666 | 0.7557 | 23.6 | 0.592 | 0.7703 | 24.91 | 0.512 | 0.7528 | 23.04 |
7 | 1 | 0.563 | 0.7689 | 8.51 | 0.5641 | 0.7635 | 8.83 | 0.4127 | 0.7235 | 8.61 |
7 | 2 | 0.5669 | 0.7694 | 13.25 | 0.5942 | 0.7667 | 13.22 | 0.5115 | 0.7425 | 13.82 |
7 | 3 | 0.5825 | 0.7532 | 18.16 | 0.5923 | 0.7672 | 17.30 | 0.5226 | 0.7513 | 18.46 |
7 | 4 | 0.5918 | 0.7567 | 22.76 | 0.5865 | 0.756 | 23.15 | 0.5312 | 0.7363 | 21.79 |
7 | 5 | 0.5671 | 0.74 | 28.09 | 0.5642 | 0.7385 | 26.13 | 0.5275 | 0.7337 | 26.44 |
9 | 1 | 0.5548 | 0.7586 | 11.8 | 0.5707 | 0.7631 | 11.62 | 0.4387 | 0.7183 | 11.23 |
9 | 2 | 0.5946 | 0.7502 | 18.38 | 0.6042 | 0.769 | 17.8 | 0.5063 | 0.7258 | 16.75 |
9 | 3 | 0.5936 | 0.7545 | 24.48 | 0.5897 | 0.7567 | 21.91 | 0.5214 | 0.7202 | 22.22 |
9 | 4 | 0.5898 | 0.7503 | 31.26 | 0.5699 | 0.7441 | 26.69 | 0.53 | 0.7376 | 29.41 |
9 | 5 | 0.5848 | 0.758 | 32.14 | 0.5915 | 0.7648 | 31.98 | 0.5234 | 0.7012 | 32.31 |
11 | 1 | 0.5559 | 0.7519 | 12.62 | 0.5806 | 0.763 | 12.43 | 0.4407 | 0.7134 | 15.09 |
11 | 2 | 0.5503 | 0.7458 | 19.69 | 0.6052 | 0.7585 | 18.26 | 0.5088 | 0.7129 | 20.65 |
11 | 3 | 0.5891 | 0.7393 | 23.71 | 0.5903 | 0.7591 | 23.68 | 0.5225 | 0.7166 | 25.76 |
11 | 4 | 0.5623 | 0.7411 | 29.76 | 0.5954 | 0.7507 | 28.39 | 0.5255 | 0.7179 | 30.07 |
11 | 5 | 0.5749 | 0.7437 | 35.69 | 0.5624 | 0.7354 | 35.64 | 0.5315 | 0.7118 | 35.49 |
Results of 5 experiments with the Weizmann Horse dataset in the three color spaces employing the median technique with
CDT Structure | Metric | RGB | CIE L*a*b* | HSV
---|---|---|---|---
 | Min/Max | 0.5445/0.5777 | 0.5654/0.5843 | 0.5143/0.5416
 | Mean ± St.D. | 0.5612 ± 0.0143 | 0.5714 ± 0.0075 | 0.5276 ± 0.0103
 | Median | 0.5627 | 0.5697 | 0.5288
 | Time (min) | 15.81 | 15.4 | 14.48
 | Shapiro–Wilk p-value | 0.6189 | 0.0660 | 0.9871
 | One-way ANOVA p-value | | |
 | Min/Max | 0.5462/0.5793 | 0.5723/0.5930 | 0.5028/0.5386
 | Mean ± St.D. | 0.5678 ± 0.0133 | 0.5823 ± 0.0075 | 0.5246 ± 0.0150
 | Median | 0.5706 | 0.5817 | 0.5297
 | Time (min) | 21.31 | 21.15 | 19.54
 | Shapiro–Wilk p-value | 0.3067 | 0.9548 | 0.4637
 | One-way ANOVA p-value | | |
 | Min/Max | 0.5562/0.5845 | 0.591/0.6057 | 0.5023/0.5119
 | Mean ± St.D. | 0.5660 ± 0.011 | 0.598 ± 0.006 | 0.5077 ± 0.0041
 | Median | 0.5615 | 0.5966 | 0.5078
 | Time (min) | 14.5 | 14.33 | 13.7
 | Shapiro–Wilk p-value | 0.1777 | 0.7977 | 0.5394
 | One-way ANOVA p-value | | |
 | Min/Max | 0.5559/0.5946 | 0.5699/0.6064 | 0.5063/0.5149
 | Mean ± St.D. | 0.5819 ± 0.0152 | 0.5951 ± 0.0147 | 0.5110 ± 0.0038
 | Median | 0.5880 | 0.5984 | 0.5101
 | Time (min) | 17.68 | 17.21 | 17.42
 | Shapiro–Wilk p-value | 0.0952 | 0.0687 | 0.3659
 | One-way ANOVA p-value | | |
 | Min/Max | 0.5503/0.5884 | 0.5978/0.6122 | 0.4906/0.5192
 | Mean ± St.D. | 0.5696 ± 0.0174 | 0.6041 ± 0.0054 | 0.5096 ± 0.0115
 | Median | 0.5719 | 0.6041 | 0.5116
 | Time (min) | 19.94 | 19.6 | 20.34
 | Shapiro–Wilk p-value | 0.3384 | 0.857 | 0.2049
 | One-way ANOVA p-value | | |
* The differences between some of the mean F1-scores of color spaces are significant at the
The best results per kernel size using the
s | d | F1-Score | Time (min) | F1-Score | Time (min)
---|---|---|---|---|---
3 | 4 | 0.4789 | 48.87 | 0.5689 | 18.25 |
5 | 4 | 0.4981 | 56.95 | 0.5930 | 20.13 |
7 | 4 | 0.5151 | 66.6 | 0.5865 | 23.15 |
9 | 2 | 0.5220 | 86.4 | 0.6042 | 17.8 |
11 | 2 | Not available | Not available | 0.6052 | 18.26 |
Parameter settings for the
Population Size | Number of Generations | Memory Size (H) | Kernel Size (k) | Tree Depth (d) | Proportion
---|---|---|---|---|---
100 | 200 | 100 | 5 | 2 | 0.3 (30%)
F1-score results and mean time obtained using the
Method | Mean F1-Score ± St.D. | Mean Time
---|---|---
 | 0.6283 ± 0.0089 | 85.8 min
 | 0.6415 ± 0.0069 | 27.21 min
 | 0.5511 ± 0.0260 | 38.67 min
Best parameter settings found for the
Population Size | Number of Generations | Memory Size (H) | Kernel Size (k) | Tree Depth (d) | Proportion
---|---|---|---|---|---
200 | 350 | 100 | 5 | 2 | 0.3 (30%)
Best results obtained with the
Method | Min | Max | Mean ± St.D. | Time | p-Value (Shapiro–Wilk)
---|---|---|---|---|---
 | 0.6318 | 0.6502 | 0.6415 ± 0.0069 | 27.21 min | 0.1650
 | 0.6468 | 0.6808 | 0.6608 ± 0.0141 | 131.54 min | 0.0095
Appendix A. Explainability in CDTs of Three Channels
The segmentation process performed with a CDT can be analyzed by following the partitions performed at each CDT node. This makes it possible to examine the corresponding pixel subsets and determine which characteristics the CDT identifies on each branch. This section provides a detailed discussion of explainability using the CDTs with the best F1-scores for the Weizmann Horse and DRIVE datasets.
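One simple way to support such analyses is to record, for every pixel instance, the leaf it reaches. The sketch below groups pixels by their root-to-leaf path, reusing the hypothetical `Node` layout from the earlier classification sketch; it is an illustration, not the authors' tooling.

```python
# Illustrative leaf-level analysis: collect the pixel instances that reach
# each leaf so each group's color statistics can be inspected.
import numpy as np
from collections import defaultdict

def pixels_per_leaf(root, pixel_vectors):
    """root: CDT root (Node as sketched above); returns {leaf path: [pixels]}."""
    groups = defaultdict(list)
    for x in pixel_vectors:
        node, path = root, []
        while node.label is None:
            act = 1.0 / (1.0 + np.exp(-np.dot(node.kernel, x)))
            path.append("R" if act > 0.5 else "L")
            node = node.right if path[-1] == "R" else node.left
        groups["".join(path)].append(x)  # e.g., "LR" = left then right branch
    return groups
```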
Appendix A.1. Weizmann Horse Dataset
The following description of results was conducted by considering the CDT induced with the best F1-score in
As illustrated in
Figure A1 CDT induced to segment the images in the Weizmann Horse dataset.
In
As demonstrated in
Figure A2 Best segmentation performed on an image from the Weizmann Horse dataset. The image on the left side of the figure displays the partitions that were performed on each node of the CDT. The image on the right presents the original image, its ground truth, and the segmentation obtained with the CDT.
In this instance, the objective of the partitions was to detect the brown pixels in the original image.
Figure A3 Worst segmentation performed on an image from the Weizmann Horse dataset. The image on the left side of the figure displays the partitions that were performed on each node of the CDT. The image on the right presents the original image, its ground truth, and the segmentation obtained with the CDT.
Figure A4 Analysis of segmentation on an image of the Weizmann Horse dataset. The image on the left side of the figure displays the partitions that were performed on each node of the CDT. The image on the right presents the original image, its ground truth, and the segmentation obtained with the CDT.
Appendix A.2. DRIVE Dataset
The following description of results was conducted by considering the CDT induced with the best F1-score in
In
Figure A5 CDT induced to segment the images in the DRIVE dataset.
In these cases, it is evident that kernel 1 performs a partition that discriminates the interior and exterior of the retina, with the subsequent kernels conducting the detailed classification of the veins; kernel 3, in particular, is responsible for the most detailed classification step. A subsequent analysis of the results reveals that the CDT consistently achieves lower F1-scores because it assigns class 1 to more pixels than the ground truth contains: the CDT identifies all the veins in the retina, whereas the ground truth marks only a subset of them.
Figure A6 Best segmentation performed on an image of the DRIVE dataset. The image on the left side of the figure displays the partitions that were performed on each node of the CDT. The image on the right presents the original image, its ground truth, and the segmentation obtained with the CDT.
Figure A7 Worst segmentation performed on an image of the DRIVE dataset. The image on the left side of the figure displays the partitions that were performed on each node of the CDT. The image on the right presents the original image, its ground truth, and the segmentation obtained with the CDT.
Appendix A.3. Description and Future Analysis
In the context of performing segmentation of color images with CDTs, the structures exhibited in
In addition, the analysis of sub-structures can be considered in future studies, as illustrated by the partition observed in
Figure A8 Segmentation of a liver image [
Appendix B. Pseudocodes
The pseudocodes that describe the
Algorithm A1 delineates the median selection technique described in
Algorithm A1: Median selection technique. [Pseudocode omitted in this version; see the original PDF.]
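Since the pseudocode itself is not reproduced here, the following is a sketch of one plausible reading of the median selection technique: per class, keep the stated proportion of pixel instances closest to that class's median (representative) vector. The Euclidean distance criterion and the function signature are assumptions made for illustration.

```python
# Plausible-reading sketch of median selection (Algorithm A1): per class,
# keep the given proportion of pixel instances closest to the class median.
import numpy as np

def median_selection(X, y, proportion=0.3):
    """X: (n, m) pixel instances, y: (n,) labels in {0, 1}."""
    keep = []
    for c in (0, 1):
        idx = np.flatnonzero(y == c)
        median_vec = np.median(X[idx], axis=0)       # representative vector
        dist = np.linalg.norm(X[idx] - median_vec, axis=1)
        n_keep = max(1, int(proportion * len(idx)))
        keep.extend(idx[np.argsort(dist)[:n_keep]])  # closest to the median
    keep = np.array(keep)
    return X[keep], y[keep]
```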
Algorithm A2: F1-score of an individual (fitness). [Pseudocode omitted in this version; see the original PDF.]
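The fitness of Algorithm A2 reduces to scoring the CDT's pixel predictions with the F1-score. A minimal sketch, assuming a `predict_fn` that maps one pixel instance to a 0/1 label (e.g., the CDT traversal sketched earlier):

```python
# Minimal fitness sketch: classify every training pixel with the candidate
# CDT and return the F1-score of the resulting labels.
from sklearn.metrics import f1_score

def fitness(predict_fn, X, y):
    predictions = [predict_fn(x) for x in X]
    return f1_score(y, predictions)
```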
Algorithm A3:
Input: dataset of color images. [Remaining pseudocode omitted in this version; see the original PDF.]
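At the highest level, the induction loop can be sketched as a differential evolution search over flattened kernel weights. The version below uses a plain DE/rand/1/bin step for brevity; the actual method uses SHADE's success-history adaptation of F and CR plus the median-selected training sample, both omitted here, and all names and defaults are assumptions.

```python
# Bird's-eye sketch of CDT induction by differential evolution (not SHADE:
# the success-history parameter adaptation is intentionally left out).
import numpy as np

def induce_cdt(X, y, fitness_fn, n_nodes, kernel_len,
               pop_size=100, generations=200, F=0.5, CR=0.9, seed=0):
    rng = np.random.default_rng(seed)
    dim = n_nodes * kernel_len * 3                 # three color channels
    pop = rng.standard_normal((pop_size, dim))     # each row encodes one CDT
    scores = np.array([fitness_fn(ind, X, y) for ind in pop])
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, size=3, replace=False)]
            trial = np.where(rng.random(dim) < CR, a + F * (b - c), pop[i])
            f_trial = fitness_fn(trial, X, y)
            if f_trial > scores[i]:                # maximize the F1-score
                pop[i], scores[i] = trial, f_trial
    return pop[np.argmax(scores)]                  # best individual found
```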
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Convolutional Decision Trees (CDTs) are machine learning models utilized as interpretable methods for image segmentation. Their graphical structure enables a relatively simple interpretation of how the tree successively divides the image pixels into two classes, distinguishing between objects of interest and the image background. Several techniques have been proposed to induce CDTs. However, they have primarily been focused on analyzing grayscale images due to the computational cost of the Differential Evolution (DE) algorithm, which is employed in these techniques. This paper proposes a generalization of the induction process of a CDT with the DE algorithm using color images, implementing two techniques to reduce the computational time and memory employed in the induction process: the median selection technique and a memory of previously evaluated solutions. The first technique is applied to select a representative sample of pixels from an image for the model’s training process, and the second technique is implemented to reduce the number of evaluations in the fitness function considered in the DE process. The efficacy of these techniques was evaluated using the Weizmann Horse and DRIVE datasets, resulting in favorable outcomes in terms of the segmentation performance of the induced CDTs, and the processing time and memory required for the induction process.