1. Introduction
Even when we receive information under the same conditions, our point of view may greatly differ from others’. Therefore, if we want to analyze expert knowledge, such differences should be considered. Figure 1 shows a representation of this problem. Our point of view on certain information depends on our cognitive skills and external factors that might change our beliefs.
Expert knowledge elicitation (EKE) has the goal of producing, via elicitation, a probability distribution that represents the expert’s knowledge around a parameter of interest. For that purpose, we can adopt the Delphi method as an elicitation method. The latter is defined by Brown (1968) [1] as a technique based on the results of multiple rounds of questionnaires sent to a panel of experts, whose purpose is to reach a consensus on their opinion. Such a method is effective, as it allows a group of individuals to address a complex problem, and it could be implemented to obtain a single representation of experts’ beliefs through a probability distribution. However, this method proves difficult when the number of experts in the study increases considerably.
Finding the mean of each expert’s level of certainty using their personal distributions is another way to obtain a prior distribution of expert knowledge. Nevertheless, it could be erroneous, as shown in Figure 2, where, for instance, we observe two hypothetical experts’ prior distributions (red and black curves) representing their level of knowledge about a proportion, and the mean of the two experts’ levels (green curve), which does not represent the actual level of certainty of either expert. As a result, the entire complex elicitation work done for each expert is wasted. Hence, we believe that the probability distributions of the experts can be classified and their opinions represented using clusters of beliefs. Thus, Bayesian inference can be carried out in parallel by considering each cluster of priors, and a decision can be arrived at via experts’ criteria (Barrera-Causil et al., 2019) [2].
On the other hand, classifying probability distributions is an essential task in different areas. Clustering methods to classify word distribution histograms in information retrieval systems have been implemented successfully [3]. For instance, Henderson et al. (2015) [4] present three illustrations with airline route data, IP traffic data, and synthetic data sets to classify distributions. Therefore, functional data analysis (FDA) could be used for clustering distributions, since it is an extension of multivariate methods where observations are represented by curves in a function space [5]. Some tools of multivariate analysis have been extended to the functional context pointwise, considering the implementation of multivariate procedures over the real interval where these functions are defined. Thus, in many cases, the curves are discretized to implement statistical procedures.
Cluster analysis is one of multiple techniques that have been extended to FDA, and different methods are implemented to obtain partitions of curves within its framework. Some of those methods have been compared to determine their performance and make recommendations in several situations [6]. For instance, Abraham et al. [7] proposed a two-stage clustering procedure in which each observation is approximated by a B-spline in the first stage, and the functions are grouped using the k-means algorithm in the second stage. Gareth and Sugar (2003) [8] presented a model-based approach for clustering functional data. Their method was effective when the observations were sparse, irregularly spaced, or occurred at different time points for each subject. Serban and Wasserman (2005) [9] proposed a technique for nonparametrically estimating and clustering a large number of curves. In their method, the nearly flat curves are removed from the analysis, while the remaining curves are smoothed and finally grouped into clusters.
Other alternatives can also be found in the literature. For instance, Shubhankar and Bani (2006) [10] proposed a nonparametric Bayes wavelet model for clustering curves based on a mixture of Dirichlet processes. Song et al. (2007) [11] presented an FDA-based method to cluster time-dependent gene expression profiles. Chiou et al. (2007) [12] developed a Functional Clustering (FC) method (i.e., k-centres FC) for longitudinal data. Their approach accounts for both the means and the modes of variation differentials between clusters by predicting cluster membership with a reclassification step. Tarpey (2007) [13] applied the k-means algorithm for clustering curves under linear transformations of their regression coefficients. More recently, Goia et al. (2010) [14] used a functional clustering procedure to classify curves representing the maximum daily demand for heating measurements in a district heating system. Hébrail et al. (2010) [15] proposed an exploratory analysis algorithm for functional data. Their method involves finding k clusters in a set of functions and representing each cluster with a piecewise constant function, seeking simplicity in the construction of the clusters. Boullé (2012) [16] presented a novel method to analyze and summarize a collection of curves based on a piecewise constant density estimation in which the curves are partitioned into clusters. Furthermore, Secchi et al. (2012) [17] focused on the problem of clustering functional data indexed by the sites of a spatial finite lattice. Jacques and Preda (2013) [18] presented a model-based clustering algorithm for multivariate functional data based on multivariate functional principal components analysis. The references in Jacques and Preda (2013) [19] are of particular importance because they summarize the main contributions in the field of functional data clustering. Other clustering algorithms have been reported in the literature.
For instance, Ferreira and Hitchcock (2009) [6] compared four hierarchical clustering algorithms on functional data: single linkage, complete linkage, average linkage, and Ward’s method (these methods are implemented in R).
Although there is research relating to the clustering of functions, no study has considered functional clustering of experts’ beliefs. For example, Stefan et al. (2021) [20] studied the effect of interpersonal variation in elicited prior distributions on Bayesian inference. In their study, one of the six experts exhibited discrepant distributions. Thus, it would be ideal to have a method able to numerically address discrepancies among clusters of elicited prior distributions. Another important situation is when a researcher needs to make a decision based on information obtained from elicited priors. In all these cases, and according to the problem shown in Figure 2, differences between priors should be addressed and the estimation, either posterior or prior, must be done in parallel for each group of elicited priors. Thus, in this paper we propose a new method to deal with multiple elicited prior distributions. The method allows clustering distributions using FDA and the Hellinger distance (Simpson, 1987) [21]. The Hellinger distance makes it possible to quantify the similarity between two probability distributions and, we believe, it is a more appropriate metric for this task than the metrics currently used for functional data. An illustration of the place of the proposed method within the expert knowledge elicitation workflow is shown in Figure 3.
This proposal is motivated by our interest in offering a new tool for the analysis of prior curves from multiple experts when elicitation is used. In addition to offering an alternative to the problem posed in Figure 2, or to the complexity involved in applying the Delphi method with a considerable number of experts, this proposal can be implemented to detect atypical curves, or even to create clusters in fuzzy multicriteria decision-making problems (Kahraman, Onar, and Oztaysi, 2015) [22].
To test the efficiency of our method, we propose a hierarchical clustering technique for functional data and compare it, via statistical simulation, with functional k-means, Ward’s, and average linkage methods (these methods are implemented in R).
The paper is structured as follows: Section 2 introduces some theoretical approaches and details the proposed method for clustering density functions and/or functional data. Section 3 describes the simulation study and its results. Section 4 shows an example using real data sets to illustrate the use of the proposed method. Section 5 presents the main contributions of this paper and suggests topics for further research. Finally, Section 6 contains the supplementary material and three additional illustrations using different data sets (see Appendix A).
2. Definitions
2.1. Elicitation of Probability Distributions
An elicitation process involves extracting information about parameters of interest from the subjective experience of a person (expert) and expressing it as a probability distribution (prior distribution) (Barrera-Causil et al., 2019) [2]. Thus, for expert i, a real function (prior distribution) is elicited, and the heights of this function are calculated on a grid of m equally spaced points throughout the support of the distribution. This process is repeated for the n experts in order to discretize their functions.
2.2. Functional Data Analysis
In functional data, the i-th observation is a real function defined on a real interval whose endpoints are, respectively, the minimum and maximum of the support of the i-th expert’s distribution. Thus, each observed function is a point in a certain function space (Ramsay et al., 2009) [29]. For analysis purposes, we assume that the functional data have an inner product, or Hilbert space, structure; that is, they belong to a vector space of functions defined on a real interval that is complete and equipped with an inner product.
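As a concrete illustration of the inner product structure above, the following minimal Python sketch (with a hypothetical pair of discretized functions) approximates the L2 inner product of two curves by the trapezoidal rule on the discretization grid:

```python
import numpy as np

# Hypothetical discretized functions on a common grid over I = [0, 1].
t = np.linspace(0.0, 1.0, 201)   # grid t_1, ..., t_m
f = np.sin(np.pi * t)            # f(t)
g = np.ones_like(t)              # g(t) = 1

# L2 inner product <f, g> = integral of f(t) g(t) over I,
# approximated by the trapezoidal rule on the grid.
inner = np.trapz(f * g, t)

# Exact value: integral of sin(pi t) over [0, 1] is 2 / pi.
print(round(inner, 4))  # → 0.6366
```

This is the pointwise extension mentioned above: multivariate operations are carried out on the discretized curves, with the grid spacing accounting for the integral.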
Functional Clustering
There are different methods for clustering functional data, such as k-means, agnes with the agglomerative average method, and Ward’s method, among others (Ferreira and Hitchcock, 2009) [6]. However, k-means is used most frequently.
Table 1 describes three different methods implemented in this paper.
2.3. Proposed Method
Our proposed method for clustering density functions, based on the Hellinger distance, works as follows:
1. For each f_i, i = 1, …, n, discretize the curve to obtain its heights at a grid of m equidistant points t_1, …, t_m throughout the support of the function. (In this paper, we use 200 and 300 equally spaced points; the number of points depends on the available computational capacity. To apply this methodology, we recommend selecting equidistant points throughout the support of the distributions of interest in order to capture the shape of the curves. We found that increasing the number of points does not have a noticeable effect on the outcome of the algorithm.)
2. Compute the Hellinger distance for all possible pairs of these functions. For curves f_i and f_j, the distance is as follows:
H(f_i, f_j) = [ (1/2) ∫_I ( √f_i(t) − √f_j(t) )² dt ]^{1/2}, (1)
which is approximated over the discretization grid by
H(f_i, f_j) ≈ [ (1/2) Σ_{k=1}^{m} ( √f_i(t_k) − √f_j(t_k) )² Δt ]^{1/2}, (2)
where Δt is the spacing between consecutive grid points and f_i(t_k) and f_j(t_k) are the discretized heights.
3. Build a matrix D of distances between all curves using the proposed metric.
4. Use the hierarchical clustering function hclust in R with D as the distance matrix.
5. Obtain the corresponding dendrogram.
6. Specify the number of clusters and identify the members in each cluster.
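The six steps above can be sketched as follows (a Python illustration with hypothetical elicited priors; the paper itself uses hclust in R, and here SciPy’s linkage plays that role):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from scipy.stats import norm

# Step 1: discretize each density on a common grid of m points.
t = np.linspace(-10, 10, 300)
dt = t[1] - t[0]

# Six hypothetical "experts": two groups of Normal priors with
# slightly perturbed parameters.
params = [(0, 1), (0.3, 1.1), (-0.2, 0.9), (5, 1), (5.2, 1.2), (4.8, 0.95)]
F = np.array([norm.pdf(t, mu, sd) for mu, sd in params])

def hellinger(f, g, dt):
    # Step 2: H(f, g) = sqrt(0.5 * integral of (sqrt f - sqrt g)^2),
    # approximated by a Riemann sum on the grid.
    return np.sqrt(0.5 * np.sum((np.sqrt(f) - np.sqrt(g)) ** 2) * dt)

# Step 3: build the matrix of pairwise distances.
n = len(F)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = hellinger(F[i], F[j], dt)

# Steps 4-6: hierarchical clustering on the distance matrix and
# extraction of a fixed number of clusters.
Z = linkage(squareform(D), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # experts 1-3 and 4-6 fall in separate clusters
```

The dendrogram of step 5 can be drawn from `Z` with `scipy.cluster.hierarchy.dendrogram` if a visual inspection is needed.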
To analyze the performance of our proposed method, we compared it with other algorithms available in the literature under different schemes. Such algorithms included the three methods described in Table 1.
3. Simulation Study
For simulation purposes, density functions were used as functional data. The theoretical counterpart of these densities was estimated using kernel functions in R.
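As a sketch of this estimation step (with hypothetical simulated data; `scipy.stats.gaussian_kde` stands in here for the kernel routines used in R):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Recover a smooth density curve from simulated data with a Gaussian
# kernel estimator, then discretize it on a grid, as done for the
# functional data in the simulation study.
rng = np.random.default_rng(0)
sample = rng.normal(loc=3.0, scale=1.0, size=500)

kde = gaussian_kde(sample)      # kernel density estimate
t = np.linspace(-1, 7, 200)     # grid over the support
f_hat = kde(t)                  # discretized curve f(t_1), ..., f(t_m)

# Over a wide enough grid, the estimate integrates to approximately 1.
print(round(np.trapz(f_hat, t), 2))
```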
Overlapping distributions. Let X and Y be random variables with density functions f and g, respectively, both sharing the same support, and define z_γ(Z) as the γ-th percentile of a random variable Z. Then, we can say that f and g overlap if one of the following conditions is satisfied:
(a) z_γ(X) ≤ z_γ(Y) ∧ z_{1−γ}(X) ≥ z_γ(Y);
(b) z_γ(Y) ≤ z_γ(X) ∧ z_{1−γ}(Y) ≥ z_γ(X).
Overlapping clusters of distributions. A cluster is δ-overlapped with another one if at least a proportion δ of its distributions overlap with distributions of the other cluster. If two clusters of distributions are not overlapped, they are said to be separated.
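One way to operationalize the percentile-based overlap check above, assuming the interval-intersection reading of conditions (a) and (b) and using samples from the two distributions, is:

```python
import numpy as np

def central_interval(sample, gamma):
    # (gamma, 1 - gamma) percentile interval of a sample.
    return (np.percentile(sample, 100 * gamma),
            np.percentile(sample, 100 * (1 - gamma)))

def overlapped(x, y, gamma=0.05):
    # f and g are taken to overlap when their central percentile
    # intervals intersect (one reading of conditions (a) and (b)).
    lx, ux = central_interval(x, gamma)
    ly, uy = central_interval(y, gamma)
    return bool(lx <= uy and ly <= ux)

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 5000)
y_far = rng.normal(10, 1, 5000)   # well separated from x
y_near = rng.normal(1, 1, 5000)   # overlaps x

print(overlapped(x, y_far), overlapped(x, y_near))  # → False True
```

The δ-overlap of two clusters then follows by counting, for each curve in one cluster, whether it overlaps any curve in the other.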
Initially, the generated clusters contained a finite number of curves following a Normal distribution with pre-specified means and variances for each real group, but with a random perturbation of the parameters within clusters. Three clusters of curves were considered. All of the cases above were simulated considering two and three clusters of curves.
We also considered asymmetrical features to compare the behavior of the four methods under different scenarios and thus modified the previously described simulation process (clusters of Normal distributions) using Gamma and Beta distributions within clusters. A further major consideration in all cases was the inclusion of an atypical curve in each simulation scenario to evaluate the performance of the methods in the presence of atypical curves. To study the effectiveness of the methods under evaluation, we considered separated and overlapped clusters of distributions with a known overlapping rate. The average overlapping rate in each scenario is presented in Table 2.
In all the simulation scenarios, the existence of clusters of curves was ensured by testing the equality of their means. Figure 4 shows one of the different simulation scenarios used to compare the four methods. These scenarios were considered for two and three clusters, with an equal number of curves per cluster. In each simulation scenario, a total of 1000 replicates were run. To validate the clusters found at each replicate, we used validation routines implemented in R.
Given a set S of n elements and two partitions of S to be compared, X, a partition of S into r subsets, and Y, a partition of S into s subsets, define the following:
a: the number of pairs of elements in S that are in the same set in X and in the same set in Y.
b: the number of pairs of elements in S where both elements belong to different clusters in both partitions.
c: the number of pairs of elements in S where both elements belong to the same cluster in partition X but not in partition Y.
d: the number of pairs of elements in S that are in different sets in X and in the same set in Y.
So, a + b + c + d = n(n − 1)/2, the total number of pairs of elements in S.
The Rand index (R), the Jaccard coefficient (J), the Fowlkes–Mallows index (F.M), and the correct classification rate (CCR) are thus calculated as follows:
R = (a + b)/(a + b + c + d), J = a/(a + c + d), F.M = a/√((a + c)(a + d)),
and the CCR is the proportion of elements assigned to their correct cluster.
All these validation measures have a value between 0 and 1, where 0 indicates that no pair of points is shared by the two data clusters, and 1 means that the data clusters are exactly the same.
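The pair counts and indices above can be computed directly; for example (with hypothetical partitions X and Y of six curves):

```python
import numpy as np
from itertools import combinations

def pair_counts(x, y):
    # a: pairs together in both partitions; b: apart in both;
    # c: together in X only; d: together in Y only.
    a = b = c = d = 0
    for i, j in combinations(range(len(x)), 2):
        same_x, same_y = x[i] == x[j], y[i] == y[j]
        if same_x and same_y:
            a += 1
        elif not same_x and not same_y:
            b += 1
        elif same_x:
            c += 1
        else:
            d += 1
    return a, b, c, d

# Hypothetical partitions of n = 6 curves.
X = [1, 1, 1, 2, 2, 2]
Y = [1, 1, 2, 2, 2, 2]

a, b, c, d = pair_counts(X, Y)
rand    = (a + b) / (a + b + c + d)
jaccard = a / (a + c + d)
fm      = a / np.sqrt((a + c) * (a + d))
print(a, b, c, d, round(rand, 3))  # → 4 6 2 3 0.667
```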
Results
The main results of this simulation study are shown in Figure 5 and Figure 6 (R codes and data associated with this article can be found at
The assertion above is justified from two perspectives. First, when clusters are separated or overlapped or an atypical curve is present in simulated data from Normal or Gamma populations, there is evidence that our method performs better than the other ones. Second, when two clusters from a Gamma population are generated, the
Although the
Figure 6 illustrates the performance measures of the four methods compared in this study to obtain optimal partitions. Overall, the other methods studied here exhibit poor performance compared to our proposal. This result seems to be a promising topic for further exploration, especially for clusters generated from a Beta distribution, because this is a very flexible distribution that can take different shapes. Moreover, the closed support of the Beta distribution concentrates the curves, making the clusters more overlapped and the true differences between them more challenging to detect. Thus, our proposed method exhibits the best performance among the options considered here.
4. Illustration
This section shows the performance of our proposed method when faced with a real data set.
Computer’s Lifetime Elicitation
The purpose of this application is to segment the experts’ prior distributions regarding a desktop computer’s average operation time in months, from its purchase until its first failure. Those prior distributions were obtained in an elicitation process described in Barrera-Causil and Correa (2008) [32]. They carried out this elicitation process through an interview-survey administered to six experts; however, their study took into account only the person with the most exceptional expertise. In this study, we work with all the prior distributions, considering the possible existence of clusters.
Figure 7 shows the elicited prior distributions (plotted via the free-hand method) of the six experts, which resulted from the interview-survey conducted by Barrera-Causil and Correa (2008) [32]. We can observe that most of the distributions have a similar shape, but one of them (in green) is very different from the others. Table 3 shows the experts’ years of experience in the ICT area (expertise) and years in their current job. Expert 3 exhibits the highest level of expertise.
Applying our proposed clustering method, Figure 8 presents the cluster dendrogram of the six experts, where we can observe that the prior belief of Expert 3 is far from that of the other participants. Furthermore, note that, in Table 3, this expert exhibits the highest level of expertise. For that reason, we decided to analyze the prior distributions using two clusters: one comprising Experts 1, 2, 4, 5 and 6; and another one containing Expert 3 alone.
In this situation, the following questions arise: Is the average of all distributions the best representation of prior beliefs? Is it possible to reach a consensus among these experts? Regarding the last question, in some cases a consensus can be quickly reached, but in many others this process can be tedious or such agreement cannot be reached.
As for the first question, when priors are very different, the average of these distributions can result in an almost uniform distribution that spoils the complex process of elicitation. Thus, we recommend creating homogeneous groups of these prior distributions and analyzing them simultaneously. To analyze each group, we calculate the functional mean and approximate the average curve by a Gamma distribution in each cluster as follows:
1. The functional mean is obtained in each cluster.
2. 1000 observations are generated from the distribution that is proportional to the functional mean in each cluster.
3. This distribution is approximated by a Gamma distribution by applying the fitdistr function in R to the generated samples.
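These three steps can be sketched as follows (with a hypothetical cluster of Normal priors; `scipy.stats.gamma.fit` plays the role of R’s fitdistr here):

```python
import numpy as np
from scipy.stats import gamma, norm

# Hypothetical cluster of discretized prior densities over months.
t = np.linspace(0.01, 120, 300)
cluster = np.array([norm.pdf(t, 40, 10),
                    norm.pdf(t, 45, 12),
                    norm.pdf(t, 42, 9)])

# Step 1: functional mean of the cluster.
f_bar = cluster.mean(axis=0)

# Step 2: draw 1000 observations from the density proportional to f_bar.
p = f_bar / f_bar.sum()                 # normalize over the grid
rng = np.random.default_rng(7)
samples = rng.choice(t, size=1000, p=p)

# Step 3: fit a Gamma distribution to the generated samples.
shape, loc, scale = gamma.fit(samples, floc=0)
print(round(shape * scale))             # fitted mean, ~42 months here
```

The fitted shape and scale then summarize the cluster’s prior as a single Gamma distribution.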
Accordingly, for the cluster formed by Experts 1, 2, 4, 5, and 6, the fitted prior distribution is , with standard errors of the estimated parameters of 0.39011009 and 0.02798532, respectively. For Expert 3, the fitted distribution is , with standard errors of the estimated parameters of 2.04588737 and 0.06345412, respectively. (The Gamma distribution was chosen because it is typically used to model lifetime data (Shanker et al., 2016) [33]. A method based on goodness-of-fit metrics (e.g., the AIC) via a GAMLSS approach (see Rigby and Stasinopoulos, 2005 [34]) could have suggested distributions other than the Gamma as better fits. Thus, the selection of the distribution has a direct effect on estimates of location, scale, and shape and, in turn, on any subsequent inferential analysis.) Figure 9 shows the prior distributions, by cluster, of a computer’s average operation time in months (from its acquisition until its first failure), considering the functional mean in each group.
It is clear that there are two clusters of experts, with their distributions exhibiting differences in location and scale. As mentioned above, combining both distributions is indeed possible (e.g., by averaging); however, doing so does not reflect the level of expertise in each cluster (see Figure 2). Thus, each cluster should be analysed separately. Next, we illustrate a parallel analysis for each cluster. Table 4 shows the lifetimes of the desktop computers assessed by the experts in the study by Barrera-Causil and Correa (2008) [32]. By combining the elicited Gamma prior with an Exponential model for the failure times, it is possible to estimate 95% probability intervals for the average time to the first occurrence of a physical failure in a desktop computer (in months). Thus,
π(λ | x) ∝ L(x | λ) π(λ),
where x are the observed data and λ is the parameter of interest. For cluster 1, as the average operation time in months is θ = 1/λ and the elicited prior on θ is a Gamma distribution, the rate λ follows an inverted Gamma distribution. Then, the prior of λ can be expressed as
π(λ) = (β^α / Γ(α)) λ^{−(α+1)} exp(−β/λ), λ > 0.
Given that the likelihood of the Exponential distribution is
L(x | λ) = λ^n exp(−λ Σ_{i=1}^{n} x_i),
the posterior distribution is therefore
π(λ | x) ∝ λ^{n−α−1} exp(−λ Σ_{i=1}^{n} x_i − β/λ),
where n is the number of observed failure times. We know that, for future observations y, the predictive distribution is given by
p(y | x) = ∫_0^∞ λ exp(−λy) π(λ | x) dλ.
In order to obtain the predictive Bayesian PDF, 1000 samples from the posterior distribution were obtained. Thus, the Bayesian predictive density for cluster 1 can be expressed as
p(y | x) ≈ (1/1000) Σ_{j=1}^{1000} λ_j exp(−λ_j y),
where λ_1, …, λ_{1000} are the posterior samples. A 95% probability interval of the average time to the first failure of a desktop computer (in months) in cluster 1 can be calculated by plugging the 1000 samples from the posterior distribution into the previous equation. The interval in cluster 1 is therefore . By performing the same procedure for cluster 2, the resulting 95% probability interval is . These probability intervals can then be used by a panel of experts to make case-related decisions.
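Posterior samples of the rate and a 95% probability interval for the mean lifetime can be obtained along these lines. The Python sketch below uses the observed failure times from Table 4 but illustrative prior parameters (the fitted values from the elicitation are not reproduced here), ignores the censored observations for simplicity, and samples the non-conjugate posterior on a grid:

```python
import numpy as np

# Illustrative Gamma prior on the mean lifetime theta = 1/lambda
# (hypothetical values, not the fitted parameters from the paper).
alpha, beta = 8.0, 0.2
# Observed first-failure times in months (Table 4); the 55 censored
# machines are omitted in this simplified sketch.
x = np.array([14.07, 17.80, 19.43, 21.33, 24.60, 28.97, 29.63, 33.73,
              37.60, 37.67, 40.87, 52.40, 53.97, 60.57, 64.27, 65.43, 65.43])
n, s = len(x), x.sum()

# Unnormalized posterior of the rate on a grid:
# lambda^(n - alpha - 1) * exp(-lambda * s - beta / lambda)
lam = np.linspace(1e-4, 0.5, 20000)
log_post = (n - alpha - 1) * np.log(lam) - lam * s - beta / lam
w = np.exp(log_post - log_post.max())
w /= w.sum()

rng = np.random.default_rng(3)
draws = rng.choice(lam, size=1000, p=w)   # 1000 posterior samples of lambda

theta = 1.0 / draws                       # posterior of the mean lifetime
lo, hi = np.percentile(theta, [2.5, 97.5])
print(round(lo), round(hi))               # a 95% probability interval
```

Plugging the same 1000 draws into the predictive density above likewise approximates p(y | x) for a future failure time y.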
5. Discussion
In this paper, we proposed a simple method to segment expert knowledge that can be effectively applied to functional data or to problems with large volumes of data (data mining). To implement this method, we built, after a discretization process, a matrix of distances between curves using the Hellinger distance.
A simulation study considering different scenarios was presented. In that study, our proposed method performed better in almost all scenarios than the k-means and agglomerative nesting clustering algorithms for functional data (as implemented in R).
Along the same lines of this proposal, further research topics in this field include sensitivity analyses when different initial values are considered. Another interesting area, based on the results of the analyses above, is developing a robust clustering method to generate data partitions, namely a clustering method that performs well in different conditions. Additionally, other distributions (e.g., Poisson), samples, and cluster sizes should be considered in future simulation studies to further assess the performance of our method.
Finally, we believe our proposed method has implications for two specific areas: the combination of expert knowledge and machine learning (ML), and Bayesian inference. Supervised (e.g., classification) and unsupervised (e.g., clustering) ML methods have gained momentum in current research. For example, it has been shown that ML-based classification ensembles perform better than experts in segregating viable from non-viable embryos [35] and that (evolutionary) clustering algorithms enable characterising COVID-19-related textual tweets in order to assist institutions in making decisions [36]. In general, researchers pit ML-based analyses against human expert knowledge because they see no value in the latter or because they do not know how to integrate it into the analysis pipeline. Evidence, however, suggests that blending expert knowledge with ML analyses leads to better predictions [37,38,39,40]. We believe our proposal can contribute to this rather under-investigated area.
A challenge in Bayesian inference is to determine the effect of different prior distributions on a parameter to be estimated. In this regard, Stefan et al. (2021) [20] found that the variability in prior distributions from six experts did not affect the qualitative conclusions associated with the estimated Bayes factors. Their results thus suggest that although there can be quantitative differences between the elicited prior distributions, the overall qualification associated with those quantities is not altered. We believe, though, that if the elicited distributions show considerable fluctuation in terms of location, scale, and shape, those distributions need to be subjected to clustering in order to segregate levels of expertise (see Figure 2, Figure 7 and Figure 9). The method proposed herein can serve such a purpose.
6. Conclusions
Although the proposed clustering method is designed for data obtained from experts’ belief curves, we demonstrated that it can also be applied to data sets that do not correspond to distributions or curves (we show this in three supplementary examples). In conclusion, the proposed method offers encouraging applications with other types of data sets encountered in data mining problems.
Author Contributions
Conceptualisation and methodology, C.B.-C., J.C.C. and F.T.-A.; validation and formal analysis, C.B.-C., F.M.-R.; investigation, C.B.-C.; resources, C.B.-C., F.M.-R.; writing—original draft preparation, C.B.-C., F.M.-R. and A.Z.; writing—review and editing, C.B.-C., F.M.-R. and A.Z.; visualisation, C.B.-C.; supervision, J.C.C.; project administration, C.B.-C.; funding acquisition, C.B.-C. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement
The data presented in this study are openly available in
Acknowledgments
C.B.-C. thanks Instituto Tecnológico Metropolitano -ITM- for their support. The authors dedicate this work to the memory of prof. Francisco Torres-Avilés. Figure 1 was designed by Daniela Álvarez Lleras (
Conflicts of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
CCR | Correct classification rate |
FC | Functional clustering |
FDA | Functional data analysis |
F.M | Fowlkes–Mallows |
ICT | Information and communications technology |
J | Jaccard coefficient |
R | Rand index |
Appendix A
This section presents the performance of the proposed method in different scientific fields. Note that this proposal was focused on elicited prior distributions, but it could be applied using other types of data sets.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figures and Tables
Figure 1. Illustration of two experts having different points of view in regards to a specific phenomenon.
Figure 2. Level of certainty of two hypothetical experts (red and black curves) and their mean (green curve). Note that, when the location parameters of the experts’ distributions are different, those of the resulting mean distribution (green curve) may be considerably different from the experts’ actual opinion.
Figure 3. Illustration of the expert knowledge elicitation workflow. E.K.E.(me) = expert knowledge elicitation from multiple experts; P.D.g = generation of prior distributions; P.D.c = clustering of prior distributions; C1,C2…Cn = cluster 1 to cluster n; P.iC1,P.iC2…P.iCn = prior or posterior inference for each cluster; D.M.(e) = decision making with a panel of experts. The long dash arrow from D.M.(e) to E.K.E.(me) indicates that workflow can be iterated if required. The goal of the workflow is to create clusters of experts and analyze the clusters separately by means of prior or posterior estimations. A final decision is reached via a panel of experts. Step IV is unique to our proposal in that it shows where the clustering of elicited prior distributions takes place in the EKE workflow.
Figure 4. Illustration of some assessed simulation schemes where three clusters of n=30 curves with a (a) separated, (b) overlapped, and (c) atypical structure are generated. Clusters of Normal(μ,σ2) and Gamma(α,β) distributions are presented in the first and second columns, respectively. In (c), atypical curves are shown in blue.
Figure 5. Performance indexes when two and three clusters of overlapped and separated (A) Normal(μ,σ2), (B) Gamma(α,β), and (C) Beta(α,β) distributions are generated.
Figure 6. Performance indexes when two and three clusters of (A) Normal(μ,σ2), (B) Gamma(α,β), and (C) Beta(α,β) distributions with one atypical curve are generated.
Figure 9. Prior distributions by cluster of experts. Expert 3 is represented by the dotted PDF and the remaining experts are represented by the solid PDF.
Methods for clustering functional data.

Method | Description |
---|---|
Functional k-means (kmeans.fd) | An extension of the traditional k-means clustering algorithm to functional data analysis. This method uses a special metric for functional data. |
Agglomerative hierarchical clustering, agnes (average method) | It computes agglomerative hierarchical clustering of the data set using the average method, where the distance between two clusters is the average of the dissimilarities between the points in one cluster and the points in the other cluster. A complete description of agglomerative nesting (agnes) can be found in Chapter 5 of Kaufman and Rousseeuw (1990) [30]. |
Agglomerative hierarchical clustering, agnes (Ward’s method) | It computes agglomerative hierarchical clustering of the data set using Ward’s method, where the agglomerative criterion is based on the optimal value of an objective function, which is usually the sum of squared errors. |
Average overlapping rate in all simulation scenarios.

Scenarios | 2 Clusters | 3 Clusters |
---|---|---|
Overlapped Normal curves | 0.953 | 0.961 |
Overlapped Gamma curves | 0.974 | 0.969 |
Overlapped Beta curves | 0.955 | 0.959 |
Separated Normal curves | 0.052 | 0.066 |
Separated Gamma curves | 0.078 | 0.071 |
Separated Beta curves | 0.064 | 0.067 |
Experts’ years of experience in the ICT area and years in their current job. Note, for example, that while E3 and E4 have the same number of years of expertise, E3 has more experience working with the computers assessed in this study.
E1 | E2 | E3 | E4 | E5 | E6 | |
---|---|---|---|---|---|---|
Expertise | 2.5 | 5 | 17 | 17 | 4 | 7 |
Current Job | 0.5 | 1.3 | 4 | 2 | 1 | 2.5 |
First failure times (in months) of 72 desktops. Out of the 72 desktop computers, 17 were reported to have failed at the time of the study. The remaining 55 desktops had not failed by the time the study was carried out (right-censored observations).

14.07 | 17.80 | 19.43 | 21.33 | 24.60 | 28.97 | 29.63 | 33.73 | 37.60 | 37.67 | 40.87 | 52.40 |
53.97 | 60.57 | 64.27 | 65.43 | 65.43 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Expert knowledge elicitation (EKE) aims at obtaining individual representations of experts’ beliefs and rendering them in the form of probability distributions or functions. In many cases the elicited distributions differ, and the challenge in Bayesian inference is then to find ways to reconcile discrepant elicited prior distributions. This paper proposes the parallel analysis of clusters of prior distributions through a hierarchical method for clustering distributions that can be readily extended to functional data. The proposed method consists of (i) transforming the infinite-dimensional problem into a finite-dimensional one, (ii) using the Hellinger distance to compute the distances between curves, and thus (iii) obtaining a hierarchical clustering structure. In a simulation study the proposed method was compared to k-means and agglomerative nesting algorithms, and the results showed that the proposed method outperformed those algorithms. Finally, the proposed method is illustrated through an EKE experiment and other functional data sets.
1 Grupo de Investigación Davinci, Facultad de Ciencias Exactas y Aplicadas, Instituto Tecnológico Metropolitano -ITM-, Medellín 050034, Colombia;
2 Escuela de Estadística, Facultad de Ciencias, Universidad Nacional de Colombia sede Medellín, Medellín 050034, Colombia;
3 Centre for Change and Complexity in Learning, The University of South Australia, Adelaide 5000, Australia;
4 Departamento de Matemática y Ciencia de la Computación, Universidad de Santiago de Chile, Santiago 9170020, Chile