Psychometrika, vol. 82, no. 1, 158–185, March 2017. doi: 10.1007/s11336-016-9514-0
CLUSTER CORRESPONDENCE ANALYSIS
M. van de Velden
ERASMUS UNIVERSITY ROTTERDAM
A. Iodice D'Enza
UNIVERSITÀ DI CASSINO E DEL LAZIO MERIDIONALE
F. Palumbo
UNIVERSITÀ DEGLI STUDI DI NAPOLI FEDERICO II
A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
Key words: correspondence analysis, cluster analysis, dimension reduction, categorical data.
1. Introduction
Cluster analysis aims to find a meaningful allocation of observations to groups that are similar with respect to a set of observed variables. Depending on the kind of data, an appropriate similarity measure is selected and used to allocate observations to clusters of points with high similarity within a cluster and small similarity between the clusters. To interpret cluster analysis solutions, the distributions over the variables in the different clusters can be considered. When many variables are involved, computation of all dissimilarities may become cumbersome. Moreover, interpretation of the results in terms of (relative) distributions of the variables may not be straightforward. Dimension reduction and visualization techniques can be used to overcome computational issues and at the same time facilitate a more straightforward interpretation of the cluster solutions. In this paper, we concern ourselves with clustering of high-dimensional categorical data. Existing dimension reduction and cluster analysis methods are reviewed, and we propose a method that jointly yields optimally separated clusters and a low-dimensional approximation of the cluster by variable associations.
For continuous data, several...