PSYCHOMETRIKA, VOL. 73, NO. 4, 753–775
DECEMBER 2008
DOI: 10.1007/S11336-008-9065-0
REGULARIZED MULTIPLE-SET CANONICAL CORRELATION ANALYSIS
YOSHIO TAKANE AND HEUNGSUN HWANG
MCGILL UNIVERSITY
HERVÉ ABDI
UNIVERSITY OF TEXAS AT DALLAS
Multiple-set canonical correlation analysis (Generalized CANO or GCANO for short) is an important technique because it subsumes a number of interesting multivariate data analysis techniques as special cases. More recently, it has also been recognized as an important technique for integrating information from multiple sources. In this paper, we present a simple regularization technique for GCANO and demonstrate its usefulness. Regularization is deemed important as a way of supplementing insufficient data by prior knowledge, and/or of incorporating certain desirable properties in the estimates of parameters in the model. Implications of regularized GCANO for multiple correspondence analysis are also discussed. Examples are given to illustrate the use of the proposed technique.
Key words: information integration, prior information, ridge regression, generalized singular value decomposition (GSVD), G-fold cross validation, permutation tests, the Bootstrap method, multiple correspondence analysis (MCA).
1. Introduction
Multiple-set canonical correlation analysis (GCANO) subsumes a number of representative techniques of multivariate data analysis as special cases (e.g., Gifi, 1990). Perhaps for this reason, it has attracted the attention of many researchers (e.g., Gardner, Gower & le Roux, 2006; Takane & Oshima-Takane, 2002; van de Velden & Bijmolt, 2006; van der Burg, 1988). When the number of data sets K is equal to two, GCANO reduces to the usual (2-set) canonical correlation analysis (CANO), which in turn specializes into canonical discriminant analysis or MANOVA when one of the two sets of variables consists of indicator variables, and into correspondence analysis (CA) of two-way contingency tables when both sets consist of indicator variables. GCANO also specializes into multiple correspondence analysis (MCA) when all K data sets consist of indicator variables representing patterns of responses to multiple-choice items, and into principal component analysis (PCA) when each of the K data sets consists of a single continuous variable. Thus, introducing a useful modification to GCANO has far-reaching implications beyond what is normally referred to as GCANO.
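The PCA special case mentioned above can be checked numerically. The sketch below is not taken from the paper: it assumes one common formulation of (unregularized) GCANO, in which the object scores are the leading eigenvectors of the sum of the orthogonal projectors onto the column spaces of the K centered data sets. The function name `gcano_scores` and the simulated data are hypothetical and serve only as illustration.

```python
import numpy as np

def gcano_scores(blocks, n_components=2):
    """Illustrative GCANO sketch (an assumed formulation, not the paper's code).

    Object scores are taken as the leading eigenvectors of sum_k P_k, where
    P_k is the orthogonal projector onto the column space of the centered
    k-th data set.
    """
    n = blocks[0].shape[0]
    S = np.zeros((n, n))
    for Z in blocks:
        Z = Z - Z.mean(axis=0)                    # center each variable
        S += Z @ np.linalg.pinv(Z.T @ Z) @ Z.T    # projector onto span(Z)
    vals, vecs = np.linalg.eigh(S)                # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    return vals[order], vecs[:, order]

# Simulated data (hypothetical): 60 cases, 4 continuous variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
X[:, 1] += 0.8 * X[:, 0]                          # induce some correlation
Xs = (X - X.mean(axis=0)) / X.std(axis=0)         # standardized variables

# K = 4 data sets, each consisting of a single continuous variable.
vals, F = gcano_scores([Xs[:, [j]] for j in range(4)], n_components=2)

# PCA of the standardized data via SVD; columns of Xs / sqrt(n) have unit norm.
U, s, _ = np.linalg.svd(Xs / np.sqrt(len(Xs)), full_matrices=False)

# Absolute inner product of the first GCANO score vector and the first
# PCA score direction (eigenvectors are defined only up to sign).
alignment = abs(F[:, 0] @ U[:, 0])
```

Under this formulation, each single-variable projector is the outer product of a unit-norm column with itself, so the summed projector equals the cross-product matrix of the normalized data, and its eigenvectors coincide (up to sign) with the principal component score directions of the standardized variables.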
GCANO analyzes the relationships among K sets of variables. It can also be viewed as a method for information integration from K distinct sources (Takane & Oshima-Takane, 2002; see also Dahl & Næs, 2006,...





