Regularized Multiple-Set Canonical Correlation

PSYCHOMETRIKA, VOL. 73, NO. 4, 753–775

DECEMBER 2008

DOI: 10.1007/S11336-008-9065-0

REGULARIZED MULTIPLE-SET CANONICAL CORRELATION ANALYSIS

YOSHIO TAKANE AND HEUNGSUN HWANG

MCGILL UNIVERSITY

HERVÉ ABDI

UNIVERSITY OF TEXAS AT DALLAS

Multiple-set canonical correlation analysis (Generalized CANO or GCANO for short) is an important technique because it subsumes a number of interesting multivariate data analysis techniques as special cases. More recently, it has also been recognized as an important technique for integrating information from multiple sources. In this paper, we present a simple regularization technique for GCANO and demonstrate its usefulness. Regularization is deemed important as a way of supplementing insufficient data by prior knowledge, and/or of incorporating certain desirable properties in the estimates of parameters in the model. Implications of regularized GCANO for multiple correspondence analysis are also discussed. Examples are given to illustrate the use of the proposed technique.

Key words: information integration, prior information, ridge regression, generalized singular value decomposition (GSVD), G-fold cross validation, permutation tests, the Bootstrap method, multiple correspondence analysis (MCA).

1. Introduction

Multiple-set canonical correlation analysis (GCANO) subsumes a number of representative techniques of multivariate data analysis as special cases (e.g., Gifi, 1990). Perhaps for this reason, it has attracted the attention of many researchers (e.g., Gardner, Gower & le Roux, 2006; Takane & Oshima-Takane, 2002; van de Velden & Bijmolt, 2006; van der Burg, 1988). When the number of data sets K is equal to two, GCANO reduces to the usual (2-set) canonical correlation analysis (CANO), which in turn specializes into canonical discriminant analysis or MANOVA when one of the two sets of variables consists of indicator variables, and into correspondence analysis (CA) of two-way contingency tables when both sets consist of indicator variables. GCANO also specializes into multiple correspondence analysis (MCA) when all K data sets consist of indicator variables representing patterns of responses to multiple-choice items, and into principal component analysis (PCA) when each of the K data sets consists of a single continuous variable. Thus, introducing a useful modification to GCANO has far-reaching implications beyond what is normally referred to as GCANO.
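The special cases named above can be checked numerically. The following is a minimal sketch, assuming one common formulation of GCANO in which object scores are the leading eigenvectors of the sum of the projectors onto the column spaces of the K centered data sets. The function name `gcano_scores` and the plain ridge term `ridge * I` are illustrative assumptions; the regularized criterion actually proposed in the paper is not shown in this excerpt and may differ.

```python
import numpy as np

def gcano_scores(blocks, n_components=2, ridge=0.0):
    """Eigen-decompose the sum of (optionally ridge-type) projectors.

    blocks : list of (n x p_k) arrays, one per data set (hypothetical input).
    ridge  : assumed ridge parameter; 0.0 gives ordinary projectors.
    Returns the leading eigenvalues and the matching eigenvectors.
    """
    n = blocks[0].shape[0]
    total = np.zeros((n, n))
    for Z in blocks:
        Z = Z - Z.mean(axis=0)                       # column-center each set
        G = Z.T @ Z + ridge * np.eye(Z.shape[1])     # assumed ridge form
        total += Z @ np.linalg.pinv(G) @ Z.T         # (ridge) projector
    eigvals, eigvecs = np.linalg.eigh(total)         # total is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    Y = X[:, :2] + rng.normal(size=(50, 2))
    vals, _ = gcano_scores([X, Y], n_components=2)
    # With K = 2 and ridge = 0, the leading eigenvalues minus one
    # are the canonical correlations between X and Y.
    print(vals - 1.0)
```

With K = 2 and no ridge term, the leading eigenvalues equal one plus the canonical correlations, which matches the reduction to CANO. When each set is a single standardized variable, the sum of projectors is proportional to ZZ', so the leading eigenvectors coincide with normalized principal component scores.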

GCANO analyzes the relationships among K sets of variables. It can also be viewed as a method for information integration from K distinct sources (Takane & Oshima-Takane, 2002; see also Dahl & Næs, 2006,...
