Computational Statistics (2008) 23:99–109
DOI 10.1007/s00180-007-0071-y
ORIGINAL PAPER
Empirical model selection in generalized linear mixed effects models
Christian Lavergne · Marie-José Martinez · Catherine Trottier
Accepted: 15 September 2006 / Published online: 14 July 2007 © Springer-Verlag 2007
Abstract This paper focuses on model selection in generalized linear mixed models using an information criterion approach. In these models, the marginal distribution of the response cannot, in general, be derived analytically. Thus, for parameter estimation, two approximations are revisited, both leading to iterative model linearizations. We propose simple model selection criteria adapted from information criteria and based on the linearized model obtained at convergence of the algorithm. The quality of the derived criteria is evaluated through simulations.
Keywords Generalized linear models · Random effects · Model selection · Information criterion
1 Introduction
Model selection is a key step in the modeling process. A fundamental question, given a data set, is how to choose the best approximating model among a class of competing models with different numbers of parameters. This requires a suitable model selection criterion that takes parsimony into account. Akaike (1973, 1974) was probably one of the first to develop such a criterion for the identification of an optimal and parsimonious model in data analysis. This criterion, called AIC (Akaike's Information Criterion), is based on the log-likelihood and can be derived as an approximation of the Kullback–Leibler information (Kullback and Leibler 1951). Our aim in this paper
C. Lavergne · M.-J. Martinez (✉) · C. Trottier
Institut de Mathématiques et de Modélisation de Montpellier, UMR CNRS 5149, Equipe de Probabilités et Statistique, Université Montpellier II, Cc 051, Place Eugène Bataillon, 34095 Montpellier Cedex 5, France
e-mail: [email protected]
is to show how a model selection strategy can be set up using an information criterion approach in a generalized linear mixed model (GLMM) framework. In these models, distributional assumptions are made conditionally on the unobserved random effects. Evaluating the likelihood therefore requires integrating over these random effects, and this integral is in general not analytically tractable. Several kinds of approximations are therefore considered. For the estimation problem, a classical approach consists of a numerical approximation of the integral by Gaussian quadrature (Anderson and Aitkin 1985). Even with great computing capacities, this approach seems to be limited to relatively simple problems...
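To make the quadrature idea concrete, the following is a minimal sketch under assumptions that are not taken from the paper: a logistic model with a single Gaussian random intercept per cluster, u ~ N(0, sigma^2), whose marginal likelihood contribution is approximated by Gauss-Hermite quadrature. The function name `cluster_marginal_loglik` and its parameters are hypothetical and chosen for illustration only. This is not the linearization approach developed by the authors.

```python
import numpy as np


def cluster_marginal_loglik(y, eta_fixed, sigma, n_nodes=20):
    """Approximate log of the integral over u of prod_j p(y_j | u) * N(u; 0, sigma^2).

    Assumed (illustrative) model: y_j | u ~ Bernoulli(logistic(eta_fixed_j + u)).
    """
    y = np.asarray(y, dtype=float)
    eta_fixed = np.asarray(eta_fixed, dtype=float)

    # Nodes and weights for integrals of the form: integral of exp(-x^2) f(x) dx.
    x, w = np.polynomial.hermite.hermgauss(n_nodes)

    # Change of variable u = sqrt(2) * sigma * x maps the N(0, sigma^2)
    # density onto the exp(-x^2) weight, up to a factor 1 / sqrt(pi).
    u = np.sqrt(2.0) * sigma * x

    # Linear predictor at every node: shape (n_nodes, n_obs).
    eta = eta_fixed[None, :] + u[:, None]

    # Conditional Bernoulli log-likelihood at each node:
    # sum_j [ y_j * eta_j - log(1 + exp(eta_j)) ].
    cond_loglik = (y[None, :] * eta - np.logaddexp(0.0, eta)).sum(axis=1)

    # Weighted sum over nodes, computed on the log scale for stability.
    m = cond_loglik.max()
    return m + np.log(np.sum(w * np.exp(cond_loglik - m))) - 0.5 * np.log(np.pi)


# Illustrative call with made-up values for one cluster of four observations.
y_example = np.array([1.0, 0.0, 1.0, 1.0])
eta_example = np.array([0.2, -0.5, 0.8, 0.1])
print(cluster_marginal_loglik(y_example, eta_example, sigma=1.0))
```

With one random effect, a one-dimensional rule like this is cheap. With q correlated random effects the integral becomes q-dimensional and a product rule needs on the order of n_nodes^q evaluations per cluster, which is one way to see why quadrature becomes costly beyond relatively simple problems.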