© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Interaction between variables is often found in statistical models, and when the variables are numeric it is usually expressed in the model as an additional term. However, when the variables are categorical (also known as nominal or qualitative) or mixed numerical-categorical, defining, detecting, and measuring interactions is not a simple task. In this work, building on an entropy-based correlation measure for n nominal variables, named Multivariate Symmetrical Uncertainty (MSU), we propose a formal and broader definition of interaction among variables. Two series of experiments are presented. In the first series, we observe that datasets in which some record types or combinations of categories are absent, so that the remaining records form patterns, often display interactions among their attributes. In the second series, the interaction/non-interaction behavior of a regression model built entirely on continuous variables is successfully replicated on a discretized version of the dataset, showing an interaction-wise correspondence between the continuous and the discretized versions. Hence, we demonstrate that the proposed definition of interaction, enabled by the MSU, is a valuable tool for detecting and measuring interactions within linear and non-linear models.
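
As a rough illustration of the kind of measure the abstract describes, the sketch below computes an entropy-based correlation for n categorical variables. The abstract does not state the formula; the form used here, MSU = n/(n-1) * (1 - H(X1,...,Xn) / sum of H(Xi)), is an assumption about how MSU is defined, and the function names `entropy` and `msu` are hypothetical. The toy XOR dataset is also an assumption chosen for illustration: only 4 of the 8 possible record types are present, which matches the abstract's remark about absent combinations of categories.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy in bits of the empirical distribution of hashable items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def msu(records):
    """Assumed MSU form: n/(n-1) * (1 - H(X1..Xn) / sum_i H(Xi)).

    `records` is a list of equal-length tuples of categorical values.
    This is a sketch under the stated assumption, not the authors' code.
    """
    records = [tuple(r) for r in records]
    n = len(records[0])
    if n < 2:
        raise ValueError("MSU needs at least two variables")
    marginal_sum = sum(entropy([r[i] for r in records]) for i in range(n))
    if marginal_sum == 0:
        # All variables are constant; returning 0.0 is a convention chosen here.
        return 0.0
    joint = entropy(records)
    return n / (n - 1) * (1 - joint / marginal_sum)


# Toy dataset (assumption): third attribute is the XOR of the first two.
xor_records = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]

pairwise = msu([(a, b) for a, b, _ in xor_records])  # the two inputs alone
three_way = msu(xor_records)                         # all three attributes

print(f"pairwise MSU:  {pairwise:.3f}")
print(f"three-way MSU: {three_way:.3f}")
```

Under the assumed formula, the two inputs alone show no correlation, while the three attributes together give a value of about 0.5. That gap is an informal picture of a correlation that appears only when the variables are considered jointly; it is not the paper's formal definition of interaction.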

Details

Title
Measuring Interactions in Categorical Datasets Using Multivariate Symmetrical Uncertainty
Author
Gómez-Guerrero, Santiago 1; Ortiz, Inocencio 1; Sosa-Cabrera, Gustavo 1; García-Torres, Miguel 2; Schaerer, Christian E 1

1 Polytechnic School, National University of Asuncion, San Lorenzo 2111, Paraguay; [email protected] (I.O.); [email protected] (G.S.-C.); [email protected] (C.E.S.)
2 Data Science and Big Data Lab, Universidad Pablo de Olavide, ES-41013 Seville, Spain; [email protected]
First page
64
Publication year
2022
Publication date
2022
Publisher
MDPI AG
e-ISSN
10994300
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2621283710