Abstract

Background

Compositional data comprise the parts of a ‘whole’ (or ‘total’), which sum to that ‘whole’. The ‘whole’ may vary between units of analysis, or it may be fixed (constant). For example, total energy intake (a variable total) is the sum of intake from all foods or macronutrients. Total time in a day (a fixed total) is the sum of time spent engaging in various activities. There exist different approaches to analysing compositional data, such as the isocaloric or isotemporal model, ratio variables, and compositional data analysis (CoDA). Although the performance of the different approaches has been compared previously, these comparisons have used only real data. Because the true relationships are unknown in real data, it is difficult to judge how well each model estimates a known effect. We use simulated data with different parametric relationships to explore and demonstrate the performance of each approach under various possible conditions.
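To illustrate why simulation helps, the following minimal sketch generates time-use data with a fixed total and a known linear effect, then checks whether a leave-one-out linear model (in the spirit of the isotemporal model) recovers it. Everything specific here is an assumption made for illustration and is not taken from the study: the three activity categories, the Dirichlet parameters, the effect sizes, the noise level, and the variable names.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
TOTAL_MIN = 1440.0  # fixed total: minutes in a day

# Hypothetical three-part day (sleep, sedentary, activity); rows sum to 1440.
shares = rng.dirichlet([8.0, 10.0, 2.0], size=n)
sleep, sedentary, activity = (shares * TOTAL_MIN).T

# Assumed (made-up) linear effects per minute on fasting plasma glucose.
b_sleep, b_sedentary, b_activity = -0.002, 0.001, -0.004
glucose = (6.0 + b_sleep * sleep + b_sedentary * sedentary
           + b_activity * activity + rng.normal(0.0, 0.1, n))

# Leave-one-out model: sedentary time is omitted, so each coefficient is the
# effect of moving one minute from sedentary time into that component.
X = np.column_stack([np.ones(n), sleep, activity])
coef, *_ = np.linalg.lstsq(X, glucose, rcond=None)

true_1min = b_activity - b_sedentary   # known effect, by construction
est_1min = coef[2]                     # estimated 1-minute reallocation
est_10min = 10 * coef[2]               # a linear model scales proportionally

print(f"true 1-min effect: {true_1min:.4f}, estimated: {est_1min:.4f}")
print(f"estimated 10-min effect: {est_10min:.4f}")
```

Because the data-generating process is linear and the fitted model is linear, the estimate should sit close to the true value; this kind of check is not possible in real data, where the true effect is unknown.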

Methods

We simulated physical activity time-use and dietary data as examples of compositional data with fixed and variable totals, respectively, using different parametric relationships between the compositional components and the outcome (fasting plasma glucose): linear, log2, and isometric log-ratios. We evaluated the performance of a range of generalised linear and additive models, as well as CoDA, in estimating the effects of a 1-unit reallocation and of a larger reallocation (10 units for physical activity, 100 units for dietary data) under each parametric scenario. We simulated 10,000 datasets with 1,000 observations each.
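For readers unfamiliar with isometric log-ratios, the sketch below computes one common set of coordinates, so-called pivot coordinates, for a composition. The abstract does not state which log-ratio basis the authors used, so this choice, the function name `ilr_pivot`, and the example day are assumptions for illustration only.

```python
import numpy as np

def ilr_pivot(parts):
    """Map D strictly positive parts to D-1 isometric log-ratio coordinates.

    Coordinate j contrasts part j with the geometric mean of the parts
    that come after it (pivot coordinates; one of many valid bases).
    """
    parts = np.asarray(parts, dtype=float)
    if np.any(parts <= 0):
        raise ValueError("all parts must be strictly positive")
    log_parts = np.log(parts)
    d = parts.shape[-1]
    coords = np.empty(parts.shape[:-1] + (d - 1,))
    for j in range(d - 1):
        remaining = d - j - 1  # number of parts after part j
        scale = np.sqrt(remaining / (remaining + 1))
        rest_log_mean = log_parts[..., j + 1:].mean(axis=-1)
        coords[..., j] = scale * (log_parts[..., j] - rest_log_mean)
    return coords

# Hypothetical day: sleep, sedentary, activity (minutes).
day = np.array([480.0, 720.0, 240.0])
z_minutes = ilr_pivot(day)
z_shares = ilr_pivot(day / day.sum())  # same composition as proportions
print(z_minutes, z_shares)
```

The coordinates depend only on the ratios between parts, so expressing the same day in minutes or in proportions gives the same result. This is also why a model built on these coordinates implies a non-linear relationship in the original units.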

Results

The performance of each approach to analysing compositional data depends on how closely its parameterisation matches the true data generating process. Overall, we demonstrated that the consequences of using an incorrect parameterisation (e.g. using CoDA when the true relationship is linear) are more severe for larger reallocations (e.g. 10-min or 100-kcal) than for 1-unit reallocations. The implications of choosing an unsuitable approach may be starker in compositional data with variable totals. For example, while models with ratio variables are mathematically equivalent to linear models in compositional data with fixed totals, their estimates may be radically different for variable totals.
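The contrast between fixed and variable totals for ratio variables can be sketched numerically. The example below fits a simple linear model on absolute amounts and a model on ratio variables (amount divided by total), first with a constant total and then with a varying one. The macronutrient shares, effect sizes, total ranges, and helper names are all hypothetical, and these two specifications are simplified stand-ins rather than the exact models evaluated in the study.

```python
import numpy as np

def ols_fitted(columns, y):
    """Ordinary least squares with an intercept; returns fitted values."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

rng = np.random.default_rng(1)
n = 1000
# Hypothetical shares of carbohydrate, fat, and protein.
shares = rng.dirichlet([5.0, 3.0, 2.0], size=n)

def max_fit_gap(total):
    """Largest difference in fitted values between the two models."""
    carb, fat, protein = (shares * total[:, None]).T
    # Assumed (made-up) linear data-generating process.
    y = (5.0 + 0.0004 * carb + 0.0002 * fat - 0.0001 * protein
         + rng.normal(0.0, 0.1, n))
    fitted_linear = ols_fitted([carb, fat], y)
    fitted_ratio = ols_fitted([carb / total, fat / total], y)
    return np.max(np.abs(fitted_linear - fitted_ratio))

gap_fixed = max_fit_gap(np.full(n, 2000.0))                  # constant total
gap_variable = max_fit_gap(rng.uniform(1500.0, 2500.0, n))   # varying total

print(f"fixed total:    max gap in fitted values = {gap_fixed:.2e}")
print(f"variable total: max gap in fitted values = {gap_variable:.2e}")
```

With a constant total, dividing by the total only rescales each predictor, so the two models produce the same fitted values up to floating-point error. Once the total varies, the ratio variables carry different information from the absolute amounts and the fits diverge.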

Conclusions

Compositional data with fixed and variable totals behave differently. All existing approaches to analysing such data have utility but need to be carefully selected. Investigators should explore the shape of the relationships between the compositional components and the outcome and choose the approach that best matches it.

Details

Title
A comparison of methods for analysing compositional data with fixed and variable totals: a simulation study using the examples of time-use and dietary data
Author
Tomova, Georgia D; Walmsley, Rosemary; Berrie, Laurie; Morris, Michelle A; Tennant, Peter W G
Pages
1-14
Section
Research
Publication year
2025
Publication date
2025
Publisher
BioMed Central
e-ISSN
14712288
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3201550544
Copyright
© 2025. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.