Background
The significance of interrelated risk factors for Type 2 diabetes (T2D) is not easily demonstrated by conventional statistical methods. This study aims to investigate the principal components of T2D risk by employing exploratory factor analysis in an Iranian cohort. The analysis encompasses a range of sociodemographic, lifestyle, and health-related variables to uncover clusters of factors associated with the risk of T2D.
Methods
Cross-sectional data from 1200 diabetic and 1200 non-diabetic Iranian adults were analyzed using STATA 14.2, with p < 0.05 considered statistically significant. Pearson’s chi-squared test was used to assess differences between the two groups, and Spearman correlation was used to explore relationships between variables. Separate factor analyses were conducted for the diabetic, non-diabetic, and combined groups, with principal component analysis (PCA) used to extract the initial components. Crude and adjusted logistic regressions examined the associations between the derived factors and T2D risk.
Results
PCA identified eleven components with eigenvalues > 1, accounting for 65.09% of the total variance. Logistic regression analysis highlighted several significant associations with T2D risk. Positive associations were observed for PC1 (“smoking, alcohol, and drug use”), PC2 (“chronic diseases”, including age, hypertension, dyslipidemia, and coronary heart disease), PC3 (“lipids”, such as triglycerides, cholesterol, and low-density lipoprotein), PC4 (“body mass”, including BMI, waist circumference, and waist-to-hip ratio), PC5 (“pregnancy-related risk”, such as gestational diabetes and gestational hypertension), and PC6 (“glucose/lipid”, including fasting glucose, triglycerides, and an inverse relationship with high-density lipoprotein). Conversely, negative associations with T2D risk were found for PC7 (“socioeconomic”, such as socioeconomic status and education), PC8 (an inverse association with age, along with fatty liver, thyroid disorders, and low waist-to-hip ratio), and PC10 (marital status, sleep duration, low fasting glucose, lower age, and an inverse association with fatty liver).
Conclusions
Key metabolic clusters, including “Lipids”, “Body Mass”, “Chronic Diseases”, and “Glucose/Lipid”, align with previous findings. These results underscore the multifactorial and interconnected nature of T2D risk and point to shared underlying physiological processes.
Background
Type 2 diabetes (T2D), the most prevalent type of diabetes, occurs when the pancreas does not produce enough insulin or when cells respond poorly to insulin and take up less glucose. As a result, too little glucose reaches the cells and blood glucose levels rise; without treatment, T2D can increase the risk of serious complications affecting the cardiovascular, nervous, ocular, and lower-extremity systems [1]. According to the International Alliance of Patients’ Organizations (IAPO), current global statistics show that 9.3% (463 million) of individuals have diabetes, a figure projected to rise to 10.2% (700 million) by 2045 [2]. T2D accounts for 90–95% of all cases of diabetes [1]. T2D alone was responsible for more than 1 million deaths in 2017, ranking as the ninth leading cause of death [3].
T2D has many risk factors, many of which are strongly intercorrelated. Obesity, central adiposity, hypertension, insulin resistance, hypertriglyceridemia, elevated blood lipids, and physical inactivity have all been described as risk factors for T2D [4, 5]. Factor analysis is a statistical technique used to explore relationships among a large number of correlated variables. Its primary aim is to reduce these variables into a smaller set of latent factors that capture the underlying patterns or structures within the data. This method has been widely applied to examine the clustering of risk factors in various diseases, including metabolic syndrome and T2D [6,7,8,9,10,11,12,13,14,15,16,17].
In the context of T2D, most studies employing factor analysis have concentrated on identifying clusters of risk factors associated with metabolic syndrome, a key determinant of diabetes [12,13,14,15,16,17]. However, critical contributors to T2D risk, such as socioeconomic status, lifestyle behaviors, and chronic disease history, have not been fully explored using this approach. Moreover, no study to date has integrated such a comprehensive range of factors—including sociodemographic variables, lifestyle behaviors, anthropometric measures, blood biomarkers, and chronic medical conditions—into a single factor analysis in any population.
This study uses exploratory factor analysis to investigate the clustering of health-related variables (lifestyle behaviors, anthropometric measures, blood biomarkers, chronic medical conditions, and sociodemographic factors) that may be associated with the risk of T2D in a large sample of diabetic and non-diabetic Iranian adults from Yazd province. Additionally, the study examines the association between these clusters and the risk of T2D. To the best of our knowledge, no similar study has been undertaken in the Iranian population. This research is particularly relevant for Iran and the Middle East, where the prevalence of T2D is rising rapidly and where sociodemographic factors, lifestyle behaviors, and metabolic risk factors may interact in ways distinct from those observed in Western populations.
This study makes significant contributions to the existing literature on T2D risk factors. First, it is among the few to employ factor analysis to assess a broad range of variables (sociodemographic, lifestyle, and health measures) beyond the traditional metabolic syndrome framework. Second, by analyzing a large sample of diabetic and non-diabetic adults from Yazd, a province with a high prevalence of diabetes (16.3% in 2012 and 17.9% in 2020 [18, 19]), it helps clarify how different health-related factors are connected to the risk of T2D. Third, the findings provide critical insights into the multifactorial nature of T2D in the Middle Eastern context, where cultural, socioeconomic, and health system factors may play a unique role in shaping T2D risk. Finally, these findings can help inform more effective prevention and intervention strategies in Iran and other developing countries facing similar public health issues.
Methods
Study sample
The present data come from the Yazd Shahedieh cohort, which is part of PERSIAN (Prospective Epidemiological Research Studies in Iran), conducted by the Institute of Gastrointestinal and Liver Diseases, WHO Collaborating Centre for Research on NCDs and Gastrointestinal Cancers. Launched in 2016, the study focuses on noncommunicable diseases (NCDs) and uses a rigorous data collection methodology.
The inclusion criteria were age between 30 and 73 years, permanent and continuous residence in the study district for at least the preceding 9 months, and willingness to participate; no additional exclusion criteria were applied. Participants were recruited door to door based on a regional population census, ensuring comprehensive coverage of the target population. Potential selection biases include underrepresentation of individuals who are difficult to reach through door-to-door recruitment, such as those unavailable during the recruitment period (e.g., because of work commitments, health issues, or travel). Additionally, individuals who are less willing to participate may differ systematically from those who agree, potentially limiting the generalizability of the results. Efforts were made to minimize these biases by using a systematic, region-based recruitment method and ensuring that the sample reflects the demographic characteristics of the broader population in the district.
Data were collected by a trained team through in-person meetings with participants at the cohort center. To ensure the confidentiality and protection of participants’ data, strict protocols were adhered to throughout the study. Participants’ identities were anonymized by assigning unique identifiers to all data records. Additionally, all data were securely stored in encrypted files and accessible only to authorized personnel. Data handling procedures followed national and international guidelines for research involving human subjects, ensuring compliance with ethical standards for privacy and data security. Detailed information about the PERSIAN study is available online [20].
A total of 2400 participants aged 30–73 years, 1200 diabetic and 1200 non-diabetic, were recruited from Yazd province. This balanced design supports statistical comparisons between the two groups, such as Pearson’s chi-squared test (χ²), for demographic, lifestyle, anthropometric, and health-related variables. Equal allocation also provides adequate sample sizes for separate factor analyses of the diabetic and non-diabetic groups, allowing meaningful comparisons between them. The total sample of 2400 participants was also chosen to yield stable factor loadings, meeting the recommended ratio of at least 10–20 participants per variable [21]. Including equal numbers of diabetic and non-diabetic individuals prevents either group from being over- or underrepresented in the pooled examination of T2D risk factor clustering. This methodological choice was reviewed and approved by experts during the study design phase.
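The participant-to-variable guideline can be checked with simple arithmetic. The sketch below uses hypothetical helper names (they are not part of the study's Stata workflow) together with the 27 variables entered into the factor analysis and the three sample sizes that were analysed.

```python
# Hypothetical helpers: participants available per variable in a factor analysis.

def participants_per_variable(n_participants: int, n_variables: int) -> float:
    """Return how many participants are available for each analysed variable."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_participants / n_variables


def meets_guideline(ratio: float, minimum: float = 10.0) -> bool:
    """True if the ratio reaches the chosen minimum (10 by default; 20 is the stricter bound)."""
    return ratio >= minimum


N_VARIABLES = 27  # variables entered into the factor analysis
SAMPLES = {"combined": 2400, "diabetic": 1200, "non-diabetic": 1200}

for name, n in SAMPLES.items():
    ratio = participants_per_variable(n, N_VARIABLES)
    print(f"{name}: {ratio:.1f} per variable; "
          f">=10: {meets_guideline(ratio)}; >=20: {meets_guideline(ratio, 20.0)}")
```

With 27 variables, the combined sample provides about 88.9 participants per variable and each group about 44.4, so all three analyses clear even the stricter bound of 20.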
Study variables
Our study variables include sociodemographic characteristics (age, sex, marital status, education level, socioeconomic status). The socioeconomic status (SES) variable in this study was calculated by the cohort center as a composite index, incorporating key dimensions of SES such as household income, education level, and occupational status. Additional indicators included housing quality and living conditions, asset ownership, access to utilities and facilities, and travel history (e.g., frequency of domestic and international travel). The SES score specifically accounted for family size, education level (academic vs. non-academic), asset ownership (e.g., house ownership), housing characteristics, and access to valuable assets such as vehicles, appliances, and electronics. These variables were statistically combined to generate a standardized SES score, providing a comprehensive representation of participants’ socioeconomic position.
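The text does not specify how the cohort center combined the SES indicators. The following is a purely illustrative sketch that assumes equal weights: it z-standardizes each indicator, averages the z-scores, and re-standardizes the result. The indicator names and values are hypothetical, and a weighted scheme (for example, weights taken from a first principal component) would be equally plausible.

```python
from statistics import mean, pstdev


def zscores(values: list[float]) -> list[float]:
    """Standardize to mean 0 and population SD 1; a constant column maps to all zeros."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def composite_ses(indicators: dict[str, list[float]]) -> list[float]:
    """Equal-weight composite: mean of per-indicator z-scores, re-standardized.

    Every indicator must be oriented so that higher values mean better-off.
    """
    columns = [zscores(col) for col in indicators.values()]
    n_participants = len(columns[0])
    if any(len(col) != n_participants for col in columns):
        raise ValueError("all indicators need one value per participant")
    raw = [mean(col[i] for col in columns) for i in range(n_participants)]
    return zscores(raw)


# Hypothetical data for four participants (not study data).
example = {
    "academic_education": [0, 1, 0, 1],
    "owns_house": [1, 1, 0, 1],
    "owns_vehicle": [0, 1, 0, 1],
    "neg_family_size": [-5, -3, -6, -4],  # family size entered with reversed sign
}
scores = composite_ses(example)
print([round(s, 2) for s in scores])
```

Family size is entered with a reversed sign here on the assumption that larger households indicate lower socioeconomic position; the actual orientation used by the cohort center is not stated.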
Anthropometric measures included body mass index (BMI), waist circumference, and waist-to-hip ratio. Laboratory results comprised total cholesterol, triglycerides, high-density lipoprotein (HDL), low-density lipoprotein (LDL), and fasting glucose. The history of chronic diseases included family history of diabetes, gestational diabetes, gestational hypertension, hypertension, dyslipidemia, coronary heart disease, fatty liver, and thyroid disorders. Lifestyle behaviors were assessed using validated tools and questionnaires, including measures of physical activity, smoking history, current smoking, drug use, alcohol consumption, and total sleep duration per 24 h. Physical activity was measured using the International Physical Activity Questionnaire (IPAQ) [22], and total sleep duration per 24 h was self-reported. Smoking history was defined as having smoked at least 100 cigarettes in a lifetime, and current smoking as smoking daily, consistent with definitions from the Centers for Disease Control and Prevention (CDC) [23]. Alcohol consumption was recorded based on self-reported frequency and quantity of use, following standards outlined by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) [24].
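Two of the lifestyle measures reduce to simple scoring rules. This sketch assumes the IPAQ short-form MET weights (3.3 for walking, 4.0 for moderate, and 8.0 for vigorous activity) and applies the smoking definitions given above. Treating current smoking as daily smoking by someone who also meets the 100-cigarette criterion is our assumption. The dataclass and function names are hypothetical. The cut-point the study used to classify participants as physically inactive is not stated, so none is applied here.

```python
from dataclasses import dataclass

# MET weights assumed from the IPAQ short-form scoring protocol.
MET_WEIGHTS = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}


@dataclass
class ActivityReport:
    days_per_week: int
    minutes_per_day: float


def met_minutes_per_week(reports: dict[str, ActivityReport]) -> float:
    """Sum MET weight x minutes/day x days/week over the reported intensities."""
    return sum(
        MET_WEIGHTS[intensity] * r.minutes_per_day * r.days_per_week
        for intensity, r in reports.items()
    )


def smoking_status(lifetime_cigarettes: int, smokes_daily: bool) -> dict[str, bool]:
    """History: >= 100 cigarettes in a lifetime; current: history plus daily smoking."""
    history = lifetime_cigarettes >= 100
    return {"smoking_history": history, "current_smoker": history and smokes_daily}


# Hypothetical participant: walks 30 min on 5 days, moderate activity 20 min on 2 days.
week = {"walking": ActivityReport(5, 30), "moderate": ActivityReport(2, 20)}
print(met_minutes_per_week(week))
print(smoking_status(lifetime_cigarettes=250, smokes_daily=True))
```

The example participant scores 3.3 × 30 × 5 + 4.0 × 20 × 2, or about 655 MET-min/week.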
According to World Health Organization (WHO) criteria, the study variables were graded as follows: waist circumference: normal (men < 102 cm, women < 88 cm), abnormal (men ≥ 102 cm, women ≥ 88 cm); waist-to-hip ratio: abnormal (men > 0.9, women > 0.85), normal (men ≤ 0.9, women ≤ 0.85); HDL: normal (men ≥ 40 mg/dL, women ≥ 50 mg/dL), abnormal (men < 40 mg/dL, women < 50 mg/dL); LDL: normal (< 130 mg/dL), abnormal (≥ 130 mg/dL); cholesterol: normal (< 200 mg/dL), abnormal (≥ 200 mg/dL) [25,26,27]. Participants were classified as diabetic if T2D had been diagnosed during medical care and they were taking drug treatment for T2D [28].
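The cut-offs above translate directly into a lookup table. The following sketch encodes them with waist circumference in cm and lipids in mg/dL. The variable keys and the "male"/"female" coding are hypothetical. Fasting glucose and triglycerides are omitted because their cut-offs are not listed here.

```python
from typing import Callable

# Each rule returns True when the value is ABNORMAL.
# Units: waist circumference in cm, lipids in mg/dL; sex is "male" or "female".
ABNORMAL: dict[str, Callable[[float, str], bool]] = {
    "waist_cm": lambda v, sex: v >= (102 if sex == "male" else 88),
    "waist_to_hip": lambda v, sex: v > (0.90 if sex == "male" else 0.85),
    "hdl_mg_dl": lambda v, sex: v < (40 if sex == "male" else 50),
    "ldl_mg_dl": lambda v, sex: v >= 130,
    "cholesterol_mg_dl": lambda v, sex: v >= 200,
}


def grade(variable: str, value: float, sex: str) -> str:
    """Return 'normal' or 'abnormal' for one measurement using the cut-offs above."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    if variable not in ABNORMAL:
        raise KeyError(f"no cut-off defined for {variable!r}")
    return "abnormal" if ABNORMAL[variable](value, sex) else "normal"


print(grade("waist_cm", 95, "male"), grade("waist_cm", 95, "female"))
```

The same 95 cm waist is graded normal for a man but abnormal for a woman, which is why the sex-specific thresholds matter when comparing prevalence across groups.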
Statistical analysis
Statistical analyses were conducted with STATA version 14.2, and p < 0.05 was considered statistically significant. Qualitative variables are reported as frequency and percentage, and quantitative variables as mean and standard deviation (SD). Pearson’s chi-squared test (χ²) was used to compare the prevalence of T2D across categories of demographic, lifestyle, anthropometric, and health-related variables, as it is commonly applied to test associations between categorical variables. Sampling adequacy for factor analysis was assessed with the Kaiser-Meyer-Olkin (KMO) measure; a KMO > 0.6 indicates that partial correlations are small relative to the zero-order correlations, so the variables share enough common variance for factor analysis [29]. Spearman rank correlations were also computed to explore the relationships between the original variables.
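The overall KMO statistic is the sum of squared off-diagonal correlations divided by that same sum plus the sum of squared off-diagonal partial correlations. The partial correlations come from the inverse of the correlation matrix. In Stata, this value is reported by `estat kmo` after `pca` or `factor`. The NumPy sketch below reproduces the calculation on simulated data (not study data) purely for illustration.

```python
import numpy as np


def kmo(data: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure for an (n_obs, n_vars) data matrix."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlation of each pair of variables given all the others.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + p2))


rng = np.random.default_rng(2024)
n = 2000
latent = rng.normal(size=(n, 1))                           # one shared hypothetical factor
structured = 0.8 * latent + 0.6 * rng.normal(size=(n, 6))  # six variables driven by it
noise = rng.normal(size=(n, 6))                            # no shared structure
print(round(kmo(structured), 2), round(kmo(noise), 2))
```

Variables driven by a shared factor give a KMO well above the 0.6 threshold. Independent noise gives a value near 0.5, because its correlations and partial correlations are about equally small.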
Exploratory factor analysis (EFA) was employed to identify underlying patterns without predefined hypotheses. Given the study’s exploratory nature, confirmatory factor analysis (CFA), which relies on a prespecified model, was not required [30]. EFA was conducted separately for the diabetic, non-diabetic, and combined groups and consisted of three steps. The first step was extraction of initial components using principal component analysis (PCA) to estimate the number of factors; PCA was chosen for its ability to identify the underlying structure and reduce dimensionality. The second was orthogonal varimax rotation of the components, which keeps the resulting factors uncorrelated and maximizes their interpretability. The third was interpretation of factors with eigenvalues > 1 (Kaiser criterion), using a cut-off of > 0.20 (P < 0.05) on the absolute value of the loadings to retain variables that demonstrated meaningful associations with the underlying components. A score for each retained component was then estimated for each participant using the “predict” command. Logistic regression models were used to investigate the association of the derived factors (PCs) with T2D risk.
Results
The characteristics of the study population
Demographic characteristics and lifestyle behaviors
The characteristics of the study population are presented in Table 1. The mean age of the study population was 51.7 years. Of the participants, 52.92% (1270 individuals) were female and 94.75% (2274 people) were married; 35% (840 people) were aged 50 to 60 years and 21.08% (506 people) were aged 60 to 70 years. The youngest age group, 30 to 40 years, was the smallest, at 15.17% (364 people). In terms of education, 28.42% had completed high school and 10.67% had attained a university or college certificate. In terms of socioeconomic status, 13.92% were classified as low, 79.42% as middle, and 6.67% as high. In addition, 15.07% (284 people) were current smokers, 12.87% (308 people) reported drug use, and 4.85% (116 people) reported alcohol consumption (Table 1).
[Table 1 (image not reproduced): characteristics of the study population]
Health status
Overall, 21.50% (516 people) had a normal BMI, 43% (1032 people) were overweight (BMI 25–30 kg/m²), and 35.50% (852 people) were obese (BMI > 30 kg/m²). A high waist circumference was present in 52.08% (1250 people) and a high waist-to-hip ratio in 85% (2040 people). In addition, 51.58% (1162 people) were physically inactive, and 50.75% were in the abnormal sleep duration group. A family history of diabetes was reported by 22.34% (534 people); 2% (48 women) had gestational hypertension, and 3.29% (79 women) had gestational diabetes. Regarding chronic conditions, 30% (720 people) were hypertensive, 28.42% (682 people) had dyslipidemia, 8.42% (202 people) had coronary heart disease, 10.25% (246 people) had fatty liver, and 12.83% (308 people) had thyroid disorders. Abnormal laboratory values were found for fasting glucose in 33% (792 people), triglycerides (TG) in 56.25% (1350 people), cholesterol in 43.75% (1050 people), HDL in 27.33% (656 people), and LDL in 27.58% (662 people) (Table 1).
Prevalence of type 2 diabetes by sociodemographic and health variables
T2D was significantly more prevalent in several sociodemographic, anthropometric, lifestyle, and health-status groups. By age, prevalence was higher among participants over 60 years (63.24%) and those aged 50–60 years (60.95%). It was also higher in single individuals (66.67%), illiterate participants (68.16%), those with primary-school education (58.03%), and those of high socioeconomic status (62.50%). Among anthropometric and lifestyle groups, T2D was more prevalent in obese participants (56.34%) and in those with a high waist circumference (61.76%) or high waist-to-hip ratio (55.50%). It was likewise more prevalent in current smokers (48.60%), participants with a history of illicit drug use during their lifetime (62.34%), physically inactive individuals (57.83%), and those with abnormal sleep duration (53.98%). Regarding medical history, prevalence was higher in participants with hypertension (71.39%) and in women with a history of gestational hypertension (75.00%) or gestational diabetes (100%). It was also higher in participants with a family history of diabetes (57.68%) and in those diagnosed with dyslipidemia (70.67%), coronary heart disease (72.28%), fatty liver (58.54%), or thyroid disorders (58.44%). Finally, T2D was more prevalent in participants with abnormal fasting glucose (95.45%), abnormal triglycerides (58.67%), or abnormal HDL (55.49%). A p-value < 0.05 was considered statistically significant (Table 2).
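Each comparison in Table 2 is a Pearson chi-squared test of a contingency table. The sketch below covers the 2 × 2 case. Its counts are reconstructed from the reported hypertension figures: 720 hypertensive participants, 71.39% of them diabetic, and 1200 diabetic participants in total. The reconstruction assumes no hypertension data were missing. The p-value relies on the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal, so P(χ² > x) = erfc(√(x/2)). Variables with more than two categories, such as age group, would need the general r × c version. In Stata, the equivalent is `tabulate` with the `chi2` option.

```python
import math


def pearson_chi2_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-squared statistic (no continuity correction) and p-value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: hypertensive / not hypertensive; columns: diabetic / non-diabetic.
# Counts reconstructed from the reported percentages (assumes no missing data).
hypertension_table = [[514, 206], [686, 994]]
chi2, p = pearson_chi2_2x2(hypertension_table)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

With these counts, the statistic is about 188.2, far beyond the 3.84 critical value for p = 0.05 at 1 degree of freedom. This is consistent with the reported difference in T2D prevalence between hypertensive participants (71.39%) and the rest (686 of 1680, about 40.8%).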
[Table 2 (image not reproduced): prevalence of type 2 diabetes by sociodemographic and health variables]
Correlations between study variables
Correlations among variables are presented in Supplemental Table S1. Most correlations were of modest to moderate magnitude (0.2–0.5). A notable exception was the correlation between hypertension and dyslipidemia, which was 0.9. The correlation between cholesterol and LDL was 0.6, and that between smoking and drug use was 0.5. Diabetes indicators (glucose and triglycerides) were positively correlated with several health conditions, such as hypertension (0.2023). Waist circumference was positively correlated with BMI (0.4655) and with waist-to-hip ratio (0.3072). Gender was positively correlated with marital status, smoking, thyroid disorders, and waist-to-hip ratio. Age was positively correlated with hypertension, dyslipidemia, and waist-to-hip ratio. Smoking status and waist-to-hip ratio were also correlated with several other health measures, indicating potential health risks.
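Spearman's coefficient is Pearson's correlation computed on ranks, with tied values receiving the average of the ranks they span. Tie handling matters here because many study variables are binary or ordinal. The stdlib sketch below is illustrative only; Stata's `spearman` command performs the equivalent calculation. The example values are hypothetical.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation with average ranks for ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: age in years and a 0/1 hypertension indicator.
age = [34, 41, 47, 52, 58, 63, 66, 71]
hypertension = [0, 0, 0, 1, 0, 1, 1, 1]
print(round(spearman(age, hypertension), 3))
```

Because Spearman's coefficient uses only ranks, it captures any monotonic relationship. For example, squaring a positive variable leaves the coefficient at 1.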
Exploratory factor analysis
The overall KMO value was 0.71, indicating that the data were suitable for factor analysis. Varimax-rotated factors and their loadings on the original variables are shown in Table 3. Based on these loadings, the PCs can be characterized as follows. Smoking history, current smoking, drug use, and alcohol consumption had significant (> 0.20) loadings on PC1, labeled the “smoking, alcohol, and drug use” factor, which accounted for 9.30% of the total variance. Age, hypertension, dyslipidemia, and coronary heart disease (CHD) loaded on PC2, labeled the “chronic diseases” factor. High triglyceride, high cholesterol, and high LDL levels loaded on PC3, labeled the “lipids” factor. High BMI, high waist circumference, and high waist-to-hip ratio loaded on PC4, labeled the “body mass” factor. Gestational diabetes and gestational hypertension both loaded heavily on PC5, labeled the “pregnancy-related risk” factor. High fasting glucose, high triglycerides, and low HDL levels loaded on PC6, labeled the “glucose/lipid” factor. High socioeconomic status and high education level both loaded heavily on PC7, labeled the “socioeconomic” factor. Age (inversely), fatty liver, thyroid disorders, and low waist-to-hip ratio loaded on PC8, labeled the “age and metabolic” factor. Age, low physical activity, and sleep duration (per 24 h) loaded on PC9, labeled the “lifestyle” factor. Marital status, sleep duration (per 24 h), low glucose levels, low age, and fatty liver (inversely) loaded on PC10, labeled the “social and health” factor. Family history of diabetes, alcohol consumption, and coronary heart disease (inversely) had significant loadings on PC11, labeled the “alcohol and cardiovascular health” factor.
[Table 3 (image not reproduced): varimax-rotated factor loadings of the original variables]
Based on their significant loadings, PC2, PC3, PC4, and PC6 jointly represent components of metabolic syndrome. In total, the 11 PCs accounted for 65.09% of the total variance in the original 27 variables. T2D was entered as the dependent variable and the PCs as the independent variables in logistic regression models (Tables 4 and 5), and the adjusted model additionally included sex (Table 5). Both crude and adjusted models showed positive associations of PC1, PC2, PC3, PC4, PC5, and PC6 with the risk of T2D, whereas PC7, PC8, and PC10 were negatively associated with T2D risk. PC9 and PC11 were not significantly associated with T2D.
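The regression step can be sketched as maximum-likelihood logistic regression fitted by Newton-Raphson. Odds ratios and Wald 95% confidence intervals are obtained by exponentiating the coefficients. In Stata, the analogous command is `logistic`, which reports odds ratios directly. The component scores, sex indicator, and outcome below are simulated placeholders, not study data. Mirroring the description above, the crude model contains only the component scores and the adjusted model adds sex.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, max_iter: int = 50, tol: float = 1e-10):
    """Logistic regression by Newton-Raphson; returns coefficients (intercept first) and SEs."""
    design = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(design @ beta)))
        hessian = design.T @ (design * (mu * (1 - mu))[:, None])
        step = np.linalg.solve(hessian, design.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information matrix at the final estimate.
    mu = 1.0 / (1.0 + np.exp(-(design @ beta)))
    hessian = design.T @ (design * (mu * (1 - mu))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(hessian)))
    return beta, se


def odds_ratios(beta: np.ndarray, se: np.ndarray, z: float = 1.96):
    """Odds ratios with Wald 95% CIs for the non-intercept terms."""
    b, s = beta[1:], se[1:]
    return np.exp(b), np.exp(b - z * s), np.exp(b + z * s)


rng = np.random.default_rng(42)
n = 2400
pcs = rng.normal(size=(n, 2))          # stand-ins for two component scores
sex = rng.integers(0, 2, size=n)       # hypothetical 0/1 sex indicator
logit = -0.1 + 0.8 * pcs[:, 0] - 0.5 * pcs[:, 1] + 0.3 * sex
t2d = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

crude = odds_ratios(*fit_logistic(pcs, t2d))
adjusted = odds_ratios(*fit_logistic(np.column_stack([pcs, sex]), t2d))
print(np.round(crude[0], 2), np.round(adjusted[0], 2))
```

An odds ratio above 1 per unit of component score corresponds to a positive association with T2D, as reported for PC1 to PC6. An odds ratio below 1 corresponds to a negative association, as for PC7, PC8, and PC10.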
[Tables 4 and 5 (images not reproduced): logistic regression of T2D on the principal components, crude (Table 4) and adjusted for sex (Table 5)]
Finally, we conducted separate factor analyses for diabetic and non-diabetic participants. In diabetic individuals, PCA identified eleven factors with eigenvalues > 1, explaining 66.02% of the total variance across 27 variables. Similarly, eleven factors emerged for non-diabetic individuals, accounting for 66.29% of the variance. Factor loadings after varimax rotation are detailed in Supplemental Tables S2 and S3. The results were comparable with those of the combined analysis. In both groups, PC1 corresponded to the “smoking, alcohol, and drug use” factor, and “chronic diseases”, “lipids”, and “body mass” were key factors. The “body mass” factor (BMI, waist circumference, and waist-to-hip ratio) was identical in the non-diabetic and combined analyses. In diabetic individuals, however, waist-to-hip ratio did not reach a significant loading, and only BMI and waist circumference loaded significantly on this factor. Interestingly, the “pregnancy-related risk” factor was consistent in the diabetic and combined analyses but absent in the non-diabetic analysis, reflecting the fact that no participant in the non-diabetic group had a history of gestational diabetes. Conversely, the “socioeconomic” factor emerged in the non-diabetic and combined analyses.
Discussion
The significance of diabetes risk factors has been demonstrated by a vast number of studies. Because many of these risk factors are strongly intercorrelated, their joint contribution is not easily demonstrated by conventional statistical methods. Factor analysis, a data reduction technique that allows intercorrelated variables to be included in the same analysis, has frequently been employed to characterize the clustering of disease risk factors in cohorts with different characteristics [9,10,11,12,13,14,15,16,17, 31,32,33]. Yet only a few studies have so far used factor analysis to identify the components of T2D [14, 16, 31, 32]. Furthermore, these studies have mainly investigated dietary patterns associated with T2D or the clustering of risk factors typical of metabolic syndrome, which is a strong determinant of T2D. To date, no study has used factor analysis to cluster T2D risk factors across such a comprehensive range of factors, including sociodemographic variables, lifestyle behaviors, anthropometric measures, blood biomarkers, and chronic medical conditions. In the present study, we applied exploratory factor analysis to identify patterns among these risk factors in a large adult Iranian population. To the best of our knowledge, no similar study has been conducted in the Iranian population.
Our results showed that T2D was significantly more prevalent among several groups of Iranian individuals. These included those aged over 50 years, those who were illiterate or had very low levels of education, smokers, and physically inactive individuals. T2D was also more prevalent among those who were obese or hypertensive, had a high waist circumference or waist-to-hip ratio, or slept more than 9 h or less than 7 h per day, and among those with dyslipidemia, coronary heart disease, thyroid disorders, or elevated blood lipid levels. Correlations among the study variables were modest to moderate (0.2–0.5), with the notable exception of a correlation of 0.9 between hypertension and dyslipidemia. Additionally, a significant correlation of 0.6 was found between cholesterol and LDL, along with a correlation of 0.5 between smoking and drug use. Furthermore, correlations between sociodemographic variables and health conditions point to potential shared health risks.
PCA generated 11 principal components (PCs), which accounted for 65.09% of the total variance across the original 27 variables. Logistic regression analysis revealed significant positive associations between T2D risk and PC1 (“smoking, alcohol, and drug use” factor), PC2 (“chronic diseases” factor: age, hypertension, dyslipidemia, coronary heart disease), PC3 (“lipids” factor: triglycerides, cholesterol, LDL), PC4 (“body mass” factor: BMI, waist circumference, waist-to-hip ratio), PC5 (“pregnancy-related risk” factor: gestational diabetes, gestational hypertension), and PC6 (“glucose/lipid” factor: fasting glucose, triglycerides, HDL (inversely)). In contrast, significant negative associations were observed with PC7 (“socioeconomic” factor: socioeconomic status, education level), PC8 (“age and metabolic” factor: age (inversely), fatty liver, thyroid disorders, low waist-to-hip ratio), and PC10 (“social and health” factor: marital status, sleep duration, low fasting glucose levels, low age, fatty liver (inversely)).
Separate factor analyses for diabetic and non-diabetic participants also identified 11 factors in each group, explaining 66.02% and 66.29% of the variance, respectively. Key factors in both groups included “smoking, alcohol, and drug use”, “chronic diseases”, and “lipids”. The “body mass” and “socioeconomic” factors were similar in the non-diabetic and combined analyses, whereas the “pregnancy-related risk” factor appeared in the diabetic and combined analyses but was absent in non-diabetics.
Based on the combined analysis, four identified components, namely the “chronic diseases”, “lipids”, “body mass”, and “glucose/lipid” factors (PC2, PC3, PC4, PC6), represent key elements of metabolic syndrome. They link age, high lipid levels, and obesity to a higher risk of T2D. Consistent with previous studies, our results highlight the relationship between metabolic syndrome and the risk of T2D in this Iranian cohort [13,14,15, 31,32,33].
Considering the rotated loadings and the multifactorial nature of the variables, we adopted a factor loading cut-off of > 0.20 to retain smaller but meaningful associations that are crucial for exploring the complex interplay of T2D risk factors. While higher thresholds such as 0.30 or 0.40 are commonly used, a lower cut-off aligns with the exploratory nature of our analysis and study objectives. In health and social sciences, datasets often involve subtle but significant interactions that may be overlooked with higher cut-offs, making a lower threshold suitable for capturing these nuanced relationships [34, 35].
To minimize the risk of introducing noise, we carefully evaluated the factor loadings to ensure that the retained variables were both theoretically grounded and consistent with existing literature on T2D risk. The choice of a 0.20 threshold reflects a balanced approach to identifying all plausible contributing factors, including weaker but meaningful associations, while ensuring comprehensive data representation. This decision aligns with best practices for exploratory factor analysis, which emphasize capturing meaningful patterns without compromising the robustness of findings [35, 36]. Additionally, prior studies have supported the use of thresholds below 0.30 for identifying nuanced relationships, particularly in large datasets or when variables contribute modestly to constructs [36]. By prioritizing inclusivity and interpretive rigor, the analysis is sensitive to the multifactorial nature of T2D risk, where smaller associations can have significant implications in a broader context. This approach ensures a holistic understanding of the variables contributing to T2D risk.
The explained variance of 65.09% for eleven factors (PCs) derived from 27 original variables in our factor analysis is considered satisfactory. Previous research suggests that an explained variance of around 60% or more is generally acceptable, especially in complex datasets where fully explaining variance is challenging [36]. Moreover, studies indicate that as long as the factors (PCs) are meaningful, lower explained variance is acceptable, as interpretability of the factors often takes precedence over maximizing explained variance [36].
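As a worked check, assume the PCA was run on the correlation matrix R of the 27 standardized variables. Each variable then contributes one unit of variance, so the total is 27, and the reported percentage can be converted into eigenvalue units. The same arithmetic shows what the Kaiser criterion means in percentage terms. It also shows why an orthogonal (varimax) rotation, with loading matrix L and rotation matrix T, leaves the combined explained variance unchanged.

```latex
\frac{\sum_{j=1}^{11}\lambda_j}{\operatorname{tr}(\mathbf{R})}
  = \frac{\sum_{j=1}^{11}\lambda_j}{27} = 0.6509
\;\Longrightarrow\;
\sum_{j=1}^{11}\lambda_j \approx 0.6509 \times 27 \approx 17.57,
\qquad 27 - 17.57 \approx 9.43 \;(34.91\%)

\lambda_j > 1 \iff \frac{\lambda_j}{27} > \frac{1}{27} \approx 3.70\%

\lVert \mathbf{L}\mathbf{T} \rVert_F^2 = \lVert \mathbf{L} \rVert_F^2
  = \sum_{j=1}^{11}\lambda_j
  \quad \text{for orthogonal } \mathbf{T}
```

Each retained component therefore explains more variance than any single standardized variable. Varimax rotation redistributes variance among the 11 components without changing their combined 65.09%.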
Although the 65.09% of total variance explained by the 11 principal components is within acceptable limits for PCA, the remaining 34.91% also warrants consideration. This residual variance likely reflects factors not captured by the variables in this study. For instance, while sociodemographic characteristics, anthropometric measures, laboratory results, and lifestyle behaviors were included, certain unmeasured variables may contribute to the unexplained variance, such as fasting insulin levels, additional chronic diseases, genetic predispositions, dietary patterns, inflammatory markers, and environmental exposures. Furthermore, unmeasured biomarkers and other lifestyle factors, such as sleep quality or stress, could also influence T2D risk but were not analyzed. This highlights the complexity of T2D risk, which is shaped by multifactorial interactions beyond the scope of this study. Future research could incorporate these unmeasured factors and explore their contribution to T2D risk. Additionally, employing advanced statistical models or expanding datasets to include longitudinal or genetic data could help reduce the unexplained variance and offer deeper insights into the multifactorial nature of T2D.
Previous studies have demonstrated that factor analysis can effectively reveal the clustering of diabetes risk factors [12,13,14,15, 31,32,33]. Despite the subjective nature of factor analysis and variations in variable inclusion, the lipids, body mass, and metabolic glucose/lipid factors (PC3, PC4, and PC6) identified in the present analysis are consistent with those found in previous studies across different populations [13,14,15]. Hanson and coworkers conducted a factor analysis in an American Indian population with a high prevalence of T2D and obesity, and examined the associations of the resulting factors with diabetes incidence [14]. They identified four distinct factors related to metabolic syndrome: (1) insulinemia (comprising fasting insulin × glucose, 2-hour insulin × glucose, fasting insulin/glucose, and 2-hour insulin/glucose); (2) body size (including body weight and waist circumference); (3) blood pressure (systolic and diastolic); and (4) lipidemia (HDL and cholesterol). They found that the insulinemia, body size, and lipid factors were strongly associated with diabetes incidence, while the blood pressure factor was not [14].
In our study, the “body mass” factor and “lipids” factor correspond to the “body size” and “lipidemia” factors identified by Hanson. Similarly, the “body mass” factor and “metabolic glucose/lipids” factor in our analysis correspond to the body mass factor (characterized by BMI and waist circumference) and the metabolic glucose/lipid factor (characterized by fasting glucose, triglycerides, waist-to-hip ratio, and inverse loading of HDL) reported by Lafortuna and colleagues, except that waist-to-hip ratio in Lafortuna’s study did not achieve significant loading on the body mass factor; instead, it loaded onto the metabolic glucose/lipid factor [13].
Further supporting our findings, Smith et al. studied a population with a high prevalence of obesity and metabolic disorders. They reported that the combined clustering of BMI and lipid profile factors, including high triglycerides and low HDL cholesterol, was strongly associated with an increased risk of T2D [31]. This pattern is in line with our finding that the “body mass” and “lipids” factors were significantly associated with T2D risk. Their results suggest that targeting these factors together could be an effective strategy for early T2D prevention, particularly in populations with high obesity rates.
Gender differences in the clustering of anthropometric and metabolic factors have also been reported. In the sex-specific factor analysis by Oh et al., no distinct body size factor emerged [12]. Instead, BMI, waist circumference, and fasting insulin clustered as an obesity factor in men. In women, BMI, waist circumference, and systolic and diastolic blood pressure clustered as an obesity-hypertension factor [12]. Zhang et al. likewise found gender-specific clustering of obesity and metabolic factors in a Chinese cohort [32]. In men, BMI, waist circumference, and fasting insulin were key factors associated with T2D risk. In women, BMI and waist circumference were more strongly associated with blood pressure, producing a combined obesity-hypertension cluster [32]. This parallels our observation of gender differences in how body mass and metabolic factors influenced T2D risk.
Similarly, Wu et al. conducted a factor analysis in a large cohort of Chinese adults and identified a body mass factor (including BMI and waist circumference) and a separate metabolic disturbances factor (including triglycerides and fasting glucose) [33]. Their findings showed that the body mass factor was strongly associated with T2D risk, particularly in individuals with central obesity [33]. This aligns with our results, where the “body mass” and “lipids” factors were significantly linked to diabetes risk, emphasizing the role of abdominal obesity and metabolic abnormalities as key contributors to T2D in diverse populations.
However, in contrast to Oh et al. and Zhang et al. [12, 32], our analysis showed that all three anthropometric variables (BMI, waist circumference, and waist-to-hip ratio) loaded significantly onto a single, separate factor regardless of gender. Differences in study populations and in the variables included make direct comparisons difficult. Nonetheless, the consistent identification of body mass and metabolic lipid factors across various populations underscores their central role in T2D risk. This suggests that prevention strategies tailored to population-specific and gender-specific differences, and addressing these factors, may offer significant public health benefits.
The logistic regression analysis identified several factors significantly associated with T2D risk, many of which correspond to components of metabolic syndrome. The “chronic diseases” (PC2), “lipids” (PC3), “body mass” (PC4), and “glucose/lipid” (PC6) factors reflect the established roles of hypertension, dyslipidemia, obesity, and dysregulated glucose and lipid metabolism in T2D risk [37, 38]. Specifically, the “lipids” factor includes triglycerides, cholesterol, and LDL. It highlights the contribution of dyslipidemia to T2D, consistent with studies linking elevated lipid levels to insulin resistance [39]. Similarly, the “chronic diseases” factor, which includes hypertension and coronary heart disease, aligns with previous research showing that comorbidities are strong predictors of T2D [40]. The “body mass” factor comprises BMI, waist circumference, and waist-to-hip ratio. It reflects the well-established role of obesity in T2D development [41].
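A crude logistic regression of T2D status on the derived components can be sketched as follows. Component scores are computed by projecting standardized variables onto the leading eigenvectors of the correlation matrix. T2D status is then regressed on those scores, and exp(β) gives the odds ratio per one-unit increase in each score. This is a minimal illustration under several assumptions:

- The data are simulated, with an outcome deliberately tied to the first component only.
- Components are unrotated.
- There are no adjustment covariates.
- A hand-written Newton-Raphson fit in NumPy stands in for STATA's logistic routines.

The helpers `pc_scores` and `fit_logistic` are hypothetical.

```python
import numpy as np


def pc_scores(data: np.ndarray, n_components: int) -> np.ndarray:
    """Hypothetical helper: unrotated principal-component scores."""
    # Standardize each variable, then project onto the leading eigenvectors
    # of the correlation matrix (largest eigenvalues first).
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(data, rowvar=False))
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return z @ eigenvectors[:, order]


def fit_logistic(X: np.ndarray, y: np.ndarray, max_iter: int = 50,
                 tol: float = 1e-10) -> np.ndarray:
    """Hypothetical helper: maximum-likelihood logistic regression by Newton-Raphson."""
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))                  # predicted probabilities
        gradient = X.T @ (y - p)                             # score vector
        information = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(information, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated data (NOT the study data): 2400 participants, 27 variables.
rng = np.random.default_rng(1)
data = rng.normal(size=(2400, 27))
data[:, 1] += data[:, 0]          # induce correlation so PC1 has structure
scores = pc_scores(data, n_components=3)

# Simulated outcome whose log-odds depend on the first component only.
true_logit = -0.2 + 0.8 * scores[:, 0]
y = (rng.random(2400) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

beta = fit_logistic(scores, y)
odds_ratios = np.exp(beta[1:])    # crude odds ratio per unit of each PC score
print(np.round(odds_ratios, 2))
```

The sign of each eigenvector is arbitrary. The direction of an odds ratio is therefore interpretable only together with the signs of the variable loadings on that component.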
In addition, the “socioeconomic” (PC7) and “gestational-related risks” (PC5) factors capture further dimensions of T2D risk. Lower socioeconomic status and education are linked to higher T2D risk, often through limited access to healthcare and less healthy lifestyles [42]. Consistent with this, PC7 was inversely associated with T2D in our analysis. The “gestational-related risks” factor includes gestational diabetes and gestational hypertension. It underscores the importance of pregnancy-related exposures, as women with a history of gestational diabetes are at higher risk of developing T2D later in life [43]. Similar key factors were identified in both the diabetic and non-diabetic groups. However, the “gestational-related risks” factor did not emerge among non-diabetic participants. This may indicate a stronger role in those with T2D and supports the need for targeted prevention strategies.
To our knowledge, the present study is the first to examine the clustering of T2D risk factors spanning health measures, sociodemographic characteristics, and lifestyle behaviors. The main points of the present study are:
(1) compared with similar studies, we clustered a larger number of explanatory variables;
(2) several independent underlying factors were detected, with body mass measurements, health-risk behaviors (alcohol use, smoking, and illicit drug use), gestational-related variables, lipid profiles, and socioeconomic variables each loading onto a distinct factor;
(3) the identification of four components related to metabolic syndrome (the “chronic diseases”, “lipids”, “body mass”, and “glucose/lipid” factors) highlights the important role of metabolic disturbances in increasing T2D risk; and
(4) these findings underline the multifactorial nature of T2D and support the hypothesis that multiple physiological processes, together with socioeconomic and lifestyle factors, contribute to its development.
The absence of certain biomarkers, such as insulin, introduces potential residual confounding. Insulin is a key marker of glucose metabolism. It could have provided a more complete understanding of the “glucose/lipid” factor (PC6) and of other metabolic components associated with metabolic syndrome (PC2, PC3, and PC4). Without it, the interplay among metabolic factors critical to T2D development cannot be fully captured, which weakens the interpretation of these factors. Future studies should incorporate insulin measurements to refine the characterization of these components and clarify their role in T2D risk [44, 45]. Other unmeasured variables, such as genetic predisposition, inflammatory markers, or dietary patterns, may also influence the clustering of T2D risk factors. They could affect components such as the “lifestyle” factor (PC9) and the “alcohol and cardiovascular health” factor (PC11). Omitting such confounders can weaken multivariable analyses and obscure important interrelationships [46, 47]. The broad range of variables included supports the robustness of the observed patterns. Even so, incorporating insulin and other missing measures in future studies could improve precision and provide deeper insight into T2D risk pathways.
Although the large sample size strengthens our findings, this study has several limitations. First, participants were drawn from Yazd, a region with a high prevalence of T2D (16.3% in 2012 and 17.9% in 2020) [18, 19]. This focus allows key factors to be identified within a population heavily affected by T2D and provides insight into the clustering of risk factors in a high-prevalence setting. However, it may limit generalizability, and risk factors that are prevalent elsewhere might be underrepresented. Future research should investigate whether the clustering observed here holds across regions and populations with different cultural, economic, and health system contexts. Second, we could not control for potential confounders such as genetic or environmental factors. Other unmeasured variables, including dietary patterns and inflammatory markers, may also have led to residual confounding. Third, the lack of data on fasting insulin limited our analysis and likely affected the characterization of factors such as the “glucose/lipid” factor. Fourth, self-reported data introduce the possibility of reporting and recall bias. For instance, alcohol consumption, physical activity, and sleep duration may have been misreported because of social desirability or memory errors. Finally, the cross-sectional design precludes causal inference, because the temporal order of risk factors and diabetes cannot be established. Some associations may therefore reflect reverse causation (for example, behavioral changes after diagnosis). Future longitudinal studies or randomized controlled trials (RCTs) that include the missing biomarkers, such as insulin, could provide stronger evidence, refine factor interpretations, and clarify temporal relationships in T2D development.
Conclusions
Our exploratory factor analysis demonstrates a clustering of T2D risk factors within a large sample of adults from a region with high T2D prevalence. Factor analysis effectively reduced the dimensionality of the data and identified key factors, namely lipid profiles, body mass, chronic conditions, and glucose/lipid metabolism, which align with findings from other populations. Factors related to gestational risks, health-risk behaviors (alcohol use, smoking, and illicit drug use), and socioeconomic status were also identified. These findings point to interrelationships among physiological and behavioral processes linked to T2D. The consistent identification of body mass and metabolic lipid factors across various populations underscores their central role in T2D risk. To translate these findings into practice, interventions targeted at the identified clusters could enhance T2D prevention and risk stratification. Public health programs should promote healthy lifestyle behaviors and address socioeconomic and physiological disparities. They should also implement population- and gender-specific strategies, such as managing gestational risks in women and emphasizing metabolic syndrome management in high-risk groups. Tailored prevention strategies are particularly important in regions such as the Middle East, where distinct cultural, economic, and health system factors shape T2D risk, and in developing countries facing similar public health challenges. Future research should use longitudinal designs to explore the temporal relationships between these risk factors. Including biomarkers not measured here, such as insulin and other metabolic markers, would likely refine our understanding of the pathways underlying T2D risk and guide the development of tailored public health policies.
Data availability
The data that support the findings of this study are available from Shahid Sadoughi University of Medical Sciences. However, restrictions apply to the availability of these data, which were used under license for the current study, so they are not publicly available. Data are available from the authors upon reasonable request and with permission of the Deputy of Research and Technology, Shahid Sadoughi University of Medical Sciences, Yazd, Iran (en.ssu.ac.ir/research/).
Abbreviations
T2D: Type 2 diabetes
BMI: Body mass index
HDL: High-density lipoprotein
LDL: Low-density lipoprotein
PCA: Principal component analysis
PC: Principal component
World Health Organization (WHO). Diabetes: fact sheet [Internet]. Updated 5 April 2023. Available from: https://www.who.int/news-room/fact-sheets/detail/diabetes
International Alliance of Patients’ Organizations. IDF launch of Diabetes Epidemiological Studies Guide. Available from: https://www.iapo.org.uk/events/idf-launch-diabetes-epidemiological-studies-guide
Pradeepa R, Mohan V. Epidemiology of type 2 diabetes in India. Indian J Ophthalmol. 2021;69(11):2932–8. https://doi.org/10.4103/ijo.IJO_1627_21.
National Institute of Diabetes and Digestive and Kidney Diseases. Diabetes overview [Internet]. Available from: https://www.niddk.nih.gov/health-information/diabetes/overview/all-content. Accessed 2020 Dec 4.
Schoenaker DAJ, Dobson AJ, Soedamah-Muthu SS, Mishra GD. Factor analysis is more appropriate to identify overall dietary patterns associated with diabetes when compared with treelet transform analysis. J Nutr. 2013;143(3):392–8. https://doi.org/10.3945/jn.112.169011.
Van Dam RM, Rimm EB, Willett WC, Stampfer MJ, Hu FB. Dietary patterns and risk for type 2 diabetes mellitus in U.S. men. Ann Intern Med. 2002;136(3):201–9.
Williams DE, Prevost AT, Whichelow MJ, Cox BD, Day NE, Wareham NJ. A cross-sectional study of dietary patterns with glucose intolerance and other features of the metabolic syndrome. Br J Nutr. 2000;83:257–66. [PMID: 10884714].
Pladevall M, Singal B, Williams LK, Brotons C, Guyer H, Sadurni J, et al. A single factor underlies the metabolic syndrome: a confirmatory factor analysis. Diabetes Care. 2006;29:113–22.
Wang JJ, Qiao Q, Miettinen ME, Lappalainen J, Hu G, Tuomilehto J. The metabolic syndrome defined by factor analysis and incident type 2 diabetes in a Chinese population with high postprandial glucose. Diabetes Care. 2004;27:2429–37.
Lakka HM, Laaksonen DE, Lakka TA, Niskanen LK, Kumpusalo E, Tuomilehto J, et al. The metabolic syndrome and total and cardiovascular disease mortality in middle-aged men. JAMA. 2002;288:2709–16.
Lempiainen P, Mykkanen L, Pyorala K, Laakso M, Kuusisto J. Insulin resistance syndrome predicts coronary heart disease events in elderly nondiabetic men. Circulation. 1999;100:123–8.
Oh JY, Hong YS, Sung YA, Barrett-Connor E. Prevalence and factor analysis of metabolic syndrome in an urban Korean population. Diabetes Care. 2004;27(8):2027–32. https://doi.org/10.2337/diacare.27.8.2027. [PMID: 15277435].
Lafortuna CL, Adorni F, Agosti F, Sartorio A. Factor analysis of metabolic syndrome components in obese women. Nutr Metab Cardiovasc Dis. 2008;18(3):233–41. [PMID: 17600693].
Hanson RL, Imperatore G, Bennett PH, Knowler WC. Components of the metabolic syndrome and incidence of type 2 diabetes. Diabetes. 2002;51(10):3120–7. https://doi.org/10.2337/diabetes.51.10.3120. [PMID: 12351457].
Ghosh A. Factor analysis of metabolic syndrome among the middle-aged Bengalee Hindu men of Calcutta, India. Diabetes Metab Res Rev. 2005;21(1):58–64. https://doi.org/10.1002/dmrr.481. [PMID: 15386818].
Hanley AJG, Festa A, D’Agostino RB, Wagenknecht LE, Savage PJ, Tracy RP, Saad MF, Haffner SM. Metabolic and inflammation variable clusters and prediction of type 2 diabetes: factor analysis using directly measured insulin sensitivity. Diabetes. 2004;53(7):1773–81. https://doi.org/10.2337/diabetes.53.7.1773.
Mannucci E, Monami M, Rotella CM. How many components for the metabolic syndrome? Results of exploratory factor analysis in the FIBAR study. Nutr Metab Cardiovasc Dis. 2007;17(10):719–26. https://doi.org/10.1016/j.numecd.2006.09.003. Epub 2007 Mar 26. [PMID: 17387006].
Lotfi MH, Saadati H, Afzali M. Prevalence of diabetes in people aged ≥ 30 years: the results of screening program of Yazd Province, Iran, in 2012. J Res Health Sci. 2013;14(1):88–92.
Dehghani A, Fattahi MR, Zarehmand H, Karami R, Erfani M, Rezaei A, et al. Prevalence of diabetes and its correlates among Iranian adults: results of the first phase of Shahedieh cohort study. Health Sci Rep. 2023;6(4):e1170.
Digestive Diseases Research Institute. Persian Cohort Study [Internet]. Available from: https://enddri.tums.ac.ir/Persian-Cohort-Study
Kline RB. Principles and practice of structural equation modeling. 4th ed. New York: Guilford Press; 2016.
Craig CL, Marshall AL, Sjöström M, Bauman AE, Booth ML, Ainsworth BE, et al. International Physical Activity Questionnaire: 12-country reliability and validity. Med Sci Sports Exerc. 2003;35(8):1381–95. https://doi.org/10.1249/01.MSS.0000078924.61453.FB.
Centers for Disease Control and Prevention (CDC). Smoking & Tobacco Use [Internet]. 2020 [cited 2024 Dec 15]. Available from: https://www.cdc.gov/tobacco/basic_information/index.htm
National Institute on Alcohol Abuse and Alcoholism (NIAAA). Drinking Levels Defined [Internet]. 2020 [cited 2024 Dec 15]. Available from: https://www.niaaa.nih.gov/alcohol-health/overview-alcohol-consumption/understanding-alcohols-effects
World Health Organization (WHO). Waist circumference and waist-hip ratio: report of a WHO expert consultation, Geneva, 8–11 December 2008. Updated 16 May 2011 [Internet]. Available from: https://www.who.int/publications/i/item/9789241501491
World Health Organization (WHO). Lipids and cardiovascular risk [Internet]. Available from: https://www.who.int/cardiovascular_diseases/en/
American Heart Association (AHA). Cholesterol Levels: What You Need to Know [Internet]. Available from: https://www.heart.org/en/health-topics/cholesterol
World Health Organization (WHO). Definition and diagnosis of diabetes mellitus and intermediate hyperglycemia: Report of a WHO/IDF consultation. 2006 [Internet]. Available from: https://www.who.int/diabetes/publications/en/
Kaiser HF. An index of factorial simplicity. Psychometrika. 1974;39(1):31–6.
Field A. Discovering statistics using IBM SPSS statistics. 4th ed. London: Sage; 2013.
Smith J, et al. Clustering of body mass and lipid factors in obesity and diabetes risk. J Diabetes Res. 2020;45(3):156–63.
Zhang L, et al. Gender-specific clustering of obesity and metabolic factors in relation to type 2 diabetes: a population-based study in China. Diabet Med. 2021;38(4):508–16.
Wu Y, Guo Y, Li X. The global burden of type 2 diabetes attributable to high body mass index in 204 countries and territories, 1990–2019: an analysis of the Global Burden of Disease Study. Lancet Diabetes Endocrinol. 2022;10(7):455–64. https://doi.org/10.1016/S2213-8587(22)00124-7.
Peterson RA. A meta-analysis of variance accounted for and factor loadings in exploratory factor analysis. Mark Lett. 2000;11(3):261–75.
Costello AB, Osborne JW. Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis. Pract Assess Res Eval. 2005;10(1):7.
Field A. Discovering statistics using IBM SPSS statistics. 4th ed. London: Sage; 2013.
Alberti KGMM, Eckel RH, Grundy SM, Zimmet PZ, Cleeman JI, Donato KA, et al. Harmonizing the metabolic syndrome: a joint interim statement of the International Diabetes Federation Task Force on Epidemiology and Prevention; National Heart, Lung, and Blood Institute; American Heart Association; World Heart Federation; International Atherosclerosis Society; and International Association for the Study of Obesity. Circulation. 2009;120(16):1640–5.
Sattar N, Preiss D, Murray HM, Welsh P, Buckley BM, de Craen AJ, et al. Statins and risk of incident diabetes: a collaborative meta-analysis of randomised statin trials. Lancet. 2010;375(9716):735–42.
Eckel RH, Grundy SM, Zimmet PZ. The metabolic syndrome. Lancet. 2005;365(9468):1415–28.
Haffner SM, Lehto S, Rönnemaa T, Pyörälä K, Laakso M. Mortality from coronary heart disease in subjects with type 2 diabetes and in nondiabetic subjects with and without prior myocardial infarction. N Engl J Med. 1998;339(4):229–34.
Gurnani M, Birken C, Hamilton J. Childhood obesity: causes, consequences, and management. Pediatr Clin North Am. 2015;62(4):821–40.
Dahl E, Jørgensen T, Fossen M, Stigum H, Langhammer A. Socioeconomic factors and health in a general population: a 10-year follow-up of the HUNT study. Scand J Public Health. 2006;34(5):478–86.
Jenum AK, Lie SA, Midthjell K, Sletner L, Grill V, Holmen J, et al. Ethnic differences in the incidence of type 2 diabetes: the Norwegian HUNT study. Diabetologia. 2012;55(2):1209–17.
Kahn SE, Prigeon RL, McCulloch DK, Boyko EJ, Ziegler MG, Polonsky KS, et al. The relative contributions of insulin secretion and insulin sensitivity to the pathogenesis of type 2 diabetes. Diabetologia. 2014;57(6):1173–82.
Falkner B. The importance of blood pressure measurement in childhood obesity. Pediatr Nephrol. 2008;23(4):593–8.
Strimbu K, Tavel JA. What are biomarkers? Curr Opin HIV AIDS. 2010;5(6):463–6.
Hu FB, Manson JE, Stampfer MJ, Colditz GA, Liu S, Solomon CG, et al. Diet, lifestyle, and the risk of type 2 diabetes mellitus in women. N Engl J Med. 2001;345(11):790–7.
World Medical Association. Declaration of Helsinki: Ethical Principles for Medical Research Involving Human Subjects [Internet]. Ferney-Voltaire: WMA; 2024 Dec 13. Available from: http://www.wma.net/en/30publications/10policies/b3/index.html
© 2025. This work is licensed under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.