Introduction
The Balance Evaluation Systems Test (BESTest) and its two abbreviated versions, the Mini-BESTest and the Brief-BESTest, are used to assess the functioning of balance control systems [1]. Balance control is complex and results from a set of interacting systems [2–7]. Using a systems model of motor control as the theoretical framework, six underlying systems contribute to balance control [1]: biomechanical constraints, stability limits/verticality, anticipatory postural adjustments, postural responses, sensory orientation and gait stability. An impairment in one or more of these systems leads to postural instability or balance problems.
Balance impairments or problems can be present in patients with a medical condition such as stroke, Parkinson´s disease, multiple sclerosis, spinal cord injury, cervical spondylotic myelopathy, myotonic dystrophy type 1, spinocerebellar ataxia, femoral or vertebral fracture, type 2 diabetes, total knee arthroplasty, cancer, end-stage disease, or chronic obstructive pulmonary disease, as well as in older adults, people at increased risk of falling and school-aged children. Impairments or deficits in balance control lead to limitations in daily life activities, reduced ambulatory capacity, restricted social participation, reduced quality of life, and an increased risk of falls [8–11].
These scales are applied manually to determine whether the patient has balance problems and to assess their cause, unlike other outcome measures, such as the One Leg Stand, Functional Reach Test and Timed Up and Go, which only reveal the existence of a balance problem [12]. The BESTest, developed by Horak et al. [1], contains 36 items to assess balance impairments in the six categories or systems indicated previously. Each item is scored on a 0-to-3-point scale, with a higher score indicating better balance. Its administration takes a considerable amount of time (20–30 minutes), which may not be feasible or practical for routine clinical use. The two abbreviated versions of the BESTest take approximately half of that time to administer: 10–15 minutes for the Mini-BESTest and 7–10 minutes for the Brief-BESTest [13]. The Mini-BESTest, developed by Franchignoni et al. [14], consists of 14 items from 4 of the 6 sections of the BESTest (sections III, IV, V and VI) and does not include the biomechanical constraints and stability limits sections. Each item is scored on a 3-level scale from 0 to 2 (total score 28 points) [15]. The Mini-BESTest´s lack of items assessing biomechanical constraints or limits of stability could reduce its sensitivity when applied to people with musculoskeletal impairments or impaired limits of stability [16]. The Brief-BESTest, developed by Padgett et al. [16], assesses all sections of the BESTest using the most representative item of each section [15]. Of these three scales, the most used in observational and experimental studies is the Mini-BESTest, followed by the BESTest and the Brief-BESTest, according to the searches conducted in different databases.
Since their original validation in the USA, the BESTest, Mini-BESTest and Brief-BESTest have been used in many cultures and countries, such as Sweden [17,18], Thailand [19–23], Brazil [24–26], Portugal [27–28], Iran [29,30], Canada [31], Belgium [32], Greece [33], Japan [34], Norway [35], Slovenia [36], Croatia [36], Turkey [37–39], Germany [40], China [41,42], Spain [43], Saudi Arabia [44,45] and Italy [18,46–49].
To be useful, a measurement tool must have good psychometric properties such as reliability, measurement error (verified by the Standard Error of Measurement (SEM) and/or Bland-Altman plots), validity and responsiveness. This study focused on the internal consistency and the inter- and intra-rater (test-retest) reliability of the BESTest, Mini-BESTest and Brief-BESTest. Reliability and internal consistency are not inherent test properties and may vary each time the test is applied to a different sample of participants [50,51]. Whenever a study makes use of a scale, the authors should report a reliability estimate based on the data at hand [52]. However, in experimental studies authors often do not report reliability estimates based on their own participants’ scores; instead, it is common to find references to the reliability obtained in the original validation study of the test. Checking the reliability of test scores in each application is of paramount importance, both to ensure that the measurement itself is reliable and because reliability affects effect sizes: if test scores are less reliable, effect sizes based on these instruments can be attenuated [53]. In short, if the scale does not produce reliable scores, diagnosis might be inaccurate and the effectiveness of treatments to improve or maintain balance cannot be adequately tested.
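As a minimal numeric sketch of this attenuation argument, the classical Spearman attenuation formula can be applied with purely hypothetical values (the true effect and the reliabilities below are illustrative, not taken from the studies reviewed here):

```r
# Classical attenuation: the observed correlation equals the true correlation
# multiplied by the square root of the product of the two score reliabilities.
true_effect <- 0.50   # hypothetical true correlation between balance and an outcome
rel_balance <- 0.95   # reliability of the balance scale scores in a given sample
rel_outcome <- 0.80   # reliability of the outcome measure

true_effect * sqrt(rel_balance * rel_outcome)   # observed effect: about 0.44
true_effect * sqrt(0.70 * rel_outcome)          # with less reliable scores: about 0.37
```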
Nevertheless, a representative reliability value of an instrument can be obtained by integrating the various reliability estimates reported across studies using meta-analytic methods. This is often referred to as reliability generalization (RG) [54]. Additionally, if heterogeneity exists between reliability estimates based on the same test, an RG meta-analysis enables us to examine whether some study characteristics (i.e., moderators) could explain the variability of reliability coefficients [55,56]. Examples of study characteristics which may affect reliability are the mean and variability of test scores, the target population, or whether the original version or an adaptation (to other cultures or countries) of the test has been used.
Currently, no meta-analysis has been performed to generalize the reliability of the BESTest, Mini-BESTest and Brief-BESTest. The objectives of this RG study are to (i) estimate an average internal consistency and inter- and intra-rater reliability for the BESTest, Mini-BESTest and Brief-BESTest, and (ii) assess whether large heterogeneity exists between reliability estimates for the same instrument and, if so, perform moderator analyses to identify the study characteristics which account for such variability.
Methods
We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement to guide the reporting of the current review [57]. The review protocol was registered at the International Prospective Register of Systematic Reviews (PROSPERO: CRD42024540512).
Identification and selection of studies
The identification and selection of studies for this reliability generalization study was carried out according to five criteria: a) empirical studies (observational and experimental); b) the sample comprised patients with a clinical disorder or a normal population; c) studies had to report at least an alpha coefficient to assess internal consistency and/or an intraclass correlation coefficient to assess inter-rater and/or intra-rater/test-retest reliability; d) published before 10 February 2024; and e) published in English or Spanish. Theses or dissertations, conference abstracts, letters to editors, study protocols, guidelines, case reports, narrative reviews, systematic reviews, meta-analyses, book chapters, qualitative studies and consensus-based recommendations were excluded.
To locate studies, the following electronic databases were consulted: PubMed, Embase, PsycINFO, Web of Science, Scopus and CINAHL. Forward and backward citation tracing was used, and reference lists of studies were manually checked for additional studies. Supplementary S1 Table summarizes search strategies for all databases.
After the bibliographic search, duplicate articles were removed in a first screening. Retrieved articles were then filtered based on title and abstract. All titles and abstracts were independently screened by two blinded reviewers (ABMH, JJLG), and the full texts of potentially relevant articles were analyzed in depth to examine their eligibility. If an eligible article assessed different population samples, each sample was considered as a separate sample. Disagreements were resolved by consensus, with a third assessor (JALP) consulted if necessary.
Assessment of study characteristics
Substantive and methodological characteristics were extracted with a view to examining the influence of moderating variables on reliability estimates [57]. For the BESTest, Mini-BESTest and Brief-BESTest, the following methodological characteristics were coded: scale version (original vs adapted), design type (observational vs experimental), study approach (psychometric vs applied), sample size, experience of raters (yes vs no, mean in years), inter-rater interval (in days), number of raters and sample size for inter-rater agreement, and intra-rater interval (in days), number of raters and sample size for intra-rater agreement. In the case of the Mini-BESTest, the maximum scale score (28 vs 32) was included according to the two possible lengths (14 and 16 items). In addition, the following substantive variables were coded: age of sample (mean and standard deviation), reference population (adults 18–65 years, adults over 65 years, children and adolescents), country and continent where the study was conducted, gender distribution (% female), target population (clinical, normal non-institutionalized population, normal institutionalized population), disease type, disease history (mean and standard deviation in years), profession of raters (physical therapist, medical doctor, other), and year of study.
Data extraction
To assess the reliability of data extraction, two assessors (ABMH, JALP) independently coded the characteristics of all studies containing information on the BESTest, Mini-BESTest and Brief-BESTest. If a study contained more than one sample with relevant information on reliability, separate coding was performed for each sample. Cohen’s kappa coefficients were calculated for inter-rater agreement on the categorical moderator variables, while intraclass correlations were calculated for the continuous moderator variables. Cohen’s kappa coefficients ranged from 0.883 to 1, while the intraclass correlations for continuous variables ranged from 0.569 to 1. Inconsistencies among raters were resolved by consensus.
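As an illustration of how such agreement indices can be computed, the sketch below uses the irr package in R; the data frames and their values are hypothetical, and only the type of index mirrors the procedure described above:

```r
library(irr)  # provides kappa2() for Cohen's kappa and icc() for intraclass correlations

# Hypothetical double-coded moderators: one row per sample, one column per coder
scale_version <- data.frame(coder1 = c("original", "adapted", "adapted", "original"),
                            coder2 = c("original", "adapted", "original", "original"))
mean_age      <- data.frame(coder1 = c(64.2, 71.5, 58.0, 69.3),
                            coder2 = c(64.2, 71.0, 58.0, 69.3))

kappa2(scale_version)                                # Cohen's kappa for a categorical moderator
icc(mean_age, model = "twoway", type = "agreement")  # ICC for a continuous moderator
```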
The reliability coefficients extracted were heterogeneous, as articles could report one or more alpha coefficients and/or inter-rater and/or intra-rater agreement coefficients. Table 1 shows the number of studies, number of samples, and sample sizes for the BESTest, Mini-BESTest and Brief-BESTest.
[Figure omitted. See PDF.]
Since these coefficients are based on different assumptions, a reliability generalization meta-analysis has been separately performed for each coefficient in each of the three versions of the BESTest.
Evaluating the methodological quality of studies.
The quality of each study on a measurement property was independently assessed by two reviewers (ABMH and JALP) with the updated COSMIN (Consensus-based Standards for the selection of health Measurement Instruments) Risk of Bias checklist [58], regarding the three domains of measurement properties: reliability, validity and responsiveness. Each study was rated as very good, adequate, doubtful or inadequate according to each specific item description. Methodological quality was rated with the lowest category obtained in the study. Specifically, internal consistency and inter- and intra-rater reliability were assessed in each included study, because the main objective was to perform a reliability generalization meta-analysis.
In addition, the result of each study on a measurement property was rated against the updated criteria for good measurement properties using three values: sufficient (+), insufficient (−), or indeterminate (?). The details of how to score the quality of each study on a psychometric property and the result of each study on a psychometric property are fully described in the COSMIN guideline [58].
Reliability estimates
Prior to meta-analysis, reliability coefficients were transformed to normalize their distributions and stabilize their variances. The alpha coefficient was transformed with the formula proposed by Bonett [59], T = ln(1 − α), with ln being the natural logarithm. The intraclass correlation coefficients used to evaluate inter- and intra-rater agreement were transformed with Fisher’s Z: Z = ½ ln[(1 + ICC)/(1 − ICC)].
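A minimal sketch of these transformations (and of the back-transformations used later to report pooled values in the original metric) is shown below; the coefficient values are hypothetical and the sampling-variance formulas are omitted:

```r
alpha <- 0.91   # Cronbach's alpha reported in a primary study (hypothetical)
icc   <- 0.95   # intraclass correlation reported in a primary study (hypothetical)

T_alpha <- log(1 - alpha)                    # Bonett transformation of alpha
Z_icc   <- 0.5 * log((1 + icc) / (1 - icc))  # Fisher's Z, equivalent to atanh(icc)

1 - exp(T_alpha)   # back-transformation to the alpha metric
tanh(Z_icc)        # back-transformation to the ICC metric
```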
Statistical analysis
Meta-analyses were conducted for internal consistency and inter- and intra-rater reliability of the BESTest and its two abbreviated versions. In all cases, a random-effects model was used, and 95% confidence limits were calculated around the mean reliability coefficient with the improved method proposed by Hartung [60,61]. Between-study variance was estimated by restricted maximum likelihood [62].
To investigate the heterogeneity of reliability coefficients in each meta-analysis, the Q statistic and the I2 index were calculated and a forest plot was created. If studies exhibited heterogeneity, a moderator analysis was performed to identify the study characteristics explaining it. Weighted ANOVAs and simple meta-regressions assuming a mixed-effects model were conducted for qualitative and quantitative moderators, respectively, using the improved method proposed by Knapp and Hartung to test the significance of the moderating variables [63]. The proportion of variance explained by each moderating variable was estimated using the R2 index [64,65]. Statistical analyses were conducted with the metafor package in R [66].
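The following sketch shows how such models can be fitted with the metafor package; the small data frame is invented for illustration, and only the rma() arguments (method = "REML", test = "knha", mods = ~ ...) reflect the analytic choices described above:

```r
library(metafor)

# Hypothetical data: transformed coefficients (yi), their sampling variances (vi),
# and two coded moderators, one row per sample
dat <- data.frame(yi        = c(2.0, 1.8, 2.3, 1.6, 2.1),   # e.g., Fisher's Z of the ICC
                  vi        = c(0.04, 0.06, 0.03, 0.08, 0.05),
                  continent = c("Europe", "Asia", "Asia", "America", "Europe"),
                  mean_age  = c(67, 62, 70, 59, 73))

# Random-effects model: REML estimation of the between-study variance,
# Knapp-Hartung (improved) confidence limits
res <- rma(yi, vi, data = dat, method = "REML", test = "knha")
res          # prints the pooled estimate, the Q statistic and the I^2 index
forest(res)  # forest plot of the individual and pooled coefficients

# Moderator analyses under a mixed-effects model
rma(yi, vi, mods = ~ continent, data = dat, method = "REML", test = "knha")  # weighted ANOVA
rma(yi, vi, mods = ~ mean_age,  data = dat, method = "REML", test = "knha")  # simple meta-regression
```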
To facilitate interpretation of the results of each meta-analysis, the average reliability coefficients obtained with the Bonett and Fisher Z transformations were back-transformed to the original metrics of the alpha coefficient and the intraclass correlation, respectively. To determine whether publication bias could be a threat to the validity of the results, funnel plots with the trim-and-fill imputation method of Duval and Tweedie [67] and the Egger test were applied [68,69].
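Continuing the metafor sketch above, the back-transformation and the publication-bias diagnostics can be obtained as follows (res is the hypothetical model fitted earlier; only the function calls mirror the procedures described):

```r
# Back-transform the pooled Fisher's Z estimate and its confidence limits to the ICC metric
predict(res, transf = transf.ztor)

# Publication-bias diagnostics
funnel(res)                  # funnel plot of the transformed coefficients
trimfill(res)                # Duval and Tweedie trim-and-fill imputation
regtest(res, model = "lm")   # Egger's regression test
```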
Results
Study selection
Fig 1 presents the PRISMA flow diagram describing the study selection process. Overall, the search strategies identified a total of 875 articles. Following the removal of duplicates, 468 records were screened by title and abstract, and 376 were excluded because they were narrative reviews, systematic reviews, scoping reviews, letters to the editor, conference abstracts, study protocols, guidelines, studies not written in English, or theses or dissertations. In total, 92 full-text articles were assessed for eligibility, of which 29 were excluded; 63 studies [1,13,15,17–45,47–50,70–95] were eligible for the quantitative analysis and were included in the current systematic review (Supplementary S2 Table). Supplementary S3 Table contains the completed PRISMA checklist.
[Figure omitted. See PDF.]
Descriptive characteristics of selected studies
Supplementary S4 Table presents the characteristics of the 73 samples in the 63 included studies. Some samples reported several coefficients, such as internal consistency plus inter- and intra-rater (or test-retest) reliability [20,33,41,42,45,49,75,82,89,91], internal consistency and intra-rater (or test-retest) reliability [44,71], internal consistency and inter-rater reliability [32,43,76–78,87], or intra- and inter-rater reliability [13,15,22,24,25,27,31,35,37,39,40,69,72–74,80,81,85,92,95], whereas others reported a single reliability coefficient, such as internal consistency [17,18,36,46–48,83,88,90], intra-rater (or test-retest) reliability [19,21,23,26,28,30,34,38,70,84,86,93,94] or inter-rater reliability [1,29,79].
The sample size ranged from 10 to 709 [18,32]. Samples contained participants with or without illnesses. Some studies contained samples of participants with no pathology [13,19,25,26,40,43,45,72,86], while others included samples with pathology [1,15,17,18,20–24,26–31,33–35,37–39,41,42,44,47–49,69–71,73–78,80–87,89–95]. Most studies had samples from persons with a single illness such as stroke [22,24,29,33,37,38,77,82,85,88,91], multiple sclerosis [70,71,76,79,84], Parkinson’s disease [15,17,18,26,47,73–75,81,87], intellectual disability [30], chronic pain [23,83], spinal deformity [32], spinocerebellar ataxia [34], type 2 diabetes with peripheral neuropathy [20,21], spinal cord injury [80,90,93], total knee arthroplasty [41], cervical spondylotic myelopathy [42], cancer [69], chronic obstructive pulmonary disease [27,92], and end-stage renal disease [28]. The most common pathology was stroke. Few studies included participants with different pathologies in their samples [1,31,35,39,44,46,48,49,78,89,95].
The methodological quality of studies.
Four, twenty-four and six studies (seven samples) evaluated the internal consistency of the BESTest [32,41,43,71], Mini-BESTest [17,20,33,36,38,41,43–48,75–78,82,83,87–91,95] and Brief-BESTest [41,42,48,49,78,91], respectively. Almost all studies were of very good methodological quality with sufficient results; only one Mini-BESTest study was of inadequate quality with sufficient results [17] (Supplementary S5 Table).
Regarding the BESTest, eighteen and twenty-two studies (nineteen and twenty-four samples) assessed inter-rater [1,13,22,24,25,27,30,32,35,40–43,69,72–74,78] and intra-rater/test-retest reliability [13,19,22–28,34,35,40–42,69,70–74,86,94], respectively. Almost all studies were of doubtful methodological quality with sufficient results; only one was of very good quality with sufficient results [35] (Supplementary S5 Table).
Thirty and thirty-one studies assessed the inter-rater [13,15,20,25,27,29,31,33,35,39–43,45,69,71,73,75–82,85,87,89,95] and intra-rater/test-retest reliability [13,15,19–21,25–28,31,33–35,38–40,42,44,45,69,73,75,80–82,84–86,89,93,95] of the Mini-BESTest, respectively. Most studies were of doubtful quality with sufficient results. Three studies were of adequate quality for inter-rater and intra-rater/test-retest reliability [29,33,80], and four were of very good quality for both types of reliability [35,76,81,95]. Only one study was of inadequate quality for inter-rater and intra-rater/test-retest reliability [89].
Finally, regarding the Brief-BESTest, thirteen and fourteen studies assessed inter-rater [13,15,25,27,37,41,42,49,69,78,85,91,92] and intra-rater/test-retest reliability [13,15,25,27,28,34,37,41,42,49,69,85,91,92], respectively. All studies were of doubtful quality with sufficient results.
Mean reliability and heterogeneity
Studies assessing the reliability of the BESTest and its abbreviated versions, the Mini-BESTest and Brief-BESTest, may report one or more reliability coefficients (inter-rater or intra-rater agreement) and/or an internal consistency coefficient (alpha). Separate meta-analyses were performed for each reliability coefficient and for internal consistency in each of the three scales. Alpha coefficients were reported in only four studies for the BESTest [32,41,43,71]; this low number of coefficients did not allow a meta-analysis to be carried out. Thus, a total of 8 meta-analyses were conducted.
Table 2 presents the results of the eight meta-analyses performed. Regarding the BESTest scale, nineteen samples reported an inter-rater ICC, with a mean of 0.97 (95% CI: 0.94–0.98) and wide heterogeneity (90.69%) [1,13,22,24,25,27,30,32,35,40–43,69,72–74,78]. Fig 2 presents a forest plot of these coefficients. The 24 samples that reported an intra-rater ICC (Fig 3, forest plot) showed a mean ICC of 0.94 (95% CI: 0.91–0.96) with heterogeneity of 89.70% [13,19,22–28,34,35,40–42,69,70–74,86,94]. Twenty-four samples reported an alpha coefficient of internal consistency for the Mini-BESTest [17,20,33,36,38,41,43–48,75–78,82,83,87–91,95] (Fig 4, forest plot); this meta-analysis yielded a mean alpha coefficient of 0.91 (95% CI: 0.89–0.94) with heterogeneity of 94.42%. As for inter-rater agreement, 30 samples reported an ICC; the mean ICC was 0.95 (95% CI: 0.92–0.97) with heterogeneity of 94.67% [13,15,20,25,27,29,31,33,35,39–43,45,69,71,73,75–82,85,87,89,95] (Fig 5, forest plot). For intra-rater agreement in the Mini-BESTest, reported by 33 samples, the mean ICC was 0.94 (95% CI: 0.91–0.96) with heterogeneity of 3.93% [13,15,19–21,25–28,31,33–35,38–40,42,44,45,69,73,75,80–82,84–86,89,93,95] (Fig 6, forest plot). Finally, on the Brief-BESTest scale, 7 samples reported an alpha coefficient, whose mean was 0.92 (95% CI: 0.85–0.95) with heterogeneity of 92.93% [41,42,48,49,78,91] (Fig 7, forest plot); the mean ICC for inter-rater agreement, from 13 samples, was 0.97 (95% CI: 0.94–0.98) with heterogeneity of 90.21% [13,15,25,27,37,41,42,49,69,78,85,91,92] (Fig 8, forest plot), while the mean ICC for intra-rater agreement, from 14 samples, was 0.95 (95% CI: 0.90–0.98) with heterogeneity of 93.97% [13,15,25,27,28,34,37,41,42,49,69,85,91,92] (Fig 9, forest plot).
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
Moderator analyses
The eight meta-analyses found substantial heterogeneity in the ICC and alpha coefficients, which led to moderator analyses to partly explain the variability of the reliability estimates.
BESTest scale.
Table 3 presents the results of the simple meta-regressions for continuous moderators on the inter-rater intraclass correlation of the BESTest scale. In this case, only the mean scores were significantly associated with the intraclass correlation (p = 0.014, R2 = 45.83%). The negative sign of the regression slope for mean scores indicated a decrease in the intraclass correlation as the sample mean increased.
[Figure omitted. See PDF.]
Weighted ANOVAs for the qualitative variables are shown in Table 4. Significant differences between intraclass correlations were found depending on the continent where the study was conducted (p = 0.016, R2 = 51.06%). The lowest inter-rater ICC (ICC = 0.87, n = 1) was obtained in Australia and the highest (on average) in Asia (ICC = 0.988, n = 4).
[Figure omitted. See PDF.]
Meta-regressions for continuous moderators of the intra-rater ICCs are shown in Table 5. Only the mean disorder history (in years) reached marginal statistical significance for the intra-rater intraclass correlation (p = 0.076, R2 = 79.28%), although this result should be interpreted with caution because the number of studies reporting this moderator was very small.
[Figure omitted. See PDF.]
Weighted ANOVAs for the categorical moderating variables on the intra-rater ICCs of the BESTest scale are shown in Table 6. The continent where the study was conducted reached marginal statistical significance for the intra-rater intraclass correlation (p = 0.064, R2 = 27.45%). The lowest intra-rater ICC (ICC = 0.872, n = 5) was obtained in Europe and the highest (on average) in Asia (ICC = 0.969, n = 5).
[Figure omitted. See PDF.]
Mini-BESTest scale.
The results of applying simple meta-regressions to the continuous moderating variables for the alpha coefficient in the Mini-BESTest are shown in Table 7. The standard deviation of scores was marginally significant (p = 0.073; R2 = 15.29%), with a positive regression weight indicating that an increase in the standard deviation of the sample implies an increase in the alpha coefficient. The gender moderator was significant (p = 0.042; R2 = 14.94%), with a negative weight signifying an increase in the alpha coefficient when the proportion of women decreased.
[Figure omitted. See PDF.]
Weighted ANOVAs for the categorical variables on internal consistency (alpha coefficient) on the Mini-BESTest scale are shown in Table 8. The disease moderator was marginally significant (p = 0.088; R2 = 33.24%), with patients with total knee arthroplasty showing a higher alpha coefficient than those with other diseases. The lowest coefficient was obtained in patients with type 2 diabetes.
[Figure omitted. See PDF.]
Regarding the inter-rater ICCs, simple meta-regressions for continuous moderators are shown in Table 9. In this case, the raters’ experience variable was significant (p = 0.019; R2 = 32.04%), with a negative regression weight indicating that greater rater experience was associated with a lower inter-rater ICC. The remaining variables were not significant.
[Figure omitted. See PDF.]
The weighted ANOVAs for the categorical variables on the Mini-BESTest scale for the inter-rater ICC are shown in Table 10. The population type moderator was significant (p = 0.013; R2 = 28.65%), with the normal institutionalized population showing a higher mean reliability (ICC+ = 0.992) than the mixed population (ICC+ = 0.982) or the clinical population (ICC+ = 0.959). The lowest coefficient was obtained in the normal, non-institutionalized population (ICC+ = 0.79).
[Figure omitted. See PDF.]
The simple meta-regressions of the continuous moderating variables of the Mini-BESTest for the intra-rater ICC are shown in Table 11. In this case, the mean disorder history was significant (p = 0.024, R2 = 35.51%), with a negative weight indicating that a longer disorder history was associated with lower intra-rater agreement.
[Figure omitted. See PDF.]
The weighted ANOVA for categorical variables in the Mini-BESTest for ICC (intra-rater agreement) is shown in Table 12. No moderating variables were significant in this case.
[Figure omitted. See PDF.]
Brief-BESTest scale.
The simple meta-regressions of the continuous moderating variables for the alpha coefficient on the Brief-BESTest scale are shown in Table 13. In this case, only the mean age variable was marginally significant (p = 0.094; R2 = 39.2%), with a positive regression weight indicating that an increase in the mean age of the sample was associated with an increase in the alpha coefficient. The remaining moderators were not significant.
[Figure omitted. See PDF.]
The weighted ANOVA of the categorical variables for the alpha coefficient on the Brief-BESTest scale is shown in Table 14. No categorical moderator was significant in explaining variation in the alpha coefficient.
[Figure omitted. See PDF.]
The simple meta-regressions of the continuous variables for ICCs (inter-rater agreement) in the Brief-BESTest are shown in Table 15. Mean scores were again significant (p = 0.005; R2 = 67.13) with a negative weight, indicating an increase in the group mean led to a decrease in inter-rater ICC. The rest of the continuous moderators were not significant.
[Figure omitted. See PDF.]
The weighted ANOVA of the categorical moderators on the inter-rater ICC for the Brief-BESTest is shown in Table 16. Only the continent where the study was conducted was marginally significant (p = 0.092; R2 = 50.38%), with Australia showing the lowest inter-rater agreement (ICC+ = 0.860) and South America the highest (ICC+ = 0.993).
[Figure omitted. See PDF.]
The simple meta-regressions of the continuous variables for the intra-rater ICC of the Brief-BESTest are shown in Table 17. In this case, the number of raters was significant (p = 0.009; R2 = 39.84%), with a positive regression weight (0.435) indicating that an increase in the number of raters was associated with an increase in the intra-rater ICC. The mean age of the sample was also significant (p = 0.032; R2 = 32.79%), with a negative weight indicating that an increase in mean age was associated with a decrease in the intra-rater ICC.
[Figure omitted. See PDF.]
The weighted ANOVA of the categorical variables for the intra-rater ICC in the Brief-BESTest is shown in Table 18. Design type was significant (p = 0.028; R2 = 28.17%), with experimental studies showing a higher mean reliability (ICC+ = 0.991) than observational studies (ICC+ = 0.932). Rater training was also significant (p = 0.051; R2 = 37.89%), with the combination of raters with completed and unfinished physiotherapy training obtaining higher intra-rater agreement (ICC+ = 0.998) than physiotherapists with completed training (ICC+ = 0.925) or only physiotherapists in training (ICC+ = 0.960).
[Figure omitted. See PDF.]
Analysis of publication bias
The results of Egger’s test to examine publication bias in the eight meta-analyses in this study are shown in Table 19.
[Figure omitted. See PDF.]
The absence of significance in Egger’s test rules out publication bias. In addition, funnel plots were examined and the trim-and-fill method for imputing missing data [67] was applied. Figs 10–17 present the funnel plots of the mean reliability coefficients in the eight meta-analyses carried out with the BESTest, the Mini-BESTest and the Brief-BESTest. In no case did the trim-and-fill method impute data, so publication bias was ruled out as a threat to the results of the meta-analyses.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
Discussion
We performed RG meta-analyses to determine how the reliability of test scores varies across different test applications and which factors can explain that variability. This investigation is the first meta-analysis of the inter- and intra-rater (test-retest) reliability and internal consistency of the BESTest, Mini-BESTest and Brief-BESTest. This research is important because, to guide decision making, clinicians and researchers need outcome measures capable of accurately assessing balance control in patients with neurological pathology, those with musculoskeletal problems, older adults and children without pathology, and patients with other pathologies.
Regarding the intraclass correlation coefficient (ICC) for reliability, Roach [96] and Toomey and Coote [97] classified ICC values over 0.75 as excellent, 0.40–0.75 as moderate and below 0.40 as poor reliability, and Munro et al. [98] proposed a guide for interpreting clinical significance (above 0.7 acceptable, 0.7–0.8 good, and above 0.8 excellent). The mean intraclass correlations and Cronbach’s alphas obtained for the BESTest, Mini-BESTest and Brief-BESTest in our meta-analysis indicated excellent inter- and intra-rater reliability (ICC = 0.94–0.97) and internal consistency (alpha = 0.91–0.92). Considering the guidelines of Munro et al. [98], the average reliability obtained in this study makes the three scales adequate for screening balance problems in different populations.
The methodological quality of most of the included studies of the three scales was very good (with sufficient results) for internal consistency and doubtful (with sufficient results) for inter-rater and intra-rater/test-retest reliability. Most studies obtained doubtful methodological quality because they did not indicate whether patients were stable or whether test conditions were similar. Studies should provide evidence that patients were stable between administrations to increase their methodological quality for inter-rater and intra-rater/test-retest reliability. Another aspect to consider when assessing test-retest or intra-rater reliability is an adequate time interval between the two test administrations: it should be short enough to avoid significant changes in the patient’s condition and long enough to avoid recall bias.
Large heterogeneity among coefficients was found for the BESTest, Mini-BESTest and Brief-BESTest; therefore, we performed moderator analyses to identify which study characteristics could explain this variability. For continuous moderators of the BESTest, we found that mean scores were statistically associated with inter-rater reliability, and that mean disorder history reached marginal statistical significance for intra-rater reliability. As the mean of the scale scores increases, inter-rater reliability decreases; that is, the higher the score on the BESTest scale, the lower the inter-rater reliability.
As regards the Mini-BESTest, the raters´ experience was statistically associated with inter-rater reliability: as the experience of raters increases, inter-rater reliability decreases. This may be because less experienced raters are more meticulous and rigorous in applying and evaluating the scales. Furthermore, the mean history of the disorder was significant for intra-rater reliability, indicating that a longer disorder history implied a decrease in intra-rater agreement. This may be because, as a patient with a neurological or musculoskeletal pathology becomes chronic, they adopt a series of compensations that may influence the assessment of balance control. The standard deviation of scores and gender were marginally and statistically significantly associated with internal consistency, respectively. An increase in the standard deviation of scores and a decrease in the proportion of women in the study sample imply an increase in the internal consistency of the Mini-BESTest. Although the standard deviation of test scores explained an important part of the variance, it did not reach statistical significance, possibly owing to low statistical power. The standard deviation of scores has previously been found to be a source of systematic variation in reliability coefficients [99]; psychometric theory states that the higher the SD of test scores, the higher the reliability obtained [51].
As for the Brief-BESTest, mean scores were statistically associated with inter-rater reliability, while the number of raters and the mean age of the sample were associated with intra-rater reliability. As the average scale score increases, inter-rater reliability decreases; as the number of raters increases, intra-rater reliability increases. The latter appears to be higher when several raters, rather than a single rater, administer the scale to patients on two different occasions. Furthermore, it appears that as the age of the sample increases, intra-rater reliability decreases. The mean age of the sample was also marginally significant for internal consistency, indicating that an increase in mean age led to an increase in the alpha coefficient.
As regards the qualitative moderator analyses (ANOVAs), we found that, in the BESTest, the continent where the study was conducted was a significant moderator of inter-rater reliability: the lowest inter-rater ICC was obtained in Australia and the highest (on average) in Asia. Furthermore, the continent reached marginal statistical significance for the intra-rater correlation of the BESTest and the inter-rater correlation of the Mini-BESTest. The disease and population type were marginally significant and significant moderators of the internal consistency and inter-rater reliability of the Mini-BESTest, respectively. In relation to disease, the lowest coefficient was obtained in patients with type 2 diabetes and the highest in patients with total knee arthroplasty. Balance problems may be more readily observed when assessed in patients suffering from a musculoskeletal problem associated with surgery than when patients have neuropathic involvement associated with a metabolic problem such as diabetes. Population type was also significant, with the normal institutionalized population showing higher inter-rater reliability than the mixed or clinical populations; the lowest coefficient was obtained in the normal, non-institutionalized population.
For the Brief-BESTest, the type of design was significant, with experimental studies showing higher mean intra-rater reliability than observational studies. One explanation is that the number of experimental studies was considerably lower than that of observational studies and that, in the former, evaluations can be conducted by expert raters. Rater training was also significant, with the combination of raters with completed and unfinished physiotherapy training obtaining higher intra-rater agreement than physiotherapists with completed training or only physiotherapists in training.
Limitations
Our study has several limitations. The number of studies reporting reliability estimates with the data at hand is considerably smaller for the BESTest and especially for the Brief-BESTest. This, together with the lack of important data reported by authors, reduced the possibility of analyzing their influence as potential moderating variables on reliability coefficients. In particular, many studies did not report the mean and standard deviation of disorder history or the raters’ experience with the scale (BESTest, Mini-BESTest or Brief-BESTest). Furthermore, some studies did not report the mean and standard deviation of test scores, two essential moderators in the context of RG studies.
Conclusions
The main findings of the current RG meta-analysis indicate that the BESTest, Mini-BESTest and Brief-BESTest present, on average, excellent reliability and internal consistency values. These outcome measures can be recommended for the screening of balance control and balance impairments. Several continuous and categorical moderator variables were associated with the reliability and internal consistency of these scales: mean scores, standard deviation of scores, mean age, gender, population type, mean disorder history, disease, raters´ experience, number of raters, rater training, continent of study and design type presented statistically significant relationships with the ICC and/or Cronbach´s alpha for the BESTest and its two abbreviated versions.
Supporting information
S1 Table. Search Strategy.
https://doi.org/10.1371/journal.pone.0318302.s001
(DOCX)
S2 Table. Studies included and excluded.
https://doi.org/10.1371/journal.pone.0318302.s002
(XLSX)
S3 Table. Checklist PRISMA.
https://doi.org/10.1371/journal.pone.0318302.s003
(DOCX)
S4 Table. Characteristics of the included studies.
https://doi.org/10.1371/journal.pone.0318302.s004
(DOCX)
S5 Table. Evaluation of methodological quality.
https://doi.org/10.1371/journal.pone.0318302.s005
(DOCX)
References
1. 1. Horak FB, Wrisley DM, Frank J. The balance evaluation systems test (BESTest) to differentiate balance deficits. Phys Ther. 2009;89(5):484–98. pmid:19329772
2. 2. Horak FB, Shupert CL, Mirka A. Components of postural dyscontrol in the elderly: a review. Neurobiol Aging. 1989;10(6):727–38. pmid:2697808
3. 3. Woollacott MH, Shumway-Cook A. Attention and the control of posture and gait: a review of an emerging area of research. Gait & Posture. 2002;16(1):1–14. https://doi.org/10.1016/s0966-6362(01)00156-4
4. 4. Nutt J, Horak FB. Gait and balance disorders. In: Asbury AK, McKhann GM, McDonald WI, et al. eds. Diseases of the nervous system: clinical neuroscience and therapeutic principles. 3rd ed. Cambridge, United Kingdom: Cambridge University Press; 2002:581–591.
5. 5. Bernstein NA. The co-ordination and regulation of movements. Oxford, NY: Pergamon Press; 1967.
6. 6. Horak FB, Shumway-Cook A. Clinical implications of posture control research. In: Duncan P, ed. Balance: proceedings of the APTA forum. Alexandria, VA: American Physical Therapy Association; 1990:105–111.
7. 7. Horak FB. Effects of neurological disorders on postural movement strategies in the elderly. In: Vellas B, Toupet M, Rubenstein L, et al. eds. Falls, balance, and gait disorders in the elderly. Paris, France: Elsevier Science Publishers; 1992:137–151.
8. 8. Geurts ACH, Haart M, van Nes IJW, Duysens J. A review of standing balance recovery from stroke. Gait Posture. 2005;22(3):267–81.
9. 9. Huxham FE, Goldie PA, Patla AE. Theoretical considerations in balance assessment. Aust J Physiother. 2001;47(2):89–100. pmid:11552864
10. 10. Rubenstein LZ. Falls in older people: epidemiology, risk factors and strategies for prevention. Age Ageing. 2006;35(2):ii37–41.
11. 11. Gerdhem P, Ringsberg KA, Akesson K, Obrant KJ. Clinical history and biologic age predicted falls better than objective functional tests. J Clin Epidemiol. 2005;58(3):226–32. pmid:15718110
12. 12. Pollock CL, Eng JJ, Garland SJ. Clinical measurement of walking balance in people post stroke: a systematic review. Clin Rehabil. 2011;25(8):693–708. pmid:21613511
13. 13. Marques A, Almeida S, Carvalho J, Cruz J, Oliveira A, Jácome C. Reliability, validity, and ability to identify fall status of the balance evaluation systems test, mini-balance evaluation systems test, and Brief-balance evaluation systems test in older people living in the community. Arch Phys Med Rehabil. 2016;97(12):2166–73.e1. pmid:27497826
14. 14. Franchignoni F, Horak F, Godi M, Nardone A, Giordano A. Using psychometric techniques to improve the balance evaluation systems test: The mini-BESTest. J Rehabil Med. 2010;42(4):323–31. pmid:20461334
15. 15. Nakhostin-Ansari A, Nakhostin-Ansari N, Mellat-Ardakani M, Nematizad M, Naghdi S, Babaki M, et al. Reliability and validity of Persian versions of Mini-BESTest and Brief-BESTest in persons with Parkinson’s disease. Physiother Theory Pract. 2022;38(9):1264–72.
16. 16. Padgett PK, Jacobs JV, Kasser SL. Is the BESTest at its best? A suggested brief version based on interrater reliability, validity, internal consistency, and theoretical construct. Phys Ther. 2012;92(9):1197–207. pmid:22677295
17. 17. Wallén MB, Sorjonen K, Löfgren N, Franzén E. Structural validity of the mini-balance evaluation systems test (Mini-BESTest) in people with mild to moderate Parkinson disease. Phys Ther. 2016;96(11):1799–806.
18. 18. Godi M, Arcolin I, Leavy B, Giardini M, Corna S, Franzén E. Insights into the Mini-BESTest scoring system: comparison of 6 different structural models. Phys Ther. 2021;101(10):pzab180.
19. 19. Yingyongyudha A, Saengsirisuwan V, Panichaporn W, Boonsinsukh R. The mini-balance evaluation systems test (Mini-BESTest) demonstrates higher accuracy in identifying older adult participants with history of falls than do the BESTest, Berg balance scale, or timed up and go test. J Geriatr Phys Ther. 2016;39(2):64–70. pmid:25794308
20. 20. Phyu SN, Peungsuwan P, Puntumetakul R, Chatchawan U. Reliability and validity of mini-balance evaluation system test in type 2 diabetic patients with peripheral neuropathy. Int J Environ Res Public Health. 2022a;19(11):6944.
21. 21. Phyu SN, Wanpen S, Chatchawan U. Responsiveness of the mini-balance evaluation system test in type 2 diabetic patients with peripheral neuropathy. J Multidiscip Healthc. 2022b;15:3015–28.
22. 22. Chinsongkram B, Chaikeeree N, Saengsirisuwan V, Viriyatharakij N, Horak FB, Boonsinsukh R. Reliability and validity of the balance evaluation systems test (BESTest) in people with subacute stroke. Phys Ther. 2014;94(11):1632–43. pmid:24925073
23. 23. Madsalae T, Thongprong T, Chinkulprasert C, Boonsinsukh R. Can the balance evaluation systems test be used to identify system-specific postural control impairments in older adults with chronic neck pain? Front Med (Lausanne). 2022;9:1012880. pmid:36388898
24. 24. Rodrigues LC, Marques AP, Barros PB, Michaelsen SM. Reliability of the balance evaluation systems test (BESTest) and BESTest sections for adults with hemiparesis. Braz J Phys Ther. 2014;18(3):276–81. pmid:25003281
25. 25. Pereira Viveiro LA, Vieira Gomes GC, Ribeiro Bacha JM, Carvas Junior NC, Esteves Kallas M, Reis M, et al. Reliability, validity, and ability to identify fall status of the Berg balance scale, balance evaluation systems test (BESTest), Mini-BESTest, and Brief-BESTest in older adults who live in nursing homes. J Geriatr Phys Ther. 2019;42(4):E45–54.
26. 26. Maia AC, Rodrigues-de-Paula F, Magalhaes LC, Teixeira RLL. Cross-cultural adaptation and analysis of the psychometric properties of the balance evaluation systems test and miniBESTest in the elderly and individuals with Parkinson’s disease: application of the Rasch model. Braz J Phys Ther. 2013;17(3):195–217.
27. 27. Jácome C, Cruz J, Oliveira A, Marques A. Validity, reliability, and ability to identify fall status of the berg balance scale, BESTest, mini-BESTest, and brief-BESTest in patients with COPD. Phys Ther. 2016;96(11):1807–15. pmid:27081201
28. 28. Jácome C, Flores I, Martins F, Castro C, McPhee C, Shepherd E, et al. Validity, reliability and minimal detectable change of the balance evaluation systems test (BESTest), mini-BESTest and brief-BESTest in patients with end-stage renal disease. Disabil Rehabil. 2018;40(26):3171–6.
29. 29. Naghdi S, Nakhostin-Ansari N, Forogh B, Khalifeloo M, Honarpisheh, Nakhostin-Ansari A. Reliability and validity of the Persian version of the mini-balance evaluation systems test in patients with stroke. Neurol Ther. 2020;9(2):567–74.
30. 30. Bahirei S, Hosseini E, Amiri Jomi Lou R. The test-retest reliability and limits of agreement of the balance evaluation systems test (BESTest) in young people with intellectual disability. Sci Rep. 2023;13(1):15968.
31. 31. Lemay J-F, Roy A, Nadeau S, Gagnon DH. French version of the mini BESTest: a translation and transcultural adaptation study incorporating a reliability analysis for individuals with sensorimotor impairments undergoing functional rehabilitation. Ann Phys Rehabil Med. 2019;62(3):149–54. pmid:30594663
32. 32. Severijns P, Overbergh T, Scheys L, Moke L, Desloovere K. Reliability of the balance evaluation systems test and trunk control measurement scale in adult spinal deformity. PLoS One. 2019;14(8):e0221489. pmid:31449540
33. 33. Lampropoulou SI, Billis E, Gedikoglou IA, Michailidou C, Nowicky AV, Skrinou D, et al. Reliability, validity and minimal detectable change of the Mini-BESTest in Greek participants with chronic stroke. Physiother Theory Pract. 2019;35(2):171–82. pmid:29474129
34. 34. Kondo Y, Bando K, Ariake Y, Katsuta W, Todoroki K, Nishida D, et al. Test-retest reliability and minimal detectable change of the balance evaluation systems test and its two abbreviated versions in persons with mild to moderate spinocerebellar ataxia: a pilot study. NeuroRehabilitation. 2020;47(4):479–86. pmid:33136076
35. 35. Hamre C, Botolfsen P, Tangen GG, Helbostad JL. Interrater and test-retest reliability and validity of the Norwegian version of the BESTest and mini-BESTest in people with increased risk of falling. BMC Geriatr. 2017;17(1):92. pmid:28427332
36. 36. Goljar N, Giordano A, Vrbanic TSL, Rudoff M, Banicek-Sosa I, Albensi C, et al. Rasch validation and comparison of Slovenian, Croatian, and Italian versions of the Mini-BESTest in patients with subacute stroke. Int J Rehabil Res. 2017;40(3):232–9.
37. 37. Aydogan Arslan SA, Demirci CS, Kirmaci ZIK, Ugurlu K, Keskin ED. Reliability and validity of Turkish version of the brief-BESTest in stroke patients. Top Stroke Rehabil. 2021;28(7):488–97.
38. 38. Göktas A, Colak FD, Kar I, Ekici G. Reliability and validity of the Turkish version of the Mini-BESTest balance scale in patients with stroke. Turk J Neurol. 2020;26:303–10.
39. 39. Dogrouz Karatekin BD, Icagasioglu A, Pasin O. Validity, reliability and minimal detectable change of mini-BESTest Turkish version in neurological disorders. Acta Neurol Belg. 2023;123(4):1519–25.
40. 40. Dewar R, Claus AP, Tucker K, Ware R, Johnston LM. Reproducibility of the balance evaluation systems test (BESTest) and the mini-BESTest in school-aged children. Gait Posture. 2017;55:68–74. pmid:28419876
41. 41. Chan ACM, Pang MYC. Assessing balance function in patients with total knee arthroplasty. Phys Ther. 2015;95(10):1397–407. pmid:25882482
42. 42. Chiu AYY, Pang MYC. Assessment of psychometric properties of various balance assessment tools in persons with cervical spondylotic myelopathy. J Orthop Sports Phys Ther. 2017;47(9):673–82. pmid:28704622
43. 43. Dominguez-Olivan P, Gasch-Gallen A, Aguas-Garcia E, Bengoetxea A. Validity and reliability testing of the Spanish version of the BESTest and mini-BESTest in healthy community-dwelling elderly. BMC Geriatr. 2020;20(1):444. pmid:33148216
44. 44. Alyousef NI, Shaheen AAM, Elsayed W, Alsubiheen AM, Farrag A. Pyschometric properties of the Arabic version of the Mini-Balance evaluation systems test in patients with neurological balance disorders. Eur Rev Med Pharmacol Sci. 2023;27(10):4337–47. pmid:37259714
45. 45. Alqahtani BA, Alhowimel AS, Alshehri MM, Alqahtani MA, Almuhaysh AA, Alshakarah AO, et al. Cross-cultural adaptation and validation of the Arabic version of the mini-BESTest among community-dwelling older adults in Saudi Arabia. Healthcare (Basel). 2022;10(10):1903. pmid:36292350
46. 46. Franchignoni F, Godi M, Guglielmetti S, Nardone A, Giordano A. Enhancing the usefulness of the Mini-BESTest for measuring dynamic balance: a Rasch validation study. Eur J Phys Rehabil Med. 2015;51(4):429–37.
47. 47. Franchignoni F, Godi M, Corna S, Giordano A. Rasch validation of the Mini-BESTest in people with Parkinson disease. J Neurol Phys Ther. 2022;46(3):219–26. pmid:35404882
48. 48. Godi M, Giardini M, Arcolin I, Ferrante S, Nardone A, Corna S, et al. Is the brief-BESTest brief enough? Suggested modifications based on structural validity and internal consistency. Phys Ther. 2019;99(11):1562–73. pmid:31348513
49. 49. Bravini E, Nardone A, Godi M, Guglielmetti S, Franchignoni F, Giordano A. Does the brief-BESTest meet classical test theory and rasch analysis requirements for balance assessment in people with neurological disorders? Phys Ther. 2016;96(10):1610–9. pmid:27103223
50. 50. Streiner D L, Norman GR. Health measurement scales: A practical guide to their development and use. (4th ed.). Oxford University Press; 2008.
51. 51. Appelbaum M, Cooper H, Kline RB, Mayo‐Wilson E, Nezu AM, Rao SM. Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board task force report. Am Psychol. 2018;73(1):3–25. https://doi.org/10.1037/amp.0000191
52. 52. Vacha‐Haase T, Kogan LR, Thompson B. Sample compositions and variabilities in published studies versus those in test manuals: Validity of score reliability inductions. Educ Psychol Meas. 2000;60(4):509–22.
53. 53. Vacha‐Haase T. Reliability generalization: exploring variance in measurement error affecting score reliability across studies. Educ Psychol Meas. 1998;58:6–20.
54. 54. Henson RK, Thompson B. Characterizing measurement error in scores across studies: some recommendations for conducting “reliability generalization” studies. Measurement and Evaluation in Counseling and Development. 2002;35(2):113–27.
55. 55. Rodriguez MC, Maeda Y. Meta‐analysis of coefficient alpha. Psychol Methods. 2006;11(3):306–22. pmid:16953707
56. 56. Sánchez‐Meca J, López‐López JA, López‐Pina JA. Some recommended statistical analytic practices when reliability generalization studies are conducted. Brit J Math Stat Psy. 2013;66(3):402–25.
57. 57. Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med. 2009;151(4):264–9, W64. pmid:19622511
58. 58. Mokkink LB, Elsman EBM, Terwee CB. COSMIN guideline for systematic reviews of patient-reported outcome measures version 2.0. Qual Life Res. 2024;33(11):2929–39. pmid:39198348
59. 59. Bonett DG. Sample size requirements for testing and estimating coefficient alpha. J Educ Behav Stat. 2002;27:335–40.
60. 60. Hartung J. An alternative method for meta-analysis. Biom J. 1999;41(8):901–16.
61. 61. Sánchez-Meca J, Marín-Martínez F. Confidence intervals for the overall effect size in random-effects meta-analysis. Psychol Methods. 2008;13(1):31–48. pmid:18331152
62. 62. López-López JA, Botella J, Sánchez-Meca J, Marín-Martinez F. Alternative for mixed-effects meta-regression models in the reliability generalization approach: A simulation study. J Educ Behav Stat. 2013;38:443–69.
63. 63. Knapp G, Hartung J. Improved tests for a random effects meta-regression with a single covariate. Stat Med. 2003;22(17):2693–710. pmid:12939780
64. 64. Borenstein J, Hedges LV, Higgins JPT, Rothstein H. Introduction to meta-analysis. Chichester, UK: Wiley; 2009.
65. 65. López-López JA, Marín-Martínez F, Sánchez-Meca J, Van den Noortgate W, Viechtbauer W. Estimation of the predictive power of the model in mixed-effects meta-regression: A simulation study. Br J Math Stat Psychol. 2014;67:30–48.
66. 66. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36(3):1–48.
67. 67. Duval SJ, Tweedie RL. A non-parametric “trim and fill” method of accounting for publication bias in meta-analysis. JASA. 2000;95:89–98.
68. 68. Rothstein HR, Sutton AJ, Borenstein M, editors. Publication bias in meta-analysis: prevention, assessment, and adjustments. Chichester, UK: Wiley; 2005.
69. 69. Huang MH, Miller K, Smith K, Fredrickson K, Shilling T. Reliability, validity, and minimal detectable change of balance evaluation systems test and its short versions in older cancer survivors: a pilot study. J Geriatr Phys Ther. 2016;39(2):58–63. pmid:25695466
70. 70. Mitchell KD, Chen H, Silfies SP. Test-retest reliability, validity, and minimal detectable change of the balance evaluation systems test to assess balance in persons with multiple sclerosis. Int J MS Care. 2018;20(5):231–7. pmid:30374253
71. 71. Potter K, Anderberg L, Anderson D, Bauer B, Beste M, Navrat S, et al. Reliability, validity and responsiveness of the balance evaluation systems test (BESTest) in individuals with multiple sclerosis. Physiotherapy. 2018;104(1):142–8.
72. 72. Wang-Hsu E, Smith SS. Interrater and test-retest reliability and minimal detectable change of the balance evaluation systems test (BESTest) and subsystems with community-dwelling older adults. J Geriatr Phys Ther. 2018;41(3):173–9. pmid:28079632
73. 73. Leddy A, Crowner BE, Earhart GM. Functional gait assessment and balance evaluation system test: reliability, validity, sensitivity, and specificity for identifying individuals with Parkinson disease who fall. Phys Ther. 2011a;91(1):102–13.
74. 74. Leddy A, Crowner BE, Earhart GM. Utility of the mini-BESTest, BESTest, and BESTest sections for balance assessments in individuals with Parkinson disease. J Neurol Phys Ther. 2011b;35(2):90–7.
75. 75. Löfgren N, Lenholm E, Conradsson D, Stahle A, Franzén E. The Mini-BESTest –a clinically reproducible tool for balance evaluations in mild to moderate Parkinson’s disease? BMC Neurol. 2014;14:235. pmid:25496796
76. 76. Molhemi F, Monjezi S, Mehravar M, Shaterzadeh-Yazdi M-J, Majdinasab N. Validity, reliability, and responsiveness of Persian version of mini-balance evaluation system test among ambulatory people with multiple sclerosis. Physiother Theory Pract. 2022;40(3):565–75.
77. 77. Oyama C, Otaka Y, Onitsuka K, Takagi H, Tan E, Otaka E. Reliability and validity of the Japanese version of the mini-balance evaluation systems test in patients with subacute stroke. Prog Rehabil Med. 2018;3:20180015. pmid:32789240
* View Article
* PubMed/NCBI
* Google Scholar
78. 78. Padgett P, Jacobs JV, Kasser SL. Is the BESTest at its best? A suggested brief version base on interrater reliability, validity, internal consistency, and theoretical construct. Phys Ther. 2012;92(9):1197–207.
* View Article
* Google Scholar
79. 79. Ross E, Purtill H, Coote S. Inter-rater reliability of mini balance evaluation system test in ambulatory people with multiple sclerosis. Int J Ther Rehabil. 2016;23(12):583–9.
* View Article
* Google Scholar
80. 80. Roy A, Higgins J, Nadeau S. Reliability and minimal detectable change of the mini-BESTest in adults with spinal cord injury in a rehabilitation setting. Physiother Theory Pract. 2021;37(1):126–34. pmid:31156010
* View Article
* PubMed/NCBI
* Google Scholar
81. 81. Schenstedt C, Brombacher S, Hartwigsen G, Weisser B, Möller B, Deuschl G. Comparison of the fullerton advanced balance scale, mini-BESTest, and berg balance scale to predict falls in parkinson disease. Phys Ther. 2016;96(4):494–501.
* View Article
* Google Scholar
82. 82. Tsang CSL, Liao L-R, Chung RCK, Pang MYC. Psychometric properties of the mini-balance evaluation systems test (Mini-BESTest) in community-dwelling individuals with chronic stroke. Phys Ther. 2013;93(8):1102–15. pmid:23559522
* View Article
* PubMed/NCBI
* Google Scholar
83. 83. Wagner S, Bring A, Äsenlöf P. Construct validity of the Mini-BESTest in individuals with chronic pain in specialized pain care. BMC Musculoskelet Disord. 2023;24(1):391. pmid:37198616
* View Article
* PubMed/NCBI
* Google Scholar
84. 84. Wallin A, Kierkegaard M, Franzén E, Johansson S. Test-retest reliability of the mini-BESTest in people with mild to moderate multiple sclerosis. Phys Ther. 2021;101(5):pzab045.
* View Article
* Google Scholar
85. 85. Winairuk T, Pang MYC, Saengsirisuwan V, Horak FB, Boonsinsukh R. Comparison of measurement properties of three shortened versions of the balance evaluation system test (BESTest) in people with subacute stroke. J Rehabil Med. 2019;51(9):683–91. pmid:31448806
* View Article
* PubMed/NCBI
* Google Scholar
86. 86. Anson E, Thompson E, Ma L, Jeka J. Reliability and fall risk detection for the BESTest and Mini-BESTest in older adults. J Geriatr Phys Ther. 2019;42(2):81–5. pmid:28448278
* View Article
* PubMed/NCBI
* Google Scholar
87. 87. Bustamante-Contreras C, Ojeda-Gallardo Y, Rueda-Sanhueza C, Rosset PO, Martínez-Carrasco C. Spanish version of the mini-BESTest: a translation, transcultural adaptation and validation study in patients with Parkinson’s disease. Int J Rehabil Res. 2020;43(2):129–34.
* View Article
* Google Scholar
88. 88. Cramer E, Weber F, Faro G, Klein M, Willeke D, Hering T, et al. Cross-cultural adaptation and validation of the German version of the Mini-BESTest in individuals after stroke: an observational study. Neurol Res Pract. 2020;2:27. pmid:33324929
* View Article
* PubMed/NCBI
* Google Scholar
89. 89. Godi M, Franchignoni F, Caligari M, Giordano A, Turcato AM, Nardone A. Comparison of reliability, validity and responsiveness of the mini-BESTest and Berg Balance Scale in patients with balance disorders. Phys Ther. 2013;93(2):158–67. pmid:23023812
* View Article
* PubMed/NCBI
* Google Scholar
90. 90. Jorgensen V, Opheim A, Halvarsson A, Franzén E, Roaldsen KS. Comparison of the Berg balance scale and the mini-BESTest for assessing balance in ambulatory people with spinal cord injury: validation study. Phys Ther. 2017;97(6):677–87. pmid:28371940
* View Article
* PubMed/NCBI
* Google Scholar
91. 91. Huang M, Pang MYC. Psychometric properties of Brief-balance evaluation systems test (Brief-BESTest) in evaluating balance performance in individuals with chronic stroke. Brain Behav. 2017;7(3):e00649. pmid:28293482
* View Article
* PubMed/NCBI
* Google Scholar
92. 92. Leung RWM, Alison JA, McKeough ZJ. Inter-rater and intra-rater reliability of the brief-BESTest in people wiht chronic obstructive pulmonary disease. Clin Rehabil. 2019;33(1):104–12. pmid:30086676
* View Article
* PubMed/NCBI
* Google Scholar
93. 93. Chan K, Unger J, Lee JW, Johnston G, Constand M, Masani K, et al. Quantifying balance control after spinal cord injury: reliability and validity of the mini-BESTest. J Spinal Cord Med. 2019;42(1):141–48.
* View Article
* Google Scholar
94. 94. Levin I, Lewek MD, Giuliani C, Faldowski R, Thorpe DE. Test-retest reliability and minimal detectable change for measures of balance and gait in adults with cerebral palsy. Gait Posture. 2019;72:96–101. pmid:31177021
* View Article
* PubMed/NCBI
* Google Scholar
95. 95. Gylfadottir S, Arnadottir SA, Reynisdottir SM, Helgadottir B, Sigurgeirsson AT, Gudjonsdotir M. Evaluating the reliability and validity of the Icelandic translation of the Mini-BESTest in rehabilitation patients: an international implication for balance assessment. Physiother Theory Pract. 2023;1–10.
* View Article
* Google Scholar
96. 96. Roach K. Measurement of health outcomes: reliability, validity and responsiveness. JPO. 2006;18(1S):8–12.
* View Article
* Google Scholar
97. 97. Toomey E, Coote S. Between-rater reliability of the 6-minute walk test, Berg balance scale, and handheld dynamometry in people with multiple sclerosis. Int J MS Care. 2013;15:1–6.
* View Article
* Google Scholar
98. 98. Munro B. Statistical Methods for Health Care Research, 5th edn. Philadelphia: Lippincott Williams and Wilkins; 2005.
99. 99. Botella J, Suero M, Gambara H. Psychometric inferences from a meta‐analysis of reliability and internal consistency coefficients. Psychol Methods. 2010;15(4):386–97. pmid:20853953
* View Article
* PubMed/NCBI
* Google Scholar
Citation: Meseguer-Henarejos A-B, López-García J-J, López-Pina J-A, Martínez-González-Moro I, Martínez-Carrasco Á (2025) The balance evaluation systems test (BESTest), mini-BESTest and brief-BESTest as clinical tools to assess balance control across different populations: A reliability generalization meta-analysis. PLoS ONE 20(4): e0318302. https://doi.org/10.1371/journal.pone.0318302
About the Authors:
Ana-Belén Meseguer-Henarejos
Roles: Conceptualization, Data curation, Investigation, Methodology, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing
E-mail: [email protected]
Affiliation: Department of Physiotherapy, University of Murcia, Murcia, Spain
ORCID: https://orcid.org/0000-0002-9726-7558
Juan-José López-García
Roles: Formal analysis, Methodology, Software
Affiliation: Department of Basic Psychology and Methodology, University of Murcia, Murcia, Spain
José-Antonio López-Pina
Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing
Affiliation: Department of Basic Psychology and Methodology, University of Murcia, Murcia, Spain
ORCID: https://orcid.org/0000-0003-1347-7759
Ignacio Martínez-González-Moro
Roles: Data curation, Visualization, Writing – review & editing
Affiliation: Department of Physiotherapy, University of Murcia, Murcia, Spain
ORCID: https://orcid.org/0000-0002-3664-2115
Ángel Martínez-Carrasco
Roles: Data curation, Visualization, Writing – review & editing
Affiliation: Department of Physiotherapy, University of Murcia, Murcia, Spain
© 2025 Meseguer-Henarejos et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
Background
The Balance Evaluation Systems Test (BESTest) and two abbreviated versions, Mini-BESTest and Brief-BESTest, are used to assess functioning of balance control systems. Their reliability across different populations remains to be determined.
Objective
The present study followed reliability generalization procedures to estimate the average internal consistency and the average inter- and intra-rater reliability of the BESTest, Mini-BESTest and Brief-BESTest. The heterogeneity of the reliability coefficients is also evaluated for each instrument; when heterogeneity is significant, a moderator analysis is performed to identify the characteristics that explain this variability.
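To make the analytic approach concrete, the sketch below pools a set of ICCs with a random-effects model and tests their heterogeneity with Cochran’s Q, the step that determines whether a moderator analysis is warranted. This is an illustrative outline only, not the authors’ code (the review relies on the metafor package in R [66]); the coefficients, sample sizes, Fisher z transformation and DerSimonian–Laird estimator shown here are assumptions chosen for the example.

# Minimal sketch (Python) of random-effects pooling of reliability coefficients,
# as done in reliability generalization. Data are hypothetical, not from the review.
import numpy as np
from scipy import stats

icc = np.array([0.95, 0.92, 0.97, 0.90, 0.96])   # hypothetical study-level ICCs
n   = np.array([40, 25, 60, 30, 50])             # hypothetical sample sizes

z = np.arctanh(icc)        # Fisher z transform of each coefficient
v = 1.0 / (n - 3)          # approximate within-study variance of z
w = 1.0 / v                # fixed-effect (inverse-variance) weights

# Heterogeneity: Cochran's Q and DerSimonian-Laird estimate of tau^2
z_fixed = np.sum(w * z) / np.sum(w)
Q = np.sum(w * (z - z_fixed) ** 2)
df = len(z) - 1
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / c)

# Random-effects pooled estimate, back-transformed to the ICC metric
w_re = 1.0 / (v + tau2)
z_re = np.sum(w_re * z) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))
pooled_icc = np.tanh(z_re)
ci_low, ci_high = np.tanh(z_re - 1.96 * se_re), np.tanh(z_re + 1.96 * se_re)
p_Q = stats.chi2.sf(Q, df)

print(f"Pooled ICC = {pooled_icc:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
print(f"Q = {Q:.2f} (df = {df}, p = {p_Q:.3f}), tau^2 = {tau2:.4f}")

A significant Q statistic would justify a mixed-effects meta-regression on study characteristics (for example, population type or raters’ experience), which is the moderator analysis referred to above.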
Methods
A search of the PubMed, Embase, PsycINFO, Web of Science, Scopus and CINAHL databases was carried out up to February 10, 2024. Two reviewers independently selected empirical studies published in English or Spanish that applied the BESTest, Mini-BESTest and/or Brief-BESTest and reported any reliability coefficient and/or internal consistency estimate computed with the data at hand.
Results
Sixty-four studies reported a reliability estimate for BESTest, Mini-BESTest and/or Brief-BESTest scores (N = 5,225 participants). The mean Cronbach’s alpha for the Mini-BESTest and the Brief-BESTest total scores was 0.92, indicating no variability in estimated internal consistency. Likewise, no variability was obtained for the mean inter-rater and intra-rater agreement of the BESTest (ICC = 0.97 and 0.94, respectively), Mini-BESTest (ICC = 0.95 and 0.94) and Brief-BESTest (ICC = 0.96 and 0.95). Mean scores, standard deviation of scores, mean age, gender, population type, mean history of the disorder, disease, raters’ experience, number of raters, rater training, continent of study and design type showed statistically significant relationships with the ICC and/or Cronbach’s alpha for the BESTest and its two abbreviated versions.
Conclusions
The mean intraclass correlation coefficients and Cronbach’s alpha values obtained for the BESTest, Mini-BESTest and Brief-BESTest indicate excellent inter- and intra-rater reliability and internal consistency. The average reliability obtained makes the three scales adequate for screening balance problems in different populations. Some continuous and categorical moderator variables increase the reliability and internal consistency of these scales.