Objective
To identify the available Patient-reported Outcome Measures (PROMs) for assessing musculoskeletal complaints in para-athletes and to evaluate their psychometric properties, ultimately providing recommendations for their use.
Methods
PubMed, Embase, Scopus, Cochrane Library, Web of Science, ProQuest, and COSMIN Database were searched until October 2024. Full report studies evaluating psychometric properties of PROMs in para-sport participants with musculoskeletal pain were included. The methodology followed the Consensus-based Standards for selection of Health Measurement Instruments (COSMIN) and PRISMA-COSMIN for OMIs 2024 Guidelines.
Results
Six articles reported information on the Wheelchair User's Shoulder Pain Index (WUSPI), the Shoulder Pain Scale for Wheelchair Basketball (SPS-WB), and the Shoulder Pain Index for Wheelchair Basketball (SPI-WB). Content validity was inconsistent for WUSPI and SPS-WB. Structural validity was indeterminate for SPS-WB. Internal consistency was sufficient for SPS-WB but indeterminate for WUSPI and SPI-WB. Reliability was sufficient for WUSPI and SPI-WB. Convergent validity was sufficient for WUSPI but indeterminate for SPS-WB and SPI-WB. Discriminative validity was insufficient for WUSPI and SPI-WB and indeterminate for SPS-WB.
Conclusions
Evidence on PROMs for musculoskeletal complaints in para-sports focuses on shoulder pain in wheelchair basketball, showing low to very low-quality evidence. The WUSPI and SPS-WB can be used considering their limitations.
Para-athletes' participation in competitions has risen considerably (Derman et al., 2018; Heyward et al., 2017), with further growth expected (International Paralympic Committee, 2019). Sport offers critical health benefits (2018 Physical Activity Guidelines Advisory Committee, 2018; Blauwet et al., 2016; Pinheiro et al., 2021) but poses a risk of musculoskeletal injuries (Heyward et al., 2017; Pinheiro et al., 2021; Soligard et al., 2019). Consequently, epidemiological research to protect para-athlete health has significantly increased (Derman et al., 2018; Pinheiro et al., 2021; Soligard et al., 2022), including methodological guidelines to improve data collection on Paralympic sport-related injuries and illnesses (SRIIPS) (Fagher, Jacobsson, et al., 2016).
Musculoskeletal injuries are highly prevalent among para-athletes, with a pooled prevalence of 40.8 % across 30 studies involving 12,151 participants, as estimated in a prior systematic review (Pinheiro et al., 2021). During the Tokyo 2020 Paralympic Games, 342 sports-related injuries were reported among 316 para-athletes, reflecting a 7.7 % injury rate and an incidence of 6.3 per 1000 athlete days (Derman et al., 2022). Acute injuries accounted for 66 % of cases, while 34 % were attributed to overuse (Derman et al., 2022). Longitudinal studies reveal that overuse injuries surpass acute injuries beyond competitions (Derman et al., 2021; Fagher et al., 2016a, 2020; Hirschmüller et al., 2021). A cross-sectional study found that 50 %–79 % of injuries developed gradually, mainly affecting the shoulder, knee, and lower back (Fagher et al., 2020). Hirschmüller et al. reported a mean weekly prevalence of 13 % for overuse injuries compared to 5 % for acute injuries, with incidence rates of 4.8 and 2.8 per 1000 athlete days, respectively (Hirschmüller et al., 2021). These findings indicate an elevated risk of overuse injuries in para-athletes (Brook et al., 2019; Derman et al., 2013, 2016, 2018, 2020, 2022; Pinheiro et al., 2021), particularly those with limb deficiencies (Fagher et al., 2020), which can lead to pain and functional limitations (Derman et al., 2021; Fagher et al., 2020; Hirschmüller et al., 2021; Liampas et al., 2021).
Injuries should be evaluated based on their functional and symptomatic impact (Bahr, 2009), particularly overuse injuries, which are common in para-athletes (Derman et al., 2020, 2021, 2022; Fagher et al., 2016a, 2020; Heyward et al., 2017; Hirschmüller et al., 2021). Symptoms often fluctuate over time (Esteve et al., 2020), complicating assessment and management. Para-athletes frequently experience comorbidities related to their impairment during sports practice (Derman et al., 2021). Many complaints go unreported as they may be perceived as part of the natural progression of a chronic condition (Shrier et al., 2017). Over time, the progression of the impairment can worsen baseline health (Derman et al., 2021), affecting not only athletic performance, but also daily living activities, social participation, and quality of life (Martin, 2013; Weiler et al., 2016). This highlights the need for periodic follow-up to fully understand the extent of the issue (Derman et al., 2021; van Mechelen et al., 1992; Weiler et al., 2016). When reporting severity and frequency, all symptoms (not just pain) and the athlete's perspective should be considered alongside pathophysiology (Bahr, 2009).
Patient-reported outcome measures (PROMs) are the gold standard for registering athletes’ perspectives and reliably assessing their functional status (Nasser et al., 2022; Padua et al., 2021; Prinsen et al., 2018). There are many generic or condition-specific questionnaires to track symptoms in overuse injuries over time (Clarsen et al., 2013; Gómez-Valero et al., 2017; Hawker, 2017). Many of these tools have been validated in cross-cultural contexts (Collins et al., 2016; de Souza et al., 2015; Hernandez-Sanchez et al., 2011; Hirschmüller et al., 2017; Jorgensen et al., 2016; Karaismailoglu et al., 2020; Keller et al., 2018; Martin et al., 1996; Martínez-Cal et al., 2023; Wang et al., 2015) and as short-form versions (Beaton et al., 2005; Swiontkowski et al., 1999). However, their effectiveness in tracking complaints among para-athletes remains unclear (Curtis et al., 1995a, 1995b; Fagher, Jacobsson, et al., 2016; García-Gómez et al., 2020). Consequently, there is insufficient knowledge to recommend PROMs for use in research or clinical settings. A systematic investigation of their psychometric properties is required (Prinsen et al., 2018).
This study aimed to identify PROMs developed and/or validated for assessing musculoskeletal complaints in para-athletes, and to evaluate their psychometric properties, ultimately providing recommendations for their effective use in clinical practice.
2 Materials and methods
This review was conducted according to the guideline for reporting systematic reviews of outcome measurement instruments, PRISMA-COSMIN for OMIs 2024 (Elsman et al., 2024), and the COSMIN guidelines for systematic reviews of PROMs (Prinsen et al., 2018). The review protocol was registered in PROSPERO (PROSPERO, 2023).
2.1 Literature search
The search strategy aimed to identify all available PROMs to assess musculoskeletal complaints in para-athletes. It was performed from inception until October 2024 in Medline-PubMed, Embase, Scopus, Cochrane Library, Web of Science, ProQuest, and COSMIN Database. All terms were searched as keywords or MeSH terms, and search strategies were adapted for each database.
2.2 Eligibility criteria for selecting studies
Inclusion criteria were: (1) studies evaluating psychometric properties of a PROM, concerning validity, reliability, and/or responsiveness; (2) involving patients with musculoskeletal pain or disability; (3) from any para-sport at a professional or recreational level; (4) published as full reports in any language. Studies using PROMs solely as outcome measures or with designs outside the inclusion criteria (narrative reviews, master theses, clinical commentaries, clinical studies) were excluded.
2.3 Study selection
Articles identified were uploaded to Rayyan software (Ouzzani et al., 2016). After removing duplicates, two reviewers independently screened publications by title and abstract. In case of disagreement, a third reviewer was consulted for consensus. All selected publications were retrieved in full text and eligibility criteria were applied by the reviewers, with a third reviewer consulted in case of disagreement. Reference lists of included documents were hand-searched to identify further relevant publications.
2.4 Data extraction
Study characteristics and all psychometric properties data were extracted into a standardized form by two independent reviewers. Discrepancies were resolved by a third reviewer. Extracted information from the identified PROMs included name, acronym, assessment dimensions, and the number of rating scales. Authors were contacted for missing or unclear data.
2.5 Quality assessment
The methodological quality of each study was assessed by two independent reviewers using the COSMIN Risk of Bias Checklist (Mokkink et al., 2018). Only the boxes corresponding to the study were evaluated. Each box contains a set of items rated as ‘very good’, ‘adequate’, ‘doubtful’, or ‘inadequate’ quality. A third reviewer resolved disagreements. The "worst score counts" principle was applied, meaning if any item was rated as inadequate, the overall rating for that measurement property was considered inadequate (Mokkink et al., 2018; Prinsen et al., 2018).
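The "worst score counts" principle can be illustrated with a minimal sketch. It assumes the principle generalises to taking the lowest rating among the items of a box; the function and variable names are hypothetical and are not part of the COSMIN checklist.

```python
# Ratings ordered from best to worst, as listed in the text.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(item_ratings):
    """Return the lowest rating found among the items of one checklist box.

    Assumption: the overall box rating equals the worst item rating.
    An unknown rating label raises ValueError via list.index().
    """
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    return max(item_ratings, key=RATING_ORDER.index)


# Hypothetical box with four rated items.
box = ["very good", "adequate", "doubtful", "very good"]
print(worst_score_counts(box))  # doubtful
```

Under this rule a single ‘inadequate’ item is enough to rate the whole box as inadequate, regardless of how the remaining items were rated.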
2.6 Primary outcome
The available measurement properties of PROMs assessing musculoskeletal pain in para-athletes were analysed. Content validity, internal consistency, test-retest reliability, structural validity, hypothesis testing for construct validity, interpretability and feasibility were evaluated.
2.7 Data analysis
The first step assessed PROM development quality and content validity studies through the COSMIN Risk of Bias Checklist (Mokkink et al., 2018). Results were individually rated against 10 criteria for good content validity (Terwee et al., 2017). Two reviewers rated each item as sufficient (+), insufficient (-), or indeterminate (?), with a third reviewer consulted in case of disagreement. Relevance, comprehensiveness, and comprehensibility were rated overall as sufficient (+), insufficient (-), inconsistent (±), or indeterminate (?). Reviewers assessed the ratings of each outcome measure together. If an individual study rating was insufficient, the overall rating remained unchanged. In case of inconsistencies between studies, reviewers sought clarification (e.g., differences in target population). If unresolved, the overall rating was inconsistent (±) (Terwee et al., 2018).
The COSMIN criteria for good measurement properties were used to rate all other properties by two independent reviewers and a third for consensus (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018). Each item was rated as sufficient (+), insufficient (-), or indeterminate (?). Evidence from studies on the same PROM was pooled by measurement property when consistent. Unexplained inconsistencies led to an overall rating of "inconsistent" (±). Construct validity was assessed using preformulated hypotheses (Table 1).
2.8 Quality of the body of evidence
The quality of the evidence was rated as ‘high’, ‘moderate’, ‘low’, or ‘very low’ using an adapted GRADE approach, considering risk of bias, inconsistency, imprecision, and indirectness (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018). Each PROM and each measurement property were evaluated separately. Starting with the assumption of high-quality evidence, it could be downgraded by one to three levels (risk of bias) or one to two levels (inconsistency, imprecision, and indirectness) to moderate, low, and very low-quality. If applicable, content validity was rated first (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018).
2.9 Downgrading criteria
Risk of bias: Downgraded one level for questionable quality content validity studies (high to moderate), two levels for no content validity studies and a questionable quality PROM development study (high to low), and three levels for no content validity studies and inadequate PROM development study (high to very low) (Terwee et al., 2017). For other properties, downgraded one level for multiple studies of doubtful quality or one study of adequate quality, two levels for multiple studies of inadequate quality or one study of doubtful quality, and three levels for one study of inadequate quality (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018).
Inconsistency: Downgraded one level if there were discrepancies between two studies of the same PROM and two levels for discrepancies between more than two studies. If all studies were inconsistent, results were rated as ‘inconsistent’ and no quality level was assigned.
Imprecision: Downgraded one level if the pooled sample of all studies was below 100 participants and two levels if it was below 50 (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018).
Indirectness: Downgraded one level if studies were partly (below 50 %) performed in a different target population or context of use than the interest of the review and two levels if studies were partly performed in a different target population or context of use above 50 % (e.g., studies involving a broader target population than the population of interest).
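The downgrading scheme described above can be sketched as a small calculation. This is an illustration only: it assumes that downgrades from the four factors are summed and that the result cannot fall below ‘very low’. The function names and the example values are hypothetical and are not taken from the included studies.

```python
LEVELS = ["high", "moderate", "low", "very low"]

# Maximum number of levels each factor may remove, as stated in the text.
MAX_DOWNGRADE = {
    "risk_of_bias": 3,
    "inconsistency": 2,
    "imprecision": 2,
    "indirectness": 2,
}


def imprecision_downgrade(pooled_n):
    """Levels removed for imprecision: below 50 -> 2, below 100 -> 1, else 0."""
    if pooled_n < 50:
        return 2
    if pooled_n < 100:
        return 1
    return 0


def grade_quality(downgrades):
    """Start at 'high' and subtract the summed downgrades, flooring at 'very low'."""
    total = 0
    for factor, levels in downgrades.items():
        if factor not in MAX_DOWNGRADE:
            raise ValueError(f"unknown factor: {factor}")
        if not 0 <= levels <= MAX_DOWNGRADE[factor]:
            raise ValueError(f"invalid downgrade for {factor}: {levels}")
        total += levels
    return LEVELS[min(total, len(LEVELS) - 1)]


# Hypothetical property: one level for risk of bias, pooled sample of 80.
example = {"risk_of_bias": 1, "imprecision": imprecision_downgrade(80)}
print(grade_quality(example))  # low
```

Because the scale has only four levels, any combination of three or more downgrades yields very low-quality evidence.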
2.10 Formulating recommendations
A PROM with sufficient content validity and at least low-quality evidence of sufficient internal consistency (+) could be recommended. Conversely, it should not be recommended if there is high-quality evidence of an insufficient measurement property (-). A PROM meeting neither condition (±) could still be recommended, as long as no alternative with higher-quality evidence is available (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018).
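The decision rule for recommendations can be written out as a short sketch. The category labels ("recommended", "not recommended", "provisional"), the argument names, and the order in which the conditions are checked are assumptions made for illustration; they are not terms used by the guidelines cited above.

```python
QUALITY_ORDER = ["very low", "low", "moderate", "high"]


def recommendation(content_validity, internal_consistency, ic_quality,
                   high_quality_insufficient):
    """Classify a PROM using the three conditions described in the text.

    content_validity, internal_consistency: '+', '-', '±' or '?'
    ic_quality: quality of evidence for internal consistency
    high_quality_insufficient: True if any measurement property has
        high-quality evidence of an insufficient rating
    """
    if high_quality_insufficient:
        return "not recommended"
    at_least_low = QUALITY_ORDER.index(ic_quality) >= QUALITY_ORDER.index("low")
    if content_validity == "+" and internal_consistency == "+" and at_least_low:
        return "recommended"
    # Neither condition met: usable only while no better-supported PROM exists.
    return "provisional"


# Ratings reported in this review for the SPS-WB: inconsistent content
# validity and high-quality evidence of sufficient internal consistency.
print(recommendation("±", "+", "high", False))  # provisional
```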
3 Results
Initially, 2953 results were obtained. After removing duplicates, 2190 articles were screened by title and abstract. Six articles were selected for full-text screening. One article was excluded, and one was included after a manual reference list search. Finally, six articles met the inclusion criteria (Fig. 1).
3.1 Characteristics of included studies
Three PROMs designed to measure shoulder pain in wheelchair athletes were found (Table 2): the Wheelchair User's Shoulder Pain Index (WUSPI) (Curtis, Roach, Brooks Applegate, et al., 1995), the Shoulder Pain Scale for Wheelchair Basketball Players (SPS-WB) (NÜ et al., 2019), and the Shoulder Pain Index for Wheelchair Basketball Players (SPI-WB) (García-Gómez et al., 2020). In all studies, participants' disability was mainly due to spinal cord injury.
3.2 Wheelchair User's Shoulder Pain Index (WUSPI)
Curtis et al. (Curtis, Roach, Brooks Applegate, et al., 1995) developed a preliminary version based on the Shoulder Pain and Disability Index (SPADI) (Roach et al., 1991) to assess: 1) difficulty associated with shoulder problems during functional activities and 2) shoulder pain during functional activities in wheelchair users. Common functions and activities of daily living among wheelchair users were included for item development. The final version measures shoulder pain severity during functional activities in wheelchair users on a single 15-item dimension. Each item is scored on a visual analog scale (VAS) from 0 (no pain) to 10 (worst pain) based on perceived intensity. WUSPI has been translated into Spanish (WUSPI-SP) (Arroyo-Aljaro & González-Viejo, 2009) and Korean (WUSPI-KO) (Park & Cho, 2013). The total sample size across all WUSPI versions was 129 participants with a mean age of 39.5 years. Most participants were wheelchair athletes, except in WUSPI-SP, where only 27 % were athletes (Arroyo-Aljaro & González-Viejo, 2009).
3.3 Shoulder Pain Scale for Wheelchair Basketball Players (SPS-WB)
This is a two-dimensional PROM designed to measure pain perception during sports and self-care activities in wheelchair basketball players. It comprises 15 items anchored to a VAS, and the total score ranges from 0 to 150, as in the WUSPI. One development study assessed content validity, structural validity, and internal consistency. The sample comprised 143 male basketball players with a mean age of 32 years and at least one year of practice (NÜ et al., 2019).
3.4 Shoulder Pain Index for Wheelchair Basketball Players (SPI-WB)
The whole instrument consists of three components: a) collection of demographic data to identify factors relevant to the athlete's lifestyle, perceived disability based on WUSPI (Curtis et al., 1995a, 1995b), and additional data regarding pain location, onset, and qualitative description; b) 15 items matched with WUSPI-SP (Arroyo-Aljaro & González-Viejo, 2009) to assess ADL-related shoulder pain; and c) 4 items related to shoulder pain perception during specific wheelchair basketball skills (García-Gómez et al., 2020). A single study evaluated internal consistency and test-retest reliability for all 19 items and discriminative validity for 8 items in a sample of 17 wheelchair basketball players with a mean age of 30.24 ± 7.40 years. All participants had been practicing wheelchair basketball for at least one year (García-Gómez et al., 2020).
3.5 Psychometric properties
3.5.1 Content validity
Two studies reported on content validity (Curtis, Roach, Brooks Applegate, et al., 1995; NÜ et al., 2019). WUSPI was developed to assess shoulder pain during ADLs in wheelchair users (Curtis, Roach, Brooks Applegate, et al., 1995). Interviews were conducted for consensus on concept elicitation, but the process is not clearly described. It is unclear whether patients were asked about relevance, comprehensibility, and comprehensiveness, and no professionals were involved (Curtis, Roach, Brooks Applegate, et al., 1995). Overall, relevance and comprehensiveness were insufficient and comprehensibility was inconsistent, with very low-quality evidence (Terwee et al., 2017).
SPS-WB was developed to assess shoulder pain during sports and self-care activities in wheelchair basketball players (NÜ et al., 2019). Interviews were conducted to generate an initial draft for concept elicitation. It is unclear if patients were asked about relevance and comprehensiveness. They were asked about comprehensibility, although this was poorly described (NÜ et al., 2019). Professionals were asked about relevance, but it is unclear if they were asked about comprehensiveness (NÜ et al., 2019). The overall rating was low-quality evidence of inconsistent relevance, sufficient comprehensibility, and very low-quality evidence of inconsistent comprehensiveness (Table 3) (Terwee et al., 2017).
3.5.2 Structural validity
Exploratory factor analysis was used to assess the structural validity of the SPS-WB (NÜ et al., 2019). The results supported a two-factor structure related to shoulder pain during sports (10 items) and self-care activities (5 items) (NÜ et al., 2019). However, the eigenvalue of the first factor was 7.43 times higher than the second, suggesting the scale may be unidimensional with a general "pain perception" factor (NÜ et al., 2019). The lack of specification and justification for the rotation method led to a doubtful rating with low-quality evidence. Evidence was downgraded two levels for risk of bias (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018).
3.5.3 Internal consistency
All PROMs were assessed for internal consistency using Cronbach's alpha (Arroyo-Aljaro & González-Viejo, 2009; Curtis, Roach, Amar, et al., 1995; García-Gómez et al., 2020; NÜ et al., 2019; Park & Cho, 2013). The statistic was calculated for each subscale except for SPI-WB, which was reported for a multidimensional total scale and rated as inadequate (García-Gómez et al., 2020). In the overall rating, SPS-WB was the only PROM providing high-quality evidence for sufficient internal consistency (NÜ et al., 2019). All WUSPI versions (Arroyo-Aljaro & González-Viejo, 2009; Curtis, Roach, Brooks Applegate, et al., 1995; Park & Cho, 2013) were rated as indeterminate with very low-quality evidence due to not meeting at least low-quality evidence for sufficient structural validity (Prinsen et al., 2018).
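For readers less familiar with the statistic, the following minimal sketch computes Cronbach's alpha from the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The data are invented for illustration and the function name is hypothetical; this is not the software or procedure used in the included studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances throughout. Raises ValueError when fewer than two
    items or two respondents are supplied.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from four people to a two-item subscale.
responses = [[1, 2], [2, 1], [3, 3], [4, 4]]
print(round(cronbach_alpha(responses), 3))  # 0.889
```

Alpha summarises how strongly items covary within one scale, so it is only interpretable when the items form a single dimension. This is why a coefficient reported for a multidimensional total score, or without evidence of structural validity, cannot be rated as sufficient.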
3.5.4 Test-retest reliability
Four articles of inadequate quality assessed test-retest reliability using the intraclass correlation coefficient (ICC) (Arroyo-Aljaro & González-Viejo, 2009; Curtis, Roach, Amar, et al., 1995; García-Gómez et al., 2020; Park & Cho, 2013). Only WUSPI-SP reported an appropriate time interval between test and retest (Arroyo-Aljaro & González-Viejo, 2009). Given important methodological flaws, it was nevertheless rated as inadequate (Mokkink et al., 2018). ICCs for the original WUSPI ranged from 0.84 to 0.99 (Curtis et al., 1995), those for WUSPI-KO ranged from 0.88 to 0.98 (Park & Cho, 2013), and WUSPI-SP reported an ICC of 0.96 for the total index score (Arroyo-Aljaro & González-Viejo, 2009). SPI-WB provided a varied ICC from 0.46 to 1 with an average of 0.80 (García-Gómez et al., 2020). SPS-WB was not assessed for reliability (NÜ et al., 2019) (Table 4). The overall rating was very low-quality evidence for sufficient reliability for WUSPI (Arroyo-Aljaro & González-Viejo, 2009; Curtis, Roach, Amar, et al., 1995; Park & Cho, 2013). Evidence was downgraded by three (Curtis, Roach, Amar, et al., 1995; Park & Cho, 2013) and two (Arroyo-Aljaro & González-Viejo, 2009) levels for risk of bias and two more for imprecision (Arroyo-Aljaro & González-Viejo, 2009; Curtis, Roach, Amar, et al., 1995; Park & Cho, 2013). SPI-WB provided low-quality evidence for sufficient reliability (García-Gómez et al., 2020). Three levels were downgraded for risk of bias (García-Gómez et al., 2020).
3.5.5 Hypothesis testing for construct validity
One study compared WUSPI-SP with the VAS to assess convergent validity (Arroyo-Aljaro & González-Viejo, 2009). Despite reporting a high correlation (r = 0.90), consistent with the formulated hypotheses (Appendix 2), it was rated as inadequate due to methodological flaws and insufficient information (Mokkink et al., 2018). Overall, there was very low-quality evidence of sufficient construct validity (comparison with other outcome measures) (Arroyo-Aljaro & González-Viejo, 2009). Corrected item-total correlation was used to calculate item validity in one study, rated as inadequate for SPS-WB's construct validity (NÜ et al., 2019). Two studies rated as inadequate assessed WUSPI's discriminative validity by correlating the total index score and shoulder active range of motion (ROM) (Curtis, Roach, Amar, et al., 1995; Park & Cho, 2013). Standardized goniometry values from the American Academy of Orthopaedic Surgeons (AAOS) were used to justify decreased ROM in subjects with shoulder pain (Curtis, Roach, Amar, et al., 1995; Park & Cho, 2013). Overall, there was very low-quality evidence of insufficient construct validity (discriminative) (Curtis, Roach, Amar, et al., 1995; Park & Cho, 2013). One study evaluated SPI-WB's discriminative validity by correlating the score with ROM, using AAOS and American Medical Association (AMA) values as reference points for the cut-off (García-Gómez et al., 2020). It was rated as inadequate with very low-quality evidence of insufficient construct validity (García-Gómez et al., 2020) (Table 3).
3.5.6 Interpretability
Limited information is available on interpretability. Mean baseline scores for all WUSPI versions range from 14.2 (Curtis et al., 1995) to 47.32/150 (Park & Cho, 2013). There is no information on missing items, floor and ceiling effects, minimal important change (MIC), or minimal important difference (MID) (Arroyo-Aljaro & González-Viejo, 2009; Curtis et al., 1995a, 1995b; Park & Cho, 2013). One study used the Performance Corrected version (PC-WUSPI) (Curtis & Black, 1999) to calculate the total score for subjects unable to perform any items (Park & Cho, 2013). Mean baseline scores for SPS-WB range from 0.622 to 0.872, with no data on missing items, floor and ceiling effects, or MIC/MID (NÜ et al., 2019). SPI-WB's mean baseline and retest scores were 9.98/190 and 11.54/190, respectively (García-Gómez et al., 2020).
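To make the score ranges above concrete, the sketch below sums 15 items scored 0 to 10 into a 0 to 150 total and applies a performance correction for activities that were not performed. The correction formula (raw total divided by the number of items performed, multiplied by 15) is an assumption about how the PC-WUSPI is usually described; it is not stated in the studies reviewed here and should be checked against Curtis & Black (1999). Function names are hypothetical.

```python
N_ITEMS = 15
MAX_ITEM_SCORE = 10


def _validate(item_scores):
    """Check the number of items and the 0-10 range of every answered item."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items")
    for score in item_scores:
        if score is not None and not 0 <= score <= MAX_ITEM_SCORE:
            raise ValueError(f"item score out of range: {score}")


def total_score(item_scores):
    """Raw total (0-150) when all 15 items were answered."""
    _validate(item_scores)
    if any(score is None for score in item_scores):
        raise ValueError("raw total requires all items; use the corrected score")
    return sum(item_scores)


def performance_corrected_score(item_scores):
    """Assumed correction: mean of performed items rescaled to 15 items.

    Items not performed are passed as None.
    """
    _validate(item_scores)
    performed = [score for score in item_scores if score is not None]
    if not performed:
        raise ValueError("no items were performed")
    return sum(performed) / len(performed) * N_ITEMS


# Hypothetical respondent who could not perform five of the activities.
answers = [2] * 10 + [None] * 5
print(performance_corrected_score(answers))  # 30.0
```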
3.5.7 Feasibility
No PROMs provided relevant information on patient or clinician comprehensibility. All studies reported self-administered measures (Arroyo-Aljaro & González-Viejo, 2009; Curtis, Roach, Brooks Applegate, et al., 1995; Curtis, Roach, Amar, et al., 1995; García-Gómez et al., 2020; NÜ et al., 2019; Park & Cho, 2013). Detailed information on interpretability and feasibility is shown in Appendices 5 and 6.
3.5.8 Unrated properties
Cross-cultural validity, criterion validity, measurement error, and responsiveness were not assessed in any included studies and were rated as indeterminate (Prinsen et al., 2018).
3.6 Recommendations
The SPS-WB had sufficient comprehensibility and internal consistency (NÜ et al., 2019). Despite having inconsistent relevance and comprehensiveness, it can be considered for use in research and clinical practice as a condition-specific PROM in wheelchair basketball players, as there is no higher-quality evidence (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018). The WUSPI can be used as a generic questionnaire in the absence of specific questionnaires for all other conditions (Curtis et al., 1995a, 1995b), but its limitations must be considered (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018).
4 Discussion
This systematic review is the first to critically evaluate the psychometric properties of PROMs for musculoskeletal complaints in para-athletes. The present study found that existing evidence is limited to shoulder complaints in wheelchair athletes, with outcome measures showing low to very low-quality evidence (Curtis, Roach, Brooks Applegate, et al., 1995; Curtis, Roach, Amar, et al., 1995; García-Gómez et al., 2020; NÜ et al., 2019; Park & Cho, 2013).
4.1 Characteristics of measures identified
PROMs assess functionality and symptoms for a specific problem from the patient's perspective (Padua et al., 2021; Prinsen et al., 2018). Despite variability in para-athletic conditions and associated issues, only instruments measuring shoulder complaints in wheelchair athletes have been identified (Arroyo-Aljaro & González-Viejo, 2009; Curtis, Roach, Brooks Applegate, et al., 1995; Curtis, Roach, Amar, et al., 1995; García-Gómez et al., 2020; NÜ et al., 2019; Park & Cho, 2013). This suggests that many conditions affecting para-athletes remain unassessed from an individual perspective due to the lack of appropriate measurement tools.
Although shoulder problems are a major cause of musculoskeletal complaints, their prevalence depends on the para-sport discipline and the impairment (Blauwet et al., 2016; Brownlow et al., 2024; Derman et al., 2020; Fagher, Forsberg, et al., 2016). Shoulder problems are common in wheelchair athletes, while lower extremity issues are more frequent in ambulant para-athletes (Derman et al., 2020, 2022; Fagher et al., 2019). Impairment also influences injury mechanisms: wheelchair athletes are more prone to overuse injuries from the high demand on their upper extremities, whereas visually impaired athletes are more susceptible to traumatic injuries due to altered coordination and balance (Brownlow et al., 2024; Fagher et al., 2019). When assessing musculoskeletal complaints, these factors should be considered as they directly affect the individual's perception of health (Fagher, Forsberg, et al., 2016).
For PROM results to be reliable, the target population must be included in the development process. Otherwise, items may lack relevance or comprehensiveness (Terwee et al., 2017). Therefore, evaluating psychometric properties is crucial to identify those best suited to the subject's reality (Terwee et al., 2017). Content validity is particularly important (Terwee et al., 2018), ensuring that items are relevant, comprehensive, and understandable concerning the construct of interest and study population (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018). Although the included questionnaires aimed to measure similar problems in comparable populations, their constructs differ slightly.
4.2 Quality of measures
Although the WUSPI was developed in a para-sport population (Curtis, Roach, Brooks Applegate, et al., 1995), it does not include sport-related items, so the construct is limited to functional activities (Curtis et al., 1995a, 1995b). The SPI-WB incorporates four sports-specific items into the WUSPI-SP based on coaches' suggestions to assess basketball-specific skills (García-Gómez et al., 2020). The SPS-WB contains items to measure pain during basketball practice and self-care activities (NÜ et al., 2019). Given the results, the SPS-WB was the only questionnaire recommended for use in the review context (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018).
4.2.1 Content validity
WUSPI has been translated into several languages and is the most widely used questionnaire (Harvey & Glinsky, 2019), although its content validity is inconsistent (Arroyo-Aljaro & González-Viejo, 2009; Park & Cho, 2013). It lacks sensitivity for measuring shoulder pain in sports-related discomfort, periods of accumulated load, or fatigue, and is also unsuitable for wheelchair users with shoulder pain and functional dependence (Harvey & Glinsky, 2019). To improve its applicability, it may be advisable to redefine WUSPI's construct to "shoulder pain during functional activities in autonomous wheelchair users" or validate it across different subgroups (athletes, sedentary individuals, functionally dependent individuals, etc.). However, with very low-quality evidence, future research on content validity is essential.
4.2.2 Internal structure
Structural validity is necessary to interpret internal consistency (Mokkink et al., 2018; Prinsen et al., 2018). It was assessed only in the SPS-WB with a high level of evidence (NÜ et al., 2019). In contrast, the WUSPI and SPI-WB were rated as indeterminate due to a lack of structural validity information (Arroyo-Aljaro & González-Viejo, 2009; Curtis, Roach, Brooks Applegate, et al., 1995; García-Gómez et al., 2020; Park & Cho, 2013). Cross-cultural evaluation is also important, as item relevance and interpretation may vary across cultures and socioeconomic levels (Elsman et al., 2022). Once translated, it is essential to ensure that the questionnaire is interpreted consistently across population subgroups while maintaining its original meaning (Krogsgaard et al., 2021). Despite translation (Arroyo-Aljaro & González-Viejo, 2009; Park & Cho, 2013), no cross-cultural validity information is available for WUSPI.
4.2.3 Remaining properties
Sufficient reliability was found for WUSPI and SPI-WB, but all studies were rated as inadequate due to insufficient information (Arroyo-Aljaro & González-Viejo, 2009; Curtis et al., 1995a, 1995b; García-Gómez et al., 2020; Park & Cho, 2013). There was no information on participants' stability during the interim period between test and retest, blinding, administration method, or sample size selection (Gagnier et al., 2021; Kottner et al., 2011). Kottner et al. (2011) recommended reporting multiple coefficients (ICC and standard error of measurement) to provide a detailed understanding of reliability and agreement. This is important because a large measurement error can create uncertainty in the scores (Mokkink et al., 2023). If two measures have a similar ICC, the one with a smaller measurement error will be more sensitive and, therefore, preferable. The included articles reported the mean and standard deviation for the test and retest, as well as the average ICC, but omitted absolute reliability and confidence intervals, making interpretation difficult. Construct validity based on subgroup comparisons was insufficient for the WUSPI and SPI-WB (Arroyo-Aljaro & González-Viejo, 2009; Curtis, Roach, Amar, et al., 1995; García-Gómez et al., 2020; Park & Cho, 2013). These measures used goniometry to differentiate subjects with varying shoulder pain intensity. In older adults, decreased ROM may contribute to greater disability due to age-related tissue changes (Ibounig et al., 2021; Requejo-Salinas et al., 2022). Conversely, younger, healthier athletes usually show no signs of tissue degeneration except in specific cases. However, nociceptive sources and factors affecting ROM are complex and multifactorial (Ballinger et al., 2000; Haik et al., 2020; Xie et al., 2020), making causality difficult to establish.
Goniometry requires a validated, reliable procedure and a well-trained examiner, as accurately defining the axis and reference points can be challenging (Kiatkulanusorn et al., 2023). Additionally, ROM was measured in isolated movements (Arroyo-Aljaro & González-Viejo, 2009; Curtis, Roach, Amar, et al., 1995; García-Gómez et al., 2020; Park & Cho, 2013), detached from real-life situations, making it difficult to distinguish features between subgroups (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018). Given these factors, discriminative validity results should be interpreted with caution: they may not accurately reflect the target population or context, and the reliability of the measurement procedure remains uncertain because of insufficient information.
Only the WUSPI was assessed for convergent validity, and only a single hypothesis was tested (Arroyo-Aljaro & González-Viejo, 2009). Because this hypothesis was confirmed, more than 75 % of the results are considered to align with the hypotheses, indicating sufficient construct validity (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018). The comparator instrument was the VAS, a single-item scale designed to assess pain intensity (Boonstra et al., 2008) that is frequently used in clinical and research settings (Chiarotto et al., 2019; Crossley et al., 2004) but has not been validated for disability perception (Boonstra et al., 2008). Disability is a complex construct that reflects the impact of pain, particularly musculoskeletal pain, on life. Therefore, a comprehensive assessment is necessary (Hawker, 2017). Disability is usually assessed with multi-item instruments that measure the various activities forming the construct, allowing specific difficult-to-perform activities to be identified. This is challenging with a single-item scale (Boonstra et al., 2008). The strong correlation between pain and disability suggests convergence, but the available information is too limited and imprecise to confirm that both instruments measure the same construct. Construct validity of the SPS-WB was indeterminate because it was not assessed according to COSMIN guidelines (NÜ et al., 2019), which complicates the interpretation of discriminative validity (Prinsen et al., 2018). Some authors assessed concurrent validity through correlations, but in the absence of a gold standard, these data were considered evidence for construct validity (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018).
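As a minimal sketch of the 75 % rule discussed above, the following Python snippet rates hypotheses testing from the number of confirmed hypotheses. The function name `rate_hypotheses` and the treatment of the threshold as inclusive (at least 75 %) are assumptions made for illustration; the confirmed/total counts are those listed in this review's evidence table.

```python
from fractions import Fraction

# Assumption: the threshold is treated as inclusive (at least 75 % confirmed).
THRESHOLD = Fraction(3, 4)


def rate_hypotheses(confirmed: int, total: int) -> str:
    """Rate hypotheses testing as '+', '-' or '?' (no hypotheses tested)."""
    if total == 0:
        return "?"
    # Exact fractions avoid floating-point surprises at the threshold.
    return "+" if Fraction(confirmed, total) >= THRESHOLD else "-"


# Confirmed/total hypotheses as listed in the review's evidence table.
studies = {
    "Curtis et al., 1995 (WUSPI)": (3, 5),
    "Arroyo-Aljaro & González-Viejo, 2009": (1, 1),
    "Park & Cho, 2013 (WUSPI-KO)": (3, 5),
    "García-Gómez et al., 2020 (SPI-WB)": (4, 6),
}

ratings = {name: rate_hypotheses(*counts) for name, counts in studies.items()}
for name, rating in ratings.items():
    print(f"{name}: {rating}")
```

A result of 1/1 passes the rule on the strength of a single hypothesis, which is one reason a sufficient rating obtained this way rests on limited evidence.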
4.3 Requirements for improving outcome measures in para-sport
The present findings highlight the urgent need for valid, reliable PROMs to assess musculoskeletal complaints in para-athletes in clinical and research settings. This requires comprehensive evaluation of psychometric properties, particularly content and structural validity, and not just questionnaire development. The absence of content validity can bias other properties and affect the interpretation of results (Terwee et al., 2017, 2018), whereas structural validity is essential for understanding an instrument's internal structure (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018). Interpretability and feasibility should also be considered to identify the most appropriate PROMs when quality differences are minimal (Mokkink et al., 2018; Prinsen et al., 2018; Terwee et al., 2018). Clinically relevant measures, such as the minimal detectable change (MDC) and the minimal important change (MIC), should be calculated and reported. Along with the standard error of measurement (SEM), these indicators provide benchmarks for interpreting change scores and determine an instrument's ability to detect clinically significant changes (de Vet et al., 2006).
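To illustrate how these indicators relate, the sketch below derives the SEM from a test-retest ICC and the between-subject standard deviation, and then the MDC at the 95 % confidence level. It assumes the commonly used formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; other SEM estimators exist, and the input values are hypothetical rather than taken from the included studies.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at 95 % confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical instruments with the same ICC but a different score spread.
for label, sd in (("Instrument A", 10.0), ("Instrument B", 20.0)):
    sem = sem_from_icc(sd, icc=0.96)
    print(f"{label}: SEM = {sem:.2f}, MDC95 = {mdc95(sem):.2f}")
```

A change score smaller than the MDC95 cannot be distinguished from measurement error, which is why reporting the ICC alone limits the interpretation of individual change.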
Assessing musculoskeletal complaints in para-athletes also requires considering the relationship between an athlete's impairment and the aetiology of the complaint, as it influences the development and progression of musculoskeletal issues (Fagher, 2021). Accurate tools are needed to assess symptom progression at the individual level and to differentiate the impact of impairment across health conditions and perspectives (Fagher et al., 2016b, 2020). Given the impact of these complaints on function and quality of life, tracking symptom progression is crucial to identify key clinical factors, understand treatment effects, optimize care and improve outcomes.
4.4 Limitations
Literature on PROM development and validation in para-athletes is scarce. Included studies often provided limited and unclear information, with many properties rated as indeterminate, which hinders PROM recommendations. Although excluding some publications may have meant overlooking findings, the broad inclusion criteria make it unlikely that relevant information was missed. Therefore, expanding the criteria further would probably not have altered the conclusions.
5 Conclusion
Current evidence on PROMs for para-athletes is limited to shoulder complaints in wheelchair users, with low to very low-quality evidence for content validity, internal consistency, reliability and construct validity. No study consistently demonstrated sufficient measurement properties, which hampers strong recommendations. The WUSPI may be used as a generic questionnaire, though its limitations should be considered. The SPS-WB shows potential for assessing shoulder pain in wheelchair basketball players as a condition-specific PROM, but results should be interpreted cautiously. Future research should validate new PROMs for broader musculoskeletal complaints in para-athletes. High-quality studies are also needed to evaluate the psychometric properties of both new and existing PROMs, particularly their structural validity, reliability, and responsiveness to clinical change.
CRediT authorship contribution statement
Nil Jodar-Boixet: Writing – review & editing, Writing – original draft, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Cristina Torres-Pascual: Visualization, Validation, Investigation, Data curation. Rafel Donat-Roca: Visualization, Validation, Formal analysis, Data curation. Kristian Thorborg: Writing – review & editing, Visualization, Validation, Supervision. Anna Prats-Puig: Visualization, Validation, Supervision, Project administration. Ernest Esteve: Writing – review & editing, Visualization, Validation, Supervision, Project administration, Methodology, Formal analysis, Conceptualization.
PROSPERO registration number
CRD42023316228.
Ethical approval
Not required for this review.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial or not-for-profit sectors.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
Laia Planella (PhD candidate) for highlighting the importance of this topic.
Appendix A Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.ptsp.2025.03.007.
| Measurement property | Definition | Rating | Criteria |
|---|---|---|---|
| Content validity | The degree to which the content of a PROM is an adequate reflection of the construct to be measured. In terms of | + | Relevance, comprehensiveness and comprehensibility ratings are + |
| | | ± | At least one rating is + and at least one rating is − |
| | | ? | Two or more of the ratings are rated as ? |
| | | − | Relevance, comprehensiveness and comprehensibility ratings are − |
| Structural validity | The degree to which the scores of a PROM are an adequate reflection of the dimensionality of the construct to be measured | + | CTT: CFA: CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08. IRT/Rasch: no violation of unidimensionality: CFI or TLI or comparable measure >0.95 OR RMSEA <0.06 OR SRMR <0.08. Rasch: infit and outfit mean squares ≥0.5 and ≤1.5 OR Z-standardized values >−2 and <2 |
| | | ? | CTT: not all information for ‘+’ reported. IRT/Rasch: model fit not reported |
| | | − | Criteria for ‘+’ not met |
| Internal consistency | The degree of interrelatedness among the items | + | At least low evidence for sufficient structural validity AND Cronbach's alpha(s) ≥0.70 for each unidimensional scale or subscale |
| | | ? | Criteria for “at least low evidence for sufficient structural validity” not met |
| | | − | At least low evidence for sufficient structural validity AND Cronbach's alpha(s) <0.70 for each unidimensional scale or subscale |
| Test-retest reliability | The proportion of the total variance in the measurements which is due to ‘true’ differences between patients | + | ICC or weighted Kappa ≥0.70 |
| | | ? | ICC or weighted Kappa not reported |
| | | − | ICC or weighted Kappa <0.70 |
| Hypotheses testing for construct validity | The degree to which the scores of a PROM are consistent with hypotheses based on the assumption that the PROM validly measures the construct to be measured (e.g., concerning relationships to scores of other instruments, or differences between relevant groups) | + | 75 % of the results are in accordance with the hypotheses proposed by the review team |
| | | ? | No hypotheses were defined by the review team |
| | | − | The results are not in accordance with the hypotheses proposed by the review team |
| Interpretability | The degree to which one can assign qualitative meaning to a PROM's quantitative scores or change in scores (e.g., distribution of scores in the study population, floor and ceiling effects, minimal important change (MIC) or minimal important difference, etc.). | Narrative summary | |
| Feasibility | The ease of application of the PROM in its intended context of use, given constraints such as time or money. It is related to the concept of “clinical utility” (e.g., length of the instrument, copyright, required equipment, etc.). | Narrative summary | |
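
The reliability and internal consistency criteria in the table above can be sketched as simple decision rules. The function names, the use of `None` for an unreported coefficient, and the reduction to a single alpha value per instrument are assumptions for illustration; alpha below 0.70 is treated as insufficient, following the usual COSMIN reading.

```python
from typing import Optional


def rate_reliability(icc: Optional[float]) -> str:
    """'+' if ICC or weighted kappa >= 0.70, '-' if lower, '?' if unreported."""
    if icc is None:
        return "?"
    return "+" if icc >= 0.70 else "-"


def rate_internal_consistency(alpha: float, structural_validity_ok: bool) -> str:
    """Rate Cronbach's alpha, which requires sufficient structural validity."""
    if not structural_validity_ok:
        return "?"  # prerequisite not met, so the rating is indeterminate
    return "+" if alpha >= 0.70 else "-"


# A high ICC is rated sufficient on its own.
print(rate_reliability(0.99))
# A high alpha without structural validity evidence stays indeterminate.
print(rate_internal_consistency(0.97, structural_validity_ok=False))
```

This mirrors why a high Cronbach's alpha can still receive a "?" rating when structural validity has not been established.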
| Ref, Year | Structural validity | | | Internal consistency | | | Cross-cultural validity/Measurement invariance | | | Reliability | | | Measurement error | | | Hypotheses testing for construct validity | | | Responsiveness | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | n | Quality | Result (rating) | n | Quality | Result (rating) | n | Quality | Result (rating) | n | Quality | Result (rating) | n | Quality | Result (rating) | n | Quality | Result (rating) | n | Quality | Result (rating) |
| Curtis, Roach, Brooks Applegate, et al., 1995; Curtis, Roach, Amar, et al., 1995 (WUSPI) | NT | NT | NT | 64 | A | 0.97 (?) | NT | NT | NT | 16 | I | 0.99 (+) | NT | NT | NT | 64 | I | 3/5 in line (−) †, <75 % | NT | NT | NT |
| Arroyo-Aljaro & González-Viejo, 2009 | NT | NT | NT | 42 | D | 0.88 (?) | NT | NT | NT | 8 | I | 0.96 (+) | NT | NT | NT | 42 | I | 1/1 in line (+) ‡, >75 % | NT | NT | NT |
| Park & Cho, 2013 (WUSPI-KO) | NT | NT | NT | 23 | D | 0.96 (?) | NT | NT | NT | 23 | I | 0.88 to 0.99 (+) | NT | NT | NT | 23 | I | 3/5 in line (−) †, <75 % | NT | NT | NT |
| NÜ et al., 2019 | 143 | A | ? | 143 | V | 0.95 (?) | NT | NT | NT | NT | NT | NT | NT | NT | NT | NT | NT | NT | NT | NT | NT |
| García-Gómez et al., 2020 (SPI-WB) | NT | NT | NT | 17 | I | 0.98 (?) | NT | NT | NT | 17 | I | 0.98 (+) | NT | NT | NT | 17 | I | 4/6 in line (−) †, <75 % | NT | NT | NT |
©2025. Elsevier Ltd