

Abstract

Purpose: Multidimensional scoring methods yield valuable information about communication abilities. However, issues of training demands for valid and reliable scoring, especially in current service delivery contexts, may preclude common usage. Alternatives to multidimensional scoring were investigated in a sample of adults with aphasia.

Method: One alternative method involved modified multidimensional scoring; the others incorporated correct/incorrect scoring. The scores for the 3 alternative methods were derived from the scores obtained using the traditional multidimensional method. Revised Token Test scores obtained using the traditional multidimensional method were collected from 10 participants with aphasia. These scores were manipulated to yield 3 additional sets of scores corresponding to the alternative methods.

Results: There were no significant differences between the traditional multidimensional method and 1 of the correct/incorrect methods. Significant differences were found between traditional multidimensional scoring and each of the other 2 methods.

Conclusions: The study findings suggest that simpler scoring systems might yield similar data to traditional multidimensional scoring. If simpler alternative methods yield similar results, using alternative scoring methods with published tests based on multidimensional scoring will help expand their use in everyday clinical practice.

Key Words: multidimensional scoring, Revised Token Test, assessment

The strategic selection of an appropriate scoring system for aphasia assessment depends on the intended use and means of interpretation of test results as well as a variety of training and service delivery issues. Aphasia tests have long employed a variety of systems to record responses to linguistic stimuli. Recording or scoring methods include detailed descriptions of behavior (Goldstein, 1948; Head, 1926 [cited in Goldstein, 1948]; Lahey, 1988; Murray & Chapey, 2001; Shadden, 1998; Weisenberg & McBride, 1935), correct versus incorrect scoring (Eisenson, 1954; Goodglass & Kaplan, 1983; Goodglass, Kaplan, & Barresi, 2001; Helm-Estabrooks, 1992; Kertesz, 1982; Schuell, 1965), rating scales (Butfield & Zangwill, 1946 [as cited in Porch, 1967]; Goodglass & Kaplan, 1983; Goodglass et al., 2001; Helm-Estabrooks, 1992; Schuell, 1965; Taylor, 1965; Vignolo, 1964; Wepman & Jones, 1961), and multidimensional scoring (McNeil & Prescott, 1978; Porch, 1967, 1981).

Porch (1967), in the Porch Index of Communicative Ability (PICA), incorporated the descriptive nature of recording responses into a multidimensional scoring system by defining every response in terms of five dimensions: accuracy, responsiveness, completeness, promptness, and efficiency. Specification of these dimensions was based on the results of studies indicating disturbances in one or more of these dimensions in individuals with aphasia (Porch, 1967). The PICA scoring system is an ordinal scale that consists of 16 numbered categories ordered from 1, indicating no response, to 16, indicating completion of a complex response that is correctly performed in terms of all five dimensions. McNeil and Prescott (1978) later used the five dimensions of the PICA to construct a 15-point multidimensional scale in the Revised Token Test (RTT), in which each of the 15 scores represents a different combination of response adequacy on all five dimensions.

Advantages of multidimensional scoring systems include providing useful information regarding responses, describing responses in detail within a standardized format, documenting effects of treatment and predicting recovery, and developing good observation skills in clinicians. These scales provide valuable descriptions of an individual's behaviors and information about the deficits underlying those behaviors (McNeil & Prescott, 1978; Porch, 1967, 1981). This helps clinicians analyze and score ambiguous responses without having to place them in strict binary categories (Porch, 1964). Multidimensional scoring also enables distinctions within correct and incorrect responses, as every response is evaluated in terms of multiple dimensions. For example, a patient may succeed in performing a particular task on a test but do so with hesitation. Marking "+" for the task would potentially mask some problematic aspects of the performance; likewise, a "-" mark would potentially mask some of the patient's intact abilities in performing isolated aspects of the task. Multidimensional scoring, on the other hand, may elucidate these various aspects of performance. For instance, a "subvocal rehearsal" of an auditory command, as defined by McNeil and Prescott (1978), may be a compensatory strategy employed by the individual to circumvent poor auditory memory. Although supplementing binary scores with additional notes may be helpful, such descriptions are not reflected in correct/incorrect scores.

Multidimensional scoring incorporates substantial information about the individual's responses using a standardized format in which every category in the scale has a comprehensive and precise definition (Porch, 1967). While other descriptive scoring methods yield abundant information, they lack standardization and are prone to scorer bias (Porch, 1967). Further, as Duffy and Dale (1977) illustrated, it is possible to analyze the resulting ordinal-level data statistically.

An additional advantage of multidimensional scoring is that it aids in documenting treatment effects because it is sensitive in detecting small amounts of change in patients' behaviors (McNeil & Prescott, 1978). A response may not be fully correct but still show improvement in terms of completeness, promptness, efficiency, accuracy, or responsiveness. Furthermore, the improvement observed in some dimensions has been demonstrated to be of prognostic value (Porch, 1967). For example, patients' ability to self-correct has been shown to be a significant predictor of improvement in communication abilities (Marshall, 2001; Marshall, Neuburger, & Phillips, 1994; McNeil & Prescott, 1978; Porch, 1967, 2001; Wepman, 1958).

A final benefit of multidimensional scoring approaches relates to clinical training. These approaches require the identification of discrete behaviors. Thus, the use of such scales helps clinicians develop diagnostic skills and the ability to evaluate stimulability and plan treatment for individuals with aphasia.

Despite the many advantages of multidimensional scoring, there are several disadvantages: it is cumbersome, requires training to achieve intra- and interscorer reliability, is time-consuming, and may not be efficient and cost-effective within many current service delivery contexts. Multidimensional scoring may place unwarranted pressure on the scorer, as decisions about multiple aspects of a response have to be made simultaneously and quickly (Porch, 1967). For instance, in an auditory comprehension task in which the patient is to point to an object corresponding to a spoken word, a score of 13 (complete-delayed) would be assigned when the patient pauses prior to pointing to the correct object. In this case, the patient demonstrates accuracy, responsiveness, and completeness but a lack of promptness. Such decisions must be made almost immediately after each response and may become cumbersome for the scorer. Also, because the adequacy of every response must be gauged with respect to many dimensions, there is a chance of confusion and varied interpretations within and between scorers, leading to poor intra- and interscorer reliability (Odekar & Hallowell, in press).

The time-consuming nature of multidimensional scoring is especially problematic in current clinical service delivery environments (Haynes & Pindzola, 1998). The PICA normally takes 2 to 3 hr to administer and score (Porch, 1967, 1981). Supplementary computerized scoring methods, such as the PICApad, have helped reduce the time required to develop profiles of patient performance and compile the scores for summary and interpretation (Matesich, Porch, & Katz, 1997). However, the software is not readily available to clinicians in all clinical settings because of the cost and the requirement for computer access in clinical environments. In recent years, there has been an increasing focus on reducing the cost of speech-language pathology services (Hallowell & Chapey, 2001; Henri & Hallowell, 1999, 2001; Katz et al., 2000; Simmons-Mackie, Threats, & Kagan, 2005), with reduced frequency and duration of sessions due to cuts in reimbursement for services. This trend has resulted in increasing pressure to deliver services that are efficient and cost-effective, so it is particularly relevant to consider whether the benefits of multidimensional scoring outweigh the costs in some contexts.

Multidimensional scoring systems require not only time in administration but time in training scorers to be reliable. For example, McNeil and Prescott (1978) advised 24 hr of training to ensure reliability in scoring the RTT. Many clinicians do not have the available time, financial backing, and access to such training (Hallowell & Clark, 2002). Computerized scoring methods may reduce the time to analyze data, but the initial evaluation still must be performed by the clinician (Matesich et al., 1997). Future advances in the development of a touch screen version of the RTT may be helpful in reducing some clinical judgments essential to valid and reliable multidimensional scoring (M. R. McNeil, personal communication, June 4, 2005); however, they will not address all aspects of the immediate and complex decisions that must be made by trained scorers during the assessment process, and the necessary technology is not available in many clinical environments (Hageman & Hallowell, 2003).

The aim of this study was to compare traditional multidimensional scoring (TMS) with three alternate scoring forms that would require less time in administration: one simpler form of multidimensional scoring and two forms of correct/incorrect scoring. Patterns of significant and nonsignificant differences among the scoring methods may suggest methods that could be substituted for the multidimensional scoring process.

Method

Participants

The participants were 10 adults with aphasia (see Table 1) ranging in age from 40 to 72 years (M = 54.4). The participants were from 1 to 16 years postonset (M = 6.17), and their scores on the Auditory Comprehension subtest of the Western Aphasia Battery (WAB; Kertesz, 1982) ranged from 1.65 to 9.80 (M = 7.22; see Table 2). The history of neurological etiology and site of lesion were confirmed through case histories and reports from neuroradiological investigations and neurological evaluation.

Each participant passed a vision screening that included observation for eye asymmetry, lesions, swelling, and drainage, as well as screening tests for visual acuity, central and peripheral visual fields, visual attention deficits, color vision, and nystagmus. All participants passed a hearing screening with thresholds of 25 dB HL or better at pure-tone frequencies of 500, 1000, and 2000 Hz. Each participant served as his or her own control for the scoring system comparisons. None of the participants had any prior knowledge of or experience with the experimental test. All participants provided informed consent according to Institutional Review Board requirements.

Stimuli and Procedure

Each participant was tested individually over two sessions. In the first session, a detailed case history was taken and the WAB was administered. During the second session, within 6 days of the first, a vision and hearing screening and the RTT were administered. The RTT (McNeil & Prescott, 1978) was selected for the study because it is a well-known test of auditory comprehension in individuals with aphasia and has multiple unique and important features in addition to multidimensional scoring. A modified version of the RTT (McNeil & Prescott, 1978) consisting of 5 commands in each of the 10 subtests was used because this shortened form of the test is more likely to be used in clinical practice owing to its shorter administration time compared with the standard 10-command version. This shortened version of the original RTT has been demonstrated to predict accurately the standard mean overall score and subtest scores of the test, with the exception of Subtest 9 (Arvedson, McNeil, & West, 1985; Hallowell, Wertz, & Kruse, 2002; Park, McNeil, & Tompkins, 1999).

The materials for the test include tokens that are red, blue, green, white, or black. Each token is either a square or a circle, in one of two sizes: big or small. Hence, each token is characterized by three attributes: color, shape, and size. The commands necessitate touching or manipulating the tokens in various ways. An example of a command is "Touch the big green circle." An example of a command involving manipulation of tokens is "Put the big green square to the right of the little black square." Each key word of a command (e.g., touch, big, green, and circle in the first example) constitutes an element, and every element of the command is scored on a 15-point categorical scale (see the Appendix for a description of the 15 scoring categories). The 10 subtests differ from one another in terms of command length and sentence type, with the complexity of sentence type increasing from Subtest 1 through Subtest 10. Table 3 provides examples of stimulus commands from each of the subtests to illustrate the increasing linguistic complexity across subtests of the RTT.

The data were scored using four methods. These were (a) TMS applied according to the published RTT instructions given by McNeil and Prescott (1978), (b) multidimensional scoring for the overall command (MSO), (c) binary scaling for each element of the command (BSE), and (d) binary scaling for each overall command (BSO). Brief descriptions of scoring methods and examples are provided in Table 4.

In the BSE method, each element was scored as either 0 (incorrect) or 1 (correct). All responses corresponding to ratings of 1-7 and 10 on the traditional scale were scored as incorrect; all responses corresponding to ratings of 8, 9, and 11-15 were scored as correct. A "reversal" during performance, which is given a score of 10 on the traditional multidimensional scale, was thus scored as incorrect. A need for command "repetition" or command "cuing," which are given scores of 9 and 8, respectively, on the traditional scale, was considered correct, because responses requiring repetition on other standardized aphasia tests such as the WAB are not scored as incorrect.

For BSO, the response for the entire command was scored 0 (incorrect) even if only one element of the command was incorrect. Conversely, the response for the overall command was scored 1 (correct) if all parts of the command were performed correctly.
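
To make these derivation rules concrete, the following minimal sketch (our illustration, not part of the published RTT materials; the function names and sample scores are hypothetical) shows how BSE and BSO values could be computed from TMS element scores:

```python
# Sketch of the BSE and BSO derivation rules described above.
# A command is represented as a list of TMS element scores (1-15).

def bse_score(tms_element_score):
    """Binary element score: traditional ratings 8, 9, and 11-15 count as
    correct (1); ratings 1-7 and 10 (reversal) count as incorrect (0)."""
    return 1 if tms_element_score in (8, 9, 11, 12, 13, 14, 15) else 0

def bso_score(tms_command_scores):
    """Binary command score: correct (1) only if every element of the
    command is correct under the BSE rule."""
    return 1 if all(bse_score(s) == 1 for s in tms_command_scores) else 0

# Example: a four-element command scored 15, 13, 9, and 10 under TMS.
command = [15, 13, 9, 10]
print([bse_score(s) for s in command])  # [1, 1, 1, 0]
print(bso_score(command))               # 0 (one element was incorrect)
```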

The RTT session with each participant was videotaped. The three new sets of scores assigned using the MSO, BSE, and BSO methods were derived from the traditional multidimensional scores, based on the rules for derivation specified for each of the three simpler scoring methods; they were not obtained by rescoring the participants' performances separately at four different times. This was done to avoid introducing potential confounding effects of human scoring error, which were not the object of study.

To compare the traditional multidimensional scores and the multidimensional scores for the overall command, which range from 1 to 15, with the binary scoring methods (BSE and BSO), which range from 0 to 1, each multidimensional score (X) was converted into an equivalent score on the 0-to-1 range by applying the formula (X - 1)/14. Given this transformation of the data for the first two scoring methods, the data for each of the four scoring methods were based on scores ranging from 0 to 1.
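
As a worked illustration of this conversion (our sketch; the example values follow the 15-point scale and scoring rules described above):

```python
def normalize(x):
    """Map a 1-15 multidimensional score onto the 0-to-1 range."""
    return (x - 1) / 14

print(normalize(15))  # 1.0 (highest rating on the 15-point scale)
print(normalize(13))  # ~0.857 (complete-delayed; see the earlier example)
print(normalize(8))   # 0.5 (cued response)
print(normalize(1))   # 0.0 (lowest rating on the scale)
```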

To ensure reliable scoring according to the published instructions provided for the first scoring method (multidimensional scoring for each element of the command), two different scorers scored each individual's RTT performance. The two scorers were speech-language pathology graduate students who received training in scoring the RTT. The training consisted of (a) repeated viewing of two RTT training videotapes prepared by Hageman (2001a, 2001b) and (b) scoring practice on participants' videotaped performances for which reliable scores had been obtained by experienced scorers previously. The training for the two scorers was determined to be complete when the overall percentage agreement was 85% or greater between their scores and scores assigned by past experienced scorers for at least 10 individuals previously tested on the RTT.


TABLE 1. Participant description.


TABLE 2. Scores on linguistic subtests of the Western Aphasia Battery.

The two scorers independently scored every element of every command of each participant's performance separately. The first rater administered and scored the RTT live, and the second rater independently scored the videotaped administration of the RTT. This method of contrasting live with repeat videotape scoring is consistent with methods typically used in published research on the assessment of inter- and intrarater scoring (e.g., Gordon, Tancredi, Binder, Wilkerson, & Shaffer, 2003; Odekar & Hallowell, 2004; Park et al., 1999).

For every participant, the total number of elements for which the same scores were assigned by the two scorers was divided by the total number of elements in the entire test to obtain the scoring agreement for each participant. These values were averaged across participants to obtain overall percentage agreement values. Initial agreement by independent scorers was 82.01%. Scorers noted and tracked reasons for each initial interscorer disagreement. The interscorer discrepancies were the result of differences in interpreting responses as delayed and an occasional failure of one of the scorers to recognize errors, perseveration, and self-correction responses. The interscorer agreement varied considerably from participant to participant, with poorer interscorer reliability for ratings of participants who made errors. An agreement of 100% was obtained after video analysis and discussion by both scorers.
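
A minimal sketch of this agreement computation (our illustration; the toy scores below are hypothetical and do not come from the study data):

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of elements assigned identical scores by two scorers."""
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100 * matches / len(scores_a)

# Toy element scores from the live scorer and the videotape scorer
# for two hypothetical participants (5 elements each).
live = [[15, 13, 9, 10, 15], [15, 15, 8, 7, 15]]
video = [[15, 14, 9, 10, 15], [15, 15, 8, 7, 13]]

per_participant = [percent_agreement(a, b) for a, b in zip(live, video)]
overall = sum(per_participant) / len(per_participant)
print(per_participant, overall)  # [80.0, 80.0] 80.0
```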


TABLE 3. Description of stimulus items from Revised Token Test subtests.

After deriving the raw scores for the three other scoring methods from the traditional multidimensional method, the command scores for each of the scoring methods were calculated by averaging the scores on all of the elements for each command. For the BSO method, the command scores were either 1 or 0. All of the command scores were averaged across each subtest to obtain scores for every subtest. Additionally, these subtest scores were averaged to yield the overall test score, consistent with score derivation methods specified for the RTT. The data were compared at the subtest level in addition to the level of the overall test score in light of the differences in command difficulty among the subtests. Repeated measures analysis of variance (ANOVA) and follow-up tests of significant contrasts were performed for the overall RTT scores and for each of the subtests of the RTT among the four different scoring methods.
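
The aggregation just described can be sketched as follows (our illustration; the nesting of elements within commands within subtests follows the RTT derivation rules, but the data and names are hypothetical):

```python
def mean(values):
    return sum(values) / len(values)

def subtest_score(commands):
    """Average element scores within each command, then average the
    resulting command scores across the subtest."""
    return mean([mean(command) for command in commands])

def overall_score(subtests):
    """Average the subtest scores to obtain the overall test score."""
    return mean([subtest_score(subtest) for subtest in subtests])

# Toy subtest with two commands of three elements each, with element
# scores already converted to the 0-to-1 range.
subtest = [[1.0, 0.857, 0.5], [1.0, 1.0, 0.0]]
print(subtest_score(subtest))  # ~0.726
```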

Results

The scores obtained using the traditional multidimensional method for each participant and the corresponding percentiles according to RTT norms are given in Table 5. The means and standard deviations for RTT scores obtained through each of the four scoring methods are provided in Table 6. The results of repeated measures ANOVA and follow-up tests of significant contrasts among the four scoring methods are presented in Table 7.


TABLE 4. Description and illustration of scoring methods.

Significant differences were found among the scoring methods with respect to the overall scores and all of the subtests except Subtest 1. All follow-up tests (paired t tests) were conducted at an alpha level of .017, determined by Bonferroni adjustment for three follow-up contrasts (.05/3 ≈ .017). Significant differences were found between the scores obtained from TMS and MSO and between TMS and BSO in terms of the overall scores and the majority of the subtests. No significant difference between the scores obtained through TMS and BSE was found for any of the subtests or for the overall test scores. The statistical power to detect significance at α = .05 for a medium effect size was 78% for the ANOVA tests. The power for the paired follow-up contrasts was 76%.

Discussion

The study findings yielded important insights regarding the potential for using simpler scoring methods in place of TMS. First, it was surprising that there were significant differences between TMS and two of the three other methods (MSO and BSO), considering the inherent redundancy among the scoring methods. It may be that fundamental decision rules in the RTT and the corresponding numeric scores influenced this result. The decision to assign a 0 or 1 to an element or a command was derived from the rules for TMS. However, in TMS, repetitions and cues (still correct responses) yield a lower score than a reversal of elements of a command (an incorrect response). The need for repetitions and cues, within the parameters of the traditional RTT test administration instructions, was not counted as an error in the binary methods of scoring. This may be one reason why the binary methods differed significantly from the multidimensional methods for most subtests.


TABLE 5. Revised Token Test (RTT) subtest and overall scores.

The direction of differences was such that the two methods involving element-by-element scoring (TMS and BSE) led to higher scores than the methods involving scoring of overall commands. It is questionable whether this is a clinically important finding, given that lower scores do not necessarily indicate more sensitivity than higher scores as long as scores tend to be below ceiling. There is no evidence to suggest that ceiling effects are likely with multidimensional scoring. Two key points indicate the contrary: (a) Ceiling effects are common in adults with normal language for many published aphasia tests using correct/incorrect scoring, and (b) even adults with normal language do not tend to score at ceiling for the RTT overall or for several of the RTT subtests, as is evident in the published RTT norms. It is likely that the scores for the MSO and the BSO are lower because certain subtle positive aspects of individuals' responses have not been given credit.


TABLE 6. Means and standard deviations for RTT subtest and overall scores.

The fact that no differences were found among the scoring methods in Subtest 1 may be attributed to the short length of the commands, such as "touch the black circle," each having only three elements. At such a simple linguistic level, there is less chance of finding the wide variety of error scores that would otherwise be captured with a multidimensional scoring system.

The absence of significant differences between the scores obtained by TMS and BSE for each subtest as well as for the overall test scores was not due to a lack of power to detect a difference. The lack of a significant difference between these two methods may be explained by examining their scoring rules. In the BSE method, every element within a command receives a score of 0 or 1. Therefore, the errors and deviations occurring in every element of a command are recognized in a manner similar to that of the traditional scoring method, the only difference being that, in the traditional system, each element receives a graded score ranging from 1 to 15, depending on which type of error is demonstrated. This difference may not be a significant factor at the subtest level of analysis as long as all the elements in each command are considered and scored individually for errors. This is also reflected in the lack of a significant difference between the two methods in terms of the overall scores. In the case of the MSO and the BSO, the lack of consideration of individual elements of a command may compromise the details of scoring enough to create a significant difference from the traditional method of scoring at the subtest level.


TABLE 7. Analysis of variance (ANOVA) and follow-up contrasts.

The study findings suggest that simpler, less time-intensive scoring systems might yield data equivalent to TMS. There is, however, a need for further research in this area using larger participant groups to explore factors, such as aphasia severity, that might influence scale equivalence. It is possible that the severity of comprehension deficits influences patterns of differences and similarities among scoring methods. Patients with severe comprehension deficits may be likely to score low under both multidimensional scoring and correct versus incorrect scoring. Patients with mild comprehension problems, however, may be more likely to demonstrate a discrepancy between correct versus incorrect and multidimensional scoring: they may score higher when assessed with correct versus incorrect criteria than with multidimensional scoring, given that the latter is designed to capture subtle deficits within an individual's performance. A larger number of participants with aphasia representing varying severity levels would be necessary for further exploration of this possibility.

Developments in computerized test administration and scoring also hold promise for increasing efficiency of assessment. Given that demands of time and financial resources for clinician training required for reliable and valid multidimensional scoring are increasingly difficult to meet in current service delivery contexts, and given that tests such as the RTT yield valuable and distinct information regardless of scoring methods, it is important to continue exploring the possibilities of alternative scoring systems.

Acknowledgments

This work was supported in part by National Institute on Deafness and Other Communication Disorders Grant DC00153-01A1 and in part by an Ohio University Graduate Research Fellowship. The authors wish to thank Dr. Malcolm McNeil for extensive editorial assistance on this article and consultation on the shortened form of the RTT; Natalie Douglas, Melissa Elliott, and Sabine Heuer for their help with data collection; and Dr. Carlin Hageman for assistance with training in RTT scoring. The authors would also like to extend their gratitude to Dr. Sunny Kim and Dr. Robert Roe for statistical consultation and to Dr. Richard Dean and Dr. Sally Marinellie for guidance regarding experimental design.

References

Arvedson, J. C., McNeil, M. R., & West, T. L. (1985). Prediction of Revised Token Test overall, subtest and linguistic unit scores by two shortened versions. Clinical Aphasiology, 15, 57-63.

Duffy, J. R., & Dale, B. J. (1977). The PICA scoring scale: Do its statistical shortcomings cause clinical problems? In R. H. Brookshire (Ed.), Clinical Aphasiology Conference proceedings (pp. 290-296). Minneapolis, MN: BRK.

Eisenson, J. (1954). Examining for aphasia. New York: The Psychological Corporation.

Goldstein, K. (1948). Language and language disturbances. New York: Grune and Stratton.

Goodglass, H., & Kaplan, E. (1983). The assessment of aphasia and related disorders (2nd ed.). Philadelphia: Lippincott Williams & Wilkins.

Goodglass, H., Kaplan, E., & Barresi, B. (2001). Boston Diagnostic Aphasia Examination-Third Edition. Philadelphia: Lippincott Williams & Wilkins.

Gordon, J. A., Tancredi, D. N., Binder, W. D., Wilkerson, W. M., & Shaffer, D. W. (2003). Assessment of a clinical performance evaluation tool for use in a simulator-based testing environment: A pilot study. Academic Medicine, 78, S45-S47.

Hageman, C. (Producer). (2001a). RTT training tapes: #1 [Videotape]. (Available from School of Hearing, Speech and Language Sciences, Ohio University, Athens, OH 45701)

Hageman, C. (Producer). (2001b). RTT training tapes: #2 [Videotape]. (Available from School of Hearing, Speech and Language Sciences, Ohio University, Athens, OH 45701)

Hageman, C., & Hallowell, B. (2003). Specifications and design considerations for a touchscreen version of the Revised Token Test. Athens: Ohio University, National Science Foundation Projects to Aid Persons With Disabilities.

Hallowell, B., & Chapey, R. (2001). Delivering language intervention services to adults with neurogenic communication disorders. In R. Chapey (Ed.), Language intervention strategies in aphasia and related neurogenic communication disorders (pp. 173-193). Philadelphia: Lippincott Williams & Wilkins.

Hallowell, B., & Clark, H. (2002, May). Dysphagia is taking over: Priorities for aphasia under managed care. Paper presented at the Clinical Aphasiology Conference, Ridgedale, MO.

Hallowell, B., Wertz, R. T., & Kruse, H. (2002). Using eye movement responses to index comprehension: An adaptation of the Revised Token Test. Aphasiology, 16, 587-594.

Haynes, W. O., & Pindzola, R. H. (1998). Diagnosis and evaluation in speech pathology. Boston: Allyn and Bacon.

Helm-Estabrooks, N. (1992). Aphasia diagnostic profiles. Austin, TX: Pro-Ed.

Henri, B., & Hallowell, B. (1999). Mastering managed care: Problems and possibilities. In B. S. Cornett (Ed.), Clinical practice management for speech-language pathologists (pp. 3-28). Gaithersburg, MD: Aspen.

Henri, B., & Hallowell, B. (2001). Improving access to speechlanguage pathology and audiology services. In R. Lubinski & S. Frattali (Eds.), Professional issues in speech-language pathology (2nd ed., pp. 337-357). San Diego, CA: Singular.

Katz, R. C., Hallowell, B., Code, C., Armstrong, E., Roberts, P., Pound, C., & Katz, L. (2000). A multinational comparison of aphasia management practices. International Journal of Language and Communication Disorders, 35(2), 303-314.

Kertesz, A. (1982). Western Aphasia Battery. Austin, TX: Pro-Ed.

Lahey, M. (1988). Language disorders and language development. New York: Macmillan.

Marshall, R. C. (2001). Management of Wernicke's aphasia: A context-based approach. In R. Chapey (Ed.), Language intervention strategies in aphasia and related neurogenic communication disorders (pp. 435-456). Philadelphia: Lippincott Williams & Wilkins.

Marshall, R. C., Neuburger, S. L., & Phillips, D. S. (1994). Verbal self-correction and improvement by treated aphasic clients. Aphasiology, 8, 535-547.

Matesich, J., Porch, B. E., & Katz, R. (1997). PICApad for PC [Computer software]. Scottsdale, AZ: Sunset Software.

McNeil, M. R., & Prescott, T. E. (1978). Revised Token Test. Austin, TX: Pro-Ed.

Murray, L., & Chapey, R. (2001). Assessment of language disorders in aphasia. In R. Chapey (Ed.), Language intervention strategies in aphasia and related neurogenic communication disorders (pp. 55-126). Philadelphia: Lippincott Williams & Wilkins.

Odekar, A., & Hallowell, B. (2004, June). Considering alternatives to multidimensional scoring in language comprehension assessment. Paper presented at the Clinical Aphasiology Conference, Park City, UT.

Odekar, A., & Hallowell, B. (in press). Exploring inter-rater agreement in scoring of the Revised Token Test. Journal of Medical Speech-Language Pathology.

Park, G., McNeil, M. R., & Tompkins, C. (1999, June). Reliability of the five-item Revised Token Test for individuals with aphasia. Paper presented at the Clinical Aphasiology Conference, Key West, FL.

Porch, B. E. (1967). Porch Index of Communicative Ability. Palo Alto, CA: Consulting Psychologists Press.

Porch, B. E. (1981). Porch Index of Communicative Ability, Third Edition. Palo Alto, CA: Consulting Psychologists Press.

Porch, B. E. (2001). Treatment of aphasia subsequent to the Porch Index of Communicative Ability. In R. Chapey (Ed.), Language intervention strategies in aphasia and related neurogenic communication disorders (pp. 663-674). Philadelphia: Lippincott Williams & Wilkins.

Schuell, H. (1965). The Minnesota test for differential diagnosis of aphasia. Minneapolis: University of Minnesota Press.

Shadden, B. (1998). Information analysis. In L. R. Cherney, B. Shadden, & C. A. Coelho (Eds.), Analyzing discourse in communicatively impaired adults (pp. 85-114). Gaithersburg, MD: Aspen.

Simmons-Mackie, N., Threats, T. T., & Kagan, A. (2005). Outcome assessment in aphasia: A survey. Journal of Communication Disorders, 38(1), 1-27.

Taylor, M. (1965). A measurement of functional communication in aphasia. Archives of Physical Medicine and Rehabilitation, 46, 101-107.

Vignolo, L. A. (1964). Evolution of aphasia and language rehabilitation. Cortex, 1, 344-367.

Weisenberg, T., & McBride, K. E. (1935). Aphasia: A clinical and psychological study. New York: Commonwealth Fund.

Wepman, J. (1958). The relationship between self-correction and recovery from aphasia. Journal of Speech and Hearing Disorders, 23, 302-305.

Wepman, J. M., & Jones, L. V. (1961). The language modalities test for aphasia. Chicago: University of Chicago, The Industrial Relations Center.


Anshula Odekar

Brooke Hallowell

Ohio University, Athens


Received February 28, 2005

Revision received June 9, 2005

Accepted September 7, 2005

DOI: 10.1044/1058-0360(2005/032)

Contact author: Anshula Odekar, Ohio University, School of Hearing, Speech and Language Sciences, Athens, OH 45701. E-mail: [email protected]


Appendix: Categorical Scale Used in the Revised Token Test (McNeil & Prescott, 1978)

Copyright American Speech-Language-Hearing Association Nov 2005