Considering the importance of graphical literacy in academic contexts, together with the crucial role of diagnostic feedback in guiding and facilitating teaching and learning, this study aimed to develop a descriptor-based checklist to diagnose test takers’ performance on IELTS graph-based integrated writing tasks. Think-aloud verbal protocols of six IELTS instructors rating task 1 reports and six instructors writing task 1 reports informed the checklist’s development. The checklist descriptors were validated by different groups of experts and cross-checked against the literature and existing checklists. The checklist’s capability to provide reliable diagnostic information was examined by analyzing score consistency across raters and the correlation between the binary and multilevel scores awarded with the checklist and the scores awarded with the IELTS rubric. The findings highlight both task-specific writing skills unique to graph-based integrated tasks and more general writing skills common across different integrated writing tasks. Implications for teachers, learners, and professional development programs are discussed.
Introduction
Feedback serves as a formative assessment tool that promotes students’ second language writing performance and thus is an essential component of writing classroom assessment (Hyland & Hyland, 2019; I. Lee, 2017; I. Lee & Mao, 2024). Shohamy (1992) believes that assessment information should be relevant, detailed, and diagnostic. Diagnostic feedback includes a description of test takers’ strengths and weaknesses by skill and subcomponent (Doe, 2015), which helps learners notice the gap between their current knowledge level and the desired performance level or goal (Jang & Wagner, 2013).
Analytic rubrics can be utilized to guide learners toward improvement by allowing for detailed diagnostic feedback (Brookhart, 2013). In contrast to a holistic rubric, which only provides test takers with a single score, analytic rubrics break down the scale into different aspects of students’ compositions such as grammatical range and accuracy, organization, cohesion and coherence, and lexical resources. Therefore, analytic scales can be used to provide test takers with feedback on different dimensions of their writing. Although analytic rubrics are more appropriate than holistic rubrics for providing diagnostic feedback (Knoch, 2011), they typically fail to provide test takers with detailed information that could guide their actions (Author, 2023). Moreover, although there has been a movement toward data-driven rubrics over the last 30 years, many rubrics are still designed based on experts’ intuitive judgments (Author, 2023; Knoch, 2009; Knoch et al., 2021). As such, they present a subjective or even unfair assessment of test takers’ performances. Considering the limitations of intuition-based rubrics, some researchers (e.g., Knoch, 2007, 2011; Turner & Upshur, 2002) have highlighted the importance of using empirical data to develop and validate rating scales. Consequently, to address the shortcomings of theoretically determined rubrics for delivering diagnostic feedback, researchers (e.g., Kim, 2011; Luoma, 2004; Author, 2023) have proposed the use of empirically designed checklists to assess and diagnose language skills. A checklist breaks down the performance into a detailed list of descriptors that can be used to evaluate the extent to which test takers exhibit different skills and knowledge in their performance (Author, 2023).
Few studies have utilized rating checklists to offer detailed diagnostic feedback and make cognitive diagnostic modeling applicable to writing data in general (e.g., Kim, 2011; Shi et al., 2024) and integrated writing in particular (e.g., Author, 2023). Integrated tasks require test takers to write a composition based on provided source material (e.g., reading texts, listening recordings, graphs). Given that academic writing often requires the use of source materials, the inclusion of integrated writing tasks in high-stakes tests such as IELTS and TOEFL can be justified (Uludag et al., 2019). Adopting integrated writing tasks as part of large-scale proficiency tests improves the validity and authenticity of these tests, as it provides a more comprehensive measure of L2 test takers’ writing ability (Cumming, 2013; Grabe & Zhang, 2013).
As elucidated by some studies (e.g., Lee, 2015; Author, 2023; Sawaki, 2007), offering comprehensive diagnostic feedback on language learners’ performance in integrated writing tasks can help them develop the writing skills essential for academic success. Furthermore, providing such feedback aligns with recent developments in process-oriented approaches to writing instruction (Author, 2023).
However, as integrated writing involves comprehension and application of source materials, offering diagnostic feedback on test takers’ performance in these tasks presents a set of challenges (Plakans & Gebril, 2013; Author, 2023; Sawaki et al., 2013). This complexity arises because successful performance on such tasks requires the simultaneous use of multiple language abilities. Consequently, diagnosing integrated writing tasks is far more complex than diagnosing isolated language skills. Unlike other integrated writing tasks used in high-stakes tests such as TOEFL, which typically involve the integration of reading and listening skills, IELTS writing task 1 demands a different type of integration that includes graphical literacy in addition to language skills (Ahmadi & Mansoordehghan, 2015; Golparvar & Abolhasani, 2022; Yang, 2012). Providing diagnostic feedback on these tasks presents unique challenges due to the additional dimension of graphical literacy.
In a recent study, Author (2023) developed a descriptor-based checklist to provide detailed diagnostic feedback on TOEFL integrated writing tasks, which require test takers to synthesize information from both a reading and a listening passage and write a text (Cumming et al., 2005). However, the IELTS integrated writing task requires test takers to look at visuals (e.g., diagrams, graphs, tables, charts) and report the key ideas in their own words. The growing use of graphs in journal articles, textbooks, and various types of media, along with technological advancements, has led to an increased reliance on graphs for information representation (Healy, 2024; Okan et al., 2012; Pandey et al., 2014; Zacks et al., 2002). Therefore, the ability to analyze and interpret graphs is essential for academic success (Hyland & Rodrigo, 2007).
Despite the importance of graph-based writing in academic assessment, this skill has received less attention compared to other types of integrated writing (Golparvar & Abolhasani, 2022). Additionally, the available IELTS writing task 1 rubric provides test takers and decision-makers with a general evaluation across broad categories such as task achievement, grammatical range and accuracy, lexical resource, and coherence and cohesion. The primary purpose of this rubric is to assess overall language proficiency, particularly for admission, certification, or immigration purposes rather than to offer detailed diagnostic feedback. As a result, while test takers receive overall band scores based on broad performance categories, they are often left without sufficient insight into their specific strengths and areas for improvement.
Considering the crucial role of diagnostic feedback in guiding and facilitating teaching and learning practices, together with the limitations of the available IELTS rubric for providing specific diagnostic feedback, this study developed a descriptor-based diagnostic checklist for the IELTS graph-based integrated writing task. This checklist is designed to complement existing assessment tools by providing targeted diagnostic information to guide both instruction and learner progress. It aims to offer teachers, learners, and test developers more actionable, detailed feedback to support targeted instruction, learning, and assessment practices.
An equally important factor that influenced the development of the checklist was the role of graphical literacy in completing these tasks. Graphical literacy is defined as the ability to read and interpret charts, graphs, and other pictorial presentations within nonfiction texts (Coleman et al., 2012). This definition is consistent with how graphical literacy has been conceptualized in previous research, particularly within the field of academic reading and writing (Coleman et al., 2012; Shah & Hoeffner, 2002). However, other studies, especially in mathematics and science education, have taken a broader view of graphical literacy, focusing on quantitative reasoning and data interpretation for problem-solving purposes (Friel et al., 2001). The present study adopts the academic-context definition but narrows its focus to the specific demands of IELTS integrated writing tasks, where interpreting graphs is combined with academic language use.
Literature review
Integrated writing
Compared with independent writing tasks, integrated tasks are more challenging for test takers as they require not only test takers’ prior knowledge but also their capacity to integrate information from different sources to compose a coherent text (Cumming et al., 2018; Nelson & King, 2023). While independent writing tasks are capable of providing some information about the test-takers’ writing ability (Kyle, 2020), such writing tasks are often criticized for not accurately reflecting the demands of academic settings; as a result, their construct validity is under question (Cumming et al., 2005; Weigle, 2004). To overcome the limitations of the independent writing tasks, integrated writing tasks have been increasingly employed to assess test takers’ academic language ability (Plakans et al., 2019).
Integrated writing tasks are considered more aligned with real-world academic challenges, as they require test takers to synthesize and respond to information from sources such as a reading passage, a listening passage, or a diagram (Chapelle et al., 2011; Cumming et al., 2005). Therefore, to increase authenticity, integrated writing tasks have appeared more frequently in language tests (Mitchell, 2017). Integrated writing prompts such as those included in the TOEFL test require test takers to integrate linguistic information from source materials (e.g., reading and listening passages) and write a text. In tests such as the GEPT and IELTS, however, visual information must be interpreted and reported. Therefore, successful completion of integrated reading-listening-writing tasks (i.e., the TOEFL integrated writing task) depends only on language skills, whereas graph-based integrated tasks (i.e., the IELTS integrated writing task) require mastery of both language skills and a non-linguistic skill (graphical literacy). The challenges test takers face with graph-based writing, as well as the crucial role of such tasks in academic settings, call attention to the provision of diagnostic feedback on these tasks. Diagnostic feedback, by drawing test takers’ attention to their strengths and weaknesses, advances their learning and strategy development (Jang & Wagner, 2013), thus paving the way for their academic success.
Diagnostic checklists
In the writing assessment context, rubrics are vital for diagnosing performances and providing feedback (Author, 2023). Among them, analytic rubrics, by focusing on different aspects of writing, offer more benefits for diagnostic purposes than holistic ones (Jamieson & Poonpon, 2013; Knoch, 2009, 2011; Lestari & Brunfaut, 2023; Sawaki et al., 2013). However, not all analytic rubrics are equally effective for diagnostic purposes. The use of impressionistic or vague terminology in descriptors makes many rubrics inappropriate for diagnostic purposes (Knoch, 2009). Moreover, rubric descriptors may combine different features that do not necessarily co-occur, which is likewise inappropriate for diagnostic purposes (Knoch, 2009; Turner & Upshur, 2002). Furthermore, many rubrics are criticized for their reliance on experts’ intuition and judgment and for their lack of empirical support (e.g., Kim, 2010; Knoch, 2011; Author, 2023). Such intuition-based rubrics have several drawbacks. They can be highly subjective, leading to unfair assessment. Besides, without empirical backing, important language skills and constructs might be overlooked (Alderson et al., 1995), leading to a partial or incomplete assessment of test takers’ abilities.
To overcome the limitations of the available rubrics for diagnosing test takers’ strengths and weaknesses, some researchers (e.g., He et al., 2021; Kim, 2010; Lukácsi, 2021; Author, 2023; Shi et al., 2024) have used empirical evidence to examine the capacity of analytic checklists for diagnosing test takers’ writing performance. In one of the earliest attempts, Kim (2010) used teachers’ verbal protocols produced while assessing writing samples to develop and validate a descriptor-based diagnostic checklist for test takers’ performance on independent writing tasks. Her checklist was composed of 35 binary-choice items. In another study, Lukácsi (2021) constructed a 35-item checklist for assessing L2 writing by integrating insights from the existing literature with the research team’s perspectives on the characteristics of a successful essay. Author (2023) also developed a 30-item diagnostic checklist by collecting and analyzing the challenges faced by test takers while completing a TOEFL reading-listening-writing integrated task. One potential advantage of such diagnostic checklists is that, by offering a list of fine-grained descriptors rather than collapsing performance into a few broad categories, they can provide test takers with detailed feedback on specific aspects of their performance (Safari, 2023). As pinpointed by Safari (2023), another notable benefit of such checklists is that they can serve as a foundation for implementing cognitive diagnostic assessment (CDA). CDA is an approach that provides detailed feedback on learners’ strengths and weaknesses, helping to guide both instruction and learning (Ravand & Robitzsch, 2015). In all of these studies, a binary-choice checklist was developed. When using such checklists, the assessor must decide whether the criterion stated in each descriptor is met or not. Developing a binary-choice checklist is a prerequisite for running a dichotomous CDA, which classifies test takers into master and non-master groups based on their dichotomized status on each latent skill (Akaeze, 2020). As elaborated by Akaeze (2020), classifying latent attribute status into mastery and non-mastery not only leads to loss of information but also ignores the fact that learning is progressive and that test takers in the same group (i.e., masters or non-masters) may possess the skill to varying degrees. However, in one of the most recent studies, Shi et al. (2024), in an attempt to run a polytomous CDA, developed a three-level descriptor-based checklist to diagnose EFL learners’ writing performance. Polytomous CDA, instead of taking into account only the two extremes (i.e., mastery and non-mastery), creates a middle ground between them (Akaeze, 2020).
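To make the distinction between dichotomous and polytomous CDA more concrete, the following minimal Python sketch illustrates the item response function of the DINA model, one widely used dichotomous CDA model. The attribute names, Q-matrix row, and guessing and slipping parameters are purely hypothetical illustrations and are not drawn from the cited studies or from the checklist developed here.

```python
import numpy as np

def dina_prob(alpha, q_row, guess, slip):
    """P(meeting one checklist descriptor | attribute profile alpha) under DINA:
    eta = 1 only if the test taker masters every attribute required by the
    Q-matrix row for that descriptor."""
    eta = int(np.all(alpha >= q_row))          # 1 if all required attributes are mastered
    return (1 - slip) ** eta * guess ** (1 - eta)

# Hypothetical descriptor requiring attributes 1 and 2 (e.g., "graph
# interpretation" and "data selection") out of a three-attribute model.
q_row = np.array([1, 1, 0])
master = np.array([1, 1, 0])       # masters both required attributes
non_master = np.array([1, 0, 0])   # lacks one required attribute

print(dina_prob(master, q_row, guess=0.2, slip=0.1))      # 0.9
print(dina_prob(non_master, q_row, guess=0.2, slip=0.1))  # 0.2
```

A polytomous CDA replaces the binary mastery status (and the 0/1 descriptor score) with ordered levels, so that intermediate stages of skill development are retained rather than collapsed into the two extremes.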
The above-discussed studies have not only enlarged our understanding of the potential usefulness of descriptor-based checklists for diagnosing test takers’ writing performance but have also laid the foundation for the application of CDA to performance-based assessment. However, the literature still suffers from a paucity of research on the assessment of different types of integrated writing tasks. As can be inferred from the literature reviewed above, existing research on integrated writing performance does not fully cover the different types of integrated writing tasks. While some studies have focused on reading-to-write integrated tasks, there is a notable gap in studies specifically exploring graph-based integrated tasks used in high-stakes tests such as IELTS. These tasks require not only language skills but also the ability to interpret and integrate graphical data, which adds an additional layer of difficulty to integrated writing. As such, the current investigation set out to fill this gap. Given the significance of graphical literacy in academic writing and the crucial role of diagnostic feedback in improving teaching and learning practices, the study was specifically designed to develop a descriptor-based diagnostic checklist for the graph-based integrated writing task of IELTS. Unlike previous studies that developed a binary-choice descriptor-based checklist, this study aimed to design both a binary-choice checklist that can be used to run a dichotomous CDA and a multilevel checklist that paves the way for conducting a polytomous CDA. To this end, the following research questions were put forward:
◦ What empirically derived diagnostic descriptors are relevant to the construct of IELTS-integrated writing?
◦ How consistent are the scores derived from the descriptor-based diagnostic checklist across raters?
◦ To what extent do the scores derived from the diagnostic checklist correlate with the scores derived from the IELTS rubric?
Design
The study used a mixed-methods design in which both qualitative and quantitative methods were involved to collect and analyze data. The research was conducted in two phases. Phase I involved qualitative methods to identify descriptors and develop a descriptor-based diagnostic checklist, whereas phase II employed quantitative techniques to evaluate the developed checklist. Table 1 provides an overview of the research design.
Table 1. Overview of the research design
| Phase | Research question | Participants | Instruments/Data | Procedure/Analysis |
|---|---|---|---|---|
| Phase I. Descriptor identification and checklist development | RQ1: Identification of the descriptors of the checklist | 12 experienced L2 instructors; 2 EFL university instructors; 4 IELTS examiners | 20 IELTS task 1 reports; 2 IELTS writing prompts | Step 1: Think-aloud verbal protocols of 6 teacher-raters and 6 teacher-writers were used to derive the descriptors of the checklist. Step 2: The descriptors were checked by 2 university instructors. Step 3: 4 IELTS examiners and 3 experienced IELTS instructors checked and confirmed the relevance of the descriptors |
| Phase II. Evaluating the developed checklist | RQ2: Score consistency across raters | 5 experienced IELTS instructors | 10 IELTS task 1 reports; 2 IELTS writing prompts | Step 1: 5 IELTS instructors rated 10 IELTS reports using the developed checklist both dichotomously and polytomously. Step 2: Fleiss’ kappa for both sets of scores (dichotomous and polytomous) was estimated |
| | RQ3: Correlation between IELTS and checklist scores | 2 experienced IELTS instructors | 100 IELTS task 1 reports | Step 1: 2 experienced IELTS instructors rated 100 IELTS task 1 reports using the developed diagnostic checklist. Step 2: The same 2 raters rated the same 100 reports using the IELTS scoring rubric. Step 3: A correlation analysis was performed |
Phase I: Descriptor identification and checklist development
Participants
IELTS instructors: A purposive sample of 12 experienced EFL teachers served as the participants of the study to develop a descriptor-based diagnostic checklist. As Ary et al. (2018) explain, purposive sampling involves selecting participants who possess specific characteristics relevant to the study. In the present study, the selection of the participants was based on their extensive background in teaching English, ensuring that all had at least 5 years of language teaching experience in general and 2 years of IELTS teaching experience in particular. All participants were Iranian, with 11 having Persian as their first language, whereas one participant’s first language was Kurdish. The demographic, educational, and teaching backgrounds of the EFL teachers are presented in Table 2.
Table 2. The EFL Teachers’ Background
| Teacher | Age | Gender | First language | Education | Years of L2 teaching experience | Years of IELTS teaching experience |
|---|---|---|---|---|---|---|
| Teacher-raters | | | | | | |
| 1 (MM) | 32 | Female | Persian | B.A. in English Literature | 13 | 7 |
| 2 (FT) | 34 | Female | Persian | PhD in TEFL | 13 | 7 |
| 3 (B) | 36 | Male | Persian | B.A. in TEFL | 11 | 2 |
| 4 (EJ) | 30 | Male | Persian | M.A. in TEFL | 12 | 8 |
| 5 (HF) | 30 | Male | Persian | M.A. in TEFL | 10 | 4 |
| 6 (MR) | 32 | Male | Persian | PhD candidate | 15 | 11 |
| Teacher-writers | | | | | | |
| 1 (FS) | 36 | Male | Kurdish | M.A. in TEFL | 15 | 5 |
| 2 (OQ) | 32 | Male | Persian | B.A. in English Translation | 7 | 3 |
| 3 (SM) | 31 | Male | Persian | Software Engineering | - | 13 |
| 4 (RB) | | Male | Persian | PhD in TEFL | 20 | 11 |
| 5 (KP) | 30 | Female | Persian | Law | 5 | 2 |
| 6 (SA) | 32 | Female | Persian | M.A. in Language Teaching | 10 | 2 |

TEFL: teaching English as a foreign language
Assessment experts: Two experienced university professors who had extensive research experience in the field of language testing and L2 writing checked the descriptors that emerged from think-aloud verbal protocols of experienced IELTS instructors and raters to come up with clear, non-redundant descriptors. The university professors were two male Iranians, and their first language was Persian.
Content experts: To further refine the descriptor-based diagnostic checklist’s descriptors, a purposive convenience sample of four experienced IELTS examiners (three males and one female examiner) was recruited. These individuals were specifically sought out based on their extensive experience and expertise in IELTS instruction and assessment (purposive sampling), but they were also recruited based on their accessibility and willingness to participate (convenience sampling). They were all certified IELTS examiners who had taught IELTS to non-native speakers for more than 15 years. They reviewed the checklist descriptors and expressed their concerns and suggestions.
All the participants took part in the study voluntarily and signed informed consent forms before their involvement in the study. Furthermore, they were assured that their data would only be used for research purposes, that they would remain anonymous, and that they could withdraw from the study at any time.
Instruments and materials
IELTS writing task 1: Two different tasks from Cambridge books containing authentic IELTS tests were used in the present study (Appendix A: Fig. 1). The first task included a single bar graph, whereas the second was composed of two bar charts. Bar graphs were selected because the most common graph-based task faced by test takers is a non-process task in the form of bar graphs and multiple-diagram tasks (Freimuth, 2016). Moreover, analysis of a sample of 30 publicly available integrated writing prompts from IELTS tests held in 2019, 2020, and 2021 indicated that bar graphs were the most common prompts, followed by multiple-diagram tasks (e.g., a combination of pie charts and tables, or line graphs and bar graphs).
Think-aloud verbal protocol: The think-aloud verbal protocols of 12 EFL teachers were used to determine the most important criteria for assessing IELTS-integrated writing tasks, which in turn were used to develop a descriptor-based diagnostic checklist. The participants were first provided with the necessary instructions and were then trained on think-aloud verbal protocols before performing the task. To make sure that the participants were clear about what was required of them, a short video was also sent to them demonstrating the exact procedure they were expected to follow.
Interviews: In case any ambiguous or unclear statements were made in think-aloud sessions, a follow-up interview session was held with the participant to clarify the ambiguity. During the interviews, the participants answered the researchers’ specific questions regarding the unclear ideas. This process helped ensure the precision of data collection.
Data collection
A systematic analysis of sources of scale development by Knoch et al. (2021) revealed that, to construct a scale, researchers have used more than one source of scale development, and some have used up to six or seven different sources. The primary sources are (a) performance samples, (b) raters, and (c) theory and literature. Following this, multiple sources of data were used in the current study to ensure the dependability of the results. Using a combination of data sources (e.g., performance samples, rater feedback, literature review, and experts’ judgments) increases the likelihood of understanding a phenomenon from various points of view, thus providing more convincing evidence (Ary et al., 2018). In the current study, the descriptor-based diagnostic checklist was developed based on both test-internal sources (i.e., performance samples and rater and writer think-aloud verbal protocols) and test-external sources (i.e., review of the literature, existing rating scales, and expert judgment) of scale construction.
To start the development of the diagnostic checklist, 450 IELTS reports (written responses produced by test takers in reaction to a writing prompt) were collected. Most of the reports were gathered from an IELTS writing-focused Telegram group, which belonged to one of the most recognized Iranian language institutes for IELTS instruction, and the members were Iranian language learners preparing for the IELTS test. These reports were written voluntarily by the learners as part of their preparation for the IELTS exam, and they were sent to the group to be randomly rated by experts and were not produced under formal exam conditions. Next, an experienced IELTS writing instructor (with 8 years of IELTS writing teaching experience) selected two parallel sets of 10 reports from a pool of 450 IELTS task 1 reports. The reports were purposefully selected to represent a range of writing quality based on the four key IELTS scoring criteria: task achievement, grammatical range and accuracy, cohesion and coherence, and lexical resource. That is, reports with varying levels of performance across these criteria were included to ensure diversity in quality. To create two parallel sets, each report was matched with another report of similar quality and writing features. The matched reports were then randomly assigned to sets 1 or 2, ensuring that both sets were comparable in terms of overall writing quality and difficulty.
Then, data were collected through think-aloud verbal protocols of six EFL teachers with experience in teaching IELTS courses. During the think-aloud sessions, these EFL teachers verbalized their thinking while rating and providing diagnostic feedback on ten IELTS integrated writing samples (5 reports × 2 prompts). The think-aloud sessions were recorded. In addition, six other EFL teachers with experience in IELTS teaching wrote two IELTS task 1 reports. The purpose of involving these teachers was to inform the checklist development by identifying features that experienced teachers attend to when producing high-quality responses. While completing the two tasks, the teacher-writers verbalized their thoughts, and these think-aloud sessions were likewise recorded. At the end of each think-aloud protocol, if any ambiguous or unclear statements had been made by the participants, interviews were conducted to resolve the ambiguity. The recorded think-aloud sessions were then transcribed; every word articulated by the participants was transcribed faithfully. Four of the participants verbalized their thoughts in English, while two others did so in Persian. The Persian transcripts were then translated and checked by the researchers, with care taken to ensure that the participants’ words were conveyed accurately and the original meaning was maintained.
Data analysis
After transcribing teachers’ verbal reports and interviews, MAXQDA software was used for coding the data. MAXQDA was used to facilitate data management procedures and estimate the frequency of each of the descriptors. Content analysis was used as a method to make valid and replicable inferences from data. This method involves a systematic classification process, where the data are analyzed and coded to identify patterns and themes (Elo & Kyngäs, 2008).
To code the data, an inductive approach was used, moving from the specific (participants’ comments on the qualities of an IELTS writing task 1 report) to the general (categories and themes describing the criteria that students’ responses must meet). First, open coding was conducted, through which the transcribed data were read and headings describing the content of the data were written down (Burnard, 1991, 1996). Then, the headings were organized into more coherent categories (Burnard, 1991). As the next step, the categories were used to create the descriptors. The descriptors were then checked against the literature and other developed checklists. Next, two professors of writing assessment reviewed and refined each descriptor to arrive at clear, non-redundant descriptors. Finally, four IELTS examiners, along with three experienced IELTS instructors serving as content experts, checked the relevance and usefulness of the descriptors for assessing non-native English test takers’ performance on an integrated writing task, as well as their content, clarity, and comprehensiveness.
Results
Think-aloud verbal protocol results
Coding of the transcribed data from both sources (the think-aloud verbal protocols of six EFL teachers rating 10 IELTS task 1 reports and of six other EFL teachers writing two reports) yielded 50 descriptors from 2606 coded segments, which were reduced to 28 descriptors after being reviewed and revised by two writing assessment experts. The descriptors and their frequencies are presented in Appendix B: Table 4.
As shown in Appendix B: Table 4, the descriptors can be divided into those related to general second language writing skills and those that specifically concern writing about graphs and figures in IELTS task 1. Of the 28 descriptors, 8 (descriptors 1 to 7 and descriptor 11) are more directly related to graph-based writing and graphical literacy, while the remaining 20 reflect general writing skills that are expected in any academic writing task.
Among all the extracted descriptors, general writing skill descriptors such as the appropriate use of lexical items were the most frequently mentioned (274 mentions), followed by the absence of systematic grammatical errors (235 mentions) and the ease of following and understanding the report (224 mentions). In contrast, addressing the topic and the use of idiomatic items, phrasal verbs, and collocations were the two least frequently mentioned descriptors, with frequencies of 27 and 20, respectively.
As for the teacher-raters’ dataset, the descriptors mentioned by teacher-raters reflect their focus on general IELTS-integrated writing qualities. The most commonly cited descriptors were again general language features: appropriate lexical use (245 mentions), absence of systematic grammatical errors (232 mentions), and ease of comprehension (212 mentions). For instance, one rater remarked, “lexical resources are affected based on wrong choice of word,” and another said, “as area is not a synonym of city, the writer will lose some marks on the vocabulary section.” Concerning grammar, a rater stated, “a lot of grammatical errors that are systematic, so that would gravely harm the score of grammatical range and accuracy.” In terms of coherence, another rater noted, “the organization of the information is not logical and is not coherent at all. That’s what makes progression unclear and very confusing.” Descriptors such as addressing the topic (22 mentions) and use of idiomatic expressions, phrasal verbs, and collocations (13 mentions) were among the least mentioned by raters.
In contrast, regarding the teacher-writers’ dataset, descriptors more closely tied to the graph-based nature of IELTS writing task 1 were mainly highlighted by teacher-writers during the writing process. Supporting key features with data from the graph was the most frequently noted descriptor (83 mentions), followed by selecting and reporting key features (55 mentions), and writing an introduction that effectively paraphrases the prompt (37 mentions). For example, one writer said, “in IELTS writing task 1, I always support my claims with numbers,” emphasizing the importance of using data to back up descriptions. Another commented, “I talked about the highest numbers, now I want to talk about lowest numbers,” which illustrates the deliberate focus on key features. Regarding introductions, one writer stated, “in the introduction paragraph, you just phrase the question prompt itself first and say it in your own words, just convey the same message.” Finally, using sophisticated lexical items and accurate, flexible complex grammatical structures, while important for overall writing proficiency, were rarely mentioned in connection with graph-based writing (only one mention each).
Phase II: Evaluating the developed checklist
The developed diagnostic checklist consisted of 28 descriptors targeting essential criteria required for successful performance on graph-based integrated writing tasks. Test-takers’ performance on each descriptor was evaluated using both dichotomous and polytomous scoring systems. In the dichotomous version, each descriptor was scored as 0 or 1, indicating if the criteria were present (i.e., 1) or not (i.e., 0). In the polytomous version, performance was rated on a four-category scale, allowing for a more nuanced assessment of skill mastery.
The following sections present the results of inter-rater reliability analyses and the correlation between checklist scores and IELTS writing task 1 band scores.
(A) Assessing inter-rater reliability
Participants
To assess the inter-rater reliability of the scores assigned using the developed diagnostic checklist, five experienced IELTS instructors took part in the study. One of the instructors (ID = 1) had also participated in the first phase of the study as part of the expert review panel, whereas others were newly involved in phase 2. Table 3 provides demographic details about these participants.
Table 3. The Experienced IELTS Instructors’ Background
| ID | Age | Gender | First language | Education | Years of L2 teaching experience | Years of IELTS teaching experience |
|---|---|---|---|---|---|---|
| 1 | 60 | Male | Persian | PhD in TEFL | 35 | 20 |
| 2 | 29 | Male | Persian | M.A. in English Literature | 12 | 10 |
| 3 | 30 | Male | Persian | M.A. in TEFL | 12 | 8 |
| 4 | 32 | Female | Persian | B.A. in English Literature | 13 | 7 |
| 5 | 40 | Male | Persian | B.Sc. in Electrical Engineering | 8 | 8 |
Data collection
To measure the agreement among raters across ratings, 10 similar IELTS task 1 reports on two prompts (5 reports × 2 prompts) were assigned to five experienced IELTS teachers to be rated based on the developed diagnostic checklist both dichotomously (0–1) and polytomously (1–2–3–4). The raters conducted the scoring independently. Before rating, all instructors reviewed the descriptors of the checklist, and any questions or uncertainties were discussed and resolved to ensure consistent understanding. The raters were trained by the researcher on how to assign both dichotomous and polytomous scores to ensure scoring consistency. In the dichotomous scoring, 0 indicated the absence of the performance feature described by the descriptor, while 1 indicated its presence in the test taker’s response. Moreover, the polytomous scores (1 to 4) represent increasing levels of performance, with 1 indicating the lowest and 4 indicating the highest level of mastery for each descriptor.
Data analysis
To evaluate the accuracy and consistency of scoring, consensus estimates were used. These estimates examine how often markers assign the same scores. Common indicators of consensus agreement include the percentage of exact and adjacent agreements. Cohen’s kappa statistic is used to measure the level of agreement beyond chance when there are two raters, whereas Fleiss’ kappa is used for more than two raters, with values above 0.60 indicating good agreement and values exceeding 0.80 reflecting very good agreement (Altman, 1990; Dettori & Norvell, 2020). In the present study, five raters assessed 10 common papers both dichotomously (0–1) and polytomously (1–2-3–4) to estimate scoring accuracy, and their percentage of exact agreements was calculated utilizing IBM SPSS Statistics 28.
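Although the consensus estimates in this study were computed with IBM SPSS Statistics 28, the same indices can be reproduced elsewhere. The sketch below is a minimal illustration, assuming entirely hypothetical rating data (five raters assigning dichotomous checklist scores to a handful of rated units), of how Fleiss’ kappa and the percentage of exact agreement could be obtained with Python’s statsmodels.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical data: rows = rated units (e.g., report-by-descriptor judgments),
# columns = the five raters; cells = dichotomous checklist scores (0/1).
ratings = np.array([
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
])

# aggregate_raters converts rater-by-unit scores into the units-by-categories
# count table that fleiss_kappa expects.
table, _ = aggregate_raters(ratings)
kappa = fleiss_kappa(table, method="fleiss")
print(f"Fleiss' kappa: {kappa:.2f}")

# Percentage of exact agreement: proportion of units on which all raters
# assigned the identical score.
exact = np.mean([len(set(row)) == 1 for row in ratings])
print(f"Exact agreement: {exact:.2f}")
```

The same routine applies to the polytomous scores; the cells would simply contain values from 1 to 4 instead of 0 and 1.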
Results
For the dichotomous version of the checklist, the Fleiss multirater kappa averaged across descriptors was 0.75. According to Landis and Koch (1977), a kappa of 0.75 indicates good agreement (i.e., kappa = 0.61–0.80) among the raters, and the significant p value (p < 0.001) suggests that the observed agreement is significantly higher than what would be expected by chance. Moreover, the Fleiss multirater kappa for the overall agreement among raters when assigning the polytomous scores of 1–4 similarly showed good agreement (0.73, p < 0.001).
(B) Correlation between the diagnostic checklist scores and IELTS band scores
Participants
To study the correlation between the scores coming from the developed checklist and those derived from the IELTS rubric, two raters who had also rated the reports to assess inter-rater reliability were recruited for this part of the study. They were both experienced IELTS examiners.
Data collection
To estimate the relationship between the scores drawn from the developed diagnostic checklist and the IELTS rubric, two raters gave an overall score to 100 IELTS task 1 reports based on the IELTS rubric (2 raters × 2 prompts × 50 IELTS writing task 1 reports). These reports had already been scored by the same raters using the developed diagnostic checklist, both dichotomously and polytomously. To minimize the risk of rater bias, two steps were taken. First, the scoring of the reports with the IELTS rubric was conducted in a separate session with a time gap of 4 weeks to reduce memory effects. Second, during the IELTS rubric scoring session, the reports were presented in a different, randomized order, and the raters were not informed that these reports had been previously scored using the diagnostic checklist. The reports used in this section of the study were randomly selected from 340 IELTS task 1 reports written on the two assigned tasks (50 reports on each prompt).
Data analysis
To study the correlation between scores derived from the diagnostic checklist and IELTS band scores, Pearson correlation analysis was employed. First, two total scores were obtained for each report from the dichotomous and polytomous versions of the descriptor-based checklist. The first total score was obtained by summing the binary scores (i.e., 0–1) awarded on each descriptor of the checklist, and the second total score was obtained by summing the scores assigned on a scale of 1–4 to each descriptor. As the next step, the correlation between the scores drawn from the IELTS rubric and each of the total scores based on the diagnostic checklist was calculated.
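As an illustration of this procedure, the following sketch uses randomly generated, purely hypothetical data (100 reports scored on 28 descriptors) to show how the two total checklist scores could be formed and correlated with IELTS band scores in Python; it reproduces the computation only, not the correlations reported below.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Hypothetical per-descriptor checklist scores for 100 reports and
# illustrative IELTS band scores awarded with the rubric.
dichotomous = rng.integers(0, 2, size=(100, 28))     # 0/1 per descriptor
polytomous = rng.integers(1, 5, size=(100, 28))      # 1-4 per descriptor
ielts_band = rng.uniform(4.0, 8.5, size=100)         # rubric band scores

# Total checklist scores are the simple sums across the 28 descriptors.
total_dichotomous = dichotomous.sum(axis=1)
total_polytomous = polytomous.sum(axis=1)

r_dich, p_dich = pearsonr(total_dichotomous, ielts_band)
r_poly, p_poly = pearsonr(total_polytomous, ielts_band)
print(f"Dichotomous totals vs IELTS band: r = {r_dich:.2f}, p = {p_dich:.3f}")
print(f"Polytomous totals vs IELTS band:  r = {r_poly:.2f}, p = {p_poly:.3f}")
```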
Results
The Pearson correlation between scores derived from the IELTS rubric and the dichotomous scores (i.e., 0–1) given based on the diagnostic checklist was 0.73 (p < 0.001), indicating a significant positive correlation. This positive correlation indicates that higher IELTS band scores are associated with higher scores derived from the diagnostic checklist. Besides, the correlation between IELTS band scores and the polytomous analytic scores (i.e., 1–4) derived from the diagnostic checklist was 0.69, which again indicates a strong positive correlation (Cohen, 1988). This relationship is statistically significant (p < 0.001), implying that higher IELTS band scores correspond to higher scores on the developed checklist. Finally, the correlation between the dichotomous and polytomous scores was 0.80 (p < 0.001), showing a significant association between the two sets of scores given based on the developed diagnostic checklist.
Discussion
Graph-based tasks have been explored due to their focus on graphical literacy, a skill that plays a crucial role in academic contexts where data interpretation and representation are highly significant. The current study investigated empirically derived diagnostic descriptors relevant to the construct of IELTS graph-based integrated writing. To derive the relevant descriptors and address the growing emphasis on utilizing empirical data to construct rating scales (e.g., Jamieson & Poonpon, 2013; Knoch & Sitajalabhorn, 2013; Plakans & Gebril, 2013), the present study utilized think-aloud verbal protocols of experienced IELTS instructors and raters. The descriptors that emerged from the think-aloud verbal protocols were then reviewed by two university professors, four IELTS examiners, and three experienced IELTS instructors. The revised descriptors were turned into a data-driven diagnostic checklist. One potential advantage of such diagnostic checklists is that, by offering a list of fine-grained descriptors rather than collapsing performance into a few broad categories, they can provide test takers with detailed feedback on specific aspects of their performance (Safari, 2023). Unlike holistic band descriptors such as those used in IELTS, which provide general evaluations of areas like grammar accuracy, vocabulary range, and organization, the proposed checklist breaks these broad areas down into more concrete, observable sub-skills. This allows for more targeted diagnostic feedback that can reveal learners’ specific strengths and weaknesses within each competency, rather than offering only an overall impression.
The use of experts’ insights as well as empirical evidence to develop the diagnostic checklist helped the researchers ensure that the checklist covered all the essential aspects of writing assessment. Furthermore, the Fleiss multirater kappa analysis provided support for the reliability of the scores assigned using the developed checklist. Thus, the overall results of the current study provide support for the use of descriptor-based diagnostic checklists to assign not only dichotomous but also polytomous scores to test takers’ performance on an integrated writing task in general and IELTS graph-based integrated writing tasks in particular.
The present study also investigated the relationship between the scores assigned using the developed checklist and those assigned using the IELTS rubric. The strong positive correlation indicates that the developed checklist taps the same underlying construct of language ability that the IELTS rubric targets. However, the absence of a perfect correlation suggests that the two measures may also focus on different aspects of integrated writing. This corroborates the findings of Author (2023). While the IELTS rubric is used to assess test takers’ ability to study or work where English is the primary means of communication, the developed diagnostic checklist is intended to diagnose test takers’ strengths and weaknesses (Alderson, 2005). Unlike the IELTS rubric, which provides an overall score across broad categories (i.e., task achievement, grammatical range and accuracy, lexical resource, and cohesion and coherence), the developed diagnostic checklist can be utilized to offer more concrete, item-level diagnostic feedback. For instance, if a test taker struggles with providing a clear overview of the main features (item 3) or fails to compare key points effectively (item 8), this can be directly flagged. This level of detail enables test takers to clearly identify which aspects of their writing need attention.
Moreover, comparison of the developed checklist and other diagnostic checklists (i.e., Kim, 2010; Author, 2023; Shi et al., 2024) reveals that some of the descriptors of the developed descriptor-based diagnostic checklist focus on the foundational elements of effective writing in general which are commonly found in such checklists. Across all the checklists, core competencies such as grammar accuracy, vocabulary range, and the logical organization of ideas are consistently emphasized. These elements are fundamental to all different types of writing tasks, whether independent essays, integrated listening-reading-writing tasks, or graph-based reports.
Moreover, the first descriptor of the developed checklist, similar to the other previously developed diagnostic checklists (i.e., Kim, 2010; Author, 2023; Shi et al., 2024) highlighted the importance of following the instructions and answering the questions. Another important issue in writing in general is termed communicative quality by Hamp‐Lyons and Henning (1991), which is defined as the writer’s skill to communicate the message to the reader(s). In writing a task 1 report, similar to other types of writing, the report must be clear and easy to understand and follow. In graph writing, this clarity ensures that the reader can easily grasp the presented visual information; it also shows the writer’s ability to interpret and convey data effectively. The emphasis on the same elements across all the developed diagnostic checklists (i.e., Kim, 2010; Author, 2023; Shi et al., 2024) suggests that the underlying skills required for effective writing remain constant.
Moreover, the checklists designed for integrated writing tasks focus on synthesizing information from provided sources, whether linguistic or visual. For example, as stipulated in Safari’s (2023) checklist as well as the checklist developed in the present study, no points other than those presented in the source materials should be included in the integrated writing.
However, other descriptors in this study were more specifically related to the construct of IELTS graph-based integrated writing. The graph-based integrated writing task, unlike other integrated writing tasks, requires an accurate interpretation and reporting of graphical data. Descriptors such as “accurate reporting of graph information” and “supporting key features with data” are specific to this task type, highlighting the need for precise analysis and presentation of the graph’s information. Moreover, as reflected in the developed checklist, an overview is a crucial component of IELTS writing task 1. The overview provides a summary of the key points and main trends in the data and highlights major patterns. Another important consideration in writing an IELTS task 1 report is writing an introduction that paraphrases the prompt. An introduction that effectively paraphrases the prompt sets the context for the report and makes it clear what will be discussed. This not only demonstrates test takers’ ability to understand the prompt but also shows their proficiency in using different words and sentence structures.
Conclusion and implications of the study
The present study added to the literature on the development of descriptor-based diagnostic checklists for assessing writing skills in general and integrated writing skills in particular. The use of multiple sources of data supported the development of a descriptor-based diagnostic checklist in both binary and polytomous formats, which functioned well compared with the IELTS writing rubric and showed the potential to offer more diagnostic feedback to test takers, especially IELTS test takers. The developed checklist further highlighted task-specific writing skills unique to graph-based integrated tasks, along with more general writing skills that constitute the core skills shared across different integrated writing tasks.
The findings can have implications for teachers, learners, and professional development programs. Teachers, especially IELTS instructors, can utilize the checklist to refine their instructional strategies by focusing on specific skills required for successful completion of the IELTS writing task. They can also use the checklist to provide diagnostic feedback to test takers. The learners can also use the checklist to understand the criteria that are essential for success on graph-based integrated writing tasks, guiding their practice and self-assessment. Besides, the insights gained from experts’ think-aloud protocols can inform professional development programs for educators and examiners, helping them to understand and apply important assessment criteria in integrated writing tasks.
Authors’ contributions
S.A.S designed and conducted the study as part of her Ph.D. dissertation, performed the analyses, and wrote the main manuscript text. A.A. and H.R. supervised the study, provided critical feedback, and contributed to revising and refining the manuscript. All authors reviewed and approved the final version of the manuscript.
Funding
This research received no external funding. The author has applied for an Article Processing Charge (APC) waiver.
Data availability
No datasets were generated or analysed during the current study.
Declarations
Ethics approval and consent to participate
All the participants took part in the study voluntarily and signed informed consent forms before their involvement in the study. Furthermore, they were assured that their data would only be used for research purposes, that they would remain anonymous, and that they could withdraw from the study at any time.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Abbreviations
CDA: Cognitive diagnostic assessment
TEFL: Teaching English as a Foreign Language
EFL: English as a Foreign Language
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Ahmadi, A., & Mansoordehghan, S. (2015). Task type and prompt effect on test performance: A focus on IELTS academic writing tasks. Teaching English as a Second Language Quarterly (Formerly Journal of Teaching Language Skills), 33.
Akaeze, H. O. (2020). Incorporating differential speed in cognitive diagnostic models with polytomous attributes (Doctoral dissertation). Michigan State University.
Alderson, J. C. (2005). Diagnosing foreign language proficiency: The interface between learning and assessment. A&C Black.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge University Press.
Altman, D. G. (1990). Practical statistics for medical research. Chapman and Hall/CRC.
Ary, D., Jacobs, L. C., Irvine, C. K. S., & Walker, D. (2018). Introduction to research in education. Cengage Learning.
Author. (2023).
Brookhart, S. M. (2013). How to create and use rubrics for formative assessment and grading. ASCD.
Burnard, P. (1991). A method of analysing interview transcripts in qualitative research. Nurse Education Today, 11.
Burnard, P. (1996). Teaching the analysis of textual data: An experiential approach. Nurse Education Today, 16.
Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2011). Building a validity argument for the Test of English as a Foreign Language. Routledge.
Cohen, J. (1988). Set correlation and contingency tables. Applied Psychological Measurement, 12.
Coleman, J. M., Bradley, L. G., & Donovan, C. A. (2012). Visual representations in second graders’ information book compositions. Reading Teacher, 66.
Cumming, A. (2013). Assessing integrated writing tasks for academic purposes: Promises and perils. Language Assessment Quarterly, 10.
Cumming, A., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M. (2005). Differences in written discourse in independent and integrated prototype tasks for next generation TOEFL. Assessing Writing, 10.
Cumming, A., Yang, L., Qiu, C., Zhang, L., Ji, X., Wang, J., Wang, Y., Zhan, J., Zhang, F., Cao, R., Yu, L., Chu, M., Liu, M., Cao, M., Lai, C., & Xu, C. (2018). Students’ practices and abilities for writing from sources in English at universities in China. Journal of Second Language Writing, 39, 1–15. https://dx.doi.org/10.1016/j.jslw.2017.11.001
Dettori, J. R., & Norvell, D. C. (2020). Kappa and beyond: Is there agreement? Global Spine Journal, 10.
Doe, C. (2015). Student interpretations of diagnostic feedback. Language Assessment Quarterly, 12.
Elo, S., & Kyngäs, H. (2008). The qualitative content analysis process. Journal of Advanced Nursing, 62.
Freimuth, H. (2016). An examination of cultural bias in IELTS task 1 non-process writing prompts: A UAE perspective. Learning and Teaching in Higher Education: Gulf Perspectives, 13.
Friel, S. N., Curcio, F. R., & Bright, G. W. (2001). Making sense of graphs: Critical factors influencing comprehension and instructional implications. Journal for Research in Mathematics Education, 32.
Golparvar, S. E., & Abolhasani, H. (2022). Unpacking the contribution of linguistic features to graph writing quality: An analytic scoring approach. Assessing Writing, 53, 100644. https://dx.doi.org/10.1016/j.asw.2022.100644
Grabe, W., & Zhang, C. (2013). Reading and writing together: A critical component of English for academic purposes teaching and learning. TESOL Journal, 4.
Hamp-Lyons, L., & Henning, G. (1991). Communicative writing profiles: An investigation of the transferability of a multiple-trait scoring instrument across ESL writing assessment contexts. Language Learning, 41.
He, L., Jiang, Z., & Min, S. (2021). Diagnosing writing ability using China’s Standards of English Language Ability: Application of cognitive diagnosis models. Assessing Writing, 50, 1–14. https://dx.doi.org/10.1016/j.asw.2021.100565
Healy, K. (2024). Data visualization: A practical introduction. Princeton University Press.
Hyland, K., & Hyland, F. (2019). Contexts and issues in feedback on L2 writing. Cambridge University Press.
Hyland, K., & Rodrigo, I. H. (2007). English for academic purposes: An advanced resource book. Miscelánea: A Journal of English and American Studies, 99–108.
Jamieson, J., & Poonpon, K. (2013). Developing analytic rating guides for TOEFL iBT’s integrated speaking tasks. ETS Research Report Series, 2013.
Jang, E. E., & Wagner, M. (2013). Diagnostic feedback in the classroom. The Companion to Language Assessment, 2, 693–711. https://dx.doi.org/10.1002/9781118411360.wbcla081
Kim, Y.-H. (2010). An argument-based validity inquiry into the empirically-derived descriptor-based diagnostic (EDD) assessment in ESL academic writing (Doctoral dissertation). University of Toronto.
Kim, Y.-H. (2011). Diagnosing EAP writing ability using the reduced reparameterized unified model. Language Testing, 28.
Knoch, U. (2007). Little coherence, considerable strain for reader: A comparison between two rating scales for the assessment of coherence. Assessing Writing, 12.
Knoch, U. (2009). Diagnostic writing assessment: The development and validation of a rating scale (Vol. 17). Peter Lang.
Knoch, U. (2011). Rating scales for diagnostic assessment of writing: What should they look like and where should the criteria come from? Assessing Writing, 16.
Knoch, U., & Sitajalabhorn, W. (2013). A closer look at integrated writing tasks: Towards a more focussed definition for assessment purposes. Assessing Writing, 18.
Knoch, U., Deygers, B., & Khamboonruang, A. (2021). Revisiting rating scale development for rater-mediated language performance assessments: Modelling construct and contextual choices made by scale developers. Language Testing, 38.
Kyle, K. (2020). The relationship between features of source text use and integrated writing quality. Assessing Writing, 45, 100467. https://dx.doi.org/10.1016/j.asw.2020.100467
Landis, J. R., & Koch, G. G. (1977). An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics. https://dx.doi.org/10.2307/2529786
Lee, I. (2017). Classroom writing assessment and feedback in L2 school contexts. Springer.
Lee, I., & Mao, Z. (2024). Writing teacher feedback literacy: Surveying second language teachers’ knowledge, values, and abilities. Journal of Second Language Writing, 63, 101094. https://dx.doi.org/10.1016/j.jslw.2024.101094
Lee, Y.-W. (2015). Diagnosing diagnostic language assessment. Language Testing, 32.
Lestari, S. B., & Brunfaut, T. (2023). Operationalizing the reading-into-writing construct in analytic rating scales: Effects of different approaches on rating. Language Testing, 40.
Lukácsi, Z. (2021). Developing a level-specific checklist for assessing EFL writing. Language Testing, 38.
Luoma, S. (2004). Assessing speaking. Cambridge University Press.
Mitchell, R. (2017). IELTS writing task 1 + 2: The ultimate guide with practice to get a target band score of 8.0+ in 10 minutes a day.
Nelson, N., & King, J. R. (2023). Discourse synthesis: Textual transformations in writing from sources. Reading and Writing, 36.
Okan, Y., Garcia-Retamero, R., Cokely, E. T., & Maldonado, A. (2012). Individual differences in graph literacy: Overcoming denominator neglect in risk comprehension. Journal of Behavioral Decision Making, 25.
Pandey, A. V., Manivannan, A., Nov, O., Satterthwaite, M., & Bertini, E. (2014). The persuasive power of data visualization. IEEE Transactions on Visualization and Computer Graphics, 20.
Plakans, L., & Gebril, A. (2013). Using multiple texts in an integrated writing assessment: Source text use as a predictor of score. Journal of Second Language Writing, 22.
Plakans, L., Gebril, A., & Bilki, Z. (2019). Shaping a score: Complexity, accuracy, and fluency in integrated writing performances. Language Testing, 36.
Ravand, H., & Robitzsch, A. (2015). Cognitive diagnostic modeling using R. Practical Assessment, Research, and Evaluation, 20(1), 1–12.
Safari, F., & Ahmadi, A. (2023). Developing and evaluating an empirically-based diagnostic checklist for assessing second language integrated writing. Journal of Second Language Writing, 60, 101007. https://doi.org/10.1016/j.jslw.2023.101007
Sawaki, Y. (2007). Construct validation of analytic rating scales in a speaking assessment: Reporting a score profile and a composite. Language Testing, 24.
Sawaki, Y., Quinlan, T., & Lee, Y.-W. (2013). Understanding learner strengths and weaknesses: Assessing performance on an integrated writing task. Language Assessment Quarterly, 10.
Shah, P., & Hoeffner, J. (2002). Review of graph comprehension research: Implications for instruction. Educational Psychology Review, 14, 47–69. https://dx.doi.org/10.1023/A:1013180410169
Shi, X., Ma, X., Du, W., & Gao, X. (2024). Diagnosing Chinese EFL learners’ writing ability using polytomous cognitive diagnostic models. Language Testing, 41.
Shohamy, E. (1992). Beyond proficiency testing: A diagnostic feedback testing model for assessing foreign language learning. The Modern Language Journal, 76.
Turner, C. E., & Upshur, J. A. (2002). Rating scales derived from student samples: Effects of the scale maker and the student sample on scale content and student scores. TESOL Quarterly, 36.
Uludag, P., Lindberg, R., McDonough, K., & Payant, C. (2019). Exploring L2 writers’ source-text use in an integrated writing assessment. Journal of Second Language Writing, 46, 100670. https://dx.doi.org/10.1016/j.jslw.2019.100670
Weigle, S. C. (2004). Integrating reading and writing in a competency test for non-native speakers of English. Assessing Writing, 9.
Yang, H.-C. (2012). Modeling the relationships between test-taking strategies and test performance on a graph-writing task: Implications for EAP. English for Specific Purposes, 31.
Zacks, J., Levy, E., Tversky, B., & Schiano, D. (2002). Graphs in print. In Diagrammatic representation and reasoning (pp. 187–206). Springer.
© The Author(s) 2025. This work is published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/).