
Abstract

Objectives

The manual coding of job descriptions is time-consuming, expensive and requires expert knowledge. Decision support systems (DSS) provide a valuable alternative by offering automated suggestions that support decision-making, improving efficiency while allowing manual corrections to ensure reliability. However, this claim has not been tested with expert coders. This study aims to fill this gap by comparing manual with decision-supported coding, using the new DSS OPERAS.

Methods

Five expert coders proficient in the French classification systems for occupations (PCS2003) and activity sectors (NAF2008) each successively coded two subsets of job descriptions from the CONSTANCES cohort, one manually and one using OPERAS. Subsequently, we assessed coding time and the inter-coder reliability of assigning occupation and activity sector codes while accounting for individual differences, as well as the perceived usability of OPERAS, measured using the System Usability Scale (SUS; range 0–100).

Results

OPERAS substantially outperformed manual coding for all coders on both coding time and inter-coder reliability. The median coding time per job description was 38.0 s using OPERAS versus 60.6 s with manual coding. Inter-coder reliability (Cohen’s kappa) ranged from 0.61 to 0.70 with OPERAS and from 0.56 to 0.61 with manual coding for the PCS, and from 0.38 to 0.61 and from 0.34 to 0.61, respectively, for the NAF. The average SUS score was 75.5, indicating good usability.

Conclusions

Compared with manual coding, using OPERAS as a DSS for occupational coding improved coding time and inter-coder reliability. Subsequent comparison studies could use OPERAS’ ISCO-88 and ISCO-68 classification models. Consequently, OPERAS facilitates large-scale, harmonised job coding in occupational health research.



Correspondence to Mathijs A Langezaal; [email protected]

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Manual occupational coding requires substantial time and expertise. Tools for automatic coding do exist; however, for accurate exposure assessment, humans are still required. Decision support systems (DSS) offer automated suggestions that support decision-making, improving efficiency while allowing manual corrections to ensure reliability. However, the claimed impact of DSS on the occupational coding process’s coding time and inter-coder reliability of assigning occupation and activity sector codes remains unproven with expert coders.

WHAT THIS STUDY ADDS

  • We compare the OPERAS DSS to manual coding on both coding time and inter-coder reliability with five expert coders. Our results show that decision-supported coding with OPERAS significantly outperforms manual coding in both aspects.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • The DSS OPERAS significantly improves the coding time and reliability of job coding. As such, it potentially facilitates the uptake of large-scale occupational health research.

Introduction

Large-scale cohort and case–control studies play a crucial role in occupational epidemiology by assessing health effects associated with occupational exposures. One common method to assess these associations in such large-scale studies is reconstructing subjects’ job histories and, subsequently, linking them to job-exposure matrices.1 2 To reconstruct subjects’ job histories, respondents answer open-ended questionnaires containing questions about a respondent’s job, resulting in answers in a free-text format.3 Before effective use, these free-text job descriptions require standardisation, using occupational or industry classification systems like the International Standard Classification of Occupations.4 This process, usually performed manually, is time-consuming, expensive and requires expert knowledge.5 Moreover, expert coders typically have a limited monthly coding capacity of ~2700 codes.6

Tools have been developed to assist with the manual coding process7 8 and can even automate it entirely.9–13 However, accurate exposure assessment still requires human interaction,14 15 especially when the tools are applied to novel datasets.16 Additionally, humans can adapt their coding to specific industrial or geographical contexts. A valuable alternative to both manual and fully automated occupational coding is artificial intelligence (AI)-based decision support systems (DSS), which provide automated suggestions to assist users in decision-making.17

DSS can generate recommendations for job descriptions, often accompanied by an estimated probability of correctness,13 18 19 potentially improving coding time. When suggestions are incorrect, expert coders can ensure reliability through manual corrections. Additionally, DSS promote consistency, as users tend to select the same codes for similar job descriptions, thereby enhancing both inter- and intra-coder reliability.19 In cases where the system’s confidence score is high and users trust its recommendations, suggestions can be processed automatically, with expert coders reviewing the remaining cases, accompanied by the provided suggestions. This approach is typically referred to as semi-automatic or computer-assisted coding.20 21 However, since manual review remains essential to ensure accurate exposure assessment,6 14 20 decision support plays a critical role in occupational coding by improving coding time through code suggestions while maintaining reliability through potential corrections.

While DSS hold the potential to improve the coding time and inter-coder reliability of the occupational coding process, their impact has not been quantitatively assessed with expert coders.13 21 This study addresses this gap by evaluating OPERAS, a state-of-the-art DSS for epidemiological job coding.13 Using the same underlying data, this study compares OPERAS to manual coding, measuring differences in coding time and inter-coder reliability.

Methods

To evaluate the effect of using OPERAS on expert coders’ coding time and inter-coder reliability during occupational coding, we employed a within-subject design. Following this design, coders proficient in the hierarchically structured PCS2003 and NAF2008 classifications successively coded two different subsets of job descriptions, of equal size and coding difficulty, one manually and one using OPERAS. After coding, we compared coding time and inter-coder reliability. In the following paragraphs, the study design is explained in detail.

Coders

Five French expert coders with occupational coding experience using the French PCS2003 and NAF2008 were recruited for this study. Their coding experience with the aforementioned classifications ranged from 6 months to 15 years. The last time they had coded using these classifications ranged from 0 to 12 months ago. None of the coders had prior experience with (semi-)automatic or decision-supported coding. All coders previously coded exclusively manually and used CAPS (http://www.CAPS-France.fr),7 a French web application designed to assist with finding PCS2003 and NAF2008 codes.

Materials

Using questionnaires, we collected information on the coders’ PCS2003 and NAF2008 coding experience and experience with tools similar to OPERAS. We used the short-form Computer Proficiency Questionnaire (CPQ-12) to measure computer proficiency, as it could influence software usage performance.22 To assess OPERAS’ perceived usability, we used the French System Usability Scale (SUS).23 The SUS contains 10 questions regarding the perceived usability of a system, answered on a Likert scale from 1 (strongly disagree) to 5 (strongly agree). An adjective rating (eg, ‘poor’ or ‘excellent’) can be assigned based on the average SUS score of all participants.24

The job descriptions to be coded were selected from the CONSTANCES cohort, a French general-purpose cohort (adults aged 18–69) focused on occupational and environmental factors.25 The dataset included free-text descriptions of participants’ occupations and activity sectors, as well as the start and end year of a job, employment status (ie, salaried worker or self-employed), type of contract (ie, open-ended or fixed-term) and work-time schedule (ie, full-time or part-time). Expert coders manually coded this information into the French PCS2003 and NAF2008 classifications, which both use hierarchical coding structures to classify occupations and activity sectors, respectively. A PCS code consists of four levels of aggregation and contains three digits and a letter (eg, ‘211A’), whereas an NAF code has five levels of aggregation and contains four digits and a letter (eg, ‘1013A’). Here, each added character adds more detail about the described job or activity sector. The descriptive statistics of CONSTANCES’ job descriptions and their implications on the classification performance can be found in Langezaal et al.13

To ensure a fair comparison between OPERAS and manual coding, we developed two job description subsets of equal size and coding difficulty. Coding the same subset twice would be inadequate, as this would result in memorisation of the assigned code in the second condition (ie, coding method), negatively impacting validity. Additionally, an expert coder repeatedly coding the same job descriptions is not representative of real-world scenarios. Expert coders may have coding experience with descriptions from different populations, leading to differences in coding time due to their perceived coding difficulty of descriptions stemming from these worker groups. For example, one coder might be more familiar with coding descriptions from farmers (ie, codes from category ‘1’), while another more frequently codes descriptions from artisans (ie, codes from category ‘2’). Furthermore, at the least aggregated coding level, the detail and distinction between codes within the same higher-level group can vary. For instance, differentiating between code ‘472B’: ‘Surveyors, topographers’ and ‘472C’: ‘Quantity surveyors and various building and public worker technicians’ is more nuanced and could be perceived as more challenging than deciding between ‘431B’: ‘Psychiatric nurses’ and ‘431C’: ‘Nursery nurses’, where the distinction is clearer. Since perceived coding difficulty affects coding time, the overall perceived difficulty between subsets must be equal. Additionally, it is unknown which specific jobs will need to be classified in future, real-world OPERAS usage, where the system should be applicable across all job types. Hence, we included a wide range of descriptions in the subsets reflecting the overall French worker population. We achieved this via stratified random sampling from CONSTANCES, aligned with the 2017 French Census second-level PCS2003 distribution.26 We sampled 100 subjects, resulting in 326 job descriptions per subset.
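
To illustrate the sampling strategy, the following is a minimal Python/pandas sketch of stratified random sampling aligned with a target code distribution. The column names (subject_id, pcs_level2) and the toy census shares are hypothetical placeholders, not the actual CONSTANCES variables or 2017 census figures, and the study sampled whole subjects rather than individual descriptions.

```python
import pandas as pd

# Hypothetical job descriptions with a gold-standard second-level PCS code.
jobs = pd.DataFrame({
    "subject_id": [1, 1, 2, 3, 3, 4, 5, 6],
    "pcs_level2": ["21", "37", "52", "21", "62", "37", "52", "62"],
    "description": ["farmer", "manager", "clerk", "farmer",
                    "driver", "manager", "clerk", "driver"],
})

# Hypothetical target shares per second-level PCS2003 category.
census_share = {"21": 0.25, "37": 0.25, "52": 0.25, "62": 0.25}

def stratified_sample(jobs, census_share, n_total, seed=42):
    """Draw descriptions so that their second-level PCS distribution
    approximates the target (census) distribution."""
    parts = []
    for code, share in census_share.items():
        stratum = jobs[jobs["pcs_level2"] == code]
        n = min(len(stratum), round(share * n_total))
        parts.append(stratum.sample(n=n, random_state=seed))
    return pd.concat(parts).reset_index(drop=True)

subset = stratified_sample(jobs, census_share, n_total=4)
print(subset)
```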

To enable a statistical coding time comparison on an individual level, we measured coding time for each job description separately and subsequently compared coding time between conditions based on code pairs. A code pair consists of one description from one subset and one description from the other subset with the same perceived coding difficulty. These were developed by matching the PCS codes of job descriptions at the least aggregated coding level between the two subsets (eg, ‘421A’). If a match could not be found at the least aggregated level, pairs were formed using more aggregated code levels (eg, ‘421’, ‘42’, etc) until all job descriptions were matched. Subsequently, an expert coder (not involved in the present coding process) verified that both codes in each code pair were of equal coding difficulty to minimise differences in coding time between subsets attributable to variations in the job descriptions’ perceived coding difficulty.
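
The pairing procedure can be sketched as follows, assuming that coding level n corresponds to the first n characters of a code; the function and example codes are illustrative only, not the exact matching implementation used in the study.

```python
def match_code_pairs(codes_a, codes_b):
    """Pair each code from subset A with a code from subset B, preferring a
    match at the least aggregated level and falling back to progressively
    more aggregated prefixes (eg, '421A' -> '421' -> '42' -> '4')."""
    remaining_b = list(codes_b)
    pairs = []
    for code_a in codes_a:
        for level in range(len(code_a), 0, -1):
            prefix = code_a[:level]
            match = next((c for c in remaining_b if c.startswith(prefix)), None)
            if match is not None:
                pairs.append((code_a, match))
                remaining_b.remove(match)
                break
    return pairs

# '472B' has no exact counterpart, so it pairs with '472C' at level 3.
print(match_code_pairs(["472B", "431B"], ["472C", "431B"]))
# -> [('472B', '472C'), ('431B', '431B')]
```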

OPERAS’ AI provides occupational and/or activity code classification suggestions for free-text job descriptions.13 OPERAS displays the information for each job description in a row format, with customisable and sortable columns to accommodate different coding strategies (see online supplemental figure S1). For each job description, OPERAS provides three code suggestions on the least aggregated coding level in a dropdown menu for each coding system. These suggestions were generated using the same data processing strategies and classification models as described in Langezaal et al.13 Hovering over a suggestion displays the available information from the coding index related to the suggested code. Each suggestion includes an estimated chance of correctness (ranging 0%–100%), allowing users to set a threshold for automatic coding.13 If suggestions are deemed incorrect by the user, OPERAS allows users to manually overwrite them. Within each row, OPERAS displays a checkbox clickable by users to indicate that the job description in that row is coded. When clicked, this checkbox saves a timestamp in OPERAS’ database for subsequent coding time analyses. For OPERAS coding, coders were provided with OPERAS containing one of the subsets. They were shown all occupational information present in the CONSTANCES dataset for those descriptions.
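
The confidence-based workflow can be illustrated with a short sketch. The Suggestion structure, its field names and the 0.9 threshold are assumptions for illustration, not OPERAS’ actual data model or default setting.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    code: str          # suggested code at the least aggregated level
    confidence: float  # estimated chance of correctness, 0.0-1.0

def route_description(suggestions, threshold=0.9):
    """Accept the top suggestion automatically when its confidence meets the
    user-defined threshold; otherwise send the description to manual review
    together with all suggestions."""
    top = max(suggestions, key=lambda s: s.confidence)
    if top.confidence >= threshold:
        return ("auto", top.code)
    return ("review", [s.code for s in suggestions])

print(route_description([Suggestion("211A", 0.95), Suggestion("212A", 0.03),
                         Suggestion("227A", 0.01)]))
# -> ('auto', '211A')
```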

For manual coding, the coders were provided with an Excel file containing the job descriptions of one subset, showing all the same information present during OPERAS coding. Here, we developed a macro that saved a timestamp whenever the PCS or NAF field was edited for subsequent analyses. Similar to OPERAS, the columns were sortable to accommodate different coding workflows.

Study design

This study employed a within-subject design, a method where the same participants are exposed to all experimental conditions, allowing direct comparisons within individuals.27 Here, five expert coders successively coded two subsets of job descriptions: one manually and one using OPERAS. This design was chosen to minimise potential biases in coding time measurements caused by differences between coders (eg, coding experience) between the two conditions. To account for fatigue and procedural effects, two coders started with manual coding, while three began with OPERAS coding. After the first session, the coders coded the other subset in the second session. Coders completed coding in their familiar work environment with breaks and in as many sittings as needed. This ensured a realistic occupational coding scenario and high ecological validity. The study procedure consists of four data collection stages:

  • Explanation of the study, informed consent and questionnaires assessing coding experience and computer proficiency.

  • Coding session 1.

  • Coding session 2.

  • Assessment of OPERAS’ perceived usability.

During the first stage, coders received information about the study’s objectives verbally and through an information letter. We instructed the coders to record their computer screens during both coding conditions to secure the accuracy of the data for subsequent analyses. After indicating their understanding of the procedure and contents of the study, they completed the informed consent form. Subsequently, questionnaires on the coders’ PCS2003 and NAF2008 coding experience, experience with tools similar to OPERAS, and the CPQ-12 were conducted. Using the aforementioned counterbalanced design, we assigned coders to one of the coding orders.

In stage 2, coders coded one of the two subsets, using either OPERAS or manual coding, based on their assigned coding order. In stage 3, they coded the other subset using the alternative method. Coders assigned to order 1 began with manual coding in the first session and used OPERAS in the second session, while those assigned to order 2 followed the opposite sequence. During the manual coding phase, coders received an Excel file containing one of the job description subsets. Coders coded each job description using PCS2003 and NAF2008. To replicate the expert coders’ normal coding workflow, they could modify the file for the manual coding process. During the OPERAS coding stage, coders coded the other subset of job descriptions into the PCS2003 and NAF2008 using OPERAS. Beforehand, coders received an explanation of OPERAS’ features and received instructions to code to the least aggregated coding level, unless the data only supported coding to a lower level due to missing or incomplete information. After both coding stages had concluded, the SUS was conducted to assess OPERAS’ perceived usability.

Data collection and statistical analysis

Since all coders performed the coding procedures in their natural work environment, all data were collected remotely. We measured coding time in both conditions using the timestamps generated in OPERAS and Excel. The time to code each description was calculated as the difference between consecutive timestamps in full seconds. For example, if code A is completed at 12:15:41 and code B at 12:17:43, the coding time for code B is 122 s. Consequently, if codes are copied and therefore coded within the same second, this is reflected as a coding time of 0 s.
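
A minimal sketch of this timestamp differencing in Python/pandas follows; the column names are placeholders, and the first description, which has no preceding timestamp, is assigned 0 s here purely for simplicity.

```python
import pandas as pd

# Hypothetical save timestamps, one per coded description, in coding order.
log = pd.DataFrame({
    "description_id": ["A", "B", "C"],
    "saved_at": pd.to_datetime(["12:15:41", "12:17:43", "12:17:43"]),
})

# Coding time = difference between consecutive timestamps, in full seconds.
log["coding_time_s"] = (
    log["saved_at"].diff().dt.total_seconds().fillna(0).astype(int)
)
print(log[["description_id", "coding_time_s"]])
# B takes 122 s; C, saved within the same second as B, is recorded as 0 s.
```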

All recordings were manually reviewed to correct timestamps if needed. For both coding conditions, this included checking whether a coder continued working on a description/code after a timestamp was created. In such cases, timestamps were corrected to one-tenth of a second.

To allow for an overall comparison between OPERAS and manual coding time, we averaged the coding times of the individual job descriptions for each participant. The Shapiro-Wilk test for normality revealed that the data for both OPERAS (W(5)=0.956, p=0.753) and manual coding (W(5)=0.981, p=0.942) were normally distributed. Consequently, we used a paired-samples t-test to compare coding time between OPERAS and manual coding. For the individual coders, the distribution plots of coding time appeared neither normal nor symmetrical. Consequently, we applied the non-parametric paired-samples sign test to compare coding time between OPERAS and manual coding at the individual level.28 To assess whether the use of the DSS significantly influenced code selection compared with manual coding, we conducted a χ2 test comparing the distribution of second-level PCS and NAF codes between OPERAS and manual coding. We also conducted ad-hoc analyses to explore potential interactions between overall coding time and several covariates: coding experience, time since last coding using PCS2003 and NAF2008, computer literacy (measured by the CPQ-12) and perceived usability (assessed using the SUS).
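
The analysis pipeline could be reproduced along the following lines with SciPy. The per-coder means are taken from table 1, while the per-description differences and the contingency table are made-up toy data; the sign test is implemented here as a binomial test on the signs of the paired differences, which is one common formulation.

```python
import numpy as np
from scipy import stats

# Mean coding times per coder (seconds), OPERAS vs manual (table 1).
operas = np.array([57.7, 82.8, 31.7, 68.5, 59.9])
manual = np.array([71.8, 108.5, 48.8, 86.0, 92.7])

# Normality per condition (Shapiro-Wilk), then overall paired t-test.
print(stats.shapiro(operas), stats.shapiro(manual))
print(stats.ttest_rel(operas, manual))

# Per-coder sign test on paired per-description differences (toy data):
# binomial test on the number of positive differences among non-zero ones.
diff = np.array([-10, -5, 3, -8, -12, -4, 0, -7])
nonzero = diff[diff != 0]
print(stats.binomtest(int((nonzero > 0).sum()), n=len(nonzero), p=0.5))

# Chi-squared test on second-level code counts per condition (toy table).
counts = np.array([[120, 80, 60],   # OPERAS
                   [115, 85, 60]])  # manual
print(stats.chi2_contingency(counts))
```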

We calculated inter-coder reliability using Cohen’s kappa, evaluating each condition in which coders used the same subset.29 Additionally, we assessed agreement between each coder and the ‘gold standard’ manually coded job descriptions from CONSTANCES. For all evaluations, we compared the inter-coder reliability between OPERAS and manual coding for both the PCS and NAF separately on each coding level.
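
A sketch of the per-level agreement computation with scikit-learn, again assuming that coding level n corresponds to the first n characters of a code; the two coders' label lists are toy data.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical least-aggregated PCS codes assigned by two coders.
coder_1 = ["211A", "472B", "431C", "211A", "526A"]
coder_2 = ["211A", "472C", "431C", "211B", "526A"]

for level in range(1, 5):  # PCS2003 has four aggregation levels
    a = [code[:level] for code in coder_1]
    b = [code[:level] for code in coder_2]
    print(f"level {level}: kappa = {cohen_kappa_score(a, b):.2f}")
```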

Results

We obtained 1630 records of coded job descriptions. All coders provided a final code for each job description, so inter-coder reliability could be compared across all records. After manually checking the screen recordings of the coding, 1204 of the 1630 records (73.8%) remained valid for the coding time comparison; the remainder were excluded because of missing screen recordings.

The descriptive statistics of the coders’ timestamps can be found in table 1. The distribution of the timestamps for manual and OPERAS coding can be found in figure 1. The overall median coding time was 60.6 s for manual coding compared with 38.0 s for OPERAS coding. For coder P1, the median coding times were 28.0 s for OPERAS and 62.0 s for manual coding. For P2, they were 48.2 and 81.3 s; for P3, 27.0 and 39.0 s; for P4, 42.9 and 63.5 s; and for P5, 43.2 and 66.5 s for OPERAS and manual coding, respectively. OPERAS coding demonstrated a faster overall median coding time of 22.6 s (t(5)=6.542, p=0.003). This was also the case for individual coders, with faster median coding times of 34.0 s for P1 (Z=4.072, p<0.001), 33.1 s for P2 (Z=2.298, p=0.022), 12.0 s for P3 (Z=2.017, p=0.044), 20.6 s for P4 (Z=4.597, p<0.001) and 23.3 s for P5 (Z=4.133, p<0.001).


Figure 1. Distribution of the coding time for individual job descriptions of expert coders using (a) OPERAS and (b) manual coding in seconds. Vertical lines with percentages indicate the percentage of descriptions with a coding time within that bracket.

Table 1

Descriptive statistics of the coding time of occupational coding per individual job description in seconds

Coder | Condition | Min | Max | Mean (SE) | IQR (25%) | IQR (75%)
Overall | OPERAS | 0.0 | 649.6 | 62.9 (2.2) | 16.0 | 80.0
Overall | Manual | 0.0 | 735.0 | 83.5 (2.5) | 27.0 | 109.1
P1 | OPERAS | 0.0 | 598.0 | 57.7 (4.6) | 13.2 | 73.8
P1 | Manual | 0.0 | 653.0 | 71.8 (4.7) | 20.0 | 98.0
P2 | OPERAS | 0.4 | 650.0 | 82.8 (7.2) | 19.8 | 112.9
P2 | Manual | 0.0 | 513.5 | 108.5 (7.6) | 31.4 | 145.3
P3 | OPERAS | 1.0 | 226.8 | 31.7 (3.4) | 11.0 | 52.0
P3 | Manual | 1.0 | 398.0 | 48.8 (4.7) | 21.0 | 56.0
P4 | OPERAS | 1.0 | 596.5 | 68.5 (4.3) | 18.0 | 92.5
P4 | Manual | 3.0 | 562.0 | 86.0 (4.6) | 29.0 | 113.5
P5 | OPERAS | 0.0 | 363.0 | 59.9 (3.8) | 14.0 | 80.3
P5 | Manual | 1.0 | 735.0 | 92.7 (6.5) | 29.0 | 117.7

Following the within-subject design, each participant coded both using OPERAS and manually. P1–P5 refer to the individual coders. IQR is given for the 25% and 75% quantiles. SE refers to the standard error of the mean.

Inter-coder reliability for each possible coder pair on all coding levels can be found in table 2. For the PCS, inter-coder reliability ranged 0.80–0.87, 0.73–0.83, 0.68–0.76 and 0.61–0.70 for OPERAS coding, and 0.80–0.85, 0.72–0.76, 0.66–0.72 and 0.56–0.61 for manual coding on coding levels 1–4, respectively. For the NAF, it ranged 0.70–0.85, 0.67–0.83, 0.45–0.70, 0.39–0.65 and 0.38–0.61 for OPERAS coding and 0.69–0.82, 0.66–0.77, 0.47–0.68, 0.39–0.62 and 0.34–0.61 for manual coding on coding levels 1–5, respectively. For the PCS, OPERAS outperformed manual coding in eight of the nine comparisons at the first and second coding levels, and in all nine at the third and fourth coding levels. For the NAF, OPERAS outperformed manual coding in seven comparisons at the first coding level, in eight at the second through fourth coding levels, and in all nine at the fifth coding level. No significant differences were observed in the code distributions between OPERAS and manual coding for second-level PCS (χ²(33, N=3260)=14.34, p=0.99) and NAF codes (χ²(92, N=3260)=77.03, p=0.86) (see figure 2).


Figure 2. Distribution of second-level (a) PCS and (b) NAF codes selected by expert coders during OPERAS and manual coding. A ‘#’ in a PCS or NAF code indicates an uncodable coding level due to missing or incomplete job description information.

Table 2

For each coder pair (CP), the inter-coder reliability of assigning occupation (PCS) and activity sector (NAF) codes per coding level (CL) in Cohen’s kappa (k) for OPERAS coding and manual coding is given

CP | CL | PCS OPERAS | PCS Manual | NAF OPERAS | NAF Manual
P1 and P4 | 1 | 0.85 | 0.84 | 0.70 | 0.72
P1 and P4 | 2 | 0.80 | 0.76 | 0.67 | 0.66
P1 and P4 | 3 | 0.74 | 0.66 | 0.50 | 0.47
P1 and P4 | 4 | 0.64 | 0.56 | 0.42 | 0.39
P1 and P4 | 5 | – | – | 0.39 | 0.34
P2 and P3 | 1 | 0.84 | 0.81 | 0.77 | 0.73
P2 and P3 | 2 | 0.81 | 0.74 | 0.77 | 0.68
P2 and P3 | 3 | 0.72 | 0.67 | 0.61 | 0.61
P2 and P3 | 4 | 0.67 | 0.56 | 0.55 | 0.51
P2 and P3 | 5 | – | – | 0.52 | 0.50
P2 and P5 | 1 | 0.87 | 0.80 | 0.83 | 0.82
P2 and P5 | 2 | 0.83 | 0.75 | 0.82 | 0.76
P2 and P5 | 3 | 0.76 | 0.67 | 0.70 | 0.68
P2 and P5 | 4 | 0.70 | 0.59 | 0.65 | 0.62
P2 and P5 | 5 | – | – | 0.61 | 0.61
P3 and P5 | 1 | 0.84 | 0.81 | 0.85 | 0.79
P3 and P5 | 2 | 0.81 | 0.76 | 0.83 | 0.77
P3 and P5 | 3 | 0.75 | 0.72 | 0.67 | 0.62
P3 and P5 | 4 | 0.68 | 0.61 | 0.63 | 0.58
P3 and P5 | 5 | – | – | 0.59 | 0.55
P1 and GS | 1 | 0.85 | 0.85 | 0.70 | 0.77
P1 and GS | 2 | 0.77 | 0.75 | 0.67 | 0.71
P1 and GS | 3 | 0.73 | 0.66 | 0.45 | 0.48
P1 and GS | 4 | 0.65 | 0.59 | 0.39 | 0.42
P1 and GS | 5 | – | – | 0.38 | 0.36
P2 and GS | 1 | 0.83 | 0.81 | 0.81 | 0.72
P2 and GS | 2 | 0.74 | 0.73 | 0.79 | 0.67
P2 and GS | 3 | 0.68 | 0.66 | 0.63 | 0.56
P2 and GS | 4 | 0.61 | 0.60 | 0.60 | 0.47
P2 and GS | 5 | – | – | 0.56 | 0.46
P3 and GS | 1 | 0.80 | 0.81 | 0.77 | 0.69
P3 and GS | 2 | 0.73 | 0.74 | 0.75 | 0.67
P3 and GS | 3 | 0.69 | 0.68 | 0.60 | 0.52
P3 and GS | 4 | 0.62 | 0.57 | 0.56 | 0.46
P3 and GS | 5 | – | – | 0.50 | 0.44
P4 and GS | 1 | 0.86 | 0.82 | 0.72 | 0.77
P4 and GS | 2 | 0.79 | 0.73 | 0.70 | 0.72
P4 and GS | 3 | 0.73 | 0.67 | 0.60 | 0.58
P4 and GS | 4 | 0.64 | 0.57 | 0.55 | 0.54
P4 and GS | 5 | – | – | 0.54 | 0.50
P5 and GS | 1 | 0.86 | 0.81 | 0.78 | 0.74
P5 and GS | 2 | 0.78 | 0.72 | 0.78 | 0.71
P5 and GS | 3 | 0.72 | 0.66 | 0.65 | 0.58
P5 and GS | 4 | 0.65 | 0.60 | 0.59 | 0.54
P5 and GS | 5 | – | – | 0.56 | 0.53

Following the within-subject design, each participant coded both using OPERAS and manually. GS refers to the gold standard manually coded job episode from CONSTANCES. A number is bold if the k-value exceeds the other condition for that coder pair and classification.

The average scores for individual SUS questions ranged between 1.2 and 4.8, with individual coders’ SUS scores ranging from 65 to 92.5 (see table 3). The average SUS score was 75.5.

Table 3

System Usability Scale (SUS) questions on OPERAS’ perceived usability and scores for coders 1–5 (P1–P5) and average scores (Avg)

No. | Question | P1 | P2 | P3 | P4 | P5 | Avg
1 | I think that I would like to use this system frequently | 5 | 3 | 5 | 2 | 3 | 3.6
2 | I found the system unnecessarily complex | 1 | 2 | 1 | 1 | 1 | 1.2
3 | I thought the system was easy to use | 5 | 5 | 5 | 5 | 4 | 4.8
4 | I think that I would need the support of a technical person to be able to use this system | 1 | 1 | 1 | 1 | 2 | 1.2
5 | I found the various functions in this system were well integrated | 4 | 2 | 3 | 3 | 4 | 3.2
6 | I thought there was too much inconsistency in this system | 3 | 3 | 1 | 4 | 5 | 3.2
7 | I would imagine that most people would learn to use this system very quickly | 4 | 4 | 5 | 3 | 4 | 4.0
8 | I found the system very cumbersome to use | 1 | 2 | 1 | 3 | 2 | 1.8
9 | I felt very confident using the system | 4 | 3 | 4 | 3 | 2 | 3.2
10 | I needed to learn a lot of things before I could get going with this system | 1 | 2 | 1 | 1 | 1 | 1.2
– | SUS score | 87.5 | 67.5 | 92.5 | 65 | 65 | 75.5

Coder scores are on a Likert scale from 1 (strongly disagree) to 5 (strongly agree). SUS scores are calculated by subtracting one from odd-numbered question scores, subtracting even-numbered question scores from 5, summing the results and subsequently multiplying by 2.5.
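
The scoring rule in the note above can be written as a short function; the example reproduces P1's score from the responses in table 3.

```python
def sus_score(responses):
    """Compute the SUS score from the ten item responses (1-5): odd items
    contribute (score - 1), even items contribute (5 - score), and the sum
    is multiplied by 2.5."""
    total = sum((s - 1) if i % 2 == 1 else (5 - s)
                for i, s in enumerate(responses, start=1))
    return total * 2.5

# P1's responses from table 3.
print(sus_score([5, 1, 5, 1, 4, 3, 4, 1, 4, 1]))  # 87.5
```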

Discussion

In this study, we quantitatively assess the impact of the DSS OPERAS on occupational coding time and inter-coder reliability of assigning occupation and activity sector codes to job descriptions. Five expert coders proficient in coding using the PCS2003 and NAF2008 participated in the study. Using a within-subject design, occupational coding performance using OPERAS was compared with traditional manual coding. OPERAS was shown to benefit both coding time and inter-coder reliability.

OPERAS consistently outperformed manual coding, both overall and for each coder individually. Differences in coder performance (see table 1) can be attributed to differences in prior coding experience, familiarity with the classification systems and utilisation of OPERAS’ features. For example, the hover function, which displays a code suggestion’s corresponding coding index information, was not fully used by all participants. This led to manual checks of each suggested code’s coding index entry, resulting in substantial time loss. Furthermore, none of the coders used the automatic coding function, despite its significant potential to enhance coding time.13 OPERAS’ original data processing and classification models were used, achieving prediction accuracies of 68.8% and 78.9%, resulting in potential workload reductions of 40.7%–55.7% for PCS2003 and NAF2008, respectively.13 Despite its state-of-the-art performance compared with other tools, which have prediction accuracies ranging 15%–65%,6 9–11 13 20 21 30 users were still hesitant to employ the automatic coding function. This reluctance likely stems from the coders’ initial lack of trust in OPERAS’ recommendations due to first-time usage.31 Consequently, all coders verified each suggestion’s accuracy before accepting it, regardless of confidence score, resulting in higher coding times. Overall, with continued OPERAS usage and increased trust in the system, coders are expected to become more likely to fully use OPERAS’ functionalities and accept high-confidence suggestions, leading to progressively improving coding time.31

On the least aggregated coding level, OPERAS outperforms manual coding. This outcome was expected because users are more likely to select a suggested code than to independently agree on the same code without suggestions.21 32 Increased inter-coder reliability leads to more stable and accurate results.33 Therefore, when suggestions are validated, receiving suggestions from DSS is preferred for occupational epidemiological studies and other research based on these occupational codes. No significant differences were observed in the distribution of second-level PCS and NAF codes between OPERAS and manual coding. This finding suggests that the recommendations provided by OPERAS do not lead to code choices that significantly deviate from those manually selected, thereby demonstrating comparable reliability. This is further supported by the percentage of corrections made by participants during OPERAS coding: P1 corrected 33.7% of PCS codes and 47.9% of NAF codes, P2 corrected 36.2% and 45.4%, P3 corrected 33.7% and 44.5%, P4 corrected 33.7% and 44.5%, and P5 corrected 43.9% and 57.1%, respectively. These results highlight that, although some incorrect codes were initially presented to the expert coders, final choices were not influenced by these errors and often aligned with the codes that would have been selected during manual coding.

OPERAS did not outperform manual coding on inter-coder reliability in all cases. In 9.8% of cases (ie, 8 out of 81), manual coding surpassed OPERAS coding. This could be attributed to various factors, such as differences and similarities between coder pairs in terms of training and experience.34 35 Furthermore, in the current study, no specific end goal was defined for which the resulting codes will be used. In cases where certain exposed job groups require thorough and/or additional review due to high exposure risk, OPERAS’ models could be adapted to ensure higher reliability for these codes.

In all instances, expert inter-coder reliability using the NAF was lower than when using PCS. This was expected given that NAF has more outcome categories.36 37 Additionally, job descriptions were selected if the gold-standard PCS code in CONSTANCES was coded to at least the second coding level. Consequently, the gold-standard NAF was sometimes coded to even lower levels due to insufficient job description information, potentially leading to increased variability and ambiguity compared with PCS coding.38

With an average SUS score of 75.5, OPERAS’ perceived usability is considered ‘good’.24 However, it is notable that P4 and P5, who reported the lowest usability scores (ie, 65), experienced the greatest coding time gains. This indicates potential for further improvement, as OPERAS usage improved coding time even when perceived usability was low. For example, OPERAS was sometimes considered inconsistent. Integrating explainability into OPERAS could address the perceived inconsistency by providing a rationale for code suggestions, potentially reducing user uncertainty and improving trust.39 Beyond clarifying suggestions, this explainability could also enable respondent self-coding, which already demonstrates reasonable reliability and efficiency when respondents traverse a decision tree to select their occupation.40 Here, OPERAS’ suggestions and explanations can be leveraged to guide users in this selection process, selecting the occupational category and coding level they deem most appropriate. Furthermore, the underutilisation of OPERAS’ hover function suggests that this, and possibly other features, may not yet be effectively implemented or integrated (see table 3). Therefore, usability studies and other qualitative evaluations (eg, interviews) should be conducted to identify and address bottlenecks in OPERAS’ usability and potentially maximise coding time gains.

The primary cause of data loss was the absence or corruption of parts of the video files in both conditions, which led to a total data loss of 26.2%. Despite this, the impact on the overall results was minimised because no specific parts of the subsets were consistently affected. Additionally, manual corrections of timestamps ensured that only valid timestamps were analysed, securing accurate and stable results. Furthermore, a comparison conducted with the complete, uncleaned data yielded similar inferences.

This study involved five expert coders proficient with the PCS2003 and NAF2008 coding systems. As different classification systems can influence coding workflows, which may also vary among expert coders, future comparison studies could include a larger number of coders who are familiar with these and other occupational coding classifications. This study demonstrates that using a DSS for occupational coding decreases coding time and improves inter-coder reliability compared with traditional manual coding. Additional insights could be gained by conducting a similar study with more coders and diverse classification systems. Furthermore, OPERAS’ coding time and inter-coder reliability gain could be further improved by conducting usability studies to find and solve potential bottlenecks in OPERAS’ usability. Overall, OPERAS may facilitate large-scale, harmonised job coding, potentially enabling more stable and efficient occupational health research.

The authors thank the CONSTANCES team for providing the job histories of the cohort participants and Fabien Gilbert (IRSET, University of Angers) for supervising their coding. The authors also thank the expert coders Brigitte Dananché, Corinne Pilorget, Florence Orsi, Loïc Garras and one anonymous coder for their valuable participation in the study.

Data availability statement

Data are available upon reasonable request from the corresponding author.

Ethics statements

Patient consent for publication

Not applicable.

Ethics approval

This study involves human participants. Ethical approval for this study (Bèta S-23167) was obtained from the Science-Geosciences Ethics Review Board of Utrecht University on 6 December 2023. Participants gave informed consent to participate in the study before taking part.

Footnote

Contributors Substantial contributions to the conception or design of the work (MAL, ELvdB, NLM, MG, RV and SP); acquisition of the data (MAL, MG, NLM, CP); analysis and interpretation of the data (MAL, ELvdB, GR, MG, RV and SP); drafting the work (MAL, ELvdB); critical revision of the work (MAL, ELvdB, GR, NLM, CP, MG, RV and SP); final approval of the draft for submission (MAL, ELvdB, GR, NLM, CP, MG, RV and SP). MAL is the guarantor for this work. ChatGPT and Gemini were used for textual improvement of the manuscript by checking its spelling, grammar and textual flow. No scientific content was created or checked using either AI.

Funding The OPERAS project discloses support for the research of this work from ANSES [PNR EST-2018/1/106]. The CONSTANCES cohort is supported and funded by the Caisse Nationale d’Assurance Maladie (CNAM). It benefits from a grant from the Agence Nationale de la Recherche [ANR-11-INBS-0002] and from the French Ministry of Research. SP is supported by the Exposome Project for Health and Occupational Research (EPHOR) which is funded by the European Union’s Horizon 2020 research and innovation programme [874703]. RV is supported by the Gravitation programme of the Dutch Ministry of Education, Culture, and Science and the Netherlands Organisation for Scientific Research through the EXPOSOME-NL [024.004.017].

Disclaimer All funders had no role in the study design, data collection, analysis, interpretation, manuscript preparation or the decision to submit the manuscript for publication.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

References

1 Peters S. Although a valuable method in occupational epidemiology, job-exposure matrices are no magic fix. Scand J Work Environ Health 2020; 46: 231–4. doi:10.5271/sjweh.3894

2 Peters S, Vienneau D, Sampri A, et al. Occupational Exposure Assessment Tools in Europe: A Comprehensive Inventory Overview. Ann Work Expo Health 2022; 66: 671–86. doi:10.1093/annweh/wxab110

3 Kromhout H, Vermeulen R. Application of job-exposure matrices in studies of the general population: some clues to their performance. Eur Respir Rev 2001; 11: 80–90.

4 Elias P. Occupational classification (ISCO-88): concepts, methods, reliability, validity and cross-national comparability. Paris: OECD Labour Market and Social Policy Occasional Papers, 1997.

5 Hoffmann E, Elias P, Embury B, et al. What kind of work do you do: data collection and processing strategies when measuring “occupation” for statistical surveys and administrative records [STAT Working Paper 95–1]. Geneva, 1995.

6 Burstyn I, Slutsky A, Lee DG, et al. Beyond crosswalks: reliability of exposure assessment following automated coding of free-text job descriptions for occupational epidemiology. Ann Occup Hyg 2014; 58: 482–92. doi:10.1093/annhyg/meu006

7 Rémen T, Richardson L, Pilorget C, et al. Development of a Coding and Crosswalk Tool for Occupations and Industries. Ann Work Expo Health 2018; 62: 796–807. doi:10.1093/annweh/wxy052

8 Petersen NG, Mumford MD, Borman WC, et al. An occupational information system for the 21st century: the development of O*NET. Washington, DC, US: American Psychological Association, 1999.

9 Savic N, Bovio N, Gilbert F, et al. Procode: A Machine-Learning Tool to Support (Re-)coding of Free-Texts of Occupations and Industries. Ann Work Expo Health 2022; 66: 113–8. doi:10.1093/annweh/wxab037

10 Bao H, Baker CJO, Adisesh A. Occupation Coding of Job Titles: Iterative Development of an Automated Coding Algorithm for the Canadian National Occupation Classification (ACA-NOC). JMIR Form Res 2020; 4: e16422. doi:10.2196/16422

11 U.S. Department of Health and Human Services, Public Health Service, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health, Division of Field Studies and Engineering, Health Informatics Branch. NIOSH. NIOSH industry and occupation computerized coding system (NIOCCS). 2024. Available: https://csams.cdc.gov/nioccs/About.aspx [Accessed 16 Apr 2025 ].

12 Patel MD, Rose KM, Owens CR, et al. Performance of automated and manual coding systems for occupational data: a case study of historical records. Am J Ind Med 2012; 55: 228–31. doi:10.1002/ajim.22005

13 Langezaal MA, van den Broek EL, Peters S, et al. Artificial intelligence exceeds humans in epidemiological job coding. Commun Med (Lond) 2023; 3: 160. doi:10.1038/s43856-023-00397-4

14 Koeman T, Offermans NSM, Christopher-de Vries Y, et al. JEMs and incompatible occupational coding systems: effect of manual and automatic recoding of job codes on exposure assignment. Ann Occup Hyg 2013; 57: 107–14. doi:10.1093/annhyg/mes046

15 Pilorget C, Imbernon E, Goldberg M, et al. Evaluation of the quality of coding of job episodes collected by self questionnaires among French retired men for use in a job-exposure matrix. Occup Environ Med 2003; 60: 438–43. doi:10.1136/oem.60.6.438

16 Wan W, Ge CB, Friesen MC, et al. Automated Coding of Job Descriptions From a General Population Study: Overview of Existing Tools, Their Application and Comparison. Ann Work Expo Health 2023; 67: 663–72. doi:10.1093/annweh/wxad002

17 Bonczek RH, Holsapple CW, Whinston AB. Introduction to information processing, decision making, and decision support. In: Foundations of decision support systems. London, UK: Academic Press, 2014: 3–25.

18 Jones R, Elias P. CASCOT: computer assisted structured coding tool. 2004. Available: https://warwick.ac.uk/fac/soc/ier/software/cascot/ [Accessed 16 Apr 2025 ].

19 Jaspers MWM, Smeulers M, Vermeulen H, et al. Effects of clinical decision-support systems on practitioner performance and patient outcomes: a synthesis of high-quality systematic review findings. J Am Med Inform Assoc 2011; 18: 327–34. doi:10.1136/amiajnl-2011-000094

20 Russ DE, Josse P, Remen T, et al. Evaluation of the updated SOCcer v2 algorithm for coding free-text job descriptions in three epidemiologic studies. Ann Work Expo Health 2023; 67: 772–83. doi:10.1093/annweh/wxad020

21 Belloni M, Brugiavini A, Meschi E, et al. Measuring and detecting errors in occupational coding: an analysis of SHARE data. J Off Stat 2016; 32: 917–45. doi:10.1515/jos-2016-0049

22 Boot WR, Charness N, Czaja SJ, et al. Computer proficiency questionnaire: assessing low and high computer proficient seniors. Gerontologist 2015; 55: 404–11. doi:10.1093/geront/gnt117

23 Gronier G, Baudet A. Psychometric evaluation of the F-SUS: creation and validation of the French version of the system usability scale. Int J Hum-Comput Interact 2021; 37: 1571–82. doi:10.1080/10447318.2021.1898828

24 Bangor A, Kortum PT, Miller JT. An empirical evaluation of the system usability scale. Int J Hum Comput Interact 2008; 24: 574–94. doi:10.1080/10447310802205776

25 Zins M, Goldberg M, team C. The French CONSTANCES population-based cohort: design, inclusion and follow-up. Eur J Epidemiol 2015; 30: 1317–28. doi:10.1007/s10654-015-0096-4

26 Institut National de la Statistique et des Études Économiques (INSEE) (producteur), Archives de Données Issues de la Statistique Publique (ADISP) (diffuseur). Enquête Emploi en continu (version FPR) - 2017 [dataset]. 2017. doi:10.13144/lil-1262b

27 Machin D, Campbell M, Tan S, et al. Paired binary, ordered categorical and continuous outcomes. In: Sample sizes for clinical, laboratory and epidemiology studies. Hoboken, New Jersey, US: John Wiley & Sons, 2018: 117–35.

28 Sheskin DJ. Inferential statistical tests employed with two dependent samples (and related measures of association/correlation). In: Handbook of parametric and nonparametric statistical procedures. Boca Raton, Florida, US: Chapman and Hall/CRC, 2003: 763–882.

29 Sun S. Meta-analysis of Cohen’s kappa. Health Serv Outcomes Res Method 2011; 11: 145–63. doi:10.1007/s10742-011-0077-3

30 Gweon H, Schonlau M, Kaczmirek L, et al. Three methods for occupation coding based on statistical learning. J Off Stat 2017; 33: 101–22. doi:10.1515/jos-2017-0006

31 Cabiddu F, Moi L, Patriotta G, et al. Why do users trust algorithms? A review and conceptualization of initial trust and trust over time. Eur Manag J 2022; 40: 685–706. doi:10.1016/j.emj.2022.06.001

32 Furnham A, Boo HC. A literature review of the anchoring effect. J Socio Econ 2011; 40: 35–42. doi:10.1016/j.socec.2010.10.008

33 Rémen T, Richardson L, Siemiatycki J, et al. Impact of Variability in Job Coding on Reliability in Exposure Estimates Obtained via a Job-Exposure Matrix. Ann Work Expo Health 2022; 66: 551–62. doi:10.1093/annweh/wxab106

34 Mannetje A, Kromhout H. The use of occupation and industry classifications in general population studies. Int J Epidemiol 2003; 32: 419–28. doi:10.1093/ije/dyg080

35 Massing N, Wasmer M, Wolf C, et al. How standardized is occupational coding? A comparison of results from different coding agencies in Germany. J Off Stat 2019; 35: 167–87. doi:10.2478/jos-2019-0008

36 Institut National de la Statistique et des Études Économiques (INSEE). La PCS. 2003. Available: https://www.insee.fr/fr/information/2497952 [Accessed 16 Apr 2025 ].

37 Institut National de la Statistique et des Études Économiques (INSEE). Nomenclature d’Activités française. 2008. Available: https://www.insee.fr/fr/information/2406147 [Accessed 16 Apr 2025 ].

38 Conrad FG, Couper MP, Sakshaug JW. Classifying open-ended reports: factors affecting the reliability of occupation codes. J Off Stat 2016; 32: 75–92. doi:10.1515/jos-2016-0003

39 Ferrario A, Loi M. How explainability contributes to trust in AI. FAccT ’22; New York, NY, USA, 2022: 1457–66. doi:10.1145/3531146.3533202

40 De Matteis S, Jarvis D, Young H, et al. Occupational self-coding and automatic recording (OSCAR): a novel web-based tool to collect and code lifetime job histories in large population-based studies. Scand J Work Environ Health 2017; 43: 181–6. doi:10.5271/sjweh.3613

© 2025 Author(s) (or their employer(s)). Re-use permitted under CC BY. Published by BMJ Group. This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/