Abstract
The word frequency effect has long been a central topic in reading research because of its critical role in revealing the mental processes underlying reading behavior. Access to word frequency information has been considered an indicator of the onset of lexical processing and the most sensitive marker for studying when the brain begins to extract semantic information (Sereno & Rayner, 2000, Brain and Cognition, 42, 78-81; Sereno & Rayner, 2003, Trends in Cognitive Sciences, 7, 489-493). Although the word frequency effect has been studied extensively with eye tracking and with traditional EEG using the rapid serial visual presentation (RSVP) paradigm, corresponding evidence from natural reading is lacking. To identify the neural correlates of the word frequency effect, we conducted a study of natural Chinese reading with coregistered EEG and eye tracking to examine the time course of lexical processing. Our results showed that the word frequency effect first appeared in the N200 time window over the bilateral occipitotemporal regions. The effect was also reflected in the N400 time window, spreading from the occipital region to the central-parietal and frontal regions. The current study provides the first neural correlates of the word frequency effect in natural Chinese reading, sheds new light on lexical processing in natural reading, and can serve as a basis for further research on the neural correlates of reading under realistic conditions.
Keywords Chinese reading · EEG · Eye movement · Frequency effect
Word frequency refers to how often a word occurs in a given text or corpus, and it is considered the primary indicator of mental processing during reading (Brysbaert et al., 2018). Numerous classic findings have shown that word frequency influences eye movements in sentence reading (Inhoff & Rayner, 1986; Rayner & Duffy, 1986) and the time taken to make lexical decisions about isolated words (Schilling et al., 1998). The emergence of the word frequency effect is widely believed to mark the onset of lexical processing (Hudson & Bergman, 1985) and is therefore considered the most sensitive indicator for investigating when the brain initiates semantic extraction (Sereno & Rayner, 2000, 2003).
Understanding the time course of the word frequency effect can shed light on the dynamics of lexical access and integration during reading. For example, if word frequency effects emerge early, this would suggest rapid lexical access, with word frequency influencing the early stages of word recognition. Conversely, a later emergence of the word frequency effect might indicate that frequency information becomes influential only at later stages of processing, such as word integration or semantic access. Most previous studies have investigated word frequency effects from an eye-movement perspective, and the timing and mechanism of this effect have been debated intensively for over half a century (Brysbaert et al., 2018; Reingold et al., 2012). Although there is substantial evidence that eye movements are strongly affected by word frequency (Liu et al., 2017, 2019; Rayner & Raney, 1996), the dispute concerns how this evidence should be interpreted. Because programming a saccade takes time, there is an oculomotor latency between initiating and executing an eye movement, and it is debated whether linguistic variables are processed quickly enough to act on the current eye movement. These competing views stem from different models of eye-movement control in reading. For instance, the E-Z Reader model assumes that lexical processing triggers saccade programming; hence, word frequency can have an immediate effect on the duration of most fixations, and saccades are driven by early lexical access (Reichle et al., 2003, 2006). In contrast, the parallel SWIFT model assumes that lexical factors do not immediately influence eye movements upon word presentation; instead, there may be a temporal delay between word presentation and the observable impact of lexical factors on eye-movement behavior (Engbert et al., 2005). Thus, the timing of the word frequency effect could help determine which reading model is closer to the truth.
Researchers often use the distribution of fixation durations to infer how early lexical processing begins to influence saccade programming. Survival analysis is an innovative and promising approach in this respect, as it can identify the divergence point at which the word frequency effect emerges (Reingold et al., 2012; Reingold & Sheridan, 2014). The analysis is based on the survival rate at a given time t, defined as the percentage of first fixations with a duration longer than t. Thus, when t equals zero, the survival rate is 100%, and as t increases, the survival rate declines toward zero. Researchers computed survival curves for high- and low-frequency words and took the point at which the curves diverge as an early and sensitive estimate of the influence of word frequency on first fixation durations. In English, the earliest divergence point for word frequency emerged at 145 ms (Reingold et al., 2012). In short, eye-movement studies provide a wealth of evidence on lexical effects in reading, but because eye movements are an indirect measure whose interpretation remains controversial, the time course of the word frequency effect remains uncertain.
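The survival-rate definition above can be made concrete with a short sketch. It computes survival curves for two sets of first fixation durations in 1-ms steps and reports the earliest time at which the low-frequency curve stays above the high-frequency curve. The divergence rule used here (a difference larger than `threshold` percentage points for `run` consecutive milliseconds), its default values, and the function names are illustrative assumptions; this is a simplified stand-in, not the exact procedure used by Reingold et al. (2012).

```python
import numpy as np


def survival_curve(durations, max_t=600):
    """Survival rate (%) of fixations for t = 0..max_t ms in 1-ms steps.

    The survival rate at time t is the percentage of fixations whose
    duration is strictly greater than t.
    """
    d = np.asarray(durations, dtype=float)
    t = np.arange(max_t + 1)
    # Compare every time point with every duration, then average over fixations.
    surv = (d[None, :] > t[:, None]).mean(axis=1) * 100.0
    return t, surv


def divergence_point(low, high, threshold=1.5, run=5, max_t=600):
    """Earliest t (ms) at which the low-frequency survival curve exceeds the
    high-frequency curve by more than `threshold` percentage points for `run`
    consecutive 1-ms steps. Hypothetical rule for illustration only."""
    t, s_low = survival_curve(low, max_t)
    _, s_high = survival_curve(high, max_t)
    above = (s_low - s_high) > threshold
    count = 0
    for i, flag in enumerate(above):
        count = count + 1 if flag else 0
        if count == run:
            return int(t[i - run + 1])  # first step of the qualifying run
    return None  # curves never diverge under this rule
```

In practice, the fixed threshold would be replaced by a statistical criterion, for example one derived from resampling the fixation durations.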
Apart from eye-tracking studies, researchers have also studied the time course of the word frequency effect using EEG, especially with the rapid serial visual presentation (RSVP) paradigm. To minimize ocular artifacts in the EEG signal, RSVP presents words one at a time at predefined intervals (or, in some variants, at a self-paced rate). The earliest such study can be traced back to Sereno et al. (1998), whose results indicated that word frequency modulated ERPs in the 132-164 ms time window at a small number of electrodes over occipital and anterior parietal regions. A study of emotional words found that neutral low-frequency words elicited a larger N1 (135-180 ms) than neutral high-frequency words at bilateral posterior electrodes, whereas for negative words this frequency effect was significant in the opposite direction (Scott et al., 2009). MEG evidence suggests that word frequency effects can occur either in an early (120-170 ms; Assadollahi & Pulvermüller, 2003) or a later time window (300-400 ms; Embick et al., 2001). Even earlier effects have sometimes been reported: a study that also examined n-gram frequency found that word frequency effects were already detectable around 110 ms, mainly over the left occipital area (Hauk et al., 2006).
Although previous ERP studies suggest that the earliest electrophysiological effects of word frequency occur in the N1 time window (around 150 ms) over the occipital region, these results are still too sparse to answer the key question of the time course of lexical access in reading. This is mainly because these studies used isolated word recognition tasks rather than natural reading. Indeed, the widely adopted RSVP paradigm has been criticized on several grounds as fundamentally different from natural reading, especially because it lacks parafoveal processing and oculomotor execution (Kornrumpf et al., 2016). Since parafoveal information plays a critical role in facilitating reading, and free eye movements give readers greater and more flexible control over how they read, the absence of these features makes it difficult to generalize RSVP findings to natural reading.
A novel coregistration of eye movements and EEG has been introduced to the study of natural reading, in which fixation-related potentials (FRPs) are taken to reflect cognitive processing (Degno et al., 2021; Dimigen et al., 2011). This approach simultaneously records participants' eye movements and EEG while they read sentences and paragraphs normally. Unlike conventional ERPs, which are time-locked to stimulus onset, FRPs are time-locked to fixation onset. FRP components are valuable in reflecting how the brain processes visual stimuli over time, from early sensory processing to later cognitive processing. As in the interpretation of traditional ERPs, early components such as the P1 might reflect low-level visual processing, whereas later components such as the N400 could signify semantic integration (Luck et al., 1996; McWeeny & Norton, 2020). Despite some methodological concerns (e.g., the choice of baseline and the overlap of brain responses to two fast consecutive fixations), FRP studies have reproduced the predictability effect previously found with RSVP (Dimigen et al., 2011), and the method has been widely applied in recent studies (Degno & Liversedge, 2020; Degno et al., 2019a; Kretzschmar et al., 2015). The coregistration method is particularly advantageous for revealing word frequency effects compared with both standalone eye tracking and the RSVP paradigm. Traditional eye-tracking studies are constrained by saccade programming latency, approximately 125 ms (Rayner, 2009; Schotter, 2018), and may therefore not accurately capture the brain's instantaneous processing of lexical information. Similarly, RSVP presents words in an unnatural way at the cost of parafoveal processing, which reduces overall processing efficiency.
Therefore, the coregistration approach offers a unique and multimodal means of exploring the timing and dynamics of word frequency effects, allowing us to gain deeper insights into the intricacies of language processing during reading.
To date, only a few studies have reported word frequency effects with the coregistration method, and almost all of them examined alphabetic languages. Niefind and Dimigen (2016) first reported a significant word frequency effect at around 140-200 ms in FRPs over the occipitotemporal area, consistent with previous RSVP studies. However, this finding was not replicated in two other studies. Kretzschmar et al. (2015) carefully designed their materials to dissociate the neural correlates of word frequency and word predictability in the coregistration paradigm, but they found only a predictability effect on the N400, whereas the word frequency effect was reflected only in a late, narrow window (500-550 ms). A similar lack of significant results was reported in another study that manipulated word frequency (Degno et al., 2019b). Although Degno et al. (2019b) used a relatively lenient statistical method, namely the cluster-based permutation test, no significant effect was observed in any time window or region of interest. On the other hand, a recent study that manipulated word frequency and preview condition found a significant frequency effect when the preview was valid. Using a permutation method, this effect was detected at left occipitotemporal electrodes between 200 and 360 ms, as well as at a few centroparietal sites between 284 and 468 ms (Milligan et al., 2023).
Therefore, although the word frequency effect appears robustly in eye-tracking and RSVP studies, neither paradigm can fully establish its time course: the interpretation of eye-tracking data is contested, and the RSVP paradigm does not capture the essence of reading. Recent advances in the FRP technique offer a unique opportunity to investigate the frequency effect in natural reading tasks. We suspect that the inconsistent results of previous FRP studies might be due to covariation with other linguistic factors, such as interactions between frequency and word length or predictability (Assadollahi & Pulvermüller, 2003; Penolazzi et al., 2007; Sereno et al., 2020). Unlike words in alphabetic languages, Chinese characters are of equal size, which makes Chinese well suited for controlling low-level visual features (word length and word complexity). Because Chinese is written without spaces between words, readers show no clear preferred viewing location, and the absence of word boundaries may also hinder word segmentation and potentially prevent the word frequency effect from emerging in sentence reading (Li et al., 2011). To our knowledge, no study has yet examined the neural correlates and time course of the word frequency effect in natural Chinese reading. Since there is no consensus on whether parafoveal processing is affected by word frequency in Chinese reading (Ma et al., 2015; Zhang et al., 2019), the current study focuses on the word frequency effect in the fovea and includes some exploratory analyses of the parafovea. The primary hypothesis concerns the timing of word frequency effects in the FRP results. Specifically, if word frequency affects semantic access rapidly, word frequency effects should be observed in early time windows (e.g., P1 or N200). Conversely, if word frequency influences only the late stage of semantic access, significant effects should be detected only in later time windows (i.e., the N400).
Taken together, the current study is a first attempt to use coregistered eye tracking and EEG to obtain direct neural evidence of the word frequency effect in natural Chinese reading, providing insight into the generalizability of word frequency effects across writing systems and methods.
Method
Participants
The sample size for the experiment was calculated a priori using PANGEA, a web-based application (Westfall, 2015). The effect sizes of word frequency reported in previous eye-movement studies were large, ranging from 0.5 to 1.62 (Liu et al., 2019; Yu et al., 2021). Because FRPs are affected by the signal-to-noise ratio and by potential trial losses during artifact removal, we calculated the sample size using a moderately conservative Cohen's d of 0.45. Word frequency was treated as a fixed factor, with 68 items per frequency condition. This analysis showed that a sample size of 30 participants provides sufficient power (0.822).
A total of 32 native Chinese-speaking students (17 females; mean age = 21.6 years, SD = 2.8, range: 18-29) from Sun Yat-sen University were recruited and paid for their participation. All participants had normal or corrected-to-normal vision, were naïve to the purpose of the experiment, and gave informed consent before the experiment. Two participants were excluded because of poor eye-tracking calibration or EEG recording quality. This experiment was approved by the research ethics committee of the Department of Psychology at Sun Yat-sen University.
Stimuli
One hundred thirty-six pairs of two-character target words differing in word frequency (high vs. low) were selected for the experiment. Each target-word pair was embedded in a sentence frame, with the restriction that each sentence contained 18-29 characters and that target words did not appear among the first or last four words of the sentence. The plausibility of the target words in the sentence frames and the overall naturalness of the sentences were rated by an additional 80 participants on a 5-point scale (1 = not at all plausible/natural, 5 = very plausible/natural). Plausibility (M = 4.2, SD = 0.4) and naturalness (M = 4.3, SD = 0.3) were high and did not differ reliably between frequency conditions (see Table 1 for details). Another 80 participants completed a cloze task, which showed that the target words could not be predicted from the preceding sentence context. Participants completed ten practice trials to familiarize themselves with the procedure before the 136 experimental sentences; only the experimental sentences were included in the final analysis. Frequency conditions were presented in random order, and each participant read an equal number of high- and low-frequency words (68 words per condition). In addition, half of the sentences were followed by comprehension questions to ensure that participants read for comprehension.
Apparatus
Participants were tested individually in a normally lit room. Stimuli were displayed against a black background on a 27-inch LED computer screen (resolution 2,560 x 1,440; refresh rate 144 Hz) at a viewing distance of 70 cm. Each character subtended about 1 degree of visual angle. Participants used a chin rest to minimize head movements. Viewing was binocular, but only the right eye was recorded with a desktop EyeLink 1000+ eye-tracking system (SR Research Ltd) at a sampling rate of 1000 Hz.
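To relate the reported 1 degree per character to the display, the following sketch converts visual angle into pixels for the stated geometry (70 cm viewing distance, 27-inch 2,560 x 1,440 screen). It assumes square pixels and that the nominal 27-inch diagonal equals the visible image area; the function name is hypothetical.

```python
import math


def pixels_per_degree(distance_cm, diagonal_in, width_px, height_px):
    """Approximate number of screen pixels spanning 1 degree of visual angle
    at the screen centre, assuming square pixels."""
    diagonal_px = math.hypot(width_px, height_px)
    cm_per_px = diagonal_in * 2.54 / diagonal_px
    # Extent subtending 1 degree, centred on the line of sight: 2 * d * tan(0.5 deg)
    size_cm = 2.0 * distance_cm * math.tan(math.radians(0.5))
    return size_cm / cm_per_px


ppd = pixels_per_degree(70, 27, 2560, 1440)
print(f"about {ppd:.1f} px per degree")
```

Under these assumptions, a 1-degree character corresponds to roughly 52 pixels (about 1.22 cm) on this screen.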
EEG was recorded with a Neuroscan SynAmps amplifier (Compumedics Neuroscan, USA) from 64 Ag/AgCl scalp electrodes placed according to the international 10/20 system, in DC mode, at a sampling rate of 1000 Hz. Four EOG channels were also recorded to capture electrical activity associated with eye movements. AFz served as the ground electrode, and an additional default midline electrode between Cz and CPz served as the online reference. All impedances were kept below 10 kΩ.
Procedure
A 3-point horizontal calibration was performed at the beginning of the experiment to ensure tracking accuracy. Before each trial, a drift check appeared at the center of the screen to verify calibration, after which a 1° × 1° white square was displayed at the location of the first character of the sentence. Participants fixated this square for 500 ms to start the trial. After silently reading the sentence, they fixated another white square on the right side of the screen for 500 ms to end the trial and initiate the next one. Participants read the sentences at their own pace for comprehension. The whole experiment lasted about 40 minutes, with opportunities to rest after each third of the experiment or whenever needed. An example sentence is shown in Fig. 1.
Coregistration of eye movements and EEG signal
Markers were sent at the onset and offset of each trial from the display computer (using the OpenGL-based Psychophysics Toolbox 3 with EyeLink Toolbox extensions in MATLAB) simultaneously to the EEG recording computer and the eye-tracking computer. Offline synchronization was then performed with the EYE-EEG extension of the EEGLAB toolbox (Dimigen et al., 2011). A minor technical issue with the EEG recording equipment caused slight marker loss during data acquisition for a small subset of participants (five), usually in the first few trials. To ensure the integrity of the analyses, we extracted and used only the segments without marker loss. Time deviations between the markers in the two recordings were at most 1 ms in absolute value (M = 0.44, SD = 0.60), indicating good synchronization quality in the remaining data. Synchronization results for one participant are shown in Fig. 2.
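The synchronization check described above can be illustrated with a simplified sketch: a linear mapping from eye-tracker timestamps to EEG samples is estimated from the shared trial markers, and the residual of each marker gives its time deviation in milliseconds. EYE-EEG implements its own synchronization routine; the least-squares fit and the function names here are assumptions for illustration, and the 1000-Hz EEG sampling rate is taken from the Apparatus section.

```python
import numpy as np


def fit_sync(et_times_ms, eeg_samples, eeg_srate=1000.0):
    """Fit EEG sample = a * ET time + b from shared trigger markers.

    Returns the slope, the intercept, and the residual deviation (ms)
    of every shared marker."""
    et = np.asarray(et_times_ms, dtype=float)
    eeg = np.asarray(eeg_samples, dtype=float)
    a, b = np.polyfit(et, eeg, deg=1)          # least-squares straight line
    predicted = a * et + b
    deviation_ms = (eeg - predicted) / eeg_srate * 1000.0
    return a, b, deviation_ms


def et_to_eeg_sample(t_ms, a, b):
    """Map an eye-tracking event time (e.g., a fixation onset) to an EEG sample."""
    return int(round(a * t_ms + b))
```

Summarizing `np.abs(deviation_ms)` by its mean and standard deviation yields statistics of the kind reported above; the fitted mapping can then be used to place fixation onsets on the EEG time axis.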
Eye movement
The eye-movement analysis focused mainly on the target words. Three duration measures were examined: first fixation duration (FFD), the duration of the first fixation on a word during first-pass reading; gaze duration (GD), the sum of all first-pass fixations on a word; and total viewing time (TT), the sum of all fixation durations on a word. FFD is often linked to early word recognition processes, GD encompasses initial word processing and integration, and TT reflects overall reading time, including post-lexical processes (Rayner, 2009). Trials with durations shorter than 50 ms or longer than 800 ms, with durations more than 3 standard deviations from the mean, or with a blink on the target word were excluded from all measures (28.23% of the data were removed). The remaining trials were used to guide the FRP analysis of the EEG. High-frequency target words had a skipping rate of 0.27, a single-fixation rate of 0.66, and a regression rate of 0.25, whereas low-frequency target words had a skipping rate of 0.24, a single-fixation rate of 0.63, and a regression rate of 0.31.
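The three duration measures defined above can be computed from a chronological fixation sequence, as in the following sketch. It assumes that fixations have already been assigned to word indices, and it follows the conventional first-pass definition (the target counts as skipped during first pass if a word to its right was fixated first). The function name and data layout are hypothetical, and the exclusion criteria would be applied afterwards.

```python
def target_measures(fixations, target):
    """Compute FFD, GD, and TT (ms) for one target word in one trial.

    fixations: list of (word_index, duration_ms) in chronological order.
    FFD and GD are None when the target received no first-pass fixation
    (never fixated, or reached only by a regression after the eyes had
    already moved past it). TT is None when the target was never fixated.
    """
    total = sum(dur for word, dur in fixations if word == target)
    ffd = gd = None
    rightmost = -1  # rightmost word fixated so far
    for i, (word, dur) in enumerate(fixations):
        if word == target:
            if rightmost <= target:  # first pass: eyes have not moved beyond target
                ffd = dur
                gd = 0.0
                # Gaze duration: consecutive fixations until the eyes leave the word.
                for w, d in fixations[i:]:
                    if w != target:
                        break
                    gd += d
            break
        rightmost = max(rightmost, word)
    tt = total if total > 0 else None
    return ffd, gd, tt
```

For the sequence `[(0, 200), (1, 180), (2, 220), (2, 150), (3, 210), (2, 300)]` and target word 2, this gives FFD = 220 ms, GD = 370 ms, and TT = 670 ms.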
To examine whether there is a parafoveal-on-foveal effect of word frequency in Chinese reading, we also analyzed fixation-duration measures on the pretarget words: FFD, GD, TT, and the last fixation duration.
EEG preprocessing
The EEG signal was preprocessed with EEGLAB v2019.0 and custom scripts, following the pipeline recommended by Degno et al. (2021). The EEG data were band-pass filtered with a high-pass band edge of 0.1 Hz and a low-pass band edge of 50 Hz. Ocular artifacts were then identified with independent component analysis (ICA) using the extended Infomax algorithm. Following OPTICAT, a recent workflow for optimizing ICA training introduced by Dimigen (2020), we applied three additional steps to the training data: to improve processing efficiency, the EEG signal was downsampled to 250 Hz; to reduce the influence of low-frequency noise, the data were additionally high-pass filtered at 1 Hz; and to better capture saccade-related artifacts, we appended additional segments around saccade onsets (from 20 ms before to 10 ms after onset), increasing the proportion of saccadic artifacts in the training data and thereby improving the separation of neural and nonneural components (Dimigen, 2020). To identify artifactual components, we mainly used the ICLabel extension (Pion-Tonachini et al., 2019), supplemented by visual inspection, to mark components caused by eye blinks, horizontal eye movements, and muscle activity (components classified as artifacts with at least 80% confidence). After obtaining the ICA weights and the artifact labels, we transferred the weight matrix to the original data, that is, the data before the ICA-specific training steps. This weight transfer allows ICA to be trained on optimized data while preserving the properties of the original data. The marked components were then removed. On average, 5.53 components were removed (SD = 2.81); the removed components were classified as eye (30%), muscle (43%), heart (1%), line noise (1%), and channel noise (27%) artifacts.
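The weight-transfer and component-removal steps can be sketched in matrix form: the unmixing matrix learned from the OPTICAT-optimized training data is applied to the original data, and the flagged components are dropped when projecting back to the channels. This NumPy sketch is a simplification under stated assumptions (a full-rank, square unmixing matrix, which in EEGLAB would correspond to the product of the ICA weight and sphering matrices); it does not reproduce EEGLAB's own functions, and the function name is hypothetical.

```python
import numpy as np


def remove_components(data, unmixing, reject):
    """Apply an ICA unmixing matrix trained on separate (optimized) data to
    `data` and back-project every component except those in `reject`.

    data:     (n_channels, n_samples) continuous EEG
    unmixing: (n_components, n_channels) matrix, e.g. weights @ sphere
    reject:   indices of components flagged as artifacts
    """
    sources = unmixing @ data               # component activations in these data
    mixing = np.linalg.pinv(unmixing)       # columns are component scalp maps
    keep = np.setdiff1d(np.arange(unmixing.shape[0]), reject)
    # Reconstruct the channels from the retained components only.
    return mixing[:, keep] @ sources[keep, :]
```

Because the unmixing matrix is estimated once and only applied to the original recording, the analysis data keep their original sampling rate and 0.1-Hz high-pass filtering.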
The FRP analysis was guided by the eye-movement preprocessing, but not vice versa, to keep the two analyses independent in uncovering the word frequency effect. Epochs were extracted from 200 ms before to 800 ms after fixation onset. Epochs in which the absolute amplitude at any electrode exceeded 100 µV were removed. Bad channels were identified with the aid of the probability threshold method and interpolated with spherical spline interpolation when necessary. Then, following previous studies (Degno et al., 2019a; Dimigen, 2020; Dimigen et al., 2011), the EEG signal was re-referenced to the average of all scalp electrodes and baseline-corrected using the 100 ms before fixation onset. The target word regions were predefined in the materials, so the coordinate information in the coregistered eye-tracking data allowed us to identify the position of each target word offline. First-pass fixations falling within these target word regions were then selected for further analysis. In total, 11% of epochs were removed and an average of 0.63 channels were interpolated; the final target-word dataset included 2,476 fixations (1,274 low-frequency and 1,202 high-frequency), an average of 82.53 per participant.
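The epoching and amplitude-rejection steps described above can be sketched in NumPy. This is a simplification under stated assumptions: it operates on a continuous array with fixation onsets already converted to EEG sample indices, applies the -100 to 0 ms baseline before the ±100 µV check, and omits bad-channel interpolation and average re-referencing. The function name is hypothetical, and this is not the EEGLAB/EYE-EEG code used in the study.

```python
import numpy as np


def fixation_epochs(eeg, fix_onsets, srate, tmin=-0.2, tmax=0.8,
                    baseline=(-0.1, 0.0), reject_uv=100.0):
    """Cut fixation-locked epochs from continuous EEG.

    eeg:        (n_channels, n_samples) array in microvolts
    fix_onsets: fixation-onset sample indices in EEG time
    Returns (times in s, epochs of shape (n_kept, n_channels, n_times)).
    """
    start = int(round(tmin * srate))
    stop = int(round(tmax * srate))
    times = np.arange(start, stop) / srate
    b0 = int(round((baseline[0] - tmin) * srate))   # baseline start (epoch index)
    b1 = int(round((baseline[1] - tmin) * srate))   # baseline end (exclusive)
    kept = []
    for onset in fix_onsets:
        if onset + start < 0 or onset + stop > eeg.shape[1]:
            continue  # epoch would extend beyond the recording
        ep = eeg[:, onset + start:onset + stop]
        ep = ep - ep[:, b0:b1].mean(axis=1, keepdims=True)  # baseline correction
        if np.abs(ep).max() > reject_uv:
            continue  # amplitude-based rejection at any electrode
        kept.append(ep)
    if not kept:
        return times, np.empty((0, eeg.shape[0], stop - start))
    return times, np.stack(kept)
```

Only first-pass fixations on the target regions would be passed as `fix_onsets`, so the surviving epochs correspond to the fixation counts reported above.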
Statistical analyses
The eye-movement data were analyzed with linear mixed models (LMMs), with word frequency as the fixed factor and subjects and items (sentence pairs) as random effects. Treatment coding was used, with high frequency as the reference level.
The selection of time windows and ROIs in the FRP analysis was based on previous studies to allow meaningful comparisons with them (Degno et al., 2019b; Niefind & Dimigen, 2016; Sereno et al., 2020). Degno et al. (2019b) selected a series of time windows for cluster-based permutation tests (0-70 ms, 70-120 ms, 120-300 ms, and 300-500 ms) and defined frontal, central, temporal, parietal, and occipital ROIs covering almost all electrodes. Niefind and Dimigen (2016), by contrast, focused specifically on the N400 time window (300-500 ms) and restricted the analysis to central-posterior (Cz, Pz, CP1, CP2) and occipitotemporal (PO9, PO7, PO8, PO10) locations. In the recent study of Sereno et al. (2020), time windows and ROIs were divided more finely, including the baseline, 50-80 ms, P1 (80-120 ms), N1 (160-200 ms), N2 (200-300 ms), and N400 (350-550 ms). The baseline analysis confirms that differences between word frequency conditions in the waveform are not driven by unstable or differing baselines, as recommended by a committee report (Keil et al., 2014). Considering the main findings of previous studies and the shape of the current FRP waveforms, we chose the following four time windows for the FRP analysis: -100 to 0 ms (baseline), 100 to 140 ms (P1), 160 to 300 ms (N200), and 300 to 500 ms (N400). This selection is highly consistent with previous studies and captures how the brain processes visual stimuli from early sensory processing to later cognitive processing (Degno et al., 2019b; Sereno et al., 2020). Similarly, based on previous studies, four ROIs covering the relevant scalp areas were selected: the frontal-central (FC) ROI includes FCz, C1, Cz, and C2; the central-parietal (CP) ROI includes CPz, P1, Pz, and P2; and the left and right occipitotemporal ROIs (LOT and ROT) include PO7, PO5, PO3, and O1 and PO8, PO6, PO4, and O2, respectively.
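The dependent variable for each time window and ROI can be obtained by averaging the epoch amplitudes over the ROI's electrodes and the window's samples. The sketch below encodes the windows and ROIs listed above; treating each window as half-open ([start, end) in ms) and working per epoch are assumptions, and the names are hypothetical.

```python
import numpy as np

# Analysis windows (ms, half-open [start, end)) and ROIs as listed in the text.
WINDOWS_MS = {"baseline": (-100, 0), "P1": (100, 140),
              "N200": (160, 300), "N400": (300, 500)}
ROIS = {"FC": ["FCz", "C1", "Cz", "C2"],
        "CP": ["CPz", "P1", "Pz", "P2"],
        "LOT": ["PO7", "PO5", "PO3", "O1"],
        "ROT": ["PO8", "PO6", "PO4", "O2"]}


def mean_amplitudes(epochs, times_s, ch_names):
    """Mean amplitude per epoch for every ROI x window combination.

    epochs:   (n_epochs, n_channels, n_times) array in microvolts
    times_s:  (n_times,) epoch time axis in seconds
    ch_names: channel labels matching the channel axis of `epochs`
    """
    t_ms = np.round(np.asarray(times_s) * 1000.0)  # avoid float edge effects
    out = {}
    for roi, chans in ROIS.items():
        idx = [ch_names.index(c) for c in chans]
        for win, (t0, t1) in WINDOWS_MS.items():
            mask = (t_ms >= t0) & (t_ms < t1)
            # Average over the ROI's channels and the window's samples.
            out[(roi, win)] = epochs[:, idx][:, :, mask].mean(axis=(1, 2))
    return out
```

The resulting values, labeled by frequency condition, ROI, and participant, form the data entered into the mixed models described below.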
The EEG data were also analyzed with LMMs, with word frequency and ROI as fixed factors (treatment-coded, with high frequency and frontal-central as the reference levels) and subjects as a random effect. Main effects and interactions were tested with ANOVA, and the significance of each model was determined by comparison with a null model without the fixed factors (Riha et al., 2020). Model fitting started from the full model, including intercepts and slopes for subjects and sentence items; if the model failed to converge or was overfitted, the random slopes were simplified. The final models can be found in the shared analysis code (https://osf.io/rym8f/). Significance was determined with the Satterthwaite approximation. Post hoc contrasts were computed with the emmeans package and adjusted for multiple comparisons using the multivariate t distribution (mvt) method. All statistical analyses were performed in R (version 4.1.0) with the packages lmerTest, emmeans, and lme4.
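The comparison with a null model can be illustrated with the likelihood-ratio arithmetic. Given the log-likelihoods of the full and null models fitted by maximum likelihood (lme4's `anova()` refits REML fits with ML by default), the statistic is twice their difference and is referred to a chi-square distribution whose degrees of freedom equal the number of dropped parameters; for frequency (1), ROI (3), and their interaction (3) under treatment coding, that is 7. This Python sketch is for illustration only (the analysis itself was run in R), and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail and add one series term per 2 df.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(loglik_full, loglik_null, df):
    """Likelihood-ratio statistic and p-value for nested models fitted by ML."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf(stat, df)
```

For example, `likelihood_ratio_test(ll_full, ll_null, df=7)` would test all fixed effects of the frequency x ROI model at once, where `ll_full` and `ll_null` are placeholders for the fitted log-likelihoods.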
Results
Accuracy
Mean accuracy on the comprehension questions was 97%, indicating that participants understood the sentences and were engaged in the task.
Eye movement
Target word frequency significantly affected all fixation-duration measures (FFD: b = 6.93, SE = 3.39, t = 2.04, p = .04; GD: b = 25.88, SE = 5.65, t = 4.58, p < .001; TT: b = 60.05, SE = 17.74, t = 3.39, p = .001), with longer viewing times on low-frequency than on high-frequency target words.
Target word frequency did not significantly affect any duration measure on the pretarget words (FFD: b = 3.09, SE = 4.11, t = 0.75, p = .45; GD: b = -0.84, SE = 6.88, t = -0.12, p = .90; TT: b = 8.54, SE = 17.59, t = 0.49, p = .63). Even the duration of the last fixation on the pretarget word was unaffected by word frequency (b = 2.14, SE = 4.79, t = 0.45, p = .66). See Fig. 3 for the results.
Fixation-related potentials
Baseline (-100 to 0 ms)
A linear mixed-effects model with baseline EEG amplitude as the dependent variable revealed a main effect of ROI, F(3, 894) = 6.16, p < .001. Post hoc analysis showed that the frontal-central area had more positive amplitudes than the bilateral occipitotemporal regions. There was no significant main effect of word frequency, F(1, 29) = 0.005, p = .95, and no interaction between ROI and word frequency, F(3, 894) = 0.18, p = .91. These results indicate a baseline difference between regions but no effect of word frequency on the baseline level. See Table 2 for details.
P1 time window (100 to 140 ms)
In this time window, there was no significant main effect of word frequency, F(1, 29.01) = 1.53, p = .23, and no significant interaction between word frequency and ROI, F(3, 819.29) = 2.52, p = .06; only the effect of ROI remained significant, F(3, 29.15) = 44.05, p < .001. P1 amplitudes differed significantly between all pairs of ROIs (all ps < .0001). Although high-frequency words appeared to elicit a larger P1 over the bilateral occipitotemporal regions in this time window, this effect was not significant (all ps > .05). In an exploratory analysis, narrowing the time window to 110-130 ms also failed to yield a significant effect (all ps > .05).
N200 time window (160 to 300 ms)
Amplitude in the N200 time window was significantly affected by ROI, F(3, 29) = 17.51, p < .001. The main effect of word frequency and the interaction between word frequency and ROI also reached significance, F(1, 28.99) = 5.10, p = .03, and F(3, 807) = 11.65, p < .001, respectively. Simple main effect analysis showed that word frequency affected the bilateral occipitotemporal regions, with low-frequency words eliciting more negative-going amplitudes than high-frequency words (left OT: b = 0.45, SE = 0.15, t = 3.04, p = .004; right OT: b = 0.56, SE = 0.15, t = 3.76, p < .001). The topography of the word frequency effect is shown in Fig. 4.
N400 time window (300 to 500 ms)
In the later N400 time window, there were significant main effects of word frequency, F(1, 29.01) = 16.91, p < .001, and ROI, F(3, 29.01) = 11.90, p < .001, as well as a significant interaction between word frequency and ROI, F(3, 807) = 24.02, p < .001. Post hoc analysis showed that, as in the previous time window, low-frequency words elicited stronger negativity over the bilateral occipitotemporal areas (left OT: b = 0.66, SE = 0.13, t = 5.03, p < .001; right OT: b = 0.68, SE = 0.13, t = 5.20, p < .001). The effect also extended to the central-parietal area (b = 0.61, SE = 0.13, t = 4.68, p < .001) and reversed in direction over the frontal-central region (b = -0.29, SE = 0.13, t = -2.20, p = .03). See Fig. 5 for the results.
Discussion
The current study sought to identify neural correlates of lexical processing in natural reading by manipulating the frequency of target words embedded in sentences, using coregistered eye tracking and EEG. The eye-movement data showed that participants spent longer viewing low-frequency words than high-frequency words, consistent with previous studies (Liu et al., 2019; Rayner & Raney, 1996; Yu et al., 2021). No significant parafoveal-on-foveal effect was observed: the frequency of the target word did not affect viewing times on the words preceding it. Importantly, the FRP analysis showed that the word frequency effect first became significant in the N200 time window (160-300 ms), where low-frequency words induced greater negativity, initially over the occipitotemporal area. The word frequency effect was also reflected in the N400, which extended over wider regions. Thus, our study provides the first evidence of neural correlates of the word frequency effect in natural Chinese reading.
Over the past 20 years, considerable EEG evidence for the word frequency effect has accumulated, with significant findings coming mostly from RSVP studies (Condray et al., 2010; Grainger et al., 2012; Hauk & Pulvermüller, 2004). Whereas RSVP results suggest that the initial electrophysiological effects of word frequency may emerge as early as about 120 ms over the occipital region (Hauk et al., 2006; Scott et al., 2009; Sereno et al., 1998), findings from the natural reading paradigm have been inconsistent. In their EEG-eye-movement coregistration study, Kretzschmar et al. (2015) found no significant word frequency effect in any early time window or in the N400 component, apart from a difference restricted to 500-550 ms. Although they manipulated word frequency and word predictability orthogonally, the two words of each pair were embedded in different sentence frames. Because sentence context can modulate cognitive processing (Himmelstoss et al., 2020; Van Petten & Kutas, 1990), this design may explain why they did not find a reliable frequency effect in natural reading. By contrast, Milligan et al. (2023), who used the same sentence frame for each pair of target words, detected a word frequency effect over a few electrodes from 200 to 450 ms in the valid preview condition, suggesting that the word frequency effect can be reliably observed across different writing systems. In the current study, we likewise embedded each pair of target words in the same sentence frame to exclude irrelevant factors, which is more conducive to isolating the effect of word frequency.
Differences in statistical power may also explain the inconsistency between our neural results and those of previous studies. Degno et al. (2019b) reported a significant word frequency effect in their eye-movement data but found no significant effect on the neural correlates. They manipulated preview condition and word frequency simultaneously, with only 18 sentences per condition, far fewer than in the current study (68 sentences per condition). Although they doubled the number of target words in each sentence to increase the number of analyzable items, the final number was still only about half of ours. In addition, our analysis showed that the choice of reference (mastoids) did not affect the conclusions about the waveforms and distribution of the FRPs (see Appendix Fig. 6). In summary, although word frequency effects are well established in a wealth of previous studies, applying EM-EEG coregistration to the word frequency effect in natural reading still requires careful experimental design and appropriate materials.
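A back-of-the-envelope comparison of the item counts mentioned above, assuming that "doubled the target words" means two analyzable targets per sentence in Degno et al. (2019b), and assuming independent trial-level noise with standard deviation $\sigma$ so that the residual noise of an average shrinks as $1/\sqrt{N}$:

```latex
\begin{aligned}
N_{\text{Degno}} &= 18 \times 2 = 36, \qquad N_{\text{current}} = 68,\\
\frac{N_{\text{Degno}}}{N_{\text{current}}} &= \frac{36}{68} \approx 0.53,\\
\frac{\sigma/\sqrt{36}}{\sigma/\sqrt{68}} &= \sqrt{\frac{68}{36}} \approx 1.37.
\end{aligned}
```

Under these simplifying assumptions, an average based on 36 items would carry roughly 37% more residual noise than one based on 68 items; actual power also depends on the number of participants and the structure of the statistical model.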
The current results indicate that the word frequency effect became significant in the N200 time window and persisted into the later N400 window. We consider this result reliable for the following reasons. First, the effect is unlikely to be attributable to eye-movement artifacts: ICA removed most of the ocular activity, trials with blinks on target words were excluded, and the effect first emerged over the occipitotemporal region, which is less susceptible to eye movements. Second, the numbers of remaining fixations in the two conditions were nearly equal (1,202 for high-frequency and 1,274 for low-frequency words), indicating that the two conditions had comparable signal-to-noise ratios for the FRPs. Third, one might argue that because fixations in natural reading are relatively dense, the responses to consecutive fixations overlap and distort the fixation-related potentials, masking the real effect. The recently introduced Unfold toolbox can correct such overlap in FRP analyses (Ehinger & Dimigen, 2019), but the influence of preceding fixations on the waveform of the current fixation is likely averaged out and counterbalanced across conditions over a large number of trials (Dimigen et al., 2011). Our empirical data (baseline analysis) also support this assumption, as no significant differences were found in the pre-fixation period. Taken together, the word frequency effect was robustly observed in natural reading, as reflected in different FRP components.
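To make the overlap argument concrete, the following is a minimal sketch of regression-based overlap correction (linear deconvolution with a time-expanded design matrix), the general idea implemented in toolboxes such as Unfold (Ehinger & Dimigen, 2019). It is written in Python with NumPy and is not Unfold's API (the original Unfold toolbox is for MATLAB); the kernel shapes, fixation spacing, noise level, and the helper functions `time_expand`, `deconvolve`, and `naive_average` are hypothetical choices for illustration, not the analysis used in this study.

```python
import numpy as np


def time_expand(onsets_by_type, n_samples, n_lags):
    """Time-expanded ("stick function") design matrix.

    Column k * n_lags + j is 1 at sample onset + j for every event of type k,
    so its coefficient estimates the response of type k at lag j."""
    X = np.zeros((n_samples, len(onsets_by_type) * n_lags))
    for k, onsets in enumerate(onsets_by_type):
        for onset in onsets:
            for j in range(n_lags):
                if onset + j < n_samples:
                    X[onset + j, k * n_lags + j] += 1.0
    return X


def deconvolve(eeg, onsets_by_type, n_lags):
    """Least-squares, overlap-corrected responses (one row per event type)."""
    X = time_expand(onsets_by_type, len(eeg), n_lags)
    beta, *_ = np.linalg.lstsq(X, eeg, rcond=None)
    return beta.reshape(len(onsets_by_type), n_lags)


def naive_average(eeg, onsets, n_lags):
    """Conventional epoch average; ignores overlap from neighbouring fixations."""
    return np.mean([eeg[o:o + n_lags] for o in onsets], axis=0)


# --- Toy simulation: all values are hypothetical, not data from this study ---
rng = np.random.default_rng(0)
n_samples, n_lags = 40000, 60
lags = np.arange(n_lags)
shape = np.exp(-(((lags - 25) / 8.0) ** 2))
true_high = -1.0 * shape  # smaller negativity for "high-frequency" events
true_low = -1.6 * shape   # larger negativity for "low-frequency" events

# Onsets 20-39 samples apart, so each 60-sample response overlaps its neighbours.
onsets = np.cumsum(rng.integers(20, 40, size=1200))
onsets = onsets[onsets < n_samples - n_lags]
is_low = rng.random(len(onsets)) < 0.5  # condition assigned at random
onsets_high, onsets_low = onsets[~is_low], onsets[is_low]

eeg = rng.normal(0.0, 0.5, n_samples)
for o in onsets_high:
    eeg[o:o + n_lags] += true_high
for o in onsets_low:
    eeg[o:o + n_lags] += true_low

est = deconvolve(eeg, [onsets_high, onsets_low], n_lags)
naive_high = naive_average(eeg, onsets_high, n_lags)
naive_low = naive_average(eeg, onsets_low, n_lags)

# At lag 50 the simulated responses are ~0, but the next fixation's response leaks in.
print("naive high @50:", round(naive_high[50], 2))
print("deconvolved high @50:", round(est[0, 50], 2))
print("naive low-high @50:", round(naive_low[50] - naive_high[50], 2))
```

In this toy setup, each condition's naive average is contaminated at later lags by the next fixation's response, whereas the deconvolved estimates recover the simulated waveforms. Because the condition of neighbouring fixations is random here, the contamination is similar in both conditions and the naive low-minus-high difference stays close to the true difference, which is the counterbalancing assumption the paragraph relies on; real reading data need not satisfy it exactly.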
Given these FRP results and the basic fact that the average fixation duration in natural Chinese reading is only about 250 ms (Li et al., 2011; Rayner et al., 2007), why did the word frequency effect not appear earlier, for example in the P1 window (80-120 ms)? One likely explanation is that lexical access is more cognitively demanding and follows the processing of initial visual characteristics (e.g., word complexity) and orthography, which could account for the absence of significant word frequency effects at these early stages. Interestingly, some previous ERP studies suggested that the N200 time window is unaffected by semantic manipulation and is more closely related to orthographic processing in Chinese and in Korean, a writing system close to Chinese (Du et al., 2014; Liu & Zhang, 2023). However, our study clearly showed that word frequency modulated the N200 response, implying that lexical access may occur earlier during natural reading. This may also explain why previous RSVP studies mostly identified semantic processing in the N400 (Kutas & Federmeier, 2011) rather than in early time windows: coregistration paradigms, which present complete sentences and allow parafoveal processing, are more sensitive to early neural evidence. As for the observation that the word frequency effect became more prominent and extended across a broader area in the N400, our interpretation aligns with a recent similar study by Milligan et al. (2023) in suggesting that lexical access is likely an accumulative process rather than a sudden event. Future reading models that integrate eye movements and physiological responses should take this cumulative, and possibly partially overlapping, process into account. Moreover, we doubt that reselecting the time window or ROI at a finer grain would reveal an earlier frequency effect.
From a data-driven perspective, permutation tests or point-by-point t tests may be more sensitive in detecting new effects (Sassenhagen & Draschkow, 2019), but these approaches also tend to produce significant results that are difficult to interpret or meaningless. Therefore, the present study took a hypothesis-driven approach with predefined time windows and ROIs. The current study provides empirical evidence for the feasibility and stability of using FRPs to detect word frequency effects in natural reading, and the significant word frequency effect in the N200 implies that lexical information can exert a rapid influence, in line with the E-Z Reader model.
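A minimal sketch of the hypothesis-driven step described above: fixation-locked epochs are averaged within predefined time windows and ROIs to obtain one value per trial, which can then enter a statistical model. The N200 (160-300 ms) and N400 (300-500 ms) windows come from the Results; the ROI channel indices, the 500 Hz sampling grid, the half-open window boundaries, and the helper `window_means` are hypothetical placeholders, not the study's montage or settings.

```python
import numpy as np

# Predefined analysis windows in ms, taken from the Results section.
WINDOWS = {"N200": (160, 300), "N400": (300, 500)}

# Hypothetical ROIs: channel indices are placeholders, not the study's montage.
ROIS = {"left_OT": [0, 1], "right_OT": [2, 3], "CP": [4, 5], "FC": [6, 7]}


def window_means(epochs, times, windows=WINDOWS, rois=ROIS):
    """Per-trial mean amplitude for every (ROI, window) pair.

    epochs: array (n_trials, n_channels, n_times), fixation-locked.
    times:  array (n_times,) in ms relative to fixation onset.
    Samples with start <= t < end are averaged (half-open window).
    """
    out = {}
    for win_name, (start, end) in windows.items():
        in_window = (times >= start) & (times < end)
        for roi_name, chans in rois.items():
            roi_data = epochs[:, chans, :][:, :, in_window]
            # Average over the ROI's channels and the window's samples.
            out[(roi_name, win_name)] = roi_data.mean(axis=(1, 2))
    return out


# Usage with synthetic data (hypothetical: 10 trials, 8 channels, 500 Hz).
times = np.arange(-200, 600, 2)
epochs = np.zeros((10, 8, times.size))
n200 = (times >= 160) & (times < 300)
epochs[:, 0:2, n200] = -2.0  # a flat -2 uV deflection over "left_OT" only

means = window_means(epochs, times)
print(means[("left_OT", "N200")])
```

The resulting per-trial values, together with participant, item, and condition labels, are what a linear mixed model would be fitted to; the exact model specification used in the study is not reproduced here.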
In conclusion, the current research is the first to find stable neural correlates of the word frequency effect in natural Chinese reading. We suggest that appropriate control of linguistic variables and a sufficient amount of material contributed to the discovery of this effect. Our study not only sheds new light on lexical processing in natural reading but also provides a solid methodological basis for further studies examining neural correlates in a natural manner.
References
Assadollahi, R., & Pulvermüller, F. (2003). Early influences of word length and frequency: A group study using MEG. NeuroReport, 14(8), 1183-1187. https://doi.org/10.1097/00001756-200306110-00016
Brysbaert, M., Mandera, P., & Keuleers, E. (2018). The word frequency effect in word processing: An updated review. Current Directions in Psychological Science, 27(1), 45-50. https://doi.org/10.1177/0963721417727521
Condray, R., Siegle, G. J., Keshavan, M. S., & Steinhauer, S. R. (2010). Effects of word frequency on semantic memory in schizophrenia: Electrophysiological evidence for a deficit in linguistic access. International Journal of Psychophysiology, 75(2), 141-156. https://doi.org/10.1016/j.ijpsycho.2009.10.010
Degno, F., & Liversedge, S. P. (2020). Eye movements and fixation-related potentials in reading: A review. Vision, 4(1), 1-37. https://doi.org/10.3390/vision4010011
Degno, F., Loberg, O., & Liversedge, S. P. (2021). Coregistration of eye movements and fixation-related potentials in natural reading: Practical issues of experimental design and data analysis. Collabra: Psychology, 7(1), 1-28. https://doi.org/10.1525/collabra.18032
Degno, F., Loberg, O., Zang, C., Zhang, M., Donnelly, N., & Liversedge, S. P. (2019a). A coregistration investigation of inter-word spacing and parafoveal preview: Eye movements and fixation-related potentials. PLOS ONE, 14(12), e0225819. https://doi.org/10.1371/journal.pone.0225819
Degno, F., Loberg, O., Zang, C., Zhang, M., Donnelly, N., & Liversedge, S. P. (2019b). Parafoveal previews and lexical frequency in natural reading: Evidence from eye movements and fixation-related potentials. Journal of Experimental Psychology: General, 148(3), 453-473. https://doi.org/10.1037/xge0000494
Dimigen, O. (2020). Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage, 207, 116117. https://doi.org/10.1016/j.neuroimage.2019.116117
Dimigen, O., Sommer, W., Hohlfeld, A., Jacobs, A. M., & Kliegl, R. (2011). Coregistration of eye movements and EEG in natural reading: Analyses and review. Journal of Experimental Psychology: General, 140(4), 552-572. https://doi.org/10.1037/a0023885
Du, Y., Zhang, Q., & Zhang, J. X. (2014). Does N200 reflect semantic processing? An ERP study on Chinese visual word recognition. PLOS ONE, 9(3), e90794.
Ehinger, B. V., & Dimigen, O. (2019). Unfold: An integrated toolbox for overlap correction, non-linear modeling, and regression-based EEG analysis. PeerJ, 7, e7838. https://doi.org/10.7717/peerj.7838
Embick, D., Hackl, M., Schaeffer, J., Kelepir, M., & Marantz, A. (2001). A magnetoencephalographic component whose latency reflects lexical frequency. Cognitive Brain Research, 10(3), 345-348. https://doi.org/10.1016/s0926-6410(00)00053-7
Engbert, R., Nuthmann, A., Richter, E. M., & Kliegl, R. (2005). SWIFT: A dynamical model of saccade generation during reading. Psychological Review, 112(4), 777-813. https://doi.org/10.1037/0033-295X.112.4.777
Grainger, J., Lopez, D., Eddy, M., Dufau, S., & Holcomb, P. J. (2012). How word frequency modulates masked repetition priming: An ERP investigation. Psychophysiology, 49(5), 604-616. https://doi.org/10.1111/j.1469-8986.2011.01337.x
Hauk, O., Davis, M. H., Ford, M., Pulvermüller, F., & Marslen-Wilson, W. D. (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. NeuroImage, 30(4), 1383-1400. https://doi.org/10.1016/j.neuroimage.2005.11.048
Hauk, O., & Pulvermüller, F. (2004). Effects of word length and frequency on the human event-related potential. Clinical Neurophysiology, 115(5), 1090-1103. https://doi.org/10.1016/j.clinph.2003.12.020
Himmelstoss, N. A., Schuster, S., Hutzler, F., Moran, R., & Hawelka, S. (2020). Coregistration of eye movements and neuroimaging for studying contextual predictions in natural reading. Language, Cognition and Neuroscience, 35(5), 595-612. https://doi.org/10.1080/23273798.2019.1616102
Hudson, P. T., & Bergman, M. W. (1985). Lexical knowledge in word recognition: Word length and word frequency in naming and lexical decision tasks. Journal of Memory and Language, 24(1), 46-58. https://doi.org/10.1016/0749-596X(85)90015-4
Inhoff, A. W., & Rayner, K. (1986). Parafoveal word processing during eye fixations in reading: Effects of word frequency. Perception & Psychophysics, 40(6), 431-439. https://doi.org/10.3758/BF03208203
Keil, A., Debener, S., Gratton, G., Junghöfer, M., Kappenman, E. S., Luck, S. J., ... (2014). Committee report: Publication guidelines and recommendations for studies using electroencephalography and magnetoencephalography. Psychophysiology, 51(1), 1-21. https://doi.org/10.1111/psyp.12147
Kornrumpf, B., Niefind, F., Sommer, W., & Dimigen, O. (2016). Neural correlates of word recognition: A systematic comparison of natural reading and rapid serial visual presentation. Journal of Cognitive Neuroscience, 28(9), 1374-1391. https://doi.org/10.1162/jocn_a_00977
Kretzschmar, F., Schlesewsky, M., & Staub, A. (2015). Dissociating word frequency and predictability effects in reading: Evidence from coregistration of eye movements and EEG. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(6), 1648-1662. https://doi.org/10.1037/xlm0000128
Kutas, M., & Federmeier, K. D. (2011). Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential (ERP). Annual Review of Psychology, 62, 621-647. https://doi.org/10.1146/annurev.psych.093008.131123
Li, X., Liu, P., & Rayner, K. (2011). Eye movement guidance in Chinese reading: Is there a preferred viewing location? Vision Research, 51(10), 1146-1156. https://doi.org/10.1016/j.visres.2011.03.004
Liu, J., & Zhang, Y. (2023). Language experience modulates the visual N200 response for disyllabic Chinese words: An event-related potential study. Brain Sciences, 13(9), 1321. https://doi.org/10.3390/brainsci13091321
Liu, Y. P., Huang, R., Li, Y. G., & Gao, D. G. (2017). The word frequency effect on saccade targeting during Chinese reading: Evidence from a survival analysis of saccade length. Frontiers in Psychology, 8, 116. https://doi.org/10.3389/fpsyg.2017.00116
Liu, Y. P., Yu, L. L., Fu, L., Li, W. W., Duan, Z. Y., & Reichle, E. D. (2019). The effects of parafoveal word frequency and segmentation on saccade targeting during Chinese reading. Psychonomic Bulletin & Review, 26(4), 1367-1376. https://doi.org/10.3758/s13423-019-01577-x
Luck, S. J., Vogel, E. K., & Shapiro, K. L. (1996). Word meanings can be accessed but not reported during the attentional blink. Nature, 383(6601), 616-618. https://doi.org/10.1038/383616a0
Ma, G., Li, X., & Rayner, K. (2015). Readers extract character frequency information from nonfixated-target word at long pretarget fixations during Chinese reading. Journal of Experimental Psychology: Human Perception and Performance, 41(5), 1409-1419. https://doi.org/10.1037/xhp0000072
McWeeny, S., & Norton, E. S. (2020). Understanding event-related potentials (ERPs) in clinical and basic language and communication disorders research: A tutorial. International Journal of Language & Communication Disorders, 55(4), 445-457. https://doi.org/10.1111/1460-6984.12535
Milligan, S., Antúnez, M., Barber, H. A., & Schotter, E. R. (2023). Are eye movements and EEG on the same page?: A coregistration study on parafoveal preview and lexical frequency. Journal of Experimental Psychology: General, 152(1), 188-210. https://doi.org/10.1037/xge0001278
Niefind, F., & Dimigen, O. (2016). Dissociating parafoveal preview benefit and parafovea-on-fovea effects during reading: A combined eye tracking and EEG study. Psychophysiology, 53(12), 1784-1798. https://doi.org/10.1111/psyp.12765
Penolazzi, B., Hauk, O., & Pulvermüller, F. (2007). Early semantic context integration and lexical access as revealed by event-related brain potentials. Biological Psychology, 74(3), 374-388. https://doi.org/10.1016/j.biopsycho.2006.09.008
Pion-Tonachini, L., Kreutz-Delgado, K., & Makeig, S. (2019). ICLabel: An automated electroencephalographic independent component classifier, dataset, and website. NeuroImage, 198, 181-197. https://doi.org/10.1016/j.neuroimage.2019.05.026
Rayner, K. (2009). Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62(8), 1457-1506. https://doi.org/10.1080/17470210902816461
Rayner, K., & Duffy, S. A. (1986). Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory & Cognition, 14(3), 191-201. https://doi.org/10.3758/BF03197692
Rayner, K., Li, X., Williams, C. C., Cave, K. R., & Well, A. D. (2007). Eye movements during information processing tasks: Individual differences and cultural effects. Vision Research, 47(21), 2714-2726. https://doi.org/10.1016/j.visres.2007.05.007
Rayner, K., & Raney, G. E. (1996). Eye movement control in reading and visual search: Effects of word frequency. Psychonomic Bulletin & Review, 3(2), 245-248. https://doi.org/10.3758/BF03212426
Reichle, E. D., Pollatsek, A., & Rayner, K. (2006). E-Z Reader: A cognitive-control, serial-attention model of eye-movement behavior during reading. Cognitive Systems Research, 7(1), 4-22. https://doi.org/10.1016/j.cogsys.2005.07.002
Reichle, E. D., Rayner, K., & Pollatsek, A. (2003). The E-Z Reader model of eye-movement control in reading: Comparisons to other models. Behavioral and Brain Sciences, 26(4), 445-476. https://doi.org/10.1017/s0140525x03000104
Reingold, E. M., & Sheridan, H. (2014). Estimating the divergence point: A novel distributional analysis procedure for determining the onset of the influence of experimental variables. Frontiers in Psychology, 5, 1432. https://doi.org/10.3389/fpsyg.2014.01432
Reingold, E. M., Reichle, E. D., Glaholt, M. G., & Sheridan, H. (2012). Direct lexical control of eye movements in reading: Evidence from a survival analysis of fixation durations. Cognitive Psychology, 65(2), 177-206. https://doi.org/10.1016/j.cogpsych.2012.03.001
Riha, C., Güntensperger, D., Kleinjung, T., & Meyer, M. (2020). Accounting for heterogeneity: Mixed-effects models in resting-state EEG data in a sample of tinnitus sufferers. Brain Topography, 33, 413-424. https://doi.org/10.1007/s10548-020-00772-7
Sassenhagen, J., & Draschkow, D. (2019). Cluster-based permutation tests of MEG/EEG data do not establish significance of effect latency or location. Psychophysiology, 56(6), e13335. https://doi.org/10.1111/psyp.13335
Schilling, H. E., Rayner, K., & Chumbley, J. I. (1998). Comparing naming, lexical decision, and eye fixation times: Word frequency effects and individual differences. Memory & Cognition, 26(6), 1270-1281. https://doi.org/10.3758/BF03201199
Schotter, E. R. (2018). Reading ahead by hedging our bets on seeing the future: Eye tracking and electrophysiology evidence for parafoveal lexical processing and saccadic control by partial word recognition. Psychology of Learning and Motivation, 68, 263-298. https://doi.org/10.1016/bs.plm.2018.08.011
Scott, G. G., O'Donnell, P. J., Leuthold, H., & Sereno, S. C. (2009). Early emotion word processing: Evidence from event-related potentials. Biological Psychology, 80(1), 95-104. https://doi.org/10.1016/j.biopsycho.2008.03.010
Sereno, S. C., Hand, C. J., Shahid, A., Mackenzie, I. G., & Leuthold, H. (2020). Early EEG correlates of word frequency and contextual predictability in reading. Language, Cognition and Neuroscience, 35(5), 625-640. https://doi.org/10.1080/23273798.2019.1580753
Sereno, S. C., & Rayner, K. (2000). The when and where of reading in the brain. Brain and Cognition, 42(1), 78-81. https://doi.org/10.1006/brcg.1999.1167
Sereno, S. C., & Rayner, K. (2003). Measuring word recognition in reading: Eye movements and event-related potentials. Trends in Cognitive Sciences, 7(11), 489-493. https://doi.org/10.1016/j.tics.2003.09.010
Sereno, S. C., Rayner, K., & Posner, M. I. (1998). Establishing a timeline of word recognition: Evidence from eye movements and event-related potentials. NeuroReport, 9(10), 2195-2200. https://doi.org/10.1097/00001756-199807130-00009
Van Petten, C., & Kutas, M. (1990). Interactions between sentence context and word frequency in event-related brain potentials. Memory & Cognition, 18(4), 380-393. https://doi.org/10.3758/BF03197127
Westfall, J. (2015). PANGEA: Power analysis for general ANOVA designs. Unpublished manuscript. Available at http://jakewestfall.org/publications/pangea.pdf
Yu, L., Liu, Y., & Reichle, E. D. (2021). A corpus-based versus experimental examination of word- and character-frequency effects in Chinese reading: Theoretical implications for models of reading. Journal of Experimental Psychology: General, 150(8), 1612. https://doi.org/10.1037/xge0001014
Zhang, M., Liversedge, S. P., Bai, X., Yan, G., & Zang, C. (2019). The influence of foveal lexical processing load on parafoveal preview and saccadic targeting during Chinese reading. Journal of Experimental Psychology: Human Perception and Performance, 45, 812-825. https://doi.org/10.1037/xhp0000644
Appendix
This appendix shows that the conclusions about the waveform and distribution of the FRPs were not affected by the choice of reference (mastoids).
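For readers less familiar with re-referencing, the following minimal sketch shows the operation behind a mastoid-reference check: the mean of the two mastoid channels is subtracted from every channel at every sample. The helper `rereference`, the channel labels "M1"/"M2", and the toy values are hypothetical; MNE-Python users would typically use the library's own `set_eeg_reference(ref_channels=[...])` method instead of a hand-written helper.

```python
import numpy as np


def rereference(data, ch_names, ref_names=("M1", "M2")):
    """Re-reference EEG to the mean of the reference channels (e.g., linked mastoids).

    data: array (n_channels, n_times); ch_names: channel labels in row order.
    "M1"/"M2" are placeholder labels for the left and right mastoids.
    """
    idx = [ch_names.index(name) for name in ref_names]
    reference = data[idx, :].mean(axis=0, keepdims=True)
    # Subtract the same reference signal from every channel at every sample.
    return data - reference


# Usage with toy values (hypothetical): 3 channels x 2 samples.
ch_names = ["Cz", "M1", "M2"]
data = np.array([[3.0, 5.0],
                 [1.0, 2.0],
                 [3.0, 4.0]])
print(rereference(data, ch_names))
```

Because re-referencing subtracts one common signal from all channels, a condition difference at a given electrode changes only by the condition difference present at the reference, which is why the waveforms and topographies can be compared across reference choices.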
Acknowledgements
Author note: This research was supported by grants from the National Social Science Fund of China (21BYY105).
Author contributions
X.M.: conceptualization; methodology; data collection & analysis; manuscript writing & revision. S.C.: data collection & analysis; manuscript writing, discussion, and revision. X.X.: experimental design discussion; methodology; data collection & analysis; manuscript writing, discussion, and revision. B.Y.: methodology; data collection & analysis; manuscript writing, discussion, and revision. Y.L.: conceptualization; material provision; manuscript writing; supervision.
Data availability
The materials and data supporting the findings of this study are openly available from the Open Science Framework at https://osf.io/rym8f/.
Copyright Springer Nature B.V. Jan 2025