Content area
While the neural underpinnings of semantic cognition have been extensively studied, the brain mechanisms that allow the extraction of meaning from the initially perceptual visual linguistic input are less understood. These mechanisms have typically been explored through the analysis of psycholinguistic properties that reflect key aspects of semantic processing (e.g., word frequency, familiarity or concreteness) and, more recently, through natural language processing (NLP) models. However, both approaches lack a direct comparison of sublexical (i.e., phonological and orthographic) and lexico-semantic aspects of words with NLP models. Understanding how sublexical and lexico-semantic systems interact and/or overlap is a current challenge in the neurobiology of language. In this fMRI study, 30 participants performed a lexical decision task in the MRI scanner, in which all the aforementioned sublexical and lexico-semantic properties were carefully controlled. The resulting models reflected either sublexical, semantic, or NLP (word vector) relations, and were compared to multivariate brain patterns through representational similarity analysis. Our findings reveal that sublexical and lexico-semantic representations recruit different areas of the left inferior frontal gyrus (IFG) and ventral occipitotemporal cortex (vOTC). The anterior IFG and vOTC represented semantic models, while regions posterior to the IFG, like the supplementary motor area (SMA), or to the vOTC, like areas V3-V4, showed representations of sublexical models. Importantly, both semantic and NLP models converged in semantic hubs, including the inferior anterior temporal lobe (ATL), parahippocampal gyrus, and anterior IFG. The implications of these results are discussed in light of the most recent neuroscientific evidence.
1 Introduction
Understanding how the human brain enables quick recognition of relevant information from complex perceptual input is vital in our constantly developing world. In a few seconds, we are able to read this paragraph and comprehend the information it conveys. For this to happen, we first need to recognise the symbols that constitute the paragraph. This entails quick visual categorisation of the linguistic stimuli, and linking those stimuli to the sounds of language (i.e., phonological processing) and, ultimately, to whole words (i.e., sublexical processing). We also need to access the previously learned information the words represent (i.e., lexico-semantic processing). These processes require a perfectly orchestrated set of neural mechanisms (Yeatman and White, 2021).
There is extensive knowledge of the neural underpinnings of sublexical and lexico-semantic processing mechanisms, each studied independently. Sublexical processing recruits areas of the dorsal language networks, which connect the posterior superior temporal gyrus (pSTG), inferior parietal lobule (IPL), supplementary motor area (SMA) and posterior inferior frontal gyrus (IFG pars opercularis). This group of pathways allows the auditory-motor mapping of language sounds (Friederici, 2013; Friederici, 2023; Hickok and Poeppel, 2004). In turn, lexico-semantic processing involves areas of the ventral language networks, connecting the ventral occipitotemporal cortex (vOTC) with the inferior temporal gyrus (ITG), anterior temporal lobe (ATL) and anterior and orbital portions of the IFG (i.e., pars triangularis and pars orbitalis). This route enables the extraction of previously learned information from whole words (Friederici, 2013; Hagoort, 2013; Hickok and Poeppel, 2004; Ralph et al., 2017).
Fig. 1
The study of sublexical and lexico-semantic mechanisms has often been approached through the analysis of specific psycholinguistic features. Lexico-semantic representations in semantic hubs (i.e., ATL or IFG) have been better understood thanks to the analysis of word concreteness (i.e., concrete versus abstract concepts) (Binder et al., 2009; Hoffman et al., 2015), frequency (Sánchez et al., 2023; Schuster et al., 2016), or familiarity (Shinozuka et al., 2021). Similarly, manipulating orthographic features such as word length has helped refine our understanding of the functional specialisation within the vOTC (Lerma-Usabiaga et al., 2018; White et al., 2019). Finally, the analysis of phonological properties has revealed how areas like the pSTG, posterior IFG or SMA participate in sublexical processing (Carreiras et al., 2006; Chiarello et al., 2018; Diaz et al., 2021).
The scope of these analyses has considerably broadened with recent advances in natural language processing (NLP) and with the popularisation of representational similarity analysis (RSA). Recent fMRI RSA studies have compared brain activation patterns with several multivariate models expressing specific properties of language. For instance, some studies have used models built from affective valence versus concreteness (Meersmans et al., 2022), semantic associative strength (Meersmans et al., 2020) or word vectors (Liuzzi et al., 2023). While these multivariate designs have been extremely helpful in revealing how conceptual information is organised in terms of word co-occurrence and similar semantic features, they cannot shed light on sublexical processes that are not semantic in nature (e.g., phonology or orthography) but are key for word recognition and are also known to influence brain activation patterns (Blumenthal-Dramé et al., 2017; Carreiras et al., 2014). Moreover, many of the above-mentioned psycholinguistic variables interact and produce neural effects in overlapping regions like the IFG or vOTC.
The main objective of the present study is to disentangle the neural representational structure of sublexical and lexico-semantic linguistic properties, and to assess how these are reflected in both traditional psycholinguistic measures and NLP-derived word embeddings. We used representational similarity analysis (RSA) to compare brain activity patterns during a single-word fMRI recognition task with multivariate models capturing sublexical (e.g., orthographic distance, phonological neighbours) and lexico-semantic (e.g., word frequency, concreteness, familiarity) properties, as well as a distributed semantic representation from a Word2Vec model. This approach allows us to investigate where in the brain these distinct linguistic dimensions are encoded and whether their associated neural representations converge or remain separable. Based on prior research, we hypothesised that semantic representations (both psycholinguistic and NLP-derived) would yield high similarity in ventral language network regions, particularly the anterior and orbital IFG, ATL, and anterior vOTC. In contrast, we expected sublexical properties to recruit areas within the dorsal language network, such as the posterior IFG (pars opercularis) or SMA, reflecting their involvement in phonological and orthographic processing. Lastly, we anticipated that NLP models would show a distinctive spatial profile within key semantic hubs, such as the inferior ATL or anterior IFG, capturing specific distributional patterns that may complement, but not necessarily replace, those indexed by psycholinguistic variables. Through this design, we aim to clarify how psycholinguistic and computational models complement each other, and to characterise the distinct and shared neural architectures that support different levels of linguistic processing.
2 Methods
2.1 Participants
A total of 30 Spanish-speaking, right-handed participants (7 males) aged between 19 and 40 years (mean = 28.5 ± 6.933 years) took part in the study. All participants spoke Spanish as their first language, had normal or corrected-to-normal vision, and had no reported history of neurological or psychiatric disorders. Of the initial 32 participants, two were excluded due to excessive head motion. All participants received monetary compensation for their voluntary participation and gave informed consent to take part in the study, in compliance with the regulations established by the BCBL Ethics Committee and the guidelines of the Declaration of Helsinki.
2.2 Stimuli and materials
The stimuli included a total of 960 Spanish words. Half of these words were tested in the MRI scanner, while the other half were tested outside the scanner (see below). All words were nouns extracted from EsPal (Duchon et al., 2013), ranging from 4 to 10 letters in length, and including subjective ratings of concreteness, familiarity and imageability. These ratings were available in EsPal and came from normative data described elsewhere (Duchon et al., 2013). They range from 1 to 7, with 1 indicating completely abstract (for concreteness), completely unfamiliar (for familiarity), or an object that is completely impossible to imagine (for imageability). Additionally, all words were characterised using objective measures, including frequency of occurrence, bigram frequency, biphone frequency, number of phonological neighbours, and orthographic Levenshtein distance (OLD20; see Yarkoni et al., 2008).
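OLD20 is the mean orthographic Levenshtein distance from a word to its 20 closest neighbours in a reference lexicon (Yarkoni et al., 2008). As a rough illustration of the measure (not the EsPal pipeline; the toy lexicon and function names are ours), it can be computed with a standard dynamic-programming edit distance:

```python
import heapq

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def old_n(word: str, lexicon: list[str], n: int = 20) -> float:
    # Mean edit distance to the n orthographically closest neighbours.
    dists = [levenshtein(word, w) for w in lexicon if w != word]
    return sum(heapq.nsmallest(n, dists)) / n
```

With `n = 20` and a full lexicon, `old_n` yields OLD20; the tiny `n` used in a toy example simply makes the behaviour easy to check by hand.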
Within each subset, we created discrete categories based on frequency, familiarity and concreteness. Each subset of 480 words was split into 240 abstract and 240 concrete words. Each word group was subsequently divided into 4 conditions: a) 60 words with low familiarity and low frequency; b) 60 words with low familiarity and high frequency; c) 60 words with high familiarity and low frequency; and d) 60 words with high familiarity and high frequency. The cutoff points for each of the variables (concreteness, familiarity and frequency) were based on their approximate median values. Thus, abstract concepts were referred to by words with a rating of 4.5/7 or lower, and concrete concepts by words above this value. Likewise, the cutoff value for familiarity was set to 4.5/7. For word frequency, we used a Zipf-scaled value of 3.5 as the cutoff (Brysbaert et al., 2018). Although discrete groups were used for behavioural contrasts and to simplify the univariate analyses, we treated all factors of interest, including the sublexical properties, as continuous variables, which allowed us to perform RSA and hierarchical regression analyses. The two subsets of words were matched on all variables of interest.
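For reference, the Zipf scale (Brysbaert et al., 2018) expresses word frequency as log10 of occurrences per billion words (equivalently, log10 of occurrences per million plus 3), so the 3.5 cutoff corresponds to roughly 0.3 occurrences per million. A hedged sketch of the scale and the median-split binning described above (function and label names are ours):

```python
import math

def zipf(freq_per_million: float) -> float:
    # Zipf scale: log10 of frequency per billion words,
    # i.e. log10(frequency per million) + 3.
    return math.log10(freq_per_million) + 3

def bin_word(concreteness: float, familiarity: float,
             freq_per_million: float) -> tuple[str, str, str]:
    # Median-based cutoffs as reported: 4.5/7 for the subjective
    # ratings and Zipf = 3.5 for frequency.
    return ("concrete" if concreteness > 4.5 else "abstract",
            "high-fam" if familiarity > 4.5 else "low-fam",
            "high-freq" if zipf(freq_per_million) > 3.5 else "low-freq")
```

Crossing the three binary labels yields the 8 word bins (2 concreteness x 2 familiarity x 2 frequency levels).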
Fig. 2
All materials described above were tested in two lexical decision tasks, one inside the scanner and one outside it. The lexical decision task allowed for the measurement of participants’ reaction times (RTs) while they read and processed words, while avoiding excessive load from complex decision making. The two tasks differed in the proportion of words and pseudowords, as well as in the inter-trial intervals (ITIs), but were otherwise identical. To maintain a high number of observations while ensuring sufficient spacing between trials (over 6 s) for RSA, the lexical decision task inside the scanner contained only 11 % of non-word trials (60 items). The behavioural lexical decision task outside the scanner, with 50 % non-words and 50 % words, acted as a control for any potential task effects derived from the differential proportion of words versus pseudowords in the scanner task. This task corroborated that no differences in the general pattern of lexicality effects arose from the different proportion of non-words (see supplementary materials, Table S1).
2.3 Procedure
Firstly, high-resolution T1 images were acquired. Next, during the functional MRI BOLD sequence, participants performed the lexical decision task. Participants were instructed to carefully read a series of letter strings that might or might not form real words in Spanish, and to indicate whether each string was a real word by pressing the corresponding button. Button assignments were counterbalanced across participants, and the order of the presented items was unique for every two participants (i.e., a total of 15 predefined counterbalanced orders) to ensure that no order effects influenced pattern similarity (Mumford et al., 2014). The stimuli were presented at the centre of the screen for 1 s, followed by a variable inter-trial interval (ITI) of at least 6 s. This allowed the haemodynamic response to return to baseline, which enabled us to accurately model each trial separately. The task was divided into 6 identical functional runs of 11:40 min each. Within each run, 90 items were presented: 10 items from each of the discrete categorical bins described above, along with 10 non-words. The distribution of word stimuli across runs was carefully controlled to ensure there were no differences in word length or number of phonological neighbours among the six runs.
Immediately after the MRI session, participants performed the behavioural lexical decision task with the remaining 480 words and 480 non-words outside the scanner. The task was the same, except for the proportion of non-words, and ITI, which was shorter (1 s) in the behavioural lexical decision task (longer ITIs were only required by the fMRI task). The order of the stimuli was also counterbalanced across participants.
2.4 MRI data acquisition and preprocessing
Whole-brain images were acquired using a 3-T Siemens Magnetom Prisma-fit scanner with a 64-channel head coil at the Basque Center on Cognition, Brain and Language (BCBL). High-resolution T1-weighted anatomical images were obtained with the following acquisition parameters: TR = 2530 ms, TE = 2.36 ms, flip angle = 7°, field of view = 256 mm, 176 volumes, voxel size = 1 mm³. Then, 6 functional runs were acquired. Each fMRI run consisted of a multiband gradient echo-planar imaging sequence with the following parameters: TR = 1000 ms, TE = 35 ms, flip angle = 56°, field of view = 210 mm, 690 volumes per run, voxel size = 2.4 mm³, multiband acceleration factor = 5. The first 6 volumes of each run were removed to allow for T1-equilibration effects. The order of the trials in each run, as well as the variable ITI durations beyond the 6 s minimum separation between trials, were determined with an algorithm designed to maximise the efficiency of the recovery of the BOLD response: Optseq II (Dale, 1999). Although the relatively long minimum ITI of 6 s would have been sufficient to separate events, Optseq was employed to introduce additional temporal jitter and to ensure randomised trial presentation across runs. The Optseq command was executed with the following parameters: ntp=680, tr=1, psdwin=6 10 1, tperscan=0, tnullmin=6, tnullmax=10. As described above, each run included 9 event types (eight word categories and one non-word category), with a total of 90 trials per run. The 680 time points generated by Optseq were embedded within the 690 acquired volumes per run. Specifically, six initial volumes were added to allow for T1-equilibration effects, and four volumes were included at the end to accommodate haemodynamic responses to the final trials and to provide a buffer before run termination. The functional task timing was adjusted accordingly to align with these additions.
The total duration of the run indicated above (11:40 min) was the result of adding the scanning time (690 vol) and pre-scan preparations (system calibration, fat saturation).
All images were preprocessed using custom scripts based on AFNI (Cox, 1996). The T1-weighted image was skull-stripped and co-registered to the functional images by means of linear affine transformations. Although slice-timing correction was not compulsory, given the simultaneous acquisition of multiple slices with the multiband sequence and the short repetition time (i.e., 1000 ms), the remaining volumes were corrected for potential differences in the timing of slice acquisition and realigned to the minimum-outlier volume by means of rigid-body motion transformation. Univariate analyses included a whole-brain, three-way ANOVA focused on the interaction between Frequency, Familiarity and Concreteness (see Supplementary Materials). For this, all images were normalised to the MNI152 standard space (2009Lin) by means of non-linear transformations, at a resolution of 2 mm³. The resulting images were smoothed with a 4-mm full-width-at-half-maximum (FWHM) isotropic Gaussian kernel and, finally, scaled to a mean voxel signal of 100. Regarding multivariate analyses, both RSA searchlight and RSA based on regions of interest (ROIs) were carried out. All multivariate analyses were performed in individual-subject space, on unsmoothed, unscaled images obtained prior to normalisation to MNI space.
2.5 RSA searchlight
In order to explore the whole-brain representations of the different lexical features of interest in the present study, a searchlight-based RSA was conducted. For this, we defined 3 composite models based on the dissimilarity between each pair of words in their key features: a) semantic features (the combination of concreteness, familiarity, frequency (Zipf), and imageability); b) sublexical features (a combination of bigram frequency, biphone frequency, orthographic distance OLD20, number of phonological neighbours, and number of letters); and c) word vectors (Word2Vec; see Mikolov et al., 2013) recovered from several different Spanish sources (Almeida and Bilbao, 2018; Bilbao-Jayo and Almeida, 2018). Although we focused on the composite models, we also created a simple model for each of the variables forming the semantic and sublexical composite models. To ensure that the models were expressed in the most comparable measures possible, we used Euclidean distances for the unique variables constituting each composite model, Mahalanobis distances for the composite sublexical and semantic models (given their potential covariability) and cosine distances for the word vectors (given the potential influence of the size of the estimated vector). All measures were then normalised to a range between 0 (the items are identical) and 1 (the items are completely different).
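The construction of such dissimilarity models can be illustrated with SciPy's distance functions (a sketch with toy feature values; the min-max normalisation shown is one straightforward way to map distances onto the 0-1 range):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
n_words = 8
# Toy semantic features: concreteness, familiarity, Zipf frequency, imageability.
sem = rng.normal(size=(n_words, 4))
# Toy 300-dimensional word vectors standing in for Word2Vec embeddings.
vecs = rng.normal(size=(n_words, 300))

def minmax(d):
    # Normalise a condensed distance vector to 0 (identical) - 1 (most different).
    return (d - d.min()) / (d.max() - d.min())

rdm_single = minmax(pdist(sem[:, :1], metric="euclidean"))  # one simple variable
rdm_semantic = minmax(pdist(sem, metric="mahalanobis"))     # composite model
rdm_w2v = minmax(pdist(vecs, metric="cosine"))              # word-vector model

rdm_matrix = squareform(rdm_semantic)  # full n_words x n_words RDM
```

`pdist` returns the condensed upper triangle of the RDM; with `metric="mahalanobis"` it estimates the inverse covariance from the data, which accounts for the covariability between the composite model's variables.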
Fig. 3
After preprocessing the images, beta values for each trial (each word) were estimated using the GLM. A total of 36 parameters corresponded to polynomial terms that estimated changes in signal due to drift (6 terms per run), with an additional 6 regressors estimated for motion parameters (3 rotational, 3 translational). Each word was estimated as a separate regressor by convolving the onset of the stimulus with a canonical gamma HRF. This is a mathematical function that models the delayed rise in BOLD signal, followed by an undershoot, characteristic of brief neural activity. The resulting images containing the beta values were then masked to include brain voxels only.
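This trial-wise estimation can be sketched as follows (a generic gamma-shaped HRF and an ordinary least-squares fit on synthetic data; the HRF parameters are textbook values rather than AFNI's exact kernel, and the drift and motion regressors are omitted for brevity):

```python
import numpy as np

tr = 1.0                                  # TR in seconds, as in the acquisition
t = np.arange(0, 20, tr)
hrf = (t ** 5) * np.exp(-t)               # gamma-shaped HRF, peak around 5 s
hrf /= hrf.sum()

n_scans, onsets = 200, [10, 40, 80, 130]  # toy single-trial onsets (s)
X = np.zeros((n_scans, len(onsets)))
for j, onset in enumerate(onsets):
    stick = np.zeros(n_scans)
    stick[int(onset / tr)] = 1.0          # one stick regressor per trial...
    X[:, j] = np.convolve(stick, hrf)[:n_scans]  # ...convolved with the HRF
X = np.column_stack([X, np.ones(n_scans)])        # intercept column

# Simulate data with known trial amplitudes, then recover them by least squares.
true = np.array([2.0, 1.0, 0.5, 1.5, 100.0])
y = X @ true + np.random.default_rng(1).normal(0, 0.01, n_scans)
betas, *_ = np.linalg.lstsq(X, y, rcond=None)     # trial-wise beta estimates
```

Each recovered beta corresponds to one trial (one word), and it is these per-trial beta maps that feed the subsequent RSA.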
The searchlight itself was performed with custom scripts based on Python (
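Although the custom scripts themselves are not reproduced here, the core searchlight logic can be sketched as follows (synthetic beta maps; each sphere's cosine-distance RDM is rank-correlated with a toy model RDM):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_trials, shape, radius = 20, (10, 10, 10), 2
betas = rng.normal(size=(n_trials, *shape))        # one beta map per trial
model_rdm = pdist(rng.normal(size=(n_trials, 4)))  # toy model dissimilarities

# Precompute the sphere offsets once.
r = np.arange(-radius, radius + 1)
offs = [(i, j, k) for i in r for j in r for k in r
        if i * i + j * j + k * k <= radius ** 2]

result = np.zeros(shape)
for x in range(shape[0]):
    for y in range(shape[1]):
        for z in range(shape[2]):
            # Voxels of the sphere centred on (x, y, z), clipped at the edges.
            vox = [(x + i, y + j, z + k) for i, j, k in offs
                   if 0 <= x + i < shape[0] and 0 <= y + j < shape[1]
                   and 0 <= z + k < shape[2]]
            patterns = np.array([[betas[t][v] for v in vox]
                                 for t in range(n_trials)])
            brain_rdm = pdist(patterns, metric="cosine")
            result[x, y, z] = spearmanr(brain_rdm, model_rdm)[0]
```

The resulting map holds, at each voxel, the Spearman correlation between the local brain RDM and the model RDM; in the actual analysis these maps are then normalised and averaged across subjects.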
2.6 ROI-based RSA
In order to address the main objectives, we conducted RSA in brain areas that are key for the conceptual representations associated with the variables of interest, according to the previous literature (Binder et al., 2009; Hoffman et al., 2015; Wang et al., 2010). A total of 10 left-lateralised ROIs were anatomically defined using the HCP atlas available in AFNI (Glasser et al., 2016): IFG pars orbitalis, IFG pars triangularis, IFG pars opercularis, parahippocampal gyrus, inferior, middle and superior ATL (iATL, mATL, sATL), posterior STG (pSTG), anterior vOTC (fusiform FG4) and posterior vOTC (fusiform FG2). As in the searchlight RSA, the ROI similarity matrix was obtained for each ROI by computing the cosine similarity between each pair of trial-specific vectorised patterns. The brain similarity matrix was compared with each of the 3 composite models and submitted to a bootstrap analysis of 10^5 iterations, which yielded the 95 % CI employed. For every subject, each iteration was computed as the Spearman rank correlation between a random permutation of the reduced ROI RDM and a random permutation of a randomly selected model. While we were mainly interested in analysing the differences between the combinatorial models, in the ROI analyses we also included the simple semantic models in order to investigate the potential contribution of each of the variables of interest to the semantic model. Complementary analyses focused on the simple sublexical models and are available in the supplementary materials. The resulting correlation between each of the ROIs and each model was then compared to the 95th percentile of the bootstrap sample by means of Pearson and Filon's Z, adapted from the cocor R package (Diedenhofen and Musch, 2015). At the group level, a proportion of ɣ = 0.5 of subjects showing significant (after Pearson and Filon's Z) model similarity was used as the criterion for considering the similarities to reproduce robustly at the population level.
Because ROI analyses are less exploratory, this is a more restrictive criterion than that applied to the searchlight results. However, given this more stringent contrast in the ROI analyses, we also report those similarities for which a proportion of ɣ = 0.75 subjects were simply above the threshold (i.e., before Pearson and Filon’s Z) to help contextualise the effects.
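The permutation scheme can be sketched as follows (toy condensed RDMs, far fewer iterations than the 10^5 used in the study; on each iteration both the ROI RDM and a randomly selected model RDM are permuted, and the 95th percentile of the null correlations serves as the threshold):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_pairs, n_iter = 190, 2000              # 20 trials -> 190 condensed RDM entries
roi_rdm = rng.random(n_pairs)            # toy brain dissimilarities for one ROI
models = [rng.random(n_pairs) for _ in range(3)]  # toy composite model RDMs

# Build the null distribution from doubly permuted RDM pairs.
null = np.empty(n_iter)
for i in range(n_iter):
    null[i] = spearmanr(rng.permutation(roi_rdm),
                        rng.permutation(models[rng.integers(3)]))[0]
threshold = np.percentile(null, 95)      # 95th-percentile null correlation

observed = [spearmanr(roi_rdm, m)[0] for m in models]
significant = [r > threshold for r in observed]
```

In the study, an observed correlation above this percentile is additionally tested with Pearson and Filon's Z before counting a subject towards the group-level proportion ɣ.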
3 Results
3.1 Behavioural results
On average, the global accuracy for all items in the fMRI task was 95.031 % ± 4.229, and the average RT for all items was 0.677 s ± 0.084. In order to evaluate the potential relations between the three variables of interest and RTs, we performed Pearson's correlation tests over all correct responses for each participant. This participant-wise approach allowed us to capture individual variability in lexical effects and provided a more accurate estimate of the consistency and strength of these effects across the sample. Confidence intervals (CIs) for the r values are reported. Across the 30 subjects, the average correlation between Concreteness and RT was r = −0.052 ± 0.045, with a 95 % CI = [−0.069, −0.036] and an averaged p = 0.309 ± 0.255. The averaged correlation between Familiarity and RT was r = −0.284 ± 0.083, with a 95 % CI = [−0.315, −0.253] and an averaged p = 0.001 ± 0.005. Finally, the average correlation between Frequency and RT was r = −0.246 ± 0.059, with a 95 % CI = [−0.269, −0.224] and an averaged p = 0.001 ± 0.007.
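This participant-wise approach amounts to one correlation per subject, summarised across the sample; a sketch with simulated data (variable names and the simulated effect size are ours, not the study's values):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_subjects, n_trials = 30, 480
rs = []
for _ in range(n_subjects):
    familiarity = rng.normal(size=n_trials)
    # Simulate RTs that decrease slightly with familiarity.
    rt = 0.7 - 0.02 * familiarity + rng.normal(0, 0.08, n_trials)
    rs.append(pearsonr(familiarity, rt)[0])   # one r per subject

rs = np.array(rs)
mean_r, sd_r = rs.mean(), rs.std(ddof=1)
sem = sd_r / np.sqrt(n_subjects)
ci95 = (mean_r - 1.96 * sem, mean_r + 1.96 * sem)  # CI for the mean r
```

Summarising the per-subject r values (rather than pooling all trials) preserves individual variability in the strength of the lexical effects.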
We also assessed the differences in the lexical effect between all 8 categorical bins described above, by calculating T values and effect sizes for the contrast
non-words' RT minus each bin's RT. Non-words tend to show slower RTs than words overall. A significant difference between non-words and any of the 8 bins would mean that words in that bin are easier to access than non-words (i.e., a lexical effect). In this sense, the higher the effect size associated with the contrast, the greater the lexical effect. In turn, if the contrast does not show a significant difference, or if the associated effect size is considerably low, the words in the bin can be considered as difficult to access as non-words (i.e., no lexical effect). Below, we report averaged false discovery rate (FDR)-corrected p values (i.e., q-values) and averaged Cohen's d values. The results demonstrated that all bins showed a lexical effect. However, the bins combining highly familiar and highly frequent words showed the greatest effect sizes, with slight differences between abstract and concrete words. In contrast, bins containing lower-familiarity and lower-frequency words showed moderate to high effect sizes, indicating a less pronounced lexical effect for words in these bins (see Table 1).
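The bin-wise contrasts and correction can be sketched as follows (simulated RTs; Cohen's d uses the pooled standard deviation, and the Benjamini-Hochberg step-up procedure is implemented by hand to yield q-values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nonword_rt = rng.normal(0.75, 0.08, 60)                       # toy non-word RTs
bins = {f"bin{i}": rng.normal(0.65 + 0.01 * i, 0.08, 60)      # toy word bins
        for i in range(8)}

pvals, dvals = [], []
for rt in bins.values():
    t, p = stats.ttest_ind(nonword_rt, rt)                    # non-words vs. bin
    pooled = np.sqrt((nonword_rt.var(ddof=1) + rt.var(ddof=1)) / 2)
    dvals.append((nonword_rt.mean() - rt.mean()) / pooled)    # Cohen's d
    pvals.append(p)

def fdr_bh(p):
    # Benjamini-Hochberg step-up q-values.
    p = np.asarray(p)
    order = np.argsort(p)
    ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
    q = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty_like(q)
    out[order] = np.minimum(q, 1.0)
    return out

qvals = fdr_bh(pvals)
```

A larger d for a bin indicates a stronger lexical effect; q-values below .05 mark bins whose contrast survives the FDR correction.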
3.2 RSA results
3.2.1 Searchlight results
For each of the three combinatorial models (sublexical, semantic and Word2Vec), the normalised and then averaged (across all subjects) correlations resulting from the searchlight are shown in Fig. 4.
The sublexical model showed positive correlations above the fixed percentile in three main clusters. Cluster 1 comprised the bilateral primary visual cortex (V1), secondary visual cortex (V2), the different areas of V3, V4, and the lateral occipital cortex (LO/hOc4lp, see Malikovic et al., 2016); Cluster 2 was formed by the left primary and secondary motor cortices, left area 55b and the left primary sensory area; and Cluster 3 was mainly composed of the left frontal eye fields. Cluster 1 was significantly above the permutations threshold for ɣ = 0.86 of the participants, and Cluster 2 exceeded the threshold for ɣ = 0.54 of the participants, while Cluster 3 did not show significant sublexical similarities at the group level (only ɣ = 0.43 of the sample).
Regarding the semantic model, the searchlight yielded significant positive correlations in five main clusters: Cluster 1 was formed by a set of left-lateralised areas within the IFG, MFG and ATL, including pars orbitalis, pars triangularis, pars opercularis, the frontal eye fields, and the anterior STG; Cluster 2 included the right IFG pars orbitalis and pars triangularis, as well as the right ventral insula; Cluster 3 mainly encompassed the posterior STG and MTG; Cluster 4 included the bilateral SMA and superior medial gyrus (SMG); and Cluster 5 included the right IFG pars triangularis. The permutations test revealed that only clusters 1 and 3 yielded significant semantic similarities in >50 % of the sample (ɣ = 0.68 and ɣ = 0.57 of the participants, respectively), while the percentage of participants showing significant semantic similarities in clusters 2, 4 and 5 did not exceed 50 % (Cluster 2: ɣ = 0.46; Cluster 4: ɣ = 0.39; Cluster 5: ɣ = 0.46).
Finally, the Word2Vec model showed significant correlations in four clusters: Cluster 1 was formed by areas FG3 and FG4, the middle and anterior inferior temporal gyrus, including the left parahippocampal gyrus, and left hippocampus; Cluster 2 encompassed part of the right parahippocampal gyrus, the right IFG pars orbitalis and right insula; Cluster 3 was mainly formed by the left IFG pars orbitalis; and Cluster 4 included the right IFG pars triangularis, extending to the right MFG. All 4 clusters showed supra-threshold similarities in >50 % of the sample (Cluster 1: ɣ = 0.57; Cluster 2: ɣ = 0.57; Cluster 3: ɣ = 0.71; Cluster 4: ɣ = 0.64).
3.2.2 ROI-based results
Because the semantic model was the only composite model to show significant similarities in the ROIs analysed, we present here, along with the similarities of the composite models, the similarities with the simple variables forming the composite semantic model.
Fig. 5
The IFG pars orbitalis ROI showed above-percentile correlations with the Semantic model, reproducible in ɣ = 0.82 of the sample, although these similarities remained statistically significantly above the threshold in only ɣ = 0.36 of the participants (after Pearson and Filon's Z). When looking at the simple models, all of them showed significant above-percentile correlations (Frequency: ɣ = 0.75; Familiarity: ɣ = 0.61; Concreteness: ɣ = 0.57, after Pearson and Filon's Z). The IFG pars triangularis ROI also displayed above-percentile correlations with the Semantic model, reproducible in ɣ = 0.75 of the sample, but statistically significant in only ɣ = 0.46. Regarding the simple models, we found significant above-percentile correlations with the Frequency (ɣ = 0.93) and Concreteness (ɣ = 0.68) models, but only non-significant above-percentile correlations with the Familiarity model in ɣ = 0.83 of the sample (ɣ = 0.46 after Pearson and Filon's Z). In the posterior IFG pars opercularis, no combinatorial model showed reproducible above-percentile correlations. The Frequency model was the only one to show significant above-percentile correlations, reproducible in ɣ = 0.79 of the sample. The Familiarity and Concreteness models showed above-percentile correlations in ɣ = 0.79 and ɣ = 0.86, respectively, but failed to reach reproducibility after the statistical contrasts (Familiarity: ɣ = 0.39; Concreteness: ɣ = 0.39, after Pearson and Filon's Z). Finally, the pSTG ROI showed no significant correlations with any of the combinatorial models. However, we found significant above-percentile correlations with the Frequency model in this area, reproducible in ɣ = 0.50 of the sample (after Pearson and Filon's Z).
Fig. 6
Fig. 7
4 Discussion
Our goal was to investigate the neural representational space of linguistic properties that are linked to either sublexical or semantic aspects of lexical access. Additionally, we aimed to disentangle the interrelations between these linguistic properties and NLP models. We found that semantic representations of diverse nature (psycholinguistic and NLP) converge in the anterior IFG and anterior vOTC, while sublexical representations involve areas posterior to the IFG (i.e., SMA) and vOTC (i.e., V3 and V4). Additionally, our NLP model displayed specific representations in semantic hubs like the inferior ATL, parahippocampal gyrus, or the IFG pars orbitalis. Below, we discuss these main findings and their implications. We first focus on the anterior-posterior specialisation within the IFG/SMA and vOTC/V3-V4. Then, we focus the discussion on the differences between psycholinguistic and NLP models, and what these differences tell us about the integration of sublexical and lexico-semantic information.
4.1 Anterior-to-posterior specialisation in the left IFG and left vOTC
Our results show a functional specialisation that extends from the anterior IFG to motor planning areas. Previous work has shown that different subregions of the inferior frontal gyrus (IFG) are functionally specialised along its anterior-to-posterior axis (Badre and Wagner, 2007; Hagoort, 2013). Specifically, as shown in the searchlight analyses, the anterior and middle portions of the IFG (pars orbitalis and pars triangularis) are consistently implicated in semantic processing (Chee et al., 2002; Junker et al., 2020; Sánchez et al., 2023), while the posterior IFG (pars opercularis) is more frequently associated with phonological processing (Carreiras et al., 2009; Fiebach et al., 2002). These anterior-to-posterior specificities support a functional dissociation within the IFG. In our study, a psycholinguistic semantic model showed significant similarities across the whole left IFG, whereas a naturalistic language model, representing fine-grained semantic relations, showed above-threshold similarities in the bilateral pars orbitalis only. However, while the posterior IFG (pars opercularis) displayed above-threshold correlations with the semantic model, we did not observe significant correlations with a sublexical model in this area. In turn, the sublexical model showed significant correlations in more posterior areas, including the left SMA (BA 6), potentially highlighting its role in motor planning and phonological search (Carreiras et al., 2006, 2009). One possibility is that the anterior-to-posterior functional specialisation extends beyond the IFG. Under this interpretation, the anterior IFG is strongly associated with semantic lexical access, as illustrated by its high similarity with both semantic and NLP models. Moving to the posterior IFG, we still observe a less pronounced involvement in semantic processing.
Interestingly, the areas immediately posterior/dorsal to the IFG seem to be almost exclusively involved in motor planning associated with phonological processing (Hagoort, 2005, 2013).
In subsequent ROI-based analyses, the Frequency model was the only model to show reproducible similarities in pars opercularis. In pars triangularis and pars orbitalis, however, we found consistent similarities with Frequency and Concreteness, while only in pars orbitalis did Familiarity also show significant representations. While the effects of frequency have been attributed to both phonological access (Carreiras et al., 2009; Fiebach et al., 2002) and semantic access (Chee et al., 2002, 2003) during word processing, the cognitive effects of familiarity have been almost exclusively linked to the ease of accessing semantic knowledge (Neveu and Kaushanskaya, 2023; Shinozuka et al., 2021). Given the role of the ventral reading network in the active retrieval of lexico-semantic information (word meaning), it seems that the added effects of frequency, familiarity and concreteness are mainly linked to semantic access, without contesting the idea that word Frequency is also associated with sublexical processing. In light of this hybrid nature of word Frequency (presumably conveying both semantic processing and phonological search), our findings are compatible with the idea that pars opercularis is recruited for both sublexical and semantic processes, while demonstrating that semantic representations specifically involve the anterior IFG. Nevertheless, the absence of specific sublexical representations in the posterior IFG makes it necessary to further investigate its potential role as a convergence area conveying both sublexical and lexico-semantic processing.
A slightly different representation pattern was found in the areas of the vOTC and adjacent posterior visual regions, where the anterior (but not posterior) vOTC displays partial sensitivity to semantically related properties, while sublexical representations recruit primary and secondary visual areas. The anterior vOTC showed significant and reproducible correlations with word Frequency, a variable related to both sublexical and semantic processing, while the posterior vOTC did not. This is compatible with previous evidence indicating a dissociation between the anterior and posterior vOTC (Brem et al., 2006; Lerma-Usabiaga et al., 2018; Price and Devlin, 2011; Vinckier et al., 2007; White et al., 2019; for a review, see Caffarra et al., 2021). Additionally, previous studies have shown that the anterior vOTC is sensitive to semantically related variables such as word frequency (Graves et al., 2007; Kronbichler et al., 2004; Schuster et al., 2016) and to task demands involving semantic processing (Sánchez et al., 2023), supporting its partial involvement in early lexico-semantic processing. However, in comparison with our findings in the anterior IFG, the anterior vOTC is less clearly associated with pure (i.e., beyond word Frequency) semantic representations. Such semantic representations are more clearly observed in areas of the same network that are recruited later in that pathway during lexical access (e.g., parahippocampal gyrus, inferior ATL, anterior IFG). In turn, the sublexical model showed high correlations with primary and secondary visual areas, including V3 and V4. As in the IFG, a possible interpretation is that this division of labour extends posteriorly, with more posterior visual areas being sensitive to lower-level, orthographic features of visual linguistic input. Areas V3 and V4 are known for their role in early visual processing of low-level visual features (Furlan and Smith, 2016; Pasupathy et al., 2020).
Area V3 has been associated with the perception of motion related to speech recognition (Jeschke et al., 2023), and area V4 has often been considered the earliest step in the categorisation of visual input (Okazawa et al., 2016; Pasupathy et al., 2020). Additionally, adjacent areas have been linked to spatial and action-related processing, given their co-activation with parietal cortex and, more precisely, with the SMA (Malikovic et al., 2016).
4.2 Linguistic properties vs word vectors in semantic hubs
It is typically assumed that NLP models represent complex semantic relations between words (Abnar et al., 2018; Grave et al., 2019; Liuzzi et al., 2023). In the present study, this was corroborated by a significant correlation between the semantic psycholinguistic model and a word vector model (Fig. 3.3). At the neural level, we found significant correlations in brain areas that are highly involved in abstract processing and the retrieval of conceptual information, sometimes referred to as semantic hubs (Patterson et al., 2007). Our Word2Vec and Semantic models recruited the left anterior IFG, the anterior vOTC, and the parahippocampal gyrus, extending to the inferior ATL, with some differences between the two models. The correlations with the Semantic model were more extensive, especially in the IFG, and included additional areas like the posterior STG, left SMA, and the superior ATL, but were not observed in the middle or inferior ATL. In turn, the Word2Vec model showed less widespread similarities in the left anterior IFG, parahippocampal gyrus and inferior ATL. This could indicate that the Word2Vec model may be more sensitive to fine-grained semantically related processes (see Carota et al., 2017, 2021) than the Semantic model built from psycholinguistic variables. We have already referred to the hybrid nature of word Frequency, a variable included in the Semantic model. As such, the pSTG, an area associated with both phonological and semantic processing (Friederici, 2012; Lau et al., 2008; MacGregor et al., 2012), as well as the superior ATL, showed high similarities with the Semantic model that were mainly accounted for by the representation of word Frequency in these areas. The ATL displays a graded functional specialisation, and thus diverse information is represented along this area (Ralph et al., 2017).
In this sense, the superior ATL is connected with the auditory cortex, and hence it is not surprising that phonological representations also recruit this area (Zhang et al., 2024). On the other hand, correlations with the left inferior ATL were only found with the Word2Vec model, supporting the idea that this area conveys abstracted multimodal information (Ralph et al., 2017). As in the posterior IFG, these interpretations should be supported by further research demonstrating sublexical representations in the pSTG and superior ATL that confirm their converging nature.
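As a schematic illustration of the model comparison underlying this kind of RSA (not the study's actual pipeline; word features and vectors here are random placeholders), a representational dissimilarity matrix (RDM) can be built from each model and the two RDMs rank-correlated over their condensed upper triangles:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_words = 40

# Placeholder psycholinguistic features per word
# (columns could stand for frequency, familiarity, concreteness), z-scored.
psy_features = rng.standard_normal((n_words, 3))

# Placeholder word embeddings (e.g., 300-dimensional Word2Vec vectors).
word_vectors = rng.standard_normal((n_words, 300))

# Model RDMs: pairwise dissimilarity between all word pairs.
# pdist returns the condensed upper triangle directly.
psy_rdm = pdist(psy_features, metric="euclidean")
vec_rdm = pdist(word_vectors, metric="cosine")

# RSA model comparison: Spearman rank correlation between the two RDMs.
rho, p_value = spearmanr(psy_rdm, vec_rdm)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3f}")
```

The same rank-correlation step applies when one of the RDMs is derived from multivoxel brain patterns (e.g., within a searchlight sphere or ROI) rather than from a model; rank correlation is typically preferred because it makes no assumption about a linear relationship between the two dissimilarity scales.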
In sum, we show how combining psycholinguistic and NLP models can contribute to further detailing the functional specialisation of the ventral and dorsal language networks. Posterior/dorsal visual and motor areas participate in sublexical processing, while anterior/ventral regions subserve semantic access. Some areas, like the parahippocampal gyrus, anterior IFG or inferior ATL, are more strongly linked to fine-grained semantic representations. Other areas, like the anterior vOTC, pSTG or posterior IFG, display clear representations of psycholinguistic variables that are linked to both sublexical and lexico-semantic processing, like word frequency. Such representations are compatible with the notion that these areas serve as convergence zones for both types of processing, but further research is required to clarify this point.
Some limitations and future directions should be mentioned. Firstly, our fMRI protocol did not include a specific correction for signal loss in orbital areas. Although brain similarities, which measure how comparable trial-specific brain patterns are to one another, are less sensitive to signal intensity per se (Jimura and Poldrack, 2012) than univariate intensity-based measures (e.g., Jackson et al., 2015), this might have influenced the extent of some effects found around the ATL. Secondly, while the lexical decision task employed was designed and delivered so as not to bias participants towards semantic or sublexical processing, it does not require covert or overt articulation, which could have attenuated sublexical representations in some of the areas that may convey both sublexical and semantic representations, such as the IFG pars opercularis, pSTG or anterior vOTC. Future studies should include a direct comparison of sublexical and lexico-semantic tasks, using optimised fMRI protocols that allow maintaining enough variability in the linguistic stimuli analysed. In addition, we acknowledge that our fMRI task was not a canonical lexical decision task, due to the reduced proportion of nonwords included (11 %). This modification was implemented to maximise the number of word trials and optimise BOLD signal estimation for the RSA analyses. Despite this deviation, we found that the pattern of lexicality effects across words, particularly those varying in familiarity, frequency, and concreteness, was highly consistent with effects observed in a standard lexical decision task featuring a 50 % nonword ratio (see Supplementary Table S1). Since our core analyses focused on the representational similarity across individual word stimuli, rather than a direct comparison between words and nonwords, we consider it unlikely that either task difficulty or the low nonword ratio biased our results.
Nevertheless, from a purely behavioural standpoint, direct comparisons between our task and canonical lexical decision tasks should be made with caution. Finally, we focused on single-word recognition to simplify the analysis of psycholinguistic properties and associated word vectors. However, this focus made it difficult to include an in-depth analysis through large language models, which might offer more nuanced semantic relations than word vectors based on word co-occurrences (Digutsch and Kosinski, 2023). Additionally, the analysis of some semantic aspects known to occur at the sentence level was not possible in our study. Future investigations should extend the combination of psycholinguistic factors and NLP models to sentence-level processing and spatiotemporal dynamics. Such investigations have proven especially promising in dissecting the implication of different ventral areas at different stages during sentence reading (Woolnough et al., 2021, 2022), and are key to understanding the complex functional specialisation within these networks.
Code availability
All scripts used are available at
Data availability
fMRI and behavioural data are available at
CRediT authorship contribution statement
Abraham Sánchez: Writing – review & editing, Writing – original draft, Visualization, Project administration, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Pedro M. Paz-Alonso: Writing – review & editing, Supervision, Project administration, Methodology, Funding acquisition, Conceptualization. Manuel Carreiras: Writing – review & editing, Supervision, Methodology, Funding acquisition, Conceptualization.
Declaration of competing interest
The authors declare no competing financial interests.
Acknowledgements
A.S. was supported by a predoctoral grant from the
Supplementary materials
Supplementary material associated with this article can be found, in the online version, at
Appendix Supplementary materials
Table 1
| Concreteness | Familiarity | Frequency | T | q values | Cohen’s d |
| Abstract | Low | Low | 3.978 | 0.026 | 0.741 |
| Abstract | Low | High | 7.249 | < 0.001 | 1.353 |
| Abstract | High | Low | 8.004 | < 0.001 | 1.485 |
| Abstract | High | High | 10.465 | < 0.001 | 1.946 |
| Concrete | Low | Low | 5.240 | 0.003 | 0.979 |
| Concrete | Low | High | 7.426 | < 0.001 | 1.382 |
| Concrete | High | Low | 8.347 | 0.001 | 1.549 |
| Concrete | High | High | 9.680 | < 0.001 | 1.805 |
Table 2
| Area | N of voxels | Peak r value | X | Y | Z |
| Primary Visual Cortex (V1) | 10,224 | .209 | −15 | −94 | −1 |
| Area V3v | 5,453 | .062 | −27 | −91 | −10 |
| Secondary Visual Cortex (V2) | 4,090 | .044 | −1 | −83 | 15 |
| Area hOc3d (V3d) | 3,408 | .128 | −16 | −98 | 10 |
| Area hOc4v (V4v) | 2,726 | .074 | −22 | −86 | −14 |
| Area hOc4d (V3A) | 1,022 | .063 | −17 | −89 | 16 |
| Left Premotor/SMA | 657 | .056 | −57 | −5 | 42 |
| Left Primary Motor Cortex | 560 | .052 | −57 | −4 | 31 |
| Left Area 55b | 348 | .055 | −52 | −2 | 51 |
| Area hOc4lp | 341 | .038 | −30 | −87 | 13 |
| Left Frontal Eye Fields | 312 | .05 | −34 | −4 | 49 |
| Left Primary Sensory Cortex | 232 | .049 | −57 | −11 | 39 |
Table 3
| Area | N of voxels | Peak r value | X | Y | Z |
| Left IFG pars Triangularis | 12,239 | .073 | −48 | 32 | 17 |
| Left IFG pars Orbitalis | 4,080 | .068 | −39 | 32 | −16 |
| Right IFG pars Orbitalis | 2,352 | .066 | 34 | 30 | −7 |
| Left posterior STG | 2,154 | .063 | −59 | −42 | 5 |
| Left Frontal Eye Fields (BA 8) | 1,785 | .062 | −43 | 5 | 41 |
| Left IFG pars Opercularis | 1,530 | .066 | −54 | 16 | 17 |
| Left Anterior STG | 1,275 | .069 | −50 | 17 | −16 |
| Right Anterior Ventral Insula | 1,266 | .063 | 34 | 30 | −5 |
| Right IFG pars Triangularis | 1,143 | .064 | 46 | 27 | 28 |
| Right Superior Medial Gyrus | 1,111 | .06 | 8 | 35 | 54 |
| Left SMA | 824 | .057 | −8 | 18 | 59 |
| Left Superior Medial Gyrus | 753 | .058 | 2 | 29 | 53 |
| Right SMA | 717 | .07 | 6 | 23 | 57 |
| Left posterior MTG | 538 | .062 | −66 | −49 | 4 |
| Left Anterior MTG | 510 | .056 | −61 | −7 | −12 |
Table 4
| Area | N of voxels | Peak r value | X | Y | Z |
| Left IFG pars Orbitalis | 3,367 | .063 | −37 | 32 | −13 |
| Right IFG pars Orbitalis | 3,287 | .057 | 25 | 27 | −14 |
| Left ParaHippocampal Area | 2,658 | .063 | −33 | −34 | −19 |
| Area FG3 | 1,739 | .061 | −31 | −39 | −18 |
| Area FG4 | 1,391 | .058 | −41 | −39 | −22 |
| Right IFG pars Triangularis | 1,247 | .055 | 39 | 26 | 26 |
| Right ParaHippocampal Area | 841 | .054 | 27 | −3 | −36 |
| Left Hippocampus | 580 | .049 | −20 | −37 | −4 |
© 2025 The Authors