Abstract

The neural mechanisms of auditory and visual processing are not only a core research focus in cognitive neuroscience but also hold critical importance for the development of brain–computer interfaces, neurological disease diagnosis, and human–computer interaction technologies. However, EEG-based studies on classifying auditory and visual brain activities largely overlook the in-depth utilization of spatial distribution patterns and frequency-specific characteristics inherent in such activities. This paper proposes an analytical framework that constructs symmetrical spatio-temporal–frequency feature association vectors to represent brain activities by computing EEG microstates across multiple frequency bands and brain functional connectivity networks. We then construct an Adaptive Tensor Fusion Network (ATFN) that leverages these feature association vectors to recognize brain activities related to auditory, visual, and audiovisual processing. The ATFN includes a feature fusion and selection module based on differential feature enhancement, a feature encoding module enhanced with attention mechanisms, and a classifier based on a multilayer perceptron to achieve the efficient recognition of audiovisual brain activities. The results show that the classification accuracy for auditory, visual, and audiovisual brain activity reaches 96.97% using the ATFN, demonstrating that the proposed symmetric spatio-temporal–frequency feature association vectors effectively characterize visual, auditory, and audiovisual brain activities. The symmetrical spatio-temporal–frequency feature association vectors establish a computable mapping that captures the intrinsic correlations among temporal, spatial, and frequency features, offering a more interpretable method to represent brain activities. The proposed ATFN provides an effective recognition framework for brain activity, with potential applications in brain–computer interfaces and neurological disease diagnosis.

1. Introduction

The human perception and cognition of the external world are essentially processes through which the brain integrates and processes multimodal sensory information (such as visual and auditory inputs). Because vision and hearing are the two primary sensory channels for acquiring information, the mechanisms underlying the cerebral processing of audiovisual information are not only a core research direction in cognitive neuroscience but also hold critical significance for the development of brain–computer interfaces (BCIs), the diagnosis of neurological disorders (such as assessing audiovisual integration impairments in individuals with autism or schizophrenia), and the advancement of human–computer interaction technologies [1,2,3]. BCI systems based on the recognition of audiovisual brain activities can assist individuals with motor impairments in controlling external devices (e.g., wheelchairs, prosthetics) through “mental commands”, where the accurate classification of audiovisual brain activities is a prerequisite for achieving efficient interaction in such systems [4,5]. Therefore, research on the classification of audiovisual brain activities not only deepens the understanding of the brain’s multimodal information integration mechanisms but also promotes the practical application of related technologies in fields such as healthcare and rehabilitation, carrying significant theoretical value and practical importance.

To achieve the precise recognition of audiovisual brain activities, previous studies have extensively explored classification methods based on EEG signals. Studies on various audiovisual stimulus types and cognitive paradigms have achieved high accuracy rates. For instance, in research classifying visual and auditory stress responses, an automated evoked potential differentiation system achieved a binary stress classification accuracy of 97.14% for visual stimuli and 94.51% for auditory stimuli. In a more complex four-level stress classification task, accuracies reached 89.59% for visual and 82.63% for auditory stimuli [6]. Another study employed the Sparse Optimal Scoring (SOS) method to analyze EEG data, achieving efficient classification in visual language comprehension tasks with an average out-of-sample accuracy as high as 98.89% [7], fully demonstrating the potential of EEG signals in classifying audiovisual brain activities. With advancements in machine learning and deep learning technologies, models for audiovisual EEG classification have been continuously optimized. Convolutional Neural Networks (CNNs), leveraging their ability to extract local temporal features, became a mainstream tool for early EEG classification. Recurrent neural networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks, further improved classification performance under dynamic audiovisual stimuli by capturing long-term dependencies in the signals [8]. In recent years, the introduction of hybrid models (e.g., CNN-LSTM, Graph Convolutional Networks (GCNs)) and Transformer models has effectively enhanced classification robustness in complex audiovisual tasks by integrating spatio-temporal features and global attention mechanisms [9,10]. Furthermore, innovations in multimodal guiding strategies have provided new pathways for improving classification performance. One study proposed a dynamic visual–auditory cooperative guidance method. The experimental results showed that this method significantly enhanced cortical activity in fine motor imagery-related audiovisual tasks, leading to broader brain region activation and stronger Event-Related Synchronization (ERS) and Event-Related Desynchronization (ERD) effects in key areas such as the frontal and parietal lobes, providing clearer neural response markers for classification [11]. However, current EEG-based research on audiovisual brain activity classification predominantly focuses on the temporal dimension or single statistical features of EEG signals, often neglecting the in-depth utilization of the spatial distribution characteristics and frequency-specific features inherent in audiovisual brain activities. From a neurophysiological perspective, audiovisual information processing involves the collaboration of multiple brain regions. Visual stimuli primarily activate the primary visual cortex in the occipital lobe, while auditory stimuli preferentially activate the primary auditory cortex in the temporal lobe. The integration of both requires the participation of association areas such as the prefrontal and parietal lobes [12]. This spatial difference in brain region activation is a key distinguishing feature of audiovisual brain activities. Yet, many methods treat EEG electrode signals as independent temporal sequences, failing to fully exploit the spatial correlation information between electrodes. Simultaneously, EEG signals of different frequencies correspond to distinct brain functional states. 
For example, theta waves (4–7 Hz) are associated with cognitive control, alpha waves (8–13 Hz) with visual cortex inhibition, and beta waves (14–30 Hz) with motor preparation [13]. Audiovisual tasks elicit power changes in specific frequency bands. However, many studies often use full-band signals or analyze a single frequency band, failing to accurately capture the frequency-specific responses under different audiovisual tasks. This insufficient exploitation of spatial and frequency features makes it difficult for existing models to fully characterize the differences in neural mechanisms underlying audiovisual brain activities, thereby limiting an in-depth analysis of the brain’s multimodal information integration mechanisms.

EEG microstate and brain network analysis methods offer effective pathways to address the aforementioned limitations. EEG microstates, as sub-second stable scalp field topography patterns, can intuitively reflect the global spatial distribution characteristics of EEG activity. Parameters such as the topology, duration, and transition probabilities of microstates can precisely characterize the spatial activation differences in the cerebral cortex under audiovisual tasks [14]. On the other hand, brain network analysis, particularly that of functional brain networks, quantifies the spatial synergistic relationships between different brain regions (electrodes) by calculating functional connectivity strength (such as phase synchronization, correlation). It can also construct frequency-specific brain networks by combining different frequency bands, thereby simultaneously characterizing the spatial correlation features and frequency-specific characteristics of audiovisual brain activities [15]. However, the temporal, spatial, and frequency features of brain information processing are not independent; they are closely interrelated. Neuronal synchronous oscillations trigger the activation of distributed brain functional networks (frequency-space). These synchronous oscillations continuously change with cognitive task processing (time–frequency), which in turn affects the activation of brain functional networks and causes dynamic changes in network structure (space-time). The temporal, spatial, and frequency characteristics of audiovisual integration influence and constrain each other. Their intrinsic interrelationships form the foundation and core of the brain’s processing of audiovisual information. Therefore, establishing a mapping relationship that reflects the spatio-temporal–frequency characteristics of audiovisual brain activities and exploring the intrinsic one-to-one correspondence among temporal, spatial, and frequency features are essential means to fundamentally reveal the brain mechanisms of audiovisual information processing and to classify audiovisual brain activities. EEG microstates and brain networks are not independent analytical dimensions. Essentially, they represent manifestations of the brain’s information processing at different observational scales and share a close intrinsic relationship: on one hand, changes in the topological patterns of EEG microstates are macroscopic reflections of the dynamic adjustments in the activation and connection strengths of nodes (brain regions) within the functional brain network; on the other hand, the dynamic evolution of brain networks also depends on the temporal characteristics of neuronal synchronous oscillations, and the frequency of these oscillations further influences the transition rate and duration of microstates [16]. This relationship fundamentally stems from the inseparability of the spatio-temporal–frequency characteristics in brain information processing. The spatio-temporal–frequency features of audiovisual information processing form an organic whole that mutually influences and constrains each other. Their intrinsic interrelationship constitutes the core mechanism by which the brain accomplishes multimodal information processing.

In this study, we utilize the microstates of the brain during the processing of visual, auditory, and audiovisual information as time windows to establish dynamic brain networks symmetrical to microstates, representing brain activity by constructing spatio-temporal–frequency feature vectors. Using the feature vectors, we are able to accurately identify the brain activities involved in visual, auditory, and audiovisual information processing on our proposed Adaptive Tensor Fusion Network (ATFN). The main contributions of this study can be summarized as follows:

(1). We proposed a method for representing the brain activity involved in auditory, visual, and audiovisual information processing based on spatio-temporal–frequency feature vectors. We computed EEG microstates across multiple frequency bands and constructed brain functional connectivity networks by dividing the microstate sequences into time windows. By calculating microstate parameters, the topological properties of the corresponding brain networks, and their inter-correlations, we characterized the brain activities during audiovisual information processing as symmetric spatio-temporal–frequency feature association vectors.

(2). We constructed an Adaptive Tensor Fusion Network that utilizes symmetric spatio-temporal–frequency association vectors to recognize the brain activities involved in audiovisual information processing. The feature fusion and selection module based on differential feature enhancement performs standardization, differential enhancement, and dynamic weight assessment on multi-band microstate features to automatically screen key features. The attention-enhanced feature encoding module combines 1D convolution, a bidirectional GRU, and multi-head self-attention mechanisms to enhance the discriminative representation of the symmetric spatio-temporal–frequency features. Finally, a classifier based on a multilayer perceptron (MLP) performs pooling and non-linear mapping on the fused features, and classification decisions are optimized using a focal loss function to achieve the efficient recognition of audiovisual brain activities.

2. Materials and Methods

2.1. Subjects

Healthy participants aged 16–30 were recruited for this study through Changchun University of Science and Technology (CUST), with approval obtained from the CUST Ethics Committee. The inclusion criteria were as follows: normal or corrected-to-normal vision and hearing (e.g., using glasses or hearing aids); a minimum of secondary education; right-handedness for most daily tasks; and the availability to meet the time requirements and cooperate with the researchers’ instructions. Participants were excluded if they were under guardianship, lived in a care institution, used psychotropic medication, had a history of mental illness, had neurological disorders affecting brain function, or had previously taken part in similar experiments. This study included 23 healthy participants (n = 23; 5 male, 17 female; age 16–26, mean 22 years; education 14–18 years, mean 15.57 ± 1.56 years). All were right-handed, had normal/corrected vision and hearing, and completed the experiment without withdrawal. Eligible participants visited the Jilin Provincial International Joint Research Center for Brain Information and Intelligent Science, where they were briefed on this study and gave written consent.

2.2. Stimuli and Experiment

The experiment was conducted in a dimly lit, sound-attenuated, and electromagnetically shielded room. Participants were seated in an adjustable chair with a chin rest to minimize head movement. Following task instructions from the experimenter, they performed practice trials. The formal experiment began once a participant’s practice accuracy exceeded 80%, indicating their understanding of the task. During the formal session, each participant completed eight blocks, with each block containing 20 auditory (A), 20 visual (V), and 20 audiovisual (AV) stimuli. Standard stimuli depicted living objects (images and/or sounds), while target stimuli featured non-living objects. Target stimuli appeared at a fixed rate of 20% across all modalities (A, V, AV). As shown in Figure 1, the presentation order of stimulus types (2 categories [standard, target] × 3 modalities [A, V, AV]) was pseudo-randomized and balanced in probability, as was their left/right spatial placement. Participants were instructed to limit blinking and movement to reduce motion artifacts. They responded by pressing the left button for living objects and the right button for non-living objects, emphasizing both speed and accuracy. A 5 min rest period was provided between blocks.

2.3. EEG Data Acquisition and Preprocessing

EEG signals were recorded using a SynAmps 2 system with a 64-channel electrode cap arranged according to the international 10–20 system. The AFz electrode functioned as the ground, while the reference was placed on the left mastoid. Horizontal eye movements were monitored via electrodes positioned at the outer canthi of both eyes, with vertical eye movements and blinks captured by electrodes placed 1 cm above and below the left eye. The EEG and EOG signals were amplified and band-pass-filtered using an analog filter with a range of 0.01 to 100 Hz. During data collection, electrode impedances were kept under 5 kΩ, and continuous signals were sampled at 1000 Hz for later offline processing.

Data preprocessing and analysis were performed using MATLAB R2020b (MathWorks, Inc.) with the EEGLAB toolbox (version 14.1.2b) [17]. Offline digital band-pass filtering (0–30 Hz) was first applied to the raw signals. Ocular artifacts were removed through independent component analysis (ICA), with components representing artifacts and neural activity being manually identified and separated. Following artifact rejection, the data were re-referenced to the average reference. For audiovisual (AV) stimuli, epochs were extracted from −200 ms to +800 ms relative to stimulus onset, with baseline correction performed using the 200 ms pre-stimulus period. Trials containing extreme amplitudes (±100 μV, excluding EOG channels) or incorrect behavioral responses were excluded from subsequent analysis.

2.4. Spatio-Temporal–Frequency Feature Association Vector Generation Method Based on Multi-Band Brain Networks

The preprocessed EEG signals (referring to the 1–30 Hz frequency band) were further filtered to obtain data in the delta, theta, alpha, and beta frequency bands. The microstates of brain activity during A, V, and AV information processing under different frequency bands were obtained through cluster analysis, and microstate attributes were calculated. Using the microstate sequences as time windows, multi-band brain networks were constructed for the 1–30 Hz, delta, theta, alpha, and beta frequency bands, and attributes characterizing the topological structure of the brain networks were computed. Correlations between microstate attributes and brain network attributes were calculated to construct spatio-temporal–frequency association vectors characterizing A, V, and AV information processing brain activities. The detailed workflow is illustrated in Figure 2.

2.4.1. EEG Microstate Analysis

Microstate analysis was performed following the methodology proposed by Murray et al., using MATLAB R2020b and the Microstate EEGLAB toolbox [18]. First, we calculated the field strength at each moment to obtain the global field power (GFP), defined as follows:

(1) \mathrm{GFP}(t)=\sqrt{\dfrac{\sum_{i=1}^{n}\left(v_i(t)-\bar{v}(t)\right)^{2}}{n}}

Here, v_i(t) is the voltage at the i-th electrode at time t, v̄(t) is the mean voltage across all electrodes at time t, and n is the number of electrodes. GFP is positively correlated with the signal-to-noise ratio, particularly in topographic maps with distinct or numerous peaks. Topographic maps at GFP peaks were therefore selected to describe the surrounding EEG signals, effectively reducing redundancy and the computational load.
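For concreteness, the GFP computation and the extraction of GFP-peak maps can be sketched as follows. The array shapes, the random example data, and the use of scipy.signal.find_peaks for peak detection are illustrative assumptions, not the toolbox implementation.

```python
# A minimal NumPy sketch of GFP (Eq. 1) and GFP-peak map extraction.
import numpy as np
from scipy.signal import find_peaks

def global_field_power(eeg):
    """GFP(t): root mean square across electrodes of the mean-centered scalp map."""
    centered = eeg - eeg.mean(axis=0, keepdims=True)   # remove mean across electrodes
    return np.sqrt((centered ** 2).mean(axis=0))

def gfp_peak_maps(eeg):
    """Return the scalp maps (columns of `eeg`) at local GFP maxima."""
    gfp = global_field_power(eeg)
    peaks, _ = find_peaks(gfp)                         # indices of local GFP maxima
    return eeg[:, peaks], peaks

# Example: 64 channels, 1 s at 1000 Hz of random data
eeg = np.random.randn(64, 1000)
maps, peak_idx = gfp_peak_maps(eeg)
print(maps.shape)   # (64, n_peaks)
```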

Subsequently, the topographic maps at GFP peaks were spatially clustered to identify representative microstate classes using the Topographic Atomize and Agglomerate Hierarchical Clustering (TAAHC) algorithm [19]. Based on the potential values of each electrode at every time point of the EEG data, the corresponding spatial distribution of scalp electrodes was extracted as candidate microstates. These distributions can be represented as follows:

(2) V(t_p)=\left[V_1(t_p),\; V_2(t_p),\;\ldots,\; V_N(t_p)\right]

where t_p represents the time point, V_i(t_p) denotes the potential at the i-th electrode at time point t_p, and N represents the total number of EEG electrodes. Thus, each time point t_p corresponds to a spatial distribution vector of scalp electrodes V(t_p), with each vector representing the spatial distribution of the EEG signal at that moment. Initially, each vector is treated as an individual cluster, and the spatial potential distributions at different peak times form the basis for microstate clustering.

To merge the initial clusters, it is first necessary to define the similarity between clusters. Here, we use the Pearson correlation coefficient to measure inter-cluster similarity, expressed as follows:

(3) r\left(V(t_p),V(t_q)\right)=\dfrac{\sum_{i=1}^{N}\left(V_i(t_p)-\bar{V}(t_p)\right)\left(V_i(t_q)-\bar{V}(t_q)\right)}{\sqrt{\sum_{i=1}^{N}\left(V_i(t_p)-\bar{V}(t_p)\right)^{2}}\sqrt{\sum_{i=1}^{N}\left(V_i(t_q)-\bar{V}(t_q)\right)^{2}}}

where V̄(t_p) and V̄(t_q) represent the average potential values across all electrodes at time points t_p and t_q, respectively. The value of r(V(t_p), V(t_q)) ranges from −1 to 1, with values closer to 1 indicating greater similarity between the two distributions.

After obtaining the similarity between every pair of clusters, the two most similar clusters are merged. This process proceeds in a bottom-up manner, merging only two clusters at a time and gradually reducing the total number of clusters. Ultimately, the microstate types for A, V, and AV in different frequency bands are obtained. Using representative maps as templates, each EEG time point was assigned to one of the microstate classes by calculating the maximum spatial correlation between the recorded scalp potential and the microstate prototypes. This yielded a time series of discrete microstate labels for the A, V, and AV EEG of each participant. During the backfitting process, the fitted microstate time series often contains some short-duration microstate segments. To reduce the impact of noise, we apply a window smoothing algorithm to smooth these short-term noise segments, thereby obtaining the final audiovisual EEG microstate time series.
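The backfitting and smoothing steps described above can be sketched as follows. The polarity-invariant assignment via the absolute spatial correlation and the minimum segment length are assumptions for illustration; the Microstate EEGLAB toolbox implements its own smoothing procedure.

```python
# A simplified sketch of backfitting microstate templates and smoothing short segments.
import numpy as np

def backfit(eeg, templates):
    """eeg: (n_channels, n_samples); templates: (K, n_channels). Returns a label sequence."""
    # spatial (Pearson-like) correlation between every sample map and every template
    x = eeg - eeg.mean(axis=0, keepdims=True)
    t = templates - templates.mean(axis=1, keepdims=True)
    corr = (t @ x) / (np.linalg.norm(t, axis=1)[:, None] *
                      np.linalg.norm(x, axis=0)[None, :] + 1e-12)
    return np.abs(corr).argmax(axis=0)          # polarity-invariant assignment per sample

def smooth_labels(labels, min_len=10):
    """Relabel segments shorter than `min_len` samples with the preceding label."""
    labels = labels.copy()
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if i - start < min_len and start > 0:
                labels[start:i] = labels[start - 1]
            start = i
    return labels
```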

Backfitting microstates to EEG data allows for the extraction of several quantitative parameters with neurological value. We calculated four microstate parameters: duration, occurrence, coverage, and transition probability.

Duration refers to the average time a microstate persists during a single occurrence, typically measured in seconds or milliseconds. This index not only reflects the temporal stability and intrinsic properties of the microstate but also reveals its role in the overall dynamic system from the time dimension. It is calculated as follows:

(4) \mathrm{Duration}=\dfrac{1}{N}\sum_{i=1}^{N}\mathrm{times}_i

where N is the number of occurrences of the microstate, and times_i is the duration of its i-th occurrence.

Occurrence represents the average number of times a microstate appears per second. This index provides an important basis for studying the dynamic characteristics of the system by quantitatively characterizing the temporal distribution of microstates. It is calculated as follows:

(5) \mathrm{Occurrence}=\dfrac{M}{T}

where M is the total number of microstate occurrences, and T is the total observation time (in seconds).

Coverage quantifies the ratio of the total recording time that a given microstate occupies, usually expressed in percentage form. This metric reflects its importance and dominance in brain dynamics. It is calculated as follows:

(6) \mathrm{Coverage}=\dfrac{\mathrm{time}}{T}

where time is the total duration of the microstate, and T is the total recording time.
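The three parameters in Eqs. (4)–(6) can be computed directly from a discrete microstate label sequence, as sketched below; the per-sample label coding and the sampling rate argument are illustrative assumptions.

```python
# A small sketch computing duration, occurrence, and coverage for one microstate class.
import numpy as np

def microstate_parameters(labels, state, fs=1000):
    labels = np.asarray(labels)
    is_state = (labels == state).astype(int)
    # locate segment starts/ends of this state
    d = np.diff(np.concatenate(([0], is_state, [0])))
    starts, ends = np.where(d == 1)[0], np.where(d == -1)[0]
    seg_lengths = (ends - starts) / fs                            # seconds per occurrence
    total_time = len(labels) / fs
    duration = seg_lengths.mean() if len(seg_lengths) else 0.0    # Eq. (4)
    occurrence = len(seg_lengths) / total_time                    # Eq. (5)
    coverage = seg_lengths.sum() / total_time                     # Eq. (6)
    return duration, occurrence, coverage

print(microstate_parameters([0, 0, 1, 1, 1, 0, 1, 1], state=1, fs=4))
# (0.625, 1.0, 0.625): mean duration 0.625 s, 1 occurrence/s, 62.5% coverage
```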

2.4.2. Brain Network Construction Based on Microstate Sequences

After backfitting, we obtained microstate sequences for A, V, and AV brain activities across five frequency bands: 1–30 Hz EEG, delta_EEG, theta_EEG, alpha_EEG, and beta_EEG. Microstate sequences of the same class share similar spatial topologies, reflecting the brain’s state under the same type of brain activity [20]. Therefore, we used the duration of each microstate class as a time window to compute the Phase Lag Index (PLI) within each window [20], constructing brain networks to capture the spatial features of brain activity across multiple frequency bands. The PLI quantifies phase synchronization between brain regions by assessing the asymmetry in the distribution of instantaneous phase differences, thereby detecting stable phase lag relationships. Compared to traditional phase synchronization metrics, the PLI is less sensitive to volume conduction effects, providing a more reliable reflection of genuine neural interactions.

Let the two channel signals be x_i(t) and x_j(t). The formula for the PLI is as follows:

(7) \mathrm{PLI}_{ij}=\left|\dfrac{1}{N}\sum_{t=1}^{N}\operatorname{sign}\left(\sin\left(\Delta\varphi_{ij}(t)\right)\right)\right|

where Δφ_ij(t) = φ_i(t) − φ_j(t) is the instantaneous phase difference, sign(·) is the sign function, and N is the number of time points. The phase is extracted using the Hilbert transform:

(8) H[x](t)=\dfrac{1}{\pi}\,\mathrm{p.v.}\!\int_{-\infty}^{+\infty}\dfrac{x(\tau)}{t-\tau}\,d\tau

(9) \varphi(t)=\arctan\dfrac{H[x](t)}{x(t)}

where p.v. denotes the Cauchy principal value integral. This transform converts a real-valued signal into an analytic signal z(t) = x(t) + jH[x](t), and the instantaneous phase is given by the argument of this complex number.
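A minimal sketch of Eqs. (7)–(9) for one microstate time window is given below; the channel count and window selection are illustrative, and the double loop is kept for clarity rather than speed.

```python
# PLI for one band-limited microstate window, using the Hilbert transform for phase.
import numpy as np
from scipy.signal import hilbert

def pli_matrix(window):
    """window: (n_channels, n_samples) segment -> (n_channels, n_channels) PLI matrix."""
    phase = np.angle(hilbert(window, axis=1))          # instantaneous phase per channel
    n_ch = window.shape[0]
    pli = np.zeros((n_ch, n_ch))
    for i in range(n_ch):
        for j in range(i + 1, n_ch):
            dphi = phase[i] - phase[j]
            pli[i, j] = pli[j, i] = np.abs(np.mean(np.sign(np.sin(dphi))))
    return pli
```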

The PLI values obtained through the above calculations yield a functional connectivity matrix characterizing the brain network. Subsequently, various topological features of the brain network are computed. We calculated nine types of brain network topological features to characterize the global integration and local segregation of brain activity during audiovisual information processing across different frequency bands. These include the following: global efficiency, average clustering coefficient, average betweenness centrality, average closeness centrality, local efficiency, eigenvector centrality, average degree centrality, average shortest path length, and small-worldness.
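These topological features can be obtained from a PLI connectivity matrix with NetworkX, as sketched below. The proportional thresholding (binarizing the strongest 20% of connections) is an assumption, since the paper does not state how the weighted PLI matrix is converted into a graph.

```python
# An illustrative sketch of graph-theoretic features computed from a PLI matrix.
import numpy as np
import networkx as nx

def network_features(pli, density=0.2):
    n = pli.shape[0]
    triu = pli[np.triu_indices(n, k=1)]
    thr = np.quantile(triu, 1 - density)               # keep the strongest `density` edges
    adj = (pli >= thr).astype(int)
    np.fill_diagonal(adj, 0)
    g = nx.from_numpy_array(adj)
    feats = {
        "global_efficiency": nx.global_efficiency(g),
        "avg_clustering": nx.average_clustering(g),
        "avg_betweenness": np.mean(list(nx.betweenness_centrality(g).values())),
        "avg_closeness": np.mean(list(nx.closeness_centrality(g).values())),
        "local_efficiency": nx.local_efficiency(g),
        "avg_degree_centrality": np.mean(list(nx.degree_centrality(g).values())),
        "avg_eigenvector_centrality": np.mean(list(nx.eigenvector_centrality_numpy(g).values())),
    }
    if nx.is_connected(g):                              # path-based metrics need connectivity
        feats["avg_shortest_path"] = nx.average_shortest_path_length(g)
        feats["small_worldness"] = nx.sigma(g, niter=5, nrand=5)   # can be slow
    return feats

# usage with a random symmetric matrix standing in for a PLI matrix
rng = np.random.default_rng(0)
m = rng.random((64, 64)); pli = (m + m.T) / 2; np.fill_diagonal(pli, 0)
print(network_features(pli)["global_efficiency"])
```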

2.4.3. Spatio-Temporal–Frequency Feature Association Vector Representation

The microstate time series and microstate parameters across multiple frequency bands reflect the temporal–frequency characteristics of the brain during the processing of audiovisual information. The brain networks constructed based on these microstate time series, along with their topological properties, reflect the spatial–frequency characteristics. These features do not exist in isolation; the temporal, spatial, and frequency domain features influence each other. Therefore, by calculating the correlation relationships among these features, we construct a multi-dimensional correlation vector that comprehensively characterizes the brain activities involved in processing A, V, and AV information.

Based on the EEG microstate sequences, microstate attributes are calculated and used as the temporal feature vector M_k^f of microstate class k in frequency band f. The brain network features computed during the brain network construction phase are used as the spatial feature vector N_k^f, while the five different frequency bands are treated as features of the frequency domain feature vector. Two metrics are calculated to represent the correlation relationships between different dimensions of the vectors: feature correlation and microstate class importance weight.

The feature correlation analysis is quantified using the Pearson correlation coefficient, calculated as follows:

(10) r=\dfrac{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}}

where x_i and y_i represent the observed values of different features under the same microstate, x̄ and ȳ are their respective means, and n is the number of subjects. This coefficient is used to measure the degree of linear correlation between any two features within the same microstate. Its value is in the range of [−1, 1], where a positive value indicates a positive correlation, a negative value indicates a negative correlation, and an absolute value closer to 1 signifies a stronger correlation.

The calculation of microstate class importance weight aims to quantify the significance of different microstate categories for overall EEG pattern recognition. The core idea is to comprehensively evaluate the average intensity of each microstate category across all frequency bands. Specifically, this weight is derived by computing the L1 norm (i.e., the sum of absolute values of all elements in the vector, representing the total activity or intensity of the feature vector) of both the temporal attribute features and spatial network features of a given microstate class, normalized by their respective dimensional scales. The average of these two normalized values is then taken to obtain a standardized composite intensity metric. This metric is multiplied by an inherent salience score of the microstate category itself to incorporate prior knowledge. Finally, the weight of a microstate class is determined by the proportion of its composite value relative to the sum of composite values across all classes. A higher weight value indicates that the microstate carries more critical and discriminative information in both the temporal and spatial dimensions.

The calculated weights are incorporated as parameters into the spatio-temporal–frequency association vector. This vector is constructed by horizontally concatenating the temporal and spatial features within the same frequency band and then vertically stacking these concatenated results across different frequency bands to form a three-dimensional feature structure. Within this structure, the model can adaptively identify which microstate categories (temporal dimension) or which frequency bands (spatial dimension) are more critical for a specific recognition task. Consequently, higher weights are assigned to this information in subsequent classification models, ultimately significantly enhancing the model’s recognition accuracy and interpretability.

The formula for calculating the microstate class importance weight is as follows:

(11) \gamma_k=\dfrac{\left[\dfrac{1}{F}\sum_{f=1}^{F}\left(\dfrac{\|M_k^f\|_1}{M}+\dfrac{\|N_k^f\|_1}{N}\right)\right]\times \mathrm{significance}_k}{\sum_{j}\left[\dfrac{1}{F}\sum_{f=1}^{F}\left(\dfrac{\|M_j^f\|_1}{M}+\dfrac{\|N_j^f\|_1}{N}\right)\right]\times \mathrm{significance}_j}

where N represents the total dimension of the brain network feature vector, M represents the total dimension of the microstate attribute vector, and F is the total number of frequency bands. M_k^f is the microstate attribute vector of class k in frequency band f, and ‖M_k^f‖_1 is its L1 norm, used to quantify the “strength” of the microstate attribute vector; N_k^f is the corresponding brain network feature vector, and ‖N_k^f‖_1 is its L1 norm. The sum in the denominator runs over all microstate classes, so the weights γ_k sum to 1.
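A sketch of Eq. (11) is given below; the dictionary layout of the per-class, per-band feature arrays and the prior significance scores are illustrative assumptions.

```python
# Microstate class importance weights (Eq. 11) from L1 "strengths" of the per-band vectors.
import numpy as np

def class_weights(micro_feats, net_feats, significance):
    """micro_feats, net_feats: dicts {class k: array of shape (F, dim)}; returns {k: gamma_k}."""
    composite = {}
    for k in micro_feats:
        F = micro_feats[k].shape[0]
        M_dim = micro_feats[k].shape[1]
        N_dim = net_feats[k].shape[1]
        # mean over bands of the dimension-normalized L1 strength (temporal + spatial)
        strength = np.mean([
            np.abs(micro_feats[k][f]).sum() / M_dim + np.abs(net_feats[k][f]).sum() / N_dim
            for f in range(F)
        ])
        composite[k] = strength * significance[k]       # multiply by prior salience score
    total = sum(composite.values())
    return {k: v / total for k, v in composite.items()} # normalize so weights sum to 1
```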

The temporal and spatial features within the same frequency band are concatenated on the same plane. These planes are then stacked with spatio-temporal feature planes from other frequency bands to form a three-dimensional structure. The microstate class importance weight is calculated based on the same temporal features across different frequency bands, while the frequency band specificity index is calculated based on the same spatial features across different frequency bands. The calculated results are stored in a vector, which serves as the correlation representation within the spatio-temporal–frequency feature association vector. By analyzing these correlations, the relative importance of different features across frequency bands can be determined, allowing them to be used as adaptive weights in subsequent recognition models to improve classification accuracy.
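The concatenation-and-stacking step can be sketched as follows; for simplicity the sketch assumes the same number of microstate classes in every band, whereas the actual association vector is irregular across bands.

```python
# Assembling the spatio-temporal-frequency structure: concatenate temporal (microstate) and
# spatial (network) features within each band, then stack the per-band planes.
import numpy as np

n_classes, n_bands = 6, 5                              # illustrative sizes
micro = np.random.rand(n_bands, n_classes, 3)          # duration, occurrence, coverage
net = np.random.rand(n_bands, n_classes, 9)            # nine network topological features

planes = [np.hstack([micro[f], net[f]]) for f in range(n_bands)]   # (n_classes, 12) per band
tensor = np.stack(planes, axis=0)                                  # (n_bands, n_classes, 12)
print(tensor.shape)    # (5, 6, 12)
```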

2.5. Audiovisual Processing Brain Activity Recognition Model Based on Spatio-Temporal–Frequency Feature Association Vectors

To accurately identify the brain activities involved in audiovisual information processing, this study proposes an Adaptive Tensor Fusion Network (ATFN). The model takes spatio-temporal–frequency feature association vectors as input and adaptively handles dynamic changes in the number of microstate classes across different frequency bands while preserving the influence of feature correlations and differences to achieve multi-dimensional feature fusion and interaction. The model structure is shown in Figure 3 and includes the following components: a feature fusion and selection module based on differential feature enhancement, a feature encoding module based on attention enhancement, and a classifier based on a multilayer perceptron [21].

2.5.1. Feature Fusion and Selection Module Based on Differential Feature Enhancement

This module is responsible for feature fusion, validity labeling, and dynamic weight evaluation of multi-band, multi-microstate EEG signals, forming a standardized and weighted feature representation. The input data consists of spatio-temporal–frequency feature tensors for three information processing modes (A, V, and AV) across five frequency bands. Each frequency band contains several microstate categories (denoted m_1 to m_5 for the five bands), and each microstate category includes 12 spatio-temporal features along with their correlation information. The overall input tensor is formed by concatenating the features of each frequency band along the feature dimension, with the following structure:

(12) \mathrm{Input}\in\mathbb{R}^{B\times\left(m_1+m_2+m_3+m_4+m_5\right)\times 12}

Based on the input data labels, automatic category labeling is performed. Subsequently, each subject’s data is transformed into a 12 × M matrix (12 features, M microstate categories). A binary mask is generated for each feature value, where non-zero values are set to 1 and zero values to 0, to identify feature validity [22]. Following this, the variation features between microstates are extracted by calculating the changes in microstate and brain network features between adjacent microstate categories, capturing the variation patterns of features across different microstates. The formula is as follows:

(13) \Delta x_{i,j}=x_{i,j+1}-x_{i,j}

where x_{i,j} represents the value of the i-th feature under the j-th microstate category, and Δx_{i,j} denotes the feature change value between adjacent microstate categories. For the last microstate category, the change value is set to 0.

Finally, the original features and differential features are concatenated vertically to generate a feature vector of length 24. The model outputs the enhanced feature vector (number of samples × M × 36) and the feature validity mask (M × 12).
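The differential enhancement and masking steps can be sketched as follows; the (M × 12) per-subject layout follows the text, while the exact concatenation order is an assumption.

```python
# Differential feature enhancement (Eq. 13) plus a binary validity mask for one subject.
import numpy as np

def enhance(features):
    """features: (M, 12) matrix -> (M, 24) enhanced features and (M, 12) validity mask."""
    mask = (features != 0).astype(float)                 # 1 where a feature value is present
    diff = np.zeros_like(features)
    diff[:-1] = features[1:] - features[:-1]             # Δx between adjacent microstate classes
    # last microstate class keeps a zero change value, as stated in the text
    enhanced = np.concatenate([features, diff], axis=1)  # original + differential -> 24 dims
    return enhanced, mask
```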

Subsequently, a small neural network is used to dynamically evaluate the importance of each feature, enabling the automatic screening and weighting of key features. This neural network consists of an input layer, a hidden layer, and an output layer. The input feature vector to the input layer has a shape of (number of samples × M × 36).

In the hidden layer, the feature importance I is calculated using the following formula:

(14) I=\sigma\left(W_1 z+b_1\right)

where σ denotes the sigmoid function, and z represents the hidden layer feature representation obtained after the first linear transformation and ReLU activation function. Its formula is expressed as follows:

(15) z=\max\left(0,\,W_2 x+b_2\right)

where W1, b1, W2, and b2 are learnable parameters.

Finally, the output layer produces the importance weight vector for each feature, with a structure of (number of samples × M).
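A minimal PyTorch sketch of the importance network in Eqs. (14)–(15) is shown below; the hidden width and the way the weights are applied back to the features are assumptions made for illustration.

```python
# Dynamic feature-importance scoring: ReLU hidden layer + sigmoid head, one weight per microstate.
import torch
import torch.nn as nn

class ImportanceNet(nn.Module):
    def __init__(self, in_dim=36, hidden=64):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hidden)   # z = max(0, W2 x + b2), Eq. (15)
        self.score = nn.Linear(hidden, 1)         # I = sigmoid(W1 z + b1), Eq. (14)

    def forward(self, x):                         # x: (batch, M, 36)
        z = torch.relu(self.hidden(x))
        weights = torch.sigmoid(self.score(z)).squeeze(-1)    # (batch, M) importance weights
        return x * weights.unsqueeze(-1), weights             # weighted features, importances

net = ImportanceNet()
weighted, w = net(torch.randn(8, 6, 36))
print(w.shape)    # torch.Size([8, 6])
```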

2.5.2. Attention-Enhanced Feature Encoding Module

The feature encoding module employs a Gated Recurrent Unit (GRU)-based recurrent neural network structure to capture dynamic change features and long-range dependencies from the time series data of EEG signals. This module adopts a bidirectional GRU architecture [23], enabling the simultaneous integration of forward and backward information about the sequence, thereby more comprehensively modeling the temporal context. A one-dimensional Convolutional Neural Network (CNN) is used to extract spatial and temporal features from the EEG signals. The input data passes through two convolutional layers and one layer normalization.

The input data to the input layer is the weighted feature vector with a shape of (number of samples × number of microstate classes × 24). The first convolutional layer uses a kernel size of 3, with an input channel count of 36 and an output channel count of 12. The second convolutional layer uses a kernel size of 3, with an input channel count of 12 and an output channel count of 12. Finally, layer normalization is applied to standardize the convolutional output. Given an input vector X ∈ ℝ^{B×L}, the formula is as follows:

(16) \hat{X}_{b,l}=\dfrac{X_{b,l}-\dfrac{1}{L}\sum_{l=1}^{L}X_{b,l}}{\sqrt{\dfrac{1}{L}\sum_{l=1}^{L}\left(X_{b,l}-\dfrac{1}{L}\sum_{l=1}^{L}X_{b,l}\right)^{2}}}

where B represents the number of subjects, and L denotes the number of microstate categories. The output feature shape is (number of samples × M × 12). Subsequently, the features pass through a multi-head self-attention layer, residual connections and layer normalization, a feedforward network (FFN), and another round of residual connections and layer normalization. This structure is inspired by the Transformer encoder [24], effectively capturing long-range dependencies in the sequence and enabling deep feature fusion. The input vector structure is (number of samples × M × 12). After linear transformations for query (Q), key (K), and value (V), the formulas are as follows:

(17) Q=XW^{Q},\quad K=XW^{K},\quad V=XW^{V}

where X is the input feature, and W^Q, W^K, and W^V are learnable weight matrices. Q is the query vector corresponding to each “feature position”, K is used to match the Q vector, and V is the actual feature information ultimately used for weighted fusion.

Subsequently, the self-attention scores are calculated, and weighted fusion is performed using the following formula:

(18) \mathrm{Attention}(Q,K,V)=\sigma\!\left(\dfrac{QK^{T}}{\sqrt{d_k}}\right)V

where σ is the softmax function, and d_k is the dimension of the key vector, used to scale the attention scores.

Then, the multi-head attention mechanism is applied, with the formula as follows:

(19) M(Q,K,V)=\mathrm{Concat}\left(h_1,\ldots,h_H\right)W^{O}

The computation for each attention head is as follows:

(20) h_j=\mathrm{Attention}\left(QW_j^{Q},\,KW_j^{K},\,VW_j^{V}\right)

The formulas for the Feedforward Neural Network and residual connections are expressed as follows:

(21) Z=X+M(Q,K,V)

(22) F=\max\left(0,\,ZW_1+b_1\right)W_2+b_2

The output of the feedforward network F is added to its input Z (residual connection), and the result is then processed by another layer normalization operation. This final result represents the output of the feature enhancement module, with a structure of (number of samples × M × 12).
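A hedged PyTorch sketch of this encoding module is given below. The convolution channel counts and kernel sizes follow the text; the 36-dimensional input (rather than the 24 stated for the weighted vector), the GRU width, the number of attention heads, and the FFN width are assumptions made to produce a self-consistent example.

```python
# Attention-enhanced encoder: two 1D convolutions + layer norm, a bidirectional GRU, then a
# Transformer-style block (multi-head self-attention and FFN, each with residual + layer norm).
import torch
import torch.nn as nn

class AttentionEncoder(nn.Module):
    def __init__(self, in_ch=36, d_model=12, n_heads=4, ffn_dim=48):
        super().__init__()
        self.conv1 = nn.Conv1d(in_ch, d_model, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.ln_conv = nn.LayerNorm(d_model)
        self.gru = nn.GRU(d_model, d_model // 2, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, ffn_dim), nn.ReLU(),
                                 nn.Linear(ffn_dim, d_model))
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):                     # x: (batch, M, in_ch)
        h = x.transpose(1, 2)                 # Conv1d expects (batch, channels, length)
        h = torch.relu(self.conv2(torch.relu(self.conv1(h)))).transpose(1, 2)
        h = self.ln_conv(h)                   # (batch, M, d_model)
        h, _ = self.gru(h)                    # bidirectional context, still d_model wide
        a, _ = self.attn(h, h, h)             # multi-head self-attention
        z = self.ln1(h + a)                   # residual connection + layer norm
        return self.ln2(z + self.ffn(z))      # FFN + residual + layer norm

enc = AttentionEncoder()
print(enc(torch.randn(8, 6, 36)).shape)       # torch.Size([8, 6, 12])
```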

2.5.3. Classifier Based on Multilayer Perceptron

The baseline model of the ATFN is a multilayer perceptron (MLP) that performs classification based on the fused features and outputs the final class probabilities. Its structure consists of a pooling layer, a linear layer, a dropout layer, layer normalization, and a second linear layer. The first linear layer has an input dimension of 12 and an output dimension of 256 and includes a ReLU activation function. The dropout layer has a dropout rate set to 0.6 to prevent overfitting. After feature standardization, the second linear layer has an input dimension of 256 and an output dimension of 3 (corresponding to the three brain activity classes: A, V, and AV). To address class imbalance, the model uses focal loss as the loss function, with a focusing parameter γ = 2.0, and class weights are adjusted based on the data distribution. The calculation formula is as follows:

(23) \mathrm{FL}(p_t)=-\alpha_t\left(1-p_t\right)^{\gamma}\log\left(p_t\right)

where FL is the focal loss function, p_t is the probability predicted by the model for the true class, α_t is the class weight, and γ is the focusing parameter. The initial learning rate is set to 0.0005, with a weight decay of 0.00001. The learning rate is halved when the validation loss stops decreasing. The training process runs for 100 epochs, and the model with the best performance on the validation set is saved after each epoch.
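The focal loss in Eq. (23) matches the standard formulation and can be implemented as follows; γ = 2.0 follows the text, while the α values shown here are placeholders.

```python
# Focal loss with per-class weights (Eq. 23).
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha, gamma=2.0):
    """logits: (batch, n_classes); targets: (batch,) class indices; alpha: (n_classes,) weights."""
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)      # log p_t for the true class
    pt = log_pt.exp()
    at = alpha.to(logits.device)[targets]                          # alpha_t per sample
    return (-at * (1 - pt) ** gamma * log_pt).mean()

loss = focal_loss(torch.randn(8, 3), torch.randint(0, 3, (8,)),
                  alpha=torch.tensor([1.0, 1.0, 1.0]))
```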

3. Results and Analyses

3.1. Spatio-Temporal–Frequency Feature Association Vector for Audiovisual Processing

3.1.1. Microstate Analysis Results

We employed EEG microstate analysis to perform clustering analysis on visual, auditory, and audiovisual EEG data to obtain brain microstates corresponding to the three types of information processing across multiple frequency bands. Ultimately, for auditory information processing, the microstates in the EEG, delta_EEG, theta_EEG, alpha_EEG, and beta_EEG frequency bands were clustered into 13, 7, 8, 7, and 5 classes, respectively. For visual information processing, the microstates in the same frequency bands were clustered into 11, 6, 8, 8, and 6 classes, respectively. For audiovisual information processing, the microstates were clustered into 6, 6, 8, 4, and 11 classes, respectively. For ease of comparison, Figure 4 displays the top six classes of microstates for A, V, and AV information processing in the 1–30 Hz frequency band. It can be seen from Figure 4 that the spatial patterns of MS1–MS3 for A, V, and AV are different. The spatial distributions of MS4 for AV and A are similar, but the distribution of AV is more concentrated in the central area, and there is a difference in intensity between the central area and the surrounding areas, while the central area and surrounding area of MS4 for A have a smaller intensity difference. The spatial distribution of MS6 for AV shows a “left upper to right lower” pattern. Although both MS5 patterns for A and V present a “frontal” distribution, MS5 for A is close to the classic resting-state EEG “Microstate D” [25,26,27], while that for V has a different distribution. The clustering results indicate that the topological structures of the microstates corresponding to A, V, and AV are different, suggesting that the neural activities induced by different types of sensory information exhibit notable distinctions. Effectively characterizing and capturing the spatial features of microstates across multiple frequency bands can be used to identify different brain activity states.

3.1.2. Brain Network Construction Results

Based on the previously obtained EEG microstate time series for visual, auditory, and audiovisual conditions through the backfitting procedure, this study further constructed corresponding brain functional networks for each microstate-categorized segment of EEG signals. Due to variations in the duration of microstate segments, a dynamic time window segmentation method was employed. The Phase Lag Index (PLI) was calculated for microstate segments under visual, auditory, and audiovisual stimuli to quantify phase synchronization between different brain regions.

Finally, this study constructed brain network sequences corresponding one-to-one with microstates for visual, auditory, and audiovisual information processing in the 1–30 Hz frequency band, as well as in the delta, theta, alpha, and beta frequency bands. Each functional connectivity network reflects the brain activity pattern of its corresponding microstate. Figure 5 displays the PLI matrices corresponding to the six microstates shown in Figure 4, within the 1–30 Hz frequency band.

To reveal the potential mechanisms of neural processing for different sensory information, this study further calculated the topological properties of brain functional networks. We focused on nine types of topological features closely related to global integration and local segregation functions, including the following: global efficiency, local efficiency, average clustering coefficient, average shortest path length, small-worldness, average degree centrality, average betweenness centrality, average closeness centrality, and eigenvector centrality. Through a quantitative analysis of these multi-dimensional metrics, we can comprehensively characterize the information transmission efficiency of the brain at the network level, the distribution of hub nodes, and the integration and differentiation levels of the overall network, providing insights into cognitive processing differences under different sensory modalities. The calculation results of brain network topological features are shown in Table 1, which displays the topological feature values of the first six brain networks for A, V, and AV information processing in the 1–30 Hz frequency band.

3.1.3. Spatio-Temporal–Frequency Feature Association Vector Results

Neural oscillations in different frequency bands are closely related to the cognitive states of the brain, which carry distinct “meanings” and “functions” [28,29,30]. Our study analyzed the correlations between spatio-temporal–frequency features by calculating Pearson coefficients among the microstate attributes (duration, coverage, occurrence) and brain network topological features (global efficiency, average clustering, average betweenness, average closeness, local efficiency, average degree centrality, average eigenvector centrality, average shortest path length, small-worldness) of visual, auditory, and audiovisual EEG across the 1–30 Hz, delta, theta, alpha, and beta frequency bands. Figure 6 displays the correlation analysis results between microstates and brain networks for the six microstates in the 1–30 Hz frequency band. Brain network topological properties exhibit strong intrinsic synergies among themselves, while microstate parameters demonstrate similar internal correlations. Under the same conditions, significant differences are observed in the correlations between microstate parameters and brain network topology for the three types of stimuli (A, V, and AV). For example, the correlation matrix of brain activities corresponding to MS1 shows that the correlation of AV brain network parameters and the microstate parameters is relatively low, while the correlation of the single-modal A and V brain activities is relatively high. Particularly, the correlation for the brain network parameters of A and V brain activities is close to 1. This result indicates that there are differences in the brain network topologies of single-modal and dual-modal brain activities, reflecting that the neural mechanisms of information processing are different. Similarly, for MS2 and MS3, AV stimuli generally show lower correlation levels compared to unimodal A and V stimuli, reflecting distinct brain activities during the processing of different types of stimuli.

We further compared the microstate parameters and brain network topological attributes under the three modal stimuli, with the results shown in Figure 7. The auditory modality exhibited higher centrality, clustering coefficient, and efficiency, while the visual modality showed higher values in the average shortest path length and duration. The majority of features in the audiovisual modality were distributed between those of A and V. As shown in Figure 7, the duration, coverage, and occurrence for A, V, and AV also present different distributions. The comparison results reflect differences in brain activity during the processing of the three modalities, and these differences can be captured by microstate parameters and brain network topological attributes.

Based on the above results, to comprehensively characterize the spatio-temporal characteristics of EEG activity across multiple frequency bands, this study constructed an integrated vector that fuses multi-dimensional spatio-temporal–frequency features and their intrinsic associations. This vector integrates three types of microstate features and nine types of brain network topological features across five frequency bands, forming a base feature layer. Additionally, two cross-dimensional correlation metrics—frequency band specificity index and microstate class importance weight—were introduced to characterize the prominence of features in the frequency band distribution and the contribution weight of different microstate categories to the overall features, respectively.

Specifically, the microstate features and brain network features from each frequency band are first used to construct a spatio-temporal plane. The feature planes from the five frequency bands are then integrated into a three-dimensional tensor structure. As shown in Figure 8, the resulting irregular vector not only encapsulates the original features but also explicitly incorporates the correlation information among these features across different dimensions. This vector serves as a high-dimensional representation with clear neurophysiological interpretability. In subsequent recognition models, an adaptive weighting mechanism is introduced to optimize the feature fusion process, thereby effectively enhancing the classification performance for the auditory, visual, and audiovisual brain activity classes.

3.2. Audiovisual Brain Activity Recognition Results

This experiment was conducted on a computer equipped with an NVIDIA GeForce RTX 5060 GPU, an Intel Core i9-14900HX processor, and 16 GB of RAM, running a 64-bit Windows 11 operating system. The software environment included Python 3.9.13 and MATLAB R2022a, which were used for data preprocessing, feature extraction, model training, and validation.

Model training employed five-fold stratified cross-validation, where the proportion of samples from each class in every fold remained consistent with the overall distribution, ensuring the robustness and reliability of the evaluation results. During training, the Adam optimizer was used with an initial learning rate of 0.0005 and a weight decay of 0.00001. The training process spanned 100 epochs, and the model with the best performance on the validation set was saved after each epoch. To address class imbalance, the focal loss function was adopted as the loss function, with a focusing parameter γ = 2.0 and class weights dynamically adjusted based on the training data’s class distribution. The input data consisted of spatio-temporal–frequency feature association vectors extracted from each frequency band (including EEG, delta, theta, alpha, and beta). The data dimensions across all frequency bands were kept consistent to ensure balanced input data.
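The training protocol described above can be sketched as follows; `build_model` is a placeholder returning a network that maps inputs to class logits, the full-batch updates are a simplification, and ReduceLROnPlateau is assumed as the mechanism for halving the learning rate when the validation loss stalls.

```python
# Stratified five-fold cross-validation with Adam (lr 5e-4, weight decay 1e-5),
# learning-rate halving on validation-loss plateau, and best-checkpoint saving per fold.
import torch
from sklearn.model_selection import StratifiedKFold

def train_cv(build_model, loss_fn, X, y, epochs=100):
    """X: float tensor (n_samples, M, feat); y: long tensor (n_samples,) of class labels."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for fold, (tr, va) in enumerate(skf.split(X.numpy(), y.numpy())):
        model = build_model()
        opt = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-5)
        sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, mode="min", factor=0.5)
        best_val = float("inf")
        for epoch in range(epochs):
            model.train()
            opt.zero_grad()
            loss = loss_fn(model(X[tr]), y[tr])      # full-batch update for brevity
            loss.backward()
            opt.step()
            model.eval()
            with torch.no_grad():
                val_loss = loss_fn(model(X[va]), y[va]).item()
            sched.step(val_loss)                     # halve LR when validation loss stalls
            if val_loss < best_val:                  # keep the best checkpoint per fold
                best_val = val_loss
                torch.save(model.state_dict(), f"best_fold{fold}.pt")
```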

Model performance was comprehensively evaluated using four metrics: accuracy (Acc), Precision (Pre), Recall (Rec), and F1-score (F1). All metrics are reported as the macro-average on the test set to ensure a balanced evaluation of the multi-class classification task.

3.2.1. Audiovisual Brain Activity Recognition Based on ATFN

During the model training process, this study evaluated five distinct scenarios utilizing data from different frequency bands and recorded the changes in the loss function and accuracy with respect to the number of training epochs. Figure 9 shows the loss curves for training and validation across all folds. Figure 10 shows the average loss curves for training and validation. Figure 11 shows the average accuracy curves for training and validation.

The validation results of the model’s classification performance are displayed in the ROC curves and confusion matrices, as shown in Figure 12 and Figure 13. ROC curve analysis indicates that as data from more frequency bands are utilized, the model’s classification performance for visual, auditory, and audiovisual brain activities gradually improves.

To further verify the cross-subject effectiveness of the proposed method, we applied leave-one-subject-out (LOSO) cross-validation. The cross-subject average Acc, Pre, Rec, and F1 reached 95.67%, 96.32%, 95.67%, and 95.97%, with standard error of the mean (SEM) values of 0.023, 0.051, 0.024, and 0.048, respectively. These results show that the method remains relatively stable under cross-subject conditions.

3.2.2. Ablation Study

To investigate the contribution of spatio-temporal–frequency feature associations to model performance, this study designed a set of comparative ablation experiments evaluating the classification effectiveness when correlation features were excluded (i.e., only original multi-band features were used without incorporating feature correlation analysis and microstate class importance weighting). The experimental results are shown in Table 2 and Table 3 below. Under two conditions—with and without feature correlations—we sequentially added features from the theta, alpha, delta, and beta frequency bands to the feature set and recorded the model’s performance on the test set.

The experimental results indicate that when feature correlations are not incorporated, model performance improves gradually as the number of included frequency bands increases. When only the theta band is included, accuracy (Acc) is 0.5202, and the F1-score is 0.5181. With the sequential addition of the alpha, delta, and beta bands, model performance improves step by step, reaching its peak when all bands are included (Acc = 0.7234; F1 = 0.7201). After incorporating feature correlations, the overall model performance improves significantly, particularly when multiple bands are used together. With only the theta band, Acc is 0.6061, and F1 is 0.6016. When all frequency bands are included and their correlation information is fused, the model achieves its highest performance (Acc = 0.9697; F1 = 0.9696), significantly outperforming the results without feature correlations. These findings demonstrate that the fusion of multi-band features and the inclusion of their correlations play a crucial role in enhancing the discriminative ability of the brain network classification model. They also further confirm that microstate and brain network features across various frequency bands contribute substantially to the recognition of audiovisual brain activities.

To evaluate the contribution of each core module in the proposed model, this study conducted systematic module ablation experiments. The results are shown in Table 4.

The experimental results indicate that model performance improved significantly as modules were progressively added. When only the baseline structure (i.e., without any enhancement modules) was used, the model achieved an accuracy of 0.8765 and an F1-score of 0.8758, demonstrating that the basic architecture already possessed a certain level of classification capability. After introducing the feature dynamic weighting mechanism, the accuracy increased to 0.9124, indicating that this module effectively identifies and enhances discriminative features while suppressing noise or redundant information. With the further addition of the feature encoding module, the accuracy reached 0.9452, reflecting the module’s key role in extracting and fusing multi-dimensional spatio-temporal features and enhancing the model’s ability to distinguish temporal and spatial patterns in EEG signals. Finally, incorporating the self-attention mechanism optimized the model performance to its peak (accuracy: 0.9697; F1-score: 0.9696), proving its effectiveness in capturing long-range dependencies and further refining feature representation and interaction mechanisms. Overall, each module contributed distinctly to the performance improvement and exhibited good complementarity and synergy, validating the rationality and effectiveness of the ATFN model’s structural design. In conclusion, this ablation experiment verified the effectiveness and necessity of the proposed modules, providing empirical support for the model’s structural design.

4. Conclusions

This study integrates EEG microstates, brain networks, and their feature correlations to build a symmetrical spatio-temporal–frequency feature association vector that characterizes the brain activity involved in auditory, visual, and audiovisual information processing. By using the feature association vector, we develop an Adaptive Tensor Fusion Network (ATFN) model, which achieves an accuracy of 96.97% in recognizing A, V, and AV brain activity, providing a valuable reference for brain–computer interfaces, neurological disease diagnosis, and related fields. However, this study still has certain limitations. Firstly, the model’s performance in cross-subject generalization remains unstable. Due to significant individual differences in EEG signals, the current model exhibits considerable variability in classification performance across different subjects, which limits its potential for practical cross-user applications. Secondly, there is room for improvement in the multi-band feature fusion strategy. Although multi-band microstates, brain network features, and correlation features were incorporated, issues related to redundancy and noise in high-dimensional features have not been fully resolved. Some frequency bands or feature dimensions may contribute limited or even interference effects to classification. Future work will focus on enhancing cross-subject generalization, promoting practical applications in the fields of brain–computer interfaces and clinical medicine; using other classic deep learning models for the classification of visual and auditory brain activities, comparing them with our proposed method; and optimizing the structure of the ATFN to enhance its ability to integrate multi-dimensional features and reduce the complexity of the model. We will further analyze and verify the contributions of brain networks and microstates to brain activity recognition and enhance the interpretability of our method, providing theoretical references for studying the neural mechanisms of brain information processing.

Author Contributions

Conceptualization, methodology, Y.X.; software, validation, visualization, L.Z.; writing—original draft preparation, writing—review and editing, C.W., B.S. and C.L. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of Changchun University of Science and Technology (Approval No.: 201705024) on 2 May 2017.

Informed Consent Statement

Informed consent for participation was obtained from all subjects involved in this study.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1 A schematic of the experimental design. The participants were instructed to look at the central fixation (a cross). A, auditory; V, visual; AV, audiovisual.

Figure 2 Flowchart of spatio-temporal–frequency association vector generation.

Figure 3 The structure of the ATFN model.

Figure 4 Clustering results of 1–30 Hz EEG microstates.

Figure 5 PLI matrices of 1–30 Hz EEG signals for AV, A, and V.

Figure 6 Feature correlations of AV, A, and V in 1–30 Hz frequency band.

Figure 7 Feature comparison of microstates and brain networks in 1–30 Hz frequency band for AV, A, and V.

Figure 8 Visualization representation of spatio-temporal–frequency association vector.

Figure 9 Loss curves for training and validation across all folds.

Figure 10 Average loss curves for training and validation.

Figure 11 Average accuracy curves for training and validation.

Figure 12 ROC curves of brain activity classification for AV, A, and V.

Figure 13 Confusion matrix of brain activity classification for AV, A, and V.

Brain network features of 1–30 Hz frequency band for AV, A, and V.

Brain Network Features   Stimulus   BN1    BN2    BN3    BN4    BN5    BN6
GlobalEfficiency         AV         0.51   0.59   0.50   0.54   0.51   0.51
GlobalEfficiency         A          0.55   0.54   0.50   0.54   0.54   0.54
GlobalEfficiency         V          0.52   0.50   0.52   0.52   0.52   0.51
AvgClustering            AV         0.38   0.35   0.36   0.37   0.40   0.37
AvgClustering            A          0.42   0.40   0.35   0.39   0.40   0.42
AvgClustering            V          0.36   0.36   0.38   0.39   0.39   0.34
AvgBetweenness           AV         0.01   0.01   0.01   0.02   0.01   0.01
AvgBetweenness           A          0.01   0.02   0.01   0.02   0.02   0.01
AvgBetweenness           V          0.01   0.01   0.01   0.01   0.01   0.01
AvgCloseness             AV         0.46   0.47   0.44   0.48   0.45   0.46
AvgCloseness             A          0.49   0.48   0.44   0.48   0.48   0.48
AvgCloseness             V          0.46   0.45   0.46   0.46   0.46   0.45
LocalEfficiency          AV         0.56   0.53   0.55   0.56   0.59   0.55
LocalEfficiency          A          0.62   0.60   0.55   0.58   0.60   0.61
LocalEfficiency          V          0.56   0.53   0.57   0.57   0.57   0.52
AvgDegreeCent            AV         0.11   0.11   0.11   0.11   0.11   0.11
AvgDegreeCent            A          0.11   0.11   0.11   0.11   0.11   0.11
AvgDegreeCent            V          0.11   0.11   0.11   0.11   0.11   0.11
AvgEigenCent             AV         0.21   0.21   0.21   0.21   0.21   0.21
AvgEigenCent             A          0.21   0.22   0.22   0.22   0.21   0.22
AvgEigenCent             V          0.22   0.22   0.22   0.21   0.21   0.22
AvgShortestPath          AV         1.93   1.95   1.96   1.95   1.95   1.94
AvgShortestPath          A          1.92   1.96   1.86   2.02   1.96   1.92
AvgShortestPath          V          1.98   1.92   1.91   1.98   1.97   1.90
SmallWorldness           AV         0.81   0.83   0.83   0.85   0.89   0.82
SmallWorldness           A          0.91   0.93   0.86   0.90   0.93   0.98
SmallWorldness           V          0.91   0.78   0.85   0.88   0.91   0.82
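
For readers who want to reproduce this kind of graph-theoretic summary, the sketch below computes the same metrics from a phase lag index (PLI) connectivity matrix with NetworkX. The random matrix and the 0.5 binarization threshold are placeholders, not the study’s data or preprocessing choices.

```python
# Hedged sketch: the table's graph metrics computed from a binarized PLI matrix.
# The random 32-channel matrix and the 0.5 threshold are illustrative assumptions.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
pli = rng.random((32, 32))
pli = (pli + pli.T) / 2                              # symmetric connectivity matrix
np.fill_diagonal(pli, 0)

G = nx.from_numpy_array((pli > 0.5).astype(int))     # threshold to a binary graph

metrics = {
    "GlobalEfficiency": nx.global_efficiency(G),
    "AvgClustering":    nx.average_clustering(G),
    "AvgBetweenness":   np.mean(list(nx.betweenness_centrality(G).values())),
    "AvgCloseness":     np.mean(list(nx.closeness_centrality(G).values())),
    "LocalEfficiency":  nx.local_efficiency(G),
    "AvgDegreeCent":    np.mean(list(nx.degree_centrality(G).values())),
    "AvgEigenCent":     np.mean(list(nx.eigenvector_centrality_numpy(G).values())),
    "AvgShortestPath":  nx.average_shortest_path_length(G),  # needs a connected graph
    # SmallWorldness can be estimated with nx.sigma(G); omitted here because the
    # required random-graph rewiring is slow.
}
print(metrics)
```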

Data ablation results (without feature correlation calculation).

Unfiltered Theta Alpha Delta Beta Acc Pre Rec F1
× × × × 0.5202 0.5286 0.5202 0.5181
× × × 0.4878 0.4982 0.4878 0.4821
× × 0.5218 0.5386 0.5218 0.5182
× 0.6328 0.6453 0.6328 0.6295
0.7234 0.7312 0.7234 0.7201

Data ablation results (with feature correlation calculation).

Unfiltered Theta Alpha Delta Beta Acc Pre Rec F1
× × × × 0.6061 0.6103 0.6061 0.6016
× × × 0.6364 0.6468 0.6364 0.6323
× × 0.7172 0.7448 0.7172 0.7132
× 0.8712 0.8766 0.8712 0.8663
0.9697 0.9722 0.9697 0.9696

Ablation experiment results for model structure.

Baseline   Feature Fusion and Selection   Feature Encoding   Attention Enhancement   Acc      Pre      Rec      F1
√          ×                              ×                  ×                       0.8765   0.8794   0.8765   0.8758
√          √                              ×                  ×                       0.9124   0.9153   0.9124   0.9118
√          √                              √                  ×                       0.9452   0.9481   0.9452   0.9448
√          √                              √                  √                       0.9697   0.9722   0.9697   0.9696

References

1. D’Croz-Baron, D.F.; Bréchet, L.; Baker, M.; Karp, T. Auditory and visual tasks influence the temporal dynamics of EEG microstates during post-encoding rest. Brain Topogr.; 2021; 34, pp. 19-28. [DOI: https://dx.doi.org/10.1007/s10548-020-00802-4]

2. Wolpaw, J.R.; Birbaumer, N.; McFarland, D.J.; Pfurtscheller, G.; Vaughan, T.M. Brain-computer interfaces for communication and control. Clin. Neurophysiol.; 2002; 113, pp. 767-791. [DOI: https://dx.doi.org/10.1016/S1388-2457(02)00057-3]

3. Zhang, H.; Jiao, L.; Yang, S.; Li, H.; Jiang, X.; Feng, J.; Zou, S.; Xu, Q.; Gu, J.; Wang, X. . Brain–computer interfaces: The innovative key to unlocking neurological conditions. Int. J. Surg.; 2024; 110, pp. 5745-5762. [DOI: https://dx.doi.org/10.1097/JS9.0000000000002022]

4. Orban, M.; Elsamanty, M.; Guo, K.; Zhang, S.; Yang, H. A review of brain activity and EEG-based brain–computer interfaces for rehabilitation application. Bioengineering; 2022; 9, 768. [DOI: https://dx.doi.org/10.3390/bioengineering9120768] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36550974]

5. Baspinar, U.; Varol, H.S.; Yildiz, K. Classification of hand movements by using artificial neural network. Proceedings of the 2012 International Symposium on Innovations in Intelligent Systems and Applications; Trabzon, Turkey, 2–4 July 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1-4.

6. Huettel, S.A.; Song, A.W.; McCarthy, G. Functional Magnetic Resonance Imaging; Sinauer Associates: Sunderland, MA, USA, 2009.

7. Luck, S.J. An Introduction to the Event-Related Potential Technique; MIT Press: Cambridge, MA, USA, 2014.

8. Polich, J. Updating P300: An integrative theory of P3a and P3b. Clin. Neurophysiol.; 2007; 118, pp. 2128-2148. [DOI: https://dx.doi.org/10.1016/j.clinph.2007.04.019] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/17573239]

9. Klimesch, W. EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Res. Rev.; 1999; 29, pp. 169-195. [DOI: https://dx.doi.org/10.1016/S0165-0173(98)00056-3] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/10209231]

10. Blankertz, B.; Tomioka, R.; Lemm, S.; Kawanabe, M.; Muller, K.-R. Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Process. Mag.; 2008; 25, pp. 41-56. [DOI: https://dx.doi.org/10.1109/MSP.2008.4408441]

11. Wang, H.; Xie, J.; Liu, J.; Bo, H. Analysis of the activity of the fine motor imagery cortex under different guidance methods. Electron. Meas. Technol.; 2025; 48, pp. 106-113. [DOI: https://dx.doi.org/10.19651/j.cnki.emt.2417178]

12. Liu, B.; Chen, X.; Gao, X. A review of feature extraction methods for EEG-based brain-computer interfaces. Front. Neurosci.; 2020; 14, 589762.

13. Michel, C.M.; Koenig, T. EEG microstates as a tool for studying the temporal dynamics of whole-brain neuronal networks: A review. NeuroImage; 2018; 180, pp. 577-593.

14. Xi, Y.; Zhang, L.; Li, C.; Lv, X.; Lan, Z. Time-frequency feature calculation of multi-stage audiovisual neural processing via electroencephalogram microstates. Front. Neurosci.; 2025; 19, 1643554. [DOI: https://dx.doi.org/10.3389/fnins.2025.1643554]

15. Erciyes, K. Complex Brain Networks: A Graph-Theoretical Analysis. Bioinformatics of the Brain; CRC Press: London, UK, 2024; pp. 224-249.

16. Hutchison, R.M.; Womelsdorf, T.; Allen, E.A.; Bandettini, P.A.; Calhoun, V.D.; Corbetta, M.; Della Penna, S.; Duyn, J.H.; Glover, G.H.; Gonzalez-Castillo, J. . Dynamic functional connectivity: Promise, issues, and interpretations. NeuroImage; 2013; 80, pp. 360-378. [DOI: https://dx.doi.org/10.1016/j.neuroimage.2013.05.079] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23707587]

17. Delorme, A.; Makeig, S. EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods; 2004; 134, pp. 9-21. [DOI: https://dx.doi.org/10.1016/j.jneumeth.2003.10.009] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/15102499]

18. Poulsen, A.T.; Pedroni, A.; Langer, N.; Hansen, L.K. Microstate eeglab toolbox: An introductory guide. bioRxiv; 2018; [DOI: https://dx.doi.org/10.1101/289850]

19. Nagabhushan Kalburgi, S.; Kleinert, T.; Aryan, D.; Nash, K.; Schiller, B.; Koenig, T. MICROSTATELAB: The EEGLAB toolbox for resting-state microstate analysis. Brain Topogr.; 2024; 37, pp. 621-645. [DOI: https://dx.doi.org/10.1007/s10548-023-01003-5]

20. Stam, C.J.; van Straaten, E.C.W. Go with the flow: Use of a directed phase lag index (dPLI) to characterize patterns of phase relations in a large-scale model of brain dynamics. Neuroimage; 2012; 62, pp. 1415-1428. [DOI: https://dx.doi.org/10.1016/j.neuroimage.2012.05.050] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22634858]

21. Taud, H.; Mas, J.F. Multilayer perceptron (MLP). Geomatic Approaches for Modeling Land Change Scenarios; Springer International Publishing: Cham, Switzerland, 2017; pp. 451-455.

22. Wang, Q.; Huang, J.; Meng, Y.; Shen, T. DF2Net: Differential feature fusion network for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.; 2024; 17, pp. 10660-10673. [DOI: https://dx.doi.org/10.1109/JSTARS.2024.3403863]

23. Kingphai, K.; Moshfeghi, Y. On time series cross-validation for deep learning classification model of mental workload levels based on EEG signals. Proceedings of the International Conference on Machine Learning, Optimization, and Data Science; Tuscany, Italy, 18–22 September 2022; Springer Nature: Cham, Switzerland, 2022; pp. 402-416.

24. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4−9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2018.

25. Jajcay, N.; Hlinka, J. Towards a dynamical understanding of microstate analysis of M/EEG data. NeuroImage; 2023; 281, 120371. [DOI: https://dx.doi.org/10.1016/j.neuroimage.2023.120371]

26. Asha, S.A.; Sudalaimani, C.; Devanand, P.; Alexander, G.; Arya, M.L.; Sanjeev, V.T.; Ramshekhar, N.M. Analysis of EEG microstates as biomarkers in neuropsychological processes—Review. Comput. Biol. Med.; 2024; 173, 108266.

27. Perrottelli, A.; Giordano, G.M.; Koenig, T.; Caporusso, E.; Giuliani, L.; Pezzella, P.; Bucci, P.; Mucci, A.; Galderisi, S. Electrophysiological Correlates of Reward Anticipation in Subjects with Schizophrenia: An ERP Microstate Study. Brain Topogr.; 2024; 37, pp. 571-589.

28. Zuo, Y.; Wang, Z. Neural oscillations and multisensory processing. Advances of Multisensory Integration in the Brain; Springer Nature: Singapore, 2024; pp. 121-137.

29. Zhan, C.; Wang, Q.; Wang, W.; Lu, X.; Fei, S.; Chen, Z.; Chen, Y. Theta functional connectivity alterations related to executive control in refractory temporal lobe epilepsy. Epilepsy Res.; 2025; 217, 107620. [DOI: https://dx.doi.org/10.1016/j.eplepsyres.2025.107620] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/40609118]

30. Wang, B.; Li, P.; Li, D.; Niu, Y.; Yan, T.; Li, T.; Cao, R.; Yan, P.; Guo, Y.; Yang, W. . Increased functional brain network efficiency during audiovisual temporal asynchrony integration task in aging. Front. Aging Neurosci.; 2018; 10, 316. [DOI: https://dx.doi.org/10.3389/fnagi.2018.00316] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30356825]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).