Abstract

Beyond its utilitarian purpose, architecture possesses profound aesthetic value: buildings serve not only as structures to inhabit but also as spaces to be perceived, interpreted, and appreciated. Unity and variety are among the predictors of aesthetic preference; they are traditionally viewed as oppositional yet are jointly predictive. The present research investigates how unity and variety jointly predict aesthetic preference in the domain of Chinese carved windows and how these relationships change from isolated elements to window–background compositions. Two quasi-experimental studies were conducted. Study 1 asked participants to rate ten digitally rendered Chinese windows on unity, variety, and aesthetic preference. Study 2 used the same ten windows embedded in either a stylistically congruent Chinese background or an incongruent Western background. Across both studies (N = 797), results showed that unity and variety were negatively correlated yet both positively associated with aesthetic preference, with unity exerting the stronger predictive influence. The trade-off between unity and variety decreased once windows were placed in backgrounds, and stylistically incongruent Western backgrounds reliably suppressed perceived unity and inflated perceived variety. The findings are consistent with the Unity-in-Variety principle and highlight how architectural context influences unity–variety judgments. The studies offer a step toward a more nuanced, architecture-specific understanding of unity and variety in aesthetic preference.

1. Introduction

Aesthetic judgment lies at the core of visual experience: it is activated almost instantaneously and shapes perception and subsequent emotion from the very first glance. Within a very short time of seeing an object, the brain has already fused cues such as order, complexity, and familiarity into a preference signal, without conscious awareness [1,2,3]. Because this judgment steers a cascade of downstream behaviors, such as approach or avoidance movements [4] and product choice [5], understanding what elicits preference, and how, is important in both theory and practice.

Cognitive psychology suggests that aesthetic judgments rely on perceptual heuristics, notably the detection of unity and variety. Unity promotes perceptual fluency through consistency, repetition, and structural coherence [2], whereas variety sustains engagement by introducing contrast, novelty, and complexity [6,7]. These two forces, though often viewed as oppositional, jointly shape how visual structures are evaluated and judged. Yet their interaction is rarely symmetrical: excessive unity risks monotony; excessive variety risks cognitive overload. This tension underlies a principle of beauty, Unity-in-Variety (UiV), in which optimal aesthetics arise when variety is organized within unity [8].

Despite the long-standing theoretical importance of unity and variety, some critical gaps remain unresolved in aesthetic research, particularly concerning architectural perception. Empirical UiV studies have mostly used single-layer, context-free stimuli, such as abstract patterns [9], artworks [10], and consumer products [8,11], which do not reflect the nested part–whole relationships inherent in architecture. Architectural perception unfolds across multiple spatial levels, from local element details to global composition [12,13]. As a result, aesthetic judgments emerge not from isolated features but from their locations within hierarchical configurations [14]. To date, the UiV research has not addressed this multi-level architectural structure, leaving unclear whether findings from object-based studies generalize to building environments.

A second gap concerns the role of contextual congruence in architecture. As noted above, architecture is rarely perceived in isolation; elements are embedded within larger stylistic contexts. Although context can alter visual integration and perceived processing ease, and may cue learned style schemas (i.e., prototypes [3,15]), previous UiV work has paid little attention to how stylistically congruent versus incongruent contexts reshape perceived unity and variety in architecture. It therefore remains unclear how UiV operates when an element is embedded in a congruent versus an incongruent background. Understanding how unity and variety interact within nested contexts, and how their relative weights predict aesthetic preference, is essential for a more nuanced theory of architectural aesthetics.

Finally, the relative predictive strength of unity and variety, and the trade-off between them, may change across hierarchical levels of perception, but this possibility has not been tested directly. Previous work typically examines stimuli at a single level. It therefore remains unknown whether the relationship between unity and variety, and their predictive effects on aesthetic preference, change from element to composition.

Addressing these gaps, the present research investigates how unity and variety jointly predict aesthetic preference for architectural components across hierarchical levels (isolated windows vs. window–background compositions) and across congruent versus incongruent contextual settings. We aim to clarify whether UiV generalizes to architecture, how contextual congruence influences perceived unity and variety, and whether the balance between unity and variety changes from element to composition.

Specifically, this study aims to address the following research questions:

RQ 1: For the individual element, a window, what is the relationship between perceived unity and variety, and how does each of these perceptions predict aesthetic preference?

RQ 2: To what extent does adding the background impact the perceived unity and variety compared with viewing the individual window?

RQ 3: Does the background influence (a) the relationship between perceived unity and variety and (b) the predictive effects of unity and variety on aesthetic preference?

The present research operationalizes architectural hierarchy at two adjacent levels. Study 1 focuses on isolated Chinese carved windows as individual elements (element level), examining the relationship between unity and variety and how each predicts aesthetic preference without contextual information. Study 2 then embeds the same windows into a stylistically congruent (Chinese) or incongruent (Western) background, creating window–background compositions (composition level). This stepwise design allows us to examine how adding a minimal yet meaningful architectural context influences unity and variety perceptions and their predictive effects on aesthetic preference, rather than attempting to cover the entire hierarchy from elements to complete building environments in a single study.

As visual attention and scene processing show systematic cultural variability, the unity–variety balance observed in one cultural group should not be assumed to generalize universally. Accordingly, we treat cross-cultural generalization as an open empirical question and focus here on characterizing unity/variety–preference relations within a homogeneous sample to maximize internal validity.

Although prototype/typicality and processing fluency theories are relevant to architectural aesthetic judgment, the present studies do not measure typicality, mental prototypes, or fluency indices. Therefore, we treat these mechanisms as explanatory hypotheses and focus on the observed relationships among perceived unity, perceived variety, and aesthetic preference.

2. Literature Review

2.1. The Importance of Aesthetics in Architecture

Even though architecture has long been valued and defined by its utilitarian capacity to shelter, structure, and support human activity, its significance extends far beyond physical function. The aesthetic dimension is equally vital, as highlighted by aesthetic response theory [16]. This recognition has broadened the understanding of buildings from merely functional places of inhabitation to aesthetic and psychological experiences that shape how people feel, think, and relate to space [17,18]. Thus, the aesthetic value of buildings should not be a secondary consideration.

The significance of aesthetic value extends beyond mere visual pleasure. Studies have shown that aesthetically coherent environments can improve cognitive performance and reduce stress, highlighting the psychological benefits of thoughtfully designed spaces [19]. Neuroscientific findings corroborate these insights, demonstrating that aesthetically appealing architectural designs activate neural regions associated with emotional reward and cognitive satisfaction, whereas visually chaotic environments evoke negative emotional and cognitive responses [4]. These empirical findings strengthen theoretical assertions about aesthetics’ integral role in shaping human experiences within built environments.

Importantly, aesthetics and function should not be viewed as dichotomous; aesthetic value arises organically from the resolution of functional demands and vice versa [20]. Evidence shows that visible functional features (e.g., clear circulation paths) elevate aesthetic experience [19]. Spence [21] emphasizes the joint role of the visual and the functional in shaping emotional and cognitive responses. When people can effortlessly grasp how to move, rest, or interact in a space, sensorimotor fluency elicits pleasure much like perceptual fluency does in the visual domain [2]. This suggests that functional information can be conveyed visually [19,22]. Similarly, attributes such as scale, proportion, rhythm, materiality, and particularly the dynamic interplay of light and shadow jointly affect how spaces are perceived and experienced, illustrating the inseparability of aesthetic and functional qualities [23]. Across many kinds of architecture, visual coherence and expressive form emerge from the resolution of functional demands. As Spence [21] argues, functional clarity enhances perceptual legibility, which in turn supports aesthetic value. Meanwhile, formal choices such as material articulation, spatial rhythm, and proportional balance can facilitate wayfinding, encourage comfort, and shape user behavior [24]. Architecture achieves both utility and beauty not by isolating these concerns but by aligning them through integrated design logic.

2.2. Unity and Variety in Preference Prediction

Architectural aesthetics has long been concerned with the principles by which architectural forms are perceived as aesthetically compelling, and Gestalt psychology provides a foundational lens for understanding them. The Gestalt laws of perceptual organization, including proximity, similarity, continuity, and closure, highlight the human tendency to simplify and structure sensory input into coherent wholes. This principle of perceptual economy suggests that people naturally seek order and unity in complex visual scenes, reducing cognitive load by integrating diverse elements into an organized configuration [25,26]. Therefore, unity has been widely recognized as a foundational predictor in both traditional and contemporary design theory [8,27]. Unity refers to the perception of a design as an integrated whole, often achieved through repetition, symmetry, alignment, and formal continuity [8].

In the architectural context, unity enables fluency, emphasizes structural logic, and contributes to a sense of order and calm. For instance, the use of consistent proportions, axial alignment, and modular repetition in architecture exemplifies how unity shapes visual organization and spatial experience. Nasar [28] found that visual order and clarity were positively associated with perceived beauty in urban street scenes and facades, supporting the idea that unity facilitates environmental comprehension. Similarly, Vartanian et al. [4] showed that architectural interiors with contour coherence and curvilinear features elicited more favorable affective responses than those with disjointed or angular geometry. These studies highlight the significance of perceptual fluency in influencing aesthetic judgments of built environments, particularly when spatial arrangements promote structural continuity.

However, because our environment is made up of diverse elements, our senses have evolved to cope precisely with this variety of information. If perceptual input were overly unified, the senses would become dulled [7,29]. Variety refers to visual contrast, rhythm, and complexity, which break monotony and enhance perceptual engagement [1,7,8,11]. Research has examined the role of visual complexity and variety in enhancing architectural appeal. Gifford et al. [30] reported that impactful levels of visual complexity, interpreted as variation in material, color, and form, are preferred. Similarly, Hagerhall et al. [31] found that fractal complexity in naturalistic architectural features could predict viewer preference, suggesting that variety can be aesthetically rewarding when bounded by coherent patterns. Variety achieves preference through deliberate deviations in form, color, texture, or spatial arrangement [8,11]. In architecture, variety manifests in the use of asymmetry, material differentiation, or articulation of secondary elements, contributing to richness and dynamism in visual experience [28].

Empirical studies have offered cognitive explanations for why unity and variety matter to aesthetic preference. The processing fluency theory suggests that viewers tend to prefer stimuli that are perceptually easy to process. Unity enhances such fluency by providing regularity and predictability, which reduces mental effort and fosters positive affect [2,32]. Conversely, Berlyne’s [7] Collative-Motivation model argues that variety maintains attention and interest by introducing complexity and avoiding redundancy. Variety raises arousal and enhances aesthetic preference [8]. Although often viewed as opposed, unity and variety are not mutually exclusive but coexist and are jointly predictive of aesthetic preference. Empirical studies demonstrate that aesthetic preference peaks when unity and variety are successfully balanced, as stated by the UiV principle, one of the oldest known concepts in aesthetics [8]. This principle posits that optimal visual experiences arise when elements are diverse enough to sustain engagement but sufficiently organized to form a coherent whole. Empirical studies of car keys [11], paintings [10], and abstract visual forms [33] consistently support this principle, demonstrating that neither unity nor variety alone predicts preference as strongly as their interaction.

Cross-cultural empirical aesthetics suggests that core aesthetic tendencies may be partly shared, yet the perceptual weighting of visual information can vary systematically across cultures [34]. Cross-cultural cognition research often distinguishes holistic versus analytic processing: East Asian observers tend to attend more to the overall field and contextual relations, whereas Western observers tend to focus more on focal objects and their attributes [35,36]. These differences are not only conceptual but also observable in visual behavior. For example, eye-tracking studies of scene perception show that American participants fixate earlier and longer on focal objects, whereas Chinese participants make more saccades to background regions, indicating stronger context sensitivity [37]. Complementary evidence from change-blindness paradigms likewise suggests cultural variation in sensitivity to contextual versus focal changes [36]. In environmental aesthetics, cross-cultural comparisons also report both convergence and divergence: people across cultures often prefer orderly streetscapes, yet Japanese and U.S. participants can differ in evaluations of the same urban street scenes [38]. Critical reviews of cross-cultural empirical aesthetics further conclude that aesthetic principles may show regularity, but effect sizes and boundary conditions vary by culture, task, and stimulus domain [34,39]. For architectural aesthetics, this implies that a preferred balance of unity and variety is unlikely to be fixed: culturally learned architectural schemas (style prototypes) and culturally shaped attention to context may shift which cues are treated as the organizing structure (unity) versus decorative differentiation (variety) [34,35]. Therefore, findings based on a single cultural group should be generalized across cultures with caution.

Taken together, although the UiV principle has been investigated in several domains, its transfer to architecture is rarely examined. The complex context dependency of architectural perception [40,41], together with evidence that attentional weighting of focal versus contextual information may vary across cultures, suggests that the unity–variety balance associated with preference may not transfer straightforwardly to architecture. Addressing this gap, the present study empirically investigates the UiV relationship in architectural stimuli under systematically varied contexts, thereby clarifying whether and how the UiV principle operates in architectural aesthetics.

2.3. The Complex Context in Architecture

Visual perception in architecture is inherently hierarchical and unfolds across a nested set of spatial relationships that link local element details to global composition. As a result, unity and variety in architecture are judged not by isolated components but by how those components contribute to overall order. Classic Gestalt studies demonstrate that people extract global structural information before focusing on finer details and then dynamically alternate their perceptual focus between parts and wholes [6,12,13,14]. This multilevel perceptual system has been documented in façade perception, where attention cycles between broader compositional rhythms and individual elements [42]. Unlike flat images or isolated products, architectural scenes are complex, and observers rarely evaluate elements in isolation; instead, they perceive nested relationships between parts and wholes [14]. As such, attention is distributed across the various elements, and certain elements become less dominant [19,43,44].

Even if certain elements become less dominant, they still contribute to the overall composition. Elements that are prominent but contribute to the overall order within a composition are processed fluently, which can enhance their influence on preference [4,44,45]. Conversely, inconsistent elements can impair perceptual fluency, leading to reduced aesthetic engagement. The perceptual weight of any element changes with its compositional context [33], and preference ultimately reflects the integration of both bottom-up stimulus features and top-down contextual cues [41].

In architecture, this kind of multilevel perception is always accompanied by the interaction of unity and variety, and in particular by a compression of perceived variety. When judging unity and variety in architecture, people rely less on strict element-to-element comparisons than on how well a building instantiates a prototype, an internalized schema for what a building normally looks like [15]. As architectural prototypes already include multi-material assemblies (e.g., brick walls, wooden doors, glass windows, and tiled roofs), observers accept variety to some degree as long as the elements remain within the prototype’s tolerance band. That is, the heterogeneity of elements in architecture is processed, to some degree, as ‘variety within the prototype’ and is thereby judged as unity [12,13,46]. One explanation lies in utilitarian capacity itself: a building is a multi-layered system whose subsystems must meet distinct structural, environmental, and functional requirements. Doors and windows must admit light and ventilation, and façades must shed water and bear loads, all tasks that inherently call for different materials and construction logics. When the brain recognizes this, the visual system effectively compresses some conflicts during coherence judgments.

In the limited empirical research on the UiV principle, most studies have treated unity and variety in object-based, context-free settings, such as single products, and have overlooked these multilevel perception dynamics. A light [8] or a car key [11] can be conceived as a single-layer product composed of a limited set of elements. Because shape, color, and material often occupy the same visual plane, any abrupt material or chromatic shift competes directly with diagnostic form cues, increasing or decreasing perceived unity or variety and, in turn, aesthetic preference [47,48]. Thus, while the UiV principle can guide such product design, its applicability to architecture, where variety is to some degree structurally built into the prototype, is questionable.

Even though some research has employed stimuli that involve multilevel perception, such as websites [39], it remains doubtful whether its findings apply to architecture. Websites do share some characteristics with architecture: both organize discrete elements, such as headers, navigation bars, and content panes, into a hierarchy, much like doors and windows on a façade. However, the two types of stimuli differ in important ways. A website is a highly interactive digital medium displayed on a computer or smartphone and is often linked to hedonic value: users expect immediate emotional rewards from novel visual language and dynamic effects [49,50,51]. Architecture also has hedonic value, but utilitarian value is considered first, with an emphasis on reliability, safety, and durability. Furthermore, prototypes mature at very different tempos across objects, and they exert far weaker constraints on webpage aesthetics than on architecture. The web is only a few decades old, and empirical studies show that users’ mental model of a prototypical webpage evolves swiftly with each new convention [52]. By contrast, architectural prototypes have accrued over centuries of structural necessity and cultural codification.

Crucially, this raises specific challenges for applying the UiV principle to architecture. Meeting them requires analyzing not only whether stimuli are perceived as unified or varied, but also where in the hierarchy such perceptions occur and how they interact across levels. The present study addresses this gap by analyzing aesthetic preference at two levels of the visual hierarchy, individual architectural elements (windows) and integrated compositions (windows and backgrounds), to better investigate the UiV principle.

In conclusion, these insights imply that the UiV principle in architecture must be examined across nested perceptual levels rather than at a single, isolated scale. Accordingly, the present study adopts a two-step empirical strategy: first, testing unity–variety–preference relationships at the element level using isolated windows (Study 1), and second, examining how these relationships change when the same elements are embedded in window–background compositions (Study 2).

3. Study 1: Element-Level Investigation of Chinese Carved Window

The purpose of Study 1 is to explore the relationship between unity and variety and how each influences aesthetic preference for an individual element, a window, using questionnaires. Study 1 uses a within-subjects experimental design: each participant rates all stimuli, which allows direct comparisons between conditions within the same individuals while minimizing the influence of individual differences. This is a common design in previous studies of aesthetic preference [1,53,54]. The research project was launched after approval was obtained from the Ethics Review Committee. Participant recruitment, screening, and data collection spanned from 17 May 2025 to 16 June 2025 and complied with ethical review requirements.

3.1. Stimuli

Chinese carved windows were chosen as stimuli in Study 1 because their unity and variety can be readily controlled [55,56] and because they have distinctive aesthetic characteristics. Their clear hierarchical boundaries help identify the Chinese carved window as an individual element; meanwhile, they are rich in unity and variety, which helps them fit visibly into a larger composition without discordance [57]. Given this, digital Chinese carved window stimuli were created. While these digitally rendered stimuli inevitably simplify real architectural scenes, this approach was chosen deliberately to maximize internal validity. By standardizing viewpoint, luminance, color, and surrounding context, we were able to isolate unity- and variety-related formal properties and minimize confounds from material textures, shadows, or neighboring façade elements. Such controlled stimuli are common in experimental aesthetics.

The stimuli were developed to represent varied levels of unity and variety, operationalized through controlled manipulation of visual features such as symmetry, repetition, complexity, and motif consistency. Following prior theoretical frameworks [6,7], unity and variety and their related concepts were defined and manipulated as follows.

High-unity windows were designed with strong symmetry, consistent geometric frameworks, and repeated motif patterns (e.g., identical floral or lattice shapes). Visual elements were evenly distributed across the frame to convey harmony and perceptual coherence. High unity was achieved by minimizing variation in line weight, shape orientation, and motif scale.

High-variety windows featured diverse visual motifs, asymmetrical structures, or irregular geometric patterns. Elements differed in shape, size, direction, or complexity to evoke visual stimulation and complexity. While still inspired by traditional carving styles, high-variety stimuli deliberately break pattern consistency to maximize perceptual richness.

A preliminary set of high-resolution images with a wide range of unity and variety was established based on the criteria, resulting in a total of approximately 24 stimuli.

All images were then standardized using Adobe Illustrator (version 26.2; Adobe Inc., San Jose, CA, USA) and Photoshop (version 24.2; Adobe Inc., San Jose, CA, USA) to eliminate contextual noise and ensure consistent presentation. Each stimulus was presented against a neutral background at a fixed size and resolution to minimize extraneous perceptual influences such as lighting, scale, or surrounding architectural elements. All images were converted to grayscale to remove potential confounds introduced by color and to focus participants’ attention on form and composition alone.

Before the main experiment, a pilot study was conducted for stimulus validation. A separate group of participants (n = 40), who were excluded from the main experiment, rated each stimulus on perceived unity and perceived variety using 7-point Likert scales. Stimuli that were reliably rated as high in unity or high in variety, according to a predefined threshold (mean rating ≥ 5 on the target dimension), were retained for the final set. In addition, to reduce the risk that a structural trade-off between unity and variety would fully determine the observed relationships, we deliberately retained several stimuli that received relatively high ratings on both unity and variety, ensuring that the final set covered a broad combination of unity–variety configurations. The final set included 10 window stimuli, as shown in Figure 1.
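The threshold rule used in the pilot can be sketched in a few lines of Python. This is a minimal illustration only: the stimulus ids, the ratings, and the function name `screen_stimuli` are hypothetical, and the sketch covers only the mean-rating threshold, not the deliberate retention of stimuli rated high on both dimensions.

```python
from statistics import mean

THRESHOLD = 5.0  # mean rating >= 5 on the target dimension (7-point scale)


def screen_stimuli(pilot_ratings, targets, threshold=THRESHOLD):
    """Return ids of stimuli whose mean pilot rating on the target dimension meets the threshold.

    pilot_ratings: {stimulus_id: {"unity": [ratings], "variety": [ratings]}}
    targets:       {stimulus_id: "unity" or "variety"} (the dimension the stimulus was designed for)
    """
    retained = []
    for stim_id, ratings in pilot_ratings.items():
        target_dim = targets[stim_id]
        if mean(ratings[target_dim]) >= threshold:
            retained.append(stim_id)
    return retained


# Hypothetical ratings from four pilot raters (not the study's data)
pilot = {
    "w01": {"unity": [6, 6, 5, 7], "variety": [2, 3, 2, 3]},
    "w02": {"unity": [4, 5, 4, 4], "variety": [3, 4, 3, 4]},
    "w03": {"unity": [3, 2, 3, 2], "variety": [6, 5, 6, 6]},
}
targets = {"w01": "unity", "w02": "unity", "w03": "variety"}

kept = screen_stimuli(pilot, targets)
print(kept)
```

In this toy example, the second stimulus falls below the threshold on its target dimension and is dropped, while the other two are retained.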

3.2. Study 1 Instrument

Study 1 used a closed-ended questionnaire. Following the informed consent form, the formal questionnaire consisted of two sections. The first section collected demographic information (three questions: gender, major, and age). The second section covered aesthetic judgment: participants viewed each stimulus image and then rated its unity, variety, and aesthetic preference.

The questionnaire used ten Chinese carved windows as the stimulus images. The stimuli were designed and drawn by the researchers. Each stimulus window image was presented at an identical size, resolution, and viewing angle, with uniform backgrounds (neutral colors and no contextual details), removing potential confounds related to surrounding architectural features or environmental contexts. This selection aims to balance the presentation of both the unity and variety of the windows, ensuring a more controlled evaluation of aesthetic preferences in the formal study. Furthermore, the images were presented in random order to avoid order effects. In this section, only the stimulus images were shown, with no accompanying text introduction or guide.
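Random presentation order of this kind can be sketched as follows. The source states only that the images appeared in random order; the per-participant seeding scheme, the stimulus ids, and the function name `presentation_order` are assumptions made for illustration.

```python
import random

# Ten window images; the ids are hypothetical placeholders
STIMULI = [f"window_{i:02d}" for i in range(1, 11)]


def presentation_order(participant_id, stimuli=STIMULI, base_seed="study1"):
    """Return an independently shuffled copy of the stimulus list for one participant."""
    # Seeding per participant makes each order reproducible for later auditing (an assumption)
    rng = random.Random(f"{base_seed}-{participant_id}")
    order = list(stimuli)  # copy, so the master list is never mutated
    rng.shuffle(order)
    return order


order_a = presentation_order("P001")
order_b = presentation_order("P001")
print(order_a)
```

Each participant sees all ten windows exactly once, and calling the function again with the same id reproduces the same order.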

The questionnaire scales were drawn from existing studies. To collect participants’ assessments of the unity, variety, and aesthetic preference of the windows, the ‘pleasing to see’ seven-point Likert scale, developed and validated by Blijlevens et al. [58], was employed in this study. This scale is available in both English and Chinese versions, which eliminated the need for translation, preserved language integrity, and reduced the risk of participants misunderstanding the items. Aesthetic preference was assessed using three statements: ‘This is a beautiful window,’ ‘This is an attractive window,’ and ‘This window is pleasing to see.’ Unity was assessed with three statements: ‘This is a unified window,’ ‘This is an orderly window,’ and ‘This is a coherent window.’ Variety was assessed with three statements: ‘This window is made of different parts,’ ‘This window conveys variety,’ and ‘This window is rich in elements.’ Each image was assessed on these nine statements, and participants rated their agreement with each statement using a seven-point Likert scale.
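One plausible way to turn the nine item ratings into three scale scores is to average the three items of each scale. The sketch below assumes this averaging rule, which the text does not state explicitly; the item keys and the function name `score_response` are hypothetical labels for the nine statements.

```python
from statistics import mean

# Hypothetical short keys for the nine statements, grouped by scale
SCALES = {
    "preference": ("beautiful", "attractive", "pleasing"),
    "unity": ("unified", "orderly", "coherent"),
    "variety": ("different_parts", "conveys_variety", "rich_in_elements"),
}


def score_response(item_ratings):
    """Average the three items of each scale for one participant and one window.

    item_ratings: {item_key: rating on the 1-7 Likert scale}
    """
    for item, value in item_ratings.items():
        if not 1 <= value <= 7:
            raise ValueError(f"rating for {item!r} is outside the 1-7 range: {value}")
    return {
        scale: mean(item_ratings[item] for item in items)
        for scale, items in SCALES.items()
    }


# Hypothetical ratings for a single window (not the study's data)
ratings = {
    "beautiful": 6, "attractive": 5, "pleasing": 7,
    "unified": 6, "orderly": 7, "coherent": 5,
    "different_parts": 3, "conveys_variety": 2, "rich_in_elements": 4,
}
scores = score_response(ratings)
print(scores)
```

With ten windows per participant, this function would be applied ten times, giving one unity, one variety, and one preference score per window.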

This study adapted the questionnaire from Blijlevens et al. [58]. A pilot study was conducted on 19 May 2025 to ensure the usability of the scale. The Cronbach’s Alpha coefficient for each dimension was above 0.826, indicating good reliability. Additionally, the Kaiser–Meyer–Olkin (KMO) value was 0.806, and Bartlett’s test of sphericity was significant (p < 0.01), indicating good validity.
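For reference, Cronbach's alpha for a k-item scale is k/(k − 1) × (1 − Σ item variances / variance of the summed score). The sketch below implements this standard formula with the Python standard library; the example responses are hypothetical, and the software actually used by the authors is not stated in the text.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: list of respondents, each a list of k item scores for the same scale.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_columns = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in item_columns)  # sample variances
    total_var = variance([sum(row) for row in rows])  # variance of summed scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses of five pilot raters to the three unity items
unity_items = [
    [6, 6, 7],
    [5, 5, 5],
    [3, 4, 3],
    [7, 6, 7],
    [2, 3, 2],
]
alpha = cronbach_alpha(unity_items)
print(round(alpha, 3))
```

The same function would be run once per dimension (unity, variety, aesthetic preference) to obtain the per-dimension coefficients.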

3.3. Study 1 Participants

To ensure adequate statistical power, the goal was to obtain a minimum of 384 valid samples [59]. Recruitment targeted residents from fields unrelated to art and architecture, because art or design training may bias aesthetic evaluations [60,61]. This sampling choice increases internal consistency within a single cultural context, but it also defines a boundary condition: the observed effects may differ in samples from other cultural backgrounds or in participants with formal art/design training. Recruitment took place in Guangzhou City, China, with participants randomly recruited from 5 districts. Local volunteers assisted by distributing the recruitment documents through local social media to ensure broad participation. Participation was voluntary, and all participants received financial compensation, which was self-funded by the authors. Of the 482 participants recruited, valid data were collected from 401, with 81 excluded for the following reasons: (1) missing ratings for one or more stimuli; (2) ≥6 consecutive questions with the same score anywhere in the questionnaire; or (3) alternating extreme responses, defined as more than 6 adjacent transitions with |Δ| ≥ 5 across the questionnaire (see the Supplementary Materials). Among valid samples, participants ranged in age from 19 to 40 (M = 24.34, SD = 3.16), with a roughly equal gender distribution: 211 males (52.62%) and 190 females (47.38%).
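The figure of 384 is commonly derived from Cochran's sample-size formula for a proportion, n₀ = z²p(1 − p)/e², with a 95% confidence level, maximum variability, and a 5% margin of error. Whether reference [59] uses exactly this derivation is an assumption; the sketch below only shows that these conventional inputs reproduce the number.

```python
import math


def cochran_sample_size(z=1.96, p=0.5, e=0.05):
    """Cochran's formula for a large population: n0 = z^2 * p * (1 - p) / e^2.

    z: z-score for the confidence level (1.96 for 95%)
    p: assumed proportion (0.5 gives the most conservative, largest n0)
    e: margin of error
    """
    return z ** 2 * p * (1 - p) / e ** 2


n0 = cochran_sample_size()
n_truncated = math.floor(n0)  # the widely quoted whole-number figure
print(round(n0, 2), n_truncated)
```

Under these inputs, the 401 valid responses in Study 1 exceed the target.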

3.4. Study 1 Data Collection Procedure

Data collection took place in the university computer lab. The monitors were matched in brand and size, and color differences between them were controlled. The 482 participants were divided into ten groups for data collection. Before this study began, each participant was given a detailed informed consent form. The form included details about this study’s purpose, procedure, participants’ rights, and data confidentiality. It was emphasized that participation was entirely voluntary and that participants could withdraw from this study at any time. After fully understanding the study content and having no further questions, participants signed the informed consent form, indicating their agreement to participate and comply with the study guidelines.

Participants first viewed a slideshow of the full image pool to establish a common frame of reference. After that, participants began the formal questionnaire. Each participant used a dedicated computer equipped with a 24-inch high-definition monitor. Throughout the experiment, there were no external disturbances, and discussions and interactions were not allowed. Participants answered demographic questions before viewing the images. After that, each image and its questions were displayed on the same page, and participants assessed the images and then answered the questions. No strict time limit was imposed for completing the questionnaire to reduce participant stress and encourage thoughtful responses. To ensure data consistency and reliability, participants were required to complete the survey without interruption. This approach helped ensure that participants maintained a clear memory of the images, thus improving the accuracy of their responses.

3.5. Study 1 Data Analysis

The collected data was anonymized during the preliminary processing stage to ensure participants’ privacy and information security. Responses were entered directly by participants on laboratory computers. Two research assistants independently exported and verified the raw datasets. We also ran consistency checks. Questionnaires were excluded if (i) any response was missing for any of the rating questions; (ii) the questionnaire contained ≥6 consecutive questions with the same score; or (iii) it showed alternating extreme responses, defined as more than 6 adjacent transitions with |Δ| ≥ 5 across the questionnaire. We additionally verified that all item responses fell within the valid range and screened for duplicate records based on participant IDs. Cleaned, analysis-ready datasets were stored on a secure server together with a processing log, as shown in the Supplementary Materials.
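The three exclusion rules can be expressed as a short screening function. The following is a minimal sketch, not the authors’ processing script: the function name `should_exclude`, its parameter names, and the representation of a missing rating as `None` are assumptions made for illustration, and the thresholds are those stated in the text.

```python
from typing import Optional, Sequence


def should_exclude(ratings: Sequence[Optional[int]],
                   min_run: int = 6, jump: int = 5, max_jumps: int = 6) -> bool:
    """Return True if a questionnaire meets any of the three exclusion rules.

    `ratings` is one participant's responses in presentation order;
    a missing response is represented as None (an assumption of this sketch).
    """
    # Rule (i): any missing rating.
    if any(r is None for r in ratings):
        return True

    # Rule (ii): a run of at least `min_run` consecutive identical scores.
    run = longest = 1 if ratings else 0
    for prev, cur in zip(ratings, ratings[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    if longest >= min_run:
        return True

    # Rule (iii): more than `max_jumps` adjacent transitions with |delta| >= jump.
    jumps = sum(abs(cur - prev) >= jump for prev, cur in zip(ratings, ratings[1:]))
    return jumps > max_jumps
```

For example, six identical scores in a row trigger rule (ii), whereas a run of five does not; rule (iii) fires only when the count of extreme transitions is strictly greater than 6.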

Pearson correlation analysis was used for preliminary tests of association. For inferential analyses, we fitted a linear mixed-effects model (LMM) with crossed random intercepts for participants and stimuli, which accounts for the non-independence of repeated ratings. The LMM estimates how strongly each independent variable (unity and variety) explains the dependent variable (aesthetic preference), and the resulting B coefficients allow the strength of these effects to be compared. Reported Pearson correlations served as preliminary, descriptive indices of association, whereas the primary inferential tests of our hypotheses were conducted using the LMM. All analyses were conducted in R (version 4.5.1) using the lme4 package, and significance was evaluated using Satterthwaite-approximated degrees of freedom.

3.6. Study 1 Result

The reliability analysis of the questionnaire, indicated by Cronbach’s Alpha coefficient, is presented in Table 1, showing good data reliability. Additionally, the KMO value was 0.785, and Bartlett’s test of sphericity was significant (p < 0.001), indicating good data validity.

The results of the participants’ ratings of the scales were summarized, including the mean unity, variety, and aesthetic preference scores for each window, as shown in Figure 2 and Table 2.

Before the main analyses, Pearson correlations and rank-order consistency were calculated to verify that stimulus ratings were consistent between the stimulus selection phase and the formal study. Pearson correlations indicated strong correspondence between phases for both unity, r (8) = 0.945, p < 0.001, and variety, r (8) = 0.960, p < 0.001. Rank-order consistency was similarly high (Spearman ρ), with ρ (8) = 0.982, p < 0.001 for unity and ρ (8) = 0.948, p < 0.001 for variety. Together, these results confirm that the stimuli were rated highly consistently across the stimulus selection phase and the formal study.
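The two consistency indices differ only in what they correlate: Pearson’s r uses the stimulus means themselves, whereas Spearman’s ρ applies the same formula to their ranks. The sketch below illustrates this with hand-written helper functions (`pearson`, `ranks`, `spearman` are illustrative names, and the five stimulus means in the usage note are hypothetical, not the study’s data).

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

With hypothetical phase means such as `[4.1, 5.2, 3.8, 4.9, 5.5]` and `[4.3, 5.0, 3.9, 4.8, 5.6]`, the ordering of stimuli is identical in both phases, so ρ reaches its maximum even though the means themselves differ slightly.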

Pearson correlation analysis was used as a preliminary test to explore the relationships among the variables, as shown in Table 3. The results showed that aesthetic preference was positively related to unity, r = 0.28, p < 0.001, and weakly but significantly related to variety, r = 0.08, p < 0.001. Unity and variety were moderately and inversely associated, r = −0.38, p < 0.001. When unity was controlled, the partial correlation between aesthetic preference and variety remained significant, r = 0.20, p < 0.001. When variety was controlled, the relationship between aesthetic preference and unity remained significant and increased in magnitude, r = 0.33, p < 0.001. Together, these results indicate that perceived unity and variety each make independent contributions to aesthetic preference, while unity and variety themselves tend to suppress each other’s effects.
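The suppression pattern follows from the standard first-order partial correlation formula: when two predictors are negatively correlated with each other but both positively related to the outcome, controlling for one raises the correlation of the other. The sketch below uses hypothetical correlations chosen only to illustrate the mechanism (they are not the values in Table 3), and the function name `partial_corr` is an assumption.

```python
from math import sqrt


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with z partialled out of both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical inputs: x = preference, y = one predictor, z = the other predictor.
r_pref_pred = 0.30   # zero-order correlation of interest
r_pref_other = 0.10  # preference with the controlled predictor
r_pred_other = -0.40 # the two predictors are negatively related

controlled = partial_corr(r_pref_pred, r_pref_other, r_pred_other)
```

Because the product `r_xz * r_yz` is negative here, the numerator grows, so the partial correlation exceeds the zero-order value of 0.30.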

The LMM was used to determine how unity and variety together predict aesthetic preference, and unity and variety ratings were standardized. The data set comprised 4010 observations (401 participants × 10 windows), with participants (pid) and window stimuli (stim) modelled as crossed random factors. We began with a model that included random intercepts for participants and stimuli and fixed effects of unity and variety. We then compared this base model to (a) a model with random slopes at the participant level and (b) a model with random slopes at the stimulus level. Model comparison relied on AIC/BIC and likelihood-ratio tests. Stimulus-level random slope models produced singular fits and did not significantly improve model fit (ΔAIC ≈ 4, χ2 (5) = 6.34, p = 0.27). In contrast, the participant-level random slope model substantially improved fit over the random-intercept model (ΔAIC = 151, χ2 (5) = 161.17, p < 0.001) without singularity issues. The residual diagnostics, such as Q–Q and residual-versus-fitted plots, and Cook’s distances, did not reveal violations of model assumptions. We therefore retained the model with random slopes for unity and variety by participants and random intercepts for stimuli as the final specification.
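The reported ΔAIC values and likelihood-ratio statistics are linked by a simple identity. Since AIC = −2·logLik + 2k, adding extra parameters to a nested model fitted by maximum likelihood lowers AIC by the likelihood-ratio χ2 minus twice the number of added parameters. The following sketch checks this arithmetic against the figures quoted above; the function name `aic_reduction` is illustrative.

```python
def aic_reduction(lr_chi2: float, extra_params: int) -> float:
    """AIC(simpler model) - AIC(larger model) for nested ML fits.

    Positive values favour the larger model; negative values mean the
    extra parameters cost more than the improvement in log-likelihood.
    """
    return lr_chi2 - 2 * extra_params


# Likelihood-ratio statistics quoted in the text, each with 5 added parameters.
participant_slopes = aic_reduction(161.17, 5)  # participant-level random slopes
stimulus_slopes = aic_reduction(6.34, 5)       # stimulus-level random slopes
```

Under this identity the participant-level model lowers AIC by about 151, matching the reported value, while the stimulus-level model changes AIC by about 4 in the unfavourable direction, which is consistent with its non-significant likelihood-ratio test.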

Results (Table 4) revealed significant effects of unity, partial R2 = 0.28, B = 0.40, p < 0.001, and variety, partial R2 = 0.09, B = 0.19, p < 0.001, indicating that both unity and variety significantly and positively predicted aesthetic preference.

3.7. Study 1 Conclusions

The results of Study 1 answered RQ1 by showing that both unity and variety were positively associated with aesthetic preference for individual windows. When the other variable was statistically controlled, each remained a significant positive predictor of preference (partial correlations r = 0.33 for unity and r = 0.20 for variety), indicating that unity and variety make partly independent contributions rather than simply standing in for one another. At the same time, unity and variety were moderately and negatively correlated (r = −0.38), suggesting a trade-off in which windows perceived as more unified tended to be perceived as less varied and vice versa. The LMM confirmed that unity (B = 0.40) had a stronger unique association with aesthetic preference than variety (B = 0.19; Table 4), implying that, within this set of architectural window components, increases in unity were more consequential for preference than comparable increases in variety. Overall, these findings are in line with the Unity-in-Variety principle: aesthetic preference was higher for windows that combined relatively high unity with some degree of variety, within the limited range of stimuli tested here. Study 2 turns to the overall window–background composition, further examining how background context shapes the relationships between unity, variety, and aesthetic preference.

4. Study 2: Investigation of Window–Background Composition

The purpose of Study 2 was to answer the following RQs: To what extent does adding a background impact the perceptions of unity and variety compared with viewing the individual window (RQ 2)? Does the background influence (a) the relationship between perceived unity and perceived variety and (b) the predictive effects of unity and variety on aesthetic preference (RQ 3)? Using questionnaires, Study 2 employed a between-subjects design to avoid carryover effects that might arise from repeated evaluations of the same window by the same participant under different conditions. A within-subjects design might have led participants to recognize the study’s purpose after viewing both background conditions and to adopt a comparative rating strategy, weakening the reliability and validity of the ratings. A between-subjects design allows each participant to judge within only one context, ensuring that their aesthetic evaluations are based on a stable reference framework that more closely reflects their actual perceptual experience.

4.1. Study 2 Stimuli Development

Using the same set of ten Chinese carved windows validated in Study 1, we created two sets of window–background compositions. Each window was placed into both a Chinese background (congruent with the window) and a Western background (incongruent with the window), resulting in a total of 20 stimuli, 10 per set.

To ensure consistency and control over confounding variables, both backgrounds shared a similar base structure and material. The difference between them was the design of the eaves, with the Chinese background featuring traditional Chinese eaves and the Western background incorporating Western-style eaves. This approach intentionally restricted the stylistic manipulation to the eaves design while keeping other features constant. Eaves are highly diagnostic of traditional Chinese versus Western architectural styles and therefore provide a minimal yet powerful cue for prototype-based style categorization. Moreover, because the design was quasi-experimental, it was essential that the key variable be clearly and feasibly manipulable; focusing the manipulation on the eaves thus helped to maintain experimental control, as shown in Figure 3.

4.2. Study 2 Instrument

Study 2 employed a closed-ended questionnaire similar to that of Study 1. Participants rated each window–background composition on unity, variety, and aesthetic preference using the nine “pleasing to see” items [58] on 7-point Likert scales (1 = not at all, 7 = very much). Stimuli were presented at a fixed size and viewing angle, and their order was randomized for each participant. A pilot test (n = 40) on 5 June 2025 indicated good internal consistency (all Cronbach’s Alpha ≥ 0.821) and satisfactory construct validity (KMO = 0.816, Bartlett’s test p < 0.01).

4.3. Study 2 Participants

A new sample of participants was recruited from Guangzhou, China, using the same inclusion criteria as in Study 1 (no formal training in art, design, or architecture). Individuals who had participated in Study 1 were excluded to avoid mere exposure effects [62].

In total, 452 individuals took part; after excluding invalid responses (using the same exclusion criteria as in Study 1), the final sample comprised 396 questionnaires. Ages ranged from 20 to 36 years (M = 24.13, SD = 2.39), with a balanced gender distribution (54.3% male, 45.7% female). Participants were randomly assigned to one of two between-subjects conditions: a Chinese-background group (windows embedded in a Chinese background) or a Western-background group (windows embedded in a Western background). All participants provided written informed consent and received monetary compensation.

4.4. Study 2 Data Collection Procedure

The procedure mirrored Study 1 and was conducted in the same computer laboratories under standardized viewing conditions. After reading an information sheet and providing written informed consent, participants were seated at individual workstations and completed the questionnaire corresponding to their assigned background condition (Chinese or Western). Each participant viewed only one set of 10 window–background compositions, rated each stimulus on unity, variety, and aesthetic preference, and completed the task at their own pace without time pressure or interaction with others.

4.5. Study 2 Data Analysis

The data was anonymized, and the same exclusion rules as in Study 1 were applied. All analyses were conducted in R (version 4.5.1) using the lme4 and lmerTest packages, with statistical significance evaluated using Satterthwaite-approximated degrees of freedom. In summary, LMMs provided the main inferential framework, capturing the hierarchical and repeated-measures structure of the data, while Welch t-tests offered stimulus-level, variance-robust follow-ups that clarified how consistently the background manipulation affected unity and variety across individual windows.

To address RQ2, LMMs with crossed random intercepts for participants and stimuli were fitted to examine how different backgrounds influenced the perceived unity and variety of the overall compositions. Two models, one for unity and one for variety, assessed whether overall unity and variety ratings differed among the no-background condition (as established in Study 1), the Chinese background (stylistically congruent), and the Western background (stylistically incongruent). Because RQ2 also concerns whether these background effects are consistent for each window, we complemented the LMMs with Welch t-tests for each window across background conditions to explore window-specific variations. Welch tests were chosen rather than classical Student t-tests because they do not assume equal variances or equal group sizes across background conditions, which is realistic in our between-subjects design. Thus, the LMMs served as the primary, hierarchy-sensitive tests, whereas the Welch t-tests acted as robust follow-up comparisons that unpacked the LMM effects window by window and checked whether the overall patterns were driven by a few stimuli or were broadly distributed across the set.
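The Welch statistic used for the per-window follow-ups differs from Student’s t in two ways: the standard error uses each group’s own variance, and the degrees of freedom come from the Welch–Satterthwaite approximation. The sketch below shows both quantities using only the Python standard library; the function name `welch_t` and the ratings in the usage note are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t-test.

    Each argument is a sequence of ratings from one independent group.
    """
    # Squared standard error of each group mean, using its own sample variance.
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)

    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df
```

For instance, `welch_t([5, 6, 5, 4, 6], [4, 3, 5, 4, 3, 4])` compares two hypothetical groups of unequal size. The resulting degrees of freedom are generally non-integer and lie between the smaller group’s n − 1 and the pooled n1 + n2 − 2.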

To address RQ3 (whether background alters the predictive effects of unity and variety on aesthetic preference), we estimated another LMM with aesthetic preference as the dependent variable, unity and variety as continuous fixed effects, and background (no background, Chinese, or Western) as a categorical fixed effect. All two-way interactions between unity, variety, and background were included to test whether the slopes of unity and variety on preference changed across contextual conditions. As in Study 1, crossed random intercepts for participants and stimuli were included to capture individual and stimulus-specific differences. A further LMM was fitted with variety as the dependent variable and unity and background (and their interaction) as fixed effects to quantify how the unity–variety trade-off itself was modulated by the presence and style of the background.

4.6. Study 2 Result

The reliability analysis of the questionnaire, indicated by Cronbach’s Alpha coefficient, is presented in Table 5, showing good data reliability. Additionally, the KMO value was 0.714, and Bartlett’s test of sphericity was significant (p < 0.001), indicating good data validity.

On average, windows in a Chinese background scored about 0.34 higher on perceived unity than in a Western background, while the Western background raised perceived variety by around 0.37, as expected. In the Chinese-background condition, the three most preferred windows (Windows 1, 8, and 9) all exhibited very high unity (5.13–5.38) while maintaining only moderate variety (3.95–4.73). As unity scores fell below approximately 4.75, preference decreased, even when variety remained at a moderate level. In the Western-background condition, windows with high variety (>5.0) never reached preference scores above 4.9. Instead, the highest preference scores (Windows 1, 3, and 6) occurred for windows combining high unity (4.92–5.03) with moderate variety (4.15–4.62). Visual inspection of Figure 4 suggested that aesthetic preference was highest for windows combining high unity with moderate variety.

Next, an LMM was fitted to examine how backgrounds influenced perceived unity and variety when moving from the isolated window to the overall composition. We combined study (Study 1 without background, Study 2 with background) and background style (Chinese and Western) into a single three-level factor. Treatment coding (no background = 0 0, Chinese background = 1 0, Western background = 0 1) made the model intercept equal to the mean unity rating for windows shown without a background, which served as the baseline. The average of the two background conditions was then compared with this baseline to test the overall effect of adding a background.

We initially specified a random-intercepts model with random intercepts for participants and stimuli. We then compared this base model to (a) a model with random slopes of background at the participant level and (b) a model with random slopes of background at the stimulus level. Although these random-slope models reduced AIC and yielded significant likelihood-ratio tests compared to the random-intercepts model, both produced boundary (singular) fits (with variances collapsing and random-effects correlations approaching ±1), indicating that the additional variance components were not reliably estimable. Following current recommendations for LMM [63,64], we therefore retained the random-intercepts model as the selected specification. Residual diagnostics revealed no violations of model assumptions, so this model served as the final model.

Unity ratings were estimated at 4.81 when windows were viewed in isolation. Adding any background significantly reduced perceived unity (B = −0.11, SE = 0.04, t = −2.78, p < 0.01). Follow-up results showed that unity under the Chinese background was slightly but not significantly higher than the no-background baseline (B = 0.06, SE = 0.05, t = 1.18, p = 0.24, partial R2 = 0.002), suggesting a negligible unique contribution to the variance in unity. In contrast, the Western background decreased unity significantly (B = −0.28, SE = 0.05, t = −5.71, p < 0.001, partial R2 = 0.039), a small but non-trivial effect after controlling for other fixed and random effects. Thus, stylistic incongruence (Western background) decreased unity, while the congruent Chinese background led to a small, non-significant increase in unity compared to the baseline, as shown in Table 6.
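The treatment coding and the averaged background contrast can be reproduced arithmetically from the fixed-effect estimates reported in this paragraph. The sketch below is illustrative only: the names `CODING`, `predicted_mean`, and `any_background_effect` are assumptions, and it treats the overall background effect as the unweighted mean of the two background coefficients, which matches the reported values but may not be exactly how the contrast was specified in the original R code.

```python
# Dummy codes for the three-level background factor (no background is the baseline).
CODING = {"none": (0, 0), "chinese": (1, 0), "western": (0, 1)}


def predicted_mean(intercept, b_chinese, b_western, condition):
    """Model-implied mean rating for one background condition."""
    d_chinese, d_western = CODING[condition]
    return intercept + b_chinese * d_chinese + b_western * d_western


def any_background_effect(b_chinese, b_western):
    """Average of the two background coefficients relative to the baseline."""
    return (b_chinese + b_western) / 2


# Unity estimates reported in the text.
INTERCEPT, B_CHINESE, B_WESTERN = 4.81, 0.06, -0.28

unity_none = predicted_mean(INTERCEPT, B_CHINESE, B_WESTERN, "none")
unity_chinese = predicted_mean(INTERCEPT, B_CHINESE, B_WESTERN, "chinese")
unity_western = predicted_mean(INTERCEPT, B_CHINESE, B_WESTERN, "western")
overall = any_background_effect(B_CHINESE, B_WESTERN)
```

Under these assumptions the averaged contrast is −0.11, and the implied gap between the Chinese and Western conditions is 0.34 points of unity, in line with the descriptive difference noted earlier.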

Across the ten windows, the per-window Welch tests showed a pattern broadly consistent with the LMM results when inference was based on BH-FDR-adjusted p values (Table 7). Adding a Western background lowered perceived unity for nine windows (Windows 1–9), and this decrease remained statistically reliable for seven windows (Windows 1, 3, 4, 5, 6, 7, and 8; p < 0.05), indicating a unity-suppressing effect under stylistic incongruence. By contrast, adding a Chinese background increased perceived unity for seven windows (Windows 2, 4, 5, 6, 7, 8, and 10), but none of these increases remained reliable after FDR correction (all p > 0.05), suggesting that any unity enhancement under stylistic congruence was comparatively weak and inconsistent at the stimulus level. In the with-background condition, unity increased for two windows (Windows 2 and 10) but did not survive FDR correction, whereas eight windows decreased, with three reliable decreases (Windows 1, 3, and 7; p < 0.05), as shown in Table 7.
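The Benjamini–Hochberg (BH) adjustment applied to the per-window tests can be sketched as follows. Each p value is multiplied by the number of tests divided by its rank, and a running minimum from the largest p value downward keeps the adjusted values monotone. The function name `bh_adjust` and the four p values in the usage note are hypothetical; the sketch follows the standard BH step-up procedure rather than the authors’ own code.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvals)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvals[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

For example, `bh_adjust([0.01, 0.04, 0.03, 0.20])` leaves the largest p value unchanged and raises the smaller ones, so a raw p value of 0.03 would no longer fall below 0.05 after adjustment. This is the sense in which some per-window effects "did not survive" FDR correction.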

In conclusion, Western backgrounds consistently suppressed unity across most stimuli, whereas Chinese backgrounds generally maintained or slightly enhanced unity; the effect of a stylistically incongruent background on unity may therefore exceed that of a congruent background.

The procedure was replicated for variety. For variety ratings, we used the same fixed-effect structure and random intercepts for participants and stimuli as the baseline model. We then compared this base model to (a) a model with random slopes of background at the participant level and (b) a model with random slopes of background at the stimulus level. The model with participant-level random slopes returned a boundary (singular) fit and was therefore discarded. In contrast, the model with stimulus-level random slopes substantially improved model fit over the random-intercepts model (ΔAIC ≈ 146; χ2 (5) = 155.69, p < 0.001) and did not exhibit singularity. Residual diagnostics revealed no violations of model assumptions. We therefore adopted the model with random intercepts for participants and random intercepts plus random slopes for stimuli as the final specification for variety.

Variety ratings were estimated at 4.62 when windows were viewed in isolation. Adding any background did not significantly change perceived variety (B = 0.02, SE = 0.04, t = 0.54, p = 0.59). Follow-up results showed that variety under the Chinese background was significantly lower than the no-background baseline (B = −0.16, SE = 0.05, t = −3.12, p = 0.002, partial R2 = 0.012), indicating a small but non-negligible unique effect on perceived variety. In contrast, the Western background increased variety significantly (B = 0.21, SE = 0.05, t = 4.00, p < 0.001, partial R2 = 0.020), suggesting a small yet reliable increase. Thus, the stylistically incongruent Western background increased variety significantly, while the congruent Chinese background decreased variety significantly, as shown in Table 8.

Across the ten windows, the per-window Welch tests showed a pattern broadly consistent with the LMM results when inference was based on BH-FDR-adjusted p values (Table 9). Under the Chinese background, perceived variety decreased for eight windows (Windows 2, 4–10), and the decrease remained statistically reliable for six of them (Windows 2, 4, 5, 6, 7, and 10; p < 0.05). Variety increased for two windows (Windows 1 and 3), but only Window 1 showed a reliable increase after FDR correction (p < 0.05). Under the Western background, variety increased for eight windows (Windows 1, 4–10), and the increase remained reliable for four windows (Windows 4, 5, and 10; p < 0.001, and Window 7; p < 0.05), whereas the apparent increases for Windows 1 and 8 did not survive FDR correction. Variety decreased for two windows (Windows 2 and 3), with only Window 3 showing a reliable decrease (p < 0.001). In the with-background condition, variety decreased for six windows (Windows 2, 3, 6–9), but only Window 7 remained reliably lower after FDR correction (p < 0.05). By contrast, variety increased for four windows (Windows 1, 4, 5, and 10), and this increase remained reliable for Windows 1 and 10 (p < 0.05), as shown in Table 9.

In conclusion, Chinese backgrounds decreased variety, whereas Western backgrounds increased it, mirroring the direction and magnitude of the LMM coefficients.

The next analyses examined the effects of unity, variety, and background on aesthetic preference at the composition level. We analyzed aesthetic preference ratings using LMM with unity and variety as continuous predictors and background style as a three-level factor (no background, Chinese background, and Western background). Unity and variety ratings were standardized before analysis. The fixed-effects structure included main effects of Unity and Variety, the three-level factor of Background (treatment-coded with no background as the baseline), and all two-way interactions of Unity and Variety with Background. We began with a random-intercepts model, including random intercepts for participants and stimuli. We then compared this baseline specification to (a) a model with random slopes of Unity and Variety at the participant level and (b) a model with random slopes of Unity and Variety at the stimulus level. Relative to the random-intercepts model, the participant-level random-slope model substantially improved fit (ΔAIC ≈ 235; χ2(5) = 245.02, p < 0.001) without evidence of singularity, indicating reliable between-participant variability in the slopes of unity and variety. The stimulus-level random-slope model yielded only a small improvement in AIC (ΔAIC ≈ 7.66; χ2(5) = 17.66, p = 0.003) at the cost of additional model complexity. Residual diagnostics revealed no violations of model assumptions. We therefore retained the model with random intercepts for stimuli and random intercepts plus random slopes for participants as the final specification.

In the no-background situation, unity showed a significantly positive relation with aesthetic preference (B = 0.36, SE = 0.02, t = 20.50, p < 0.001, partial R2 = 0.054), indicating a small-to-moderate unique effect: a one-standard-deviation increase in unity was associated with a 0.36-point increase in aesthetic preference and accounted for about 5% of the residual variance. Variety also showed a significantly positive relation with aesthetic preference (B = 0.21, SE = 0.02, t = 11.61, p < 0.001, partial R2 = 0.019), representing a smaller but still meaningful unique effect on preference. The Chinese background had a small positive effect on preference (B = 0.09, SE = 0.04, t = 1.97, p = 0.048, partial R2 = 0.005), whereas the Western background showed a small negative effect (B = −0.10, SE = 0.04, t = −2.29, p = 0.022, partial R2 = 0.007).

All interaction terms were significant and negative, but their partial R2 values were very small (≤0.01), indicating subtle yet reliable effects. The interactions between unity and the Chinese (B = −0.13, SE = 0.04, t = −3.40, p < 0.001) and Western (B = −0.27, SE = 0.04, t = −6.46, p < 0.001) backgrounds indicated that the unity effect was weakened relative to the no-background situation. Likewise, the interactions between variety and the Chinese (B = −0.08, SE = 0.04, t = −2.17, p = 0.029) and Western (B = −0.12, SE = 0.04, t = −3.08, p = 0.002) backgrounds showed that the variety effect was also weakened relative to no background. In conclusion, unity exerted the strongest and most practically meaningful influence on aesthetic preference, variety contributed a smaller yet non-trivial effect, and background mainly acted to attenuate these positive unity and variety effects at the composition level, as shown in Table 10.
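With treatment coding, the slope of a predictor within a background condition is the baseline slope plus the corresponding interaction coefficient. The sketch below derives these condition-specific (simple) slopes from the coefficients reported above; the dictionary layout and the function name `simple_slope` are illustrative, and the derived slopes are arithmetic consequences of the reported estimates rather than separately reported results.

```python
# Baseline (no-background) slopes on aesthetic preference, as reported in the text.
BASE_SLOPE = {"unity": 0.36, "variety": 0.21}

# Predictor-by-background interaction coefficients, as reported in the text.
INTERACTION = {
    ("unity", "chinese"): -0.13,
    ("unity", "western"): -0.27,
    ("variety", "chinese"): -0.08,
    ("variety", "western"): -0.12,
}


def simple_slope(predictor: str, background: str) -> float:
    """Slope of `predictor` on preference within one background condition."""
    if background == "none":
        return BASE_SLOPE[predictor]
    return BASE_SLOPE[predictor] + INTERACTION[(predictor, background)]


slopes = {
    (p, b): round(simple_slope(p, b), 2)
    for p in ("unity", "variety")
    for b in ("none", "chinese", "western")
}
```

Read this way, the implied unity slope falls from 0.36 without a background to 0.23 under the Chinese background and 0.09 under the Western background, while the implied variety slope falls from 0.21 to 0.13 and 0.09. All implied slopes remain positive, which is what "attenuate" means here.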

Although all interaction terms between unity/variety and background reached statistical significance, their partial R2 values were very small (all ≤0.01). This pattern indicates that background slightly attenuates the positive effects of unity and variety on preference but does not overturn the overall dominance of unity (and, to a lesser extent, variety) in predicting aesthetic preference. Given the large sample size and the relatively constrained manipulation of background (two stylized roof types applied to the same window stimuli), such small but reliable interaction effects are methodologically plausible and theoretically consistent with the view that stylistic context fine-tunes, rather than replaces, the core unity–variety mechanism.

An LMM with crossed random intercepts for participants and stimuli was fitted to examine how unity influenced variety. We then compared this baseline specification with more complex models that additionally allowed the unity slope to vary across participants and/or across stimuli. Model comparison based on AIC/BIC and likelihood-ratio tests indicated that these random-slope extensions either failed to provide a meaningful improvement in fit or led to boundary (singular) solutions. Following current recommendations for multilevel modelling [63,64], we therefore retained the random-intercepts model. Residual diagnostics revealed no violations of model assumptions, so this model served as the final specification.

Variety ratings were first examined as a function of unity and background. In the no-background condition, higher unity was strongly associated with lower perceived variety (B = −0.62, SE = 0.02, t = −36.76, p < 0.001, partial R2 = 0.149), indicating that unity alone accounted for about 15% of the variance in variety, a medium-to-large unique effect. Two significant interaction terms showed that this negative unity–variety relation was clearly influenced by background. In the Chinese-background condition, the unity × Chinese interaction was positive, indicating a modest but reliable moderating effect (B = 0.65, SE = 0.04, t = 15.62, p < 0.001, partial R2 = 0.032), meaning that the strong negative slope of unity on variety observed with no background was weakened in the presence of a congruent Chinese background. The unity × Western interaction was also positive, though slightly smaller in magnitude (B = 0.57, SE = 0.04, t = 15.20, p < 0.001, partial R2 = 0.021), indicating that an incongruent Western background likewise attenuated the opposition between unity and variety, reducing the steep negative slope to a much weaker level. Together, these findings show that while unity is a strong negative predictor of perceived variety in isolation, background context reliably softens this opposition, especially under the stylistically congruent Chinese background (Table 11).

4.7. Study 2 Conclusions

Study 2 showed that adding a stylistic background primarily reshaped the balance between unity and variety while preserving their core roles in aesthetic preference. Relative to windows viewed in isolation, stylistically incongruent Western backgrounds reliably suppressed perceived unity, whereas stylistically congruent Chinese backgrounds generally maintained or only slightly enhanced it. At the same time, Chinese backgrounds significantly decreased perceived variety, whereas Western backgrounds significantly increased it, a pattern that was consistent in both the LMM estimates and the Welch t-tests across most individual windows.

At the composition level, unity and variety continued to exert positive, independent effects on aesthetic preference when backgrounds were added. However, both Chinese and Western backgrounds slightly weakened these predictive effects, as indicated by the significant but small interaction terms between unity/variety and background. Direct background effects on preference were modest: Chinese backgrounds produced a small increase, whereas Western backgrounds yielded a small decrease.

Finally, unity and variety remained negatively related when backgrounds were present, but this opposition was substantially attenuated compared with the no-background condition. In isolation, higher unity strongly predicted lower perceived variety; under both Chinese and Western backgrounds, this negative slope was markedly reduced, indicating that embedding windows within architectural compositions softens the trade-off between unity and variety.

5. Discussion

5.1. Unity, Variety, and Aesthetic Preference of Individual Windows

In line with previous studies on products and simple visual patterns [1,8,11], Study 1 showed that both unity and variety were positively associated with aesthetic preference for the ten Chinese carved windows viewed in isolation. When unity and variety were entered simultaneously in the LMM, each predictor made a statistically significant unique contribution to preference after controlling for the other. At the same time, unity emerged as the more robust predictor: its unique contribution was larger, whereas variety accounted for a comparatively small but non-negligible portion of unique variance. Thus, within this restricted stimulus set, increases in unity were more strongly linked to higher preference than comparable increases in variety.

The coefficients make this difference concrete. With both predictors in the LMM, the estimate for unity was roughly twice that for variety (B = 0.40 vs. B = 0.19), indicating that, within this restricted set of ten Chinese carved windows, viewers placed greater weight on unity than on variety when judging aesthetic preference. Put differently, for these element-level window patterns, unity appears to come first and variety second: designs that first secure a high degree of unity and then balance variety around that structure tended to be aesthetically preferred.

One plausible interpretation, consistent with prototype-based accounts of aesthetic preference [39], is that observers favor windows that remain close to a culturally and perceptually familiar configuration. In our Chinese sample, such prototypes may emphasize symmetry, repetition, and structural coherence. At the same time, we did not directly measure participants’ mental prototypes or elicit their explicit expectations about Chinese windows, so this remains a theoretical interpretation rather than a direct test of prototype representations. Future work could more rigorously examine this account by asking participants to generate or rate prototypical windows and relating these judgments to unity–variety manipulations.

The stronger role of unity is also consistent with processing fluency accounts, which propose that stimuli that are easier to organize and interpret tend to elicit more positive aesthetic responses [2,32]. Unified designs typically involve symmetry, orderly repetition, and consistent structural relations, all of which are likely to facilitate fluent perceptual processing and reduce cognitive load [2]. These fluency-enhancing qualities can in turn evoke positive affect, making unity an important predictor of aesthetic preference. However, we did not obtain direct measures of processing fluency or arousal in this study; our reference to fluency is therefore an interpretative framework for understanding the pattern of coefficients rather than an empirically tested mechanism.

By contrast, although variety introduces visual interest and novelty, its contribution appears more context dependent. Berlyne’s theory of collative variables [7] suggests that moderate levels of novelty, complexity, and variety enhance arousal and aesthetic pleasure, whereas very low or very high levels may be less preferred. The relatively weaker predictive power of variety in our data may reflect the specific nature of the stimuli: for carved windows whose dominant prototypes are likely to be symmetrical and unified, excessive variety can easily disrupt perceived order and reduce preference. In this sense, variety seems to function best as a secondary refinement that enriches an already coherent structure, rather than as a primary design driver.

Unity and variety were negatively correlated, implying a trade-off between these dimensions: further increases in one tended to coincide with decreases in the other. This trade-off partly reflects the way unity and variety were operationalized in the stimulus design (e.g., manipulations that increase global regularity often reduce local diversity), but it is also consistent with cognitive accounts suggesting that observers perceive highly regular configurations as less varied. We therefore interpret the strength of the negative correlation cautiously, recognizing both the structural constraints of the stimuli and the perceptual tendencies of observers.

This emphasis on unity over variety echoes broader findings in environmental aesthetics and design cognition, where coherence, balance, and compositional harmony are repeatedly linked to higher aesthetic judgments in architectural and product contexts [4,8,58]. Our contribution is to show that a similar pattern emerges even for highly simplified, grayscale line drawings of façade ornaments: when unity and variety are disentangled and modelled simultaneously, unity retains a stronger association with preference. At the same time, these conclusions should be interpreted cautiously. They arise from a deliberately limited set of ten digitally rendered window patterns and do not yet encompass the full richness of materials, color, lighting, and spatial depth in real architectural experience. Subsequent sections, therefore, consider how these unity–variety relationships change once the windows are integrated into simplified backgrounds.

5.2. Differences in Unity and Variety Perceptions

The LMM results for Study 2 supported the proposed variety compression effects when windows were embedded in different backgrounds. Relative to the no-background baseline, placing the same ten windows in a stylistically congruent Chinese background significantly reduced perceived variety while leaving unity largely unchanged. By contrast, embedding the same windows in a stylistically incongruent Western background significantly increased perceived variety and decreased unity. These background effects were echoed by Welch t-tests conducted at the level of individual windows, which showed that, for most stimuli, Chinese backgrounds compressed variety towards the mid-range, whereas Western backgrounds reliably inflated variety and suppressed unity.

One plausible interpretation, consistent with prototype-based accounts of aesthetic judgment [15,65], is that backgrounds modulate how easily observers can assimilate window patterns into a familiar stylistic schema. The Chinese background likely activates a Chinese architectural schema (prototype), making the carved window appear as a natural continuation of that style. Under such congruent conditions, heterogeneous details can be perceptually absorbed into the overall schema, preserving unity and compressing perceived variety into a tolerable range. By contrast, the Western background activates a different schema (prototype); combining a Chinese-style window with a Western-style eaves profile introduces a style inconsistency that raises processing costs and weakens perceptual fluency, thereby decreasing unity and amplifying perceived variety. Notably, the effect of incongruence (unity decrease and variety increase) was stronger and more consistent than the modest unity gain observed under stylistic congruence. This asymmetry suggests that architectural coherence judgments may be more sensitive to violations of prototypical expectations than to their fulfillment, aligning with fluency accounts that emphasize the affective cost of disfluency [2]. At the same time, these interpretations remain theoretical: we did not directly measure mental prototypes or perceptual fluency in this study, and each background condition was represented by a single background exemplar. Consequently, we cannot rule out the possibility that some of these effects reflect idiosyncratic properties of the particular background designs rather than style-level differences alone. Future work should therefore employ multiple background exemplars per style and include more direct measures of prototypicality and fluency.

Although the aggregate results reveal systematic differences between congruent and incongruent backgrounds, a small number of windows deviated from these general trends. These reversals can be clarified by drawing on Tversky’s feature comparison framework [41], which distinguishes between diagnostic features (those that are most informative for category membership or stylistic identity) and intensive features (those that are especially salient or attention-grabbing). In our stimuli, diagnostic features can be thought of as the cues that signal “Chinese” versus “Western” architecture (e.g., eaves profile), whereas intensive features include motifs that strongly attract attention because of their exceptional complexity, prominence, or contrast. Once category membership is identified, the brain rapidly activates corresponding knowledge and uses it to structure subsequent evaluations [39,41]. However, when intensive features are sufficiently strong, they can partially override or modulate the influence of diagnostic features by redirecting attentional weight. For example, in reversed cases such as Window 3, with its eight-petal flower pattern, a highly striking motif may become the perceptual anchor. When such a window is placed on either background, the intense local feature can dominate attention, making the background style less influential and pulling unity and variety ratings closer to the baseline pattern for that window.

It is also important to recognize the limited scope of the present background manipulation. To isolate stylistic congruence and incongruence, we restricted the difference between Chinese and Western façades to the eaves design while holding other façade attributes constant, and we rendered all window–background compositions as grayscale line drawings. This quasi-experimental approach enhances internal validity by focusing on a single, highly diagnostic stylistic cue and eliminating potential confounds from color, material, and lighting. At the same time, it necessarily simplifies the richness of real architectural experience. As such, the current findings speak primarily to how simplified, two-dimensional window–background compositions guide unity and variety judgments in our stimuli. They do not yet capture the full range of contextual influences present in real buildings, such as material contrast, natural illumination, depth, or spatial sequence. The mechanisms discussed above (prototype-consistent congruence, disfluency under incongruence, and occasional dominance of intensive features) should therefore be treated as tentative explanations that require further testing with more diverse, ecologically rich architectural stimuli.

5.3. From Isolated Windows to Simple Window–Background Compositions

Taken together, the two studies provide a coherent yet qualified picture of how unity and variety shape aesthetic preference for architectural components under controlled conditions. At the element level, Study 1 showed that both unity and variety made independent, positive contributions to preference for carved window patterns, with unity emerging as the clearly stronger predictor. Within this restricted stimulus set, the most preferred windows combined high unity with some degree of variety, suggesting that viewers valued a robust structural order in which diversity is introduced only to a limited extent. At the composition level, Study 2 demonstrated that embedding the same windows into different background contexts systematically reshaped unity and variety perceptions. Stylistically congruent Chinese backgrounds compressed perceived variety while broadly preserving unity, whereas stylistically incongruent Western backgrounds reliably decreased unity and increased variety. Yet even in these more complex configurations, unity and variety remained positive, partly independent predictors of preference, although their predictive strength was slightly attenuated once backgrounds were added.

These findings are broadly consistent with the UiV principle [8] but suggest a more nuanced formulation for architectural ornaments. Rather than identifying a single global optimal point on a continuous unity–variety axis, our data indicate that, for simplified carved windows, preference tends to be highest when unity is firmly established and variety is introduced as a secondary refinement. In this sense, unity functions as a primary structural condition that anchors the configuration, while variety enriches the design as long as it does not undermine coherence. The result that the negative association between unity and variety was weaker in the window–background compositions than in isolated windows further suggests that architectural context can partially relax the apparent trade-off between these properties. Integrating elements into a broader background may allow observers to tolerate more heterogeneity at the element level, provided that the overall composition retains an intelligible order. This resonates with recent proposals that architecture achieves variety within the prototype by embedding local deviations in a recognizable global schema [12,39,40].

Overall, the pattern of results is compatible with prototype-based and processing-fluency explanations, but it does not establish these mechanisms because typicality/prototypicality and fluency were not directly measured. The stronger role of unity across both studies is compatible with the idea that observers prefer stimuli that lie closer to an internalized schema for what a Chinese carved window normally looks like [39], with such prototypes likely emphasizing symmetry, repetition, and structural coherence. Similarly, the asymmetry between stylistic congruence and incongruence, where incongruent combinations reliably reduced unity and inflated variety, whereas congruent combinations produced only modest unity gains, fits processing fluency accounts that emphasize the affective cost of disfluency and expectation violations [2,31]. At the same time, our use of Tversky’s distinction between diagnostic and intensive features [41] to interpret a few anomalous stimuli illustrates how attention may occasionally be captured by highly salient local motifs, which can partially override background-based style signals. These theoretical framings provide a useful way to organize the current findings, but they remain interpretative; we did not directly measure mental prototypes, processing fluency, or feature salience in the present studies.

Beyond their theoretical relevance, the results also have preliminary implications for design thinking about architectural ornaments. For element-level patterns such as carved windows, the data suggest that securing a clear, coherent geometric framework may be more consequential for preference than aggressively increasing variety. Variety still has a role in differentiating motifs and avoiding monotony, but its benefits appear contingent on being calibrated within a stable organizational grid. Once integrated into backgrounds, the stylistic coherence between elements and backgrounds emerges as another important consideration. In our simplified stimuli, stylistic incongruence tended to suppress unity and inflate perceived variety, which, in turn, weakened preference. Designers who wish to explore contrast or hybridization may therefore need to balance stylistic departures against the risk of diminished coherence, for instance by anchoring unconventional elements within an otherwise familiar compositional context.

These implications should nonetheless be regarded as tentative and context-bound. They arise from a deliberately simplified set of grayscale, line-based window and façade configurations and from a single cultural sample. The next section summarizes the contributions of the present research, outlines its main limitations, and points towards directions for future work that are needed to develop a more comprehensive and ecologically grounded account of unity and variety in architecture.

6. Conclusions and Limitations

Two quasi-experimental studies were conducted using digitally rendered Chinese carved windows as stimuli. Study 1 focused on individual windows viewed in isolation and showed that both unity and variety positively predicted aesthetic preference, with unity being the stronger predictor. Study 2 embedded the same windows into stylistically congruent Chinese and stylistically incongruent Western backgrounds and yielded two main results. First, backgrounds systematically reshaped perceived unity and variety: Chinese backgrounds compressed perceived variety while broadly maintaining unity, whereas Western backgrounds decreased unity and increased variety. Second, unity and variety continued to have positive, partly independent associations with preference, although with somewhat attenuated effect sizes.

Taken together, these studies contribute to empirical aesthetics and architectural theory in several ways. First, they extend discussions of Unity in Variety from products and abstract patterns to architectural ornaments, showing that unity and variety can be experimentally disentangled and modelled simultaneously in a quasi-architectural domain. Second, the results suggest that, at least for these simplified carved-window stimuli, unity tended to play a primary role while variety played a secondary role; some forms of variety may be tolerated when overall coherence is maintained (a pattern compatible with variety within a prototype, but not a direct test of prototypicality). Third, the results highlight the importance of context: the same windows yield different unity–variety profiles and predictive strengths when embedded in congruent versus incongruent façades, underlining that the evaluation of architectural elements cannot be fully understood without considering their compositional setting.

At the same time, several limitations constrain the generalizability of these conclusions and point to clear directions for future research. Most importantly, the stimulus set was deliberately narrow: ten window designs and two background exemplars (one Chinese and one Western), all presented as grayscale line drawings. This design maximized control over unity- and variety-related features and isolated a single highly diagnostic stylistic cue (the eaves profile), but it also stripped away core components of architectural experience such as materiality, color, lighting, and depth. The present findings, therefore, speak primarily to how observers evaluate highly controlled two-dimensional representations of façade ornaments under laboratory conditions; they cannot be straightforwardly extrapolated to full façades, interior spaces, or real-world environments. Future studies should employ larger and more diverse sets of windows, multiple background exemplars per style, and more ecologically rich stimuli, such as color photographs, high-fidelity renderings, or immersive environments, to test whether similar unity–variety dynamics emerge in more realistic settings.

Methodologically, unity and variety were manipulated through design decisions and validated via subjective ratings, but we did not incorporate objective computational measures of visual structure. Nor did we directly measure mental prototypes, processing fluency, or feature salience. As a result, the links drawn to prototype theory, processing-fluency accounts, and Tversky’s feature comparison framework remain interpretative rather than conclusively demonstrated. Incorporating image-based metrics (e.g., symmetry indices and fractal dimension), explicit typicality ratings, response time or fluency measures, and physiological indicators would help to strengthen construct validity and clarify the cognitive mechanisms underlying unity–variety judgments.

The sample characteristics also constrain the generalizability of our findings. All participants were young Chinese adults (19–40 years) residing in Guangzhou and without formal training in art, design, or architecture. This relatively homogeneous group was deliberately selected to minimize confounds due to expertise and to ensure familiarity with Chinese carved windows in everyday life. Nevertheless, aesthetic prototypes and stylistic expectations are likely shaped by cultural background, age cohort, and professional experience. The present results should therefore be interpreted as reflecting unity–variety judgments in this specific cultural group, rather than universal preferences. Future work should include cross-cultural studies comparing observers from diverse regions, as well as comparisons between experts and laypersons and across age groups, to determine how general the present patterns are.

Finally, the effect sizes observed in the present research warrant a cautious interpretation. Partial R² values indicated that unity accounted for a modest portion of variance in aesthetic preference, with variety and background effects contributing smaller yet reliable amounts. These are meaningful but not large effects, reminding us that aesthetic judgments are multiply determined and influenced by many factors beyond unity and variety, including personal taste, prior experience, and situational context. The contribution of this work is therefore not to offer a deterministic formula for beauty in architecture, but to demonstrate, under carefully constrained conditions, that unity and variety have separable and systematic associations with preference and that these associations are modulated by architectural context.

In conclusion, the present studies provide an initial empirical step toward a more fine-grained understanding of how unity and variety operate in architectural aesthetics across levels of analysis and contextual conditions. They suggest that, for simplified carved windows and minimalistic façades, unity tends to serve as a structural anchor while variety functions as a controlled source of visual interest and that stylistic congruence between elements and backgrounds helps maintain this balance. Building on these tentative insights with richer stimuli, more diverse samples, objective visual metrics, and direct measures of cognitive mechanisms will be essential for developing a more comprehensive and ecologically grounded theory of unity and variety in architecture.

Author Contributions

Conceptualization, S.C., A.W., S.N.S. and Y.H.; methodology, S.C., Z.A.Z. and Y.H.; software, S.C.; validation, Y.W., Y.H. and S.C.; formal analysis, Y.C.; investigation, S.C. and Y.H.; resources, S.N.S.; data curation, S.C.; writing—original draft preparation, S.C. and Y.H.; writing—review and editing, S.N.S. and A.W.; visualization, S.C.; supervision, Z.A.Z.; project administration, Z.A.Z. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

All materials and methods were carried out in accordance with the relevant guidelines and regulations and were approved by the Jawatankuasa Etika Penyelidikan Manusia Universiti Sains Malaysia (JEPeM-USM) on 16 May 2025, with the assigned study protocol code USM/JEPeM/PP/24090829. This ethical approval is valid from 16 May 2025 until 15 May 2026. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors sincerely thank all individuals who contributed to the data collection process and provided valuable support during the study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

UiV: Unity in Variety
LMM: Linear Mixed-effects Model

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1 The final set of stimuli.

Figure 2 The Ratings of Mean Unity, Variety, and Aesthetic Preference for Each Stimulus (Sorted by Unity).

Figure 3 The sample of stimuli. (Left) Chinese background; (Right) Western background.

Figure 4 The Ratings of Mean Unity, Variety, and Aesthetic Preference for Each Stimulus in Chinese and Western Backgrounds (Sorted by Unity).

Reliability test result of Study 1.

Scale | Cronbach’s Alpha | Number of Items
Unity | 0.952 | 3
Variety | 0.944 | 3
Aesthetic preference | 0.931 | 3
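
For readers unfamiliar with the reliability coefficient reported above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), for a k-item scale. The function name and the example ratings are hypothetical; the study's raw item scores are not reproduced here, so the sketch does not recompute the reported values.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for `items`, one inner list of scores per item.

    All inner lists must have the same length (one score per respondent).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(scores) for scores in items)
    # Sample variance of each respondent's total score across items.
    totals = [sum(scores) for scores in zip(*items)]
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # HYPOTHETICAL ratings: three items answered by four respondents.
    ratings = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 2, 3, 4]]
    print(cronbach_alpha(ratings))
```

In the hypothetical data the three items rise and fall together across respondents, so alpha comes out at its upper bound of 1; less consistent items pull the value down.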

Descriptive statistics for unity, variety, and aesthetic preference.

Window | Unity | Variety | Aesthetic Preference
1 | 5.42 ± 1.29 | 3.90 ± 1.62 | 5.15 ± 1.34
2 | 4.18 ± 1.67 | 5.31 ± 1.42 | 4.62 ± 1.42
3 | 4.92 ± 1.53 | 5.02 ± 1.42 | 4.86 ± 1.36
4 | 4.65 ± 1.84 | 4.61 ± 1.72 | 4.67 ± 1.49
5 | 4.74 ± 1.91 | 4.46 ± 1.95 | 4.69 ± 1.47
6 | 5.21 ± 1.38 | 4.38 ± 1.59 | 4.96 ± 1.48
7 | 4.54 ± 1.70 | 5.11 ± 1.49 | 4.79 ± 1.57
8 | 5.26 ± 1.49 | 4.22 ± 1.75 | 4.85 ± 1.13
9 | 5.13 ± 1.73 | 4.29 ± 1.81 | 4.93 ± 1.48
10 | 4.02 ± 1.84 | 4.91 ± 1.29 | 4.62 ± 1.55

Result for Correlation Tests.

Control Variable | Variable Pair | r | p
None | AP–Unity | 0.28 *** | <0.001
None | AP–Variety | 0.08 *** | <0.001
None | Unity–Variety | −0.38 *** | <0.001
Unity | AP–Variety | 0.20 *** | <0.001
Variety | AP–Unity | 0.33 *** | <0.001

AP = Aesthetic Preference. Cells contain zero-order (Pearson) correlations and p-values are unadjusted for repeated measures. *** p < 0.001.
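
The rows with a control variable can be approximately recovered from the zero-order correlations using the standard first-order partial correlation formula. The sketch below does this with the rounded values from the table; the function and constant names are hypothetical, and the formula treats the ratings as simple Pearson correlations, ignoring the repeated-measures structure.

```python
from math import sqrt


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Zero-order correlations as reported (rounded to two decimals).
R_AP_UNITY = 0.28
R_AP_VARIETY = 0.08
R_UNITY_VARIETY = -0.38

if __name__ == "__main__":
    # AP-Variety, controlling for Unity.
    print(round(partial_corr(R_AP_VARIETY, R_AP_UNITY, R_UNITY_VARIETY), 2))
    # AP-Unity, controlling for Variety.
    print(round(partial_corr(R_AP_UNITY, R_AP_VARIETY, R_UNITY_VARIETY), 2))
```

With these rounded inputs the two values come out near 0.21 and 0.34, within rounding distance of the reported 0.20 and 0.33. The calculation also makes the suppression pattern visible: because unity and variety are negatively correlated, the association between variety and preference rises from 0.08 to roughly 0.2 once unity is held constant.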

Summary of the model.

Predictor | B | SE | 95% CI for B | t | Partial R² | p
Intercept | 4.80 *** | 0.05 | [4.71, 4.89] | 105.06 | none | <0.001
Unity | 0.40 *** | 0.03 | [0.34, 0.46] | 12.38 | 0.28 | <0.001
Variety | 0.19 *** | 0.03 | [0.13, 0.25] | 6.23 | 0.09 | <0.001

B indicates Standardized Beta, SE indicates Standard Error for the Standardized Beta, and CI indicates Confidence Interval. *** p < 0.001.

Reliability test result of Study 2.

Scale | Cronbach’s Alpha | Number of Items
Unity | 0.900 | 3
Variety | 0.860 | 3
Aesthetic preference | 0.873 | 3

Summary for Unity.

Predictor | B | SE | 95% CI for B | t | Partial R² | p
Intercept (baseline: no background) | 4.81 *** | 0.13 | [4.52, 5.10] | 36.76 | none | <0.001
Chinese background | 0.06 | 0.05 | [−0.04, 0.16] | 1.18 | 0.002 | 0.24
Western background | −0.28 *** | 0.05 | [−0.38, −0.19] | −5.71 | 0.039 | <0.001
Planned contrast: with background | −0.11 ** | 0.04 | [−0.19, −0.03] | −2.78 | none | <0.01

B indicates Unstandardized Beta, SE indicates Standard Error for the Unstandardized Beta, and CI indicates Confidence Interval. ** p < 0.01, *** p < 0.001.

Welch t-Tests for Each Window for Unity Difference.

Window | Chinese (t, p) | Western (t, p) | With Background (t, p)
1 | −0.05 (−0.50, 0.886) | −0.39 *** (−3.65, <0.001) | −0.22 * (−2.56, <0.049)
2 | 0.21 (2.04, 0.208) | −0.13 (−1.12, 0.265) | 0.04 (0.41, 0.680)
3 | −0.13 (−1.39, 0.546) | −0.42 *** (−4.33, <0.001) | −0.28 * (−3.18, 0.015)
4 | 0.10 (0.84, 0.808) | −0.49 *** (−4.60, <0.001) | −0.20 (−1.93, 0.109)
5 | 0.03 (0.26, 0.984) | −0.34 ** (−3.04, 0.004) | −0.16 (−1.51, 0.187)
6 | 0.01 (0.02, 0.984) | −0.30 ** (−3.31, <0.002) | −0.15 (−1.77, 0.129)
7 | 0.06 (0.54, 0.886) | −0.54 *** (−4.91, <0.001) | −0.24 * (−2.45, 0.049)
8 | 0.11 (0.87, 0.808) | −0.26 ** (−2.73, 0.009) | −0.07 (−0.78, 0.499)
9 | −0.01 (−0.02, 0.984) | −0.15 (−1.40, 0.171) | −0.08 (−0.76, 0.499)
10 | 0.26 (2.27, 0.208) | 0.19 (1.78, 0.09) | 0.23 (2.22, 0.066)

Note. p-values were adjusted for multiple comparisons using the Benjamini–Hochberg FDR procedure within each column (10 stimuli per contrast). * p < 0.05, ** p < 0.01, *** p < 0.001.
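
The Benjamini–Hochberg adjustment named in the note can be sketched as follows. This is a generic implementation of the step-up procedure with a hypothetical function name and made-up input p-values; it is not the authors' analysis script, and the unadjusted p-values behind the table are not available here.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # HYPOTHETICAL unadjusted p-values for four comparisons.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The monotonicity step is why several comparisons can end up sharing one adjusted p-value; the repeated values within a column of the table above are consistent with this behaviour.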

Summary of linear mixed-effects model for Variety.

Predictor | B | SE | 95% CI for B | t | Partial R² | p
Intercept (baseline: no background) | 4.62 *** | 0.13 | [4.35, 4.89] | 38.27 | none | <0.001
Chinese background | −0.16 ** | 0.05 | [−0.27, −0.06] | −3.12 | 0.012 | 0.002
Western background | 0.21 *** | 0.05 | [0.11, 0.31] | 4.00 | 0.020 | <0.001
Planned contrast: with background | 0.02 | 0.04 | [−0.06, 0.11] | 0.54 | none | 0.59

B indicates Unstandardized Beta, SE indicates Standard Error for the Unstandardized Beta, and CI indicates Confidence Interval. ** p < 0.01, *** p < 0.001.

Welch t-Tests for Each Window for Variety Difference.

Window | Chinese (t, p) | Western (t, p) | With Background (t, p)
1 | 0.83 *** (5.97, <0.001) | 0.25 (2.10, 0.061) | 0.54 *** (4.95, <0.001)
2 | −0.29 ** (−3.10, 0.004) | −0.06 (−0.65, 0.586) | −0.17 (−2.10, 0.090)
3 | 0.09 (0.92, 0.357) | −0.40 *** (−4.37, <0.001) | −0.15 (−1.78, 0.151)
4 | −0.32 ** (−2.89, 0.007) | 0.45 *** (4.44, <0.001) | 0.06 (0.65, 0.571)
5 | −0.28 * (−2.26, 0.035) | 0.55 *** (4.99, <0.001) | 0.14 (1.27, 0.342)
6 | −0.32 ** (−3.08, 0.004) | 0.15 (1.30, 0.275) | −0.09 (−0.90, 0.458)
7 | −0.54 *** (−5.75, <0.001) | 0.06 (0.63, 0.586) | −0.24 * (−2.67, 0.026)
8 | −0.26 (−1.89, 0.068) | 0.25 (2.23, 0.053) | −0.01 (−0.08, 0.934)
9 | −0.24 (−1.88, 0.068) | 0.02 (0.22, 0.824) | −0.11 (−1.05, 0.421)
10 | −0.31 ** (−3.46, 0.002) | 0.82 *** (7.28, <0.001) | 0.26 * (2.88, 0.021)

Note. p-values were adjusted for multiple comparisons using the Benjamini–Hochberg FDR procedure within each column (10 stimuli per contrast). * p < 0.05, ** p < 0.01, *** p < 0.001.
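
The per-window comparisons in the two tables of Welch t-tests rest on the unequal-variance t statistic. A minimal sketch of that statistic and its Welch-Satterthwaite degrees of freedom is given below; the function name and the example ratings are hypothetical, and converting t and df into a p-value would additionally require a t-distribution routine, which is left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t, df) for Welch's unequal-variance test of mean(a) - mean(b)."""
    # Squared standard errors of the two group means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # HYPOTHETICAL ratings for one window in two independent groups.
    with_background = [1, 2, 3, 4, 5]
    no_background = [2, 4, 6, 8, 10]
    print(welch_t(with_background, no_background))
```

Unlike Student's t-test, this version does not pool the two variances, which makes it a natural choice when groups of raters differ in size or spread.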

Summary of linear mixed-effects model.

Predictor | B | SE | 95% CI for B | t | Partial R² | p
Intercept | 4.80 *** | 0.05 | [4.69, 4.91] | 96.36 | none | <0.001
Unity | 0.36 *** | 0.02 | [0.33, 0.40] | 20.50 | 0.054 | <0.001
Variety | 0.21 *** | 0.02 | [0.18, 0.25] | 11.61 | 0.019 | <0.001
Chinese background | 0.09 * | 0.04 | [0.00, 0.18] | 1.97 | 0.005 | 0.048
Western background | −0.10 * | 0.04 | [−0.19, −0.01] | −2.29 | 0.007 | 0.022
Unity × Chinese background | −0.13 *** | 0.04 | [−0.21, −0.06] | −3.40 | 0.001 | <0.001
Variety × Chinese background | −0.08 * | 0.04 | [−0.15, −0.01] | −2.17 | 0.001 | 0.029
Unity × Western background | −0.27 *** | 0.04 | [−0.36, −0.19] | −6.46 | 0.001 | <0.001
Variety × Western background | −0.12 ** | 0.04 | [−0.20, −0.04] | −3.08 | 0.001 | 0.002

CI indicates Confidence Interval. * p < 0.05, ** p < 0.01, *** p < 0.001.

Summary of linear mixed-effects model of Variety.

Predictor | B | SE | 95% CI for B | t | Partial R² | p
Intercept | 4.64 *** | 0.10 | [4.42, 4.87] | 45.80 | none | <0.001
Unity | −0.62 *** | 0.02 | [−0.66, −0.59] | −36.76 | 0.149 | <0.001
Unity × Chinese background | 0.65 *** | 0.04 | [0.57, 0.73] | 15.62 | 0.032 | <0.001
Unity × Western background | 0.57 *** | 0.04 | [0.48, 0.65] | 15.20 | 0.021 | <0.001

CI indicates Confidence Interval. *** p < 0.001.

Supplementary Materials

The following supporting information can be downloaded at https://osf.io/2vjbp/overview?view_only=ad94d2deacd94ab3ae69716747cfba7b (accessed on 14 December 2025).

References

1. Berghman, M.; Hekkert, P. Towards a Unified Model of Aesthetic Pleasure in Design. New Ideas Psychol.; 2017; 47, pp. 136-144. [DOI: https://dx.doi.org/10.1016/j.newideapsych.2017.03.004]

2. Reber, R.; Schwarz, N.; Winkielman, P. Processing Fluency and Aesthetic Pleasure: Is Beauty in the Perceiver’s Processing Experience? Personal. Soc. Psychol. Rev.; 2004; 8, pp. 364-382. [DOI: https://dx.doi.org/10.1207/s15327957pspr0804_3] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/15582859]

3. Zajonc, R.B. Feeling and Thinking: Preferences Need No Inferences. Am. Psychol.; 1980; 35, pp. 151-175. [DOI: https://dx.doi.org/10.1037/0003-066X.35.2.151]

4. Vartanian, O.; Navarrete, G.; Chatterjee, A.; Fich, L.B.; Leder, H.; Modroño, C.; Nadal, M.; Rostrup, N.; Skov, M. Impact of Contour on Aesthetic Judgments and Approach-Avoidance Decisions in Architecture. Proc. Natl. Acad. Sci. USA; 2013; 110, pp. 10446-10453. [DOI: https://dx.doi.org/10.1073/pnas.1301227110]

5. Shi, A.; Huo, F.; Hou, G. Effects of Design Aesthetics on the Perceived Value of a Product. Front. Psychol.; 2021; 12, 670800. [DOI: https://dx.doi.org/10.3389/fpsyg.2021.670800]

6. Arnheim, R.; Moles, A. Information Theory and Esthetic Perception. J. Aesthet. Art Crit.; 1968; 26, 552. [DOI: https://dx.doi.org/10.2307/428335]

7. Berlyne, D.E. Novelty, Complexity, and Hedonic Value. Percept. Psychophys.; 1970; 8, pp. 279-286. [DOI: https://dx.doi.org/10.3758/BF03212593]

8. Post, R.A.G.; Blijlevens, J.; Hekkert, P. ‘To Preserve Unity While Almost Allowing for Chaos’: Testing the Aesthetic Principle of Unity-in-Variety in Product Design. Acta Psychol.; 2016; 163, pp. 142-152. [DOI: https://dx.doi.org/10.1016/j.actpsy.2015.11.013]

9. Berlyne, D.E.; Boudewijns, W.J. Hedonic Effects of Uniformity in Variety. Can. J. Psychol. Rev. Can. Psychol.; 1971; 25, pp. 195-206. [DOI: https://dx.doi.org/10.1037/h0082381]

10. Cupchik, G.C.; Gebotys, R. The Experience of Time, Pleasure, and Interest during Aesthetic Episodes. Empir. Stud. Arts; 1988; 6, pp. 1-12. [DOI: https://dx.doi.org/10.2190/5YN3-J3P8-FWHY-UDB3]

11. Post, R.A.G.; Blijlevens, J.; Hekkert, P.; Saakes, D.; Arango, L. Why We like to Touch: Consumers’ Tactile Esthetic Appreciation Explained by a Balanced Combination of Unity and Variety in Product Designs. Psychol. Mark.; 2023; 40, pp. 1249-1262. [DOI: https://dx.doi.org/10.1002/mar.21798]

12. Bar, M. Visual Objects in Context. Nat. Rev. Neurosci.; 2004; 5, pp. 617-629. [DOI: https://dx.doi.org/10.1038/nrn1476]

13. Navon, D. Forest before Trees: The Precedence of Global Features in Visual Perception. Cogn. Psychol.; 1977; 9, pp. 353-383. [DOI: https://dx.doi.org/10.1016/0010-0285(77)90012-3]

14. Wagemans, J.; Feldman, J.; Gepshtein, S.; Kimchi, R.; Pomerantz, J.R.; van der Helm, P.A.; van Leeuwen, C. A Century of Gestalt Psychology in Visual Perception: II. Conceptual and Theoretical Foundations. Psychol. Bull.; 2012; 138, pp. 1218-1252. [DOI: https://dx.doi.org/10.1037/a0029334] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22845750]

15. Rosch, E. Principles of Categorization. Readings in Cognitive Science; Elsevier: Amsterdam, The Netherlands, 1978; pp. 312-322. ISBN 978-1-4832-1446-7

16. Cupchik, G.C.; Gebotys, R.J. Interest and Pleasure as Dimensions of Aesthetic Response. Empir. Stud. Arts; 1990; 8, pp. 1-14. [DOI: https://dx.doi.org/10.2190/L789-TPPY-BD2Q-T7TW]

17. Fingerhut, J.; Prinz, J.J. Aesthetic Emotions Reconsidered. Monist; 2020; 103, pp. 223-239. [DOI: https://dx.doi.org/10.1093/monist/onz037]

18. Wassiliwizky, E.; Menninghaus, W. Why and How Should Cognitive Science Care about Aesthetics? Trends Cogn. Sci.; 2021; 25, pp. 437-449. [DOI: https://dx.doi.org/10.1016/j.tics.2021.03.008]

19. Coburn, A.; Vartanian, O.; Chatterjee, A. Buildings, Beauty, and the Brain: A Neuroscience of Architectural Experience. J. Cogn. Neurosci.; 2017; 29, pp. 1521-1531. [DOI: https://dx.doi.org/10.1162/jocn_a_01146]

20. Parsons, G.; Carlson, A. Functional Beauty; Oxford University Press: Oxford, UK, 2008; ISBN 978-0-19-920524-0

21. Spence, C. Senses of Place: Architectural Design for the Multisensory Mind. Cogn. Res. Princ. Implic.; 2020; 5, 46. [DOI: https://dx.doi.org/10.1186/s41235-020-00243-4]

22. Berger, C.; Mahdavi, A. Exploring Cross-Modal Influences on the Evaluation of Indoor-Environmental Conditions. Front. Built Environ.; 2021; 7, 676607. [DOI: https://dx.doi.org/10.3389/fbuil.2021.676607]

23. Hvass, M.; Van Den Wymelenberg, K.; Boring, S.; Hansen, E.K. Intensity and Ratios of Light Affecting Perception of Space, Co-Presence and Surrounding Context, a Lab Experiment. Build. Environ.; 2021; 194, 107680. [DOI: https://dx.doi.org/10.1016/j.buildenv.2021.107680]

24. Rasmussen, S.E. Experiencing Architecture; The MIT Press: Cambridge, MA, USA, 2005; ISBN 978-0-262-68002-8

25. Eysenck, H.J. The Experimental Study of the ‘Good Gestalt’—A New Approach. Psychol. Rev.; 1942; 49, pp. 344-364. [DOI: https://dx.doi.org/10.1037/h0057013]

26. Kellett, K.R. A Gestalt Study of the Function of Unity in Aesthetic Perception. Psychol. Monogr.; 1939; 51, pp. 23-51. [DOI: https://dx.doi.org/10.1037/h0093475]

27. Lidwell, W.; Holden, K.; Butler, J. Universal Principles of Design; Rockport Publishers: Gloucester, MA, USA, 2003; ISBN 978-1-59253-007-6

28. Nasar, J.L. Urban Design Aesthetics: The Evaluative Qualities of Building Exteriors. Environ. Behav.; 1994; 26, pp. 377-401. [DOI: https://dx.doi.org/10.1177/001391659402600305]

29. Biederman, I.; Vessel, E.A. Perceptual Pleasure and the Brain: A Novel Theory Explains Why the Brain Craves Information and Seeks It through the Senses. Am. Sci.; 2006; 94, pp. 247-253. [DOI: https://dx.doi.org/10.1511/2006.59.247]

30. Gifford, R.; Hine, D.W.; Muller-Clemm, W.; Reynolds, D.J., Jr.; Shaw, K.T. Decoding Modern Architecture: A Lens Model Approach for Understanding the Aesthetic Differences of Architects and Laypersons. Environ. Behav.; 2000; 32, pp. 163-187. [DOI: https://dx.doi.org/10.1177/00139160021972487]

31. Hagerhall, C.M.; Purcell, T.; Taylor, R. Fractal Dimension of Landscape Silhouette Outlines as a Predictor of Landscape Preference. J. Environ. Psychol.; 2004; 24, pp. 247-255. [DOI: https://dx.doi.org/10.1016/j.jenvp.2003.12.004]

32. Leder, H.; Belke, B.; Oeberst, A.; Augustin, D. A Model of Aesthetic Appreciation and Aesthetic Judgments. Br. J. Psychol.; 2004; 95, pp. 489-508. [DOI: https://dx.doi.org/10.1348/0007126042369811]

33. Jacobsen, T.; Höfel, L. Aesthetic Judgments of Novel Graphic Patterns: Analyses of Individual Judgments. Percept. Mot. Ski.; 2002; 95, pp. 755-766. [DOI: https://dx.doi.org/10.2466/pms.2002.95.3.755]

34. Che, J.; Sun, X.; Gallardo, V.; Nadal, M. Cross-Cultural Empirical Aesthetics. Progress in Brain Research; Elsevier: Amsterdam, The Netherlands, 2018; 237, pp. 77-103. ISBN 978-0-12-813981-3

35. Nisbett, R.E.; Peng, K.; Choi, I.; Norenzayan, A. Culture and Systems of Thought: Holistic versus Analytic Cognition. Psychol. Rev.; 2001; 108, pp. 291-310. [DOI: https://dx.doi.org/10.1037/0033-295X.108.2.291]

36. Masuda, T.; Nisbett, R.E. Culture and Change Blindness. Cogn. Sci.; 2006; 30, pp. 381-399. [DOI: https://dx.doi.org/10.1207/s15516709cog0000_63]

37. Chua, H.F.; Boland, J.E.; Nisbett, R.E. Cultural Variation in Eye Movements during Scene Perception. Proc. Natl. Acad. Sci. USA; 2005; 102, pp. 12629-12633. [DOI: https://dx.doi.org/10.1073/pnas.0506162102]

38. Nasar, J.L. Visual Preferences in Urban Street Scenes: A Cross-Cultural Comparison between Japan and the United States. J. Cross-Cult. Psychol.; 1984; 15, pp. 79-93. [DOI: https://dx.doi.org/10.1177/0022002184015001005]

39. Post, R.; Nguyen, T.; Hekkert, P. Unity in Variety in Website Aesthetics: A Systematic Inquiry. Int. J. Hum. Comput. Stud.; 2017; 103, pp. 48-62. [DOI: https://dx.doi.org/10.1016/j.ijhcs.2017.02.003]

40. Palmer, S.E.; Schloss, K.B.; Sammartino, J. Visual Aesthetics and Human Preference. Annu. Rev. Psychol.; 2013; 64, pp. 77-107. [DOI: https://dx.doi.org/10.1146/annurev-psych-120710-100504] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23020642]

41. Vessel, E.A.; Isik, A.I.; Belfi, A.M.; Stahl, J.L.; Starr, G.G. The Default-Mode Network Represents Aesthetic Appeal That Generalizes across Visual Domains. Proc. Natl. Acad. Sci. USA; 2019; 116, pp. 19155-19164. [DOI: https://dx.doi.org/10.1073/pnas.1902650116] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31484756]

42. Hooge, I.Th.C.; Erkelens, C.J. Peripheral Vision and Oculomotor Control during Visual Search. Vis. Res.; 1999; 39, pp. 1567-1575. [DOI: https://dx.doi.org/10.1016/S0042-6989(98)00213-2]

43. Malewczyk, M.; Taraszkiewicz, A.; Czyż, P. Visual Perception of Regularity and the Composition Pattern Type of the Facade. Buildings; 2024; 14, 1389. [DOI: https://dx.doi.org/10.3390/buildings14051389]

44. Palmer, S.E. Hierarchical Structure in Perceptual Representation. Cogn. Psychol.; 1977; 9, pp. 441-474. [DOI: https://dx.doi.org/10.1016/0010-0285(77)90016-0]

45. Frascaroli, J.; Leder, H.; Brattico, E.; Van de Cruys, S. Aesthetics and Predictive Processing: Grounds and Prospects of a Fruitful Encounter. Philos. Trans. R. Soc. B Biol. Sci.; 2023; 379, 20220410. [DOI: https://dx.doi.org/10.1098/rstb.2022.0410]

46. von Ballestrem, M.; Gleiter, J.H. Type-Prototype-Archetype: Type Formation in Architecture. Int. J. Archit. Theory; 2019; 24, pp. 5-9.

47. Tversky, A. Features of Similarity. Psychol. Rev.; 1977; 84, pp. 327-352. [DOI: https://dx.doi.org/10.1037/0033-295X.84.4.327]

48. Veryzer, R.W., Jr.; Hutchinson, J.W. The Influence of Unity and Prototypicality on Aesthetic Responses to New Product Designs. J. Consum. Res.; 1998; 24, pp. 374-394. [DOI: https://dx.doi.org/10.1086/209516]

49. Hassenzahl, M.; Diefenbach, S.; Göritz, A. Needs, Affect, and Interactive Products–Facets of User Experience. Interact. Comput.; 2010; 22, pp. 353-362. [DOI: https://dx.doi.org/10.1016/j.intcom.2010.04.002]

50. Lavie, T.; Tractinsky, N. Assessing Dimensions of Perceived Visual Aesthetics of Web Sites. Int. J. Hum. Comput. Stud.; 2004; 60, pp. 269-298. [DOI: https://dx.doi.org/10.1016/j.ijhcs.2003.09.002]

51. van Schaik, P.; Ling, J. Modelling User Experience with Web Sites: Usability, Hedonic Value, Beauty and Goodness. Interact. Comput.; 2008; 20, pp. 419-432. [DOI: https://dx.doi.org/10.1016/j.intcom.2008.03.001]

52. Karapanos, E.; Martens, J.-B.; Hassenzahl, M. Reconstructing Experiences with iScale. Int. J. Hum. Comput. Stud.; 2012; 70, pp. 849-865. [DOI: https://dx.doi.org/10.1016/j.ijhcs.2012.06.004]

53. Ceballos, L.M.; Hodges, N.N.; Watchravesringkan, K. The MAYA Principle as Applied to Apparel Products: The Effects of Typicality and Novelty on Aesthetic Preference. J. Fash. Mark. Manag.; 2019; 23, pp. 587-607. [DOI: https://dx.doi.org/10.1108/JFMM-09-2018-0116]

54. Hekkert, P.; Snelders, D.; Van Wieringen, P.C.W. ‘Most Advanced, yet Acceptable’: Typicality and Novelty as Joint Predictors of Aesthetic Preference in Industrial Design. Br. J. Psychol.; 2003; 94, pp. 111-124. [DOI: https://dx.doi.org/10.1348/000712603762842147]

55. Cao, Z.; Mustafa, M.; Mohd Isa, M.H. The Role of Artistic Quality in a Heritage Architectural Style in Modulating Tourist Interest and Aesthetic Pleasure: A Case Study of Hui-Style Architecture in the Hongcun Scenic Area, China. J. Herit. Tour.; 2024; 19, pp. 896-918. [DOI: https://dx.doi.org/10.1080/1743873X.2024.2378805]

56. Zhang, M.; Du, J.; Meng, Y. Biophilia and Visual Preference for Chinese Vernacular Windows: An Investigation into Shape. J. Asian Archit. Build. Eng.; 2022; 22, pp. 2448-2459. [DOI: https://dx.doi.org/10.1080/13467581.2022.2160203]

57. Wang, Z.; Zheng, R.; Tang, J.; Wang, S.; He, X. The Aesthetic Imagery of Traditional Garden Door and Window Forms: A Case Study of the Four Major Traditional Gardens of Lingnan. Buildings; 2025; 15, 513. [DOI: https://dx.doi.org/10.3390/buildings15040513]

58. Blijlevens, J.; Thurgood, C.; Hekkert, P.; Chen, L.-L.; Leder, H.; Whitfield, T.W.A. The Aesthetic Pleasure in Design Scale: The Development of a Scale to Measure Aesthetic Pleasure for Designed Artifacts. Psychol. Aesthet. Creat. Arts; 2017; 11, pp. 86-98. [DOI: https://dx.doi.org/10.1037/aca0000098]

59. Krejcie, R.V.; Morgan, D.W. Determining Sample Size for Research Activities. Educ. Psychol. Meas.; 1970; 30, pp. 607-610. [DOI: https://dx.doi.org/10.1177/001316447003000308]

60. Crilly, N.; Good, D.; Matravers, D.; Clarkson, P.J. Design as Communication: Exploring the Validity and Utility of Relating Intention to Interpretation. Des. Stud.; 2008; 29, pp. 425-457. [DOI: https://dx.doi.org/10.1016/j.destud.2008.05.002]

61. Whitfield, T.W.A.; Slatter, P.E. The Effects of Categorization and Prototypicality on Aesthetic Choice in a Furniture Selection Task. Br. J. Psychol.; 1979; 70, pp. 65-75. [DOI: https://dx.doi.org/10.1111/j.2044-8295.1979.tb02144.x]

62. Zajonc, R.B. Attitudinal Effects of Mere Exposure. J. Personal. Soc. Psychol.; 1968; 9, pp. 1-27. [DOI: https://dx.doi.org/10.1037/h0025848]

63. Bates, D.; Mächler, M.; Bolker, B.; Walker, S. Fitting Linear Mixed-Effects Models Using lme4. J. Stat. Softw.; 2015; 67, pp. 1-48. [DOI: https://dx.doi.org/10.18637/jss.v067.i01]

64. Matuschek, H.; Kliegl, R.; Vasishth, S.; Baayen, H.; Bates, D. Balancing Type I Error and Power in Linear Mixed Models. J. Mem. Lang.; 2017; 94, pp. 305-315. [DOI: https://dx.doi.org/10.1016/j.jml.2017.01.001]

65. Halberstadt, J.; Rhodes, G. It’s Not Just Average Faces That Are Attractive: Computer-Manipulated Averageness Makes Birds, Fish, and Automobiles Attractive. Psychon. Bull. Rev.; 2003; 10, pp. 149-156. [DOI: https://dx.doi.org/10.3758/BF03196479]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).