1. Introduction
1.1. Overview of the Traditional Villages in the Hakka Region of Guangdong Province
Traditional villages in China are integral to Chinese civilization, not only carrying forward 5000 years of agricultural history but also preserving the unique social cultures and lifestyles of various regions. These villages embody China’s historical, cultural, aesthetic, and tourism value [1,2,3]. Chinese culture, which is vast and profound, is deeply intertwined with local and regional cultures, giving rise to numerous traditional villages, each with distinct forms and features. To date, the Chinese government, through three ministries, has designated 8155 villages as traditional villages in six batches [4]. Guangdong Province, which is located at the southernmost tip of mainland China and borders the South China Sea, is the heartland of the Lingnan culture. Guangdong is home to 293 traditional villages, which, owing to differences in language, history, and culture, are primarily categorized into three groups [5], namely, Cantonese, Chaoshan, and Hakka, as shown in Figure 1 [6]. Among them, traditional villages in the Hakka region of Guangdong have attracted considerable scholarly attention because of their large number and the distinctiveness of the Hakka culture. Currently, 124 villages in the Hakka region of Guangdong have been designated as traditional Chinese villages, representing 42% of the total number of such villages in Guangdong Province. These traditional villages are concentrated primarily in the cities of Meizhou, Heyuan, Huizhou, and Shaoguan, as well as in the Hakka people’s main settlement areas [7]. The traditional villages located in the Hakka region of Guangdong Province preserve a rich cultural heritage, making them critical subjects for studying the culture of traditional villages in China [8].
The term “Hakka” literally means “guest families” or “families of migrants”, reflecting a history marked by migration and adaptation [9]. As a significant subgroup of the Han nationality, the Hakka people originally hailed from the Central Plains region along the Yellow River, with a history spanning thousands of years. To escape wars and natural disasters, the Hakka people gradually migrated southwards in three major waves before ultimately settling in the Hakka region of Guangdong, particularly in areas such as Meizhou and Heyuan, as shown in Figure 2 (a revised version of the Hakka migration map provided in [7]). In their new settlements, the Hakka people humbly referred to themselves as “guests”, a term that reflects their migration from the Central Plains region to the south. Over time, “Hakka” evolved from a temporary label into a distinct ethnonym that embodies the group’s unique cultural, historical, and linguistic characteristics [10].
After migrating to southern China, the Hakka people found that most of the plains had already been settled by local populations, forcing them to establish themselves in mountainous or hilly areas [8]. As a result, the traditional villages in the Hakka region of Guangdong were predominantly built near mountains and water sources. Moreover, the Hakka people place great importance on clan relations, and the spatial layout of their villages is typically organised around family units. Ancestral halls, family residences, and public activity areas together form the heart of the villages [11]. This layout provides defence against external threats and strengthens the bonds and cohesion among family members. Traditional Hakka architecture is characterised primarily by two distinctive dwelling forms, namely, the “Weilongwu” [12] and the “Tulou” [13]. These structures are typically large, multistorey buildings that are either square or round in shape and are enclosed by high protective walls. Serving both as fortifications and as shared living spaces, they can accommodate extended families [14]. The traditional villages in the Hakka region of Guangdong embody a rich cultural heritage, and their landscapes and soundscapes are highly distinctive. A well-integrated audio-visual environment allows residents and visitors to immerse themselves fully in the charm and cultural atmosphere of traditional villages.
The traditional villages in the Hakka region of Guangdong Province represent a rich cultural heritage, reflecting the unique history, architectural forms, and customs of the Hakka people. Through centuries of migration and adaptation, their distinctive visual landscapes and soundscapes have become essential components in understanding the culture of traditional Chinese villages.
1.2. Overview of the Research on the Landscape and Soundscape in Traditional Villages
According to the Council of Europe Landscape Convention of 2000 [15], a “landscape” is defined as “an area, as perceived by people, whose character is the result of the action and interaction of natural and/or human factors”. Research on landscape perception and evaluation has a long history [16,17]. The term “soundscape” was popularised by the Canadian composer R. Murray Schafer in the late 1960s [18] and has since garnered significant academic attention, and the field of soundscape research continues to expand. In 2014, the International Organization for Standardization (ISO) defined a soundscape as an “acoustic environment as perceived or experienced and/or understood by a person or people, in context”, highlighting the critical roles of the listener, sound, and spatial environment in soundscape studies [19]. The ISO has also formalised soundscape research and evaluation methods and has published relevant international standards [20,21]. Moreover, the scope of soundscape research has broadened to include a variety of urban environments and building types [22,23,24,25].
In recent years, traditional villages in China and rural areas in other parts of the world have attracted increasing attention from scholars both domestically and internationally owing to their beautiful natural environments and tranquil characteristics, which are conducive to rest and relaxation. Indeed, rural soundscapes have been studied systematically since the inception of soundscape research. In 1977, in his seminal works Five Village Soundscapes [18] and European Sound Diary [26], Schafer recorded and analysed in detail the soundscape characteristics of five rural villages, thereby laying the foundation for the study of rural soundscapes.
Over the following decades, an increasing number of studies have highlighted the significant psychological and physiological benefits of rural soundscapes. For example, Payne et al. [27], through a comparative study of urban, urban park, and rural soundscapes, reported that rural soundscapes possess the highest restorative potential, making them ideal environments for psychological recovery. Daugstad [28] further emphasised that one of the primary reasons that tourists engage in rural tourism is to experience the peaceful soundscapes of the countryside and to reconnect with a natural, pastoral way of life. Nilsson and Berglund [29] conducted a survey and found that more than 80% of respondents rated rural soundscapes as either “good” or “very good”, highlighting the unique appeal of rural soundscapes. Together, these studies show that the value of rural soundscapes has been widely recognised and confirm the important role that rural soundscapes play in modern society.
Although the value of rural soundscapes has been widely recognised, traditional villages face the risk of decline or disappearance due to rapid urbanisation. These villages have suffered varying degrees of destruction, and many elements of their cultural heritage, particularly their visual landscapes and soundscapes, have been severely affected. The traditional villages located in the Hakka region of Guangdong are no exception: various urban noise sources disrupt the tranquillity of these villages, significantly impairing the auditory experience of their residents. Therefore, it is crucial to clarify the evaluation characteristics of the soundscapes of traditional Hakka villages in the context of modern development, to enhance people’s overall perception and evaluation, and to protect and preserve the visual landscapes and soundscapes of these villages.
More academic research is needed on the soundscapes of traditional villages. Some scholars have focused on the soundscape characteristics of traditional villages. For example, on the basis of a soundscape survey in Zhaoxing County, Deng et al. [30] presented a case study of the soundmarks of a Dong village. Through sound collection and field testing, Mao et al. [31] evaluated and summarised the soundscape characteristics of Hmong villages in Guizhou Province. Zou et al. [32] investigated six traditional villages at different stages of tourism development and explored trends in the evolution of their soundscape characteristics. Other studies have focused on subjective evaluations of the soundscapes of traditional villages. Qin et al. [33] investigated tourists’ perceptions of the soundscape, and the factors influencing them, in Diantou village, Shanxi Province. Liu et al. [34] studied three traditional villages of the She nationality in Zhejiang Province and analysed perceptions and evaluations of the primary sound sources and the overall soundscape characteristics. More recent studies have combined objective acoustic measurements with subjective evaluations. For example, Xie et al. [35] conducted objective measurements and subjective evaluations of the soundscape in Azhake village, a traditional village of the Hani nationality in Sichuan Province. Similarly, Wang et al. [36] used sound walks and the semantic differential method to measure and evaluate the soundscape of Zhaoxing County both objectively and subjectively. Overall, research on the soundscapes of traditional villages has focused primarily on soundscape characteristics, people’s perceptions of soundscapes, and their impacts on the experiences of residents and visitors.
These studies have been conducted in diverse locations, and the sound elements in the villages studied consist predominantly of natural and human-made sounds. The subjective evaluation indicators used focus mainly on acoustic comfort and sound preference. However, research on the soundscapes of traditional villages in Guangdong Province remains relatively scarce.
In recent years, increasing attention has been given to the visual landscapes and soundscapes of traditional villages, particularly to soundscape characteristics and people’s perceptions. Against the backdrop of mounting urbanisation pressures, however, the visual landscapes and soundscapes of these villages face significant threats, highlighting the urgent need to strengthen research on the audio-visual environment of traditional villages.
1.3. Overview of the Research on Audio-Visual Interaction
Currently, research on the visual landscapes and soundscapes of traditional villages is becoming more systematic. For visual landscapes, GIS has been used for landscape analysis. For example, Liu et al. [37] used GIS to analyse the landscape of Linpu village, examining the village’s landscape features and revealing its “one core, two wings” spatial configuration. Duan et al. [38] used GIS to analyse the spatial distribution of traditional villages in the Fuzhou area of Jiangxi Province and examined the impact of different terrains and waterbody types on the landscapes of these villages. Zheng et al. [39] used GIS to analyse the spatial distribution patterns of traditional villages in southwest China and further explored their landscape patterns.
For soundscapes, some scholars have employed psychoacoustic models. For example, Chen et al. [40] used psychoacoustic models to analyse the impact of different sounds on tourists’ emotions and perceptions in Zhuquan village, providing guidance for the planning and management of soundscapes in rural tourism destinations. Chi et al. [41] categorised traditional village soundscapes and used psychoacoustic models to study the soundscape of Sizhai village, a traditional village in Zhejiang Province. However, these studies often examine a single sensory factor in isolation, which limits their applicability to audio-visual interaction in traditional villages. Therefore, this study uses audio-visual interaction experiments to explore the evaluation characteristics of visual landscapes and soundscapes and the mechanisms of audio-visual interaction in traditional villages in the Hakka region, providing a novel perspective for enhancing the audio-visual environment of traditional villages.
The interaction between visual and auditory stimuli during perception has also attracted considerable attention. Research on audio-visual interactions dates back to the 1980s, when Anderson et al. [42] explored how audio-visual interactions affect people’s preferences for outdoor environments. Over the past 40 years, researchers have employed field assessments and laboratory experiments to investigate the mechanisms underlying audio-visual interactions [43]. Some studies have indicated that visual stimuli significantly influence people’s perception and evaluation of a given soundscape [44]. For example, natural landscapes can reduce perceived noise interference, thereby enhancing people’s positive responses to environmental sounds and improving their overall perception of the soundscape [45,46]. Through audio-visual interaction experiments, Liu et al. [47] reported that visual stimuli have a more significant impact on the perception of single sounds, further emphasising the interactive relationship between landscape and soundscape perceptions.
Additionally, different types of visual stimuli affect soundscape perception differently [48]: in outdoor environments, natural landscapes improve the evaluation of environmental sounds, whereas artificial landscapes do not have the same effect [49]. Conversely, a soundscape can also influence people’s perception and evaluation of a visual landscape [15]. One study reported that music and noise capture people’s visual attention at large train stations, with sounds that match the visual landscape having a particularly strong effect on visual attention [50]. Furthermore, Preis et al. [51] confirmed that auditory stimuli affect the perception of audio-visual comfort more strongly than visual stimuli do. These findings suggest a strong correlation between visual landscape and soundscape evaluations. Therefore, when exploring the influence of the soundscape on visual evaluation, it is equally important to consider the role of visual factors in auditory perception.
In summary, the traditional villages located in the Hakka region of Guangdong not only embody rich Hakka cultural characteristics but also present unique spatial layouts and lifestyles, all of which have a profound impact on people’s perceptions and experiences. However, research on the visual landscapes and soundscape evaluations of traditional villages in the Hakka region of Guangdong remains limited. Moreover, with the accelerating process of urbanisation, balancing the preservation of distinct Hakka cultural features with the need to meet the audio-visual environment demands of residents and visitors has become a pressing issue. Therefore, it is crucial to conduct in-depth studies on the evaluation characteristics of the visual landscapes and soundscapes of these villages and explore the mechanisms of audio-visual interactions.
In this context, the current study focuses on the visual landscapes and soundscapes of traditional villages in the Hakka region of Guangdong, particularly how they shape people’s perceptions within these villages. To do so, field research and audio-visual material collection were conducted in four representative traditional villages in the Hakka region of Guangdong, and the visual materials were then quantified using a semantic segmentation model based on the Common Objects in Context (COCO) dataset (hereafter, the COCO model). Subsequently, an audio-visual interaction experiment was carried out, together with questionnaire surveys of the participants. This study aims to analyse the evaluation characteristics of people’s visual landscape satisfaction, acoustic comfort, and audio-visual harmony, and to explore the mechanisms of audio-visual interaction that influence people’s perceptions in these villages. The findings could contribute to creating healthier and more comfortable audio-visual environments for both residents and visitors and help provide recommendations for the modernisation and renewal of traditional villages. The specific research questions examined are as follows:
(1) How do participants perceive and evaluate different types of visual landscapes and soundscapes in traditional villages in the Hakka region of Guangdong Province? Do these visual and audio factors contribute to overall comfort and pleasantness in the village environment? What is the relationship between visual landscape satisfaction, acoustic comfort, and audio-visual harmony as evaluation indicators?
(2) Do various audio factors influence the evaluation of visual landscape satisfaction? Can sounds that match visual materials enhance the evaluation of visual landscape satisfaction?
(3) Do different visual factors affect the evaluation of acoustic comfort? Can visual images that match audio materials improve the evaluation of acoustic comfort?
(4) How do different visual and audio factors influence the evaluation of audio-visual harmony?
2. Methods
In this study, we use audio-visual interaction experiments and questionnaire surveys to evaluate the landscape and soundscape of traditional villages in the Hakka region of Guangdong Province. Four of the most representative traditional villages in the Hakka region of Guangdong were selected as research sites, and visual and audio materials were collected on site. Subsequently, various combinations of audio-visual materials were played in an audio-visual laboratory, where participants were asked to rate three evaluation indicators, namely, visual landscape satisfaction, acoustic comfort, and audio-visual harmony. Through statistical analyses of the experimental results, this study aims to explore the interaction mechanisms between visual and audio factors and compare these findings with those of previous studies. Figure 3 presents the research framework and procedural steps of this study.
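The pairing of visual and audio materials can be sketched as a crossing of the four space types, where a “matched” pair is one whose image and sound come from the same type of space (the congruence referred to in research questions (2) and (3)). This is a minimal sketch that assumes a full factorial design and a per-participant randomised presentation order; the text above states only that “various combinations” were played, and the function names are hypothetical.

```python
import random
from itertools import product

# The four space types used for both the visual and the audio materials
SPACE_TYPES = ["natural", "agricultural", "living", "folklore"]

def build_stimulus_pairs(visual_types, audio_types):
    """Cross every visual type with every audio type (assumed full factorial design).

    A pair is flagged as 'matched' when the image and the sound come from
    the same type of space.
    """
    return [
        {"visual": v, "audio": a, "matched": v == a}
        for v, a in product(visual_types, audio_types)
    ]

def presentation_order(pairs, seed):
    """Return a shuffled copy of the pairs; a per-participant seed keeps it reproducible."""
    order = list(pairs)
    random.Random(seed).shuffle(order)
    return order

pairs = build_stimulus_pairs(SPACE_TYPES, SPACE_TYPES)
print(len(pairs))                        # 16
print(sum(p["matched"] for p in pairs))  # 4
print(presentation_order(pairs, seed=7)[0])
```

Under this assumption, 4 of the 16 pairings are congruent and 12 are incongruent, which is what allows the effect of a matching sound (or image) to be compared against mismatched ones.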
2.1. Research Sites
To select the research sites, we comprehensively considered the scale, cultural heritage, spatial layout, and adaptability to the process of modernisation of traditional villages in the Hakka region of Guangdong, aiming to include villages that are representative, culturally significant, and diverse. Therefore, four typical and widely representative villages were chosen, namely, Qiaoxiang village and Qiaoxi village in Meizhou, Huxinba village in Shaoguan, and Huangsiyang village in Huizhou. The locations of these four villages are shown in Figure 4.
These villages are typical across multiple dimensions and hold significant positions, as detailed in Table 1. First, from a historical perspective, all four villages were established during the Ming dynasty; thus, they all have a long history and a rich cultural heritage. They are not only key carriers of Hakka cultural traditions but also witnesses to the historical changes and cultural evolution of the Hakka people. Qiaoxiang village, renowned as a hometown of overseas Chinese, is one of the birthplaces of the Hakka culture and retains many cultural heritage elements related to the history of overseas Chinese. Qiaoxi village, which is notable for its historical Hakka remains, has a unique village layout and historical relics that reflect the historical footprint of the Hakka people. Huxinba village is famous for its distinctive Hakka culture and village layout, which highlights the unique spatial patterns formed by the Hakka people in their interactions with the natural environment. Huangsiyang village, owing to its well-preserved traditional Hakka dwellings, has become a hotspot for research on the Hakka culture. Its architectural form and spatial layout provide valuable field samples for studying traditional Hakka dwellings.
Second, in terms of scale, these villages are all medium- to small-sized, with representative landscapes and broad influence within their respective regions, all of which makes them highly important for the study of Hakka culture. In terms of cultural heritage, all four villages preserve a wealth of intangible cultural heritage, such as traditional festivals, handicrafts, and local customs, and they have demonstrated strong cultural continuity and stability in the face of modernisation. Additionally, the villages have preserved not only their overall layout but also their traditional Hakka dwellings, which form a harmonious relationship with the natural environment and highlight the deep integration of the Hakka culture with nature.
Moreover, the resident populations of these villages are stable and relatively large, with a high level of cultural identity and collective memory. The lifestyles of residents are largely traditional, and their daily activities are often accompanied by natural sounds such as birdsong and flowing water, as well as human-made sounds generated by activities. The interactions of these visual landscapes and soundscapes collectively form an immersive audio-visual experience that reflects the cultural characteristics and liveliness of traditional villages in the Hakka region.
Studying these representative traditional villages provides valuable field data and a foundation for the planning and design of traditional villages from an audio-visual perspective. By analysing the audio-visual interaction in traditional villages, the study offers new insights into creating more immersive and culturally resonant spatial designs in traditional environments, thereby enhancing people’s spatial experience and sense of belonging.
2.2. Experimental Materials
The research team conducted field surveys at the research sites from March to May 2024 during daytime hours (i.e., from 9:00 a.m. to 5:00 p.m.). Professional equipment was used to collect the visual and audio data.
2.2.1. Visual Material Collection
During the preliminary field surveys, we collected many representative visual images of traditional villages in the Hakka region of Guangdong. The visual materials were captured with a high-quality camera kit (manufacturer: Canon, model: EOS R8, location: Tokyo, Japan). The images were taken on sunny days at eye level (a camera height of approximately 1.6 m), using angles typical of visual landscape photography. The camera has a resolution of 24.2 MP (accuracy ±1%), and the lens has a focal length range of 24–50 mm (accuracy ±2%) and a maximum aperture of f/4.5–f/6.3. This configuration ensured that high-quality images were captured and saved, supporting the accuracy and reliability of the research data.
We also kept the exposure settings and white balance consistent across all the images, thereby minimising visual differences caused by external factors such as lighting and colour temperature. The camera was kept level, and each image was saved at 300 dpi to ensure a sufficient level of detail for subsequent analysis. To control the spatial scope, the spatial extent captured in each image typically ranged from 5 to 20 m, ensuring that representative visual elements of each area were included. All the images were taken at the selected research sites following standardised shooting procedures. The captured content covered various scenes within the villages, including natural landscapes, areas with dense human activity, key cultural landmarks, and traditional Hakka residential areas, ensuring a comprehensive representation of the villages’ typical visual characteristics.
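How much of a scene each image covers depends on both the focal length and the shooting distance. The following is a minimal worked sketch, assuming that the 5 to 20 m range refers to the distance to the main subject, a nominal full-frame sensor width of 36 mm, and an ideal rectilinear (pinhole) lens without distortion. Under these assumptions, the covered width is w = d · s / f and the horizontal angle of view is 2 · arctan(s / 2f), where d is the distance, s is the sensor width, and f is the focal length.

```python
import math

SENSOR_WIDTH_MM = 36.0  # nominal full-frame sensor width (assumption)

def horizontal_fov_deg(focal_length_mm, sensor_width_mm=SENSOR_WIDTH_MM):
    """Horizontal angle of view of an ideal rectilinear (pinhole) lens, in degrees."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

def scene_width_m(focal_length_mm, distance_m, sensor_width_mm=SENSOR_WIDTH_MM):
    """Width of the scene covered at a given subject distance (similar triangles)."""
    return distance_m * sensor_width_mm / focal_length_mm

# Extremes of the lens zoom range and of the assumed shooting distances
for f in (24, 50):
    for d in (5, 20):
        print(f"{f} mm at {d} m: {scene_width_m(f, d):.1f} m wide, "
              f"angle of view {horizontal_fov_deg(f):.1f} deg")
```

At 24 mm, for example, a subject 5 m away corresponds to a covered width of 7.5 m, whereas at 50 mm the same distance yields 3.6 m; the actual coverage also depends on lens distortion and the exact sensor dimensions.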
A total of 316 visual images were collected. After the shooting was completed, all the images underwent professional postprocessing, which was limited to necessary colour correction and detail enhancement, avoiding excessive retouching. Additionally, the collected visual materials served as the basis for fine-tuning the COCO model. Through retraining, the COCO model was optimised to better adapt to traditional village scenes, thereby improving its segmentation accuracy for visual elements related to traditional landscapes and providing more precise data support for the audio-visual interaction experiment.
On the basis of the research objectives and findings from preliminary field surveys, as well as the landscape characteristics of the four selected villages, the collected visual images were categorised into four types, namely, natural space, agricultural space, living space, and folklore space. Four representative images were selected for each category to create a set of visual materials, totalling 16 images, as shown in Figure 5. Figure 5a depicts natural spaces located outside the villages, which typically feature diverse topographies and plant communities. Such spaces offer unique landscape views and ecological environments and play a crucial role in maintaining the health and stability of the village environment. Figure 5b illustrates the agricultural landscapes of the villages, showing extensive farming areas such as tea fields, rice paddies, and vegetable plots. Such spaces reflect the long history of agricultural civilization in China and highlight the distinctive Hakka characteristics. Figure 5c shows living spaces, which encompass the traditional residential areas and public spaces in the Hakka region and offer a strong sense of daily life that reflects the lifestyle of the Hakka people. Figure 5d shows folklore spaces, featuring various traditional Hakka activities, including opera, dance, and distinctive folk customs, which reflect the Hakka region’s historical heritage.
The collected visual images show that the proportions of visual elements vary across the different spaces within traditional villages. To quantify this variation, this study used the COCO model to perform semantic segmentation on the most representative image of each of the four visual landscape types shown in Figure 5, extracting the proportion of each visual element in these images.
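Once a segmentation model has assigned a class to every pixel, the proportion of each visual element is its pixel count divided by the total number of pixels. Below is a minimal NumPy sketch, assuming the segmentation output is an integer label mask; the class IDs and the toy mask are hypothetical and are not the study’s data.

```python
import numpy as np

def element_proportions(label_mask, class_names):
    """Share of pixels assigned to each class in a per-pixel integer label mask."""
    counts = np.bincount(label_mask.ravel(), minlength=len(class_names))
    total = label_mask.size
    return {name: float(counts[i] / total) for i, name in enumerate(class_names)}

# Toy 4x4 mask with hypothetical class IDs:
# 0 = sky, 1 = vegetation, 2 = construction, 3 = dynamic
mask = np.array([
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 2, 2],
    [1, 3, 2, 2],
])
print(element_proportions(mask, ["sky", "vegetation", "construction", "dynamic"]))
# {'sky': 0.375, 'vegetation': 0.3125, 'construction': 0.25, 'dynamic': 0.0625}
```

Because every pixel receives exactly one label, the proportions always sum to 1, which makes them directly comparable across images of different scenes.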
The quality and proportion of visual landscape elements in the environment significantly affect people’s perceptions. To further analyse the influence of visual elements on perception, this study employed semantic segmentation. Semantic segmentation allows for the precise labeling of various visual elements in an image at the pixel level, providing more detailed information for spatial analysis. Unlike traditional object detection methods, which typically only mark the bounding boxes of objects and encounter difficulties in handling complex backgrounds or overlapping elements, semantic segmentation has advantages when processing the visually complex backgrounds of traditional villages. Semantic segmentation allowed us to extract key visual elements that play a critical role in shaping the atmosphere of the village and quantify their proportions, thereby providing more precise data support for the in-depth analysis of how the visual landscape of traditional villages in the Hakka region influences people’s perceptions.
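The advantage over bounding boxes can be made concrete: for a non-rectangular element, such as the sloping outline of a roof, a box overstates the area the element actually occupies. This is a toy NumPy illustration rather than part of the study’s pipeline, and the triangular “roof” mask is hypothetical.

```python
import numpy as np

def mask_area(mask):
    """Pixels that actually belong to the element (pixel-level segmentation)."""
    return int(mask.sum())

def bbox_area(mask):
    """Pixels inside the tightest axis-aligned bounding box (object detection)."""
    if not mask.any():
        return 0
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int((rows[-1] - rows[0] + 1) * (cols[-1] - cols[0] + 1))

# Triangular, roof-like element in a 5x5 image
roof = np.tril(np.ones((5, 5), dtype=bool))
print(mask_area(roof), bbox_area(roof))  # 15 25
```

Here the box would attribute 25 pixels to the element, whereas only 15 belong to it; pixel-level labels avoid this overestimate when element proportions are computed.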
This study selected the COCO model for visual image segmentation because of its outstanding performance in image segmentation and object detection tasks, particularly its strong adaptability in recognising objects against complex backgrounds. The COCO-Text dataset provides over 173,000 text annotations for the training set, covering more than 63,000 images, and approximately 5000 images for the validation set, providing diverse, high-quality data for model training. All the images in the dataset are meticulously annotated, covering a wide range of common object categories and effectively reflecting the characteristics of objects and elements in various complex scenes.
The visual environment of traditional villages often features complex backgrounds and unique elements, such as traditional Hakka architecture and distinctive landscapes, which are more variable and denser than those found in urban street scenes. Compared with models specifically designed for urban street landscapes, the COCO model demonstrates superior cross-scene adaptability that can effectively handle a variety of visual environments, including those of traditional villages in the Hakka region of Guangdong Province.
In addition, on the basis of the visual images collected during the preliminary field survey, we fine-tuned the COCO model. In accordance with the characteristics of traditional village images, we performed data augmentation on the training set. This process allowed the model to more accurately identify and segment visual information associated with the unique landscape elements of traditional villages, such as traditional Hakka architecture, vegetation, and the sky.
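The text does not specify which augmentations were applied. As a hedged sketch, the NumPy function below applies two common choices for segmentation training: geometric transforms (a random horizontal flip and a random crop), which must be applied identically to the image and its label mask so that the labels stay aligned, and a brightness change, which is applied to the image only. The function name, crop size, and gain range are assumptions for illustration.

```python
import numpy as np

def augment_pair(image, mask, rng, crop_size):
    """Jointly augment an (H, W, 3) uint8 image and its (H, W) label mask.

    Geometric transforms are applied to both arrays; the photometric change is
    applied to the image only, because class labels do not depend on brightness.
    """
    # Random horizontal flip (probability 0.5), identical for image and mask
    if rng.random() < 0.5:
        image = image[:, ::-1]
        mask = mask[:, ::-1]

    # Random crop of size crop_size = (height, width), identical for both arrays
    h, w = mask.shape
    ch, cw = crop_size
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    image = image[top:top + ch, left:left + cw]
    mask = mask[top:top + ch, left:left + cw]

    # Random brightness gain on the image only; the mask is left unchanged
    gain = rng.uniform(0.8, 1.2)
    image = np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return image, mask

rng = np.random.default_rng(0)
image = np.zeros((64, 64, 3), dtype=np.uint8)
mask = np.zeros((64, 64), dtype=np.int64)
mask[:, 40:] = 1          # hypothetical "construction" region on the right
image[mask == 1] = 100    # paint the same region in the image
aug_image, aug_mask = augment_pair(image, mask, rng, crop_size=(48, 48))
print(aug_image.shape, aug_mask.shape)
```

Applying geometric transforms jointly is the key design choice: flipping or cropping only the image would silently misalign every pixel label.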
The images selected for semantic segmentation were chosen to represent their respective landscape types, as these perspectives and scenes effectively capture the core characteristics of each type. For example, the images chosen to represent natural spaces depicted expansive skies, distant mountains, and dense vegetation; the images chosen to represent agricultural spaces showed residents working in vast fields, emphasising the atmosphere of agricultural production; the images chosen to represent living spaces portrayed traditional Hakka dwellings and residents’ lifestyles, reflecting the typical village living environment; and the images chosen to represent folklore spaces captured vibrant scenes from traditional festive activities, highlighting unique Hakka cultural events. This selection ensures the representativeness and comprehensiveness of each type, effectively highlighting the distinct features of each landscape category.
As shown in Table 2, the segmented images reveal that the visual elements in the visual materials collected from traditional villages in the Hakka region of Guangdong fall primarily into four categories: sky, vegetation, construction, and dynamic elements. Notably, the proportions of sky and vegetation are generally greater in natural and agricultural spaces, where expansive blue skies and lush vegetation predominate. These visual elements convey a sense of openness and tranquillity, reflecting the natural environment and ecological vitality of these villages. In contrast, in living and folklore spaces, the proportions of construction and dynamic elements are more prominent, highlighting human activities and cultural practices. These two types of visual elements reflect the interactive nature of village life and the continuation of traditional customs.
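Grouping fine-grained segmentation classes into the four categories reported in Table 2 amounts to summing the pixel shares of the classes in each group. Below is a minimal sketch; the label names and their assignment to categories are hypothetical (in particular, treating people and vehicles as dynamic elements is an assumption), and the input shares are illustrative rather than the study’s values.

```python
# Hypothetical mapping from segmentation labels to the four categories
CATEGORY_OF = {
    "sky": "sky", "clouds": "sky",
    "tree": "vegetation", "grass": "vegetation", "bush": "vegetation",
    "building": "construction", "house": "construction", "wall": "construction",
    "person": "dynamic", "car": "dynamic", "motorcycle": "dynamic",
}
CATEGORIES = ("sky", "vegetation", "construction", "dynamic")

def category_proportions(class_shares):
    """Sum per-class pixel shares into the four categories plus 'other'."""
    totals = dict.fromkeys(CATEGORIES, 0.0)
    totals["other"] = 0.0
    for label, share in class_shares.items():
        # Labels without an assigned category (e.g. road, water) go to 'other'
        totals[CATEGORY_OF.get(label, "other")] += share
    return totals

# Illustrative per-class shares for a single image (not data from Table 2)
example = {"sky": 0.30, "tree": 0.20, "grass": 0.10, "house": 0.25,
           "person": 0.05, "road": 0.10}
print(category_proportions(example))
```

Keeping an explicit “other” bucket makes it visible how much of each image falls outside the four categories, rather than silently dropping those pixels.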
As one moves from a natural space to a folklore space, the proportions of sky and vegetation gradually decrease, whereas the proportions of construction and dynamic elements gradually increase. This shift reflects a transformation in the spatial characteristics of these villages, from an open, natural environment to a more enclosed space marked by human activity, culminating in a folklore scene rich in cultural symbolism. These changes in the proportions of visual elements provide an essential foundation for further analyses of how different visual factors influence the participants’ perceptions and evaluations.
2.2.2. Audio Material Collection
To record audio material, this study captured sounds corresponding to visual scenes in four types of spaces. A binaural recording and playback measurement system (manufacturer: BSWA TECH, model: MicW iDAQ2022, location: Beijing, China) was used to ensure high-quality recordings, with microphones positioned close to the sound sources to minimise interference from other noises. The bit resolution used was 16 bits, and the sampling frequency was 44.1 kHz. Four sets of the most representative audio materials were created after typical sounds from the four types of spaces were selected and edited.
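For readers who want to confirm that edited audio files preserve the recording format stated above (16-bit, 44.1 kHz), the sketch below uses Python's standard `wave` module. The two-channel layout for binaural material is an assumption, and the helper `check_recording_format` is a hypothetical name. The example writes a short silent file in memory instead of using real recordings.

```python
import io
import wave


def check_recording_format(wav_file, bits=16, rate=44100):
    """Return True if a WAV file uses the given bit depth and sampling rate."""
    with wave.open(wav_file, "rb") as w:
        return w.getsampwidth() * 8 == bits and w.getframerate() == rate


# Build a 10 ms stereo (binaural) silent file in memory for illustration.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)       # left and right ear channels
    w.setsampwidth(2)       # 2 bytes per sample = 16 bits
    w.setframerate(44100)   # 44.1 kHz
    w.writeframes(b"\x00\x00" * 2 * 441)

buf.seek(0)
print(check_recording_format(buf))  # True
```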
The characteristics of the four sets of audio materials are as follows. In scene (a), natural sounds, including the sounds of flowing water, wind, and birdsong, were recorded. In scene (b), agricultural sounds, such as the sounds of tilling the land, picking tea, and shouts from residents, were captured. In scene (c), living sounds, including conversations, market calls, and traffic noises generated by residents during their daily activities, were recorded. In scene (d), the sounds characteristic of Hakka folklore were captured, including the sounds of Hakka mountain folk songs, lion and dragon dance performances, and traditional lighting ceremonies. These recordings reflect the rich and diverse soundscapes of traditional villages in the Hakka region.
In the audio-visual interaction experiments of this study, we paired the four recorded sounds with the four sets of visual images, resulting in sixteen audio-visual combinations. Of these, the four matched combinations represent authentic audio-visual scenes from specific locations and were therefore used as the baseline for this study, against which the twelve mismatched (virtual) audio-visual combinations were compared.
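The pairing scheme is a full 4 × 4 crossing of visual space type and sound type. The matched baseline combinations lie on the diagonal, where the image and the sound come from the same space type. A minimal sketch follows; the short space labels are shorthand chosen for illustration.

```python
from itertools import product

SPACES = ["natural", "agricultural", "living", "folklore"]


def audio_visual_combinations():
    """Pair every visual space with every sound type (4 x 4 = 16 pairs)."""
    combos = []
    for visual, sound in product(SPACES, SPACES):
        # A pair is 'matched' when the image and the sound come from the same space type.
        combos.append({"visual": visual, "sound": sound, "matched": visual == sound})
    return combos


combos = audio_visual_combinations()
matched = [c for c in combos if c["matched"]]
print(len(combos), len(matched))  # 16 4
```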
2.3. Survey Questionnaire Design
In designing the audio-visual interaction evaluation scale, we focused on the interaction effects of audio-visual factors, drawing on the preliminary field survey results and the relevant literature [15,52,53,54,55]. Additionally, we referred to the data collection and reporting requirements for soundscapes outlined in the ISO 12913-2 standard [20], as well as related psychological studies [56,57,58], to adapt the standardised questionnaire to the research requirements and the characteristics of the villages. After a selection process, three evaluation indicators were retained in the final questionnaire, namely, “visual landscape satisfaction”, “acoustic comfort”, and “audio-visual harmony”. We subsequently validated these indicators through focus group discussions, pilot experiments, and expert consultations. The results indicated that these three indicators reliably assess the participants’ perceptions and evaluations of audio-visual interactions in traditional villages in the Hakka region of Guangdong.
The survey questionnaire used in this study applied the semantic differential method, which measures participants’ psychological responses through verbal scales [59]. The participants rated each item on a scale anchored by a pair of opposite descriptive terms, with several graded levels in between (a Likert scale). Such scales typically have five or seven points, and the ratings help clarify the meaning and intensity of participants’ perceptions of objects or concepts. In the audio-visual interaction experiments conducted herein, a 7-point Likert scale was used to ensure that the participants could clearly express their feelings. Each participant was asked to choose a value from one to seven on the basis of their actual feelings in each scene, where one point indicated that visual landscape satisfaction was extremely low (acoustic comfort was extremely uncomfortable, or audio-visual harmony was extremely poor) and seven points indicated that visual landscape satisfaction was extremely high (acoustic comfort was extremely comfortable, or audio-visual harmony was excellent).
2.4. Experimental Process
The audio-visual interaction experiment was conducted in an audio-visual laboratory measuring 7.0 m in length, 4.8 m in width, and 4.5 m in height. The experimental setup was divided into two parts, namely, the laboratory area and the viewing area, as shown in Figure 6. The visual materials were displayed on an 80-inch screen in front of the participants. Each participant wore wired headphones (manufacturer: Sennheiser, model: MOMENTUM 4, location: Wedemark, Germany) while seated in the participant seat, 2 m from the display screen, as depicted in Figure 7.
Careful attention was given to the setup for audio playback through wired headphones. A combination of Artemis SUITE 9.0 software, an ear simulator (manufacturer: Hangzhou Aihua, model: AWA6160, location: Hangzhou, China), and a bass compensation system (a subwoofer, manufacturer: Genelec, model: 7370A, location: Iisalmi, Finland) was used to calibrate the sound pressure level and frequency response. The frequency response calibration followed the guidelines of IEC 60268-4 [60] and IEC 60268-7 [61]. The sound pressure level of the audio playback was adjusted to approximately 55 dB(A). Bass compensation was used during playback to address the limited low-frequency response of closed headphones; the bass compensation system used a low-pass filter with a cut-off frequency of 125 Hz [62]. The wired headphones and the wired subwoofer played the audio materials simultaneously. The audio-visual laboratory was kept quiet throughout the experiment, with background noise below 30 dB(A). The laboratory’s noise floor was measured with a sound level meter (manufacturer: Brüel & Kjær, model: BK2240, location: Nærum, Denmark). The specific details of all the instruments used in this study are shown in Table 3.
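The filter order and topology of the bass compensation are not reported; only the 125 Hz cut-off is given. As an illustration only, the sketch below assumes a second-order Butterworth-type low-pass (Q = 1/√2) designed with the widely used Audio EQ Cookbook (RBJ) biquad formulas at 44.1 kHz. It evaluates the magnitude response, which by construction is -3 dB at the cut-off. The function names are hypothetical, and the subwoofer's actual crossover may differ.

```python
import cmath
import math


def lowpass_biquad(cutoff_hz, fs_hz, q=1 / math.sqrt(2)):
    """Second-order low-pass coefficients (Audio EQ Cookbook form), normalised so a[0] == 1."""
    w0 = 2 * math.pi * cutoff_hz / fs_hz
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 - cos_w0) / 2 / a0, (1 - cos_w0) / a0, (1 - cos_w0) / 2 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def magnitude(b, a, freq_hz, fs_hz):
    """Magnitude of the biquad's frequency response at freq_hz."""
    z_inv = cmath.exp(-1j * 2 * math.pi * freq_hz / fs_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(num / den)


b, a = lowpass_biquad(125, 44100)
for f in (20, 125, 1000):
    # Gain in dB: near 0 dB well below the cut-off, about -3 dB at it, strongly attenuated above.
    print(f, round(20 * math.log10(magnitude(b, a, f, 44100)), 1))
```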
Before the experiments, we communicated in detail with potential participants. Participants were selected on the basis of their familiarity with Hakka culture to ensure that they had a certain level of cultural awareness. Additionally, none of the participants had previously taken part in similar audio-visual interaction experiments, which helped reduce potential biases arising from prior experience. The selection also considered the distribution of demographic factors to ensure the representativeness of the sample, covering a diverse range of characteristics such as gender, age, educational background, ethnic background, and social status. The statistical results of the participants’ personal information are shown in Table 4. All potential participants underwent basic vision and hearing tests to ensure that they could perceive the materials normally and effectively express their perceptions. Vision testing was carried out with reference to ISO 18490 [63], and hearing testing was carried out with reference to ISO 8253-3 [64]; each participant’s vision and hearing were within normal ranges.
The experimental process is shown in Figure 8. To minimise the effect of fatigue on the accuracy of the assessments, we spread the experiment over four days, and the participants could extend this period if they felt unable to participate continuously. From the day before the experiment until its completion, the participants were instructed to refrain from vigorous physical activity, alcohol consumption, and other stimulating behaviours, and to ensure that they had sufficient sleep. Before starting, they were given time to acclimate to the audio-visual laboratory, including a brief training session to familiarise them with the experimental process and the evaluation scale, and they could further familiarise themselves with the setup during the 5 min preceding the experiment. Additionally, we shared basic information about Hakka culture and traditional villages in the Hakka region of Guangdong to enhance the participants’ contextual understanding. The research team also ensured that the visual materials were adequately calibrated and that the audio materials were adjusted for accurate playback to maintain consistency throughout the experiment.
Because this study used a repeated-measures design, we randomised the order in which the audio-visual materials were presented to each participant to control for order effects. At the beginning of the experiment, the four sets of purely visual materials were presented in random order, with each set displayed for 1 min (the four images within the same space category were shown consecutively). After each set, the participants were given 2 min to rate their visual landscape satisfaction with the images they had just seen, followed by a 2 min interval to minimise interference from the previously shown images. This process continued until all four sets of purely visual materials had been evaluated. The participants then took a 30 min break, after which the experiment resumed with the four sets of purely audio materials presented in random order. Each audio set was played for 1 min while a black screen was shown. After each audio set, the participants had 2 min to rate the acoustic comfort of the audio they had just heard, followed by a 2 min interval to minimise interference from the previous audio set. This process continued until all four sets of purely audio materials had been evaluated. After a one-day break, the participants were presented with all 16 audio-visual combinations in random order, divided into three sets (6, 6, and 4 combinations, with a one-day break between consecutive sets). Each audio-visual combination was played for 1 min, followed by a 2 min evaluation period during which the participants rated the visual landscape satisfaction, acoustic comfort, and audio-visual harmony of the combination they had just viewed and heard. After each evaluation, a 2 min interval was provided before the next combination was played. This process continued until all 16 audio-visual combinations had been evaluated, at which point the experiment concluded.
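The randomisation procedure is not specified beyond "random". The sketch below therefore assumes simple per-participant shuffling with a reproducible seed and produces one participant's presentation order. It orders the four visual-only sets and the four audio-only sets, then shuffles all 16 audio-visual pairs and splits them into sessions of 6, 6, and 4. `participant_schedule` is a hypothetical helper, and counterbalancing schemes such as Latin squares would be an alternative.

```python
import random
from itertools import product

SPACES = ["natural", "agricultural", "living", "folklore"]


def participant_schedule(seed):
    """Build one participant's randomised presentation order (a sketch)."""
    rng = random.Random(seed)                # per-participant seed for reproducibility
    visual_block = rng.sample(SPACES, k=4)   # purely visual sets in random order
    audio_block = rng.sample(SPACES, k=4)    # purely audio sets in random order
    combos = list(product(SPACES, SPACES))   # 16 (visual, sound) pairs
    rng.shuffle(combos)
    # Split the shuffled combinations into the three sessions of 6, 6 and 4.
    sessions = [combos[0:6], combos[6:12], combos[12:16]]
    return {"visual": visual_block, "audio": audio_block, "audio_visual": sessions}


schedule = participant_schedule(seed=7)
print(schedule["visual"], [len(s) for s in schedule["audio_visual"]])
```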
In selecting the exposure time for the audio-visual materials, we referred to other audio-visual interaction experiments, most of which used exposure times ranging from 6 s to 60 s; typically, static images are shown for less than 30 s, whereas videos are played for 30–60 s [43,65]. Furthermore, the participants in the pilot experiments generally felt that a 60 s exposure time was more conducive to accurate perception. On this basis, we set the exposure time for each audio-visual material to 60 s, followed by a 2 min evaluation period and a 2 min interval intended to minimise temporary memory interference from the previously presented material. This design aimed to ensure the rigour and accuracy of the experiment while improving the reliability of the participants’ evaluations.
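Given the 60 s exposure, 2 min evaluation, and 2 min interval, each trial occupies 5 min, so the length of each session follows directly. The sketch below counts the trailing interval after the last trial, which gives an upper bound. It excludes the training session and the 5 min familiarisation period, whose placement relative to each session is not fully specified.

```python
EXPOSURE_S, EVALUATION_S, INTERVAL_S = 60, 120, 120


def session_minutes(n_trials):
    """Upper-bound session length: each trial = exposure + evaluation + interval."""
    trial_s = EXPOSURE_S + EVALUATION_S + INTERVAL_S   # 300 s = 5 min per trial
    return n_trials * trial_s / 60


# First day: four visual sets, 30 min break, four audio sets.
day1 = session_minutes(4) + 30 + session_minutes(4)
# Audio-visual sessions of 6, 6 and 4 combinations.
av_sessions = [session_minutes(n) for n in (6, 6, 4)]
print(day1, av_sessions)  # 70.0 [30.0, 30.0, 20.0]
```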
3. Results
3.1. Evaluation Characteristics
3.1.1. Evaluation of Visual Landscape Satisfaction
The participants’ visual landscape satisfaction ratings (mean values and standard deviations) for the different audio-visual materials are presented in Table 5. Values in parentheses are standard deviations, and this convention is used in all subsequent tables.
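Each table cell summarises the 7-point ratings for one material as "mean (SD)". A minimal sketch of that summary follows. It assumes the sample standard deviation (n − 1 denominator), since the paper does not state which form was used, and it uses made-up ratings.

```python
import statistics


def summarise(ratings):
    """Mean and sample SD of 7-point ratings, formatted as 'mean (SD)'."""
    if any(not (1 <= r <= 7) for r in ratings):
        raise ValueError("ratings must lie on the 1-7 scale")
    m = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample SD (n - 1 denominator)
    return f"{m:.2f} ({sd:.2f})"


# Hypothetical ratings from five participants for one material.
print(summarise([6, 5, 7, 6, 4]))  # 5.60 (1.14)
```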
For purely visual materials, we analysed the correlation between the proportions of the four visual elements and the participants’ visual landscape satisfaction ratings. The correlation coefficients and significance levels are shown in Table 6. Visual landscape satisfaction was significantly correlated with the proportions of all four visual elements. Specifically, it was positively correlated with the proportions of sky (r = 0.241, p < 0.01) and vegetation (r = 0.258, p < 0.01). In contrast, it was negatively correlated with the proportions of construction (r = −0.267, p < 0.01) and dynamic elements (r = −0.142, p < 0.05). This finding suggests that larger proportions of sky and vegetation in the visual materials were associated with higher visual landscape satisfaction ratings. As shown in Table 2, the proportion of sky in the visual materials used in this study ranged from 2.79% to 24.54%, whereas the proportion of vegetation ranged from 0.46% to 72.46%. As the proportions of these two visual elements increased, the participants’ visual landscape satisfaction ratings improved accordingly. However, these correlations are weak (|r| ≤ 0.267), and the proportions of the different visual elements are themselves correlated to varying degrees. Thus, the proportions of visual elements alone do not fully explain the participants’ visual landscape satisfaction ratings, and audio factors must also be incorporated to assess the participants’ visual evaluation characteristics comprehensively.
After the addition of the various sounds, the mean visual landscape satisfaction ratings for the different types of visual materials, ranked from highest to lowest, were as follows: natural space (mean = 5.55) > agricultural space (5.38) > folklore space (4.67) > living space (4.61). The natural space, which has the highest proportions of sky and vegetation, offered a pleasant and comfortable visual experience, which likely led to the highest mean visual landscape satisfaction rating. In contrast, the mean ratings for the living and folklore spaces were relatively low, likely because these two spaces lack appealing sky and vegetation, which may have lowered the participants’ visual perceptions and evaluations.
Among the various audio-visual combinations, the combination of natural space and natural sound received the highest mean visual landscape satisfaction rating (5.94), followed by the combination of agricultural space and natural sound (5.76). In contrast, the combinations of living space with folklore sound (4.35) and folklore space with living sound (4.39) received the lowest mean visual landscape satisfaction ratings.
Additionally, the standard deviations of the visual landscape satisfaction ratings ranged from 0.84 to 1.26. For natural spaces (0.84 to 1.25) and agricultural spaces (0.85 to 1.26), the standard deviations varied considerably depending on the accompanying sound. In contrast, the standard deviations for living spaces (1.08 to 1.24) and folklore spaces (1.03 to 1.17) fell within narrower ranges, indicating that the dispersion of ratings for these spaces was more consistent across sound conditions.
3.1.2. Evaluation of Acoustic Comfort
The participants’ acoustic comfort ratings (mean values and standard deviations) for the different audio-visual materials are shown in Table 7. After the various visual materials were added, the mean acoustic comfort ratings for the different sounds, ranked from highest to lowest, were as follows: natural sounds (5.55) > agricultural sounds (4.50) > living sounds (4.17) > folklore sounds (3.92). Natural sound sources, such as wind, flowing water, and birdsong, are typically soft and soothing, which likely contributed to their high acoustic comfort ratings. In contrast, folklore sounds received the lowest acoustic comfort rating. These sounds tend to be monotonous or repetitive and often consist of loud, festive noises that are more likely to be perceived as background noise or disturbances, making it difficult to establish a comfortable acoustic environment.
Among the various audio-visual combinations, the combination of natural space and natural sound received the highest mean acoustic comfort rating (5.93), followed by the combination of agricultural space and natural sound (5.75). In contrast, the combination of living space with folklore sound (3.79) and natural space with folklore sound (3.80) received the lowest mean acoustic comfort ratings.
Additionally, the standard deviations of the acoustic comfort ratings differed notably across materials. The combination of natural space and natural sound (mean = 5.93, SD = 0.98) presented the highest mean rating and the lowest standard deviation. In contrast, the purely folklore sound material (mean = 3.88, SD = 1.85) and the combinations involving folklore sound (mean = 3.92, SD = 1.48) presented relatively large standard deviations, indicating greater variability in the acoustic comfort ratings. This variability may be attributed to perceptual differences among participants from diverse cultural backgrounds.
3.1.3. Evaluation of Audio-Visual Harmony
The participants’ audio-visual harmony ratings (mean values and standard deviations) for the different audio-visual materials are shown in Table 8. Among the various audio-visual combinations, the combination of agricultural space and agricultural sound received the highest mean audio-visual harmony rating (6.09). In contrast, the combination of folklore space and natural sound received the lowest rating (2.56), and the combination of folklore space and agricultural sound (2.59) also scored low. As shown in Table 8, the mean audio-visual harmony ratings for the different visual materials across all sounds, ranked from highest to lowest, were as follows: agricultural spaces (4.47) > natural spaces (4.10) > living spaces (3.97) > folklore spaces (3.50). This suggests that agricultural spaces, which have high proportions of sky (18.78%) and vegetation (66.47%) and smaller proportions of construction (11.86%) and dynamic elements (2.73%), harmonise more readily with the various sounds of traditional villages. Furthermore, the mean audio-visual harmony ratings for the different sounds across all visual materials, ranked from highest to lowest, were as follows: natural sounds (4.44) > agricultural sounds (4.03) > living sounds (3.94) > folklore sounds (3.63). This finding suggests that natural sounds, with their regional and authentic characteristics, integrate more seamlessly into the overall atmosphere of traditional villages across diverse visual landscapes.
Additionally, the standard deviations of the audio-visual harmony ratings ranged from 1.00 to 2.03. Specifically, the combination of agricultural space and agricultural sound (mean = 6.09, SD = 1.00) presented the lowest standard deviation, whereas the combinations involving folklore sounds presented the highest (mean = 3.63, SD = 2.03). As with the acoustic comfort ratings, lower mean audio-visual harmony ratings generally had larger standard deviations. Furthermore, the range of standard deviations for the audio-visual harmony ratings was notably wider than those for the other two evaluation indicators, suggesting greater variability in the participants’ evaluations of audio-visual harmony.
Figure 9 analyses the mean audio-visual harmony ratings for each audio-visual combination from the perspectives of different spaces and sounds. As shown in Figure 9a, natural sounds and agricultural sounds exhibit considerable variability in audio-visual harmony ratings when combined with different types of visual materials (the differences between the maximum and minimum values were 3.2 and 3.5, respectively), indicating that these two sounds produce substantial changes in audio-visual perception across visual landscapes. In contrast, the changes in audio-visual harmony ratings for living sounds and folklore sounds were relatively small (the differences between the maximum and minimum values were 1.91 and 2.88, respectively). Moreover, the audio-visual harmony ratings for natural sounds and folklore sounds followed opposing trends: across visual materials, as the rating for natural sounds increased, the rating for folklore sounds decreased, and vice versa. These trends may reflect differences in how well each soundscape harmonises with visual materials that have different proportions of visual elements. Natural sounds may be more easily harmonised with visual landscapes featuring higher proportions of sky and vegetation.
In contrast, folklore sounds may be better suited to spaces with a rich cultural ambiance and regional characteristics. Thus, combining different sounds with the visual materials of specific spaces may lead to distinct audio-visual harmony trends. Figure 9b further illustrates that natural and agricultural spaces, which have higher proportions of sky and vegetation, show considerable changes in audio-visual harmony when combined with various sounds (the differences between the maximum and minimum values were 3.1 and 3.23, respectively). These spaces appear more adaptable but are also more susceptible to large fluctuations depending on the type of sound. In contrast, audio-visual harmony in living and folklore spaces, which have higher proportions of construction and dynamic elements, varies less (the differences between the maximum and minimum values were 2.05 and 2.98, respectively). Owing to their distinct spatial characteristics, these spaces maintain relatively stable audio-visual harmony, even when paired with different types of sounds.
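The spreads plotted in Figure 9 are the maximum minus the minimum of the 4 × 4 matrix of mean harmony ratings. They are taken down each column for Figure 9a (one sound, four spaces) and across each row for Figure 9b (one space, four sounds). The sketch below illustrates this with a hypothetical matrix, since the full Table 8 is not reproduced here. As a partial check against the reported figures, the two agricultural-sound values given in the text (6.09 and 2.59) differ by 3.5, which matches the stated spread for agricultural sounds.

```python
SPACES = ["natural", "agricultural", "living", "folklore"]


def spreads(matrix):
    """Max - min of mean ratings per visual space (rows) and per sound (columns).

    matrix[i][j] is the mean rating for visual space SPACES[i] paired with sound SPACES[j].
    """
    by_space = {s: max(row) - min(row) for s, row in zip(SPACES, matrix)}
    columns = list(zip(*matrix))
    by_sound = {s: max(col) - min(col) for s, col in zip(SPACES, columns)}
    return by_space, by_sound


# Hypothetical matrix (rows: visual space; columns: sound), not the study's data.
H = [
    [6.0, 5.0, 4.0, 3.0],
    [5.5, 6.0, 4.5, 3.5],
    [4.0, 4.0, 5.0, 4.0],
    [2.5, 2.5, 3.5, 5.0],
]
by_space, by_sound = spreads(H)

# Reported agricultural-sound extremes: agricultural space 6.09, folklore space 2.59.
print(round(6.09 - 2.59, 2))  # 3.5
```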
3.1.4. Correlations Between the Three Types of Evaluation Indicators
We analysed the correlations between the three evaluation indicators for each audio-visual combination. The results showed that all pairwise correlations between the indicators were significantly positive. Among them, the correlation between acoustic comfort and audio-visual harmony was found to be the strongest (r = 0.513, p < 0.01), followed by the correlation between visual landscape satisfaction and acoustic comfort (r = 0.473, p < 0.01). The weakest correlation was found between visual landscape satisfaction and audio-visual harmony (r = 0.390, p < 0.01).
3.2. Influence of Audio Factors on Visual Landscape Satisfaction and Audio-Visual Harmony
Figure 10 analyses the mean visual landscape satisfaction ratings for each audio-visual combination from the perspectives of different spaces and sounds. As shown in Figure 10, compared with the audio-visual combinations, the visual landscape satisfaction ratings for purely visual materials remained relatively stable, falling within a narrow range across most spatial categories without marked fluctuations. This suggests that, in the absence of audio factors, the participants’ visual landscape satisfaction ratings varied relatively little with the characteristics of the visual elements or their proportions.
In contrast, after different sounds were added, the changes in visual landscape satisfaction across spatial types became more pronounced, indicating that audio factors substantially influence visual perception. Moreover, the perception and evaluation of visual landscape satisfaction differed notably depending on the type of sound. Compared with the purely visual condition, the addition of natural sounds noticeably changed the participants’ visual landscape satisfaction ratings, with an average increase of 0.16 points across the four spatial categories. Specifically, in natural, agricultural, and living spaces, the mean visual landscape satisfaction ratings increased by 0.36, 0.32, and 0.32 points, respectively, after natural sounds were added. However, in the folklore space, the mean rating decreased by 0.35 points, indicating a different trend. The average visual landscape satisfaction rating across the four spatial types decreased by 0.11 points when agricultural sounds were added and by 0.24 points when living sounds were added. Folklore sounds had the most pronounced negative effect, decreasing the average rating by 0.35 points. Thus, natural sounds had a positive effect on visual landscape satisfaction, particularly in scenes closely associated with natural space, whereas human-made sounds, such as agricultural, living, and folklore sounds, had a negative effect on overall visual landscape satisfaction. These results suggest that audio factors play a crucial role in moderating visual landscape perception and that their effects vary depending on the type of sound.
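The 0.16-point average for natural sounds is the mean of the four per-space changes reported in this paragraph. The positive changes in three spaces outweigh the decrease in the folklore space. The arithmetic can be checked as follows.

```python
import statistics

# Change in mean visual landscape satisfaction after adding natural sounds,
# relative to the purely visual condition, as reported for each space type.
natural_sound_change = {"natural": 0.36, "agricultural": 0.32, "living": 0.32, "folklore": -0.35}

average_change = statistics.mean(natural_sound_change.values())
print(round(average_change, 2))  # 0.16
```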
This study compared the visual landscape satisfaction scores of matched and mismatched audio-visual combinations via a paired-sample t test to investigate the effects of different audio materials on the visual landscape satisfaction scores of audio-visual combinations. The results are presented in Table 9.
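A paired-sample t test compares two conditions rated by the same participants by testing whether the mean of the per-participant differences departs from zero. A minimal standard-library sketch follows, using hypothetical ratings. `paired_t` is an illustrative helper; `scipy.stats.ttest_rel` computes the same statistic and also returns the p-value.

```python
import math
import statistics


def paired_t(a, b):
    """Paired-sample t statistic for two conditions rated by the same participants.

    Returns (mean difference, t, degrees of freedom); a and b are aligned by participant.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_d, mean_d / se, n - 1


# Hypothetical ratings: matched vs. mismatched sound for the same five participants.
matched = [6, 6, 5, 7, 6]
mismatched = [5, 5, 5, 6, 4]
mean_d, t, df = paired_t(matched, mismatched)
print(round(t, 3))  # 3.162
```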
Among the comparisons showing significant differences, when the visual material consisted of natural spaces, the mean visual landscape satisfaction ratings with agricultural sounds, living sounds, and folklore sounds, and in the purely visual condition, were 0.25 (p < 0.05), 0.5 (p < 0.01), 0.81 (p < 0.001), and 0.36 (p < 0.01) points lower, respectively, than that with the matched natural sounds. When the visual material consisted of agricultural spaces, the mean ratings with living sounds and folklore sounds were 0.29 (p < 0.01) and 0.68 (p < 0.001) points lower, respectively, than that with the matched agricultural sounds. When the visual material consisted of living spaces, the mean rating with folklore sounds was 0.31 (p < 0.01) points lower than that with the matched living sounds, whereas the mean rating with natural sounds was 0.34 (p < 0.01) points higher. When the visual material consisted of folklore spaces, the mean ratings with natural sounds, agricultural sounds, and living sounds were 0.27 (p < 0.05), 0.37 (p < 0.01), and 0.58 (p < 0.001) points lower, respectively, than that with the matched folklore sounds.
The above results suggest that the participants’ evaluation of visual landscape satisfaction was influenced by the degree of matching between visual and audio materials. The extent to which the audio material was harmonised with the visual landscape affected the participants’ visual landscape satisfaction. This effect was particularly pronounced in culturally specific scenes. In contrast, mismatched sounds were more likely to disrupt the participants’ visual perception and evaluation, leading to a decline in visual landscape satisfaction. These findings further highlight the importance of coordination between audio-visual materials in enhancing the visual landscape experience.
Table 8 shows that the mean audio-visual harmony rating for natural sounds combined with their matched natural-space visual materials was, on average, 1.76 points higher than those for natural sounds combined with the other three mismatched visual materials. Similarly, the matched combination of agricultural sounds and agricultural-space visual materials was rated 2.74 points higher on average than the other three mismatched combinations involving agricultural sounds. The matched combination of living sounds and living-space visual materials was rated 1.68 points higher on average than the corresponding mismatched combinations. Likewise, the matched combination of folklore sounds and folklore-space visual materials was rated 2.55 points higher on average than the other three mismatched combinations. These results suggest that sounds should be paired with visual materials whose characteristics align with their audio characteristics, as such matching markedly enhances perceived audio-visual harmony.
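These matched-versus-mismatched differences can be recovered from per-sound means. If c̄ is a sound's mean over all four visual materials and m is its matched rating, the mismatched mean is (4c̄ − m)/3, so the advantage equals 4(m − c̄)/3. The sketch applies this to the agricultural-sound values reported in the text (m = 6.09, c̄ = 4.03). It gives about 2.75, which agrees with the reported 2.74 within the rounding of the published means. `matched_advantage` is an illustrative helper.

```python
def matched_advantage(matched_mean, column_mean, n_visual=4):
    """Matched rating minus the mean of the mismatched ratings for one sound.

    column_mean is the mean over all n_visual pairings of that sound (matched included),
    so the mismatched mean is (n_visual * column_mean - matched_mean) / (n_visual - 1).
    """
    mismatched_mean = (n_visual * column_mean - matched_mean) / (n_visual - 1)
    return matched_mean - mismatched_mean


# Agricultural sounds: matched (agricultural space) = 6.09, mean over all spaces = 4.03.
print(round(matched_advantage(6.09, 4.03), 2))  # 2.75
```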
3.3. Influence of Visual Factors on Acoustic Comfort and Audio-Visual Harmony
Figure 11 illustrates the variation in the mean acoustic comfort ratings for different audio-visual combinations from the perspectives of different spaces and sounds. Among the purely audio materials, natural sounds received the highest mean acoustic comfort rating, whereas the ratings for the other three types of human-made sounds were relatively similar. This suggests that the type of human-made sound had relatively little influence on the participants’ perceptions of the village soundscape and that the participants adapted comparably well to these sounds. In contrast, after different visual materials were added, the changes in acoustic comfort ratings became more pronounced, suggesting that visual factors also influence soundscape perception.
As shown in Table 7, after visual materials were added, the acoustic comfort rating for natural sounds decreased by an average of 0.29 points relative to the purely audio condition. In contrast, the acoustic comfort ratings for agricultural sounds increased by an average of 0.15 points, those for living sounds by 0.42 points, and those for folklore sounds by 0.04 points. The average acoustic comfort rating for the four purely audio materials (mean = 4.46) was similar to that for the 16 audio-visual combinations (4.53). Nevertheless, natural sounds, which had the highest acoustic comfort ratings under purely audio conditions, showed a marked decrease after visual materials were added. Agricultural, living, and folklore sounds, which had lower ratings under purely audio conditions, all showed varying degrees of improvement. These findings suggest that visual factors may act as a moderating influence that alleviates participants’ negative perceptions of noise or other undesirable sound sources, making the overall audio experience more comfortable and thus raising acoustic comfort ratings. The results also highlight the potential role of audio-visual interactions in improving soundscape perception.
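The reported figures are internally consistent. Subtracting each change from the per-sound audio-visual means gives the implied purely audio means. The folklore value, 3.88, matches the purely folklore mean reported earlier. Averaging these implied means gives 4.455 against 4.535 for the 16 combinations, in line with the reported 4.46 and 4.53.

```python
# Mean acoustic comfort per sound across the 16 audio-visual combinations (reported).
av_means = {"natural": 5.55, "agricultural": 4.50, "living": 4.17, "folklore": 3.92}
# Reported change relative to the purely audio condition after visuals were added.
change = {"natural": -0.29, "agricultural": 0.15, "living": 0.42, "folklore": 0.04}

# Purely audio means implied by the reported changes.
audio_only = {s: round(av_means[s] - change[s], 2) for s in av_means}
print(audio_only)  # {'natural': 5.84, 'agricultural': 4.35, 'living': 3.75, 'folklore': 3.88}

mean_audio_only = sum(audio_only.values()) / 4
mean_audio_visual = sum(av_means.values()) / 4
print(mean_audio_only, mean_audio_visual)
```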
This study compared the acoustic comfort scores of matched and mismatched audio-visual combinations via a paired-sample t test to investigate the effects of different visual materials on the acoustic comfort scores of audio-visual combinations. The results are presented in Table 10.
Among the comparisons showing significant differences, when the audio material consisted of natural sounds, the mean acoustic comfort ratings with living-space and folklore-space images were 0.57 (p < 0.001) and 0.79 (p < 0.001) points lower, respectively, than that with the matched natural-space images. When the audio material consisted of agricultural sounds, the mean ratings with natural-space, living-space, and folklore-space images, and in the purely audio condition, were 0.55 (p < 0.001), 1.3 (p < 0.001), 1.13 (p < 0.001), and 0.89 (p < 0.001) points lower, respectively, than that with the matched agricultural-space images. When the audio material consisted of living sounds, the mean ratings with agricultural-space and folklore-space images, and in the purely audio condition, were 0.35 (p < 0.05), 0.51 (p < 0.001), and 0.66 (p < 0.001) points lower, respectively, than that with the matched living-space images. When the audio material consisted of folklore sounds, the mean ratings with natural-space, agricultural-space, and living-space images were 0.45 (p < 0.05), 0.41 (p < 0.05), and 0.46 (p < 0.01) points lower, respectively, than that with the matched folklore-space images.
The above results indicate that the acoustic comfort ratings for audio-visual combinations with matched visual materials were generally higher than those for mismatched audio-visual combinations, particularly in the cases of agricultural sounds and living sounds. This finding suggests that the coordination between visual and audio factors significantly enhances the audio-visual experience.
As shown in Table 8, the mean audio-visual harmony rating for matched audio-visual combinations was significantly higher than that for mismatched combinations. Specifically, overall audio-visual harmony was enhanced when the proportions of the visual elements in an image were consistent with the character of the accompanying sound. In contrast, when the visual and audio materials did not match, the mean audio-visual harmony rating decreased significantly, further emphasising the importance of coordination between visual and audio materials for overall audio-visual perception. Further analysis revealed that the mean audio-visual harmony rating decreased significantly when the proportion of sky in the images decreased from 24.54% to 2.79%, the proportion of vegetation decreased from 72.46% to 0.46%, the proportion of construction increased from 2.15% to 81.04%, or the proportion of dynamic elements increased from 0.85% to 15.71%.
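The contrast between matched and mismatched combinations can be summarised from the harmony table (Table 8) by averaging its diagonal (matched) and off-diagonal (mismatched) cells. The sketch below is an illustrative summary, not an analysis reported by the authors, and it weights every combination equally.

```python
ORDER = ["natural", "agricultural", "living", "folklore"]

# Audio-visual harmony means (Table 8): sound -> rating with each visual
# space, listed in ORDER, so each matched pair sits on the diagonal.
HARMONY = {
    "natural": [5.76, 5.35, 4.08, 2.56],
    "agricultural": [4.30, 6.09, 3.15, 2.59],
    "living": [3.68, 3.58, 5.20, 3.29],
    "folklore": [2.66, 2.86, 3.44, 5.54],
}


def matched_vs_mismatched(table):
    """Return (mean of matched cells, mean of mismatched cells)."""
    matched, mismatched = [], []
    for i, sound in enumerate(ORDER):
        for j, rating in enumerate(table[sound]):
            if i == j:
                matched.append(rating)
            else:
                mismatched.append(rating)
    return sum(matched) / len(matched), sum(mismatched) / len(mismatched)


if __name__ == "__main__":
    m, mm = matched_vs_mismatched(HARMONY)
    print(f"matched: {m:.2f}, mismatched: {mm:.2f}, gap: {m - mm:.2f}")
```

On these means, the four matched combinations average about 5.65 and the twelve mismatched ones about 3.46.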
4. Discussions
Research on audio-visual interactions in traditional villages remains scarce in the academic literature. Therefore, in the following, we compare the findings of this study with those of previous studies, with a focus on discussing the factors that influence the evaluations of soundscapes in traditional villages, as well as the results of audio-visual interaction experiments.
4.1. Comparison of Factors That Influence the Soundscape Evaluation in Traditional Villages
The results of this study indicate that in traditional villages in the Hakka region of Guangdong Province, individuals generally rate natural sounds highly, whereas agricultural, living, and folklore sounds are evaluated less favourably. This finding aligns with those of previous studies. For example, a survey conducted regarding the soundscape of Zhuquan village in Fujian Province revealed that participants preferred natural sounds, particularly those dominated by geophonies and biophonies, such as birdsong and flowing water [40]. Similarly, a study on the soundscape of Azhake village revealed that visitors are more inclined to favour sounds associated with natural spaces, believing that such sounds evoke positive, pleasant, or calm emotions. In contrast, they tend to avoid human-made and living sounds [35].
Furthermore, a survey on the soundscape of Zhaoxing County revealed that respondents favoured regionally distinctive sounds, such as the grand song of the Dong minority, the sounds of reeds, and natural sounds. The acoustic comfort level of natural sounds has been shown to have a parabolic relationship with the equivalent continuous A-weighted sound level, with a threshold of 56 dB(A), suggesting that excessively high natural sound pressure levels could cause auditory discomfort for respondents [36]. A study of the soundscape in a traditional village in Zhejiang Province revealed that natural sounds, as indigenous sound sources, play a dominant role in the village. The respondents reported frequently perceiving natural sounds, which they preferred over human-made sounds, and reported the lowest preference for traffic noise [34]. These studies indicate that natural sounds are central to soundscape experiences in traditional villages and play a significant role in shaping the atmosphere and improving quality of life. The consistent preference for natural sounds, particularly in traditional villages with rich natural environments, highlights the broad importance of such sounds in soundscape experiences. In contrast, human-made sounds, such as those associated with daily life, tend to produce less favourable soundscape experiences, contributing to discomfort and lowering overall soundscape evaluations.
Dynamic soundscapes offer an alternative perspective for soundscape research. Brigitte et al. [66] proposed that soundscapes can be considered dynamic systems in which variations in sound significantly affect individuals’ perceptions. Yan et al. [67] further emphasised that dynamic natural soundscapes not only create more immersive experiences but also help individuals to relax, positively influencing their emotions and perceptions. In traditional villages, owing to the lower levels of traffic noise, the dynamic variation in natural sounds not only enhances the sense of tranquillity but also has a positive effect on overall well-being and emotional regulation [68]. This finding aligns with the results of the present study, particularly regarding the positive influence of natural sounds on emotional well-being and comfort. Natural sounds should therefore be treated as a key element when optimising the soundscape of traditional villages.
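One way to read the parabolic relationship reported in [36] is as a quadratic in the equivalent continuous A-weighted sound level. In the expression below, C denotes acoustic comfort and a, b, and c are placeholder coefficients, since the fitted values are not given here. Treating the 56 dB(A) threshold as the vertex of the curve is an interpretation, not a formula reported by the authors.

```latex
\begin{aligned}
C(L_{\mathrm{Aeq}}) &= a L_{\mathrm{Aeq}}^{2} + b L_{\mathrm{Aeq}} + c, \qquad a < 0,\\
\frac{\mathrm{d}C}{\mathrm{d}L_{\mathrm{Aeq}}} &= 2a L_{\mathrm{Aeq}} + b = 0
\;\Longrightarrow\; L_{\mathrm{Aeq}}^{*} = -\frac{b}{2a} \approx 56\ \mathrm{dB(A)}.
\end{aligned}
```

Because a < 0, comfort rises with sound level up to the vertex and falls beyond it. This is consistent with the suggestion that excessively loud natural sounds cause discomfort.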
4.2. Comparison with the Results of Audio-Visual Interaction Experiments
Given the limited literature on audio-visual interactions in traditional villages, this section compares the present findings with those of audio-visual interaction experiments conducted in other settings.
In terms of the impact of visual factors on audio evaluation, the results of this study suggest that visual factors may play a neutralising role in audio perception, influencing people’s soundscape evaluations. Specifically, the mean acoustic comfort rating for purely natural sounds was relatively high among the purely audio materials, but it decreased when these sounds were combined with visual materials. In contrast, the three types of human-made sounds, which had lower acoustic comfort ratings under purely audio conditions, received higher average ratings when combined with visual materials. This finding is consistent with previous research indicating that, in urban spaces, people’s soundscape preferences are indirectly influenced by their visual landscape preferences: the visual landscape captures attention and diverts focus from harmful sounds, thereby enhancing the perceived preference for the soundscape [69].
Furthermore, in the traditional villages studied here, audio-visual combinations with matched materials received significantly higher acoustic comfort evaluations than mismatched combinations did. This outcome differs somewhat from those of previous studies. For example, Xu et al. [70] reported that in protected areas of China, the acoustic evaluation of audio-visual combinations is significantly higher than that of purely audio materials, regardless of whether the audio-visual materials are matched. This may be due to the diversity and ecological characteristics of the visual and audio environments in protected areas, which lead to higher audio evaluations even when the audio-visual materials do not match. In contrast, in traditional villages in the Hakka region of Guangdong, matched audio-visual combinations appear more likely to create a harmonious village environment, thereby increasing people’s evaluation of acoustic comfort and better reflecting the interaction between the visual landscape and the soundscape.
This study also analysed the impact of audio factors on visual evaluation. The results indicate that different sounds significantly influence visual landscape satisfaction ratings. Specifically, adding natural sounds to the various visual materials increased the average visual landscape satisfaction score by 0.16 points compared with the purely visual materials. However, when agricultural, living, and folklore sounds were added, the average visual landscape satisfaction scores decreased by 0.11, 0.24, and 0.35 points, respectively. This finding is consistent with previous research. Ren et al. [48] noted that people’s evaluation of visual landscapes is significantly influenced by audio factors, with human-made sounds generally receiving lower ratings than natural sounds or music, which often contain higher proportions of natural elements. Owing to their harmonious frequencies, rich timbres, and varied rhythms, natural sounds tend to evoke calming and pleasurable sensations, enhancing positive evaluations of the visual landscape. In contrast, human-made sounds (such as agricultural, living, or folklore sounds) are often either complex or monotonous and are frequently accompanied by disharmonious tones and harsh noises. As a result, these human-made sounds tend to hinder visual evaluations and may even cause psychological discomfort, leading to lower overall perceptions.
Furthermore, regarding the impact of the degree of audio-visual matching on visual evaluation, the results of this study indicate that audio-visual combinations with matched visual and audio materials generally lead to higher visual landscape satisfaction ratings than mismatched combinations. Specifically, in the four matched audio-visual combinations (i.e., natural spaces with natural sounds, agricultural spaces with agricultural sounds, living spaces with living sounds, and folklore spaces with folklore sounds), the mean visual landscape satisfaction ratings were higher by 0.52, 0.26, 0.07, and 0.41 points, respectively, than the average of the three corresponding mismatched combinations (Table 5). This finding is consistent with previous research. In a series of laboratory experiments, Carles et al. [71] demonstrated that sound–image congruence significantly influences people’s preference for visual landscapes.
Conversely, mismatched audio-visual combinations generally lowered visual landscape satisfaction ratings in this study. This may be because, when visual images and sounds do not match, effective audio-visual interactions cannot be established, disrupting overall perception and in turn lowering visual landscape satisfaction. Kang [72] noted that when a visual scene and sounds are unrelated, a positive audio-visual interaction cannot be formed, making it difficult for participants to develop a sense of involvement; this weakens their sense of immersion and comfort and ultimately leads to a decline in overall perception and evaluation. Ren et al. [48] noted that in wetland parks, when landscape elements do not match sounds, participants’ comfort level decreases, indicating that such mismatches significantly affect people’s perception and comfort. Brambilla and Maffei [73] also reported that in quiet rural areas, a decrease in the consistency between the surrounding environment and the sounds heard by participants leads to a decline in satisfaction ratings. The findings of this study are consistent with these results, further emphasising the crucial role of audio-visual coordination in perceptual experience. Mismatched audio-visual combinations not only fail to facilitate good audio-visual interaction but also struggle to evoke a sense of involvement, affecting people’s visual perceptions and lowering visual landscape satisfaction ratings. Therefore, the harmony between the visual landscape and the soundscape plays a critical role in the perception and evaluation of traditional villages.
In conclusion, visual landscapes and soundscapes interact, and matched audio-visual combinations can significantly enhance participants’ perceptual experience and evaluations. In the context of traditional villages, coordination between the visual landscape and the soundscape notably improves visual landscape satisfaction and acoustic comfort ratings, highlighting the importance of audio-visual coordination in shaping the environmental perception of traditional villages in the Hakka region of Guangdong. The results suggest that in traditional villages, vision and hearing are not independent sensory experiences; rather, they constitute a holistic perception formed through the interaction of both senses. The visual landscapes and soundscapes of traditional villages therefore need to be considered as a whole, as their coordination can optimise the audio-visual experience for both residents and visitors and create a healthier and more comfortable environment.
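The four shifts in visual landscape satisfaction follow from the visual landscape satisfaction table (Table 5). For each sound, the mean rating across the four visual spaces is compared with the mean of the purely visual ratings. The sketch below assumes that the transcribed means are correct; the names are illustrative.

```python
# Visual landscape satisfaction means (Table 5): sound -> ratings with the
# natural, agricultural, living, and folklore space images, in that order.
VISUAL = {
    "natural": [5.94, 5.76, 5.00, 4.70],
    "agricultural": [5.69, 5.58, 4.43, 4.60],
    "living": [5.44, 5.29, 4.66, 4.39],
    "folklore": [5.13, 4.90, 4.35, 4.97],
}
# The same four images rated without any sound (purely visual material).
PURELY_VISUAL = [5.58, 5.44, 4.68, 5.05]


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


def sound_effect_on_visual(visual, baseline):
    """Mean satisfaction with each sound minus the purely visual mean.

    Positive values mean the sound raised visual landscape satisfaction."""
    base = mean(baseline)
    return {sound: mean(row) - base for sound, row in visual.items()}


if __name__ == "__main__":
    for sound, delta in sound_effect_on_visual(VISUAL, PURELY_VISUAL).items():
        print(f"{sound:>12}: {delta:+.4f}")
```

The unrounded differences are 0.1625, −0.1125, −0.2425, and −0.35, which are consistent with the rounded values reported above.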
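The matched-versus-mismatched gaps can also be recomputed from Table 5. For each visual space, the matched-sound rating is compared with the mean of the three mismatched-sound ratings, and the purely visual condition is left out. The sketch below is a minimal check that assumes the transcribed means are correct.

```python
ORDER = ["natural", "agricultural", "living", "folklore"]

# Visual landscape satisfaction means (Table 5) grouped by visual space:
# space -> ratings with natural, agricultural, living, and folklore sounds.
BY_SPACE = {
    "natural": [5.94, 5.69, 5.44, 5.13],
    "agricultural": [5.76, 5.58, 5.29, 4.90],
    "living": [5.00, 4.43, 4.66, 4.35],
    "folklore": [4.70, 4.60, 4.39, 4.97],
}


def matched_advantage(by_space):
    """For each space, matched-sound rating minus the mean of the three
    mismatched-sound ratings (the purely visual condition is excluded)."""
    result = {}
    for space, ratings in by_space.items():
        k = ORDER.index(space)  # the matched sound has the same label
        mismatched = [r for i, r in enumerate(ratings) if i != k]
        result[space] = ratings[k] - sum(mismatched) / len(mismatched)
    return result


if __name__ == "__main__":
    for space, gain in matched_advantage(BY_SPACE).items():
        print(f"{space:>12} space: {gain:+.2f}")
```

The unrounded gaps are 0.52, 0.263, 0.067, and 0.407, which round to the 0.52, 0.26, 0.07, and 0.41 reported above.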
5. Conclusions
This study utilised audio-visual interaction experiments and survey questionnaires, with levels of visual landscape satisfaction, acoustic comfort, and audio-visual harmony serving as evaluation indicators, to explore the evaluation characteristics of the visual landscape and soundscape in traditional villages in the Hakka region of Guangdong Province. The study also explored the mechanisms of audio-visual interactions. The key conclusions of the research are as follows:
Evaluation differences across audio-visual materials: Significant differences were observed in the evaluations of visual landscape satisfaction, acoustic comfort, and audio-visual harmony across different audio-visual materials drawn from traditional villages in the Hakka region of Guangdong Province. Among these, the visual images of natural spaces, which had the highest proportions of sky (24.54%) and vegetation (72.46%), combined with natural sounds such as wind, flowing water, and birdsong (sound pressure level of approximately 55 dB), received the highest ratings for visual landscape satisfaction (mean = 5.94) and acoustic comfort (mean = 5.93). Furthermore, significant positive correlations were found among the three evaluation indicators, with correlation coefficients ranging from 0.390 to 0.513.
Impact of audio factors on visual landscape satisfaction: The results indicated that different sounds lead to significant differences in visual landscape evaluation. Specifically, adding natural sounds enhanced individuals’ visual landscape satisfaction, whereas adding human-made sounds (such as agricultural, living, or folklore sounds) tended to lower visual evaluations. Additionally, the impact of the same sound varied when it was combined with different visual materials. People’s visual evaluations depended, to some extent, on the degree of alignment between visual images and sounds: when audio and visual materials were matched, individuals’ visual perceptions improved significantly.
Impact of visual factors on acoustic comfort: The study also revealed that natural sounds, which received higher ratings under purely audio conditions, showed a decrease in acoustic comfort when combined with visual materials. In contrast, human-made sounds, which had lower ratings under purely audio conditions, showed increased acoustic comfort when combined with visual materials. This finding suggests that the visual landscape plays a modulating role in the evaluation of acoustic comfort. Additionally, the effect of the same visual material on audio evaluation varied depending on the sound with which it was paired. Well-matched audio-visual combinations resulted in significantly higher average acoustic comfort ratings than mismatched combinations, mirroring the influence of audio factors on visual evaluation.
Evaluation of audio-visual harmony: Regarding audio-visual harmony, images of agricultural spaces, with high proportions of sky (18.78%) and vegetation (66.47%) and moderate proportions of construction (11.86%) and dynamic elements (2.73%), paired with the matching agricultural sounds, received the highest harmony rating (mean = 6.09) and were generally considered the most harmonious with the village environment. Furthermore, when the proportion of sky in the visual materials ranged from 18.78% to 24.54% and the proportion of vegetation ranged from 66.47% to 72.46%, the audio-visual combinations tended to achieve higher audio-visual harmony ratings.
Thus, in spatial planning and landscape design, landscape architects must first conduct a detailed analysis of the different spaces within traditional villages to identify the spatial characteristics and functional roles of each area. Designers should consider the visual characteristics of these spaces, such as the range of sight and the distribution of visual elements (e.g., sky, vegetation, and construction), to achieve coordination between the visual landscape and the soundscape.
For example, in spaces with Hakka cultural characteristics, such as living or folklore spaces, visual landscapes are often dominated by construction and dynamic elements. In such spaces, architects should introduce sounds that match the Hakka culture, such as Hakka folk songs or the sounds of lion and dragon dance performances. The goal is to enhance the cultural atmosphere of the space, foster emotional connections between individuals and the environment, and improve the sense of immersion and comfort for both residents and visitors within the traditional village setting.
To address noisy environments, designers should tailor their interventions to the environmental characteristics of traditional villages, particularly the distribution of noise sources. Areas with significant noise pollution in villages typically include those near transportation routes, markets, public activity spaces, and densely populated residential areas. Owing to high levels of activity from both residents and visitors, these zones are often subject to noise from traffic, market activities, conversations, and agricultural machinery. To improve the auditory environment, designers can introduce natural sounds, such as birdsong or flowing water, and incorporate landscape features such as green buffers and sound barriers. Furthermore, noise sources and quiet zones should be carefully arranged in spatial planning to minimise noise disturbance. In agricultural areas and historical-cultural sites where quietness is paramount, efforts should be made to reduce mechanical noise and emphasise natural sounds, thereby enhancing the coordination between visual landscapes and soundscapes. It is therefore crucial for local governments and designers to consider audio-visual coordination in the planning of traditional villages and to establish relevant standards or design guidelines.
6. Limitations
Owing to limitations related to time, knowledge, and experience, this study has certain unavoidable constraints. First, the audio-visual interaction experiments were conducted in a controlled virtual laboratory environment. However, there may be discrepancies between the audio-visual perception effects presented in the virtual simulation and those that occur in the actual village environment, particularly concerning dynamic sound variations and spatial perception. As such, the experimental results may not fully replicate the complexity of the real village audio-visual environment. Additionally, the sample size of the participants was relatively small and did not adequately represent a broader range of demographic groups. Moreover, the research was limited to a few representative traditional villages located in the Hakka region of Guangdong Province and thus did not comprehensively cover other types of traditional villages within the broader Hakka region.
Future research should consider employing more advanced methods, such as immersive virtual reality (VR) and physiological response monitoring. Additionally, subsequent studies should aim to increase the sample size and expand the geographical scope of the research by conducting field studies on the soundscapes of traditional villages of varying scales and types. Furthermore, the methodologies used in this study should be applied to traditional villages in other cultural or geographical contexts, thereby enabling comparisons of audio-visual interaction mechanisms across different regions. This process would help provide new approaches and insights for the preservation and modernisation of traditional villages.
Conceptualisation, D.Z. and H.C.; methodology, D.Z. and L.T.; software, D.Z. and H.C.; validation, D.Z., H.C. and X.Z.; formal analysis, D.Z., H.C., X.Z. and L.T.; investigation, D.Z., H.C. and X.Z.; data curation, H.C.; writing—original draft preparation, D.Z. and H.C.; writing—review and editing, D.Z., H.C. and L.T. All authors have read and agreed to the published version of the manuscript.
The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.
The authors would like to thank all the participants who volunteered to take part in the research and the editors for their valuable comments.
The authors declare no conflicts of interest.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 5. Images of the visual landscape of four traditional villages: (a) natural space, (b) agricultural space, (c) living space, and (d) folklore space. The images marked with black dashed lines are the most representative images from each group of visual materials used for semantic segmentation.
Figure 7. Images from the audio-visual laboratory where the participants were exposed to the audio-visual experiment.
Figure 9. Evaluation of audio-visual harmony for various audio-visual combinations in (a) different spaces and (b) different sounds.
Figure 10. Evaluation of visual landscape satisfaction for various audio-visual combinations in (a) different spaces and (b) different sounds.
Figure 11. Evaluation of acoustic comfort for various audio-visual combinations in (a) different spaces and (b) different sounds.
Basic information of the four villages as research sites.
Village | Location | Selection | Established Dynasty | Area | Topography | Notable Features |
---|---|---|---|---|---|---|
Qiaoxiang | Nankou, | 1st | Ming | 1.5 | Hilly area | Recognised as “The Most Typical Traditional Hakka Village in China”. |
Qiaoxi | Yanyang, Meixian, Meizhou | 1st | Ming | 1.8 | Valley basin | Displaying a rich diversity of Hakka dwellings. |
Huxinba | Jiangwei, | 2nd | Ming | 0.3 | Hilly area | Home to numerous Hakka dwellings and recognised as one of the most representative Hakka cultural villages. |
Huangsiyang | Duozhu, Huidong, Huizhou | 2nd | Ming | 0.2 | Hilly area | Exhibiting strong Hakka cultural characteristics, one of the prominent Hakka communities in Guangdong. |
The proportion of visual landscape elements in semantic segmentation images of each space type.
Type | Original | Semantic Segmentation | Sky | Vegetation | Construction | Dynamic |
---|---|---|---|---|---|---|
Natural Space | [Image omitted] | [Image omitted] | 24.54% | 72.46% | 2.15% | 0.85% |
Agricultural Space | [Image omitted] | [Image omitted] | 18.78% | 66.47% | 11.86% | 2.73% |
Living Space | [Image omitted] | [Image omitted] | 8.23% | 11.28% | 75.03% | 5.46% |
Folklore Space | [Image omitted] | [Image omitted] | 2.79% | 0.46% | 81.04% | 15.71% |
Critical specifications of instruments used in this study.
Instrument | Function | Metric | Range | Accuracy | Resolution/ | |
---|---|---|---|---|---|---|
Canon | Camera | Capture high-quality | Image resolution | 24.2 MP | ±1% | |
Lens | Photography. | Focal length | 24 mm | ±2% | f/4.5–f/6.3 | |
BSWA TECH | Record and | Frequency Response | 20 Hz | +0.1/−0.6 | ||
Sennheiser | Play audio materials | Frequency response | 6 Hz | 106 dB SPL | ||
Hangzhou Aihua | Calibrate the sound pressure level and frequency response of the laboratory. | Frequency response | 20 Hz | ±1 dB | ||
Genelec | Play with headphones to address the limitations of closed headphones regarding low-frequency response. | Frequency response | 19 Hz | ±3 dB | ||
Brüel & Kjær BK2240 | Measure the noise floor of the laboratory. | Sound pressure level | 30 dB | 31.6 mV/Pa |
Participant demographic information of the questionnaire database, N = 80.
Variable | Category | Sample Size | Proportion (%) |
---|---|---|---|
Gender | Male | 36 | 45 |
Gender | Female | 44 | 55 |
Age | ≤24 | 48 | 60 |
Age | 25–34 | 24 | 30 |
Age | ≥35 | 8 | 10 |
Educational | High School | 6 | 7.5 |
Educational | University | 38 | 47.5 |
Educational | Postgraduate | 36 | 45 |
Ethnic Background | Hakka | 14 | 17.5 |
Ethnic Background | Non-Hakka | 66 | 82.5 |
Social Status | Exclusively | 48 | 60 |
Social Status | Exclusively | 32 | 40 |
Mean values (standard deviations) of the visual landscape satisfaction ratings for different audio-visual combinations.
Category | Natural Space | Agricultural Space | Living Space | Folklore Space |
---|---|---|---|---|
Natural Sound | 5.94 (±0.99) | 5.76 (±0.95) | 5.00 (±1.17) | 4.70 (±1.17) |
Agricultural Sound | 5.69 (±0.84) | 5.58 (±0.85) | 4.43 (±1.24) | 4.60 (±1.14) |
Living Sound | 5.44 (±1.00) | 5.29 (±0.93) | 4.66 (±1.09) | 4.39 (±1.16) |
Folklore Sound | 5.13 (±1.25) | 4.90 (±1.26) | 4.35 (±1.08) | 4.97 (±1.03) |
Mean Value of Audio-visual Combinations | 5.55 (±1.07) | 5.38 (±1.06) | 4.61 (±1.17) | 4.67 (±1.14) |
Purely Visual Material | 5.58 (±1.24) | 5.44 (±1.04) | 4.68 (±1.19) | 5.05 (±1.08) |
Note: The depth of color shading in the table reflects the magnitude of the mean values, with darker shades indicating higher values and lighter shades indicating lower values. The same applies to the tables below.
Correlation analysis between the visual landscape satisfaction and proportion of each visual element.
Variable | Visual Landscape Satisfaction | Sky | Vegetation | Construction | Dynamic |
---|---|---|---|---|---|
Visual landscape satisfaction | 1 | 0.241 ** | 0.258 ** | −0.267 ** | −0.142 * |
Sky | | 1 | 0.979 ** | −0.975 ** | −0.894 ** |
Vegetation | | | 1 | −0.998 ** | −0.842 ** |
Construction | | | | 1 | 0.810 ** |
Dynamic | | | | | 1 |
Note: * and ** indicate levels of significance, where * represents p < 0.05 and ** represents p < 0.01.
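As a consistency check, the inter-element coefficients in this table can be recomputed from the four per-space proportions in the semantic segmentation table alone. This assumes that those proportions describe the images shown to participants. The sketch below uses a hand-written Pearson correlation; the recomputed values agree with the tabulated ones to within 0.001. This suggests, as an inference rather than a statement from the study, that these coefficients effectively rest on four distinct image compositions, so their high magnitudes mainly reflect how the element proportions co-vary across the four space types.

```python
import math
from itertools import combinations

# Element proportions (%) for the natural, agricultural, living, and
# folklore space images, from the semantic segmentation table.
PROPORTIONS = {
    "sky": [24.54, 18.78, 8.23, 2.79],
    "vegetation": [72.46, 66.47, 11.28, 0.46],
    "construction": [2.15, 11.86, 75.03, 81.04],
    "dynamic": [0.85, 2.73, 5.46, 15.71],
}


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    for a, b in combinations(PROPORTIONS, 2):
        r = pearson(PROPORTIONS[a], PROPORTIONS[b])
        print(f"{a:>12} vs {b:<12} r = {r:+.3f}")
```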
Mean values (standard deviations) of the acoustic comfort ratings for different audio-visual combinations.
Category | Natural Sound | Agricultural Sound | Living Sound | Folklore Sound |
---|---|---|---|---|
Natural Space | 5.93 (±0.98) | 4.69 (±1.33) | 4.29 (±1.35) | 3.80 (±1.24) |
Agricultural Space | 5.75 (±1.03) | 5.24 (±1.16) | 4.06 (±1.32) | 3.84 (±1.41) |
Living Space | 5.36 (±1.17) | 3.94 (±1.30) | 4.41 (±1.20) | 3.79 (±1.40) |
Folklore Space | 5.14 (±1.27) | 4.11 (±1.39) | 3.90 (±1.26) | 4.25 (±1.47) |
Mean Value of Audio-visual Combinations | 5.55 (±1.13) | 4.50 (±1.39) | 4.17 (±1.29) | 3.92 (±1.48) |
Purely Audio Material | 5.84 (±1.14) | 4.35 (±1.30) | 3.75 (±1.28) | 3.88 (±1.85) |
Mean values (standard deviations) of the audio-visual harmony ratings for different audio-visual combinations.
Category | Natural Space | Agricultural Space | Living Space | Folklore Space | Mean Value |
---|---|---|---|---|---|
Natural Sound | 5.76 (±1.29) | 5.35 (±1.31) | 4.08 (±1.55) | 2.56 (±1.58) | 4.44 (±1.90) |
Agricultural Sound | 4.30 (±1.58) | 6.09 (±1.00) | 3.15 (±1.61) | 2.59 (±1.62) | 4.03 (±1.99) |
Living Sound | 3.68 (±1.73) | 3.58 (±1.52) | 5.20 (±1.30) | 3.29 (±1.63) | 3.94 (±1.72) |
Folklore Sound | 2.66 (±1.69) | 2.86 (±1.78) | 3.44 (±1.81) | 5.54 (±1.44) | 3.63 (±2.03) |
Mean Value | 4.10 (±1.93) | 4.47 (±1.93) | 3.97 (±1.76) | 3.50 (±1.98) |
D values, representing the differences in visual landscape satisfaction between matched and mismatched audio-visual combinations with identical visual materials, are presented along with the results of the significance analysis.
Visual Material (D/p) | Natural Sound | Agricultural Sound | Living Sound | Folklore Sound | Purely Visual Material |
---|---|---|---|---|---|
Natural Space | matched | 0.25/0.024 * | 0.50/0.001 ** | 0.81/0.000 *** | 0.36/0.004 ** |
Agricultural Space | −0.18/0.079 | matched | 0.29/0.006 ** | 0.68/0.000 *** | 0.14/0.224 |
Living Space | −0.34/0.004 ** | 0.23/0.135 | matched | 0.31/0.005 ** | −0.02/0.925 |
Folklore Space | 0.27/0.048 * | 0.37/0.004 ** | 0.58/0.000 *** | matched | −0.08/0.602 |
Note: *, **, and *** indicate levels of significance, where * represents p < 0.05, ** represents p < 0.01, and *** represents p < 0.001.
D values, representing the differences in acoustic comfort between matched and mismatched audio-visual combinations with identical audio materials, are presented along with the results of the significance analysis.
Audio Material (D/p) | Natural Space | Agricultural Space | Living Space | Folklore Space | Purely Audio Material |
---|---|---|---|---|---|
Natural Sound | matched | 0.18/0.123 | 0.57/0.000 *** | 0.79/0.000 *** | 0.09/0.517 |
Agricultural Sound | 0.55/0.000 *** | matched | 1.30/0.000 *** | 1.13/0.000 *** | 0.89/0.000 *** |
Living Sound | 0.12/0.444 | 0.35/0.031 * | matched | 0.51/0.000 *** | 0.66/0.000 *** |
Folklore Sound | 0.45/0.014 * | 0.41/0.013 * | 0.46/0.001 ** | matched | 0.37/0.054 |
Note: *, **, and *** indicate levels of significance, where * represents p < 0.05, ** represents p < 0.01, and *** represents p < 0.001.
References
1. Wang, T.; Ma, J.M.; Wang, D.S.; Kemi, A.; Yu, P. Extenics: A new approach for the Design, Reconstruction and Renewal of Traditional Villages. Procedia Comput. Sci.; 2019; 162, pp. 908-915.
2. Li, X.; Wang, Z.; Xia, B.; Chen, S. Testing the Associations between Quality-Based Factors and Their Impacts on Historic Village Tourism. Tour. Manag. Perspect.; 2019; 32, 100573. [DOI: https://dx.doi.org/10.1016/j.tmp.2019.100573]
3. Wu, W.; Wang, J. Gentrification Effects of China’s Urban Village Renewals. Urban Stud.; 2016; 54, pp. 214-229. [DOI: https://dx.doi.org/10.1177/0042098016631905]
4. Fan, L.; Liu, Y.; Zhang, D. Spatial Pattern of Tourism Development of Chinese Traditional Villages and Its Influencing Factors. Econ. Geogr.; 2023; 43, pp. 203-214. (In Chinese)
5. Yuan, S.; Tang, G.; Zhang, H.; Gong, Q.; Yin, X.; Huang, G. Spatial Distribution Pattern of Traditional Villages and Brief Analysis of Han Chinese Subgroup Characteristics in Guangdong. Trop. Geogr.; 2017; 37, pp. 318-327. (In Chinese)
6. Public Map Service of Guangdong Province. Available online: https://guangdong.tianditu.gov.cn/ggdt/#/public/thematic-map/atlas-map/%E5%9C%B0%E5%9B%BE%E9%9B%86/150 (accessed on 13 October 2024).
7. Luo, X.L. An Investigation into the Origins of the Hakka; China Huaqiao Publishing House: Beijing, China, 1989; (In Chinese)
8. Xie, G.; Zhou, Y.; Liu, C. Spatial Distribution Characteristics and Influencing Factors of Hakka Traditional Villages in Fujian, Guangdong, and Jiangxi, China. Sustainability; 2022; 14, 12068. [DOI: https://dx.doi.org/10.3390/su141912068]
9. Erbaugh, M. The secret history of the Hakkas: The Chinese revolution as a Hakka enterprise. China Q.; 1992; 132, pp. 937-968. [DOI: https://dx.doi.org/10.1017/S0305741000045495]
10. Liu, P. On Construction and Utilization of Chinese Traditional Settlements Landscape’s Genetic Map. Ph.D. Thesis; Beijing University: Beijing, China, 2011.
11. Zhou, L.; Luo, D. The Hakka Settlements Beside Upper Reach of Hanjiang River: The Research of Traditional Villages in Meizhou. South Archit.; 2016; 1, pp. 24-27.
12. Kawai, H. Traditional Environmental Knowledge in Hakka Weilongwu: From the View of Anthropology of Landscape. Proceedings of the International Symposium on Innovation and Sustainability of Structures in Civil Engineering; Xiamen, China, 18–30 October 2011; Southeast University Press: Nanjing, China, 2011; pp. 332-339.
13. Lowe, K. Heaven and Earth—Sustaining Elements in Hakka Tulou. Sustainability; 2012; 4, pp. 2795-2802. [DOI: https://dx.doi.org/10.3390/su4112795]
14. Katayama, K. Spatial Order and Typology of Hakka Dwellings. Proceedings of the International Symposium on Innovation and Sustainability of Structures in Civil Engineering; Xiamen, China, 18–30 October 2011; Southeast University Press: Nanjing, China, 2011; pp. 323-331.
15. Jeon, J.Y.; Jo, H.I. Effects of Audio-Visual Interactions on Soundscape and Landscape Perception and Their Influence on Satisfaction with the Urban Environment. Build. Environ.; 2020; 169, 106544. [DOI: https://dx.doi.org/10.1016/j.buildenv.2019.106544]
16. Arriaza, M.; Cañas-Ortega, J.F.; Cañas-Madueño, J.A.; Ruiz-Aviles, P. Assessing the Visual Quality of Rural Landscapes. Landsc. Urban Plan.; 2004; 69, pp. 115-125. [DOI: https://dx.doi.org/10.1016/j.landurbplan.2003.10.029]
17. Fry, G.; Tveit, M.S.; Ode, Å.; Velarde, M.D. The Ecology of Visual Landscapes: Exploring the Conceptual Common Ground of Visual and Ecological Landscape Indicators. Ecol. Indic.; 2009; 9, pp. 933-947. [DOI: https://dx.doi.org/10.1016/j.ecolind.2008.11.008]
18. Murray Schafer, R. Five Village Soundscapes; Arc: Vancouver, BC, Canada, 1977; ISBN 9780889850057
19.
20.
21.
22. Echevarria Sanchez, G.M.; Van Renterghem, T.; Sun, K.; De Coensel, B.; Botteldooren, D. Using Virtual Reality for Assessing the Role of Noise in the Audio-Visual Design of an Urban Public Space. Landsc. Urban Plan.; 2017; 167, pp. 98-107. [DOI: https://dx.doi.org/10.1016/j.landurbplan.2017.05.018]
23. Zhang, Y.; Kang, J.; Kang, J. Effects of Soundscape on the Environmental Restoration in Urban Natural Environments. Noise Health; 2017; 19, pp. 65-72. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29192615]
24. Guillermo, R.G.; Miguel, J. Perceptions and Effects of the Acoustic Environment in Quiet Residential Areas. J. Acoust. Soc. Am.; 2017; 141, pp. 2418-2429.
25. Meng, Q.; Luo, P.; Li, Y.R.; Guo, J.C. The Influence of Users’ Behavioral Characteristics on Soundscape in the Waiting Halls of Railway Stations. Adv. Mater. Res.; 2012; 518–523, pp. 3830-3833. [DOI: https://dx.doi.org/10.4028/www.scientific.net/AMR.518-523.3830]
26. Murray Schafer, R. European Sound Diary; Arc: Vancouver, BC, Canada, 1977; ISBN 9780889850040
27. Payne, S.R. The Production of a Perceived Restorativeness Soundscape Scale. Appl. Acoust.; 2013; 74, pp. 255-263. [DOI: https://dx.doi.org/10.1016/j.apacoust.2011.11.005]
28. Daugstad, K. Negotiating Landscape in Rural Tourism. Ann. Tour. Res.; 2008; 35, pp. 402-426. [DOI: https://dx.doi.org/10.1016/j.annals.2007.10.001]
29. Nilsson, M.; Berglund, B. Soundscape Quality in Suburban Green Areas and City Parks. Acta Acust. United Acust.; 2006; 92, pp. 903-911.
30. Deng, Z.; Dong, K.; Bai, D.; Tong, K.; Liu, A. A Case Study on Soundscape Analysis for the Historical and Ethnic Village of Dong Nationality in Zhaoxing County. Acoustics; 2021; 3, pp. 221-234. [DOI: https://dx.doi.org/10.3390/acoustics3020016]
31. Mao, L.; Zhang, X.; Ma, J.; Jia, Y. Cultural Relationship between Rural Soundscape and Space in Hmong Villages in Guizhou. Heliyon; 2022; 8, e11641. [DOI: https://dx.doi.org/10.1016/j.heliyon.2022.e11641] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36439751]
32. Zuo, L.; Zhang, J.; Zhang, R.; Zhang, Y.; Hu, M.; Zhuang, M.; Liu, W. The Transition of Soundscapes in Tourist Destinations from the Perspective of Residents’ Perceptions: A Case Study of the Lugu Lake Scenic Spot, Southwestern China. Sustainability; 2020; 12, 1073. [DOI: https://dx.doi.org/10.3390/su12031073]
33. Qin, Y.; Lei, Y.; Dong, X.; Gao, J. Subjective Evaluation of the Soundscape of Taiyuan Diantou Village and Its Influencing Factors Analysis. Noise Vib. Control; 2024; 44, pp. 209-214. (In Chinese)
34. Liu, Y.; Zhang, Y.; Du, M. Villagers’ and Tourists’ Perception of the Soundscape of She Ethnic Minority Villages from the Perspective of Placeness. Chin. Landsc. Archit.; 2024; 40, pp. 123-128. (In Chinese)
35. Xie, H.; Zhu, Y.; Luo, J.; Tian, Y. Soundscape Features and Conservation Strategies of Hani Traditional Villages Based on the Integrated Four-fold System: A Case Study in Azheke. Chin. Landsc. Archit.; 2024; 40, pp. 116-122. (In Chinese)
36. Wang, X.; Han, F.; Yan, H. Survey of Soundscape in Traditional Villages and Construction of Satisfaction Evaluation Model—Take Zhaoxing Dong Village as an Example. Build. Sci.; 2021; 37, pp. 56-62. (In Chinese)
37. Liu, S.; Wu, L.; Xiang, C.; Dai, W. Revitalizing Rural Landscapes: Applying Cultural Landscape Gene Theory for Sustainable Spatial Planning in Linpu Village. Buildings; 2024; 14, 2396. [DOI: https://dx.doi.org/10.3390/buildings14082396]
38. Duan, Y.; Yan, L.; Lai, Z.; Chen, Q.; Sun, Y.Y.; Zhang, L. The Spatial Form of Traditional Villages in Fuzhou Area of Jiangxi Province Determined via GIS Methods. Front. Earth Sci.; 2022; [DOI: https://dx.doi.org/10.1007/s11707-022-0986-1]
39. Zheng, X.; Wu, J.; Deng, H. Spatial Distribution and Land Use of Traditional Villages in Southwest China. Sustainability; 2021; 13, 6326. [DOI: https://dx.doi.org/10.3390/su13116326]
40. Chen, M.; Yu, P.; Zhang, Y.; Wu, K.; Yang, Y. Acoustic Environment Management in the Countryside: A Case Study of Tourist Sentiment for Rural Soundscapes in China. J. Environ. Plan. Manag.; 2021; 64, pp. 2154-2171. [DOI: https://dx.doi.org/10.1080/09640568.2020.1862768]
41. Chi, F.; Li, G.; Guan, B. Listening Art of Traditional Dwellings in Zhejiang: A Case Study of the “Traditional Village Soundscape” and Protection Policy of Sizhai Village. City Plan. Rev.; 2019; 43, pp. 84-90. (In Chinese)
42. Anderson, L.M.; Mulligan, B.E.; Goodman, L.S.; Regen, H.Z. Effects of sounds on preferences for outdoor settings. Environ. Behav.; 1983; 15, pp. 539-566. [DOI: https://dx.doi.org/10.1177/0013916583155001]
43. Li, H.; Lau, S.K. A Review of Audio-Visual Interaction on Soundscape Assessment in Urban Built Environments. Appl. Acoust.; 2020; 166, 107372. [DOI: https://dx.doi.org/10.1016/j.apacoust.2020.107372]
44. Hunter, M.D.; Eickhoff, S.B.; Pheasant, R.J.; Douglas, M.J.; Watts, G.R.; Farrow, T.F.D.; Hyland, D.; Kang, J.; Wilkinson, I.D.; Horoshenkov, K.V. The State of Tranquility: Subjective Perception Is Shaped by Contextual Modulation of Auditory Connectivity. NeuroImage; 2010; 53, pp. 611-618. [DOI: https://dx.doi.org/10.1016/j.neuroimage.2010.06.053] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20600971]
45. Li, H.N.; Chau, C.K.; Tang, S.K. Can Surrounding Greenery Reduce Noise Annoyance at Home? Sci. Total Environ.; 2010; 408, pp. 4376-4384. [DOI: https://dx.doi.org/10.1016/j.scitotenv.2010.06.025]
46. Van Renterghem, T.; Botteldooren, D. View on Outdoor Vegetation Reduces Noise Annoyance for Dwellers near Busy Roads. Landsc. Urban Plan.; 2016; 148, pp. 203-215. [DOI: https://dx.doi.org/10.1016/j.landurbplan.2015.12.018]
47. Liu, J.; Kang, J.; Behm, H.; Luo, T. Effects of Landscape on Soundscape Perception: Soundwalks in City Parks. Landsc. Urban Plan.; 2014; 123, pp. 30-40. [DOI: https://dx.doi.org/10.1016/j.landurbplan.2013.12.003]
48. Ren, X.; Kang, J. Effects of the Visual Landscape Factors of an Ecological Waterscape on Acoustic Comfort. Appl. Acoust.; 2015; 96, pp. 171-179. [DOI: https://dx.doi.org/10.1016/j.apacoust.2015.03.007]
49. Hsieh, M. Research on the Effect of Subjective Evaluation of Environmental Sound under Different Scenery: The Case of Taiwan. J. Environ. Eng. (Trans. AIJ); 2008; 73, pp. 519-525. [DOI: https://dx.doi.org/10.3130/aije.73.519]
50. Liu, C.; Kang, J.; Xie, H. Effect of Sound on Visual Attention in Large Railway Stations: A Case Study of St. Pancras Railway Station in London. Build. Environ.; 2020; 185, 107177. [DOI: https://dx.doi.org/10.1016/j.buildenv.2020.107177]
51. Preis, A.; Kociński, J.; Hafke-Dys, H.; Wrzosek, M. Audio-Visual Interactions in Environment Assessment. Sci. Total Environ.; 2015; 523, pp. 191-200. [DOI: https://dx.doi.org/10.1016/j.scitotenv.2015.03.128] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25863510]
52. Zhang, Z.; Gao, Y.; Zhou, S.; Zhang, T.; Zhang, W.; Meng, H. Psychological Cognitive Factors Affecting Visual Behavior and Satisfaction Preference for Forest Recreation Space. Forests; 2022; 13, 136. [DOI: https://dx.doi.org/10.3390/f13020136]
53. Chen, X.; Kang, J. Acoustic Comfort in Large Dining Spaces. Appl. Acoust.; 2017; 115, pp. 166-172. [DOI: https://dx.doi.org/10.1016/j.apacoust.2016.08.030]
54. Abe, K. The Effects of Visual Information on the Impression of Environmental Sounds. 1999; Available online: https://www.ingentaconnect.com/contentone/ince/incecp/1999/00001999/00000002/art00020 (accessed on 25 November 2024).
55. Guo, X.; Liu, J.; Chen, Z.; Hong, X.-C. Harmonious Degree of Sound Sources Influencing Visiting Experience in Kulangsu Scenic Area, China. Forests; 2023; 14, 138. [DOI: https://dx.doi.org/10.3390/f14010138]
56. Jeon, J.Y.; Jo, H.I.; Lee, K. Psycho-Physiological Restoration with Audio-Visual Interactions through Virtual Reality Simulations of Soundscape and Landscape Experiences in Urban, Waterfront, and Green Environments. Sustain. Cities Soc.; 2023; 99, 104929. [DOI: https://dx.doi.org/10.1016/j.scs.2023.104929]
57. Deng, R.; Liao, W. The Subjective Assessment of Soundscape in Campus—Taking YaoHu Campus of Jiangxi Normal University for Example. Adv. Mater. Res.; 2012; 518–523, pp. 3792-3795. [DOI: https://dx.doi.org/10.4028/www.scientific.net/AMR.518-523.3792]
58. Raimbault, M.; Lavandier, C.; Bérengier, M. Ambient Sound Assessment of Urban Environments: Field Studies in Two French Cities. Appl. Acoust.; 2003; 64, pp. 1241-1256. [DOI: https://dx.doi.org/10.1016/S0003-682X(03)00061-6]
59. Ma, K.; Wong, H.; Mak, C.M. A Systematic Review of Human Perceptual Dimensions of Sound: Meta-Analysis of Semantic Differential Method Applications to Indoor and Outdoor Sounds. Build. Environ.; 2018; 133, pp. 123-150. [DOI: https://dx.doi.org/10.1016/j.buildenv.2018.02.021]
60.
61.
62.
63.
64.
65. Axelsson, Ö.; Nilsson, M.E.; Berglund, B. A Principal Components Model of Soundscape Perception. J. Acoust. Soc. Am.; 2010; 128, pp. 2836-2846. [DOI: https://dx.doi.org/10.1121/1.3493436]
66. Schulte-Fortkamp, B.; Fiebig, A. The Daily Rhythm of Soundscape. J. Acoust. Soc. Am.; 2006; 120, 3238. [DOI: https://dx.doi.org/10.1121/1.4788252]
67. Yan, R.; Ren, X.; Wang, S.; Bai, X.; Zhang, X. RainMind: Investigating Dynamic Natural Soundscape of Physiological Data to Promote Self-Reflection for Stress Management. Int. J. Hum. Comput. Interact.; 2024; pp. 1-18. [DOI: https://dx.doi.org/10.1080/10447318.2024.2364468]
68. Alías, F.; Socoró, J. Description of Anomalous Noise Events for Reliable Dynamic Traffic Noise Mapping in Real-Life Urban and Suburban Soundscapes. Appl. Sci.; 2017; 7, 146. [DOI: https://dx.doi.org/10.3390/app7020146]
69. Liu, J.; Wang, Y.; Zimmer, C.; Kang, J.; Yu, T. Factors Associated with Soundscape Experiences in Urban Green Spaces: A Case Study in Rostock, Germany. Urban For. Urban Green.; 2019; 37, pp. 135-146. [DOI: https://dx.doi.org/10.1016/j.ufug.2017.11.003]
70. Xu, X.; Wu, H. Audio-Visual Interactions Enhance Soundscape Perception in China’s National Parks. Urban For. Urban Green.; 2021; 61, 127090. [DOI: https://dx.doi.org/10.1016/j.ufug.2021.127090]
71. Carles, J.L.; Barrio, I.L.; de Lucio, J.V. Sound Influence on Landscape Values. Landsc. Urban Plan.; 1999; 43, pp. 191-200. [DOI: https://dx.doi.org/10.1016/S0169-2046(98)00112-1]
72. Kang, J. Urban Sound Environment; Taylor & Francis: London, UK; New York, NY, USA, 2007; ISBN 9780415358576
73. Brambilla, G.; Maffei, L. Responses to Noise in Urban Parks and in Rural Quiet Areas. Acta Acust. United Acust.; 2006; 92, pp. 881-886.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Traditional villages in the Hakka region of Guangdong Province have attracted significant attention for their unique cultural heritage and traditional lifestyles, and their favourable audio-visual environments offer immersive and realistic experiences to both residents and visitors. In this study, we selected four representative villages and used semantic segmentation to extract the core visual elements (sky, vegetation, construction, and dynamic elements) from visual landscape images. Audio-visual interaction experiments and subjective surveys were then conducted to collect the participants’ evaluations of the visual landscape and soundscape and to explore the mechanisms of audio-visual interaction. The results revealed that different audio-visual combinations significantly influenced the participants’ visual landscape satisfaction, acoustic comfort, and audio-visual harmony evaluations. Specifically, visual images of natural spaces with high proportions of sky (24.54%) and vegetation (72.56%), paired with natural sounds such as birdsong, wind, and flowing water at a sound pressure level of approximately 55 dB, received excellent ratings for both visual landscape satisfaction and acoustic comfort. The findings further revealed that coordination between visual and audio materials was crucial for enhancing the participants’ perceptions and assessments, highlighting the importance of audio-visual coordination in creating harmonious environments. These findings inform recommendations for spatial planning, landscape design, and soundscape optimisation in traditional villages.
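The visual-element proportions quoted in the abstract (e.g. sky 24.54%, vegetation 72.56%) are shares of image pixels assigned to each class by semantic segmentation. The abstract does not name the segmentation model or its label set, so the sketch below assumes that a per-pixel label mask has already been produced and that fine-grained labels are grouped into the four core elements. The label names, the `GROUPS` mapping, and the function `element_proportions` are hypothetical illustrations, not the study's pipeline.

```python
from collections import Counter

# Hypothetical grouping of fine-grained segmentation labels into the four
# core visual elements named in the study; the real label set is not given.
GROUPS = {
    "sky": "sky",
    "tree": "vegetation",
    "grass": "vegetation",
    "building": "construction",
    "wall": "construction",
    "road": "construction",
    "person": "dynamic",
    "car": "dynamic",
}


def element_proportions(mask):
    """Return the percentage of pixels belonging to each core visual element.

    `mask` is a 2-D list of per-pixel label strings. Pixels whose label is
    not in GROUPS are counted under "other" so that the shares sum to 100%.
    """
    counts = Counter(GROUPS.get(label, "other") for row in mask for label in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("mask is empty")
    return {group: 100.0 * n / total for group, n in counts.items()}


# Toy 4x4 mask (hypothetical, not an image from the study).
toy_mask = [
    ["sky", "sky", "sky", "sky"],
    ["tree", "tree", "sky", "tree"],
    ["tree", "building", "building", "tree"],
    ["grass", "person", "road", "grass"],
]
for group, share in sorted(element_proportions(toy_mask).items()):
    print(f"{group}: {share:.2f}%")
```

In this toy mask, 5 of 16 pixels are sky (31.25%) and 7 are vegetation (43.75%). With a real segmentation model the same counting would run over the model's output mask, typically as an integer array rather than nested lists.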
Affiliations
1 College of Architecture and Urban Planning, Guangzhou University, Guangzhou 510006, China;
2 School of Design, The Hong Kong Polytechnic University, Hong Kong SAR 100872, China