Immersive technology like Virtual Reality enables the integration of multisensory stimuli in data visualizations, enhancing comprehension and decision-making, especially when the visual channel is overloaded or ambiguous. While most Immersive Analytics (IA) applications rely on visual-centric approaches, multisensory feedback remains rare, and few studies have explored non-visual senses as conventional channels for data representation. Existing work often focuses on sensory substitution rather than leveraging the potential of redundant multisensory codification. This study addresses this gap by investigating the combined use of visual, auditory, and haptic stimuli for encoding data in a simple abstract visualization. We conducted a within-subjects study (n = 52) using a 3D point-cloud visualization to evaluate how redundant codifications (node radius, pitch, and vibration intensity) influence task efficiency, accuracy, and user workload. We also studied behaviour and user experience when presented with the different combinations of stimuli. Our findings show that redundant mappings positively affect performance metrics in situations where the visual modality might be compromised. The user experiences varied, underscoring the need for tailored training and adaptation to artificial sensory codifications such as vibration. While limited to simple visualizations, this study provides insights into designing effective multisensory redundant codifications.
Introduction
The way in which data are conveyed and interacted with profoundly affects their accessibility, their processing, and the actions stemming from their interpretation. While the visual channel has traditionally dominated data representation (Posner et al. 1976; Hogan and Hornecker 2016), advances in immersive technologies offer unique opportunities to expand beyond the visual-centric approach. Virtual Reality (VR) enables multisensory interaction, integrating auditory and haptic signals with visual information to create richer and more engaging experiences (Skarbez et al. 2019; Klein et al. 2022). Research suggests that spatial interaction, immersion, and multisensory presentations can enhance data comprehension and support informed decision-making (Tak and Toet 2013; Marriott et al. 2018). Previous work has shown the potential benefits of multimodality and multisensory feedback in immersive environments, such as increased presence (Fröhlich and Wachsmuth 2013), user satisfaction (Nesbitt and Hoskens 2008), increased spatial awareness (Andreasen et al. 2019), reduced workload (Lecuyer et al. 2013), and improved task performance (Bhardwaj et al. 2021). However, despite its potential, multisensory feedback remains underutilized in immersive data analytics (Skarbez et al. 2019; Ens et al. 2021).
Previous research has primarily focused on translating 2D visualizations into 3D spaces (Korkut and Surer 2023; Fonnet and Prie 2021; Kraus et al. 2022) or using multisensory feedback as mechanisms for sensory substitution (Dubus and Bresin 2013; Yu and Brewster 2002), often targeting accessibility for users with physical impairments (Patnaik et al. 2019).
Few studies have explored how redundant multisensory codifications, where multiple sensory stimuli encode the same data attribute, may enhance performance and user experience in Immersive Analytics (Berger and Bill 2019; Patnaik et al. 2019; Prouzeau et al. 2019; Bhardwaj et al. 2021). By providing overlapping sensory cues, redundant codifications can compensate for limitations in individual modalities, such as when visual information is ambiguous or inaccessible (Nagel et al. 2008). This approach holds potential to deal with the varied preferences, expectations and perceptual abilities of users. However, the design and evaluation of redundant sensory mappings (Fonnet and Prie 2021) remain largely unexplored, especially in immersive environments. Moreover, the integration of multisensory mappings and their design affordances in immersive environments are still open challenges (Marriott et al. 2018; Martin et al. 2022). To address these opportunities and challenges, our study investigates the role of redundant multisensory mappings in enhancing performance and user experience in immersive data visualizations.
We aim to analyse whether adding multisensory stimuli to a visualization could add value in terms of performance metrics (e.g., efficiency, accuracy, workload). We also explore how the different multisensory representations present in a data visualization may impact user behaviour or experience. In particular, we explore the use of redundant multisensory mappings (Fonnet and Prie 2021) to convey information that would typically be represented exclusively through visual properties (shape, colour, size, position, etc.) in a 3D abstract point-cloud visualization. We designed our study to explore the following questions:
How do multisensory mappings influence task performance in immersive analytics? This question aims to investigate the effect of using redundant encodings (such as node radius, pitch, and vibration intensity) on the efficiency and accuracy of task performance in a small abstract visualization.
What is the impact of multisensory data representations on user behaviour and experience in data visualization tasks? This question explores the broader implications of incorporating multiple sensory modalities into immersive data visualizations.
Related work
Our research lies in the intersection of multisensory data representation and Immersive Analytics.
In this section we begin by discussing previous work on perception and multisensory integration in order to understand its implications for interface design. Next, we introduce background concepts related to using the various human sensory capabilities to represent information and elaborate on how previous work has attempted to use non-visual senses to encode data in non-immersive contexts.
Finally, we discuss related work focused specifically on multisensory data representation in immersive data analytics systems.
Cross-sensory perception for multisensory interface design
Interactions with objects in the physical environment are by default multisensory experiences in which different sensory modalities are stimulated in parallel (Martin et al. 2022; Stein 2012). Perception and integration of multisensory stimuli has been reported to be a process that depends on both internal factors (e.g., a person may be expecting a specific stimulus, directing their attention accordingly) and external ones (e.g., a sound catches the attention of a person). However, most research work in perception to date has studied perception only in a unimodal fashion, that is to say, one sense at a time (Moloney et al. 2018).
Among the few cross-modal perception studies in the literature, we find that the combination of visual and sonic stimuli has received the most attention, followed by haptic stimulation combined with the other two (Spence and Ho 2015). A main focus of these perception studies has been to understand the mechanisms that make individuals perceive different multisensory stimuli as belonging together in the same event (Parise et al. 2012). For example, an alarm sound and a blinking light could be perceived as related, even if they are not. Conclusions suggest that for stimuli to be perceived as related, they should be rendered from approximately the same spatial location, and approximately at the same time (Spence and Ho 2015; Spence 2007).
Previous work has also found that there seems to be an attentional cost (often referred to as the modality shifting effect) when people need to switch and perceive stimuli from different sensory channels (auditory, visual and tactile) (Spence and Ho 2015). The cost of switching between tactile and any of the other kinds of stimuli is the largest, while the auditory-visual switching cost is smaller (see Spence et al. 2001). When designing multisensory interfaces, designers should also carefully choose stimuli origin and placement, with evidence suggesting that close stimuli are perceived as more “important” and thus requiring more immediate attention (Rizzolatti et al. 1997). Additionally, specific sensory variables have been found to be consistently associated across different sensory modalities. For instance, larger and darker objects are usually associated with lower pitch and expected to be positioned lower in space (Spence 2011). Although little is known about the origin of crossmodal correspondences (Parise 2016), they have been hypothesized to contribute to understanding the properties of the world surrounding us by means of sensory cue integration (Parise 2016). For instance, all available sensory cues (such as sound, texture, color, etc.) contribute to how people may estimate the size of an object. There is evidence that when different multisensory mappings are chosen carefully to be congruent with each other (i.e., they are perceived as representing the same “dimension”) performance can be increased (e.g., response time reduced, as in Miller (1991)).
Previous work has concluded that the use of multisensory cues is more effective to keep a user’s attention under conditions of high perceptual load (Spence and Ho 2015; Oskarsson et al. 2012). Finally, the affective component (i.e., the emotions that different stimuli can elicit on users) is an aspect which is also assumed to play an important role on user behaviour when exposed to multisensory interfaces. For instance, stimuli can be designed to induce negative emotions or to warn users effectively in critical situations (Ho and Spence 2009; Ho et al. 2013).
In our study, we designed sensory mappings to be congruent with each other, and all related stimuli were generated from a common spatial location with close-to-zero delay. While more research on perceptual aspects is needed, this paper’s contribution is focused on understanding the affordances of using different combinations of sensory mappings that provide cross-sensory stimulation and on how these encodings can shape user behaviour and experience. Additionally, we purposefully focus on the use of redundant mappings, to study whether cross-modal correspondences can foster the reliability of a visualization in conveying information. The idea is that different redundant multisensory mappings (representing the same data attribute) can contribute to interpretation when a single sensory modality fails to deliver the message. By performing a relatively large user study, we attempt to assess the effects of the proposed cross-modal correspondences (larger size, higher pitch, stronger vibration) on task effectiveness.
Multi-sensory data representation
Creating a data visualisation is a pipelined process that involves transforming raw data into a set of real-world representations of the data that can be perceived, processed and decoded by people to extract information (see Jansen (2014)). Multisensory data representation extends the concept of information visualization by leveraging senses beyond sight and their properties to encode data in an attempt to enhance the ability of analysts to generate insight (Hogan and Hornecker 2016).
Previous theoretical work has extensively discussed and categorised the different properties that can be leveraged by visualization designers for data encoding, often called sensory variables (Carpendale 2003; Munzner 2016). Additionally, extensions to these taxonomies have been proposed to account for non-visual sensory variables, including sound variables (Krygier 1994), tactile and temporal (Nesbitt 2001), gustatory (Wang et al. 2016; ’Floyd’ Mueller et al. 2021), and olfactory ones (Patnaik et al. 2019). All of these attempts to characterise senses for data representation in depth aim to simplify the design of sensory mappings for different visualizations. A sensory mapping is the mechanism used to translate from data attributes (e.g., numbers, categories, etc.) to sensory channels (sight, hearing, touch, proprioception, smell and taste) and their respective sensory variables such as color, pitch or texture (Marriott et al. 2018).
Multisensory data representation has received attention especially in the areas of sonification, haptification and physicalization. Here we summarize previous multisensory data representation research efforts that inspired our study.
In sonification, sonic variables such as pitch have been used in most instances to encode values so that different data points can be distinguished (Dubus and Bresin 2013). The sense of touch, conversely, has more often been used to convey information about the structure of “traditional” visualizations rather than to convey data attributes as such (Hornecker et al. 2024). Sonification and haptification have also been combined, with results encouraging multimodality. For instance, audio has been found suitable to represent precise information or for confirmation when applied to “traditional charts” like bar-charts or pie-charts (Yu and Brewster 2002; Wai and Brewster 2002). This type of multimodality has also been reported to foster recall in map visualisations (e.g., Jeong 2005; Jeong and Gluck 2002). In network visualisations, haptics have been successfully used to encode structure information along with audio, with the multimodal condition leading to better recognition of nodes and structures in the graph (e.g., Jay et al. 2008).
Building on this previous research and the highlighted potential, we designed our user study to provide sonic and haptic stimuli in combination with visual information in the context of data analysis applications. We chose pitch and vibration intensity as our non-visual sensory mappings to encode a data attribute (in our case, relevance), focusing on the concept of multisensory information redundancy, which to our knowledge has rarely been discussed in the literature, nor have its effects on user behaviour been evaluated.
Additionally, while many proposed alternatives to the visual modality for representing information have emerged in an attempt to improve accessibility for users with different sensory disabilities (Kramer et al. 2010), our study targets the general population of users.
Multisensory data representation in immersive analytics
Multisensory data representation has received very limited attention in the context of IA applications. Most examples found in the literature leverage exclusively the visual modality (see various IA reviews Ens et al. 2021; Skarbez et al. 2019; Saffo et al. 2023; Fonnet and Prie 2021) and do not make use of other sensory channels and mappings to convey information about a visualization.
Berger and Bill (2019) showcase an immersive visualization application that allows users to explore city noise data through both visual (color) and sonic mappings (volume). Although their goal was to explore the use of sound properties to represent data which is inherently sonic, the system lacks a proper evaluation with users. In Scaptics, Prouzeau et al. (2019) explored the use of haptic feedback in combination with the visual modality to convey redundant information about 3D scatterplot point density, where density was mapped to the vibration strength of VR controllers. Participants of the user study were asked to find low and high density areas in the scatterplots, and results showed that the haptic-visual stimuli combination can improve performance with respect to a condition with only visual stimulation. In a similar fashion, Bhardwaj et al. (2021) leveraged a different kind of touch stimulation (ultrasound haptic feedback) to represent point density in 3D scatterplots as well as in other non-abstract visualizations in the medical domain. Results support the hypothesis of improved performance in cluster identification when the combination of visual and haptic stimuli is available to analysts. Finally, senses other than sight, hearing and touch have also been considered suitable for data codification in IA systems. Patnaik et al. (2019) proposed viScent, a prototype for information olfactation in VR consisting of specialized hardware that emits different scents that could potentially encode information supporting an immersive visualization. However, even though they provide example use cases in the context of abstract visualizations (e.g., 3D network graph), the design of the proposed sensory mappings was arbitrary and the system was not evaluated. Consequently, no design conclusions can be generalised to other multi-sensory systems.
Our study contributes evidence on the efficacy of using non-visual sensory channels to redundantly encode information. In particular, we explore the use of two sensory channels (touch and hearing) and their combinations with visual stimuli as a means to encode the same (numeric) data attribute using a redundant encoding. In other words, the same data attribute can be mapped to either a single sensory mapping (e.g., size) or a combination of multisensory mappings (e.g., size, vibration, and pitch). As suggested by previous work, different sensory modalities may have different affordances, so we designed our user study in an attempt to further the understanding of how different and redundant combinations of sensory modalities may affect users’ performance and behaviour, building on the potential observed in non-immersive setups.
User study
Multisensory systems allow the reproduction of sensory information across different senses, opening up many research opportunities in IA (Martin et al. 2022).
In this study we designed a set of multisensory mappings (namely node radius, pitch and vibration intensity) that encode the same data attribute, and analyze the implications of using these redundant encodings for the performance and behaviour of users while carrying out a task on a point-cloud visualization.
Inspired by Hogan and Hornecker (2016), we set out to determine whether adding more (multisensory) modalities adds value and how the different data representations present in a data visualization may alter the user behaviour or experience. The study investigates how redundant multisensory encodings, such as node radius, pitch, and vibration intensity, impact task performance in immersive analytics (RQ1), particularly in terms of efficiency and accuracy compared to traditional single-sensory approaches. It also examines how incorporating multiple sensory modalities influences user behavior and overall experience when interacting with data visualizations (RQ2).
Materials and experimental setup
The Meta Quest 2 headset was used to present the immersive visualization to participants, with the Quest 2 Touch controllers used for interaction. The experiment session was carried out in a university room with 3 m × 4.8 m of usable space. These dimensions were configured to match the safe boundaries set in the headset, within which participants could (and were encouraged to) move around freely. We made sure there was extra free space around these safe boundaries to prevent users from worrying about hitting the walls during the VR experience. We also explained to them how the headset's visual cues would indicate when they were getting close to or crossing the boundaries.
Participants were asked to wear Antilatency Bracers on both hands, which were used to render haptic feedback during the conditions involving the vibration encoding (SV and SVA, see Sect. 3.4). We used this hardware because it integrated seamlessly with our previous developments and allowed us to provide precise and accurate vibration patterns in the different parts of the body used in the task. Audio was emitted via the Meta Quest 2 built-in speakers, which were set to the maximum volume.
A researcher was present during the whole session, whose role was threefold: (1) providing instructions, (2) configuring the system and monitoring user actions in VR during the session, and (3) supervising data recording. See Appendix A for further details on the experimental setup.
The software used to record the user view and camera action was OBS, and VR casting was made possible by the Meta Developer Hub application. The VR application used for the immersive experience was developed using the Unity game engine (2021.3.26f1). We leveraged the Unity Experiment Framework (Brookes et al. 2020) library to simplify the development of our repeated-measures study. To display post-task questionnaires within VR, we based our implementation on the VRQuestionnaireToolkit (Feick et al. 2020), and a self-developed software tool (Rey et al. 2022) was used to configure the VR application system remotely (e.g., configuring the order in which conditions should be presented to users, changing between phases of the experiment, restarting in case of errors, etc.).
Participants carried out the experiment standing up, and before starting they were reminded about the possibility to move around within the safe boundaries during the whole experiment. We chose real walking as the locomotion scheme in our study for several reasons. First, walking is an embodied locomotion mechanism that has shown potential to enhance the performance of analysts in IA tasks (e.g., Rey et al. 2023), also allowing them to leverage space for situated cognition. This would ultimately allow us to analyze task strategies (similarly to Lisle et al. 2021) and further reflect on the affordances of the different multisensory stimuli. Additionally, the size of our ad-hoc visualization was small enough not to require any other kind of indirect locomotion scheme, which would have required extra learning effort from the participants and would have introduced another confounding factor in the study.
No restrictions on time were imposed on the task but participants were told to perform trials “as accurately and fast as possible, finding a balance between speed and precision”. All participants began the immersive experience from the same point in the virtual environment and had a reference marker visible within VR at all times, to avoid disorientation. Users were instructed to return to the marked area between trials so that they started each trial from the same viewpoint. Participants were recorded performing the tasks in VR and encouraged but not forced to talk during the session, reflecting on what they were doing.
Participants
The recruitment of participants was performed via social networks and on-campus advertising. A total of fifty-two (52) participants took part in the study. However, data from four of them had to be discarded due to data corruption, network connectivity errors or not understanding the task properly (as evidenced by the post-task interview). Thus, all results reported in this paper are based on the forty-eight (48) remaining participants (40% female), with ages ranging from 18 to 63.
Participants did not receive any compensation for participating in the study. The study was approved by the Carlos III University’s ethics committee, and participants were duly informed about all of the data items that were to be recorded during the experiment via a consent form they signed before starting the experiment session. Participants could leave at any point of the experiment session, and we treated data in compliance with the GDPR. A power analysis carried out before the study revealed that the minimum sample size to observe medium-size effects was twenty (20) participants. We used the G*Power software (Faul et al. 2007) to conduct the power analysis (effect size f = 0.25, power = 0.952).
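As a rough, non-authoritative cross-check of this a priori power analysis, the sketch below uses statsmodels' FTestAnovaPower, which assumes a between-subjects one-way ANOVA with four groups and an alpha of 0.05 (the alpha level is not stated above and is assumed here). Because the repeated-measures design used in the study accounts for within-subject correlation, G*Power's computation yields a much smaller required sample than this between-subjects approximation.

```python
# Hypothetical cross-check of the reported power analysis (f = 0.25,
# power = 0.952). FTestAnovaPower models a between-subjects one-way
# ANOVA, so this only approximates the repeated-measures design;
# alpha = 0.05 is an assumption, not stated in the paper.
from statsmodels.stats.power import FTestAnovaPower

n_total = FTestAnovaPower().solve_power(
    effect_size=0.25, alpha=0.05, power=0.952, k_groups=4
)
print(f"Total observations for a between-subjects design: {n_total:.0f}")
```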
Virtual environment and immersive visualization
In our study, participants were immersed in a VR room in which all of the main action of the experiment took place. Within this room-like environment, a point-cloud visualization was presented to the participants repeatedly and they were asked to perform a task on such visualization.
The room-like environment was designed to mimic, in a very simplified fashion, the layout of our lab. A gray floor was used to represent the playable area, within whose boundaries all interactable objects would appear. Additional textured floor beyond the safe boundaries was introduced in order to prevent the feeling of falling off that some participants had reported in pilot studies. The boundaries to the left and right of the playable area were made explicit by low-poly tables, while the area in front of the user was limited by a large pink panel, which was created to show progress information to participants. This information panel displayed the percentage of completion of the whole experiment session, the block number (1–4), and the trial number (1–10). This information was visible to participants at all times to help them feel they were making progress in the context of our repeated-measures experiment (Conrad et al. 2010). Additional panels used for troubleshooting listed the devices connected to the headset and informed the researcher about the status of the internet connection, which was required for monitoring and for retrieving the task logs from the headset as the experiment session came to an end. These panels remained invisible unless the researcher made them visible remotely to tackle runtime issues (Fig. 1).
The visualization used for the study is a point-cloud with eight nodes (polygonal spheres). The layout of the nodes is fixed, with centers at eight predefined positions for all trials and experimental conditions (see Fig. 2a). The proposed visualization is purposefully designed to represent a single data attribute we refer to as relevance. Each node in the visualisation is assigned a value for this attribute in a given trial, which is then represented through one or more sensory mappings (size, pitch and/or vibration intensity) depending on the experimental condition.
Fig. 1
Virtual environment and point-cloud visualization designed for the study. a First-person view of a trial of the experiment. Participants had the point-cloud visualization appear in front of them. b VR environment in isolation, resembling our lab space. The dark gray surface represents the walkable area. c Feedback types available to participants to represent relevance (which ones are present at a given point in time depends on the experimental condition). d Top view of the VR experience. Participants started all trials from the same starting point looking towards the +z direction where, for each trial, a new version of the visualization would appear
Next, we detail the three sensory mappings implemented in our study. We chose three different sensory channels (vision, hearing and touch) to study affordances of different senses regarding data representation and interaction:
The simplest and most conventional sensory mapping we propose uses the visual channel. We encode information through the size of the 3D nodes in the visualization. We resized nodes by scaling each node's base mesh according to a base diameter and the encoded relevance value.
This sensory mapping is immediately accessible to users and no explicit interaction is required. We designed the encoded values so that differences in size between elements were not easy to discriminate, as concluded from a pilot study with multiple size differences. In this scenario, where it is very difficult to visually compare the sizes of the nodes, we were interested in studying whether other sensory channels could complement the visuals to overcome such difficulties.
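The exact resizing formula is not reproduced above; a minimal sketch of one possible linear size mapping is shown below, where the base diameter and scaling range are illustrative values rather than those used in the experiment.

```python
# Hypothetical size mapping: scale a node's diameter linearly with its
# normalized relevance. The constants are illustrative; in the study,
# differences were deliberately kept small so that sizes were hard to
# compare visually.
def node_diameter(relevance: float, base_diameter: float = 0.15,
                  max_extra: float = 0.05) -> float:
    """Map a relevance value in [0, 1] to a node diameter (metres)."""
    relevance = max(0.0, min(1.0, relevance))  # clamp to [0, 1]
    return base_diameter + max_extra * relevance
```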
Then we use sonification to map the same relevance values to pitch. We chose the range of tones from B1 to E7 because pitches outside this range are harder to discriminate (Brown et al. 2002). We chose sub-ranges within this range randomly and assigned pitches from lowest to highest separated by a perfect 4th interval (five semitones apart). The lowest pitch in the obtained random sub-range is always mapped to the largest relevance value, and the rest of the values are then assigned going a 4th up. The sounds with varying pitch used to encode data are sampled sounds corresponding to a grand piano. Using Cubase as a DAW (Digital Audio Workstation), we created a set of MIDI instructions that covered the whole range of notes in the desired interval, with each note having a duration of 2 s. The exported music excerpt (containing consecutive individual two-second samples) was then cut into individual samples (using FFmpeg) and used in the VR application. Auditory feedback was elicited when participants made either of their hands collide with a node; thus, sound was only triggered when the participant performed an action. Two sounds could be played simultaneously by using both hands to reach for two nodes.
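A minimal sketch of this pitch assignment, assuming MIDI note numbers (B1 = 35, E7 = 100) and treating a perfect 4th as five semitones, could look as follows; the function and representation are illustrative, not the study's actual implementation.

```python
# Hypothetical pitch mapping: pick a random starting note within B1-E7,
# then assign ascending perfect 4ths so that the LARGEST relevance value
# receives the LOWEST pitch, as described above.
import random

B1, E7 = 35, 100   # MIDI note numbers bounding the usable range
STEP = 5           # perfect 4th = 5 semitones

def assign_pitches(relevances: list[float]) -> dict[int, int]:
    """Return {node index: MIDI note}, lower notes for higher relevance."""
    span = STEP * (len(relevances) - 1)
    lowest = random.randint(B1, E7 - span)          # random sub-range start
    order = sorted(range(len(relevances)),          # largest relevance first
                   key=lambda i: relevances[i], reverse=True)
    return {node: lowest + STEP * rank for rank, node in enumerate(order)}
```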
Finally, we map relevance to vibration intensity conveyed via the Antilatency Bracer actuators (see Sect. 3.1) on each hand. We linearly mapped the range of relevance values to the 0–1 intensity range of the Antilatency API. Vibration was triggered when users made contact with a sphere and had a fixed duration of 2 s (to match the duration of the sounds). Again, vibration was only triggered when the participant performed an action (touching a node). Two vibrations could be rendered simultaneously by using both hands to reach for two nodes.
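The sketch below illustrates such a linear rescaling and the touch-triggered 2 s pulse; `set_intensity` is a hypothetical stand-in for the actual Antilatency call, whose API is not shown here.

```python
# Hypothetical vibration mapping: relevance values are linearly rescaled
# to a 0-1 intensity range and a fixed 2 s pulse is triggered on touch.
def vibration_intensity(relevance: float, min_rel: float, max_rel: float) -> float:
    """Linearly map a relevance value onto the [0, 1] intensity range."""
    if max_rel == min_rel:
        return 0.0
    return (relevance - min_rel) / (max_rel - min_rel)

def on_node_touched(relevance, min_rel, max_rel, set_intensity):
    intensity = vibration_intensity(relevance, min_rel, max_rel)
    set_intensity(intensity, duration_s=2.0)  # matches the 2 s audio samples
```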
Experimental conditions
Participants were exposed to four experimental conditions following a within-subjects design. The task objective remains the same for all trials, but the sensory mappings available in each block change. We used a Balanced Latin Square design to determine the order of the conditions for each participant, in an attempt to mitigate learning effects (a construction sketch follows the condition list below). There are four (4) blocks, which we counterbalanced to be presented in different orders to participants:
S: the node radius is the only characteristic that represents the differences among relevance values in the visualization.
SA: both node radius and pitch encode the differences among relevance values.
SV: node radius and the vibration intensity of the worn actuators encode the differences among relevance values.
SVA: all the proposed sensory mappings represent the relevance values at the same time in the visualization.
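As a sketch under the assumption of a standard Williams-style construction (the paper does not detail how the square was generated), a balanced Latin square for the four conditions can be built as follows; each of the four row orders is then assigned to participants in rotation.

```python
# Balanced Latin square (Williams design) for the four conditions:
# every condition appears once in each position, and every condition
# precedes every other condition exactly once across the four rows.
CONDITIONS = ["S", "SA", "SV", "SVA"]

def balanced_latin_square(n: int) -> list[list[int]]:
    """First row alternates low/high indices; later rows shift by +1 mod n."""
    first, lo, hi = [], 0, n - 1
    for j in range(n):
        if j % 2 == 0:
            first.append(lo); lo += 1
        else:
            first.append(hi); hi -= 1
    return [[(c + r) % n for c in first] for r in range(n)]

orders = [[CONDITIONS[i] for i in row]
          for row in balanced_latin_square(len(CONDITIONS))]
print(orders)  # participant p follows orders[p % 4]
```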
[See PDF for image]
Fig. 2
Point-cloud visualization details and data-to-sensory-variable transformation. a Placement of nodes in the point-cloud visualization. Positions are relative to the origin (0,0,0), which corresponds to the starting position of the participant (see the orange circle in the playable area). b Sensory mapping examples for two different trials in the experiment combining size, pitch and vibration. The “Raw data values” column represents the values to encode using the different sensory mappings, and the last three columns show the correspondence between raw values and actual mappings across sensory channels
Procedure
An experiment session comprised five phases.
Pre-task questionnaires: participants were asked to fill in an informed consent form and complete a self-reported questionnaire to gather demographic data, prior experience with IA and VR systems and self-reported musical skill. Refer to Table 1 for a comprehensive description of the items of the questionnaire.
Setup phase: the researcher informed each participant about the devices they were going to use and wear, and the basic buttons and actions that would be required to perform the proposed task. After that, the researcher helped the participant put on the VR headset and performed calibration to adjust the inter-pupillary distance so that participants were comfortable. At this point, participants were given the Antilatency actuator accessories so that they could adjust them to their hands before starting the VR experience. Once the calibration was finished and participants felt comfortable wearing all the devices, they were instructed to move to the starting position in the room and the controllers were handed to them to start the pre-task tutorial.
Training phase: once the participant was located at the starting position in the room, the researcher remotely started the experiment VR application on the HMD, and the tutorial began. Participants first learnt to grab elements so that they could move them around the environment. Some items appeared in front of participants and they had to repeat the drag-and-drop action four times over different 3D elements. After that, a different tutorial section was triggered, in which participants were asked to find the 3D object that emitted the lowest-pitched sound when touched (out of three elements that appeared in front of them). Confirmation was made by placing the chosen element in a reserved box to the left of the user. Participants repeated this same task five times, with pitches randomly chosen and reassigned to elements each time. After each confirmation, the elements disappeared and participants had to wait for 2 s until the new assignment of pitches was performed. Users received auditory feedback as soon as the new attempt was ready, and the sound-emitting elements were made visible immediately. The purpose of this second tutorial was to assess the ability of users to distinguish between pitches and to tell which pitch was lower, regardless of their self-reported musical ability. Additionally, this task follows the exact same procedure that participants would be required to follow during the non-tutorial trials.
Task execution phase: once participants were comfortable with the controls (they could attempt the tutorial several times if desired), they were relocated to the defined starting position so that the researcher could remotely trigger the start of the main task phase. Before loading the main VR task, participants were introduced to what they would be asked to perform repeatedly during the session. They were told that a set of nodes would appear in front of them, each representing a value of relevance using a set of multisensory channels (explained before starting each block), and that they would have to find the element/node encoding the largest relevance value and confirm their selection by putting the chosen node into the reserved confirmation slot (see Fig. 3b). Additionally, participants were explicitly asked to perform the task “as accurately and quickly as possible, finding a balance between precision and speed”. Figure 3 illustrates the sequence of tasks the participants were presented with. This phase comprised four different blocks, one per experimental condition, whose presentation was counterbalanced to mitigate learning effects. All blocks had the exact same structure and differed exclusively in the sensory mappings used to represent the relevance. Before each block, a text+audio guide appeared reminding participants what the following block was about, explicitly describing the sensory mappings to be expected in the subsequent trials. Participants had to press a 3D UI button to start the first trial of each block and could ask any questions before beginning the set of trials. A block required a total of 10 trials to be completed, and for each trial the actual values being represented in the visualization were reassigned among the different nodes. After each block, participants were asked to complete the NASA-TLX questionnaire in VR, using raycast+trigger interaction.
Post-task phase: we conducted a semi-structured interview with participants asking about the following topics:
General impressions: participants were asked to reflect on problems, frustration and things that came to them as surprising or worth a mention.
Strategies and approaches to the task in the different experimental conditions
Ranking of conditions according to understandability, meaning the usefulness of the different combinations of sensory mappings in each experimental condition. Participants were asked to justify their answers.
Ranking according to sheer enjoyment. Participants were asked to justify their preferences.
Multisensory proposals by the users for the use case of the experiment. They were encouraged to brainstorm about the types of multisensory feedback (beyond visuals) and the way they could be used to help them in these tasks.
Fig. 3
Experiment procedure and task objectives. a Structure of a full experiment session. Block assignment is counterbalanced and each block represents an experimental condition (S, SA, SV and SVA). Each block has the same structure (instructions, repeated measures and post-task questionnaire). A total of 40 trials (10 per block) are carried out by each participant. b Repeated task description and objective. For each block, participants are required to find the element with the highest relevance value encoded using the given combination of sensory mappings
Measures
We collected quantitative and qualitative data before, during, and after the task execution phase. The pre-task questionnaire gathered demographic data, information about user sensory capabilities, and prior knowledge of technology and information visualization. Table 1 shows the specific items presented to participants in this part of the experiment.
Table 1. Pre-task questionnaire items
Item | Scale/answer type |
|---|---|
Age | |
Gender | Male/Female/Non-binary/I prefer not to tell |
Musical skill | 7-Point Likert scale |
Sensory impairments (if any) | Free text response |
Education level (Last level completed or in progress) | Primary school, Secondary School, Bachillerato, Grado Medio o Superior, University degree, Post-graduate studies, Other (specify) |
Academic background | Free text response |
Professional area | Free text response |
Prior experience with VR | 7-Point Likert scale (Never used VR to I’m very familiar with VR) |
Familiar VR experiences | Multiple choice: Videogames, Professional training, VOIP/Telepresence, Audio or video streaming, Other (specify) |
Familiar IA concepts | Multiple choice: Virtual Reality, Meta Quest, Node-link graph, Joystick, Immersive Data Visualization |
Use of prescription lenses | Y/N |
During the task, we recorded video and audio of the interaction of users while performing the tasks for later analysis. Additionally, we logged quantitative data from the VR application for each trial, specifically completion time, success, number of nodes touched and number of manipulations performed (dragging nodes). After each block, users were required to complete the NASA-TLX questionnaire, whose data was recorded from the VR application via a panel displayed immediately after the last trial of a given block.
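As an illustration of the per-trial quantitative log described above, the sketch below defines a hypothetical record structure; the field names are not the actual column names produced by the logging framework used in the study.

```python
# Hypothetical per-trial log record mirroring the metrics listed above:
# completion time, success, nodes touched and node manipulations.
from dataclasses import dataclass

@dataclass
class TrialLog:
    participant: int
    condition: str          # "S", "SA", "SV" or "SVA"
    block: int              # 1-4, order counterbalanced per participant
    trial: int              # 1-10 within the block
    completion_time_s: float
    success: bool           # whether the correct node was selected
    nodes_touched: int
    manipulations: int      # drag-and-drop actions performed on nodes
```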
After the task phase we conducted the semi-structured interview focused on the topics described in Sect. 3.5. Audio of the interviews was recorded and transcribed for content analysis.
Interaction methods
Participants could perform essentially three actions in the virtual space. In order to change the viewpoint, participants could (1) walk around freely within the safe boundaries defined in VR. Regarding the visualization, participants held the Meta Touch controllers during the whole VR experience. Using these, participants could (2) touch the virtual nodes of the visualization by putting their hands very close to (or even inside) a node. The third allowed action was (3) manipulation: participants could grab elements by approaching them with one hand and pressing both triggers on the controller (i.e., closing the fist on a virtual element). Figure 11 illustrates the actions allowed to users in the developed VR application.
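A minimal sketch of the proximity-based touch action is given below, assuming a simple distance test between the hand (controller) position and a node's centre; the actual application relies on the engine's collision handling rather than this explicit check.

```python
# Hypothetical proximity test for the "touch" action: a node counts as
# touched when the hand position falls within the node's radius.
import math

def is_touching(hand_pos, node_center, node_radius):
    """Return True if the hand is inside (or on) the node's sphere."""
    return math.dist(hand_pos, node_center) <= node_radius

# Example: a hand 5 cm from the centre of a node with a 10 cm radius.
print(is_touching((0.00, 1.20, 0.50), (0.05, 1.20, 0.50), 0.10))  # True
```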
Results
The Shapiro–Wilk method was used to assess the normality of values for each set of measures. A one-way ANOVA was conducted on the quantitative measures recorded during the experiment. For each metric, we performed Tukey’s post hoc analysis to compare groups pairwise in order to detect significant differences between them. Reported effect sizes for the ANOVAs use partial eta squared. Refer to Fig. 12 for a summary of all quantitative results and post hoc pairwise comparison effect sizes (Cohen’s d).
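For illustration, the sketch below reproduces this analysis pipeline (normality check, one-way ANOVA, Tukey post hoc comparisons, eta squared and Cohen's d) on synthetic data; it is a hedged example of the described procedure, not a re-computation of the study's results.

```python
# Illustrative analysis pipeline on synthetic accuracy data for the four
# conditions; numbers are random and do not reproduce the study's values.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
conditions = ["S", "SA", "SV", "SVA"]
scores = {c: rng.normal(loc=0.70 + 0.05 * i, scale=0.10, size=48)
          for i, c in enumerate(conditions)}

# Normality check per condition (Shapiro-Wilk).
for c, x in scores.items():
    print(c, "Shapiro-Wilk p =", round(stats.shapiro(x).pvalue, 3))

# One-way ANOVA across the four conditions.
f_stat, p_val = stats.f_oneway(*scores.values())

# Eta squared for a one-way design: SS_between / SS_total.
all_vals = np.concatenate(list(scores.values()))
grand_mean = all_vals.mean()
ss_between = sum(len(x) * (x.mean() - grand_mean) ** 2 for x in scores.values())
eta_sq = ss_between / ((all_vals - grand_mean) ** 2).sum()

# Tukey HSD post hoc pairwise comparisons.
groups = np.repeat(conditions, 48)
tukey = pairwise_tukeyhsd(all_vals, groups)

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation."""
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

print(f"F = {f_stat:.3f}, p = {p_val:.4f}, eta^2 = {eta_sq:.3f}")
print(tukey)
print("Cohen's d (SA vs S):", round(cohens_d(scores["SA"], scores["S"]), 2))
```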
Task-performance
A one-way ANOVA (F = 24.725) revealed statistically significant differences in accuracy among the different conditions, with a large effect size. Participants in the SA condition were found to be significantly more successful in the task when compared with the rest of the conditions; two of these pairwise comparisons showed large effect sizes, whereas the third showed a small effect size. Further statistical differences were found between the conditions lacking sonification (S and SV) and the condition combining all sensory mappings (SVA); one of these post hoc comparisons showed a medium effect size and the other a large effect size. Figure 4 shows box-plots and confidence intervals for the accuracy/success rate metric ANOVA.
Regarding completion time, the results of a one-way ANOVA (F = 1.689) provide no indication of statistically significant differences across experimental conditions.
With regards to interaction metrics, we performed two comparative analyses. First, we compared the mean number of nodes touched per trial across conditions, but the one-way ANOVA did not reveal statistically significant differences (this comparison is only meaningful for the conditions other than S, since those required touching elements to receive the multisensory feedback).
Secondly, we performed a one-way ANOVA to test for statistical differences in the number of manipulated nodes across experimental conditions (F = 3.823), which revealed a significant effect with a small effect size. Specifically, we found that participants performed many more manipulation interactions in the S condition (only visual encoding present) compared to the conditions that made use of auditory feedback (i.e., SA and SVA); one of these post hoc comparisons showed a small effect size and the other a medium effect size. No significant differences were found between the remaining pairs of conditions. Figure 5 shows the comparison in terms of manipulated nodes for the different conditions.
Fig. 4
Accuracy results across all four experimental conditions. Conditions involving sonification seem to be more effective at improving accuracy. Vibration as the only extra encoding channel (SV) seems to hinder users’ performance (in terms of accuracy). a Success rate box-plot. b Success rate CIs
Fig. 5
Object manipulation results for all four experimental conditions. Participants in the S condition performed more manipulations than in SA and SVA. a Manipulations Box-plot. b Manipulations CIs
NASA workload
A one-way ANOVA revealed no significant differences in terms of NASA-TLX overall scores (F = 1.823). Figure 6 shows mean scores and confidence intervals across the four experimental conditions.
We also compared the mean scores for the individual NASA subscales via one-way ANOVAs. We found statistically significant differences in reported performance (F = 3.144), with a small effect size. Post hoc analysis revealed that participants in the SA condition perceived themselves as more performant than in the SV condition, with a medium effect size (Cohen’s d). Additionally, the results revealed significant differences in the NASA frustration scale (F = 4.13), with a medium effect size. Frustration was found to be higher in the SV condition than in the conditions involving sonification, both with vibration (SVA; F = 3.125) and without it (SA); both pairwise comparisons indicated medium effect sizes (Cohen’s d).
Figure 7 illustrates the statistically significant differences found in NASA subscale scores.
We did not find any further statistically significant differences in the remaining NASA subscale scores, namely: Mental Demand, Physical Demand, Temporal demand, and Effort. Refer to the Appendix for complementary plots on all NASA subscale comparisons.
Fig. 6
NASA overall scores. No significant differences were found although there seems to be a tendency to a lower workload in the conditions involving audio (SA and SVA). a NASA-TLX overall scores box-plot. b NASA-TLX overall confidence intervals
Fig. 7
NASA TLX subscale plots for statistically significant differences found across conditions. a Performance. b Performance CIs. c Frustration. d Frustration CIs
Condition rankings
After experiencing all experimental conditions, participants were asked to rank (first to fourth) them according to understandability (meaning the perceived usefulness of the sensory mappings to the task) and enjoyment (general user experience). Herein, we report the quantitative aspects of the ranking but refer to the discussion to find the qualitative analysis and reflections on the participants’ justification of rankings.
Figure 8 (left) summarizes the preferences of users regarding the sensory mappings that were perceived as most useful to discern among elements in the visualization. Participants ranked SA as the condition in which they felt most certain about their selection (60% ranked it first), followed by SVA in the second ranking spot (52%). Regarding the 3rd and 4th positions, participants reached less consensus. The combination of size and vibration (SV) and the visual feedback alone (S) were chosen as the least useful sensory mappings.
Regarding enjoyment, participants seemed to perceive the sonification aspect as providing value in terms of enjoyment. SA and SVA were most often ranked either first or second, although SA was the one most people (48%) chose as best for enjoyment (vs. 29% for SVA). SV was commonly ranked third (43%), and S was most often ranked as the least enjoyable experience (52%), followed closely by SV (35%). Refer to Fig. 8 (right) for the full distribution of votes across participants.
Fig. 8
Experimental condition ranking results. Understandability (left): participants ranked the different sensory mappings (first to fourth) in terms of their perceived usefulness for the task at hand. Enjoyment (right): participants ranked the sensory mappings according to the extent to which they believed the encodings contributed to their enjoyment in general
Strategies
For each participant, video and audio during the experiment session were recorded. We went through the videos and created annotations for each participant and experimental condition/block. After watching the videos of five (randomly sampled) participants, we came up with a set of items with which to analyze in detail the behaviours in each experimental condition (see Table 2 for details on the analysis template and examples for one participant). A single researcher went through all videos and filled in the template for all blocks and participants.
Fig. 9
Strategy counts for each experiment condition. We report the occurrences of bimanual use, viewpoint relocation, node rearrangement and overlapping. There were a total of forty-eight (48) participants. For bimanual use, node rearrangement and node overlapping we only explicitly report the instances in which such approach was indeed leveraged by participants, omitting the rest
Figure 9 shows a summary of the quantitative results of the video analysis, organized according to the aforementioned topics. The majority of participants made use of both hands to interact with the multisensory experiences (when co-encoding was present: SA, SV and SVA). In the S condition, around half of the participants used only one hand.
Regarding the level of viewpoint relocation (given the three levels defined), it is clear that participants were more likely to remain stationary when auditory feedback was present. More variability in terms of movement in the scene was found in the SV condition, in which people moved in front of the visualization more often. When only size was present as a discriminator among nodes (S), we found a variety of approaches, with around a third of the participants falling into each of the three relocation levels. Participants were found to make use of overlapping in very few instances (<5%) when they had any kind of non-visual feedback available (i.e., in the SA, SV and SVA conditions). Regarding the S condition, about half of the participants (52%) made use of this strategy in an attempt to discern among nodes visually.
Rearrangement of nodes was less common in the conditions making use of sonification (SA and SVA), while SV displayed a 30% use of rearrangement during the task. More than half of the participants (53.1%) in the S condition were found to rearrange nodes as a strategy.
We performed video analysis and combined it with during- and post-task feedback to delve into the strategies followed by users and their justifications. Most users (42%) were found to approach overlapping in a very specific way: they used one hand to grab a candidate node and then overlapped that node with the remaining nodes one at a time, to see whether one exceeded the boundaries of the other. Another reported alternative was to grab two candidate nodes at once and overlap them (10%).
Participants who rearranged the nodes in the visualization most often chose to place them in the same plane, vertically or horizontally aligned (46%). Once they had the nodes organized, a common approach was to iteratively rearrange them in ascending or descending order of perceived relevance. In contrast, some participants reported using space to create clusters that had specific meaning to them. Sixteen (16) participants created their own area of potential candidates. A different approach was clustering all rejected nodes together (6), or having both types of clusters (candidates and rejected) defined in the environment (4).
Discussion
In this section, we discuss implications of the obtained results with respect to the posed Research Questions (see Sect. 3) and beyond.
Multisensory mappings and task performance
Overall, results suggest that the combination of size and pitch as mappings for values can help users discern elements in the proposed visualisation, increasing their accuracy in the task. The sonic mapping with the proposed tonal distances (fourths) seems to be effective in compensating for difficulties in visual comparisons, and was found to work best when no haptic feedback was present (SA condition). This aligns with previous work suggesting that audio is more suitable than vibration to encode precise information (Paneels and Roberts 2010). Participants not only performed better quantitatively in the task when audio was present, but also perceived the combination of sound and size as the most useful redundant encoding for the task at hand (see Sect. 4.3). They repeatedly (44%) reported that they felt more certain about their selection when this kind of stimulation was present (e.g., “I am almost sure that when there was sound, I was choosing the most relevant node” said participant 18). This increase in performance metrics linked to auditory stimulation is also supported by the differences in perceived NASA performance scores, which were higher in the conditions involving audio (SA and SVA). Additionally, around 48% of participants found themselves using sound as the main indicator of relevance (when available). For instance, participant 44 stated that “even if all nodes would have been exactly the same size, sound would be more than enough for me to perform the task with confidence”.
Regarding our proposed vibration sensory mapping, the performance results are not very encouraging. Accuracy was found to be lower than with the baseline encoding (S), meaning that the addition of vibration to encode information as proposed seems to have a detrimental effect on this aspect of task performance. Self-reported performance was also found to be the lowest in the SV condition, which suggests that participants were aware of their decrease in accuracy or general performance.
In the semi-structured interview, vibration was reported as non-intuitive, and six (6; 12.5%) participants indicated that they could not really discriminate between vibration levels. The differences in reported frustration (NASA-TLX) for the SV condition also support this point, indicating that vibration is in fact hindering the performance of users, also when presented alongside audio (SVA). Seven (7; 15%) participants reported feeling slower when carrying out the task, as if the feedback type was limiting their speed (e.g., participant 5 said “I felt like vibration made me much slower, I had to touch nodes individually and sequentially because I was not able to touch two nodes simultaneously and make sense of the difference”). Eleven (11; 23%) participants also reported perceiving vibration differently in the left and right hand, which surely introduced uncertainty when conducting the task. For instance, participant 8 stated: “I felt that for the same node, I would get more vibration touching it with my left hand rather than my right hand. What was I supposed to do then? Which hand should I trust?”. Such a problem could be due to hardware rendering differences (even though we used allegedly identical devices for both hands) or perceptual differences in users, which may be worth exploring in the future. Additionally, haptic (and also auditory) feedback has previously been observed to have negative impacts on performance in situations where the combined stimuli were perceived as semantically incongruent (Wenzel and Godfroy-Cooper 2021). Since vibration has typically been used to direct the attention of users (Jones and Sarter 2008) in everyday contexts (e.g., phone notifications, alarms), participants may have struggled to interpret the vibrotactile codification introduced in our study, as it conflicted with their established associations of vibration. For instance, P32 reported that vibration elicited a “sense of urgency that felt distracting”. This interpretation is consistent with recent reviews of haptics in VR, which emphasize that vibrotactile cues are often perceived as notifications or alerts due to their prevalent everyday use (Jacucci et al. 2024). While vibrotactile actuators are widely adopted because of their simplicity, they offer only a restricted subset of haptic sensations compared to richer modalities, limiting their ability to convey nuanced meanings. These perceptual constraints, combined with the semantic expectations of vibration, may explain the reduced effectiveness of vibrotactile encoding in our study. Different approaches to haptic encoding may prove more effective in conveying information than the one proposed here and could lead to different performance outcomes, as shown by Prouzeau et al. (2019).
Despite the negative points listed above, there were three (3) participants who, in contrast, reported finding the vibration encoding very useful and reinforcing of their selection. These participants had in common a very low self-reported musical ability: “Vibration was for sure helping me, I could feel these slight differences that with sound...I don’t have a musical ear, that’s for sure” said participant 9.
Regarding the combination of all three sensory mappings (SVA), we can see that adding sensory modalities did not lead to an additive increase in accuracy nor alter any other performance metric. This aligns with previous work (Fröhlich and Wachsmuth 2013) which questions the value of simply adding modalities, as doing so does not necessarily provide benefits (e.g., in presence). We hypothesize that in our case the limitations found in the SV condition play an important role in this non-additive effect, but other confounding factors were identified. In this condition, eight (8; 16%) participants reported feeling overstimulated, unable to focus on both sound and vibration: “I thought the task would be easier with all the different stimuli combined but in the end, I felt like vibration specifically just making it hard for me to focus in general” said participant 39. Others felt like they wanted to focus on a specific type of stimuli (e.g., “Vibration seemed useful to me, but when sound was also present I could not pay attention to it” said participant 16), and in fact, fourteen (14; 29%) participants reported having completely ignored vibration in the SVA condition, mostly because they decided to shift their attention to audio, which they found more reliable to discriminate between nodes. A number of participants (7; 14.5%) also reported that overstimulation led them to feel that the different data representations (sensory mappings) were not aligned (e.g., a node which appeared to be the largest and had the lowest pitch was perceived as vibrating less than others). Comments from participant 3 illustrate this problem: “[...] I found a very big node that was not emitting a sound and vibration that I expected. In the end I felt like I could not even differentiate nodes anymore. I got very frustrated”. These events seem to be related to cognitive overload and sensory crosstalk (Marriott et al. 2018); however, adequately characterizing their origin is beyond the scope of this paper.
Observations of a decrease in performance and feelings of being overwhelmed due to “simply adding” more sensory modalities were also reported by Melo et al. (2022). Participants in their study found themselves distracted and overwhelmed the more multisensory stimuli were available. However, contrary to them, we did not observe a significant increase in workload with the addition of stimuli (see Fig. 6), which could have helped explain the origin of this feeling of overstimulation. Additionally, the vibration rendering device and its placement (on the back of the hand) were mentioned by participants as not contributing to perceiving the stimuli, which highlights the need to carefully choose the area of stimulation depending on the haptic properties leveraged (Wenzel and Godfroy-Cooper 2021). Beyond the inability of some participants to differentiate among vibration intensity levels, the materials we used to render the haptic stimulation could have played a role in the decrease in task performance (as also discussed by Våpenstad et al. (2013)). Additionally, it is worth considering that the level of familiarity with the different multisensory stimuli prior to the task may also be a confounding variable. While participants were in general accustomed to audio stimulation and could tell elements apart using just such cues, they were not familiar with using vibration as a way to distinguish between elements. The literature suggests that consistently decoding vibrotactile information requires substantial training, which our participants lacked (Van Erp 2005).
Finally, for our baseline sensory mapping (S), we observed that the lack of other sensory mappings to encode information led to a larger number of manipulations of elements in the visualizations, as well as to participants relocating themselves in front of or around the visualizations. This effect also seems to be visible, to a lesser extent, in the SV condition, where it may be explained by uncertainty, frustration and the lack of perceived effectiveness of the vibration mapping, which introduced complexity in the visualization without contributing to making the task easier, at least for the majority of participants.
Impact of data representations on behaviour and user experience
Most participants reported approaching trials in a similar way across experimental conditions, using visual size as the first feature for filtering out irrelevant nodes and then shifting their attention to the second type of feedback present (if any). When sound was present (SA and SVA), there was a clear tendency to remain stationary and elicit non-visual stimuli repeatedly instead of trying to rearrange elements.
When both vibration and sound were present in a trial (SVA), sound was largely preferred as the discriminator (21; 43%): “As a trial began, I was able to discard by size, but then I found it specially useful to confirm my selection through the sound feedback” said participant 43. In fact, a significant percentage of participants (29%) reported having ignored vibration on purpose in these trials, focusing solely on the audio component. Audio has been reported to elicit an affective response in users (Kramer 2000), which could explain this tendency.
Interestingly, four (4) participants indicated that in the SV condition they used the sound emitted by the vibrating motor as the sensory mapping, contrary to what we had designed: “I feel like even though it’s supposed to be doing touch, because of the fact that the stronger the buzz, the higher pitched the noise that comes in, I feel like I could still kind of rely on the sound” said participant 40. The same behaviour was observed by Prouzeau et al. (2019), in whose study one user leveraged the sound of the vibration instead of the tactile sensation for the task.
We observed that participants developed more varied strategies when no redundant mapping was present (S condition). Participants moved around the environment more and rearranged more elements of the visualization in this condition, even turning to overlapping, a strategy barely leveraged when other sensory mappings were present. In relation to this, participant 33 said “[...] without the extra help, I realised that the only way for me to increase precision was to grab a sphere and making it so it overlapped another. I tried that once in a different trial but I didn’t find it useful then”. Additionally, interviews showed that participants felt more insecure about their performance and perceived a leap in task difficulty when no redundant mapping was present (S condition). “When only size was present, I literally could not tell three of the nodes apart, I ended up picking randomly” (participant 1). This could explain why the size-only sensory mapping elicited more varied strategies. The quantitative results for manipulations per trial (see Fig. 5) also suggest that participants struggled to perform visual comparisons and that this led to the different strategies.
Multisensory feedback was reported to be useful for compensating for spatial perspective. Even in the case of vibration, the redundant encoding served to highlight nodes that would otherwise have been overlooked. A comment from participant 1 illustrates this observed benefit of multimodality: “Generally, I tended to discard the nodes that were further away, but when a second type of feedback was present, I could tell if my assumption was right by quickly touching the nodes, so I started doing that every time to be sure”. In relation to this convenience of having extra feedback, although vibration was not found useful for encoding information as such (as evidenced by the understandability ranking and accuracy), participants reported finding it useful for the overall experience as feedback when ‘colliding’ with an object. Additionally, the haptic feedback in the SV condition was found effective for quickly discarding very low-relevance items by touch. This point is supported by previous work that outlined the effectiveness of haptic encodings for highlighting salient features in visualizations (Paneels and Roberts 2010). Nine (9) participants relied on this mechanism in the SV condition.
Specific mappings elicited unique approaches to problem-solving. Interestingly, four (4) participants turned to singing the notes emitted by nodes as a way to feel the difference or to externalize the information they needed for one-to-one comparisons. This suggests that the favorable results obtained through the sonic redundancy mapping (especially in isolation, i.e., the SA condition) may be rooted in our familiarity and exposure to natural sound encodings in everyday life. In the S condition, a significant number of participants leveraged crouching and kneeling (14; 29%), a strategy also used, but to a lesser extent, in the SV condition (only 4 participants). We hypothesize that while in S users had to base all decisions on their ability to discern visually, in the SV condition they found themselves more doubtful while attempting to perceive differences haptically. This idea is supported by users turning to strategies that emerged solely in the SV condition, such as touching two nodes simultaneously, one with each hand (observed in 5 participants), and repeatedly “punching” the same node with both hands in an attempt to compare the perceived stimuli (leveraged by 56% of the participants in this condition).
Regarding user preferences and acceptance of multisensory stimulation, further analysis revealed that most people ranked the combination of all three multisensory encodings (SVA) as providing the highest enjoyment because they perceived the increased level of immersion as desirable. People ranking SA first reported that their enjoyment was mostly driven by their ability to feel performant in the task. Participants who ranked the unimodal condition (S) as the most enjoyable reported basing their ranking on the perceived challenge. The lack of redundant mappings made them more uncertain and led them to explore the visualization further, approaching each trial of this kind like a puzzle to solve.
Participants’ brainstorming of multisensory mappings
We wanted to delve into users’ attitudes towards multisensory stimulation for data representation, so in the post-study interview we asked participants to brainstorm possible non-visual sensory mappings they would like to experience. Though these are individual comments that do not provide empirical evidence of the appropriateness of the proposed encodings, they can inspire future developments and, especially, they demonstrate that different people expect or value different stimuli and codifications.
The visualization and task they completed in our study served as an example application of data representation on which participants were asked to reflect when trying to come up with their own sensory mappings. We report the aggregated conclusions from this informal brainstorming session hoping they can serve as a starting point for designing future data representations that are not created arbitrarily but are instead driven by user preferences.
Hearing
Eight (8) participants found our sensory mapping so useful that they said they could not come up with a more suitable alternative mapping leveraging audio. Five (5) participants reported loudness as a way they would expect audio to be used for data codification. Timbre (different musical instruments) was reported by five (5) participants as a desired way to encode specific data elements. Among the proposals we found some interesting comments regarding how to approach data encoding. For instance, participant 1 stated: “Imagine that you define like a list of instruments whose quality tells you about order or relevance. There’s a bass. and suddenly there’s a xylophone... or a flute which I associate to something being lighter. The idea is having different instruments associated with weights. You could even change both pitch and instrument to encode information”. Using timbre for data encoding has been considered in the past (e.g., Nagel et al. (2008)) but was claimed to be mainly suitable for representing categorical variables. Another scheme, proposed by participant 26, was to represent higher relevance with increasing pitches (contrary to what we did) and to also define this same ordinal scale with different instruments, each with a distinct timbre. Spatial audio was, surprisingly, not mentioned often (only four participants), and we believe this is because participants simply take it for granted. Familiarity of sound and customization were reported as very important by three (3) participants (e.g., the ability to assign custom sounds, chords or melodies for marking elements). Other mentions include encoding data with the number of instruments playing at the same time and playing with tempo.
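As an illustration of how such participant-proposed audio schemes could be prototyped, the following minimal Python sketch maps a normalized relevance value to a MIDI pitch and to an instrument drawn from an ordered timbre scale. It is our own illustrative example, not part of the study software; the pitch range, instrument list and the direction of the pitch mapping (higher relevance, higher pitch, as participant 26 suggested) are assumptions.

```python
# Illustrative sketch only: map a normalized relevance value to a pitch and an
# instrument taken from an ordered timbre scale, in the spirit of the schemes
# proposed by participants. Pitch range and instrument list are assumptions.

INSTRUMENTS = ["double bass", "cello", "piano", "flute", "xylophone"]  # "heavier" to "lighter"
PITCH_RANGE = (48, 84)  # MIDI notes C3..C6, assumed to be a comfortable audible span


def relevance_to_audio(relevance: float) -> tuple[int, str]:
    """Map relevance in [0, 1] to (MIDI pitch, instrument name)."""
    r = max(0.0, min(1.0, relevance))
    low, high = PITCH_RANGE
    pitch = round(low + r * (high - low))  # higher relevance -> higher pitch
    index = min(int(r * len(INSTRUMENTS)), len(INSTRUMENTS) - 1)
    return pitch, INSTRUMENTS[index]


if __name__ == "__main__":
    for r in (0.1, 0.5, 0.9):
        print(r, relevance_to_audio(r))
```

Both dimensions here redundantly encode the same ordinal scale, which is exactly the kind of doubly redundant audio codification some participants envisioned.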
Touch
The majority of comments we received from participants regarding data encodings leveraging tactile properties involved feeling some kind of weight or pressure sensation on the skin when interacting with objects (11 participants). Another possibility, posed by eleven (11; 23%) participants, was extending the area stimulated by vibration. Several participants reported that the placement and very localized nature of the proposed encoding (vibration on the back of the hand) did not help with perception. Instead, more sensitive areas could have been used for sensory presentation, as described by participant 19: “I can imagine feeling vibration all over my body, or at least a larger area. The more area vibrating perhaps could encode this relevance”. Temperature as an encoding mechanism was mentioned by nine (9) participants (proposing higher temperature as an indicator of higher relevance). Leveraging texture properties for data representation was suggested by seven (7) participants, who pictured the encoding as making use of levels of squishiness, bumpiness or polygonal complexity as indicators of relevance. Rhythm and vibration patterns were mentioned by five (5) participants as interesting approaches that could be naturally combined with sonic rhythm to represent order relationships between elements (e.g., using music with increasing beats per minute that you can also feel). The idea of using tempo/rhythmic patterns for data representation has been considered in past studies on sonification (e.g., Krygier (1994)), where it was posed as especially useful for conveying ordinal information. Among the elicited encodings, duration (temporal variation) and body locus (i.e., the location of the tactile stimulation on the body) have been highlighted as particularly promising dimensions for conveying information through the tactile modality (Wenzel and Godfroy-Cooper 2021). More creative proposals include using humidity and windiness (the more wind a node generates on whatever part of the body you are using, the more relevant it is).
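To make the intensity and rhythm proposals concrete, the sketch below (again purely illustrative; `relevance_to_haptics` and the level tables are hypothetical, and the call to an actual motor API is not shown) quantizes relevance into a few widely separated amplitude levels paired with a redundant pulse count, rather than driving a continuous amplitude that, as discussed above, users may struggle to discriminate.

```python
# Illustrative sketch only: quantize relevance into a small set of widely
# separated vibration amplitudes plus a redundant pulse count (rhythm), instead
# of a continuous amplitude whose levels are hard to tell apart.

AMPLITUDES = (0.25, 0.6, 1.0)   # normalized motor amplitudes, assumed discriminable
PULSE_COUNTS = (1, 2, 4)        # redundant rhythmic cue for the same three levels


def relevance_to_haptics(relevance: float) -> tuple[float, int]:
    """Map relevance in [0, 1] to (amplitude, number of pulses)."""
    r = max(0.0, min(1.0, relevance))
    level = min(int(r * len(AMPLITUDES)), len(AMPLITUDES) - 1)
    return AMPLITUDES[level], PULSE_COUNTS[level]


if __name__ == "__main__":
    for r in (0.1, 0.5, 0.9):
        print(r, relevance_to_haptics(r))
```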
Smell/taste
These two senses received less attention from participants, partially because many reported feeling constrained by what is realistically achievable through multisensory interfaces, even though we explicitly asked them to reflect on all senses without taking technological restrictions into account. Among the very few proposals we find un/pleasantness as a scale to represent order (3 participants) and using configurable smells as markers on a visualization (2 participants).
Multisensory redundant codification potential and implications for future studies
It seems clear from our study that multisensoriality in IA applications has the potential to change how human analysts explore and make sense of immersive data visualizations. The use of various sensory channels to convey information in a redundant manner seems to help distinguish among data elements that would otherwise be nearly impossible to tell apart. Specifically, we found that audio (and, to a lesser extent, vibration) proved useful to compensate for some drawbacks of visual encodings, such as occlusion or perspective-dependent size estimation. This alone constitutes a very strong point that we hope will foster research on multisensory redundant data codification in the future. Moreover, the sensory channels and variables used for data representation seem to elicit changes in the immersive analysis experience by affecting not only objective measures such as performance, but also subjective elements such as frustration level, perceived difficulty, challenge-skill balance or self-perceived performance.
Regarding the basic efficacy of conveying information multisensorily, designers must pay special attention to making sure users can actually perceive the different stimulus levels across the sensory mappings chosen for data representation. In this study we chose only two non-visual mappings as an exploratory step towards understanding the implications of using redundancy, but there are countless multisensory variables that could potentially be used for data encoding. For instance, many participants proposed the use of proprioceptive stimulation as a way to encode relevance as “weight” in a visualization. Although this and other possibilities, such as temperature, sonic timbre or tempo, could all well be effective mappings for data, future research will tell which mappings and combinations better fit different data types and bring more benefits in terms of performance or cognitive metrics.
Since mappings not only affect task performance but have also been shown to afford different strategies, more research should be devoted to understanding not only which data representations work better in terms of objective, task-centered measures, but also whether some mappings elicit more exploratory strategies in users or foster interaction and enjoyment.
Another conclusion from both the quantitative and qualitative observations of our study is the great variability in preferences and in the perceived usefulness of multisensory stimuli for the task. Some users might prefer receiving stimuli through the sense of touch because it helps their spatial awareness, while others would prefer audio because of efficiency, comfort or familiarity. Others might want to switch between them to leverage the specific strengths of a given codification in particular scenarios. Identifying the perceptual capabilities of different users and adapting systems and multisensory rendering devices to them (whether manually or automatically, using artificial intelligence systems) is a challenge worth tackling in the future. All the research lines we outlined call for the development of tools that allow prototyping and testing of sensory mapping proposals, to see whether they can serve IA tasks and to determine their affordances and limitations. Our study showed that not only does the effectiveness of mappings depend on perceptual factors and individual abilities or capabilities, but users may also use mappings in unexpected ways that do not align with what was designed (e.g., using the sound of vibration instead of the actual tactile sensation to discern among elements).
On a different note, although our study dealt with a very constrained, small point-cloud visualization (8 nodes), multisensory redundant mappings and the conclusions from our study could also be applied to larger visualizations. We believe that multisensory codification does not necessarily have to be present at all times in a visualization; indeed, this may not even make sense, given known perceptual limitations across senses, such as the limited number of distinguishable pitches and vibration levels and the restricted representational ranges (e.g., the audible spectrum). Instead, we believe that multisensory redundant mappings can be more effective as a way to implement a “detail on demand” or “multisensory fish-eye view” mechanism. This means that such mappings and their combinations can be effectively applied to the subsets of a dataset that analysts may want to explore in detail, where multisensory stimulation could even allow for the identification of salient features that might go unnoticed using visual data representations alone.
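A minimal sketch of what such a detail-on-demand mechanism could look like is given below. It is illustrative only; the `Node` data model and `apply_focus` helper are assumptions rather than part of our system. Non-visual encodings are switched on only for nodes within a focus radius around the analyst’s point of interest.

```python
# Illustrative sketch only: enable redundant non-visual encodings just for the
# subset of nodes inside a focus region (a "multisensory fish-eye"), instead of
# sonifying/vibrating the whole visualization at once.

import math
from dataclasses import dataclass


@dataclass
class Node:
    position: tuple[float, float, float]
    relevance: float
    sonified: bool = False   # whether the audio encoding is active for this node
    haptic: bool = False     # whether the vibration encoding is active


def apply_focus(nodes: list[Node], focus: tuple[float, float, float], radius: float) -> None:
    """Activate non-visual encodings only for nodes within `radius` of the focus point."""
    for node in nodes:
        inside = math.dist(node.position, focus) <= radius
        node.sonified = inside
        node.haptic = inside
```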
We argue that multisensory codification can be especially valuable in helping users explore node-link graph visualizations (Drogemuller et al. 2018). These visual representations, often highly complex, are widely used in areas such as social network analysis (Tabassum et al. 2018), ontology/knowledge graph exploration (Onorati 2013), and even the design or interpretation of neural network topologies (Linse et al. 2022). They bring together large sets of intricately interconnected data that are not easy to interpret.9
For example, in social network analysis (e.g., where nodes represent users and directed edges capture relationships), visual properties like node size or the number of incoming edges to a node can convey an individual’s influence within the network. This same information could be represented multisensorily, allowing for a quicker scan through the large map of interconnected entities without the need to hide or “visually filter” elements, and helping users interpret and mentally represent differences between influencers through a more embodied approach. Multisensory encodings could also be applied in a non-redundant fashion, with different attributes (such as user type, average activity, or follower growth over time) mapped to other sensory channels to allow parallel access while the visual channel is devoted to other processes (e.g., node labelling, node manipulation).
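As a concrete, purely hypothetical instance of this scenario, the sketch below derives a normalized influence score from incoming edges in a toy follower graph; such a score could then drive redundant audio or haptic encodings alongside node size. The edge list and normalization are assumptions for illustration only.

```python
# Illustrative sketch only: compute a normalized "influence" score from incoming
# edges in a toy follower graph; the score could then feed redundant audio or
# haptic mappings in addition to node size.

from collections import Counter

# (follower, followed) pairs -- hypothetical data
edges = [("ana", "bo"), ("carl", "bo"), ("bo", "dee"), ("ana", "dee"), ("eve", "dee")]

in_degree = Counter(followed for _, followed in edges)
max_degree = max(in_degree.values())

influence = {user: degree / max_degree for user, degree in in_degree.items()}
print(influence)  # e.g., {'bo': 0.67, 'dee': 1.0} after rounding
```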
By strategically leveraging the strengths of different senses for data encoding, analysts could come up with new ways of interacting with such visualizations and potentially draw more robust conclusions or find interesting new directions diverging from those obtained through a traditional analysis pipeline.
Limitations
An important limitation is that participants did not receive task-based training for the haptic encoding before starting the task (as they had for sound). This could have affected participants’ ability to discriminate between elements, especially since the haptic mapping was also reported as the least familiar to participants.
The fidelity of haptic stimuli is still an open challenge (Culbertson et al. 2018). Vibration encodings could become more complex with adequate hardware devices in the future, possibly allowing designers to leverage more perceptible differences in haptic rendering that could encode a larger range of values. However, it is also important to consider users’ attitudes towards stimulation, since people might find extensive vibration uncomfortable. With more complex haptic devices, other tactile experiences, such as complex vibrotactile patterns (see tactons, Brown et al. (2005)), temperature or texture, could also be explored.
In our study, we assumed vision was always present in visualizations; however, it may be interesting to explore whether the proposed non-visual encodings on their own can serve as successful sensory substitutes for visually impaired users, as previously suggested by Patnaik et al. (2019).
Conclusions and future work
In this paper, we evaluated three redundant multisensory mappings for encoding data in Immersive Analytics and their effect on the performance and behaviour of users in the context of a small point-cloud visualization. Results showed the promising potential of leveraging multisensory redundancy as a means of effective data representation when visuals prove insufficient. Specifically, the use of sound (pitch) to encode data was found to improve accuracy and perceived self-performance, apart from being reported as highly helpful by participants. Vibration was observed to contribute to the user experience by confirming encounters with 3D elements and by helping users quickly detect non-relevant ones.
Our findings suggest that redundant multisensory mappings can serve to compensate for the limitations of other sensory modalities in IA applications, which opens up research directions for proposing and evaluating codifications. We observed that users had a wide range of preferences regarding multisensory stimulation and reacted differently to the presented codifications, both in how they approached the tasks and in the extent to which they found redundancies useful. This factor, along with the reported individual perceptual abilities and the cognitive issues detected (e.g., feeling overwhelmed, overload), emphasizes the importance of customization, whether intelligent or not, of multisensory interfaces. Additionally, given that mapping data to sensory variables (e.g., texture, loudness, timbre) involves devising artificial relationships between data and how their meaning is conveyed to users, the design of these multisensory mappings must be derived from iterative prototyping to evaluate their effects not only on objective metrics but also in terms of affordances. Rapid prototyping tools, which alleviate the effort required to implement multisensory IA scenarios, are key to supporting the experimentation required to eventually derive guidelines or heuristics for this domain. Additionally, proper training on the multisensory mappings can also play a role in the effectiveness of encodings. The way users become familiar with “artificial” mappings should be carefully considered before making assumptions about the adequacy or alleged suitability of a given mapping or combination of mappings.
Before multisensory redundant data representation can be applied in practice, research in the near future should focus on developing tools that help researchers iterate on multisensory mappings, study their effectiveness, and provide means to adapt these interfaces and stimuli to the capabilities and preferences of analysts.
Acknowledgements
This work is supported by the Spanish State Research Agency (AEI) under grant Sense2MakeSense (PID2019-109388GB-I00).
Author Contributions
A.R. is the main contributor to this work, carrying out the development of the immersive application used in the user study and its execution with participants. A.R. also carried out all quantitative and qualitative analyses as well as the writing of the manuscript. A.B. contributed to the writing and iterating on the Introduction and Abstract of the paper. All authors reviewed the manuscript and contributed to the conception or design of the research work.
Data Availability
No datasets were generated or analysed during the current study.
Declarations
Conflict of interest
The authors declare no conflict of interest.
1. https://sonification.design/
2. http://dataphys.org/
3. https://developers.antilatency.com/Hardware/Bracer_en.html
4. https://obsproject.com/
5. https://developer.oculus.com/downloads/package/oculus-developer-hub-win/
6. https://unity.com/
7. https://www.steinberg.net/cubase/
8. https://ffmpeg.org/
9. See https://snap.stanford.edu/data/ for sample datasets.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Andreasen, A; Geronazzo, M; Nilsson, NC; Zovnercuka, J; Konovalov, K; Serafin, S. Auditory feedback for navigation with echoes in virtual environments: training procedure and orientation strategies. IEEE Trans Vis Comput Graph; 2019; 25,
Berger, M; Bill, R. Combining VR visualization and sonification for immersive exploration of urban noise standards. Multimodal Technol Interact; 2019; 3,
Bhardwaj A, Chae J, Noeske RH, Kim JR (2021) TangibleData: interactive data visualization with mid-air haptics. In: Proceedings of the 27th ACM symposium on virtual reality software and technology. ACM, Osaka, Japan, pp 1–11
Brookes, J; Warburton, M; Alghadier, M; Mon-Williams, M; Mushtaq, F. Studying human behavior with virtual reality: the unity experiment framework. Behav Res Methods; 2020; 52,
Brown L, Brewster S, Purchase H (2005) A first investigation into the effectiveness of tactons. In: First joint Eurohaptics conference and symposium on haptic interfaces for virtual environment and teleoperator systems. IEEE, Pisa, Italy, pp 167–176
Brown L, Brewster S, Ramloll R, Yu W, Riedel B (2002) Browsing modes for exploring sonified line graphs
Carpendale MST (2003) Considering visual variables as a basis for information visualisation. https://doi.org/10.11575/PRISM/30495
Conrad, FG; Couper, MP; Tourangeau, R; Peytchev, A. The impact of progress indicators on task completion. Interact Comput; 2010; 22,
Culbertson, H; Schorr, SB; Okamura, AM. Haptics: the present and future of artificial touch sensation. Ann Rev Control Robot Auton Syst; 2018; 1,
Drogemuller A, Cunningham A, Walsh J, Cordeil M, Ross W, Thomas B (2018) Evaluating navigation techniques for 3D graph visualizations in virtual reality. In: 2018 International symposium on big data visual and immersive analytics (BDVA). IEEE, Konstanz, pp 1–10
Dubus, G; Bresin, R. A systematic review of mapping strategies for the sonification of physical quantities. PLoS ONE; 2013; 8,
Ens B, Bach B, Cordeil M, Engelke U, Serrano M, Willett W, Yang Y (2021) Grand challenges in immersive analytics. In: Proceedings of the 2021 CHI conference on human factors in computing systems. ACM, Yokohama, Japan, pp 1–17
Faul, F; Erdfelder, E; Lang, A-G; Buchner, A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods; 2007; 39,
Feick M, Kleer N, Tang A, Krüger A (2020) The virtual reality questionnaire toolkit. In: Adjunct publication of the 33rd annual ACM symposium on user interface software and technology. ACM, Virtual Event USA, pp 68–69
‘Floyd’ Mueller F, Dwyer T, Goodwin S, Marriott K, Deng J, D Phan H, Ashok Khot R (2021) Data as delight: eating data. In: Proceedings of the 2021 CHI conference on human factors in computing systems. ACM, Yokohama, Japan, pp 1–14
Fonnet, A; Prie, Y. Survey of immersive analytics. IEEE Trans Vis Comput Graph; 2021; 27,
Fröhlich J, Wachsmuth I (2013) The visual, the auditory and the haptic—a user study on combining modalities in virtual worlds. In: Virtual, augmented and mixed reality. Designing and developing augmented and virtual environments. Springer, Berlin, pp 159–168. https://doi.org/10.1007/978-3-642-39405-8_19
Ho, C; Spence, C. Using peripersonal warning signals to orient a driver’s gaze. Hum Factors J Hum Factors Ergon Soc; 2009; 51,
Ho, C; Gray, R; Spence, C. Role of audiovisual synchrony in driving head orienting responses. Exp Brain Res; 2013; 227,
Hogan, T; Hornecker, E. Towards a design space for multisensory data representation. Interact Comput; 2016; [DOI: https://dx.doi.org/10.1093/iwc/iww015]
Hornecker, E; Hogan, T; Hinrichs, U; Van Koningsbruggen, R. A design vocabulary for data physicalization. ACM Trans Comput-Hum Interact; 2024; 31,
Jacucci, G; Bellucci, A; Ahmed, I; Harjunen, V; Spape, M; Ravaja, N. Haptics in social interaction with agents and avatars in virtual reality: a systematic review. Virtual Real; 2024; 28,
Jansen Y (2014) Physical and tangible information visualization (Unpublished doctoral dissertation). Ecole Doctorale Informatique Paris-Sud
Jay, C; Stevens, R; Hubbold, R; Glencross, M. Using haptic cues to aid nonvisual structure recognition. ACM Trans Appl Percept; 2008; 5,
Jeong, W. Multimodal trivariate thematic maps with auditory and haptic display. Proc Am Soc Inf Sci Technol; 2005; 42,
Jeong W, Gluck M (2002) Multimodal bivariate thematic maps with auditory and haptic display
Jones, LA; Sarter, NB. Tactile displays: guidance for their design and application. Hum Factors J Hum Factors Ergon Soc; 2008; 50,
Klein, K; Sedlmair, M; Schreiber, F. Immersive analytics: an overview. it - Inf Technol; 2022; 64,
Korkut, EH; Surer, E. Visualization in virtual reality: a systematic review. Virtual Real; 2023; 27,
Kramer, G. Auditory display: sonification, audification and auditory interfaces; 2000; Boston, Addison-Wesley Longman Publishing Co. Inc.:
Kramer G, Walker B, Bonebright T, Cook P, Flowers JH, Miner N, Neuhoff J (2010) Sonification report: status of the field and research agenda
Kraus, M; Fuchs, J; Sommer, B; Klein, K; Engelke, U; Keim, D; Schreiber, F. Immersive analytics with abstract 3D visualizations: a survey. Comput Graph Forum; 2022; 41,
Krygier, JB. Sound and geographic visualization. Modern cartography series; 1994; Amsterdam, Elsevier: pp. 149-166. [DOI: https://dx.doi.org/10.1016/B978-0-08-042415-6.50015-6]
Lecuyer, A; George, L; Marchal, M. Toward adaptive VR simulators combining visual, haptic, and brain–computer interfaces. IEEE Comput Graph Appl; 2013; 33,
Linse, C; Alshazly, H; Martinetz, T. A walk in the black-box: 3D visualization of large neural networks in virtual reality. Neural Comput Appl; 2022; 34,
Lisle L, Davidson K, Gitre EJ, North C, Bowman DA (2021) Sensemaking strategies with immersive space to think. In: 2021 IEEE virtual reality and 3D user interfaces (VR). IEEE, Lisboa, Portugal, pp 529–537
Marriott, K et al. Immersive analytics; 2018; Cham, Springer:
Martin, D; Malpica, S; Gutierrez, D; Masia, B; Serrano, A. Multimodality in VR: a survey. ACM Comput Surv; 2022; 54,
Melo, M; Goncalves, G; Monteiro, P; Coelho, H; Vasconcelos-Raposo, J; Bessa, M. Do multisensory stimuli benefit the virtual reality experience? A systematic review. IEEE Trans Vis Comput Graph; 2022; 28,
Miller, J. Channel interaction and the redundant-targets effect in bimodal divided attention. J Exp Psychol Hum Percept Perform; 1991; 17,
Moloney, J; Spehar, B; Globa, A; Wang, R. The affordance of virtual reality to enable the sensory representation of multi-dimensional data for immersive analytics: From experience to insight. J Big Data; 2018; 5,
Munzner, T. Visualization analysis and design; 2016; Boca Raton, CRC Press:
Nagel HR, Granum E, Bovbjerg S, Vittrup M (2008) Immersive visual data mining: the 3DVDM approach. In: Simoff SJ, Böhlen MH, Mazeika A (eds) Visual data mining. Springer, Berlin, pp 281–311. https://doi.org/10.1007/978-3-540-71080-6_18
Nesbitt KV (2001) Modeling the multi-sensory design space
Nesbitt KV, Hoskens I (2008) Multi-sensory game interface improves player satisfaction but not performance
Onorati T (2013) SEMA4A: a knowledge base for accessible evacuation and alert notifications in emergencies
Oskarsson, P-A; Eriksson, L; Carlander, O. Enhanced perception and performance by multimodal threat cueing in simulated combat vehicle. Hum Factors J Hum Factors Ergon Soc; 2012; 54,
Paneels, S; Roberts, JC. Review of designs for haptic data visualization. IEEE Trans Haptics; 2010; 3,
Parise, CV. Crossmodal correspondences: standing issues and experimental guidelines. Multisens Res; 2016; 29,
Parise, CV; Spence, C; Ernst, MO. When correlation implies causation in multisensory integration. Curr Biol; 2012; 22,
Patnaik, B; Batch, A; Elmqvist, N. Information olfactation: harnessing scent to convey data. IEEE Trans Vis Comput Graph; 2019; 25,
Posner, MI; Nissen, MJ; Klein, RM. Visual dominance: an information-processing account of its origins and significance. Psychol Rev; 1976; 83,
Prouzeau A, Cordeil M, Robin C, Ens B, Thomas BH, Dwyer T (2019) Scaptics and highlight-planes: immersive interaction techniques for finding occluded features in 3D scatterplots. In: Proceedings of the 2019 CHI conference on human factors in computing systems. ACM, Glasgow, Scotland, UK, pp 1–12
Rey A, Bellucci A, Díaz P, Aedo I (2023) The effect of teleporting versus room-scale walking for interacting with immersive visualizations. In: Abdelnour Nocera J, Kristín Lárusdóttir M, Petrie H, Piccinno A, Winckler M (eds) Human–computer interaction—INTERACT 2023. Springer, Cham, pp 110–119. https://doi.org/10.1007/978-3-031-42283-6_6
Rey A, Bellucci A, Diaz Perez P, Aedo Cuevas I (2022) IXCI: The Immersive eXperimenter Control Interface. In: Proceedings of the 2022 international conference on advanced visual interfaces. ACM, Frascati, Rome, Italy, pp 1–3
Rizzolatti, G; Fadiga, L; Fogassi, L; Gallese, V. The space around us. Science; 1997; 277,
Saffo, D; Bartolomeo, SD; Crnovrsanin, T; South, L; Raynor, J; Yildirim, C; Dunne, C. Unraveling the design space of immersive analytics: a systematic review. IEEE Trans Vis Comput Graph; 2023; [DOI: https://dx.doi.org/10.1109/TVCG.2023.3327368]
Skarbez, R; Polys, NF; Ogle, JT; North, C; Bowman, DA. Immersive analytics: theory and research agenda. Front Robot AI; 2019; 6, 82. [DOI: https://dx.doi.org/10.3389/frobt.2019.00082]
Spence, C. Audiovisual multisensory integration. Acoust Sci Technol; 2007; 28,
Spence, C. Crossmodal correspondences: a tutorial review. Atten Percept Psychophys; 2011; 73,
Spence C, Ho C (2015) Multisensory information processing. In: Boehm-Davis DA, Durso FT, Lee JD (eds) APA handbook of human systems integration. American Psychological Association, Washington, pp 435–448. https://doi.org/10.1037/14528-027
Spence, C; Nicholls, MER; Driver, J. The cost of expecting events in the wrong sensory modality. Percept Psychophys; 2001; 63,
Stein, BE. The new handbook of multisensory processing; 2012; Cambridge, The MIT Press:
Tabassum, S; Pereira, FSF; Fernandes, S; Gama, J. Social network analysis: an overview. WIREs Data Min Knowl Discov; 2018; 8,
Tak S, Toet L (2013) Towards interactive multisensory data representations. In: Proceedings of the international conference on computer graphics theory and applications and international conference on information visualization theory and applications. SciTePress - Science and and Technology Publications, Barcelona, Spain, pp 558–561
Van Erp J (2005) Vibrotactile spatial acuity on the torso: effects of location and timing parameters. In: First joint Eurohaptics conference and symposium on haptic interfaces for virtual environment and teleoperator systems. IEEE, Pisa, Italy, pp 80–85
Våpenstad, C; Hofstad, EF; Langø, T; Mårvik, R; Chmarra, MK. Perceiving haptic feedback in virtual reality simulators. Surg Endosc; 2013; 27,
Wai Y, Brewster S (2002) Comparing two haptic interfaces for multimodal graph rendering. In: Proceedings 10th symposium on haptic interfaces for virtual environment and teleoperator systems. HAPTICS 2002. IEEE Computer Society, Orlando, FL, USA, pp 3–9
Wang Y, Ma X, Luo Q, Qu H (2016) Data edibilization: representing data with food. In: Proceedings of the 2016 CHI conference extended abstracts on human factors in computing systems. ACM, San Jose, CA, USA, pp 409–422
Wenzel EM, Godfroy-Cooper M (2021) The role of tactile cueing in multimodal displays: application in complex task environments for space exploration
Yu W, Brewster S (2002) Multimodal virtual reality versus printed medium in visualization for blind people
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”).