Purpose
This study aims to investigate the role of information normalization in online healthcare consultation, a typical form of complex human-to-human communication requiring both effectiveness and efficiency. The trend toward globalization and digitization calls for high-quality information, and normalization is considered an effective method for improving information quality. However, some researchers have argued that excessive normalization (standardized answers) may be perceived as impersonal, repetitive, and cold, making it ill-suited to human-to-human communication, for instance, when patients are anxious about their health condition (e.g. a high-risk disease) in online healthcare consultation. Therefore, the role of information normalization in human communication is worth exploring.
Design/methodology/approach
Data were collected from one of the largest online healthcare consultation platforms (Dxy.com). This study expanded the existing information quality model by introducing information normalization as a new dimension. Information normalization was assessed using medical templates, extracted through natural language processing methods such as Bidirectional Encoder Representations from Transformers (BERT) and Latent Dirichlet Allocation (LDA). Patient decision-making behaviors, namely, consultant selection and satisfaction, were chosen to evaluate communication performance.
Findings
The results confirmed the positive impact of information normalization on communication performance. Additionally, a negative moderating effect of disease risk on the relationship between information normalization and patient decision-making was identified. Furthermore, the study demonstrated that information normalization can be enhanced through experiential learning.
Originality/value
These findings highlighted the significance of information normalization in online healthcare communication and extended the existing information quality model. The study also facilitated patient decision-making on online healthcare platforms by providing a comprehensive information quality measurement. In addition, the moderating effects indicated a tension between informational support and emotional support, enriching social support theory.
1. Introduction
In the era of big data, individuals are inundated with vast amounts of information on a daily basis. However, due to limitations in information processing capacity, making optimal decisions can be challenging, if not impossible, for most people (Lachman et al., 2015). This challenge is exacerbated by the “garbage in, garbage out” effect, where low information quality further constrains decision-making capabilities (Kilkenny and Robinson, 2018). Recognizing the pivotal role of information quality in fostering efficient communication and decision-making, the evaluation of information quality has emerged as a critical issue (Lee et al., 2002). Nowhere is this problem more critical than in the healthcare domain, given the profound implications of medical errors. Studies have underscored that high information quality is fundamental to successful communication, enabling physicians to make precise judgments and enhancing patients’ satisfaction (Brown et al., 2019). Wang and Strong's (1996) information quality framework has served as a valuable guide for evaluating information quality. Numerous studies across various domains have been conducted based on this framework, demonstrating its utility (e.g. McKinney et al., 2002; Wixom and Watson, 2001; Zhu and Gauch, 2000). However, research indicated that this classical model requires expansion and adjustment to address new contexts (Lee et al., 2002), particularly with the recent orientation toward digitization (Maass et al., 2018). In response to this need, this paper introduced information normalization as a novel dimension of information quality and examined its role in communication within the context of online healthcare consultation.
The concept of information normalization can be traced back to Bailey and Pearson's (1983) study on computer user satisfaction. In their research, they identified 39 factors of computer-output information that influenced user satisfaction, encompassing aspects such as timeliness, completeness, format of output, and relevancy. Specifically, the factor of format was described as “the material design of the layout, and the display of the output contents”. While not formally termed as such, subsequent researchers recognized this concept as the precursor to information normalization (DeLone and McLean, 1992; Doll and Torkzadeh, 1988; Wang and Strong, 1996). Over time, scholars have extensively discussed and expanded upon the factor of format, enriching the model by incorporating dimensions such as concise representation, clarity, understandability, interpretability, and consistency (DeLone and McLean, 1992; Goodhue, 1995; Jarke and Vassiliou, 1997; Wang and Strong, 1996). However, much of this discourse has been centered around computer-computer or computer-human communication, with a primary focus on computer systems rather than human beings, particularly in the context of computer-generated content (Bailey and Pearson, 1983; DeLone and McLean, 1992; Doll and Torkzadeh, 1988).
In recent years, there has been a growing emphasis on the development of human-centered technology, including initiatives such as human-centered artificial intelligence (AI) or emotional AI. Within this paradigm, the enhancement of human living and working quality takes precedence. Technologies like online healthcare consultation platforms are viewed as tools or aids in this endeavor. Consequently, there is a need for a shift in the understanding and definition of information normalization toward a human-centered approach, where the characteristics of information are tailored to be more conducive to human cognition. In line with this perspective, we define information normalization as the structure and logic of information, delineating how information is organized and presented to optimize human comprehension and interaction.
In the healthcare sector, the advent of advanced information technology is reshaping the behaviors of participants, with online healthcare communities representing a prominent example. These platforms enable physicians and patients to overcome geographical barriers and engage in online communication (Mousavi et al., 2020). However, despite the existence of various rating systems within these platforms, such as patients’ recommendations of specific physicians, these rating scores often suffer from simplicity, subjectivity, and a lack of transparency. General ratings provided by patients, typically ranging from 0 to 5, rely heavily on subjective preferences and are constrained by patients’ level of professional knowledge. For instance, patients with serious illnesses like cancer or AIDS may prioritize qualities such as politeness and kindness when assessing physicians’ skills (Curtis et al., 2002; de Waard et al., 2018). In addition, many rating systems aggregate various factors (e.g. consultation count and patient ratings) on the platform under opaque calculation rules. This “black box” evaluation approach frequently engenders confusion and distrust among patients, potentially leading to biased decision-making when seeking medical services.
To address this challenge, we draw upon the theoretical framework of information quality and devise a more comprehensive and transparent model for assessing service quality. Unlike traditional rating systems that indirectly measure service quality through patient ratings and comments, our approach directly evaluates service quality by analyzing physicians’ responses. This shift aims to foster more objective assessments of healthcare services while minimizing the influence of patient subjectivity. Furthermore, by exploring multiple dimensions of information quality, we can obtain more nuanced evaluations of physicians’ answer quality. Simultaneously, we enhance the interpretability of the service evaluation framework by integrating indicators with practical significance, thereby empowering patients to make informed decisions about their healthcare options.
Information normalization is urgently needed in online healthcare. Healthcare communication exhibits high information asymmetry, in which physicians’ professional expressions are extremely difficult for patients to understand. In this case, a well-organized answer highlights the significant points, reduces complexity, and decreases patients’ learning and cognitive costs, thereby increasing patient satisfaction. It is thus an effective way for healthcare communication to avoid misunderstanding and, ultimately, medical accidents.
In online environments, the necessity for information normalization becomes increasingly obvious. The digital mode of communication presents unique challenges for effective interaction. Unlike face-to-face communication where physicians and patients can engage in numerous interactions, online consultations are often constrained by platform limitations. Physicians must convey information clearly within restricted parameters, typically limited to a few sentences or characters. It necessitates efficient communication strategies. Information normalization emerges as a viable solution to this dilemma. Previous research has underscored the benefits of information normalization for enhancing communication effectiveness (Flusberg et al., 2017; Khurana et al., 2020). Gilson, for instance, argued that adhering to standardized procedures in normalized work practices can mitigate ambiguity (Gilson et al., 2005). Compared to disorganized responses, normalized information is presented clearly and concisely, often employing standardized medical terminology that patients can readily comprehend. This clarity reduces confusion and misunderstanding, enabling patients to access more accurate and reliable information. Moreover, online information generated by individuals differs from that produced by information systems in its creativity, flexibility, and complexity. Well-organized information not only reduces the difficulty of information processing but also enhances readability and understandability. By presenting information in a structured and normalized format, online healthcare platforms can facilitate smoother communication and foster a more positive user experience for both physicians and patients.
Extensive evidence indicated that normalized healthcare information, such as structured templates, plays a crucial role in reducing variation (Flusberg et al., 2017) and enhancing comprehensibility and comprehensiveness (Brown et al., 2019). Consequently, it leads to clearer and more fluent communication among healthcare professionals (Khurana et al., 2020) and improves diagnostic outcomes (Lin et al., 2014). Additionally, normalization serves to mitigate misunderstandings (Khurana et al., 2020). Beyond its impact on information quality, normalization also influences user perceptions. As noted by Gilson, customers may perceive employees trained to adhere to standards as more knowledgeable (Gilson et al., 2005), a principle particularly relevant in healthcare settings. Patients, as lay users, tend to view physicians providing normative responses as more professional and reliable, thereby enhancing satisfaction and adherence to medical advice. However, it is essential to apply normalization carefully to avoid its negative consequences. Excessive normalization, exemplified by mechanical answering templates in intelligent customer service, can make responses feel rigid and uniform, eliciting negative reactions from patients. The use of standardized, impersonal, and inflexible expressions may alienate individuals and discourage further communication (Go and Sundar, 2019). Therefore, the application of normalization should be deliberate, especially in contexts where emotional support is crucial. In summary, the primary objective of our study is to examine the role of information normalization in online healthcare communication, aiming to provide a comprehensive analysis and empirical evidence of its impact under various circumstances. By delving into this topic, we seek to shed light on the nuanced effects of normalization and its implications for healthcare communication practices.
The main contributions of our study are fourfold. First, we extended the existing model of information quality by introducing a novel dimension of information normalization, and empirically validated its positive impact on online healthcare communication. Second, we proposed an innovative method for quantifying information normalization by leveraging techniques from natural language processing, providing a systematic and objective means of assessment. Third, we contributed to social support theory by examining the moderating role of disease risk in the relationship between information normalization and patient decisions. Fourth, we investigated the path of information normalization, highlighting its potential for development and learning through practice and experience.
In the next section, we give a summary of the theoretical basis and relevant literature as the foundation to develop our research model. Then, we generate our hypotheses and a theoretical model. Subsequently, we outline our research method, including the data collection and experimental design. The empirical results and corresponding analysis follow. Finally, we draw conclusions and discuss the theoretical and practical implications.
2. Literature and theoretical background
In this section, we begin with an overview of information quality theory. We then introduce our newly proposed dimension - information normalization. Finally, we summarize related research in healthcare.
2.1 The information quality framework
The development of information technology has facilitated the proliferation of information, ultimately leading to serious information overload (Maass et al., 2018). Consequently, there is a pressing need for high-quality information, necessitating an appropriate measurement of information quality. Since the 1980s, numerous researchers have endeavored to explore information quality and have developed various information quality models (Bailey and Pearson, 1983; DeLone and McLean, 1992; Jarke and Vassiliou, 1997; Kilkenny and Robinson, 2018; Miller, 1996; Wang and Strong, 1996). For example, Bailey and Pearson (1983) assessed information quality using 39 indicators, including timeliness, completeness, conciseness, format, and relevance. Miller (1996) utilized four dimensions - completeness, accuracy, relevance, and timeliness - to evaluate report quality. Wang and Strong (1996) introduced a conceptual model comprising four dimensions, namely intrinsic data quality, contextual data quality, representational data quality, and accessibility data quality. However, these models were primarily designed to evaluate the success of computer-centered information systems. In these frameworks, information quality was often examined in conjunction with system quality. Yet, information generated by systems differs significantly from that created by human beings. Human-generated information is typically more flexible, diverse, and complex. In the context of healthcare, information provided to patients is predominantly generated by physicians. Therefore, directly applying these original models to assess healthcare information quality is inappropriate.
Numerous efforts have been made to adapt previous information quality models to diverse scenarios, leading to various expansions and adjustments in information quality dimensions. In finance, for instance, researchers investigated the detrimental effects of low information quality on liquidity risk by devising an information quality proxy based on three economic criteria (Ng, 2011). In marketing, a hierarchical framework was developed from the original model to identify suitable features for evaluating product review quality, thereby aiding consumers in making informed purchase decisions (Chen and Tseng, 2011). In the field of AI, the information quality model was refined into a six-dimensional metric encompassing currency, availability, information-to-noise ratio, authority, popularity, and cohesiveness, aimed at enhancing information retrieval performance (Zhu and Gauch, 2000). However, despite these adaptations in various domains, a comprehensive framework for assessing information quality in healthcare, particularly within online healthcare contexts, remains elusive. This gap impedes knowledge advancement and innovation (Landau et al., 2021). Therefore, our primary objective is to develop a framework tailored to healthcare, specifically online healthcare, to facilitate the evaluation of information quality in online healthcare communities.
Criteria for information quality have been extensively employed in the literature to examine user satisfaction (Iivari, 1987). Many studies have demonstrated that information quality plays a pivotal role in facilitating successful information transfer and knowledge adoption, thus serving as a significant predictor of user satisfaction (Bailey and Pearson, 1983; DeLone and McLean, 1992; Guo et al., 2012). For instance, Guo et al. (2012) revealed that online information aids consumers in assessing product quality during purchasing decisions. In Bailey and Pearson’s user satisfaction assessments (Bailey and Pearson, 1983), the top 10 most crucial items among all 39 items pertained to information quality, including information accuracy, output timeliness, reliability, completeness, relevance, precision, and currency. Information quality, user satisfaction, and individual impact are considered key concerns in the information systems literature (DeLone and McLean, 1992), with information quality believed to influence both the use of information systems and user satisfaction (Rai et al., 2002). In recent years, healthcare, particularly online healthcare, has garnered significant attention in the realm of information systems (Chen et al., 2019; Khurana et al., 2019; Mousavi et al., 2020). However, scant research has been conducted to examine the role of information quality in online healthcare. Our study aims to address this research gap by investigating communication between physicians and patients in online healthcare communities.
2.2 Information normalization
We define information normalization as the underlying structure and logic governing how information is organized and presented. This concept is rooted in earlier research. Bailey and Pearson (1983) originally identified the format of output from information systems as a crucial indicator of information quality in determining computer user satisfaction. They defined format as “the material design of the layout and display of the output contents,” underscoring the importance of information representation. Doll and Torkzadeh (1988) later distilled this concept into two key attributes: presentation in a useful format and clarity. Subsequently, Wang and Strong (1996) consolidated related concepts and introduced the dimension of “representational data quality”, which encompassed attributes such as understandability, interpretability, and concise representation (as summarized in Table 1). Since then, scholars and practitioners have expanded and adjusted the interpretation of this dimension within their respective research contexts, leading to varying interpretations and confusion. As depicted in Table 1, there is significant variation in the concept from different perspectives. Academics often prioritize interpretability and representation, whereas practitioners may emphasize uniqueness and consistency, focusing more on the “data” level. Even within academic circles, there is a lack of consensus on how to define this concept.
In the Internet era, the importance of information normalization cannot be overstated. Today, people rely heavily on online platforms to search for and exchange information (Garton et al., 1997). The vast array of human-generated information encompasses a wide range of topics and formats. It would be unthinkable if all this information were disorganized and chaotic. Disordered information presents significant obstacles to comprehension, increasing the cognitive burden of information processing. Moreover, illogical expression can easily lead to misunderstanding and confusion, potentially resulting in unforeseen complications for information stakeholders (Jones, 2003). Furthermore, compared to mechanical information output from systems, human-generated information is characterized by greater flexibility, variability, and complexity. This underscores the pressing need for standardized information with well-defined structure and logic. In light of these circumstances, we aim to address this research gap by introducing a clear and practical concept: information normalization. This concept shifts the focus from mere “presentation” to the underlying structure and logic, emphasizing the organization of information rather than its superficial and ambiguous format. By emphasizing the structure of the information as a whole, such as an entire article, this perspective offers a comprehensive and high-level examination of information, providing deeper insights into information quality. Therefore, we propose information normalization as a novel and vital dimension of information quality.
2.3 Information quality analysis in online healthcare consultation
The healthcare literature underscores the pivotal role of information quality as a determinant of service quality (Kerr et al., 2008). It exerts the strongest positive impact on patients’ adoption of answers compared to emotional support and source credibility (Jin et al., 2016). Online healthcare consultation represents a unique form of online communication. Maltz (2000) utilized four dimensions - relevance, timeliness, comprehensibility, and credibility - to evaluate users’ perceived information quality in communication. Similarly, Nelson et al. (2005) assessed information quality based on information accuracy, completeness, currency, and format. Behl et al. (2023) identified criteria such as information amount, completeness, timeliness, relevance, and accessibility to denote contextual information quality.
Recent healthcare research emphasized the significant influence of distinctive biomedical features embedded in health information on online communication, warranting their consideration in healthcare information quality evaluation (Fadahunsi et al., 2021). However, little attention has been paid to this topic until recently. In this paper, we propose an information quality framework for measuring answer quality in online health consultations based on existing literature. The framework is described as follows:
Timeliness. In online consultations, the speed of response, or timeliness, is a pivotal determinant of physician service quality (Yang et al., 2019; Zhang et al., 2014). Timeliness is typically quantified by the average duration a patient waits before receiving a response from the physician during a consultation (Jin et al., 2016). Studies have demonstrated that quicker response times correlate with improved service delivery quality, decision-making for continued consultation, and higher levels of patient satisfaction (Yang et al., 2019; Ye et al., 2019). Timeliness is also used to identify high-quality physicians (Zhang et al., 2014).
Completeness. Completeness in healthcare communication denotes the extent to which an answer contains sufficient information to effectively address patients’ concerns. It reflects both the richness of information provided and the dedication of physicians to online services. Comprehensive answers suggest physicians’ conscientiousness and responsible approach, indicating their commitment to meeting patients’ needs. Word count or text length commonly serves as a measure of completeness, given its established validity and widespread use in previous research. Blumenstock (2008) advocated for word count as a superior indicator of article quality due to its simplicity and intuitiveness, surpassing more complex criteria. Similarly, Mudambi and Schuff (2010) noted that longer text reviews often contain more detailed information, assisting consumers in reducing uncertainty about product quality. In the context of online healthcare communities, word count or text length frequently offers insights into the depth of physicians’ responses, providing valuable assessments of information completeness (Biswas et al., 2022; Ye et al., 2019). Therefore, completeness, as indicated by text length, plays a pivotal role in information quality, facilitating effective communication and enhancing patient satisfaction.
Depth. Interaction depth positively influences service satisfaction in online patient-physician conversations (Zheng et al., 2013). This dimension is captured by interaction frequency, which quantifies the number of dialog rounds. Dialog unfolds as information flows back and forth between parties (Odum and Peterson, 1996). Frequent interaction enables physicians and patients to deepen their mutual understanding (Zhang et al., 2014). Physicians gain insight into patients’ conditions and requirements, facilitating accurate diagnoses and improved services. Patients, in turn, grasp the meaning of answers and treatment ideas through affirming discussions with physicians.
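To make these measures concrete, the following is a minimal, hypothetical Python sketch: completeness is proxied by the word count of the physician's replies, and depth by the number of dialog rounds. The transcript and the counting rules are invented for illustration and are not taken from the study's data.

```python
# Hypothetical sketch: operationalizing completeness (physician word count)
# and depth (number of patient-physician dialog rounds) on a toy transcript.
transcript = [
    ("patient", "I have a sore throat and mild fever since yesterday"),
    ("physician", "How high is the fever and do you have a cough"),
    ("patient", "Around 38 degrees, no cough"),
    ("physician", "This looks like a mild viral infection; rest and drink fluids"),
]

# Completeness: total words in the physician's replies.
completeness = sum(len(text.split()) for role, text in transcript if role == "physician")

# Depth: a round is counted each time a physician turn follows a patient turn.
depth = sum(
    1
    for (prev_role, _), (next_role, _) in zip(transcript, transcript[1:])
    if prev_role == "patient" and next_role == "physician"
)
```

In practice, both quantities would be aggregated per physician over many consultations; the sketch only shows the per-consultation counting rule.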
Relevance. Relevance assesses the alignment between a user’s needs and the information provided (Bailey and Pearson, 1983). In online consultations, relevance pertains to the correspondence between questions and answers, a metric often employed to gauge health information quality. Furthermore, relevance is a crucial aspect of communication (Behl et al., 2023), ensuring that conversations remain focused on the primary issue and effectively address patient needs. Irrelevant responses can hinder communication, resulting in inefficiency and dissatisfaction, potentially lowering physician selection preference (Zhang et al., 2019). Previous studies have utilized cosine similarity between questions and answers as a measure of relevance, a methodology also adopted in our research (Jin et al., 2016).
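A minimal illustration of this relevance measure follows, using plain bag-of-words count vectors; the cited work may use different text representations, and all example texts here are invented.

```python
# Illustrative sketch: question-answer relevance as the cosine similarity
# of simple bag-of-words vectors (stdlib only; texts are invented).
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity between two texts using term-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

question = "persistent headache and blurred vision for a week"
answer = "persistent headache with blurred vision warrants a blood pressure check"
off_topic = "please remember to renew your newsletter subscription"

relevant = cosine_sim(question, answer)      # shares vocabulary with the question
irrelevant = cosine_sim(question, off_topic)  # no shared terms, so similarity is 0
```

An answer that addresses the question shares vocabulary with it, so its score is markedly higher than that of an off-topic reply.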
Normalization. Normalization encompasses the structure and logic of information, playing a crucial role in healthcare by addressing semantic fuzziness and information complexity. First, normalization organizes intricate medical information into a clear and logical format, enhancing readability and reducing information processing difficulties for patients. Second, it compels physicians to adhere to standardized medical practices, prompting careful consideration during reply editing by reminding them of essential steps and points. This reduces the risk of malpractice, such as misdiagnosis due to an overlooked allergy history. Third, normalization enhances the accuracy and precision of biomedical statements. The use of standardized terminology and expressions minimizes errors, consequently lowering the occurrence of medical accidents.
Moreover, normalized information saves physicians’ time, allowing them to focus on improving content quality. Over time, an answering framework can be developed and refined through daily consultations. Given that many patients seek advice for similar diseases, such as hypertension, physicians need not repeatedly type similar answers (Werfel et al., 2014). Instead, they can reuse and modify pre-existing frameworks to suit individual needs. This process parallels the functionality of an automatic reply system, which selects appropriate templates for various questions (Malik et al., 2007). However, unlike mechanical template applications lacking flexibility, physicians adjust templates or fill frameworks according to specific patient requirements.
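The study's actual measurement pipeline (BERT and LDA, per the methodology) is beyond a short example, but the intuition of template-based normalization can be sketched with a deliberately crude, stdlib-only proxy: count how many hypothetical template slots an answer fills. The section labels, answers, and scoring rule below are all invented for illustration.

```python
# Crude illustrative proxy for information normalization: the fraction of
# hypothetical template slots (section labels) present in an answer.
# This is NOT the study's BERT/LDA pipeline, only a didactic stand-in.
SECTION_LABELS = ("diagnosis:", "advice:", "caution:")  # assumed template slots

def normalization_score(answer: str) -> float:
    """Fraction of template slots that appear in the answer text."""
    text = answer.lower()
    return sum(label in text for label in SECTION_LABELS) / len(SECTION_LABELS)

structured = ("Diagnosis: likely gastritis. Advice: avoid spicy food. "
              "Caution: see a doctor if the pain persists.")
free_form = "It is probably your stomach; try to eat lightly for a few days."

s_structured = normalization_score(structured)  # all three slots present
s_free_form = normalization_score(free_form)    # no template structure
```

A structured answer scores high because it follows a recognizable slot layout, while a free-form answer of similar content scores zero; topic models such as LDA generalize this idea by discovering the recurring structure rather than assuming fixed labels.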
3. Research model and hypotheses
3.1 Experiential learning of information normalization
Appropriate answering structures are typically developed through a process of usage and subsequent adjustments. In domains such as automatic Q&A, answer templates are crafted based on frequently asked questions, evolving through continuous enrichment and refinement from user feedback (Unger et al., 2012). This process reflects experiential learning, where knowledge is acquired through ongoing, naturalistic learning from life experiences rather than formal science and education (Kolb, 2014). From the perspective of experiential learning theory, physicians generate and adjust templates during their interactions with patients, suggesting that physicians’ information normalization is influenced by their consultation experiences. Therefore, we propose the following hypothesis:
3.2 The role of information normalization in patient decision-making
In online healthcare communities, patients are allowed to select a preferred physician for consultation after paying a certain fee. However, the non-face-to-face nature of online communication also fosters apprehension and distrust (Gong et al., 2021), prompting patients to seek more information to evaluate the quality of physicians. Online healthcare communities often provide a plethora of data, including information generated by physicians, patients, and the platform itself (Li et al., 2019), for patients’ reference. Unfortunately, most patients lack the expertise to fully comprehend this abundance of information and make optimal choices based on it. Physicians in online communities can be viewed as offering their knowledge and services; thus, it is their responsibility to assist patients in understanding their professional capabilities and making informed choices regarding their services.
According to cognitive learning theory, organized information is significantly easier to perceive and comprehend, particularly for individuals lacking related knowledge (McCrudden and Rapp, 2017). A well-structured answer with clear logic and representation reduces message complexity, thus lowering patients’ learning costs and ultimately enhancing patient satisfaction. Normative answers also reflect physicians’ professionalism, as they are trained to adhere to best practices involving specific procedures (Ingraham et al., 2019). Skillful physicians are accustomed to structuring their medical answers according to strict medical practice steps, akin to the procedures they follow in offline workplaces, such as Electronic Health Records (EHRs) in hospitals. From the patients’ perspective, physicians who follow standard medical flow paths and provide normative answers are perceived as more knowledgeable and professional (Gilson et al., 2005).
Additionally, these normative health answers are typically presented as modifiable templates with a consistent structure but varying contents. Physicians often prepare these templates in advance (edited during previous consultations) and simply adjust a few words or sentences to tailor them to the current patient’s illness. This approach saves physicians considerable time, allowing them to devote more effort to enriching the content. All these factors contribute to increased patient satisfaction by improving both the objective and perceived quality of information. In our study, all variables are analyzed at the physician level. Accordingly, we define the patient’s selection as the number of online consultations a physician has received within a specified period and define patient satisfaction as the average ratings of the physician on the platform. Based on this, we propose the following hypotheses:
3.3 The moderating effects of disease risk
Consumer characteristics play a fundamental role in shaping perceptions of service quality and satisfaction (Anderson et al., 2008). Numerous studies have found that individual differences significantly impact satisfaction levels (Anderson et al., 2008; Cooil et al., 2007; Mittal and Kamakura, 2001; Wangenheim, 2003) and moderate the relationship between service quality and satisfaction (Anderson et al., 2008). Online patients, as consumers of healthcare services, exhibit unique characteristics, particularly concerning disease risk. Disease risk encompasses physical factors such as health status and fitness, as well as psychological factors such as stress and anxiety (Ruo et al., 2003; Yang et al., 2015). Therefore, disease risk may influence or moderate patients’ consulting behavior and service satisfaction. In this study, we analyze the role of disease risk from two perspectives: emotional support and personalized needs.
On the one hand, patients with high-risk diseases may have heightened needs for emotional support. Emotional support encompasses actions taken to provide patients with a sense of care and empathy (Bizzi et al., 2017). Both informational and emotional support are considered fundamental aspects of online healthcare (Liu et al., 2020; Naveh and Bronstein, 2019), often used to evaluate physicians’ service quality (Yang et al., 2015). Emotional support is believed to significantly influence patients’ satisfaction and treatment outcomes (Adamson et al., 2012). The requirements for emotional support vary among individuals and can even differ for the same individual across different situations (Slevin et al., 1996). For patients, their illness state is a critical determinant of their psychological well-being. Serious diseases, often associated with high risks of complications and mortality, can impose immense psychological pressure, rendering patients more vulnerable and in need of comfort and compassionate care. Consequently, patients facing critical illnesses are likely to have a heightened desire for emotional support.
While information normalization significantly enhances the overall quality of physicians’ responses, it may have adverse effects on human emotions in certain contexts. A prime example is the use of chatbots, which can evoke feelings of antipathy due to perceptions of impersonality, immaturity, or a sense of being served by a tool rather than a human (Han and Lee, 2022; Nichifor et al., 2021). Consumers often perceive overly normalized responses, resembling templates, as mechanical and impersonal (Abujabal et al., 2017), failing to meet their nuanced and personalized needs (Go and Sundar, 2019), consequently resulting in decreased satisfaction. Similarly, in online healthcare conversations, patients may perceive physicians as robotic when presented with template-like responses, particularly those with high emotional support needs, such as patients facing high-risk diseases. Moreover, feelings of despair and anxiety can lead to heightened emotional arousal, negatively impacting patients’ rational decision-making (Kaufman, 1999). Besides the utility and expertise of information, seriously ill patients also prioritize physicians’ attitudes during service delivery. Consequently, despite improving overall information quality, the positive impact of normalization may be diminished in terms of communication performance, as evidenced by patients’ behaviors such as physician selection and satisfaction, especially among those facing high-risk diseases.
On the other hand, patients with varying levels of disease risk have different needs regarding the level of personalization in diagnosis and treatment services. High-risk diseases, such as cancer, are typically intricate, with significant variations among patients’ conditions. When doctors respond, they must analyze specific issues and offer personalized solutions tailored to each patient's condition. Conversely, for low-risk diseases like colds, symptoms are generally similar, and standardized treatments suffice in most cases. In such scenarios, doctors can efficiently utilize existing templates to respond, saving time and addressing patients’ concerns simultaneously. However, patients with high-risk diseases expect personalized responses, desiring doctors to provide tailored answers specific to their conditions rather than following a standard format similar to others. Based on the aforementioned discussion, it can be inferred that despite significantly improving overall information quality, the positive impact of normalization may be diminished in terms of communication performance, particularly for patients facing high-risk diseases. In other words, disease risk may negatively moderate the relationship between normalization and patients’ behaviors. Thus, we present the following hypotheses:
Based on the above analysis and hypotheses, our research model can be formed as shown in Figure 1.
4. Data and variables
4.1 Data collection
To test our hypotheses, we collected data from Dingxiang Doctor (https://dxy.com/), one of the most renowned online healthcare communities in China. This platform is dedicated to providing healthcare services to the general public, including online consultations, hospital inquiries, and disease self-examinations. With over 50,000 registered professional physicians from tertiary (top-level) hospitals, it serves as a prominent platform for medical professionals. For our study, we selected physicians who registered between 2016 and 2020 on the platform. We excluded those who did not provide basic information such as their hospitals and gender, resulting in a sample of 4,758 physicians across 33 departments. We collected the latest 100 (at most) public consultation records for each physician, resulting in a total of 271,892 records after filtering out those containing voice responses, which accounted for 11.7% of the total. Additionally, we gathered corresponding physician information, including physician-generated data (e.g. clinical title, academic title, and hospital level), patient-generated data (e.g. historical consultations, patient feedback, and ratings), and system-generated data (e.g. the physician's total number of consultations, number of consultations in the last month, and response speed).
4.2 Variable construction
4.2.1 The construction of information normalization
In hospital settings, normative information often manifests in the form of quality initiatives and professional practice guidelines, such as checklists and reporting templates (Ingraham et al., 2019). Table 2 illustrates two pairs of examples contrasting responses with and without normalization, representing two common template types: checklists used before providing diagnosis and treatment plans. Both columns within each row convey the same information. Normalized responses, in the form of templates with customized modifications, exhibit significantly enhanced structure, logic, and clarity, thereby facilitating patients’ reading and comprehension. Our investigation revealed that these templates are not provided by the platform: different physicians tend to use different sentences to convey the same information, a fact also confirmed by physician users. This rules out potential interference from platform settings.
As a result, the measure of normalization involves the extraction of templates from physicians’ answer text. A template, defined as a format, mold, or pattern used as a guide to make something (https://www.techtarget.com/whatis/definition/template), exhibits a similar appearance when employed in text editing. Therefore, an essential step in template extraction is identifying responses with similar structures and writing logic. However, templates are not mere replicas of other paragraphs. The specific content (words or sentences) within the template can be altered or substituted based on specific consultation questions. Thus, exact templates cannot be obtained by simply matching identical answers. To tackle this challenge, we utilized deep learning techniques to establish relationships between physicians’ various answers. Additionally, topic-modeling methods and text-mining techniques were employed to gain deeper insights into the data.
For each physician, we measured their information normalization by calculating the average template length across their latest (up to) 100 consultations. The technical details are discussed in the following steps, and Figure 2 illustrates the overall processing flowchart.
Step 1: Answer similarity calculation
Initially, we computed the similarity between various answers provided by the same physician to identify potential templates. We utilized the pre-trained model bert_base_chinese (Sheng et al., 2021) to generate embeddings for each word, represented as a vector of 768 dimensions. Each consultation consists of several question-answer rounds, with each answer round regarded as the base unit for text similarity calculation. The vectors of all words within the same answering round were aggregated to form the semantic representation of that round, resulting in a 768-dimensional vector. Since templates are characterized by structures and formats used more than twice by the same physician, we exclusively compared answers in different consultations generated by the same physician. Consequently, we computed the cosine similarity for each vector pair of question-answer rounds for each physician. The similarity between vectors A and B was computed using Equation (1), where n = 768 and A_i represents the i-th component of A:

similarity(A, B) = (A · B) / (‖A‖ ‖B‖) = ( Σ_{i=1}^{n} A_i B_i ) / ( √(Σ_{i=1}^{n} A_i²) · √(Σ_{i=1}^{n} B_i²) )  (1)
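The pooling and similarity computation in this step can be sketched as follows (a minimal sketch using NumPy, with random stand-in vectors in place of real bert_base_chinese word embeddings; the function names are ours):

```python
import numpy as np

DIM = 768  # dimensionality of bert_base_chinese word embeddings

def round_embedding(word_vectors):
    """Aggregate the word vectors of one answer round into a single
    768-dimensional semantic representation of that round."""
    return np.sum(word_vectors, axis=0)

def cosine_similarity(a, b):
    """Equation (1): cosine similarity between two round vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in "word embeddings" for two answer rounds by the same physician.
rng = np.random.default_rng(0)
round_a = rng.normal(size=(20, DIM))  # an answer round of 20 words
round_b = rng.normal(size=(25, DIM))  # an answer round of 25 words

sim = cosine_similarity(round_embedding(round_a), round_embedding(round_b))
```

In practice the word vectors would come from the pre-trained model rather than a random generator, and the pairwise comparison would run over every pair of rounds from different consultations of the same physician.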
Step 2: Template candidate generation
Next, we identified those rounds with cosine similarity greater than a threshold θ and a length longer than δ (we set θ = 0.95 and δ = 13) as template candidates. To determine the value of θ, we followed a structured process. Initially, we manually labeled 1,000 pairs of rounds as ground-truth templates. Then we sorted the corresponding 1,000 cosine similarity values between question-answer pairs (derived from Step 1) in ascending order. The lowest quartile of the similarity values was selected as the value of θ, which is 0.95. The value of δ followed the HowNet (https://www.cnki.net) duplicate checking standard, which defines 13 consecutive similar words as duplicates and counts them toward the repetition rate.
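The threshold selection and candidate filtering in this step can be sketched as follows (function names and toy similarity values are ours, not the study's data):

```python
def pick_theta(labeled_similarities):
    """Choose the similarity threshold as the lowest quartile (25th
    percentile) of similarities from manually labeled template pairs."""
    values = sorted(labeled_similarities)
    return values[len(values) // 4]

def template_candidates(round_pairs, theta, delta=13):
    """Keep pairs of answer rounds (given as token lists plus their cosine
    similarity) whose similarity exceeds theta and whose length exceeds
    delta tokens, following the HowNet duplicate-checking length."""
    return [(a, b) for a, b, sim in round_pairs
            if sim > theta and len(a) > delta and len(b) > delta]

# Toy labeled similarities standing in for the 1,000 ground-truth pairs.
labeled = [0.90, 0.92, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00]
theta = pick_theta(labeled)  # lowest quartile of the sorted values
```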
Step 3: Template refinement
The identified templates fell into two categories: medical-related and medical-unrelated (examples can be seen in Table 3). The latter is primarily used by physicians to express politeness and kindness, which deviates from our definition of templates as a dimension of information quality. To filter out these unrelated templates, we employed Latent Dirichlet Allocation (LDA) (Blei et al., 2003) for clustering. Initially, we conducted word segmentation on the template candidates using the Jieba tool in Python. Stop-words were removed before applying topic modeling. In model training, we set the number of topics as 2, α as 0.1, and β as 0.01. After 1,000 iterations, we successfully classified the templates into two categories, as shown in Table 3. Medical and non-medical words have been effectively separated into two topics. Templates labeled as “medical” are the focus of our study and serve as input in the next step.
Step 4: Information normalization calculation
The measure of information normalization was constructed based on the templates generated in the third step. For each consultation, the text length of templates was divided by the text length of the doctor’s answers to obtain the template usage ratio at the consultation level. These ratios were then aggregated at the physician level by averaging across all consultations of the physician, resulting in our final variable: template length ratio. In short, information normalization is defined as the proportion of template length to the total answer length per consultation on average for each physician.
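The aggregation in this step can be sketched in a few lines (a minimal sketch with our own function names; lengths are character counts of the template and answer texts):

```python
def template_ratio(template_texts, answer_texts):
    """Template usage ratio for one consultation: total template length
    divided by the total length of the doctor's answers."""
    template_len = sum(len(t) for t in template_texts)
    answer_len = sum(len(a) for a in answer_texts)
    return template_len / answer_len if answer_len else 0.0

def information_normalization(consultations):
    """Physician-level information normalization: the template usage
    ratio averaged across all of the physician's consultations;
    `consultations` is a list of (templates, answers) pairs."""
    ratios = [template_ratio(t, a) for t, a in consultations]
    return sum(ratios) / len(ratios) if ratios else 0.0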
4.2.2 Dependent and independent variables
Table 4 summarizes the primary variables employed in the study along with their corresponding measures. These variables were aggregated at the physician level. Patient selection of physicians was gauged by the count of consultations in the most recent month, following established literature (Chen et al., 2022; Gong et al., 2021). Online consultation experience was quantified as the total consultations minus those in the latest month, addressing concerns related to variable overlap. Patient satisfaction was represented as a binary variable due to the distribution characteristics and constraints of the data. This decision was motivated by two factors. Firstly, physicians’ ratings often exhibit polarization. While the rating scale theoretically ranges from 0 to 5, ratings strictly between 0 and 5 are infrequent (3.78%). It appears that only highly satisfied or highly dissatisfied patients tend to provide ratings to express their extreme sentiments. Secondly, physicians with no reviews receive a rating of 0 from the platform. Consequently, treating ratings as a continuous or ordinal variable would be unreasonable. Therefore, patient satisfaction was constructed as a binary variable to yield more accurate results through the utilization of logit regressions.
To explore and account for additional dimensions of information quality, four variables (timeliness, completeness, depth, and relevance) were incorporated. Timeliness was defined by the healthcare platform, which employs a seven-minute threshold to differentiate between prompt and non-prompt responses. Physicians with an average response time of less than seven minutes were categorized as “quick response” on their respective homepages. Information completeness was assessed based on text length, a metric commonly employed in prior studies (Blei et al., 2003; Blumenstock, 2008; Slevin et al., 1996; Yang et al., 2019). Depth was assessed by the number of communication rounds during the consultation. Regarding relevance, we computed the cosine similarity between the question text embedding and answer text embedding using Word2Vec (https://www.tensorflow.org/text/tutorials/word2vec) for each consultation. Subsequently, we averaged the similarities across all consultations conducted by the same physician to gauge the relevance of the information provided in the physicians’ responses to the questions posed.
The Pearson correlation matrix of main variables is presented in Table 5, excluding dummy variables of the clinical level, city level, and department due to their excessive number.
4.2.3 Moderating and control variables
As outlined in the preceding section, disease risk may serve as a negative moderator between information normalization and patient decisions, viewed through the prisms of emotional support and personalization needs. Hence, disease risk was selected as the moderating variable. Mortality rate served as the proxy for disease risk, obtained from the China Health Statistical Yearbook in 2020 (Commission, 2020). This selection aligned with our conceptual definition, as higher mortality rates typically correlate with increased pressure, fear, and anxiety (Yang et al., 2015). Diseases with high mortality rates include, for example, malignant tumors, heart disease, cerebrovascular disease, and coronary heart disease. Using this information, we identified physicians specializing in high-risk diseases based on their listed areas of expertise (the “expert in” field on their homepages). To achieve this, we calculated the average mortality rate for each physician across their areas of expertise. Based on this average rate, we categorized doctors into two groups: those specializing in high-risk diseases and those specializing in low-risk diseases. This categorization resulted in a binary variable for disease risk, with a value of 1 indicating specialization in high-risk diseases and 0 indicating expertise in other diseases.
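The risk categorization can be sketched as follows (a minimal sketch in which the mortality figures, cutoff, and disease names are illustrative stand-ins, not the yearbook's actual values or the study's cutoff):

```python
# Hypothetical mortality rates (deaths per 100,000), standing in for the
# China Health Statistical Yearbook figures.
MORTALITY = {
    "malignant tumor": 160.0,
    "heart disease": 155.0,
    "cold": 0.1,
    "dermatitis": 0.05,
}

def disease_risk(expertise, cutoff=50.0):
    """Binary disease-risk variable for one physician: 1 if the average
    mortality rate across their listed areas of expertise exceeds the
    cutoff, else 0."""
    rates = [MORTALITY[d] for d in expertise if d in MORTALITY]
    if not rates:
        return 0
    return 1 if sum(rates) / len(rates) > cutoff else 0
```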
Other physician quality indicators were included as control variables, encompassing professionalism, academic level, and hospital reputation. Professionalism was gauged by clinical title, academic level by academic qualification, and hospital reputation by hospital ranking, which can be referred to in Table 4. We opted for hospital ranking instead of hospital level, as the majority of physicians (93%) were affiliated with 3A hospitals, which did not provide a distinguishing feature. Additionally, basic physician information such as consultation fees, gender, and city level were controlled. Lastly, physicians’ departments were included as fixed effects. A natural logarithm transformation was applied to absolute quantity features, including consultation number, text length, and consultation fee, to mitigate variable fluctuation.
5. Results
5.1 Hypothesis testing
To test the hypotheses proposed above, we formulated three equations. Equation (2) investigated the impact of online consultation experience on the degree of information normalization for physician i and was expressed as follows:

Normalization_i = β₀ + β₁ Experience_i + γ Controls_i + ε_i  (2)
Equation (3) and Equation (4) were formulated to assess the influence of information normalization on patient decision-making, encompassing patients’ physician selection behavior and patient satisfaction, respectively. To investigate the moderating effect of disease risk, we introduced the interaction term of disease risk and normalization. The equations are structured as follows:

Selection_i = β₀ + β₁ Normalization_i + β₂ Risk_i + β₃ Normalization_i × Risk_i + γ Controls_i + ε_i  (3)

logit(P(Satisfaction_i = 1)) = β₀ + β₁ Normalization_i + β₂ Risk_i + β₃ Normalization_i × Risk_i + γ Controls_i  (4)
We employed a step-by-step approach to add control variables and test our hypotheses. Initially, only independent variables and physician control variables were included to examine the direct correlation between normalization and patient decision-making. Subsequently, variables representing traditional information quality dimensions were added. Finally, an interaction term was introduced to evaluate the moderating role of disease risk. The estimation results are outlined in Table 6. The dependent variable is information normalization for Model (1), patients’ physician selection for Models (2) to (4), and patient satisfaction for Models (5) to (7).
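The moderation logic tested here (a main effect attenuated by an interaction term) can be illustrated with a generic least-squares sketch on synthetic data; the variable names and coefficients below are ours, not the study's estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
normalization = rng.uniform(0, 1, n)       # template length ratio
risk = rng.integers(0, 2, n).astype(float)  # binary disease risk

# Synthetic outcome with a positive main effect of normalization and a
# negative normalization x risk interaction, mirroring the hypothesized
# pattern (all coefficients are arbitrary illustrative values).
selection = (2.0 * normalization - 1.5 * normalization * risk
             + 0.3 * risk + rng.normal(0, 0.1, n))

# Design matrix: intercept, main effects, interaction term.
X = np.column_stack([np.ones(n), normalization, risk, normalization * risk])
beta, *_ = np.linalg.lstsq(X, selection, rcond=None)
# beta[1] recovers the positive main effect; beta[3] the negative moderation.
```

A negative, significant coefficient on the interaction term is what Models (4) and (7) in Table 6 report for disease risk.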
In Model (1), the findings revealed a positive association between physicians’ online consultation experience and their level of information normalization (β = 0.037, p < 0.001), thus supporting H1. Model (2) and Model (3) investigated the relationship between information normalization and physician selection. The results affirmed the beneficial impact of information normalization on patients’ choice even after adjusting for physician quality (β = 2.531, p < 0.001) and other dimensions of information quality (β = 2.209, p < 0.001), thereby supporting H2(a). The coefficient in Model (3) suggested that if the template ratio increased by 0.01 unit, the patient selection willingness (the total number of physician consultations) increased on average by 2.209%, which is an appreciable improvement.
Model (5) and Model (6) examined the relationship between information normalization and patient satisfaction, yielding significant results akin to those observed for physician selection, thus corroborating H2(b). The findings indicated a significant positive correlation between information normalization and whether the physician received an average rating of five stars. Overall, patient decision-making, encompassing both physician selection and satisfaction, was positively associated with physicians’ information normalization.
The results concerning traditional dimensions of information quality align with previous studies. As depicted in Model (3) and Model (6), all dimensions of information quality exhibit a significant positive effect, except for completeness. The result suggests that greater text length does not necessarily translate to increased satisfaction or willingness to select a physician. Among the dimensions, timeliness demonstrated the highest significance level (β = 1.420, p < 0.001; β = 1.140, p < 0.001), followed by depth (β = 0.063, p < 0.01; β = 0.102, p < 0.01) and relevance (β = 1.163, p < 0.05; β = 1.931, p < 0.05), in descending order. The moderating effect of emotional support needs was examined in Model (4) and Model (7), revealing that disease risk negatively moderated the relationship between normalization and both physician selection (β = −1.858, p < 0.001) and patient satisfaction (β = −1.846, p < 0.001). This finding supports H3(a) and H3(b), suggesting that the positive impact of normalization on patient decision-making diminishes when patients are afflicted with high-risk diseases.
These results confirmed the significant impact of information quality, including our newly proposed dimension of information normalization, on the communication dynamics between physicians and patients in healthcare consultation. However, the influence of each dimension varied. Physicians need to be mindful of these variations when providing services in online communities. Additionally, tailoring services to meet the needs of patients with varying levels of disease risk is crucial.
5.2 Robustness check
To validate the robustness of our proposal, we employed three alternative measures for information normalization. The explanations and technical details of each measure are outlined below.
Template length. In our primary analysis, we utilized the template length ratio as the measure for information normalization. However, recognizing the significance of absolute length, especially considering the control for completeness, we also assessed the absolute length of templates. To compute template length, we averaged the lengths of templates across all consultations conducted by each physician. Subsequently, we applied a logarithmic transformation to mitigate the scale effect.
Count of template phrases and its ratio. Many medical concepts are expressed through sequences of words, or phrases, to convey integrated meanings. To capture this aspect, we measured information normalization using the count of phrases rather than individual words. We collected medical phrases from prominent Chinese medical encyclopedia websites, such as A+ hospital (http://www.a-hospital.com/), and various medical thesauri, such as Sogou cell thesaurus (https://pinyin.sogou.com/dict/), resulting in a comprehensive biomedical phrase set comprising over 190,000 entries. These phrases were integrated into the Jieba thesaurus for word segmentation. Following the same procedure as for template length, we computed the count of template phrases and its ratio for each physician.
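The phrase-based alternative measure can be sketched as follows (a minimal sketch in which the phrase set and texts are toy English stand-ins for the 190,000-entry Chinese biomedical phrase set and Jieba-segmented answers):

```python
# Toy phrase dictionary standing in for the biomedical phrase set compiled
# from A+ hospital and the Sogou cell thesaurus.
MEDICAL_PHRASES = {"blood pressure", "heart rate", "side effect"}

def count_template_phrases(text, phrases=MEDICAL_PHRASES):
    """Count occurrences of known medical phrases in a piece of text."""
    return sum(text.count(p) for p in phrases)

def phrase_ratio(template_text, answer_text, phrases=MEDICAL_PHRASES):
    """Phrase-based normalization ratio for one consultation: template
    phrase count divided by the answer's total phrase count."""
    total = count_template_phrases(answer_text, phrases)
    if total == 0:
        return 0.0
    return count_template_phrases(template_text, phrases) / total
```

As with the template length ratio, these per-consultation values would then be averaged across all consultations of each physician.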
The estimation results of three alternatives are presented in Table 7 to Table 9. Table 7 employed the absolute number of template words per consultation as the proxy for information normalization, while Table 8 used the count of template phrases. Additionally, we recalculated the “ratio” using the number of phrases and tested it in Table 9.
We can readily observe that the results of the three alternatives are consistent with our primary study. In both Tables 7 and 8, the coefficient of information normalization in Models (2) to (7) can be interpreted in terms of elasticity, as the logarithm of both dependent and independent variables was utilized. Taking Table 7 as an example, the results suggested that patient selection willingness could increase by up to 0.153% (Model (3)) with a 1% increase in template length per consultation. These findings affirmed the robustness of our conclusions. Furthermore, we noted that the coefficients of Completeness were significantly negative in Tables 7 and 8, contrasting with the results in Tables 6 and 9. This discrepancy may stem from the correlations between measures of completeness and normalization, as all of these measures utilized the length of text.
6. Summary and discussion
6.1 Key findings
In this paper, we introduced a new dimension, namely, information normalization, into the original information quality model in response to the call for efficient communication in digitization. We conducted an empirical study in the scenario of online healthcare consultation to test our proposal. The analysis results supported our argument that information normalization plays a significant role in information quality, at least in the scenario of online communication, which is an irreversible trend in the era of digitization. The findings align with previous literature, which identified information format, a concept closely related to and foundational for our proposed information normalization, as a significant factor in evaluating information quality (Bailey and Pearson, 1983).
Our findings underscore the significance of information expression and presentation in today’s digital age. In an era marked by information overload, the online landscape is saturated with a myriad of complex and visually stimulating content. Within this fast-paced society, individuals often struggle to allocate time for careful contemplation of intricate information, such as communication text. In such circumstances, an effective presentation style becomes paramount in enhancing reading efficiency. Normalized expression, logical presentation, and clear writing layout all contribute to an enhanced reading experience and increased efficiency in information transmission. It holds especially true in specialized fields like healthcare, where patients often lack professional knowledge, making it more challenging to comprehend medical information. In such cases, a normalized approach to presenting information becomes even more essential, as it reduces readers’ difficulty in processing information. By providing a structured and easily understandable format, normalized expression enables individuals to grasp the meaning of the information more quickly and directly. This approach alleviates the need to decipher intricate expressions, thereby minimizing energy expenditure and reducing the risk of potential misunderstandings.
Specifically, our study yielded four significant findings regarding the impact of the information quality matrix on online healthcare communication and final decision-making. Firstly, we found a positive correlation between the normalization level of physicians’ answer information and their online consultation experience. It suggests that normalization can be acquired and improved through practical experience, akin to a form of experiential learning. This finding provides valuable insights for individuals looking to enhance their information normalization skills for more effective communication.
On the one hand, information normalization can be honed and refined through ongoing communication, akin to the learning process of AI or machines. AI exhibits proficiency in recognizing patterns and logic. ChatGPT, for instance, possesses the ability to condense, restructure, and streamline complex information through extensive training. Similarly, physicians’ information normalization skills can also be bolstered through accumulated experience in real-world practice. Furthermore, with the rapid advancement of AI technologies, they may play a crucial role in facilitating this process in the future.
On the other hand, heightened information normalization signifies an enhancement in information quality. As information providers can elevate the level of normalization through consistent practice and training, individuals can similarly enhance information quality through dedicated training efforts. This observation provides valuable insights for research in the realm of information quality, prompting inquiries into how to elevate information quality and the feasibility of achieving such improvements through personal training and education.
Secondly, patients’ satisfaction and their inclination to select physicians are positively associated with the information normalization of physicians’ responses. It suggests that the normalization of health information aids in generating clearer, more professional, readable, and comprehensive content. This finding aligns with prior research on medical templates, which has demonstrated that normalized templates can lead to more comprehensive and consistent reports, as well as clearer and more reliable communication (Flusberg et al., 2017; Khurana et al., 2020).
In the healthcare domain, the impacts are even more pronounced. First, normalized responses are systematically organized, logically structured, and clear, thereby facilitating easy comprehension for patients and reducing the complexity of information processing. Second, the use of normalized expressions, such as standardized medical terminology instead of abbreviations or colloquial terms, helps mitigate sentence ambiguity, reducing misunderstandings among patients and preventing unnecessary complications. Third, information normalization ensures that doctors adhere to established medical norms and procedures, prompting them to consider exceptional situations that may be overlooked, thereby reducing the occurrence of medical accidents such as misdiagnosis. Fourth, non-professional patients often struggle to comprehend complex medical information, highlighting the importance of clear and coherent responses to enhance overall communication efficiency. Fifth, online environments, such as online healthcare communities, accentuate the role of information normalization. In these settings, patients interact with doctors in chatboxes, without the benefit of perceiving doctors’ body language, which is a conventional means of conveying information. Consequently, there is an increased demand for readability and understandability in responses, which can be achieved through information normalization.
Thirdly, patients’ choices of physicians and their satisfaction are positively linked to the timeliness, depth, and relevance of information provided by physicians. The correlation between timeliness and patient satisfaction aligns with previous research, indicating that patients are more likely to feel satisfied when they receive a prompt response, thereby enhancing their motivation to utilize e-consultation services (Nijland et al., 2009). In the realm of online healthcare, quick responses not only give patients a sense of real-time communication that bridges the virtual gap, but also minimize waiting time and help patients resolve issues more efficiently.
The discovery regarding depth aligns with prior literature, which suggests that depth measures the actual utilization of information systems and determines the success of information systems (Zheng et al., 2013). Many issues necessitate multiple rounds of communication and interactions for effective resolution. For example, initial inquiries may be broad, and subsequent interactions enable patients to provide additional medical information or seek clarification on specific details from the doctor’s response. Similar to offline consultations, addressing medical concerns often entails multiple rounds of Q&A to attain a comprehensive understanding of the condition, enabling the doctor to offer accurate diagnoses and treatment plans. Increased rounds of communication signify a deeper level of engagement between doctors and patients, facilitating the resolution of detailed and personalized issues. It also underscores the doctor’s dedication, as it entails additional time and effort.
The positive contribution of relevance aligns with previous information quality models, which have highlighted relevance as a crucial aspect of contextual information quality (DeLone and McLean, 1992; Jarke and Vassiliou, 1997; Wang and Strong, 1996). The usefulness of information hinges on its ability to fulfill user needs. Providing information that is unrelated to the question would be akin to moving in the wrong direction, failing to address the issue, and ultimately falling short of enhancing information quality.
Unexpectedly, information completeness does not exert an influence on patients’ decisions, contrary to findings from previous studies on information quality (Jarke and Vassiliou, 1997; Wang and Strong, 1996). This unexpected result may stem from the correlation between information completeness and normalization. Normative answers typically contain richer content, as they are crafted with careful consideration. As a result, the impact of completeness on decision-making may be subsumed within the broader concept of normalization.
Fourthly, the findings underscore the nuanced impact of disease risk, which negatively moderates the relationship between information normalization and communication performance. It highlights the diversity among patients and underscores how individual characteristics can shape their perceptions of service quality. On the one hand, it signals a potential drawback of normalization in terms of addressing human emotions. Patients grappling with severe illnesses often experience heightened stress and anxiety and seek emotional support in addition to informational assistance. Given that normalization may convey a sense of impersonality and detachment (Han and Lee, 2022), it may not be well-received by patients facing high-risk diseases.
On the other hand, it reflects the inherent conflict between excessive information normalization and personalized patient needs. Normalized information typically adheres to a fixed response pattern, making it well-suited for less complex diseases. However, for intricate conditions such as cancer, adhering to a standardized pattern becomes challenging for doctors when delivering treatments. In such situations, the substantive content of the response can better emphasize the value of the service. Patients with high-risk diseases hope for doctors to offer valuable and customized information rather than seemingly uniform responses.
Although patients with high-risk diseases are less satisfied with normalized answers, online consultations remain a valuable option for many patients, especially during the COVID-19 epidemic. Patients with chronic diseases such as hypertension must see a physician at regular intervals; since periodic checks and prescription renewals are simple and easily completed online, consulting remotely saves the time and effort of traveling to and from the hospital. Moreover, medical resources are distributed unevenly in China. Online platforms can serve as a bridge between patients and physicians across the country, giving patients from rural areas and small cities access to physicians from first-tier cities and renowned hospitals. For these reasons, online health services remain meaningful even for patients with high-risk diseases.
6.2 Theoretical implications
The theoretical contributions of this paper are fourfold. Firstly, in response to the emerging challenges posed by digitization, we extended the conventional information quality model by incorporating a novel dimension termed information normalization. This addition represents a progression beyond established concepts like information format or layout. By encompassing this new dimension, we extended the model’s relevance from computer-centric communication to human-centered interaction, which involves more nuanced and intricate information dynamics. Such expansion enhances the model’s adaptability to meet the evolving demands of our society’s development.
Secondly, we delved into the existing literature to devise methods for quantifying information normalization. This endeavor yielded four distinct measures, thereby providing valuable insights for future research. Although the concept of information normalization has been broached previously (Bailey and Pearson, 1983), the bulk of prior studies have leaned toward qualitative analysis, primarily due to the absence of viable quantitative measures on a large scale (Bailey and Pearson, 1983; DeLone and McLean, 1992; Doll and Torkzadeh, 1988). Leveraging the medical templates employed by physicians, we formulated various metrics to dissect the structures and underlying logic embedded within physicians’ responses. Our empirical investigations subsequently underscored the efficacy of these proposed measures. The procedural guidelines delineated in this paper offer future scholars in analogous fields a valuable roadmap for conducting similar investigations.
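As a minimal sketch of the template-ratio idea behind these measures (the ratio of template length to total answer length later reported as Normalization in Table 4), assuming the templates have already been extracted and counting only exact substring matches:

```python
def normalization_ratio(answer: str, templates: list[str]) -> float:
    """Share of an answer's length covered by known template text.

    A simplified stand-in for the paper's measure: any template that
    appears verbatim in the answer contributes its full length.
    """
    if not answer:
        return 0.0
    template_chars = sum(len(t) for t in templates if t in answer)
    # Cap at 1.0 in case overlapping templates over-count characters
    return min(template_chars / len(answer), 1.0)


templates = ["Hello, thank you for your trust."]
print(normalization_ratio("Hello, thank you for your trust. Please rest.", templates))
```

In practice the paper's pipeline extracts templates with BERT and LDA rather than assuming them, and matching on real consultations would need fuzzier alignment than exact substrings.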
Thirdly, our study sheds light on the interplay between experiential learning and information quality theory. Through our empirical investigation, we examined whether information quality can be self-augmented through practical application. Our findings suggest that physicians incrementally cultivate and refine the degree of information normalization over time, as evidenced by the accumulation of consultation history. This suggests that normalization can indeed be acquired and honed through experiential learning, thereby establishing a crucial nexus between learning theory and information quality theory. This connection unveils a promising avenue for future research on how information quality evolves over time.
Lastly, our findings enrich social support theory by elucidating the intricate relationship between informational support and emotional support. While informational support tends to address rational needs, emotional support caters to perceptual demands, potentially leading to conflicts in certain scenarios. Our study unveils the nuanced interplay between these two forms of support, particularly evident in the negative moderation effect of disease risk on the positive impact of information normalization. It underscores the conflicting nature of these support types, as patients with high-risk diseases prioritize emotional support over standardized information. Our insights offer valuable contributions to understanding the intricate connections between various forms of support within social support theory.
6.3 Practical implications
Our study also offers several practical implications for physicians aiming to improve their online consultation services. Firstly, physicians must recognize the pivotal role of information quality. In online consultation, the information conveyed by physicians constitutes the sole resource available to patients. Unlike face-to-face encounters in hospitals, online interactions lack the nuances of non-verbal communication, such as body language, potentially leading to misinterpretations. Thus, enhancing information quality becomes paramount in fostering effective communication and delivering high-quality service. Specifically, physicians should focus on key dimensions of information quality, including normalization, timeliness, depth, and relevance. This entails structuring responses, providing prompt replies, engaging in frequent interactions, and tailoring answers to patients’ specific diseases and symptoms.
Secondly, physicians are encouraged to normalize their services in their daily online healthcare practices. A well-organized response can enhance the logic, accuracy, readability, and comprehensibility of information, thereby improving the overall service quality and patient satisfaction. In practical terms, physicians can consciously develop a repository of templates during their routine consultations. For instance, patients with the same disease typically share common symptoms, causes, treatments, and precautions. Physicians can pre-edit response templates based on different disease categories and then tailor them to specific illnesses during subsequent consultations. This approach not only saves physicians’ time but also ensures answer quality, as these templates are continually refined through repeated use, becoming more comprehensive and thoughtful over time.
Thirdly, physicians must be mindful of the nuanced role of information normalization. When dealing with patients facing high-risk diseases, physicians should offer additional emotional support and personalized responses to counteract potential perceptions of impersonality and indifference resulting from normalization. As previously discussed, normalization primarily addresses rational (informational) needs, and excessively standardized information may evoke negative psychological responses, particularly in patients battling severe illnesses and enduring immense psychological pressure. Such standardized responses might be perceived as mechanistic and may signal a lack of empathy on the part of the physician. In practical contexts, physicians should carefully calibrate the degree of normalization, avoiding responses that appear overly mechanical and uniform. Additionally, physicians can provide emotional support, such as words of comfort and encouragement, to mitigate the potential psychological impact of normalization.
6.4 Limitations and future research
Our study has several limitations and suggests avenues for future research. Firstly, all variables in this study are aggregated at the physician level because privacy concerns prevented access to patient-level information from the platform. Consequently, obtaining panel data at the physician-consultation level is not feasible, which poses challenges in addressing endogeneity issues. Thus, our conclusions are limited to correlational relationships rather than causal ones. Additionally, due to data constraints, we are unable to distinguish between doctors who received a 0 rating and those who did not receive a rating. As a result, we constructed patient satisfaction as a binary variable. To enable more granular research in the future, we intend to explore other platforms or collaborate with companies to access more detailed data.
Secondly, our measures of information normalization may require adjustment when applied in other domains. In our study, we utilized healthcare templates to gauge normalization. While templates are prevalent in various fields, including financial statement preparation and academic paper writing, the applicability of structured expression may vary. In industries like entertainment, where the emphasis is on capturing attention, forms of information expression continually evolve. As a result, structured expression may be less common in these domains, potentially limiting the generalizability of our model to related research.
Thirdly, our discussion in this research is confined to the information quality of the text. In addition to textual data, information can manifest in various forms, including images, videos, and more. Investigating methods to assess the quality of multimedia information, particularly in terms of normalization, presents a novel avenue for future research. Specifically, we intend to delve into the realm of short videos and live streaming for further exploration.
Furthermore, the results are grounded in data obtained from a Chinese healthcare website, potentially constraining their applicability to other contexts. Generalizing our findings to Western cultures may not be appropriate. Consequently, further research efforts should be directed toward examining the relationship between information quality and service outcomes across diverse cultural settings. Such investigations may uncover intriguing variations and nuances in the observed phenomena.
This study was funded by the National Natural Science Foundation of China (No. 72271235) and Fundamental Research Funds for the Central Universities (No. 2023110139).
Figure 1
Illustration of our research model
[Figure omitted. See PDF]
Figure 2
The flowchart of the construction of information normalization measures
[Figure omitted. See PDF]
Table 1
The academics’ and practitioners’ view of representational information quality
| View | Reference | Representational information quality |
|---|---|---|
| The academics' view | DeLone and McLean (1992) | Understandability, readability, clarity, format, appearance, conciseness, uniqueness, comparability |
| | Goodhue (1995) | Compatibility, meaning, presentation, lack of confusion |
| | Jarke and Vassiliou (1997) | Interpretability, syntax, version control, semantics, aliases, origin |
| | Montazemi and Wang (1988) | Format (graphic, tabular, or textual), color |
| | Wang and Strong (1996) | Understandability, interpretability, concise representation, consistent representation, accessibility, readable, reasonable |
| | Wand and Wang (1996) | Meaningfulness |
| | Zmud (1978) | Arrangement, readable, reasonable |
| The practitioners’ view | Cykana et al. (1996) | Uniqueness |
| | Gardyn (1997) | Consistency |
| | Redman (2001) | Clarity of definition, precision of domains, naturalness, homogeneity, identifiability, minimum unnecessary redundancy, semantic consistency, structural consistency, appropriate representation, interpretability, portability, format precision, format flexibility, ability to represent null values, efficient use of storage, representation consistency, obtainability, flexibility, robustness, metadata characteristics |
Source(s): Authors’ own work
Table 2
Two examples comparing un-normalized and normalized responses with the same meaning
| No. | Un-normalized responses | Normalized responses |
|---|---|---|
| 1 | “Are there any symptoms such as nasal congestion, acid regurgitation, fever, earache, hiccups, or swallowing obstruction? Are there any predisposing factors or similar situations before?” | “Hello, thank you for your trust. I need to get more information to make appropriate suggestions. Please give answers to the following questions according to your actual situation |
| 2 | “Hello, your disease is respiratory tract infection. To deal with it, you can take moxifloxacin orally for seven days according to the instructions. Xidi Iodine Buccal Tablets and ambroxol tablets are also needed: one tablet at a time, three times a day. You can use dextromethorphan to relieve cough as well. Don't eat raw, cold, spicy, and stimulating foods. You should eat more fresh fruits and vegetables, and drink more water. Maintain indoor ventilation and normal humidity. Don’t smoke and drink. Get enough sleep. Don’t fatigue and stay up late. If you do not get better after a week, you should go to see the physician in the hospital.” | “Hello, thank you for your patience. First of all, based on your medical history, we can exclude novel coronavirus infection since you do not have an epidemiological history of COVID-19. Then, according to your current symptoms, your disease can be diagnosed as a respiratory tract infection. Based on this, we give the following guidance |
Source(s): Authors’ own work
Table 3
Text mining for two types of templates
| Topic | Topic words | Answer examples |
|---|---|---|
| Nonmedical | hello, thank you, edit, reply, in detail, wait a moment, contact, question, busy, trust, recognition, glad to, serve, wishes, better soon etc. | “First of all, thank you for your trust. I have received your question. I am editing the reply. It will take a while to type on my mobile phone. Please wait. I will reply to you in detail later. You don’t need to reply to this message because it will waste an opportunity for you to ask questions. I will take the initiative to contact you later.” |
| Medical | symptoms, infection, pain, cough, muscle, skin, inflammation, acne, cervical, thyroid, vitamin, antibiotic, drug, rest, sleep, fruits, vegetables, beans, etc. | “Relevant conditioning and improvement can be carried out through the following points |
Source(s): Authors’ own work
Table 4
Description of variables and measures in our model
| Variable | Measure | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Physician Selection (PhS) | The number of consultations last month of the physician (logarithm) | 1.390 | 1.600 | 0.000 | 7.330 |
| Patient Satisfaction (PaS) | Whether the physician has an average rating of five stars | – | – | 0.000 | 1.000 |
| Online Consultation Experience (Exp) | The total number of consultations of the physician one month before on the platform (logarithm) | 5.490 | 1.620 | 1.100 | 10.430 |
| Normalization (Nor) | The ratio of template length to the total answer length of the physician | 0.100 | 0.130 | 0.000 | 0.770 |
| Timeliness (Tim) | Whether the physician responds within seven minutes on average | – | – | 0.000 | 1.000 |
| Completeness (Com) | The average answer length of the physician (logarithm) | 5.960 | 0.610 | 3.450 | 8.520 |
| Depth (Dep) | The average interaction frequency of the physician | 5.230 | 1.180 | 2.000 | 13.000 |
| Relevance (Rel) | The cosine similarity between question and answer text | 0.144 | 0.058 | 0.000 | 1.000 |
| Disease risk (DiR) | Whether the physician specializes in high-risk diseases (1 for high risk and 0 otherwise) | – | – | 0.000 | 1.000 |
| Clinical Title1 (ClT1) | Whether the physician is a chief physician | – | – | 0.000 | 1.000 |
| Clinical Title2 (ClT2) | Whether the physician is an associate chief physician | – | – | 0.000 | 1.000 |
| Clinical Title3 (ClT3) | Whether the physician is an attending physician | – | – | 0.000 | 1.000 |
| Education (Edu) | Whether the physician has obtained a medical degree | – | – | 0.000 | 1.000 |
| Hospital Ranking (HoR) | Whether the hospital of the physician ranks top100 in China | – | – | 0.000 | 1.000 |
| Consulting Fee (Fee) | The fee patients have to pay to consult the physician (logarithm) | 3.520 | 0.560 | 2.400 | 6.910 |
| Sex | The sex of the physician (1 for male and 0 for female) | – | – | 0.000 | 1.000 |
| Department (Dept, Dummy) | The department of the physician (including 33 departments) | – | – | – | – |
| City Level (CiL, Dummy) | The level of the city that the physician is located in (classified from junior to senior into 4 levels: first-tier city, new first-tier city, second-tier city, and others) | – | – | – | – |
Source(s): Authors’ own work
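Among the measures in Table 4, relevance is the cosine similarity between question and answer text. A bag-of-words sketch of that computation follows; whitespace tokenization is a simplifying placeholder for the paper's actual Chinese-text processing:

```python
import math
from collections import Counter


def cosine_similarity(question: str, answer: str) -> float:
    """Cosine similarity of two texts under a bag-of-words model."""
    va = Counter(question.lower().split())
    vb = Counter(answer.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity("fever and cough", "cough treatment"))
```

Identical texts score 1.0 and texts with no shared tokens score 0.0; richer variants would use TF-IDF weights or embedding vectors rather than raw counts.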
Table 5
Correlation matrix of variables in our model
| | 1) | 2) | 3) | 4) | 5) | 6) | 7) | 8) | 9) | 10) | 11) | 12) | 13) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1) PhS | 1.000 | ||||||||||||
| 2) PaS | 0.718 | 1.000 | |||||||||||
| 3) Exp | 0.603 | 0.439 | 1.000 | ||||||||||
| 4) Nor | 0.219 | 0.123 | 0.413 | 1.000 | |||||||||
| 5) Tim | 0.281 | 0.167 | 0.174 | 0.119 | 1.000 | ||||||||
| 6) Com | 0.185 | 0.164 | 0.279 | 0.441 | 0.108 | 1.000 | |||||||
| 7) Dep | 0.142 | 0.122 | 0.018 | 0.131 | 0.132 | 0.433 | 1.000 | ||||||
| 8) Rel | 0.134 | 0.156 | 0.113 | −0.002 | 0.068 | 0.418 | 0.265 | 1.000 | |||||
| 9) DiR | −0.132 | −0.054 | −0.136 | −0.085 | −0.023 | −0.070 | −0.041 | −0.042 | 1.000 | ||||
| 10) Edu | 0.175 | 0.176 | 0.146 | 0.025 | 0.036 | 0.061 | 0.056 | 0.075 | 0.054 | 1.000 | |||
| 11) HoR | 0.076 | 0.110 | 0.086 | −0.081 | −0.040 | −0.029 | −0.045 | 0.076 | 0.033 | 0.310 | 1.000 | ||
| 12) Fee | 0.196 | 0.182 | 0.333 | 0.103 | 0.073 | 0.283 | 0.144 | 0.332 | −0.078 | 0.192 | 0.197 | 1.000 | |
| 13) Sex | −0.111 | −0.094 | −0.046 | 0.021 | −0.033 | −0.091 | −0.052 | −0.210 | 0.150 | −0.046 | −0.016 | −0.060 | 1.000 |
Note(s): Clinical title, city level, and department are excluded (too many dummy variables)
Source(s): Authors’ own work
Table 6
Estimation results
| Dependent variable | Nor (OLS) | PhS (OLS) | PaS (Logit) | ||||
|---|---|---|---|---|---|---|---|
| Model | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
| Experience | 0.037*** | ||||||
| (0.001) | |||||||
| Normalization | 2.531*** | 2.209*** | 2.784*** | 2.196*** | 1.691*** | 2.249*** | |
| (13.69) | (11.27) | (12.07) | (8.77) | (5.83) | (6.57) | ||
| Timeliness | 1.420*** | 1.409*** | 1.140*** | 1.132*** | |||
| (16.82) | (16.70) | (9.29) | (9.18) | ||||
| Completeness | −0.031 | −0.027 | 0.105 | 0.111 | |||
| (−0.62) | (−0.55) | (1.30) | (1.36) | ||||
| Depth | 0.063** | 0.067** | 0.102** | 0.107*** | |||
| (3.04) | (3.27) | (3.19) | (3.33) | ||||
| Relevance | 1.163* | 1.137* | 1.931* | 1.911* | |||
| (2.36) | (2.31) | (2.31) | (2.29) | ||||
| Nor × DiR | −1.858*** | −1.846*** |
| (−5.23) | (−3.38) | ||||||
| DiR | −0.038** | 1.090*** | 1.024*** | 1.120*** | 1.195*** | 1.134** | 1.233*** |
| (0.012) | (7.12) | (6.56) | (7.12) | (3.40) | (3.14) | (3.38) | |
| ClT1 | −0.049*** | 0.296** | 0.295** | 0.293** | 0.349* | 0.383* | 0.382* |
| (0.008) | (2.68) | (2.72) | (2.72) | (1.97) | (2.15) | (2.15) | |
| ClT2 | −0.021** | 0.218** | 0.228** | 0.221** | 0.214 | 0.253 | 0.245 |
| (0.007) | (2.69) | (2.87) | (2.79) | (1.61) | (1.88) | (1.82) | |
| ClT3 | −0.008 | 0.057 | 0.046 | 0.043 | 0.127 | 0.120 | 0.117 |
| (0.006) | (0.79) | (0.66) | (0.62) | (1.05) | (0.98) | (0.96) | |
| Edu | 0.003 | 0.491*** | 0.442*** | 0.434*** | 0.622*** | 0.587*** | 0.580*** |
| (0.005) | (8.26) | (7.73) | (7.61) | (7.38) | (6.80) | (6.71) | |
| HoR | −0.027*** | 0.114* | 0.169** | 0.173** | 0.203* | 0.277** | 0.281** |
| (0.004) | (2.04) | (3.08) | (3.16) | (2.43) | (3.23) | (3.27) | |
| Fee | 0.005 | 0.084 | −0.002 | 0.000 | 0.234*** | 0.086 | 0.087 |
| (0.004) | (1.94) | (−0.04) | (0.00) | (3.55) | (1.24) | (1.25) | |
| Sex | 0.003 | −0.110* | −0.072 | −0.077 | −0.238** | −0.193** | −0.199** |
| (0.004) | (−2.42) | (−1.65) | (−1.78) | (−3.26) | (−2.60) | (−2.67) | |
| CiL (dummy) | YES | YES | YES | YES | YES | YES | YES |
| Dept (dummy) | YES | YES | YES | YES | YES | YES | YES |
| Constant | −0.107*** | −0.125 | −0.200 | −0.263 | −2.805*** | −3.803*** | −3.887*** |
| (0.016) | (−0.69) | (−0.69) | (−0.92) | (−6.99) | (−7.01) | (−7.12) | |
| Adj./Pseudo R2 | 0.251 | 0.208 | 0.269 | 0.274 | 0.0112 | 0.117 | 0.119 |
| VIF | 3.00 | 2.92 | 2.92 | 3.00 | 2.92 | 2.92 | |
Note(s): N = 4,758 observations; parentheses report robust standard errors for model (1) and t-statistics for models (2)–(7); ***p < 0.001, **p < 0.01, *p < 0.05
Source(s): Authors’ own work
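Models (4) and (7) test moderation by entering the interaction term Nor × DiR alongside the main effects. As a minimal sketch of this specification on synthetic data (the data-generating coefficients below are illustrative placeholders loosely echoing the signs in the table, not the actual estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

nor = rng.uniform(0.0, 0.8, n)   # hypothetical normalization ratio
dis = rng.integers(0, 2, n)      # disease-risk dummy (1 = high risk)

# Data-generating process with a negative interaction term
y = 2.2 * nor + 1.1 * dis - 1.8 * nor * dis + rng.normal(0.0, 0.3, n)

# OLS with main effects and the interaction
X = np.column_stack([np.ones(n), nor, dis, nor * dis])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [intercept, Nor, DiR, Nor x DiR]
```

A negative estimate on the interaction column reproduces the qualitative pattern: the positive effect of normalization is attenuated when disease risk is high.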
Table 7
Results of robustness check I (template word count as a measure for normalization)
| Dependent variable | Nor (OLS) | PhS (OLS) | PaS (Logit) | ||||
|---|---|---|---|---|---|---|---|
| Model | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
| Experience | 0.614*** | ||||||
| (0.017) | |||||||
| Normalization | 0.162*** | 0.153*** | 0.195*** | 0.174*** | 0.149*** | 0.194*** | |
| (0.0110) | (0.0127) | (0.0152) | (0.0167) | (0.0217) | (0.0254) | ||
| Timeliness | 1.437*** | 1.423*** | 1.144*** | 1.133*** | |||
| (0.0847) | (0.0845) | (0.123) | (0.124) | ||||
| Completeness | −0.117** | −0.117** | −0.0386 | −0.0377 | |||
| (0.0529) | (0.0529) | (0.0898) | (0.0902) | ||||
| Depth | 0.0643*** | 0.0688*** | 0.107*** | 0.112*** | |||
| (0.0206) | (0.0204) | (0.0323) | (0.0323) | ||||
| Relevance | 0.765 | 0.731 | 1.826** | 1.815** | |||
| (0.487) | (0.488) | (0.828) | (0.833) | ||||
| Nor × DiR | −0.119*** | −0.127*** | |||||
| (0.0208) | (0.0341) | ||||||
| Physician controls | YES | YES | YES | YES | YES | YES | YES |
| Constant | −1.329*** | −0.194 | 0.176 | 0.0626 | −2.909*** | −3.203*** | −3.354*** |
| (0.283) | (0.181) | (0.301) | (0.302) | (0.400) | (0.570) | (0.574) | |
| Adj./Pseudo R2 | 0.219 | 0.206 | 0.267 | 0.272 | 0.0994 | 0.119 | 0.121 |
Note(s): N = 4,758 observations, robust standard errors in parentheses, ***p < 0.001, **p < 0.01, *p < 0.05
Source(s): Authors’ own work
Table 8
Results of robustness check II (template phrase count as a measure for normalization)
| Dependent variable | Nor (OLS) | PhS (OLS) | PaS (Logit) | ||||
|---|---|---|---|---|---|---|---|
| Model | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
| Experience | 0.493*** | ||||||
| (0.014) | |||||||
| Normalization | 0.202*** | 0.195*** | 0.245*** | 0.212*** | 0.180*** | 0.236*** | |
| (0.0137) | (0.0161) | (0.0190) | (0.0204) | (0.0270) | (0.0313) | ||
| Timeliness | 1.430*** | 1.416*** | 1.140*** | 1.129*** | |||
| (0.0846) | (0.0844) | (0.123) | (0.124) | ||||
| Completeness | −0.141*** | −0.139*** | −0.0448 | −0.0420 | |||
| (0.0535) | (0.0535) | (0.0910) | (0.0913) | ||||
| Depth | 0.0638*** | 0.0685*** | 0.106*** | 0.112*** | |||
| (0.0206) | (0.0204) | (0.0322) | (0.0322) | ||||
| Relevance | 0.829* | 0.790 | 1.843** | 1.824** | |||
| (0.488) | (0.490) | (0.829) | (0.834) | ||||
| Nor × DiR | −0.145*** | −0.162*** | |||||
| (0.0260) | (0.0419) | ||||||
| Physician controls | YES | YES | YES | YES | YES | YES | YES |
| Constant | −1.294*** | −0.149 | 0.330 | 0.223 | −2.854*** | −3.122*** | −3.273*** |
| (0.227) | (0.180) | (0.305) | (0.305) | (0.400) | (0.576) | (0.581) | |
| Adj./Pseudo R2 | 0.287 | 0.207 | 0.268 | 0.273 | 0.0993 | 0.119 | 0.121 |
Note(s): N = 4,758 observations, robust standard errors in parentheses, ***p < 0.001, **p < 0.01, *p < 0.05
Source(s): Authors’ own work
Table 9
Results of robustness check III (template phrase ratio as a measure for normalization)
| Dependent variable | Nor (OLS) | PhS (OLS) | PaS (Logit) | ||||
|---|---|---|---|---|---|---|---|
| Model | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
| Experience | 0.050*** | ||||||
| (0.002) | |||||||
| Normalization | 1.932*** | 1.706*** | 2.130*** | 1.687*** | 1.067*** | 1.745*** | |
| (0.137) | (0.146) | (0.171) | (0.187) | (0.198) | (0.255) | ||
| Timeliness | 1.415*** | 1.404*** | 1.090*** | 1.128*** | |||
| (0.0844) | (0.0842) | (0.114) | (0.123) | ||||
| Completeness | −0.0439 | −0.0397 | 0.115 | 0.0976 | |||
| (0.0497) | (0.0495) | (0.0730) | (0.0815) | ||||
| Depth | 0.0655*** | 0.0696*** | 0.0798*** | 0.109*** | |||
| (0.0206) | (0.0205) | (0.0295) | (0.0321) | ||||
| Relevance | 1.187** | 1.163** | 4.209*** | 1.951** | |||
| (0.492) | (0.491) | (0.704) | (0.835) | ||||
| Nor × DiR | −1.376*** | −1.393*** | |||||
| (0.266) | (0.407) | ||||||
| Physician controls | YES | YES | YES | YES | YES | YES | YES |
| Constant | −0.148*** | −0.119 | −0.143 | −0.209 | −2.801*** | −2.486*** | −3.834*** |
| (0.021) | (0.182) | (0.289) | (0.288) | (0.401) | (0.355) | (0.547) | |
| Adj./Pseudo R2 | 0.260 | 0.210 | 0.271 | 0.275 | 0.0949 | 0.0468 | 0.119 |
Note(s): N = 4,758 observations, robust standard errors in parentheses, ***p < 0.001, **p < 0.01, *p < 0.05
Source(s): Authors’ own work
© Emerald Publishing Limited.
