In the data economy, organizations are increasingly engaging in data ecosystems, managing and sharing data as valuable assets to address business and societal challenges. Whereas organizations have traditionally exchanged data vertically along the value chain, we observe a growing trend where they share complementary data assets with others, even at times with their competitors. Despite the rising significance of data ecosystems, research on this emerging form of horizontal data sharing is still scarce. Building on the theory of communities of practice, we study a pioneer data sharing community involving over 40 multinational companies that developed an innovative approach to pool data management efforts. We derive a design theory for horizontal data sharing, structured around eight design principles that align with key dimensions grounded in theory: domain of interest, shared practice, and community. By offering prescriptive design knowledge, our findings make an important contribution to the emerging literature on cross-company data sharing. Our research also provides practitioners with actionable insights on how to establish and operate data sharing communities effectively.
Introduction
The data economy implies important changes for organizations: data evolve from being “by-products” of economic activity to strategic assets that have to be managed accordingly (Loebbecke & Picot, 2015; Otto & Aier, 2013), and businesses increasingly engage in data sharing with external parties for economic and societal gains (Aaltonen et al., 2023), fostering the development of data ecosystems (Oliveira et al., 2019). Unlike other enterprise assets, data exhibits unique properties (Aaltonen et al., 2021), with its value increasing when shared and reused (Jarke et al., 2019; Richter & Slowinski, 2019), providing additional motivation for the formation of data ecosystems. According to industry experts, firms that “share data externally generate three times more measurable economic benefit than those who do not” (Gartner, 2021). The scope and purpose of data sharing have evolved considerably over time (Wixom et al., 2020): Whereas organizations have traditionally exchanged data vertically with established partners along the value chain (data sharing 1.0), they are currently starting to share complementary data assets with others, even at times with their competitors, in order to address business and societal challenges. This new type of horizontal data sharing (data sharing 2.0) implies a joint value proposition around shared data assets where participants seek mutual benefits (Wixom et al., 2020), such as achieving economies of scale by complementing their own data assets or pooling data management efforts (Azkan & Mã, 2022; Lefebvre et al., 2022; Oliveira et al., 2019). This form of data sharing implies closed ecosystems, similar to the gated communities and club goods concepts, which have strong boundaries, restricted access (Blakely, 2006), and whose community outcome(s) should only benefit their members (Buchanan, 1965).
Despite the growing body of knowledge on data ecosystems, the investigation of data sharing remains relatively new, and we observe three important research gaps. First, the emerging form of data ecosystems with horizontal data sharing is still in its infancy. Its nascent nature implies that both theoretical and practical frameworks for understanding and guiding horizontal data sharing are still underdeveloped. Second, existing research often focuses on the implementation of data sharing, with an emphasis on marketplaces (Abbas et al., 2024; Schweihoff et al., 2024) and dataspaces (Möller et al., 2024; Otto & Jarke, 2019). This implementation-centric view must be complemented with a practice view that reflects the active role played by data providers, data consumers, and other intermediary bodies in sustaining data sharing (Azkan & Mã, 2022; Von Scherenberg et al., 2024). Third, recent research emphasizes the emergence and formalization of data ecosystems (Aaen et al., 2022; Gelhaar & Otto, 2020) but does not explain how to sustain them over the long term. Yet, keeping an ecosystem alive is often challenging, and maintaining its joint value proposition is critical to its survival (Azkan & Mã, 2022; Oliveira et al., 2019).
In light of these gaps, we argue that the theory of communities of practice (CoP) (Nicolini et al., 2022; Wenger, 1998) offers a suitable framework for studying horizontal data sharing. CoP revolves around the idea that learning and innovation happen through collective empowerment around shared practices rather than through the unilateral transfer of tacit knowledge between community members (Nicolini et al., 2022; Wenger, 1998). Therefore, the CoP concept fits well with data ecosystems’ multilateral nature and can offer valuable insights on how to sustain balanced member participation in a community built around a shared value proposition (Oliveira et al., 2019; Otto et al., 2019). Building on recent studies (Bühler et al., 2023; Jarvenpaa & Markus, 2020), we conceptualize horizontal data sharing communities as communities of practice (CoPs) that consist of selected organizations that collaborate to create value from shared data assets. We pose the following research question: How can horizontal data sharing communities be developed and operated?
To answer this research question, we propose a design theory for horizontal data sharing communities. Our research is set within an action design research (ADR) project (Sein et al., 2011), in which we partnered with the Business Partner Data Sharing Community, a pioneer data sharing community operated by a university spin-off. To date, the community consists of more than 40 multinational companies from different industries that share their business partner master data. Reflecting on this community’s history, we extract prescriptive design knowledge in the form of a design theory to guide the development and operation of data sharing communities. As the core of the design theory, we derive eight design principles of form and function (Gregor et al., 2020), supported by justificatory knowledge, that align with CoP characteristics (Wenger et al., 2002), namely their domain of interest, their community, and shared practice. We also present an expository instantiation “for the purpose of theory representation or exposition” (Gregor & Jones, 2007) to illustrate how the design principles materialize in the different phases of the data sharing community life cycle.
Our study enriches the emerging body of knowledge on cross-company data sharing, while contributing to the broader field of data ecosystems research. Through our design theory, we provide reusable and action-oriented guidelines as “how to do” knowledge (Gregor, 2006) and offer insights and concrete guidance on how to develop and operate data sharing communities. Specifically, we propose CoPs as a relevant concept to study horizontal data sharing, emphasizing the collaborative creation of data practices by members to ensure the sustainability of these ecosystems. Beyond the mere emergence of a data sharing community symbolized by its members’ commitment around a joint value proposition, our theoretical contribution extends to examining two significant, yet overlooked, phases in its life cycle: starting up and operating. These phases, which are rich in data practices, underscore the pivotal role of an intermediary (Möller et al., 2024; Oliveira et al., 2019; Schweihoff et al., 2024) in offering various forms of support to the community and in sustaining the exchange of practices. Our study thus responds to longstanding calls for further research into methodologies for developing data ecosystems (Oliveira et al., 2019).
The remainder of the paper is structured as follows: We start with a synthesis of the relevant literature on data sharing and CoPs. Then, we outline our research methodology and research process in greater detail. Subsequently, we present our findings, i.e., the design theory. Finally, we discuss our contribution for research and practice and provide an outlook for future investigations.
Background
Data ecosystems and cross-company data sharing
The data economy is characterized by the development of data ecosystems, which facilitate data sharing between data providers and consumers (Oliveira et al., 2019). The European data strategy and the EU’s underlying Data Act underpin numerous data sharing benefits upfront, such as improved access to private and public data, the generation of new products and services, and the reduction of public services’ costs, which will amount to 270 billion euros in additional GDP by 2028 (European Commission, 2022).
Data sharing is the “domain-independent process of giving third parties access to the data sets of others” (Jussen et al., 2023, p. 6). It is enacted by providing and facilitating access for compliant use and reuse of data between data providers and data consumers (Azkan & Mã, 2022; Jussen et al., 2023). With the proliferation of data ecosystems, data sharing’s scope and its purpose in an enterprise context have evolved considerably (Wixom et al., 2020), to eventually extend beyond organizational boundaries. For a long time, firms have exchanged mainly transactional data along their value chain, which is now referred to as data sharing 1.0 (Wixom et al., 2020). Here, firms mainly supported their business relationships with established partners, either upstream or downstream from their own core competency. For instance, a supplier might share data with retailers, distributors, or other sales intermediaries to enable data-driven decision-making and generate actionable insights (Möller et al., 2024). A very prominent example of vertical data sharing is the data exchange between consumer goods companies and retailers, facilitated by the GS1 ecosystem (2024), which has developed standards-based solutions (“common language”) to identify items, capture information throughout the supply chain and then to share that information globally (Legner & Schemm, 2008). Additionally, firms have recently started to realize the economic potential of sharing complementary data assets with other organizations (data sharing 2.0) (Wixom et al., 2020). This has led to a new form of data sharing in which firms engage in “horizontal cooperation and collaboration” to achieve a common goal (Otto et al., 2019, p. 29), a concept also promoted within the European Union's data strategy (European Commission, 2023).
Figure 1 depicts the differences between vertical and horizontal cross-company data sharing. Unlike vertical data sharing, which is primarily driven by enterprises’ need to preserve an existing value proposition (e.g., to execute a transaction or comply with a regulation), horizontal data sharing is aimed at creating a new value proposition by sharing relevant data assets outside the company’s boundaries (Wixom et al., 2020). Concretely, actors share their data with others, sometimes even with competitors, thereby often augmenting the existing vertical data sharing or forming completely new ecosystems. For instance, Skywise was initiated by Airbus in 2015 to continuously track and analyze operations and performance data from aircraft together with suppliers and airlines. Here, horizontal data sharing can be considered an extension of vertical data sharing, where the focus is on joint data use and innovation. Literature has also recently highlighted cases of purely horizontal data sharing among seemingly unrelated actors (i.e., from different industries and not part of the same value chain) that pool their management efforts for specific data or data types, such as the Business Partner Data Sharing Community presented in this paper (Lefebvre et al., 2022). Other forms of horizontal data sharing are data collaboratives (Hale et al., 2003; Susha et al., 2019) aimed at sharing and reusing data across sectors to address societal challenges (Susha et al., 2023). Smart city initiatives, as illustrated in the case of Maas Madrid, establish horizontal data sharing among different mobility providers to optimize transportation and benefit citizens.
Fig. 1. Vertical data sharing versus horizontal data sharing [see PDF for image]
The paradigm shift from data sharing 1.0 to data sharing 2.0 enables companies to create a joint value proposition around shared data assets, such as achieving economies of scale by complementing their own data assets or pooling data management efforts (Azkan & Mã, 2022; Lefebvre et al., 2022; Oliveira et al., 2019). It reflects the general principles of the sharing economy, which comprise examples of collaborative consumption and production, leading to more intensive use of data as under-utilized assets while minimizing the costs related to data recollection and reuse (Legner, 2019). Examples of horizontal data sharing are displayed in Table 1 and highlight data ecosystems’ participatory aspect and bidirectional nature, with the actors assuming the roles of both producers and consumers, also called data prosumers (Otto & Aier, 2013).
Table 1. Examples of horizontal data sharing mentioned in literature and by EU initiatives
| Community | Sector | Members | Goal | Shared data | Source |
|---|---|---|---|---|---|
| Join Data | Agriculture | 10 actors from the agricultural sector, e.g., farmers, laboratories, knowledge institutes, and agricultural entrepreneurs | Agricultural innovation. Derive data-driven insights, improve processes and profitability, and optimize their production | IoT data, Geo data | (EIP-AGRI—European Commission, 2018; Support Centre for Data Sharing, 2022) |
| Skywise | Aviation | 300 actors, both airlines operating Airbus aircraft and manufacturers/suppliers | Derive valuable insights for improving plane components and anticipate maintenance | IoT data from planes, Geo data, product and material master data | (European Commission, 2023; Jussen, Fassnacht, et al., 2024) |
| CDQ Business Partner Data Sharing Community | Cross-industry | 40 multinational companies | Maintain and enrich business partner master data for cost-sharing and increasing quality | Business partner master data, Financial data | (Jussen, Fassnacht, et al., 2024; Schweihoff et al., 2024; Support Centre for Data Sharing, 2022) |
| Maas Madrid | Mobility | 70 mobility actors in Madrid | Optimize and improve the public transport efficiency | Product master data, IoT data, Geo data | (Lopez-Carreiro et al., 2021; Support Centre for Data Sharing, 2022) |
However, in practice, cross-company data sharing, sometimes called B2B data sharing, still faces significant challenges, which have hampered the uptake of data ecosystems. For instance, public and private actors are reluctant to share data outside their organizations, due to misaligned incentives, privacy concerns, and a lack of collaboration (Skatova et al., 2014; Susha et al., 2017a, 2019). A comprehensive review of existing literature reveals four types of concerns essential for creating and sustaining data sharing, which extend beyond merely technical implementations and platforms (Susha et al., 2019):
Regulatory concerns arise from a general confusion on legal frameworks leading to conservative approaches to data sharing.
Organizational concerns are caused by misaligned incentives, unclear roles, and pervasive lack of trust within data sharing communities.
Data-related concerns arise from data quality and security challenges, and the complexity of proper utilization and analysis.
Societal concerns stem from inclusivity and general impact of data sharing to achieve equitable and effective interventions.
These barriers remain prominent in the data sharing literature, which continues to be recognized as an emerging research field. For example, a recent comprehensive review by Fassnacht et al. (2023) reveals that defining the strategic value of inter-organizational data sharing remains a significant challenge, hindered by limited management commitment, inadequate use case identification, and an unclear value proposition. Likewise, recent studies by Jussen et al. (2024a, 2024b) underscore practitioners’ difficulties in establishing a robust data sharing ecosystem, particularly in defining participation levels, necessary roles, and formalized structural requirements in policies and contractual agreements.
Adding a practice lens on data sharing
Despite recent insights on the fundamental components of data sharing, including intermediaries, participants, and data categories (Azkan & Mã, 2022; Jussen et al., 2023), both literature (Susha et al., 2019) and practice (European Commission, 2018) point out the absence of comprehensive guidelines for inter-organizational data sharing. Existing research primarily focuses on defining the scope of data sharing initiatives and establishing the necessary technical infrastructure, such as data marketplaces (Abbas et al., 2024; Schweihoff et al., 2024) and dataspaces (Möller et al., 2024; Otto & Jarke, 2019). This implementation-centric view must be complemented with a practice-based perspective and investigate the active roles of data providers, consumers, and intermediaries in creating and realizing that value proposition (Azkan & Mã, 2022; Von Scherenberg et al., 2024). Additionally, while recent studies have focused on the emergence and formalization of data ecosystems (Aaen et al., 2022; Gelhaar & Otto, 2020), they fall short of explaining how to sustain them over the long term. Yet, keeping an ecosystem viable and maintaining its joint value proposition are critical to its survival (Azkan & Mã, 2022; Oliveira et al., 2019).
Communities of practice are a conducive environment for cross-company data sharing as they allow adding an exclusive practice perspective to data ecosystems (Jarvenpaa & Markus, 2020; Lefebvre et al., 2022). CoPs are groups of people who share a concern, a set of problems, or a passion for a topic, and who deepen their knowledge of and expertise in this area by interacting on an ongoing basis (Nicolini et al., 2022; Wenger et al., 2002). Three essential characteristics, as illustrated in Fig. 2, identify and characterize communities of practice (Wenger et al., 2002). First, members should share a common domain of interest. Second, they benefit from mutually engaging, regularly interacting, sharing, and learning together in a community setting. Third, members work on developing a shared repertoire of practices that they will be able to implement in their working area. Here, practices are viewed as “recurrent, materially bounded and situated action engaged in by members of a community” (Orlikowski, 2002, p. 256). Sustaining these shared practices is essential, as the continuity of operations and members’ commitment are critical to preventing the collapse of the community of practice (Nicolini et al., 2022).
Fig. 2. Three characteristics of a Community of Practice [see PDF for image]
Building on this theoretical lens, horizontal data sharing communities (DSCs) are composed of selected organizations (community members) sharing the same domain of interest who collaborate to create value from shared data assets. Within this framework, members can “achieve a common goal through effective collaboration processes and high levels of information sharing that could not be accomplished by a single organization independently” (Schlosser, 2017, p. 25). This conceptualization resonates with a data ecosystem’s “network of multiple actors” (Otto et al., 2019, p. 5) spanning public-public, private-public, and private-private collaborations. These actors share data to pursue common objectives, ultimately benefiting the entire ecosystem (Azkan & Mã, 2022; Oliveira et al., 2019). Operating as closed ecosystems similar to gated communities or club goods, data sharing communities create exclusive environments with defined boundaries, governed by shared rules and standards that guide collective actions (Blakely, 2006; Buchanan, 1965; Manzi & Smith-Bowers, 2005). In this paper, we seek to theorize the design of data sharing communities and thereby explain how these communities are developed and operated in a way that sustains joint value creation for all members.
Research context and methodology
Research context: The Business Partner Data Sharing Community
For studying the development and operations of data sharing communities, we use ADR as a research method, which focuses on “generating prescriptive design knowledge through building and evaluating ensemble IT artifacts in an organizational setting” (Sein et al., 2011, p. 40). ADR comprises close collaboration between researchers and practitioners. In our case, we partnered with CDQ, a university spin-off and a pioneer in data sharing community development and operations (CDQ, 2024). CDQ’s Business Partner Data Sharing Community (BP-DSC) was chosen for its sustained success over eight years, exemplifying a robust shared value proposition among an expanding, diverse membership from various industries, including direct competitors. Additionally, the BP-DSC has gained recognition in academic literature and at the European level (Table 1).
BP-DSC is a horizontal data community comprising more than 40 multinational companies, as of 2024, collaboratively managing customer and supplier data (hereafter referred to as business partner data). The community emerged from a research consortium (Österle & Otto, 2010) composed of university researchers and practitioners from the mentioned European corporations, and aimed at jointly developing data management practices. At this time (2015), participating firms shared a common concern about the high costs of managing data across business units or divisions (group data or corporate data). Each company performed very similar tasks to create and maintain their master data and had to invest significant skills and resources to ensure their data quality was high. This was especially true in the context of business partner data, which was business critical for each company, but which also overlapped significantly between the companies. Discussions of these pain points and shared practices allowed the group to develop the idea of applying the sharing economy’s principles to corporate data and to pilot an approach through which the business partner data was shared and managed collaboratively (Schlosser, 2017). After a pilot phase, the first companies started sharing data productively in 2017. The university spin-off was tasked with managing the data sharing platform and with supporting the community composed of companies from different industries (e.g., pharmaceutical, automotive, retail). Over time, the community members developed three types of shared practices (Table 2): (1) Shared management of data definitions and quality rules; (2) shared management of external reference data; (3) shared management of data assets.
Table 2. Shared practices developed by the BP-DSC (CDQ, 2024)
| Shared practices | Description | Objective |
|---|---|---|
| Shared management of data definitions and quality rules | Maintenance of a data model for business partners (including attributes such as the address, tax number, bank account, mapping to the operational systems). Maintenance of > 2600 data quality rules comprising the completeness, correctness, and non-redundancy of the global coverage | Measurement of business partner data quality and detection of duplicates based on a comprehensive set of up-to-date rules |
| Shared management of external reference data | Maintenance of > 70 trusted external data sources, e.g., open data, commercial registers, and specialized databases with > 70 different identifiers for companies worldwide and > 3000 legal forms | Validation of entries and enrichment with external reference data (e.g., addresses) |
| Shared management of data assets | Sharing of > 200 million data records on business partners and their updates in the data sharing pool | Automated creation and updating of validated data records |
Through the shared management of data definitions and quality rules, members combine company-specific perspectives and establish a common understanding of the business partner data. This way, the BP-DSC develops the conceptual foundations required to share the business partner data across company boundaries (Schlosser, 2017). These foundations include data models, business rules, reference data, and processes, which were jointly developed and documented on the BP-DSC’s wiki. Through the shared management of external reference data, members identify external data sources (e.g., open government data from corporate registers, data acquired from specialized providers) that provide company information as a “trusted source”. Building on these foundations, the shared management of data assets comprises the implementation of workflows to provide and validate data records, as well as related updates on the data sharing platform. When sharing data records, it is essential to uphold basic principles, such as maintaining the anonymity of companies and ensuring data quality and trustworthiness through the four-eye principle (i.e., review by a second company).
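The two sharing principles named above, anonymity and the four-eye principle, can be illustrated with a minimal Python sketch. It is not the BP-DSC platform's implementation: the class `SharedRecord`, its fields, and the member identifiers are hypothetical names introduced only for illustration, and we assume that a single review by a company other than the contributor is sufficient for validation.

```python
from dataclasses import dataclass, field


@dataclass
class SharedRecord:
    """Hypothetical business partner record in a data sharing pool."""
    record_id: str
    attributes: dict[str, str]
    contributor: str  # member company that provided the record
    reviewers: set[str] = field(default_factory=set)

    def review(self, member: str) -> None:
        # Four-eye principle: the contributor cannot approve its own record.
        if member == self.contributor:
            raise ValueError("a record must be reviewed by a second company")
        self.reviewers.add(member)

    @property
    def validated(self) -> bool:
        # Assumption: one review by another company is enough.
        return len(self.reviewers) >= 1

    def public_view(self) -> dict[str, str]:
        # Anonymity: the shared view omits contributor and reviewers.
        return {"record_id": self.record_id, **self.attributes}


# Placeholder data, not taken from the community's data pool.
record = SharedRecord(
    record_id="bp-001",
    attributes={"name": "Example AG", "country": "CH"},
    contributor="member_a",
)
record.review("member_b")
```

In this sketch, other members only ever see `record.public_view()`, so the identity of the contributing and reviewing companies stays inside the pool.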
As of 2024, the data sharing platform contains about 200 million data records that are constantly maintained and expanded by using more than 2600 data quality rules and that integrate more than 70 trustworthy reference data sources (CDQ, 2024). By sharing business partner data, member companies realize economies of scale, while significantly improving their data’s quality. Among the direct benefits are a significant reduction in the data’s life cycle costs, i.e., the expenditure on the data creation and maintenance, as well as the reduction in the quality assurance’s costs. Improved data quality leads to more up-to-date, accurate, and complete data, which has positive, indirect effects on all internal business processes relying on business partner data, i.e., primarily the purchasing, marketing, and sales processes. Further indirect benefits arise from the 360° view of the business partners, which enables better decisions and business insights, ensures compliance with regulatory requirements, and reduces risks. The university spin-off acts as an intermediary providing the technological infrastructure for sharing data and supporting the community (e.g., workshops five times a year) and ensures that the community grows.
In 2021, members of the BP-DSC proposed a new data sharing community focused on maintaining business partner data specific to the healthcare industry (HCO-DSC). This initiative, which initially involved three pharmaceutical and healthcare companies (out of the 20 BP-DSC members at the time of the study) provided an opportunity for researchers and experts to build upon, reflect on, and learn from the BP-DSC’s development and operation with the goal of establishing and supporting data sharing communities in a more professional and effective manner.
Research process
Overview
The design theory was developed through a research process that followed the ADR framework’s four stages (Fig. 3). We began addressing immediate and relevant problems by employing purposeful artifacts, since these are regarded as the central foundation of design theories. Throughout this process, the research team collaborated with the university spin-off, an external advisor (not only involved in the creation of the BP-DSC, but also with extensive experience from the GS1 ecosystem), and the data sharing community members. It analyzed their existing practices and issues and contributed to the revision of existing and development of new practices, with the goal of improving the process for developing and operating data sharing communities. From February 2021 to April 2022, one researcher was fully immersed in the BP-DSC activities and supported first the HCO-DSC, then the BP-DSC, thereby making observations, participating in workshops, and contributing to the development and documentation of the community's practices. The strong cooperation not only allowed us to make direct observations, conduct semi-structured interviews, and access physical artifacts, such as organization-specific documentation and tools, but also allowed our research to influence the organizational setting and contribute to the development and operations of the data sharing community.
Fig. 3. ADR process followed for this research based on Sein et al. (2011) [see PDF for image]
Problem formulation
When the idea of HCO-DSC as a new data sharing community was introduced, it sparked a discussion on how to develop and operate data sharing communities. As part of the problem formulation, we analyzed data sharing’s specific challenges in the BP-DSC’s development and operations context. We compared these challenges to a list of 38 challenges and barriers to data sharing collaboratives and partnerships which we had identified from prior literature, including the comprehensive review by Susha et al. (2019) as well as more than 30 academic and practitioner publications. These span the regulatory, organizational, data-related, and societal categories. To identify specific challenges in horizontal data sharing, we asked three experts involved in the BP-DSC, two of whom had participated in its creation, to respond to an online questionnaire in which they rated the relevance of each of the abovementioned challenges on a five-point Likert scale. The experts also had the possibility to explain their ratings and suggest other missing challenges based on their experience. Eventually, the experts prioritized 16 challenges with two or more positive ratings (agree or strongly agree) and confirmed five meta-requirements that should be addressed by a reference process for developing and operating data sharing communities.
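The prioritization rule described above (a challenge is retained when at least two of the three experts rate it positively) can be sketched in Python. This is an illustrative sketch, not the questionnaire tooling used in the study: the challenge names and ratings are invented placeholders, and the integer encoding of the five-point Likert scale (4 = agree, 5 = strongly agree) is an assumption.

```python
# Assumed encoding of the five-point Likert scale: 4 = agree, 5 = strongly agree.
POSITIVE_THRESHOLD = 4
MIN_POSITIVE_RATINGS = 2


def prioritize(ratings: dict[str, list[int]]) -> list[str]:
    """Return the challenges that received enough positive expert ratings."""
    selected = []
    for challenge, scores in ratings.items():
        positives = sum(1 for score in scores if score >= POSITIVE_THRESHOLD)
        if positives >= MIN_POSITIVE_RATINGS:
            selected.append(challenge)
    return selected


# Placeholder ratings from three hypothetical experts, not the study's data.
example_ratings = {
    "unclear roles": [5, 4, 2],
    "lack of trust": [4, 4, 5],
    "legal uncertainty": [3, 2, 4],
}
prioritized = prioritize(example_ratings)
```

With these placeholder ratings, "legal uncertainty" is dropped because only one expert rated it positively.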
Building, intervention, and evaluation cycles
The first building, intervention, and evaluation (BIE) cycle took place between February and May 2021, coinciding with the starting up phase of the HCO-DSC community. Its primary goals were to build a common ground of semantic knowledge for the HCO-DSC and to consolidate the experiences into the alpha version of the reference process. More specifically, this started with creating a list of relevant terms for the community, enriched with definitions, characteristics, and examples from several sources (e.g., members’ own business glossaries, other existing communities, and relevant online sources). During five working sessions, a team (comprising experts from the intermediary, the researcher, an external advisor, and community members from three companies) mapped different terms (e.g., “health care organization”) to industry-specific classification systems, such as the Standard Industry Classification (SIC) and the North American Industry Classification System (NAICS). Another key step toward shared semantics was defining business rules to link terms and concepts (e.g., the minimum set of attributes/fields required to consider pharmaceutical supplier data complete). This semantic knowledge base was developed iteratively, involving research activities, such as literature review and analysis of definitions. As part of reflection and learning, the team focused on analyzing standards, vocabularies, and concepts necessary for building and implementing semantic knowledge for data sharing. This included reflections on the development and implementation of semantic frameworks within the HCO-DSC. Furthermore, the research team examined the literature on data sharing setup and implementation, using it to inform the reference process.
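A business rule of the kind mentioned above, i.e., a minimum set of attributes required to consider a supplier record complete, could look as follows in Python. The sketch is purely illustrative: the study does not list the actual required fields, so `REQUIRED_FIELDS` and the sample records are hypothetical.

```python
# Hypothetical minimum attribute set; the actual HCO-DSC rule is not given here.
REQUIRED_FIELDS = ("name", "country", "city", "tax_number")


def missing_fields(record: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank in the record."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not (record.get(name) or "").strip()
    ]


def is_complete(record: dict[str, str]) -> bool:
    """A record is complete when no required field is missing."""
    return not missing_fields(record)


# Placeholder records for illustration only.
complete_record = {
    "name": "Example Pharma Supply",
    "country": "DE",
    "city": "Berlin",
    "tax_number": "DE000000000",
}
incomplete_record = {"name": "Example Pharma Supply", "country": "DE", "city": "  "}
```

Returning the list of missing fields, rather than only a yes/no answer, lets members see which attributes must be supplied before a record counts as complete.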
Progress and intermediate results were presented for feedback at two community workshops, which included senior management from each participating organization, and were submitted to informal formative evaluation. Based on these experiences, work on the reference process started, resulting in the alpha version with the process’s nominal phases committing, starting up, operating, and supporting, as well as the underlying process steps in the corresponding phases. This alpha version was then presented for review to two experts of the intermediary and the external advisor as formative evaluation, resulting in a few, rather small adjustments.
The second BIE cycle occurred from September 2021 to March 2022 and involved the larger BP-DSC community, comprising 20 members at this time, with the goal of refining the end-to-end reference process to develop and operate data sharing communities. In this cycle, we extended the alpha version, which primarily focused on the semantic aspects relevant to the starting up phase, to include exhaustive documentation, defined roles, and supporting deliverables for each step. To achieve this, we conducted interviews with key actors involved in BP-DSC’s development over the years and further analyzed its historical documentation. The reference process was intensively discussed during six community workshops of the BP-DSC, utilizing collaboration tools (e.g., Miro) to foster exchanges between community members and reach consensus. During these exchanges, we also asked community members to challenge existing key performance indicators (KPIs) and metrics, and to provide suggestions for new, relevant ones. The parallel reflection and learning phase allowed the research team to examine the development and operation of both the HCO-DSC and BP-DSC, analyzing their differences and commonalities with other data sharing communities. By drawing on these insights, the team could transfer key learnings to a broader set of problems and goals, enriching the understanding of data sharing initiatives in various contexts. The resulting beta version of the artifact provides a more detailed and comprehensive framework for the implementation and management of data sharing communities.
The reference process was then assessed using the following summative evaluation criteria (Prat et al., 2015): understandability, usefulness, accuracy, completeness, and adaptability. Regarding accuracy and understandability, two-thirds of the participants strongly agreed. All participants agreed that the reference process is complete. Moreover, all participants strongly agreed on the completeness of the phases and steps. One expert interestingly highlighted the need for a “business model for shared data,” suggesting it as a potential addition to future iterations of the reference process. Some participants expressed reservations about the adaptability of the reference process for data sharing communities that prioritize insights generation over joint data management. Nevertheless, they unanimously recognized its usefulness and relevance in addressing shared challenges, as reflected in the meta-requirements.
Formalization of learning
In the formalization of learning stage, we formalized our design knowledge as design theory. We extracted the design principles by transforming the situated learnings from the HCO-DSC and BP-DSC into general solution concepts, which subsequently allowed us to generate design principles of form and action. We also reviewed the literature to justify and explain our design decisions and the testable propositions and formulated them fully in this phase.
To ensure that the design principles for developing and operating the data sharing community were developed in a methodologically sound way, we followed the “Method for Design Principle Development in Information Systems” suggested by Möller et al. (2020). This method offers a taxonomical approach of design principles development (Fig. 4). With a strong focus on building prescriptive knowledge, our research design has a reflective research perspective, because design principles emerge after and during the reference process’s design iterations (Gregor, 2006). The design principles were documented by following guidelines of Gregor et al. (2020), who recommend structuring principles into aim, context, and mechanism. Additionally, these authors suggest that principles should be justified by knowledge from the literature or empirical evidence. We utilized the CoP theoretical lens to extract design knowledge from the BIE cycles and then generalized the findings to form a set of empirically produced principles in order to generate the design theory for horizontal data sharing communities.
Fig. 4. Characteristics of the design principles development in our study based on Möller et al.’s (2020) taxonomy
A design theory for horizontal data sharing communities
Our design theory has six core components (Gregor & Jones, 2007). The purpose and the scope (1) specify the practice-inspired causa finalis and constitute a set of requirements that specifies the artifact that the theory facilitates. We extract this set from our insights in the early stages of the ADR project and formalize five meta-requirements. Constructs (2) are representations of the entities of interest in the design theory and, in our case, are derived from communities of practice in extant literature: domain of interest, relationships between members, and shared repertoire of practices. The principles of form and function (3) are the essence of the design theory. We develop eight design principles (DPs) that capture the knowledge gained about the process of operating and developing data sharing communities, and encompass knowledge about creating other instances that belong to the same class of solutions (Sein et al., 2011). By means of testable propositions (4), we develop truth statements that relate a design principle to specific meta-requirements. Justificatory knowledge (5) represents insights from literature that inform, explain, and validate our design decisions. In addition, we provide an expository instantiation (6) that exemplifies how the design principles materialize in the different phases of committing, starting up, operating, and supporting data sharing communities.
Defining the purpose and scope by means of meta-requirements
The design theory’s purpose is to provide practitioner-oriented guidance and overcome the manifold challenges in developing and operating data sharing communities. As part of the problem formulation, the research team analyzed the general challenges of data sharing from the literature and compared them to the specific challenges in horizontal data sharing communities. Based on experiences from the Business Partner Data Sharing Community, “measuring of impact and value,” “lack of consistency of data and resources,” “differences in terminologies and frames of reference,” and “fear of losing control and lack of trust” were rated highest by the three experts (Table 1). For the latter, for instance, one participant noted that their senior management does not sufficiently support the community because it was not developed by them, which is typical of the “not invented here” syndrome (Katz & Allen, 1982). Overall, 16 challenges (one of regulatory type, 11 of organizational type, three of data-related type, and one of societal type) were raised and addressed in the five meta-requirements (Table 3).
Table 3. Meta-requirements for developing and operating data sharing communities
| Challenges (from ADR study) | Meta-requirements | Justificatory knowledge (from literature) |
|---|---|---|
| • Fear of losing competitive advantage when sharing data • Difficulties in measuring impact and value • Misalignment of incentives • Unclear value proposition for data providers • Low uptake on data providers | MR 1 – Provide incentives and demonstrate the value of data sharing | (Azkan & Mã, 2022; Wixom et al., 2020; Savage & Vickers, 2009; Oliveira et al., 2019; Tenopir et al., 2011; Weyzen et al., 2021; Gelhaar et al., 2023) |
| • Difficulties in collaboration • Lack of coordination in roles, resources and activities | MR 2 – Ensure equal participation and responsibilities | (Azkan & Mã, 2022; Susha et al., 2017a, 2019; The Gov Lab, 2021) |
| • Data security concerns • Privacy issues • Fears of losing control and lack of trust in other parties • Lack of control on the data • Problem of informed consent of data subjects | MR 3 – Establish institutional framework that instills trust | (Aaen et al., 2022; Abbas et al., 2024; Wixom et al., 2020; Oliveira et al., 2019; Susha et al., 2017a; Taylor & Mandl, 2015; The Gov Lab, 2021; World Economic Forum, 2015) |
| • Differences in organizational norms, culture, and practices • Differences in terminologies and frames of reference | MR 4 – Clarify the terminology and other relevant data-related norms | (Azkan & Mã, 2022; Gelhaar et al., 2021; The Gov Lab, 2021; Weyzen et al., 2021) |
| • Lack of consistency of data and resources • Low or uncertain data quality | MR 5 – Maintain data quality in the pool | (Gelhaar & Otto, 2020; Wixom et al., 2020; Oliveira et al., 2019; Susha et al., 2017b; Weyzen et al., 2021) |
To motivate companies to join the community and share data, the DSC must provide incentives and demonstrate the value of data sharing (MR1). This implies that the members should also acknowledge that they are not competitors in the community but must cooperate and collaborate fully and work as partners toward shared benefits. This further entails that all the members assume equal participation and responsibilities (MR2). Since an apparent loss of control over their data or a general lack of trust could affect the members, procedures should specify an institutional framework for the community that instills trust (MR3). In view of the members’ different backgrounds, the data sharing community must provide means to harmonize between technologies, practices, and cultures. Most importantly, it must clarify the terminology and other relevant data-related norms (MR4). Ultimately, the community should actively improve consistency and quality of data in the pool (MR5) in order to deliver in accordance with the community roadmap and to achieve the expected value proposition.
Design principles
Eight principles of form and function form the core of our theory and reflect our learnings in developing and operating data sharing communities. For each, Table 4 summarizes the aim, mechanism, and context, as suggested by Gregor et al. (2020), and provides both a rationale and testable propositions. Building on our theoretical lens, we cluster these principles around a CoP’s three characteristics: domain of interest, community, and shared practice. We also visualize the relationships between the design principles and the meta-requirements (Fig. 5).
Table 4. Overview of the design principles to develop and operate data sharing communities
| CoP | Design principles | Rationale | Testable propositions |
|---|---|---|---|
| Domain of Interest | DP 1 – Case for action. Aim: To identify the data sharing community’s scope and shared domains of interest. Mechanism: Community members should pinpoint their shared data management challenges in a “case for action.” Context: In the committing phase | • Aligned incentives are critical for a CoP’s success (Susha et al., 2019; Wenger et al., 2002) • A data ecosystem aims to solve common challenges that actors face (Gelhaar et al., 2023) | The presence of a strong motivation and a clearly articulated "case for action" at the outset clarifies the community’s purpose and incentivizes companies to join the community |
| Domain of Interest | DP 2 – Value proposition. Aim: To communicate the data sharing community’s value and impact. Mechanism: The value proposition of shared data management should describe its expected direct and indirect benefits, as well as the methods for measuring its impact. Context: In committing and operating phases | • A CoP’s “preliminary design” should be built on a shared value proposition (Wenger et al., 2002) • Data ecosystems survive only if a clear value proposition is detailed out (Oliveira et al., 2019) | The formulation of a clear and community-approved value proposition for shared data management, combined with impact measurement, significantly enhances members' recognition of the community's value |
| Community | DP 3 – Community charter and guidelines. Aim: To clarify the institutional framework for the community. Mechanism: Community guidelines and procedures should include a roadmap with deliverables, as well as the collaboration and participation mechanisms. Context: In committing phase | • A documentation of the activities and the responsibilities is key for CoP development (Wenger et al., 2002) • Charters or procedures are necessary in the context of data-driven services (Schlosser, 2017) | The collaborative development of a community charter and the formulation of basic community rules into a set of community guidelines significantly facilitate collaboration and strengthen the cohesion and trust among community members |
| Community | DP 4 – Community members as prosumers. Aim: To ensure all the community members contribute actively to the shared practice. Mechanism: Community members should endorse the roles of providers and consumers of the data assets in the data pool, thereby acting as data prosumers. Context: In committing and operating phases | • A CoP entails all its members exchanging equivalent practices (Wenger et al., 2002) • Members of data ecosystems are often prosumers seeking mutual benefits (Azkan & Mã, 2022) | The equal participation of DSC members in both providing and consuming data assets provides the basis for collaboration and ensures the sustained availability of data |
| Community | DP 5 – Community support. Aim: To facilitate community operations in a trusted environment. Mechanism: A neutral intermediary should provide organizational and technical support, and report the successes and KPIs regularly. Context: In committing, starting up, operating, and supporting phases | • The community intermediary plays a central role in the community facilitation (Skatova et al., 2014) • Measuring the success and value of a CoP is often challenging (Legner, 2019; Nicolini et al., 2022) | The provision of a wide range of services by an intermediary, including performance monitoring, organizational and technical support, coupled with the active facilitation and event planning by a community intermediary, significantly enhances collaboration and the perceived success and value of DSCs by its members |
| Shared Practice | DP 6 – Shared semantics. Aim: To standardize the community members’ norms, practices, and terminology. Mechanism: A common business vocabulary and rulebook should be developed, accepted, and used by all the community members. Context: In the building and the operating phases | • Using the same language or semantics is critical to overcome knowledge obstacles in CoPs and to standardize the norms, practices, and terminology (Nicolini et al., 2022) • Establishing shared semantics is a key step in creating a data ecosystem (Oliveira et al., 2019) | The development and consensus on a common business vocabulary among all members of the DSC contributes to a clearer understanding and harmonization, and a more effective management of the shared data pool, especially given its large and evolving volume. The intermediary's diligent documentation and regular updating of relevant data and business rules significantly enhance the quality, relevance, and value of the DSC's data pool |
| Shared Practice | DP 7 – Expansion of shared data assets. Aim: To expand the volume of shared data assets. Mechanism: The intermediary should provide periodic updates and communications about the community’s data landscape, as well as communicate the external data sources that are relevant for enriching the pool. Context: In starting up and operating phases | • In a CoP, communication is essential for task completion (Oliveira et al., 2019) • External data sourcing practices emerge as key for value creation from the data (Krasikov & Legner, 2023) | The systematic screening and quality assessment of contributed data and the proactive incorporation of enriching external data sources significantly enhance the quality, relevance, and value of the DSC's data pool |
| Shared Practice | DP 8 – Shared data management practices. Aim: To grow the community and its benefits. Mechanism: Community members should continuously refine the required data management practices that need to be implemented on the data sharing platform. Context: In starting up and operating phases | • The refinement and development of shared practice in a CoP fosters its members’ learning and a shared understanding of the problems and solutions (Wenger et al., 2002) • Shared data management is a value creation mechanism for data sharing communities (Lefebvre et al., 2022) | The continuous development of shared data management practices within a DSC contributes to the perceived success and value of DSCs by its members and enhances the quality, relevance, and value of the DSC's data pool |
Fig. 5. Relationships between design principles and meta-requirements
Domain of interest
A strong motivation and a “case for action” are required as a starting point when developing a data sharing community (DP1). This starting point can originate from industry (or from an existing community) where organizations (i.e., potential community members) have raised common challenges regarding their data management practices. Documenting shared data management challenges in a “case for action” (DP 1) is critical to align the members’ incentives (Susha et al., 2019). Furthermore, since CoPs can fail rapidly when members’ interests diverge (Wenger et al., 2002), the intermediary should ensure that each member’s expectations are thoroughly understood by, for example, holding a one-to-one meeting with each member.
When individual and collective incentives are understood, a value proposition for shared data management should be formulated and formally approved by the community (DP 2). The community and the intermediary should, in the form of a “preliminary design” (Wenger et al., 2002), provide a description of the community, its scope, its domains of interest, its roles, the type of data shared, and the key expected benefits. By doing so, the members can acknowledge the community’s value and identify the synergies between their data management practices.
Community
The community operations should be explained in a detailed plan that the members build collaboratively, which includes the phases, tasks, milestones, roadmap, and related deliverables. Accordingly, the resources, roles, and responsibilities should be mapped out (e.g., through a RACI matrix, which defines who is responsible, accountable, consulted, and informed). The community members define the basic community rules, and all the stakeholders should follow these. All these requirements and specificities should be grouped to create community guidelines that the entire community acknowledges (DP3). This could be considered a “commonly agreed collaboration charter” (Schlosser, 2017), which defines the community’s basic principles and the community members’ acceptable behavior, and is generally aimed at strengthening the members’ cohesion with the community (Nicolini et al., 2022). Overall, as one member reflected: “Looking back at the community development process, it would have been valuable to set up a general community guideline (i.e., non-technical) at the beginning and then set up more technical guidelines (e.g., about the storage). It is also valuable to reuse existing guidelines, such as a GS1”.
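To illustrate how such a mapping of roles and responsibilities could be checked, the following minimal sketch validates a RACI matrix. The tasks, parties, and assignments are hypothetical and not taken from the BP-DSC guidelines, and the rule that each task needs at least one responsible and exactly one accountable party is a common RACI convention assumed here rather than a requirement stated by the community.

```python
from collections import Counter

# Hypothetical tasks, parties, and assignments (illustration only).
raci_matrix = {
    "Draft community guidelines": {"Intermediary": "R", "Member A": "A", "Member B": "C"},
    "Define data sharing policy": {"Intermediary": "R", "Member A": "C", "Member B": "I"},
}


def find_raci_gaps(matrix):
    """Return one message per task that breaks the assumed RACI conventions."""
    gaps = []
    for task, assignments in matrix.items():
        counts = Counter(assignments.values())
        # Assumed convention: every task needs at least one Responsible party.
        if counts["R"] == 0:
            gaps.append(f"{task}: no Responsible party")
        # Assumed convention: every task has exactly one Accountable party.
        if counts["A"] != 1:
            gaps.append(f"{task}: expected one Accountable party, found {counts['A']}")
    return gaps


for gap in find_raci_gaps(raci_matrix):
    print(gap)
```

In this example, the second task is flagged because no party is accountable for it, which is the kind of gap the community would resolve before acknowledging the guidelines.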
A CoP’s success relies on all of its participants being active and sharing their experiences proactively (Nicolini et al., 2022). For data sharing communities, this also entails that all the members endorse the role of providers and consumers of data assets in the shared data pool, thereby acting as data prosumers (DP 4) (Otto & Aier, 2013). Consequently, the members should ensure their data’s availability for data sharing, as agreed in the community guidelines, in order to ensure the value proposition. The members should therefore trust the community data sharing policies and contracted service agreements.
Furthermore, the members benefit from a wide range of services provided by the intermediary, such as organizational support (e.g., agenda, workshops, facilities) and technical support (e.g., use of the solution and the technical integration). The intermediary also ensures progress and tracks success (DP 5). In fact, the community intermediary plays a central role in the community’s life, because its main role is to plan and facilitate community events (Wenger et al., 2002). Measuring data sharing communities’ success and value against the initial expectations is often perceived as a challenge (Susha et al., 2019). Reporting success-related KPIs (e.g., the data pool’s growth or the number of shared data assets managed) is therefore a suitable way of convincing the members of a data sharing community’s benefits.
Shared practice
Using the same language or semantics is crucial to overcome a CoP’s knowledge obstacles and to standardize the norms, practices, and terminology (Nicolini et al., 2022). All data sharing community members should develop and agree on a common business vocabulary in order to clarify the shared elements’ meanings and how they are represented in the community data model (DP 6). Owing to the large and evolving volume of data in the pool, it is often the intermediary's task to document, for example, data objects and attributes. The community should also settle on a definition and the synonyms, parent types, subtypes, and key characteristics of each data object. Similarly, the intermediary should document other information that is relevant for the shared practice, such as the business rules (e.g., a rule flagging records in which the business partner’s name is missing), and update it when necessary.
Besides maintaining the data model, the intermediary must ensure the regular update of the community’s data landscape, including the externally sourced data required to enrich the data pool (DP7). The intermediary should screen the data that community members contribute and assess their quality before sharing them. Such verification helps to identify data concepts and rules that could impact the pool. In addition, to achieve the community’s objectives and support the shared practice, the intermediary (the customer success managers in this case) should search for external sources that might enrich the pool. For instance, external data used in the BP-DSC includes reference data (e.g., country codes), official institutions’ datasets (e.g., the European Medicines Agency’s dataset containing all the healthcare organizations and their related addresses), and business registers. However, such sources can only be integrated after a proper sourcing process (Krasikov et al., 2022), an assessment of their quality, and approval by the community members.
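The screening of contributed data against documented business rules can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields, the rule identifiers, the small country code list, and the `screen` function are all hypothetical and do not reproduce the BP-DSC rulebook or platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout; field names are illustrative only.
Record = Dict[str, str]

# Assumed reference data; a real pool would load a complete list of country codes.
COUNTRY_CODES = {"CH", "DE", "FR", "US"}


@dataclass(frozen=True)
class BusinessRule:
    rule_id: str
    description: str
    is_violated: Callable[[Record], bool]


# Hypothetical rulebook entries, documented by the intermediary.
RULES = [
    BusinessRule("BR-001", "Business partner name is missing",
                 lambda r: not r.get("name", "").strip()),
    BusinessRule("BR-002", "Country code is not in the reference data",
                 lambda r: r.get("country", "") not in COUNTRY_CODES),
]


def screen(records: List[Record], rules: List[BusinessRule]) -> Tuple[List[Record], List[Tuple[int, str]]]:
    """Split records into those accepted for sharing and a list of rule findings."""
    accepted: List[Record] = []
    findings: List[Tuple[int, str]] = []
    for index, record in enumerate(records):
        violated = [rule.rule_id for rule in rules if rule.is_violated(record)]
        if violated:
            findings.extend((index, rule_id) for rule_id in violated)
        else:
            accepted.append(record)
    return accepted, findings


contributed = [
    {"name": "Acme AG", "country": "CH"},
    {"name": " ", "country": "CH"},
    {"name": "Foo Ltd", "country": "XX"},
]
accepted, findings = screen(contributed, RULES)
print(len(accepted), findings)
```

Keeping each rule as a documented entry with an identifier and description mirrors the idea of a rulebook that the intermediary maintains and updates when new rules emerge from the community.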
Focus groups, workshops, and regular meetings support these data management practice exchanges. Consequently, new practices might emerge that require the intermediary to maintain the data sharing platform and integrate the changes (DP8). Examples of such practices include new data quality and business rules.
Expository instantiation
As an expository instantiation, we demonstrate the design principles’ application “for the purposes of theory representation or exposition” (Gregor & Jones, 2007) in the reference process for developing and operating data sharing communities, which was developed based on experiences from BP-DSC and HCO-DSC. We illustrate how the design principles (Table 5) are incorporated in the different phases of the community life cycle (Fig. 6) from Committing to Starting up and Operating, with Supporting the Community as the underlying phase.
Table 5. Instantiation of the design principles along reference process’ steps
| Design principles | Phase and steps | Instantiation examples |
|---|---|---|
| DP 1 – Case for action | A.1, C.4 | A list of common pain points, needs, drivers, e.g., a survey; a synthesis of the shared challenges and potential actions. It is crucial to confirm that the cause is truly common and shared by multiple organizations and/or industries |
| DP 2 – Value proposition | A.2, A.3, C.2, C.3, C.4 | Documentation of the benefits gained from shared data management, including scope, domain of interest, and high-level objectives; monitoring and reporting (e.g., using quantitative and qualitative metrics) |
| DP 3 – Community charter and guidelines | A.4, C.4 | A set of documents defining the community principles and characteristics (e.g., the goals, deliverables, roadmap, planning, RACI matrix) in line with community objectives and KPIs to measure the community’s success |
| DP 4 – Community members as prosumers | A.2, A.3, A.4, B.1, C.4 | Members’ written commitment to actively contribute to the communities’ exchanges. Clear guidelines including the data sharing policy; data sharing solution integration at a member firm |
| DP 5 – Community support | D | Support by an intermediary to enable efficient collaboration from organizational perspective (e.g., facilitation of workshop, moderation of discussions, best practice and experience exchange) and technical perspective (solution use and technical integration), as well as implementation of mechanisms for measuring and reporting community impact (e.g., KPIs) |
| DP 6 – Shared semantics | B.2, B.3, B.4, C.4 | Community data model (including tables and templates for data objects) and Semantic Wiki for the implementation of the business vocabulary and rulebook |
| DP 7 – Expansion of shared data assets | B.1, C.1 | A list of existing data objects, including important concepts and standards, complemented with relevant external sources that can be integrated in the existing data pool; an overview of the data pool’s composition |
| DP 8 – Shared data management practices | B.2, C.2, C.4 | Procedures to continuously maintain and share data; community agreement to update the data sharing solution; decisions on whether to integrate a new data record; formulation of new data quality rules |
Fig. 6. Reference process for developing and operating data sharing communities
Expanding on our expository instantiation, Table 5 synthesizes the examples of design principles’ application along the phases and steps of the reference process.
Committing phase
The committing phase initiates the development of a data sharing community. Key steps include documenting the challenges and need for community development (as a basis for a “case for action”, DP1), defining the initial scope and purpose of the community (including the value proposition, DP2), organizing a pilot meeting to engage potential members, and collaboratively developing community guidelines (DP4) based on member perspectives.
At its core, this phase aims to identify pressing challenges and gather parties who see strong benefits from collectively managing data. For example, firms from the healthcare industry had to comply with regulatory requirements that obliged them to classify the healthcare professionals with whom they interacted. As every company was concerned by this requirement, the community members were able to define a “case for action” (DP1) that would clarify the community activities’ purpose. It is crucial to confirm that the cause is truly common and shared by multiple organizations and/or industries, and not imposed by one or a few companies. This can be achieved by means of surveys that validate the pain points, needs, and drivers.
Furthermore, the phase involves articulating a clear purpose for the community and delineating its goals, characteristics, and operational guidelines (DP4). As a senior stakeholder involved in the ADR iterations highlighted: “Looking back at HCO community development process, it would have been valuable to set up general community guideline (i.e. non-technical) at the beginning and then set up more technical guidelines (e.g. about the storage, etc.). It is also valuable to reuse existing guidelines such as DSC or others like GS1.”
Starting up phase
The starting up phase involves creating a cohesive framework for collaboration and data sharing. Among the key activities are the pre-analysis of external sources and data from community participants (constituting the baseline for the shared data assets, DP7), the development of a common business vocabulary and rulebook and the documentation of the community semantics (DP6), and the definition of shared procedures to continuously update them (DP8).
At its core, this phase aims to prepare the sharing of data assets and to provide the shared language and terminology required for members to share data practices. As an initial step, community members, supported by the intermediary, conduct thorough analyses to evaluate the relevance, quality, and suitability of data sources. This assessment ensures that only high-quality and relevant data are incorporated into the community’s shared resources, laying the groundwork for effective collaboration and data sharing. Then, the community collaboratively defines concepts, business rules, and data quality standards, leveraging methodologies such as SBVR (Semantics of Business Vocabulary and Business Rules; Object Management Group, 2019) to ensure coherence and consistency. By formalizing these standards, the community mitigates challenges stemming from organizational differences and fosters alignment in data sharing endeavors.
Another crucial aspect is the creation of a semantic wiki, which serves as a centralized repository for storing and accessing information related to the community’s vocabulary and rulebook (DP 6). By providing a centralized platform for sharing knowledge and resources, the semantic wiki enables community members to easily reference and contribute to the development of standards. The wiki goes beyond a traditional knowledge management system by leveraging semantic technologies, knowledge graphs, and the concepts of linked data (Auer et al., 2007; Bizer et al., 2009; Zaveri et al., 2016; Zuiderwijk et al., 2015). Figure 7 provides an example of a wiki article related to the data concept “business partner” in the context of BP-DSC, implemented using the open-source framework Semantic MediaWiki (Semantic-Mediawiki.Org, 2024). All data created within the Semantic MediaWiki can easily be exported or published via the Semantic Web, leveraging the standards RDF (Resource Description Framework, for data description), OWL (Web Ontology Language, for giving the RDF terms a formal meaning), and SPARQL (SPARQL Protocol and RDF Query Language, as a query language and protocol).
Fig. 7. Example of a wiki article representing a business partner concept
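To give an impression of what an RDF description of such a vocabulary entry could look like, the following minimal sketch serializes a concept as N-Triples statements. It does not use or reproduce the Semantic MediaWiki export: the `https://example.org/dsc/` namespace, the concept dictionary, its definition text, and the choice of RDFS and SKOS properties are assumptions made for illustration only.

```python
# Hypothetical namespace for the community vocabulary (illustration only).
BASE = "https://example.org/dsc/"

# Standard RDFS and SKOS property IRIs, chosen here as an assumption.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
SKOS_DEFINITION = "http://www.w3.org/2004/02/skos/core#definition"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"


def iri(value):
    """Wrap an IRI in angle brackets, as required by N-Triples."""
    return f"<{value}>"


def literal(text, lang="en"):
    """Build a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_ntriples(concept):
    """Return one N-Triples line per statement about the concept."""
    subject = iri(BASE + concept["id"])
    lines = [
        f"{subject} {iri(RDFS_LABEL)} {literal(concept['label'])} .",
        f"{subject} {iri(SKOS_DEFINITION)} {literal(concept['definition'])} .",
    ]
    for synonym in concept.get("synonyms", []):
        lines.append(f"{subject} {iri(SKOS_ALT_LABEL)} {literal(synonym)} .")
    if concept.get("parent"):
        lines.append(f"{subject} {iri(RDFS_SUBCLASS_OF)} {iri(BASE + concept['parent'])} .")
    return lines


# Hypothetical vocabulary entry; the definition is a placeholder, not the BP-DSC text.
business_partner = {
    "id": "BusinessPartner",
    "label": "Business partner",
    "definition": "An organization with which a member company has a business relationship.",
    "synonyms": ["Trading partner"],
    "parent": "Party",
}

for line in concept_to_ntriples(business_partner):
    print(line)
```

Statements of this kind cover the elements the community agrees on for each data object (definition, synonyms, and parent type) and could in principle be queried with SPARQL once loaded into a triple store.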
Operating phase
In the operating phase, the focus shifts to the execution of planned activities and the delivery of solutions or services outlined in the community’s roadmap. This phase involves developing and delivering the solution based on the predefined roadmap (DP7), the maintenance of exchanges within the community, measurement of learnings and success (DP2), and improving the community through iterations (all DPs but DP5).
One of the primary tasks in this phase is the development and delivery of solutions or services based on the predefined roadmap. At the BP-DSC, the intermediary—as moderator of the community—facilitates meetings between individual companies to align on technical infrastructures and reaffirm project goals, characteristics, and deadlines. Once the solution or service is developed, the intermediary presents it and implements it, making necessary adjustments based on feedback and evolving needs.
Effective communication is essential for maintaining engagement and facilitating exchanges within the community (DP 5). Community members rely on regular updates and transparent communication channels to stay informed about the progression of the community and any new developments. The intermediary’s role is also to ensure continuous and equal participation of members to sustain the community activities. Measuring the success and value of the community is also critical for its long-term sustainability (DP 2). This phase involves operationalizing and communicating KPIs to assess the effectiveness and impact of the community's activities. KPIs are developed to assess the progress and success of community initiatives (e.g., through quantitative and qualitative surveys), providing valuable insights into areas of improvement and opportunities for growth.
Finally, the operating phase also includes periodic reviews and adjustments to various aspects of the community to ensure its continued effectiveness and relevance. Community members collectively agree on potential iterations and refinements during community meetings, such as workshops, to share perspectives and make informed decisions. This iterative process allows the community to refine objectives, deliverables, and planning, as well as update community guidelines and semantic standards. Strategic decisions regarding community expansion are also considered, with input from senior community members to leverage their expertise and insights.
Supporting phase
This overarching phase aims at efficient collaboration among the BP-DSC members, which is facilitated through a set of organizational and technical practices (DP5). From an organizational standpoint, the community intermediary plays a crucial role in orchestrating community workshops, moderating discussions, and facilitating the exchange of best practices and experiences among members. These workshops serve as platforms for social interaction, knowledge sharing, and fostering a sense of community cohesion. By carefully planning and executing these workshops, the community intermediary ensures that community members remain engaged and motivated. Simultaneously, from a technical perspective, mechanisms are implemented to support seamless collaboration and integration of solutions within the community. This includes the use of collaboration tools such as Miro to promote interaction and engagement during workshops, as well as technical integrations to streamline data sharing and exchange processes. By implementing these technical solutions, the community enhances its capacity for effective collaboration and knowledge exchange, thereby maximizing the value derived from community initiatives.
Contribution, limitations, and implications
Contribution
Data sharing, and specifically cross-company data sharing, has gained momentum in information systems research, and our study contributes to this emerging stream of research. Specifically, we draw attention to horizontal data sharing communities, where organizations share complementary data assets with others, and uncover a new joint value proposition that focuses on the value derived from shared data management efforts. We leverage insights from a pioneer data sharing community, involving more than 40 multinational companies, to derive prescriptive knowledge in the form of an IS design theory for horizontal data sharing communities. Building on communities of practice (CoP) theory, the eight design principles that form the core of our design theory align with the three CoP characteristics: domain of interest (two DPs), community (three DPs), and shared practice (three DPs). The design theory, serving as a reusable meta-artifact for developing and operating data sharing communities, responds to longstanding calls for such guidelines (European Commission, 2018; Susha et al., 2019).
Our study also contributes to the current discourse on data ecosystems and complements the platform-centric perspective with a practice perspective that elucidates “the whole functioning of Data Ecosystem” (Oliveira et al., 2019, p. 620), and particularly actors’ roles (Azkan & Mã, 2022; Von Scherenberg et al., 2024). It thereby addresses open calls for research on a theory of data ecosystems, emphasizing “the proper coordination of various categories of actors and the provision of business support and stimulation of resources development and usage” (Oliveira et al., 2019, p. 620). The CoP concept provides a lens for understanding the community’s work and engagement, emphasizing the co-creation of practices, norms, and policies that are essential to sustaining the business models underlying horizontal data sharing. For horizontal data sharing communities, our research reveals that the development of shared semantics and data management practices is key to improving the value of shared data assets.
As a participatory approach, CoP not only fosters a sense of ownership among the community members but also ensures that the practices are tailored to the unique needs and dynamics of the community. Central to this success is the intermediary. While the brokering role of intermediaries has been extensively discussed in research, our findings underline that their duties extend beyond operating a data sharing platform. As reflected in the design principles, the intermediary’s role is pivotal in sustaining community engagement and trust between members, supporting the development and documentation of shared data practices, and monitoring and reporting the value of community activities. Our findings thereby underline the intermediary’s critical role in facilitating the effective functioning of the community and sustaining value creation within it (Möller et al., 2024; Otto et al., 2019; Schweihoff et al., 2024).
Through the instantiation of our design theory, we show that data sharing communities thrive when members continually recognize a shared value across three growth stages: committing, starting up, and operating. By addressing the data ecosystem’s life cycle, we also extend existing research, which has mainly centered on the emergence of data ecosystems. Our findings therefore offer critical insights into how to sustain the success of data sharing communities by addressing one of the core challenges faced by data ecosystems: ensuring that members continually perceive value in their participation, thereby preventing the erosion of engagement and the eventual collapse of the ecosystem (Aaen et al., 2022; Gelhaar & Otto, 2020; Susha et al., 2023). Altogether, our findings thus provide additional clarity on how “to design, maintain, or make Data Ecosystems evolve” (Oliveira et al., 2019, p. 620).
Limitations
While our contributions are substantial, it is important to recognize several limitations. The design theory is grounded in the specific context of the Business Partner Data Sharing Community (BP-DSC) and its sub-community (HCO-DSC), which consist of large multinational firms with mature data management practices. Our findings may therefore reflect the specific needs and capacities of such organizations. Moreover, our study focused on horizontal data sharing of business partner master data, coupled with joint data maintenance practices. While the design principles we developed offer generalizable insights, they have not yet been applied to other data assets. These may come with varying levels of sensitivity, legal requirements, and trust concerns, necessitating tailored approaches beyond our current design principles. For instance, in sectors like healthcare, stricter data protection laws may require additional compliance mechanisms, which could, in turn, necessitate refinements to the design theory. Hence, future research may need to extend our design principles to different data types and sectors. Studies on smaller or resource-constrained firms could assess the scalability and flexibility of these principles, offering insights into their applicability across varied organizational contexts.
Implications
Our research presents a first step towards extending the practice perspective in data sharing research. We believe that the application of the CoP lens within data ecosystems offers substantial potential for groundbreaking research in this field. A detailed exploration of data sharing communities as virtual CoPs—considering their unique structuring characteristics such as demographics, organizational settings, membership profiles, and technological contexts (Dube et al., 2006)—could uncover nuanced insights into their collective value proposition, whether more operational or strategic, and the specific data practices that emerge. As situated learning is a cornerstone of CoP participation (Nicolini et al., 2022), subsequent research could explore the intricacies of inter-organizational knowledge sharing, particularly how individual members profit from interactions with experts from other firms. The boundary objects and practices that define CoP membership are also worthy of further investigation, as they help distinguish and structure participation within these communities. Finally, understanding and optimizing the critical role of intermediaries is essential for the sustainable growth of data sharing communities, presenting a rich area for future exploration.
For the emerging forms of horizontal data sharing, our research emphasizes the development of shared practices, specifically related to semantics, expansion of data assets, and data management. Our findings may be particularly interesting for data collaboratives (Fassnacht et al., 2024; Ruijer, 2021; Susha et al., 2017a). Although data collaboratives are traditionally viewed in the context of addressing public or societal issues through data sharing, their components (such as shared motivation, structural formalization, collaborative activities, tensions, and outcomes) align closely with the conceptualization of data sharing communities as communities of practice. This parallel suggests that our design theory may inform and complement the inter-organizational governance principles specifically for business-oriented horizontal data sharing.
Our design theory, rooted in the concrete challenges organizations face when sharing data with other parties, holds significant relevance for practitioners. The design theory, with its expository instantiation as a reference process, provides practitioners, particularly intermediaries and service providers, with actionable support to overcome current challenges and address key factors essential to the success of data sharing communities.
Acknowledgements
The authors would like to thank CDQ, the members of the CDQ Data Sharing Community and the Competence Center Corporate Data Quality (CC CDQ) for their support and contributions.
Funding
Open access funding provided by University of Lausanne.
Data Availability
More details on the Business Partner Data Sharing Community are available in a PhD dissertation by Schlosser (2017) and upon request from the authors.
The first version of the design principles was published in a conference paper (Lefebvre et al., 2023).
Markus Bick
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Aaen, J; Nielsen, JA; Carugati, A. The dark side of data ecosystems: A longitudinal study of the DAMD project. European Journal of Information Systems; 2022; 31,
Aaltonen, A; Alaimo, C; Kallinikos, J. The making of data commodities: Data analytics as an embedded process. Journal of Management Information Systems; 2021; 38,
Aaltonen, A., Alaimo, C., Parmiggiani, E., Stelmaszak, M., Jarvenpaa, S. L., Kallinikos, J., & Monteiro, E. (2023). What is missing from research on data in information systems? Insights from the inaugural workshop on data research. Communications of the Association for Information Systems, 53, 15. https://doi.org/10.17705/1CAIS.05320
Abbas, AE; Van Velzen, T; Ofe, H; Van De Kaa, G; Zuiderwijk, A; De Reuver, M. Beyond control over data: Conceptualizing data sovereignty from a social contract perspective. Electronic Markets; 2024; 34, 20. [DOI: https://dx.doi.org/10.1007/s12525-024-00695-2]
Auer, S; Bizer, C; Kobilarov, G; Lehmann, J; Cyganiak, R; Ives, Z. DBpedia: A nucleus for a web of open data. The Semantic Web; 2007; 4825, pp. 722-735. [DOI: https://dx.doi.org/10.1007/978-3-540-76298-0_52]
Azkan, C., & Mã, F. (2022). Hunting the treasure: Modeling data ecosystem value co-creation. Proceedings of the 43rd International Conference on Information Systems.
Bizer, C; Heath, T; Berners-Lee, T. Linked data—The story so far. International Journal on Semantic Web and Information Systems; 2009; 5,
Blakely, E. J. (2006). Fortress America separate and not equal. In R. H. Platt (Ed.), The human metropolis: People and nature in the 21st-century city (Vol. 197). University of Massachusetts Press in association with the Lincoln Institute of Land Policy.
Buchanan, JM. An economic theory of clubs. Economica; 1965; 32,
Bühler, MM; Calzada, I; Cane, I; Jelinek, T; Kapoor, A; Mannan, M; Mehta, S; Mookerje, V; Nübel, K; Pentland, A; Scholz, T; Siddarth, D; Tait, J; Vaitla, B; Zhu, J. Unlocking the power of digital commons: Data cooperatives as a pathway for data sovereign, innovative and equitable digital communities. Digital; 2023; 3,
CDQ. (2024, March 22). CDQ.com. https://www.cdq.com/solutions/data-sharing-community
European Commission. (2018). Study on data sharing between companies in Europe. EU Publications. https://doi.org/10.2759/634327
European Commission. (2022, February 23). Data act: Measures for a fair and innovative data economy. Ec.Europa.Eu. https://ec.europa.eu/commission/presscorner/detail/en/ip_22_1113
Dube, L; Bourhis, A; Jacob, R. Towards a typology of virtual communities of practice. Interdisciplinary Journal of Information, Knowledge, and Management; 2006; 1, pp. 069-093. [DOI: https://dx.doi.org/10.28945/115]
EIP-AGRI - European Commission. (2018, November 18). JoinData: The future of smart farming [Text]. EIP-AGRI - European Commission. https://ec.europa.eu/eip/agriculture/en/find-connect/online-resources/joindata-future-smart-farming
European Commission. (2023). European data spaces and the role of data.europa.eu. (Nos. 978–92–78–43822–7). Publications office. https://data.europa.eu/doi/https://doi.org/10.2830/1603
Fassnacht, M; Leimstoll, J; Benz, C; Heinz, D; Satzger, G. Data sharing practices: The interplay of data, organizational structures, and network dynamics. Electronic Markets; 2024; 34, 47. [DOI: https://dx.doi.org/10.1007/s12525-024-00732-0]
Fassnacht, M., Benz, C., Heinz, D., Leimstoll, J., & Satzger, G. (2023). Barriers to data sharing among private sector organizations. Proceedings of the 56th Hawaii international conference on system sciences (HICSS). https://doi.org/10.24251/HICSS.2023.453
Gartner. (2021, May 20). Data sharing is a business necessity to accelerate digital business. https://www.gartner.com/smarterwithgartner/-is-a-business-necessity-to-accelerate-digital-business
Gelhaar, J., & Otto, B. (2020). Challenges in the emergence of data ecosystems. Proceedings of the 23rd pacific Asia conference on information systems.
Gelhaar, J., Groß, T., & Otto, B. (2021). A taxonomy for data ecosystems. Proceedings of the 54th Hawaii international conference on system sciences. https://doi.org/10.24251/HICSS.2021.739
Gelhaar, J., Bergmann, N., Müller, P., & Dogan, R. (2023). Motives and Incentives for data sharing in industrial data ecosystems: An explorative single case study. Proceedings of the 56th Hawaii international conference on system sciences (HICSS). https://doi.org/10.24251/HICSS.2023.454
Gregor, S. The nature of theory in information systems. MIS Quarterly; 2006; 30,
Gregor, S; Jones, D. The anatomy of a design theory. Journal of the Association for Information Systems; 2007; 8,
Gregor, S; Kruse, L; Seidel, S. Research perspectives: The anatomy of a design principle. Journal of the Association for Information Systems; 2020; 21,
GS1. (2024, November 5). https://www.gs1.org/
Hale, S. S., Miglarese, A. H., Bradley, M. P., Belton, T. J., Cooper, L. D., Frame, M. T., Friel, C. A., Harwell, L. M., King, R. E., & Michener, W. K. (2003). Managing troubled data: Coastal data partnerships smooth data integration. In Coastal Monitoring through Partnerships (pp. 133–148). Springer.
Jarke, M; Otto, B; Ram, S. Data sovereignty and data space ecosystems. Business & Information Systems Engineering; 2019; 61,
Jarvenpaa, S. L., & Markus, M. L. (2020). Data sourcing and data partnerships: Opportunities for IS sourcing research. In J. Dibbern, R. Hirschheim, A. Heinzl, & J. Dibbern (Eds.), Information Systems Outsourcing (5th ed., pp. 61–79). Springer.
Jussen, I., Fassnacht, M., Schweihoff, J., & Möller, F. (2024a). Reaching for the stars: Exploring value constellations in inter-organizational data sharing. Proceedings of the 32nd European Conference on Information Systems (ECIS), 1–17.
Jussen, I; Möller, F; Schweihoff, J; Gieß, A; Giussani, G; Otto, B. Issues in inter-organizational data sharing: Findings from practice and research challenges. Data & Knowledge Engineering; 2024; 150, 102280. [DOI: https://dx.doi.org/10.1016/j.datak.2024.102280]
Jussen, I., Schweihoff, J., Dahms, V., Möller, F., & Otto, B. (2023). Data sharing fundamentals: Characteristics and definition. Proceedings of the 56th Hawaii International Conference on System Sciences (HICSS).
Katz, R; Allen, TJ. Investigating the not invented here (NIH) syndrome: A look at the performance, tenure, and communication patterns of 50 R & D Project Groups. R&D Management; 1982; 12,
Krasikov, P., Eurich, M., & Legner, C. (2022). Unleashing the potential of external data: A DSR-based approach to data sourcing. Proceedings of the 30th European Conference on Information Systems.
Krasikov, P; Legner, C. Introducing a data perspective to sustainability: How companies develop data sourcing practices for sustainability initiatives. Communications of the Association for Information Systems; 2023; 53,
Lefebvre, H., Krasikov, P., Flourac, G., & Legner, C. (2022). Toward cross-company value generation from data: Investigating the role of data sharing communities. Proceedings of the Pre-ICIS Workshop of the AIM.
Lefebvre, H., Flourac, G., Krasikov, P., & Legner, C. (2023). Toward cross-company value generation from data: Design principles for developing and operating data sharing communities. In Gerber, A., & Baskerville, R. (Eds.), Design Science Research for a New Society: Society 5.0 (Vol. 13873, pp. 33–49). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-32808-4_3
Legner, C; Schemm, JW. Toward the inter-organizational product information supply chain–evidence from the retail and consumer goods industries. Journal of the Association for Information Systems; 2008; 9,
Legner, C. (2019). Data sharing in business ecosystems. TREO talks in conjunction with the 40th international conference on information systems (ICIS).
Loebbecke, C; Picot, A. Reflections on societal and business model transformation arising from digitization and big data analytics: A research agenda. The Journal of Strategic Information Systems; 2015; 24,
Lopez-Carreiro, I; Monzon, A; Lopez-Lambas, ME. Comparison of the willingness to adopt MaaS in Madrid (Spain) and Randstad (The Netherlands) metropolitan areas. Transportation Research Part a: Policy and Practice; 2021; 152, pp. 275-294. [DOI: https://dx.doi.org/10.1016/j.tra.2021.08.015]
Manzi, T; Smith-Bowers, B. Gated communities as club goods: Segregation or social cohesion?. Housing Studies; 2005; 20,
Möller, F; Jussen, I; Springer, V; Gieß, A; Schweihoff, JC; Gelhaar, J; Guggenberger, T; Otto, B. Industrial data ecosystems and data spaces. Electronic Markets; 2024; 34,
Möller, F., Guggenberger, T. M., & Otto, B. (2020). Towards a method for design principle development in information systems. In S. Hofmann, O. Müller, & M. Rossi (Eds.), Designing for Digital Transformation. Co-Creating Services with Citizens and Industry (Vol. 12388, pp. 208–220). Springer International Publishing. https://doi.org/10.1007/978-3-030-64823-7_20
Nicolini, D; Pyrko, I; Omidvar, O; Spanellis, A. Understanding communities of practice: Taking stock and moving forward. Academy of Management Annals; 2022; 16,
Object Management Group. (2019). Semantics of business vocabulary and business rules (No. formal/2019–10–02 [SMSC/19–10–02]). https://www.omg.org/spec/SBVR/1.5/PDF
Oliveira, MI; Barros Lima, GDF; Farias Lóscio, B. Investigations into data ecosystems: A systematic mapping study. Knowledge and Information Systems; 2019; 61,
Orlikowski, WJ. Knowing in practice: Enacting a collective capability in distributed organizing. Organization Science; 2002; 13,
Österle, H; Otto, B. Consortium research: A method for researcher-practitioner collaboration in design-oriented IS research. Business & Information Systems Engineering; 2010; 2,
Otto, B., & Aier, S. (2013). Business models in the data economy: A case study from the business partner data domain. Proceedings of the 11th International Conference on Wirtschaftsinformatik.
Otto, B., Lis, D., Jürjens, J., Cirullies, J., Howar, F., Meister, S., Spiekermann, M., Pettenpohl, H., Möller, F., Rehof, J., & Opriel, S. (2019). Data ecosystems. Conceptual foundations, constituents and recommendations for action. Fraunhofer Institute for Software and Systems Engineering ISST. https://doi.org/10.24406/isst-n-634865
Otto, B; Jarke, M. Designing a multi-sided data platform: Findings from the international data spaces case. Electronic Markets; 2019; 29,
Prat, N., Comyn-Wattiau, I., & Akoka, J. (2015). A taxonomy of evaluation methods for information systems artifacts. Journal of Management Information Systems, 32(3), Article 3. https://doi.org/10.1080/07421222.2015.1099390
Richter, H; Slowinski, PR. The data sharing economy: On the emergence of new intermediaries. IIC - International Review of Intellectual Property and Competition Law; 2019; 50,
Ruijer, E. Designing and implementing data collaboratives: A governance perspective. Government Information Quarterly; 2021; 38,
Savage, CJ; Vickers, AJ. Empirical study of data sharing by authors publishing in PLoS journals. PLoS ONE; 2009; 4,
Schlosser, S. R. (2017). Design principles for collaborative data services. University of St. Gallen, St. Gallen.
Schweihoff, J; Lipovetskaja, A; Jussen-Lengersdorf, I; Möller, F. Stuck in the middle with you: Conceptualizing data intermediaries and data intermediation services. Electronic Markets; 2024; 34,
Sein, MK; Henfridsson, O; Purao, S; Rossi, M; Lindgren, R. Action design research. MIS Quarterly; 2011; 35,
Semantic-mediawiki.org. (2024, October 9). Semantic MediaWiki. https://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki
Skatova, A., Ng, E., & Goulding, J. (2014). Data donation: Sharing personal data for public good? 1–3. https://doi.org/10.13140/2.1.2567.8405
Support Centre for Data Sharing. (2022). Final report support centre for data sharing. https://dssc.eu/download/attachments/408059908/Support%20Centre%20for%20Data%20Sharing%20Final%20Report.pdf?download=true
Susha, I; Grönlund, Å; Van Tulder, R. Data driven social partnerships: Exploring an emergent trend in search of research challenges and questions. Government Information Quarterly; 2019; 36,
Susha, I; Rukanova, B; Zuiderwijk, A; Gil-Garcia, JR; Gasco Hernandez, M. Achieving voluntary data sharing in cross sector partnerships: Three partnership models. Information and Organization; 2023; 33,
Susha, I., Janssen, M., & Verhulst, S. (2017a). Data collaboratives as a new Frontier of cross-sector partnerships in the age of open data: Taxonomy development. Proceedings of the 50th Hawaii International Conference on System Sciences. https://doi.org/10.24251/HICSS.2017.325
Susha, I., Janssen, M., & Verhulst, S. (2017b). Data collaboratives as “bazaars”? A review of coordination problems and mechanisms to match demand for data with supply. Transforming Government: People, Process and Policy, 11(1), 157–172. https://doi.org/10.1108/TG-01-2017-0007
Taylor, P. L., & Mandl, K. D. (2015). Leaping the data chasm: Structuring donation of clinical data for healthcare innovation and modeling. Harvard health policy review: A student publication of the Harvard interfaculty initiative in health policy, 14(2), Article 2.
Tenopir, C; Allard, S; Douglass, K; Aydinoglu, AU; Wu, L; Read, E; Manoff, M; Frame, M. Data sharing by scientists: Practices and perceptions. PLoS ONE; 2011; 6,
The Gov Lab. (2021, December 29). Data collaboratives guide. https://datacollaboratives.org/
Von Scherenberg, F; Hellmeier, M; Otto, B. Data sovereignty in information systems. Electronic Markets; 2024; 34,
Wenger, E. Communities of practice: Learning, meaning, and identity (learning in doing: Social, cognitive and computational perspectives); 1998; Cambridge University Press:
Wenger, E., McDermott, R. A., & Snyder, W. (2002). Cultivating communities of practice: A guide to managing knowledge. Harvard Business School Press.
Weyzen, R., van Hesteren, D., & Huyer, H. (2021). Advanced technologies for industry – AT WATCH: Technology focus on data sharing. European Commission. https://monitor-industrial-ecosystems.ec.europa.eu/sites/default/files/2021-04/Technology%20Focus%20on%20Data%20sharing.pdf
Wixom, B., Sebastian, I., & Gregory, R. (2020). Data sharing 2.0: New data sharing, new value creation. MIT Sloan Center for Information Systems Research.
World Economic Forum. (2015). Data-driven development: Pathways for progress. Weforum.Org. https://www3.weforum.org/docs/WEFUSA_DataDrivenDevelopment_Report2015.pdf
Zaveri, A; Rula, A; Maurino, A; Pietrobon, R; Lehmann, J; Auer, S. Quality assessment for linked data: A survey. Semantic Web; 2016; 7,
Zuiderwijk, A., Janssen, M., Poulis, K., & van de Kaa, G. (2015). Open data for competitive advantage: Insights from open data use by companies. Proceedings of the 16th Annual International Conference on Digital Government Research, 79–88. https://doi.org/10.1145/2757401.2757411
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.