Making data open and reusable is a central but challenging goal of open data initiatives. Data reuse is often regarded as an important, value-realizing stage of the research data life cycle, and reusability is a core element of the FAIR data principles, which together form a widely adopted framework for data management and stewardship. However, promoting effective data sharing and reuse across research communities requires an understanding of researchers' actual data reuse practices. Although prior studies have examined these practices in fields with well-established data sharing cultures, including astronomy, the earth and environmental sciences, and parts of the social sciences, many other disciplines remain unexplored. There is a growing need for research that examines data reuse practices more systematically and in greater depth across a broader range of fields, especially fields that also have longstanding traditions of data sharing and reuse.
This dissertation examines the landscape of data reuse practices within the Information Retrieval (IR) research community. IR has a long history of reusing shared data and of providing system-based experimental data for reuse, yet its data practices have never been systematically studied. It also presents an interesting case because it is a research domain whose members are trained in a wide range of disciplines. Drawing on two rounds of semi-structured interviews with 36 participants, this study explores the purposes of data reuse, the ways researchers discover and access data, the incentives and disincentives influencing sharing and reuse, the decision-making processes involved, and the broader practices surrounding data reuse. It identifies three primary purposes for which IR researchers reuse data (exploratory, verificatory, and preparatory), thereby broadening the typological framework for understanding why researchers reuse others' data. Regarding data discovery and access, the study finds that IR researchers primarily operate at the individual level, relying on heterogeneous practices shaped by their research areas, institutional affiliations, and disciplinary backgrounds. It further demonstrates that data reuse decisions are not made in a single step but instead unfold through a multi-stage process. Five key stages of decision-making are identified: methodology appropriateness evaluation, trustworthiness evaluation, reusability screening, further reusability evaluation, and compliance evaluation. Moreover, this study highlights how disciplinary context influences researchers' approaches to data reuse. Through this analysis, it contributes to studies of data sharing and reuse by demonstrating that data reuse behaviors are shaped not only by individual preferences and risk assessments but also by collective consensus, incentives, and the epistemic norms of researchers' communities.
This work aims to encourage scholars, practitioners, and infrastructure designers to engage in efforts to foster a sustainable culture of data reuse. Continued research in this area is critical for developing the protocols, standards, and knowledge infrastructures needed to support seamless and meaningful data sharing and reuse, not only within IR but across diverse research communities. The dissertation concludes by offering four directions for future research: studying how community-level factors influence data reuse practices at the individual level; examining how researchers search for and discover data; expanding the focus beyond research data to other shareable resources such as code, experimental designs, and AI models; and exploring how AI technologies are changing data practices.