Abstract
Nowadays, big data is a key component in (bio)medical research. However, the term has no formal definition, and its meaning is subject to a wide array of opinions. This hampers communication and leads to missed opportunities. For example, in the (bio)medical field we have observed many different interpretations, some of which carry a negative connotation that impedes the exploitation of big data approaches. In this paper we pursue a better understanding of the term big data through a data-driven, systematic approach using text analysis of scientific (bio)medical literature. We attempt to find how existing big data definitions are expressed within the chosen application domain. We build upon the findings of previous qualitative research by De Mauro et al. (Lib Rev 65: 122–135, 14), which analysed fifteen definitions and identified four key big data themes (i.e., information, methods, technology, and impact). We have revisited these and other definitions of big data and consolidated them into eight additional themes, resulting in a total of twelve themes. The corpus was composed of paper abstracts extracted from (bio)medical literature databases by searching for ‘big data’. After text pre-processing and parameter selection, topic modelling was applied with 25 topics. The resulting top-20 words per topic were annotated with the twelve big data themes by seven observers. The analysis of these annotations shows that the themes proposed by De Mauro et al. are strongly expressed in the corpus. Furthermore, several of the most popular big data V’s (i.e., volume, velocity, and value) also have a relatively high presence. Other V’s introduced more recently (e.g., variability) were, however, hardly found in the 25 topics. These findings show that the current understanding of big data within the (bio)medical domain is in agreement with more general definitions of the term.
Details
1 Department of Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands




