Abstract

Nowadays, big data is a key component in (bio)medical research. However, the meaning of the term is subject to a wide array of opinions, without a formal definition. This hampers communication and leads to missed opportunities. For example, in the (bio)medical field we have observed many different interpretations, some of which have a negative connotation, impeding exploitation of big data approaches. In this paper we pursue a better understanding of the term big data through a data-driven systematic approach using text analysis of scientific (bio)medical literature. We attempt to find how existing big data definitions are expressed within the chosen application domain. We build upon findings of previous qualitative research by De Mauro et al. (Lib Rev 65: 122–135, 14), which analysed fifteen definitions and identified four key big data themes (i.e., information, methods, technology, and impact). We have revisited these and other definitions of big data, and consolidated them into eight additional themes, resulting in a total of twelve themes. The corpus was composed of paper abstracts extracted from (bio)medical literature databases, searching for ‘big data’. After text pre-processing and parameter selection, topic modelling was applied with 25 topics. The resulting top-20 words per topic were annotated with the twelve big data themes by seven observers. The analysis of these annotations shows that the themes proposed by De Mauro et al. are strongly expressed in the corpus. Furthermore, several of the most popular big data V’s (i.e., volume, velocity, and value) also have a relatively high presence. Other V’s introduced more recently (e.g., variability) were, however, hardly found in the 25 topics. These findings show that the current understanding of big data within the (bio)medical domain is in agreement with more general definitions of the term.

Details

Title
Understanding big data themes from scientific biomedical literature through topic modeling
Author
van Altena, Allard J 1; Moerland, Perry D 1; Zwinderman, Aeilko H 1; Olabarriaga, Sílvia D 1

1 Department of Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands
Pages
1-21
Publication year
2016
Publication date
Nov 2016
Publisher
Springer Nature B.V.
e-ISSN
2196-1115
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1987907331
Copyright
Journal of Big Data is a copyright of Springer (2016). All Rights Reserved.