Abstract
Big Data storage provides a repository for sizeable data sets, unlocking the potential of Big Data for myriad applications. To realize that value fully, the storage infrastructure must provide reliable storage space while delivering access for query and analysis. Distributed File Systems lie at the core of many Big Data repositories. In a large Distributed File System, thousands of servers host directly attached storage to meet user application requirements; as a result, the physical data volume search space can be immense and dispersed. The increased workload of searching for and identifying data evidence in so large a space constrains data recovery in digital forensic investigations. This research proposes to formally model high-level events of the Hadoop Distributed File System abstraction layer in order to generate persistent properties of low-level events. By this method, the digital forensic concepts of information extraction and analysis are shifted forward in the investigative process to facilitate digital forensic triage. Through the development of a formal digital forensic investigative concept model, high-level event reconstruction is presented as an integration of digital forensic triage within a general digital forensic investigative process. Reconstruction of low-level events is generally reserved for investigative phases that follow the identification and collection of potential data evidence. By exploiting the order in which data structures are materialized, it is possible to generate knowledge base facts that are used to form and test hypotheses about previous states; this establishes the occurrence of high-level events, which in turn provides an understanding of low-level event facts. By applying a high-level reconstruction to emulate low-level forensic tool output, the proposed approach provides information that is ordinarily available only later in the general investigative process.
This logical understanding of low-level data storage properties helps to narrow the search space for relevant data volumes, generates event timelines without reliance on physical clocks, and provides knowledge of facts that support the investigative phases holistically. Because the information extracted early is useful throughout the investigation, it reduces the overall latency of the investigative process.