Big data is ubiquitous; when used by governments, it can have significant, sometimes existential consequences for individuals, communities, or even the nation. This article is adapted from a study of the ethical challenges that institutions, particularly government agencies, face in reconciling their diverse legal obligations to agency "missions" with the sometimes conflicting requirements of the Federal Privacy Act, other statutory and policy protections, and the overarching ethical obligations confronting stewards of big data sets and their counsel. What considerations apply to the tools and techniques used to analyze this data? How should big data stewards address the differing requirements and obligations associated with the data life cycle (the collection, stewardship, use, and dissemination of data), particularly through passive collection, predictive algorithms, and other machine learning tools?
In 2011, big data was listed on Gartner's Hype Cycle for Emerging Trends1 in the "On the Rise" category. By 2015, big data had dropped off Gartner's Emerging Trends Hype Cycle altogether.2 According to Betsy Burton, who authored the 2015 study, "big data has quickly moved over the hype curve's 'Peak of Inflated Expectations' . . . and has become prevalent in our lives. . . ."3 Although some believe this declaration was premature, data science based on big data is playing an increasingly important role in organizations' efforts, for example, to better understand the habits of their customers or users and to increase operational efficiency.
What is big data? Although big data does not have an authoritative definition, there are three commonly agreed-upon attributes:
* Volume: there is a large amount of data.
* Velocity: the data is ingested rapidly, in some cases in real time.
* Variety: the data comes in varying types (e.g., structured and unstructured) and may come from disparate sources.
The volume, velocity, and variety of the data make it unsuitable for processing by traditional relational database applications. This article focuses on practices with the following three characteristics:
* Big data (defined above) is processed using big data tools (e.g., Hadoop).
* The data includes personal information.
* Algorithms and modeling are used to derive "hidden, meaningful" information from the data. To clarify, an algorithm is a set of rules to be followed in order to solve a problem. A model is...