Data analytics plays a central role in deriving useful information from the large amounts of data available online across a variety of domains and applications. Analytics employs a wide array of methods, ranging from classical statistical techniques to those exploiting the visual and cognitive capabilities of human users. In spite of these capabilities, analytics at present seems to suffer from significant limitations in dealing with unstructured data and knowledge. This article explores these limitations and defines key requirements to be met by future developments in analytics. The article concludes with a sketch of true knowledge analytics, which is capable of delivering insights from knowledge structures, not just tabular data.
Keywords: Analytics, Unstructured data, Knowledge, Capabilities, Limitations
1 INTRODUCTION
One of the key benefits of computerization is ready access to data. Extracting useful information from data has long been a challenge for the field of information science. In the early stages of computerized data processing, data analysis in both scientific and business computing was carried out by applying well-known statistical methods. Although many tools were developed, data analysis required expertise in both statistics and data processing. With the advent of the World Wide Web, social media and ubiquitous online access to data through personal computing devices and smartphones, data analytics has come within the reach of everybody. Further advancements in programming and graphics technologies have made it possible to run analytical methods on current data and to generate colorful graphical renderings in real time. Open data initiatives of governments as well as non-governmental organizations have further democratized access to data by placing massive data repositories in the public domain. Ordinary citizens are beginning to look for useful trends, patterns and insightful guidance in such data sources.
The key question is whether analytics is ready to meet this challenge. Analytics thus far has focused mainly on well-managed, structured data, that is, collections of pieces of information in well-defined formats, typically found in spreadsheets or databases. Such well-structured data can be readily classified into nominal, ordinal, interval and ratio data, making it amenable to classical statistical analysis techniques. Unstructured data comprises content in varied formats such as documents, images, emails, tweets, videos, blog posts and so on. Unstructured data is...





