The predictive analytics landscape covers a wide variety of techniques and methods designed to derive insights from data. These techniques, which include statistical modeling methods, classification rules, forecasting techniques, simulation models, and machine learning tools, have been used successfully for many years on structured data (data that consists of numeric or categorical attributes, where the number of categories is limited). In recent times, the volume and variety of data available for analysis have exploded, and most of this data is in non-traditional forms that the traditional techniques were not designed to handle.
This article describes how you can transform non-traditional data, such as unstructured data (text) or semi-structured data (networks), into a structured form that you can then use to augment traditional data. Combining both types of data provides greater opportunities for actionable insight.
Text Data
Traditional predictive modeling tools use structured data to predict a response variable, such as the likelihood of responding to a credit card offer, the probability of defaulting on a loan, or the possibility of reacting adversely to a drug treatment. Often, these applications include many sources of unstructured data that, until recently, have gone untapped. One of the most commonly available forms of such data is textual data, such as call center notes, warranty claims, survey responses, social media data, and blogs and tweets about new product releases.
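As a rough illustration of turning text into structured attributes, the minimal sketch below converts free-text notes into term-count columns and appends them to existing structured records. It is only one simple way to do this (a bag-of-words count), not necessarily the method this article goes on to describe. The customer records, the call center notes, the stop-word list, and all function names are hypothetical and exist only for the example.

```python
import re
from collections import Counter

# Hypothetical structured records: numeric and categorical attributes per customer.
structured = [
    {"customer_id": 1, "tenure_months": 12, "plan": "basic"},
    {"customer_id": 2, "tenure_months": 40, "plan": "premium"},
    {"customer_id": 3, "tenure_months": 5, "plan": "basic"},
]

# Hypothetical call center notes keyed by the same customer_id.
notes = {
    1: "Customer reported billing error and asked for refund.",
    2: "Happy with service, asked about upgrade options.",
    3: "Billing error again; customer threatened to cancel.",
}

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"and", "for", "with", "about", "to", "the", "a"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_vocabulary(documents, min_doc_count=2):
    """Keep terms that appear in at least min_doc_count documents, sorted by name."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return sorted(term for term, n in doc_freq.items() if n >= min_doc_count)


def augment(records, notes_by_id, vocabulary):
    """Return copies of the records with one term-count column per vocabulary term."""
    augmented = []
    for record in records:
        counts = Counter(tokenize(notes_by_id.get(record["customer_id"], "")))
        row = dict(record)
        for term in vocabulary:
            row[f"term_{term}"] = counts[term]
        augmented.append(row)
    return augmented


vocabulary = build_vocabulary(notes.values())
augmented_rows = augment(structured, notes, vocabulary)

for row in augmented_rows:
    print(row)
```

Each output row keeps its original structured attributes and gains numeric columns such as `term_billing`, which a traditional predictive modeling tool can treat like any other input. Limiting the vocabulary to terms that occur in several documents is a design choice that keeps the number of new columns manageable.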
An illustrative example of how such text...





