Government entities tasked with both regulatory enforcement and data analysis have a growing number of data sources at their disposal. However, the data now at their fingertips can be complex, unstructured and difficult to manage. Managing raw data that arrive from different data streams calls for a systematic approach to any project that requires data modeling. Predictive analytics is such an approach: a set of methodologies for handling the large-scale, data-driven problems that many government entities face. It is an iterative process that combines the statistical methods of sampling, model estimation, model prediction and evaluation into a cohesive system for targeting fraud, waste, abuse and other outcomes of interest to government agencies. It can assist agencies with decision- and policy-making in areas ranging from audit selection to regulatory enforcement.
Firms are implementing varied predictive analytics systems in both the federal and private sectors with a great deal of success. In the federal sector, structured modeling techniques have served as predictive analytics solutions. Examples include a probabilistic simulation tool that determines the impact of various economic scenarios on mortgage insurance fund performance, a multistate model that predicts the number of defaulted loans in a federal credit agency's large loan portfolio, and a risk-ranking model that assists with enforcement of regulatory compliance by ranking enforcement subjects by their probability of a compliance violation. Private-sector examples include less structured work, such as text mining analytics that helps a document management corporation categorize its collection of written and scanned documents. Methodologies in sampling, model estimation and evaluation differ greatly across these examples. What they share is clear: the effective use of data is increasing efficiency and transparency in both the public and private sectors.
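To make the risk-ranking example concrete, the following is a minimal sketch of the sample, estimate, predict and evaluate loop described above. Everything in it is an assumption made for illustration: the data are synthetic, the two features (`late_filings`, `prior_findings`) and their coefficients are invented, and logistic regression stands in for whatever model an agency would actually use. It is not a description of any of the systems mentioned in this article.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Predicted probability of a violation for feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def make_subjects(n, rng):
    """Synthetic enforcement subjects; features and coefficients are invented."""
    subjects = []
    for i in range(n):
        late_filings = rng.randint(0, 5)      # hypothetical feature
        prior_findings = rng.randint(0, 3)    # hypothetical feature
        p = sigmoid(-3.0 + 0.6 * late_filings + 0.9 * prior_findings)
        subjects.append({
            "id": i,
            "x": [1.0, float(late_filings), float(prior_findings)],  # 1.0 = intercept
            "y": 1 if rng.random() < p else 0,  # 1 = violation observed
        })
    return subjects


def fit_logistic(rows, epochs=300, lr=0.05):
    """Model estimation: logistic regression by batch gradient ascent."""
    w = [0.0] * len(rows[0]["x"])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for r in rows:
            err = r["y"] - score(w, r["x"])
            for j, xj in enumerate(r["x"]):
                grad[j] += err * xj
        w = [wi + lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def rank_by_risk(rows, w):
    """Model prediction: score each subject and sort from highest risk down."""
    scored = [(score(w, r["x"]), r) for r in rows]
    return sorted(scored, key=lambda t: t[0], reverse=True)


def run_demo(seed=0, n=2000):
    rng = random.Random(seed)
    subjects = make_subjects(n, rng)

    # Sampling: random 70/30 split into estimation and holdout samples.
    rng.shuffle(subjects)
    cut = (7 * n) // 10
    train, holdout = subjects[:cut], subjects[cut:]

    w = fit_logistic(train)
    ranked = rank_by_risk(holdout, w)

    # Evaluation: violation rate in the top decile of the ranking
    # versus the rate in the holdout sample as a whole.
    k = max(1, len(ranked) // 10)
    top_rate = sum(r["y"] for _, r in ranked[:k]) / k
    base_rate = sum(r["y"] for r in holdout) / len(holdout)
    return {"weights": w, "ranked": ranked,
            "top_rate": top_rate, "base_rate": base_rate}


result = run_demo()
print("violation rate, top decile of ranking:", round(result["top_rate"], 3))
print("violation rate, whole holdout sample: ", round(result["base_rate"], 3))
```

Comparing the two printed rates is one simple way to judge whether the ranking would direct limited enforcement resources better than selecting subjects at random. In practice the loop is repeated: if the lift is poor, the sample, the features or the model are revised and the evaluation is run again.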
SAMPLING AND DATA COLLECTION
Data collection is a key first step of predictive analytics. Many administrative records are generated only when a specific event occurs, so they may not be representative of the underlying population at large. Crime rate statistics are a good example. The identification of a crime presupposes an investigation. To the extent that investigations are uneven across crimes, crime statistics are biased toward...