Abstract- This study proposes a multi-class classifier to distinguish intentional (fraudulent) financial restatements, unintentional financial restatements, and normal financial statements. Most prior studies on detecting fraudulent financial restatements use balanced datasets; they focus only on fraudulent cases and ignore unintentional ones. This study employs a large imbalanced dataset of 70,781 financial statements from 5,962 companies, comprising 596 fraudulent cases, 14,751 unintentional cases, and 55,434 normal cases. In this study, the dataset is preprocessed; feature selection algorithms, resampling algorithms, and data mining techniques will be applied in future work.
Index Terms-Financial restatement, multi-class classifiers, large imbalanced dataset.
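To make the degree of imbalance concrete, the sketch below computes each class's share of the 70,781 statements, the ratio between the largest and smallest classes, and inverse-frequency class weights. The class counts come from the abstract; the weighting formula n_samples / (n_classes * n_c) is the one scikit-learn uses for `class_weight="balanced"`, and is shown here only as one common option for imbalanced multi-class training, not as the method this study adopts.

```python
from typing import Dict

# Class counts reported in the abstract (70,781 statements in total).
CLASS_COUNTS: Dict[str, int] = {
    "fraudulent": 596,
    "unintentional": 14751,
    "normal": 55434,
}


def class_proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each class in the whole dataset."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def imbalance_ratio(counts: Dict[str, int]) -> float:
    """Size of the largest class divided by the size of the smallest class."""
    return max(counts.values()) / min(counts.values())


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Inverse-frequency weights: n_samples / (n_classes * n_c).

    Rare classes receive proportionally larger weights, so each class
    contributes the same total weight (n_samples / n_classes) to the loss.
    """
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


if __name__ == "__main__":
    for label, p in class_proportions(CLASS_COUNTS).items():
        print(f"{label:>13}: {p:.2%}")
    print(f"imbalance ratio (largest/smallest): {imbalance_ratio(CLASS_COUNTS):.1f}")
    for label, w in balanced_class_weights(CLASS_COUNTS).items():
        print(f"weight[{label}] = {w:.3f}")
```

With these counts, fraudulent cases make up under 1% of the sample and the normal class is roughly 93 times larger than the fraudulent class, which is why accuracy alone is a poor yardstick for a classifier trained on this data.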
I. Introduction
Financial restatement, which can erode investors' confidence in a company, has received increasing attention from both companies and investors. Financial restatements occur when firms must correct errors in their previously issued financial statements. There are two types of financial restatement: unintentional and intentional. An unintentional restatement is caused by honest errors in the financial statements, whereas an intentional (fraudulent) restatement is caused by deliberate misstatements, such as altering financial data to mislead market participants.

Most prior research has focused on predictive models of fraudulent financial restatement and has ignored unintentional restatement; few studies concentrate on unintentional restatement [3,7]. Dutta et al. [7] developed a predictive model that separates restatements (both fraudulent and unintentional) from normal statements. To the best of our knowledge, only one study aims to classify fraudulent restatements, unintentional restatements, and normal statements: it implemented a three-class financial statement fraud detection model that detects intentional misstatements, unintentional misstatements, and non-fraudulent statements [3]. That study used the Accounting and Auditing Enforcement Releases (AAER) database. Restatement data from this database are reliable because all cases have been investigated by the Securities and Exchange Commission (SEC). However, because the SEC reviews only one-third of public companies' financial statements per year [11], many restatements are absent from this database.

In our research, we use Audit Analytics, a commercial database that collects all financial restatements from 1994 to the present, to obtain all financial restatement data from 1994 to 2018. We use the Compustat database to obtain normal statements. The number of intentional fraudulent restatements is...
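Combining the two sources amounts to assigning one of three labels to every firm-year statement. The sketch below assumes a hypothetical layout: the statement universe (e.g. drawn from Compustat) is a list of `(company_id, fiscal_year)` keys, and each restatement record (e.g. from Audit Analytics) carries a hypothetical boolean `intentional` flag separating fraud from unintentional error. The field names, the 0/1/2 encoding, and the rule that a fraudulent restatement overrides an unintentional one for the same firm-year are all illustrative assumptions, not the authors' procedure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical label encoding for the three classes.
NORMAL, UNINTENTIONAL, FRAUDULENT = 0, 1, 2


@dataclass(frozen=True)
class Restatement:
    """A restatement record (hypothetical fields, not a real database schema)."""
    company_id: str
    fiscal_year: int
    intentional: bool  # assumed flag: True = fraudulent, False = unintentional


def label_statements(
    statements: Iterable[Tuple[str, int]],
    restatements: Iterable[Restatement],
) -> List[Tuple[str, int, int]]:
    """Attach a three-class label to each (company_id, fiscal_year) statement.

    Firm-years with no restatement are labeled NORMAL. If a firm-year has
    several restatements, the more severe label (FRAUDULENT) is kept.
    Restatements whose firm-year is not in `statements` are ignored.
    """
    restated: Dict[Tuple[str, int], int] = {}
    for r in restatements:
        key = (r.company_id, r.fiscal_year)
        label = FRAUDULENT if r.intentional else UNINTENTIONAL
        restated[key] = max(restated.get(key, NORMAL), label)
    return [
        (company, year, restated.get((company, year), NORMAL))
        for company, year in statements
    ]


if __name__ == "__main__":
    statements = [("A", 2015), ("A", 2016), ("B", 2016)]
    restatements = [Restatement("A", 2016, False), Restatement("B", 2016, True)]
    print(label_statements(statements, restatements))
    # [('A', 2015, 0), ('A', 2016, 1), ('B', 2016, 2)]
```

Keying on firm-year keeps each financial statement as a single sample, which matches how the class counts in the abstract are reported per statement.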