Global ETD Search

Data Mining Techniques to Identify Financial Restatements

Data mining is a multi-disciplinary field of science and technology widely used in developing predictive models and data visualization in various domains. Although there are numerous data mining algorithms and techniques across multiple fields, there appears to be no consensus on the suitability of a particular model or on how to address data preprocessing issues. Moreover, the effectiveness of data mining techniques depends on the evolving nature of data. In this study, we focus on the suitability and robustness of various data mining models for analyzing real financial data to identify financial restatements. From a data mining perspective, financial restatements are interesting to study for the following reasons: (i) restatement data is highly imbalanced, which requires adequate attention in model building; (ii) many financial and non-financial attributes may affect financial restatement predictive models, which requires careful implementation of data mining techniques to develop parsimonious models; and (iii) the class imbalance issue becomes more complex in a dataset that includes both intentional and unintentional restatement instances. Most previous studies focus on fraudulent (or intentional) restatements, and the literature has largely ignored unintentional restatements. Intentional (i.e. fraudulent) restatement instances are rare and likely to have more distinct features compared to non-restatement cases. However, unintentional cases are comparatively more prevalent and likely to have fewer distinct features that separate them from non-restatement cases. A dataset containing unintentional restatement cases is therefore likely to have more class overlap, which may impact the effectiveness of predictive models.
In this study, we developed predictive models based on all restatement cases (both intentional and unintentional restatements) using a real, comprehensive and novel dataset which includes 116 attributes and approximately 1,000 restatement and 19,517 non-restatement instances over the period 2009 to 2014. To the best of our knowledge, no other study has developed predictive models for financial restatements using post-financial crisis events. In order to avoid redundant attributes, we use three feature selection techniques, namely correlation-based feature subset selection (CfsSubsetEval), information gain attribute evaluation (InfoGainEval), and stepwise forward selection (FwSelect), and generate three datasets with reduced attributes. Our restatement dataset is highly skewed towards the non-restatement (majority) class. We applied various algorithms (e.g. random undersampling (RUS), cluster-based undersampling (CUS) (Sobhani et al., 2014), random oversampling (ROS), synthetic minority oversampling technique (SMOTE) (Chawla et al., 2002), adaptive synthetic sampling (ADASYN) (He et al., 2008), and Tomek links with SMOTE) to address class imbalance in the financial restatement dataset. We perform classification with six classifiers, decision tree (DT), artificial neural network (ANN), Naïve Bayes (NB), random forest (RF), Bayesian belief network (BBN) and support vector machine (SVM), using 10-fold cross validation, and test the efficiency of the predictive models using minority class recall, minority class F-measure and G-mean. We also experiment with different ensemble methods (bagging and boosting) on the base classifiers and employ other meta-learning algorithms (stacking and cost-sensitive learning) to improve model performance.
With the cluster-based undersampling technique, we find that various classifiers (e.g. SVM, BBN) show a high success rate in terms of minority class recall. For example, the SVM classifier shows a minority recall value of 96%, which is quite encouraging. However, the ability of these classifiers to detect majority class instances is dismal. We find that some variations of synthetic oversampling, such as ‘Tomek Link + SMOTE’ and ‘ADASYN’, show promising results in terms of both minority recall and G-mean. Using the InfoGainEval feature selection method, the RF classifier shows minority recall values of 92.6% for ‘Tomek Link + SMOTE’ and 88.9% for ‘ADASYN’, respectively. The corresponding G-mean values are 95.2% and 94.2%, which show that the RF classifier is quite effective in predicting both minority and majority classes. We find further improvement in results for the RF classifier with the cost-sensitive learning algorithm using the ‘Tomek Link + SMOTE’ oversampling technique. Subsequently, we develop some decision rules to detect restatement firms based on a subset of important attributes.
To the best of our knowledge, the only other data mining study of financial restatements is Kim et al. (2016), which uses only pre-financial-crisis restatement data. Kim et al. (2016) employed a matched-sample undersampling technique and used logistic regression, SVM and BBN classifiers to develop financial restatement predictive models. That study’s highest reported G-mean is 70%. Our results with cluster-based undersampling are similar to the performance measures reported by Kim et al. (2016). However, our synthetic oversampling based results show better predictive ability. The RF classifier shows a very high degree of predictive capability for minority class instances (97.4%) and a very high G-mean value (95.3%) with cost-sensitive learning. Yet, we recognize that Kim et al. (2016) use a different restatement dataset (with pre-crisis restatement cases), and hence a direct comparison of results may not be fully justified.
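The evaluation measures named above can be illustrated with a minimal sketch. It assumes the usual binary definitions (G-mean as the geometric mean of minority and majority recall, F-measure as the harmonic mean of precision and recall); the abstract does not spell these out, so the thesis may define them differently. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not results from the thesis.

```python
import math


def minority_recall(tp, fn):
    """Share of minority (restatement) instances that are detected."""
    return tp / (tp + fn)


def majority_recall(tn, fp):
    """Share of majority (non-restatement) instances that are detected."""
    return tn / (tn + fp)


def g_mean(tp, fn, tn, fp):
    """Geometric mean of the two class recalls (assumed definition)."""
    return math.sqrt(minority_recall(tp, fn) * majority_recall(tn, fp))


def f_measure(tp, fn, fp):
    """Harmonic mean of minority precision and minority recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def implied_majority_recall(g, rec_minority):
    """Majority recall implied by a G-mean and a minority recall."""
    return g ** 2 / rec_minority


# Hypothetical counts: 96 of 100 minority cases caught,
# but only 400 of 2,000 majority cases classified correctly.
print(minority_recall(96, 4))
print(g_mean(96, 4, 400, 1600))
print(f_measure(96, 4, 1600))

# Reported RF figures for 'Tomek Link + SMOTE': recall 92.6%, G-mean 95.2%.
print(implied_majority_recall(0.952, 0.926))
```

With the hypothetical counts, a minority recall of 0.96 still yields a G-mean of only about 0.44, which is why a high minority recall alone says little when majority detection is poor. Under the same assumed definition, the reported recall of 92.6% and G-mean of 95.2% would imply a majority recall of roughly 0.98, subject to rounding in the reported figures.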
Our study contributes to the data mining literature by (i) presenting predictive models for financial restatements with a comprehensive dataset, (ii) focussing on various data mining techniques and presenting a comparative analysis, and (iii) addressing the class imbalance issue by identifying the most effective technique. To the best of our knowledge, we used the most comprehensive dataset to develop our predictive models for identifying financial restatements.
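As a rough illustration of two of the resampling ideas named in the abstract, the sketch below shows random undersampling and a simplified SMOTE-style interpolation in plain Python. It is not the thesis's implementation: the function names, the parameter `k`, and the toy data are assumptions made for this example, and the second function omits details of the published SMOTE algorithm (Chawla et al., 2002), such as per-sample oversampling rates.

```python
import random


def random_undersample(majority, minority, rng):
    """Keep every minority row and an equally sized random majority subset."""
    kept = rng.sample(majority, k=len(minority))
    return kept, list(minority)


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic minority rows by interpolating between a
    minority row and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy two-attribute data (hypothetical, not from the restatement dataset).
rng = random.Random(0)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(float(i), 5.0) for i in range(20)]

kept_majority, kept_minority = random_undersample(majority, minority, rng)
new_rows = smote_like(minority, n_new=10, k=2, rng=rng)
print(len(kept_majority), len(kept_minority), len(new_rows))
```

Undersampling discards majority rows, which may explain weak majority-class detection when little of the majority class remains, whereas the interpolation approach adds minority rows without removing any data.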

Data mining

Class Imbalance

Financial restatement

Identifier	oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/37342
Date	27 March 2018
Creators	Dutta, Ila
Contributors	Raahemi, Bijan
Publisher	Université d'Ottawa / University of Ottawa
Source Sets	Université d’Ottawa
Language	English
Detected Language	English
Type	Thesis
Format	application/pdf
