
Selected impacts of missing data problem in economics

Data sources and data quality are indispensable in economic, medical, pharmaceutical and other studies and provide the basis for reliable results across numerous research questions. Depending on the purpose of use, high data quality is a prerequisite. However, as registry quality increases, costs increase accordingly. Given these time- and cost-intensive factors, this work attempts to estimate the cost advantages of applying statistical tools to existing registry data. This includes methodological considerations and suggestions for evaluating data quality, including factors such as bias and reliability after dealing properly (or not) with missing data (MD), and the possible consequences of ignoring the incompleteness of data. Results of the quality analysis for the gastric cancer patients' data example showed that millions of Euros in study costs can be saved by reducing the time horizon. On average, €523,126.70 can be saved for every year by which the study duration is shortened. After additionally imputing the MD, which exceeded 25% in some variables, data quality improved immensely but still showed deficiencies, which (besides MD in variables) could indicate that some patients' entries are missing from the registry entirely. Capture-recapture methods were therefore discussed to demonstrate how the total completeness of a registry can be estimated. Since it was not possible to illustrate the CARE method with the gastric cancer patients' example due to the given data structure (no access to the required variables), other data sets had to be chosen: publicly accessible data on amyotrophic lateral sclerosis (ALS) and data on towed vehicles in the City of Chicago.
The consequence of ignoring MD was further analyzed using bankruptcy prediction data sets of agribusiness companies and confirmed the assumption that MD have a negative impact on data quality, in this case also with regard to misclassification when predicting bankrupt companies. Using the decision tree method (known as one of the most suitable methods for predicting financial distress), the percentage of bankrupt companies correctly predicted as bankrupt (one year before bankruptcy) was 87.5% with MD imputation, whereas it was only 60% when MD were omitted completely. Overall, my findings clearly showed the importance of statistical methods for improving data quality, which in turn helps to avoid drawing biased conclusions from incomplete data.
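
The abstract does not spell out how capture-recapture estimates registry completeness, so the following is only a minimal sketch of the general idea, assuming the simplest two-source case. It uses Chapman's variant of the Lincoln-Petersen estimator, which is not necessarily the estimator used in the thesis. The function names and all counts are hypothetical and chosen purely for illustration.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Estimate the total number of cases from two overlapping sources.

    n1: cases found in source 1 (e.g. the registry)
    n2: cases found in source 2 (e.g. an independent list)
    m:  cases found in both sources
    Assumes the two sources are independent and the population is closed.
    """
    if m < 0 or m > min(n1, n2):
        raise ValueError("overlap m must lie between 0 and min(n1, n2)")
    # Chapman's correction keeps the estimate finite even when m == 0.
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def completeness(n_source: int, n_total_estimated: float) -> float:
    """Share of the estimated total that one source has captured."""
    return n_source / n_total_estimated


# Hypothetical counts, not taken from the thesis.
n_registry, n_second_list, n_in_both = 400, 300, 240

total = chapman_estimate(n_registry, n_second_list, n_in_both)
print(f"Estimated total cases: {total:.1f}")
print(f"Estimated registry completeness: {completeness(n_registry, total):.1%}")
```

The smaller the overlap between the two sources relative to their sizes, the larger the estimated total and the lower the estimated completeness of each source.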

Identifier oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:425121
Date January 2018
Creators Uenal, Hatice
Source Sets Czech ETDs
Language English
Detected Language English
Type info:eu-repo/semantics/doctoralThesis
Rights info:eu-repo/semantics/restrictedAccess
