Predicting Substantiation of Office of Inspector General Investigations Using Multinomial Naïve Bayes and Natural Language Processing

Low substantiation rates are pervasive across the federal Office of Inspector General (OIG) community due to high levels of uncertainty and limited data availability at the time of case selection. OIG management often selects cases based on intuition and past experience. Intuitive project selection has proven unsuccessful because such methods are often subjective and prone to bias, and they lead to error. The high uncertainty surrounding case selection and the current selection method employed by OIG management teams result in a significant loss of investigative resources spent on unsubstantiated cases. This research presents a novel approach to predicting OIG investigative case substantiation, using natural language processing techniques and multinomial naïve Bayes to retrieve information from complaint intakes. It aims to improve OIG substantiation rates and reduce the cost associated with unsubstantiated cases. The model developed in this study significantly outperformed OIG management and was 20% more accurate in predicting substantiated and unsubstantiated cases. This model will augment investigative case selection, improve investigative targeting, increase the impact of investigative work, and improve the allocation of OIG investigative resources. Its application will result in significant savings by reducing the resources dedicated to cases with a low probability of substantiation.
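
As an illustration of the general technique named in the abstract, the following is a minimal sketch of a multinomial naïve Bayes text classifier with Laplace smoothing. It is not the thesis's implementation: the tokenizer, the class and function names, the smoothing value, and the toy complaint texts and labels are all hypothetical and chosen only to show how word counts in intake text could drive a substantiated/unsubstantiated prediction.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic runs (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Multinomial naive Bayes over word counts with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; 1.0 is an assumed default

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(n / n_docs)  # log prior
            for t in tokens:              # add smoothed log likelihoods
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy intakes, invented for illustration only (not data from the thesis).
TRAIN = [
    ("contractor submitted duplicate invoices for the same work", "substantiated"),
    ("employee falsified timesheets and overtime records", "substantiated"),
    ("vendor billed for equipment never delivered", "substantiated"),
    ("caller unhappy with office hours and parking", "unsubstantiated"),
    ("anonymous complaint about rude staff with no details", "unsubstantiated"),
    ("general dissatisfaction with agency policy", "unsubstantiated"),
]

model = MultinomialNB().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("vendor submitted falsified invoices"))
print(model.predict("complaint about parking policy"))
```

Scores are kept in log space so that products of many small word probabilities do not underflow. A real system would need far more data, a held-out evaluation, and likely a richer text representation than raw word counts.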

Identifier: oai:union.ndltd.org:PROQUEST/oai:pqdtoai.proquest.com:28256297
Date: 01 January 2021
Creators: Starr, Alexis V.
Publisher: The George Washington University
Source Sets: ProQuest.com
Language: English
Detected Language: English
Type: thesis