
Comparing Human and Machine Learning Classification of Human Factors in Incident Reports From Aviation

Incident reporting systems are an integral part of any organization seeking to increase the safety of its operations by gathering data on past events, which can then be used to identify ways of mitigating similar events in the future. In order to analyze trends and common issues with regard to the human element in the system, reports are often classified according to a human factors taxonomy. Lately, machine learning algorithms have become popular tools for automated classification of text; however, the performance of such algorithms varies and depends on several factors. In supervised machine learning tasks such as text classification, the algorithm is trained with features and labels, where the features are derived from the incident reports themselves and the labels are supplied by a human annotator, whether the reporter or a third person. Aside from the intricacies of building and tuning machine learning models, a subjective classification according to a human factors taxonomy can generate considerable noise and bias. I examined the interdependencies between the features of incident reports, the subjective labeling process, the constraints that the taxonomy itself imposes, and basic characteristics of human factors taxonomies that can influence human as well as automated classification. In order to evaluate these challenges, I trained a machine learning classifier on 17,253 incident reports from the NASA Aviation Safety Reporting System (ASRS) using multi-label classification, and collected labels from six human annotators for a subset of 400 incident reports each, resulting in a total of 2,400 individual annotations. Results show that, in general, annotation reliability for the set of incident reports selected in this study was comparatively low. It was also evident that some human factors labels were more agreed upon than others, sometimes related to the presence of keywords in the reports that map directly to the label. The performance of machine learning annotation followed the patterns of human agreement on labels. The high variability in the content and quality of narratives was identified as a major source of difficulty in annotation. Suggestions on how to improve the data collection and labeling process are provided.
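
For readers unfamiliar with the supervised multi-label setup described in the abstract, the following is a minimal illustrative sketch in Python using scikit-learn. The narratives, human factors labels, feature representation (TF-IDF), and classifier (one-vs-rest logistic regression) are assumptions chosen for illustration only; they do not reflect the specific data processing or models used in the thesis.

```python
# Illustrative sketch: multi-label classification of incident narratives.
# All data and model choices below are hypothetical and only demonstrate
# the general setup described in the abstract (features derived from the
# report text, labels supplied by human annotators).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical report narratives and their human factors label sets.
narratives = [
    "Crew became distracted during descent and missed the altitude restriction.",
    "Fatigue after a long duty day contributed to a readback error.",
    "Confusing taxi instructions led to a runway incursion.",
]
labels = [
    ["Distraction", "Situational Awareness"],
    ["Fatigue", "Communication Breakdown"],
    ["Communication Breakdown"],
]

# Encode the variable-length label sets as a binary indicator matrix.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

# TF-IDF features computed from the narrative text.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
X = vectorizer.fit_transform(narratives)

# One independent binary classifier per human factors label.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, y)

# Predict the label set for a new, unseen narrative.
new_report = ["Pilot misheard the clearance due to frequency congestion."]
pred = clf.predict(vectorizer.transform(new_report))
print(mlb.inverse_transform(pred))
```

Because each report can carry several human factors labels at once, the problem is framed as multi-label rather than multi-class classification; the one-vs-rest strategy shown here is one common way to handle that, though other approaches exist.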

Identifier: oai:union.ndltd.org:ucf.edu/oai:stars.library.ucf.edu:etd2020-1330
Date: 01 January 2020
Creators: Boesser, Claas Tido
Publisher: STARS
Source Sets: University of Central Florida
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: Electronic Theses and Dissertations, 2020-
