
Text analytics to predict time and cause of death from verbal autopsies

This thesis describes the first Text Analytics approach to predicting Causes of Death (CoD) from Verbal Autopsies (VA). VA is an alternative technique recommended by the World Health Organisation for ascertaining CoD in low- and middle-income countries (LMIC). CoD information is vitally important in the provision of healthcare. CoD information can be obtained from VA via two main approaches: manual, also referred to as physician review, and automatic. The automatic approach is an active research area because it is more efficient and cost-effective than the manual approach. VA contains both closed responses and open narrative text. However, state-of-the-art automatic approaches have ignored the open narrative text, and this remains a challenge and an important research issue. We hypothesise that it is feasible to predict CoD from the narratives of VA. We further contend that an automatic approach that utilises the information contained in both the narrative and the closed-response text of VA could lead to improved CoD prediction accuracy. This research has been formulated as a Text Classification problem, which employs Corpus and Computational Linguistics, Natural Language Processing and Machine Learning techniques to automatically classify VA documents according to CoD. Firstly, the research uses a VA corpus built from a sample collection of over 11,400 VA documents collected during a 10-year period in Ghana, West Africa. About 80 per cent of these documents have been annotated with CoD by medical experts. Secondly, we design experiments to identify Machine Learning techniques (algorithm, feature representation scheme, and feature reduction strategy) suitable for classifying VA open narratives (VAModel1). Thirdly, we propose novel methods of extracting features to build a model that predicts CoD from VA narratives, using the annotated VA corpus as the training and testing set.
Furthermore, we develop two additional models: one based only on closed responses (VAModel2), and a hybrid based on both closed responses and open narratives (VAModel3). Our VAModel1 performs better than our baseline model, suggesting the feasibility of predicting CoD from the VA open narratives. Overall, VAModel3 achieved better performance than VAModel1 but was not significantly better than VAModel2. In terms of reliability, VAModel1 obtained a moderate agreement (kappa score = 0.4) when compared with the gold standard, medical experts (average annotation agreement between medical experts, kappa score = 0.64). An acceptable agreement was obtained for VAModel2 (kappa score = 0.71) and VAModel3 (kappa score = 0.75), suggesting that the reliability of these two models is better than that of medical experts. A detailed analysis also suggested that combining information from narratives and closed responses leads to an increase in performance for some CoD categories, whereas information obtained from the closed responses alone is enough for other CoD categories. Our research provides an alternative automatic approach to predicting CoD from VA, which is essential for LMIC. Further research into various aspects of the modelling process could improve the current performance of automatically predicting CoD from VAs.
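
The kappa scores quoted in the abstract measure agreement beyond chance between two sets of CoD labels. The abstract does not say which kappa statistic was used; the sketch below assumes Cohen's kappa for two annotators, and the function name and the CoD labels in the usage note are hypothetical illustrations, not data from the thesis.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of category labels.

    Assumption: the thesis may have used a different kappa variant;
    this is only an illustration of the general idea.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: fraction of documents given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both raters pick the same
    # category if each labels independently at their own base rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, with hypothetical model labels `["malaria", "malaria", "hiv", "hiv"]` and expert labels `["malaria", "hiv", "hiv", "hiv"]`, the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5.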

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:684501
Date: January 2015
Creators: Danso, Samuel Odei
Contributors: Atwell, Eric ; Johnson, Owen
Publisher: University of Leeds
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://etheses.whiterose.ac.uk/12400/
