Statistical Tools for Efficient Confirmation of Diagnosis in Patients with Suspected Primary Central Nervous System Vasculitis

The management of missing data is a major concern in classification model generation in all fields, but it poses a particular challenge where only a small quantity of sparse data is available. In medicine, this is not an uncommon problem. While widely used methodologies like logistic regression can, with minor modifications and potentially much labor, provide reasonable insights from the larger and less sparse datasets that are anticipated when analyzing the diagnosis of common conditions, there is a multitude of rare conditions of interest for which such datasets do not exist. Primary angiitis of the central nervous system (PACNS) is a rare but devastating entity that, given its range of presenting symptoms, can be suspected in a variety of circumstances. Unfortunately, it remains a difficult diagnosis to make. Aside from some general frameworks, there is no rigorously defined diagnostic approach of the kind that exists for more common neuroinflammatory conditions like multiple sclerosis. Instead, clinicians currently rely on experience and clinical judgement to guide the reasonable exclusion of potential inciting entities and mimickers. In effect, this results in a smaller quantity of heterogeneous data that may not be optimally suited to more traditional classification methodology (e.g., logistic regression) without substantial contemplation and justification of appropriate data cleaning and preprocessing. It is therefore challenging to develop and analyze systematic approaches that could direct clinicians in a way that standardizes patient care.

In this thesis, a machine learning approach was presented to derive quantitatively justified insights into the factors that are most important to consider during the diagnostic process to identify conditions like PACNS. Modern classification techniques (i.e., random forest and support vector machines) were used to generate diagnostic models identifying cases of PACNS, from which key elements of diagnostic importance could be identified. A novel variant of a random forest (RF) approach was also demonstrated as a means of managing missing data in a small sample, a significant problem encountered when exploring data on rare conditions without clear diagnostic frameworks. A reduced need to hypothesize the reasons for missingness when generating and applying the novel variant was discussed. The application of such tools to diagnostic model generation for PACNS and other rare and/or emerging diseases, and their ability to provide objective feedback, was explored. This primarily centered on a structured assessment of how to prioritize testing to rapidly rule out conditions that require alternative management, which could be used to support future guidelines to optimize the care of these patients.

The material presented herein had three components. The first centered on the example of PACNS. It described, in detail, an example of a relevant medical condition and explored why the data are both rare and sparse. Furthermore, the reasons for the sparsity are heterogeneous or non-monotonic (i.e., not conducive to modelling with a single model). This component concluded with a search for candidate variables to diagnose the condition by means of a scoping review, for subsequent comparative demonstration of the proposed novel variant of random forest construction. The second component discussed machine learning model development and simulated data with varying degrees and patterns of missingness to demonstrate how the models could be applied to data with properties like those expected of PACNS-related data. Finally, the described techniques were applied to separate a subset of patients with suspected PACNS from those with diagnosed PACNS using institutional data, and future study was proposed to expand upon and ultimately verify these insights. Further development of the novel random forest approach was also discussed.

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/44862
Date: 27 April 2023
Creators: Brooks, John
Contributors: Ramsay, Timothy; Milman, Nataliya
Publisher: Université d'Ottawa / University of Ottawa
Source Sets: Université d’Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
