The rapid digitization of the health care sector leads to an increase in data. This routinely collected data, in the form of electronic health records (EHR), is not only used by medical professionals but also serves a secondary purpose: health care research. It can be opportune to use this EHR data for predictive modeling in order to support medical professionals in their decisions. However, using routinely collected data (RCD) often comes with subtle biases that might put the efficient learning of predictive models at risk. In this thesis the effects of RCD on prediction performance are reviewed. In particular, we thoroughly investigate and reason about whether the performance of particular prediction models is consistent over a range of handcrafted sub-populations within the data. Evidence is presented that the overall prediction scores of the algorithms trained on EHR data differ significantly for some groups of patients in the data. A method is presented to give more insight into why these groups of patients have different scores.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hh-39960 |
Date | January 2019 |
Creators | Boonen, Dries |
Publisher | Högskolan i Halmstad |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |