Electronic medical record databases (e.g. the Clinical Practice Research Datalink, CPRD) are increasingly used in epidemiological research. The CPRD holds data in two formats: coded, which is the sole format used in almost all research; and free-text (or ‘hidden’), which may contain much clinical information but is generally unavailable to researchers. This thesis examines the ramifications of omitting free-text records from research. Cases with bladder (n=4,915) or pancreatic (n=3,635) cancer were matched to controls (n=21,718, bladder; n=16,459, pancreas) on age, sex and GP practice. Coded and text-only records of attendance for haematuria, jaundice and abdominal pain in the year before cancer diagnosis were identified. The number of patients whose entire attendance record for a symptom/sign existed solely in the text was quantified. Associations between recording method (coded or text-only) and case/control status were estimated (χ² test). For each symptom/sign, the positive predictive value (PPV, Bayes' theorem) and odds ratio (OR, conditional logistic regression) for cancer were estimated before and after supplementation with text-only records. Text-only recording was considerable, with 7,951/20,958 (37%) of symptom records being in that format. For individual patients, text-only recording was more likely in controls (140/336=42%) than cases (556/3,147=18%) for visible haematuria in bladder cancer (χ² test, p<0.001), and for jaundice (21/31=67% vs 463/1,565=30%, p<0.0001) and abdominal pain (323/1,126=29% vs 397/1,789=22%, p<0.001) in pancreatic cancer. Adding text records reduced PPVs of visible haematuria for bladder cancer from 4.0% (95% CI: 3.5–4.6%) to 2.9% (2.6–3.2%) and of jaundice for pancreatic cancer from 12.8% (7.3–21.6%) to 6.3% (4.5–8.7%). Coded records suggested that non-visible haematuria occurred in 127/4,915 (2.6%) cases, a figure below that generally used for study. Supplementation with text-only records increased this to 312/4,915 (6.4%), permitting the first estimation of its OR (28.0, 95% CI: 20.7–37.9, p<0.0001) and PPV (1.60%, 1.22–2.10%, p<0.0001) for bladder cancer. The results suggest that GPs make strong clinical judgements about the probable significance of symptoms, preferentially coding clinical features they consider significant to a diagnosis while using text to record those they think are not.
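The association between recording method and case/control status reported above can be illustrated with a minimal sketch. It recomputes a Pearson χ² test for visible haematuria from the counts quoted in the abstract (140/336 controls and 556/3,147 cases with text-only records). The function name `chi2_2x2` is hypothetical, and the choice of an uncorrected Pearson statistic is an assumption: the abstract does not say whether a continuity correction was applied, so this is not necessarily the exact procedure used in the thesis.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the
    2x2 table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability of the
    # chi-squared distribution equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts taken from the abstract (visible haematuria, bladder cancer).
controls_text, controls_total = 140, 336
cases_text, cases_total = 556, 3147

# Rows: controls, cases. Columns: text-only, coded (assumed to be the remainder).
stat, p_value = chi2_2x2(
    controls_text, controls_total - controls_text,
    cases_text, cases_total - cases_text,
)

pct_controls = 100 * controls_text / controls_total
pct_cases = 100 * cases_text / cases_total
print(f"text-only: controls {pct_controls:.0f}%, cases {pct_cases:.0f}%")
print(f"chi-squared = {stat:.1f}, p < 0.001: {p_value < 0.001}")
```

The same function could be applied to the jaundice and abdominal pain counts; for small cells such as 21/31, an exact test may be preferable, though the abstract reports a χ² test throughout.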
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:700175 |
Date | January 2016 |
Creators | Price, Sarah Jane |
Contributors | Hamilton, William ; Stapley, Sal ; Barraclough, Kevin |
Publisher | University of Exeter |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://hdl.handle.net/10871/21692 |