
Extracting Detailed Tobacco Exposure From The Electronic Health Record

Lung cancer is the leading cause of cancer-related death in the United States and worldwide. Natural language processing (NLP) tools exist to determine smoking status (ever-smoker vs. never-smoker) from electronic health record data, but no system to date extracts the detailed smoking data needed to assess a patient's eligibility for lung cancer screening. Here we describe the Smoking History And Pack-year Extraction System (SHAPES), a rule-based NLP system to quantify tobacco exposure from electronic clinical notes.
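To make the idea of rule-based tobacco quantification concrete, the following is a minimal sketch of how pack-years might be pulled from free text with regular expressions. The patterns, the function name `extract_pack_years`, and the example phrases are hypothetical and chosen for illustration only; they are not the rules SHAPES uses, which this abstract does not describe.

```python
import re

# Hypothetical patterns for illustration only; NOT the actual SHAPES rules.
PACK_YEARS = re.compile(r"(\d+(?:\.\d+)?)\s*[- ]?pack[- ]?years?", re.I)
PACKS_PER_DAY = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:packs?\s*(?:per|a|/)\s*day|ppd)", re.I
)
YEARS_SMOKED = re.compile(r"(?:for|x)\s*(\d+(?:\.\d+)?)\s*(?:years?|yrs?)", re.I)


def extract_pack_years(note: str):
    """Return pack-years stated in, or derivable from, a note; else None."""
    # Prefer an explicitly stated pack-year figure.
    stated = PACK_YEARS.search(note)
    if stated:
        return float(stated.group(1))
    # Otherwise derive it: pack-years = packs per day * years smoked.
    rate = PACKS_PER_DAY.search(note)
    duration = YEARS_SMOKED.search(note)
    if rate and duration:
        return float(rate.group(1)) * float(duration.group(1))
    return None


examples = [
    "30 pack-year smoking history",
    "Smoked 1.5 packs per day for 20 years",
    "1 ppd x 40 yrs",
    "Never smoker",
]
results = [extract_pack_years(text) for text in examples]
```

A production system would also need to handle negation, conflicting statements across notes, and quit dates, none of which this sketch attempts.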
SHAPES was developed based on 261 patient records with 9,573 clinical notes and validated on 352 randomly selected patient records with 4,040 notes. F-measures for never-smoking status, ever-smoking status, rate of smoking, duration of smoking, quantity of cigarettes, and years quit were 0.86, 0.82, 0.79, 0.62, 0.64, and 0.61, respectively. Sixteen of 22 individuals eligible for lung cancer screening were identified (precision = 0.94, recall = 0.73).
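The screening figures above can be checked with the standard definitions of precision, recall, and F-measure. The sketch below assumes 16 true positives and 6 false negatives (16 of 22 eligible individuals identified); the single false positive is an inference from the reported precision (16/17 ≈ 0.94) and is not stated in the abstract.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# tp and fn follow from "sixteen of 22"; fp=1 is an ASSUMPTION
# inferred from the reported precision of 0.94.
precision, recall, f1 = precision_recall_f1(tp=16, fp=1, fn=6)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Under that assumption, the computed precision and recall round to the reported 0.94 and 0.73.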
SHAPES was compared to a previously validated smoking classification system using a phenome-wide association study (PheWAS). SHAPES predicted similar significant associations with a 66% smaller sample size (10,000 vs. 35,788) and detected 411 (268%) more associations in the full dataset than were detected using ever/never smoking status alone.
Using smoking data from SHAPES, a smoking genome-by-environment interaction study found 57 statistically significant interactions between smoking and diseases, including previously described interactions between ischemic heart disease and rs1746537, obesity and rs10871777, and type 2 diabetes and rs2943641.
These studies support the use of SHAPES for lung cancer screening and other research requiring quantitative smoking history. External validation needs to be performed prior to implementation at other medical centers.

Identifier: oai:union.ndltd.org:VANDERBILT/oai:VANDERBILTETD:etd-07142017-193150
Date: 09 August 2017
Creators: Osterman, Travis John
Contributors: Josh Denny, M.D., M.S., Mia Levy, M.D., Ph.D., Pierre Massion, M.D.
Publisher: VANDERBILT
Source Sets: Vanderbilt University Theses
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.library.vanderbilt.edu/available/etd-07142017-193150/
Rights: restricted, I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to Vanderbilt University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.
