This thesis evaluates whether and how drug name recognition can be used to find drug names in verbatims from reports on concomitant medication in clinical trials. In clinical trials, a report on concomitant medication is written when a trial participant takes drugs other than the studied drug. This information needs to be coded to a drug reference dictionary. Coded verbatims were used to create the data needed to train the drug name recognition models in this thesis. Labels marking where in each verbatim the coded drug's name occurred were created using the Levenshtein distance. The drug name recognition models were trained and tested on these labelled verbatims. Drug name recognition was performed using a logistic regression model and a bidirectional long short-term memory model. The bidirectional long short-term memory model achieved the best result, with an F1 score of 82.5% when classifying which words in the verbatims were drug names. When the results were studied case by case, they showed that the bidirectional long short-term memory model's classifications sometimes outperformed the labels it was trained on in single-word verbatims. The model was also tested on manually labelled gold standard data, where it achieved an F1 score of 46.4%. The results indicate that a bidirectional long short-term memory model can be implemented for drug name recognition, but that label reliability is an issue in this thesis.
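The labelling step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `levenshtein` and `label_verbatim`, the whitespace tokenisation, the normalised-distance rule, and the `max_ratio` threshold of 0.34 are all assumptions made for illustration. The sketch marks a word in a verbatim as a drug name when its edit distance to the coded drug name is small relative to the longer of the two strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def label_verbatim(verbatim: str, drug_name: str, max_ratio: float = 0.34):
    """Return (token, label) pairs; label 1 marks a token close to the coded drug name.

    Hypothetical rule: a token counts as the drug name when its edit distance,
    divided by the length of the longer string, is at most max_ratio.
    """
    target = drug_name.lower()
    labelled = []
    for token in verbatim.split():
        word = token.lower().strip(".,;:()")
        distance = levenshtein(word, target)
        ratio = distance / max(len(word), len(target), 1)
        labelled.append((token, 1 if ratio <= max_ratio else 0))
    return labelled


if __name__ == "__main__":
    # A misspelled verbatim still receives a label on the drug-name word.
    print(label_verbatim("paracetmol 500 mg", "paracetamol"))
```

A fixed threshold like this will mislabel some words, for example short drug names or abbreviations, which is one way the label reliability issue noted in the abstract can arise.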
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-388687 |
Date | January 2019 |
Creators | Gräns, Arvid |
Publisher | Uppsala universitet, Avdelningen för systemteknik |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | UPTEC STS, 1650-8319 ; 19034 |