The number of publications in the biomedical field increases exponentially, which makes the task of keeping up with current research more and more difficult. However, rapid advances in the field of Natural Language Processing (NLP) offer possible solutions to this problem. In this thesis we focus on investigating three main questions of importance for utilizing the field of NLP, or more specifically the two subfields Named Entity Recognition (NER) and Large Language Models (LLM), to help solve this problem. The questions are; comparing LLM performance to NER models on NER-tasks, the importance of normalization, and how the analysis is affected by the availability of data. We find for the first question that the two models offer a reasonably comparable performance for the specific task we are looking at. For the second question, we find that normalization plays a substantial role in improving the results for tasks involving data synthesis and analysis. Lastly, for the third question, we find that it is important to have access to full papers in most cases since important information can be hidden outside of the abstracts.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-222394 |
Date | January 2024 |
Creators | Törnkvist, Betty |
Publisher | Umeå universitet, Institutionen för datavetenskap |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | UMNAD ; 1453 |
Page generated in 0.002 seconds