• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Finding Synonyms in Medical Texts : Creating a system for automatic synonym extraction from medical texts

Cederblad, Gustav January 2018 (has links)
This thesis describes the work of creating an automatic system for identifying synonyms and semantically related words in medical texts. Before this work, as a part of the project E-care@home, medical texts have been classified as either lay or specialized by both a lay annotator and an expert annotator. The lay annotator, in this case, is a person without any medical knowledge, whereas the expert annotator has professional knowledge in medicine. Using these texts made it possible to create co-occurrences matrices from which the related words could be identified. Fifteen medical terms were chosen as system input. The Dice similarity of these words in a context window of ten words around them was calculated. As output, five candidate related terms for each medical term was returned. Only unigrams were considered. The candidate related terms were evaluated using a questionnaire, where 223 healthcare professionals rated the similarity using a scale from one to five. A Fleiss kappa test showed that the agreement among these raters was 0.28, which is a fair agreement. The evaluation further showed that there was a significant correlation between the human ratings and the relatedness score (Dice similarity). That is, words with higher Dice similarity tended to get a higher human rating. However, the Dice similarity interval in which the words got the highest average human rating was 0.35-0.39. This result means that there is much room for improving the system. Further developments of the system should remove the unigram limitation and expand the corpus the provide a more accurate and reliable result.

Page generated in 0.0735 seconds