1 |
An eye-tracking study on synonym replacement / En ögonrörelsestudie på synonymutbyte
Svensson, Cassandra, January 2015
As the amount of information increases, the need for automatic text simplification also increases. There are several strategies for this, and this thesis studied two basic synonym replacement strategies. The first, called word length, always chooses a shorter synonym when one is available. The second, called word frequency, always chooses a more frequent synonym when one is available. Three different versions of each were tried. The first simply chose the shortest or most frequent synonym. The second chose a synonym only if it was considerably shorter or more frequent. The last chose a synonym only if it met the requirements for replacement and was on synonym level 5. Statistical analysis of the data revealed no significant difference. However, small trends suggested that always choosing a more frequent synonym of level 5 made the text somewhat easier.
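The two strategies and the first two of their variants can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function `choose_synonym`, the `min_gain` threshold, and the toy English words and frequency counts are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def choose_synonym(
    word: str,
    synonyms: List[str],
    score: Callable[[str], float],
    min_gain: float = 0.0,
) -> str:
    """Return the best-scoring synonym, or the original word.

    `score` returns a value where higher means simpler. A synonym
    replaces the word only if it beats the word's own score by more
    than `min_gain` (hypothetical threshold; 0 means any improvement).
    """
    if not synonyms:
        return word
    best = max(synonyms, key=score)
    if score(best) - score(word) > min_gain:
        return best
    return word


# Word length strategy: shorter is simpler, so negate the length.
def length_score(w: str) -> float:
    return -float(len(w))


# Word frequency strategy: made-up corpus counts, for illustration only.
TOY_FREQ: Dict[str, int] = {"purchase": 40, "buy": 900, "acquire": 25}


def freq_score(w: str) -> float:
    return float(TOY_FREQ.get(w, 0))


candidates = ["buy", "acquire"]
shortest = choose_synonym("purchase", candidates, length_score)
most_frequent = choose_synonym("purchase", candidates, freq_score)
# With a large required gain, no candidate qualifies and the word is kept.
strict = choose_synonym("purchase", candidates, length_score, min_gain=10)
```

The third variant would additionally filter `synonyms` by synonym level before scoring. That step is left out here because it depends on a synonym lexicon with level annotations.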
|
2 |
Automatic Text Simplification via Synonym Replacement / Automatiskt textförenkling genom synonymutbyte
Keskisärkkä, Robin, January 2012
In this study, automatic lexical simplification via synonym replacement in Swedish was investigated using three different strategies for choosing alternative synonyms: based on word frequency, based on word length, and based on level of synonymy. These strategies were evaluated in terms of standardized readability metrics for Swedish, average word length, and proportion of long words, and in relation to the ratio of errors (type A) and the number of replacements. The effect of replacements on different genres of texts was also examined. The results show that replacement based on word frequency and word length can improve readability in terms of established metrics for Swedish texts for all genres, but that the risk of introducing errors is high. Attempts were made to identify criteria thresholds that would decrease the ratio of errors, but no general thresholds could be identified. In a final experiment, word frequency and level of synonymy were combined using predefined thresholds. When more than one word passed the thresholds, either word frequency or level of synonymy was prioritized. The strategy was significantly better than word frequency alone when looking at all texts and prioritizing level of synonymy. Prioritizing frequency and prioritizing level of synonymy were both significantly better for the newspaper texts. The results indicate that synonym replacement on a one-to-one word level is very likely to produce errors. Automatic lexical simplification should therefore not be regarded as a trivial task, which is too often the case in the research literature. In order to evaluate the true quality of the texts, it would be valuable to take into account the specific reader. A simplified text that contains some errors and fails to appreciate subtle differences in terminology can still be very useful if the original text is too difficult for the unassisted reader to comprehend.
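Two of the surface measures named above, average word length and proportion of long words, can be computed along these lines. This is a sketch under stated assumptions: the abstract does not define "long", so the cutoff of more than six characters is borrowed from the LIX readability index, and the function name and Swedish sample sentences are invented for illustration.

```python
import re
from typing import Dict

# Assumption: a "long" word has more than six characters (LIX convention).
LONG_WORD_MIN_CHARS = 7


def surface_metrics(text: str) -> Dict[str, float]:
    """Compute average word length and the share of long words."""
    # Match runs of Unicode letters, so Swedish å, ä, ö count as letters.
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        return {"avg_word_length": 0.0, "long_word_ratio": 0.0}
    total_chars = sum(len(w) for w in words)
    long_words = sum(1 for w in words if len(w) >= LONG_WORD_MIN_CHARS)
    return {
        "avg_word_length": total_chars / len(words),
        "long_word_ratio": long_words / len(words),
    }


# Illustrative sentence pair: "inhandlade" replaced by the shorter "köpte".
original = surface_metrics("Han inhandlade frukt")
simplified = surface_metrics("Han köpte frukt")
```

Comparing `original` and `simplified` shows how a single replacement shifts both measures. It says nothing about whether the replacement preserved the meaning, which is the error risk the abstract stresses.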
|
3 |
Text Simplification and Keyphrase Extraction for Swedish
Lindqvist, Ellinor, January 2019
Attempts have been made in Sweden to increase the readability of texts addressed to the public, and ongoing projects are still being conducted by disability associations, private companies and Swedish authorities. In this thesis project, we explore automatic approaches to increase readability through text simplification and keyphrase extraction, with the goal of facilitating text comprehension and readability for people with reading difficulties. A combination of handwritten rules and monolingual machine translation was used to simplify the syntactic and lexical content of Swedish texts, and noun phrases were extracted to provide the reader with a short summary of the textual content. A user evaluation was conducted to compare the original and the simplified version of the same text. Several texts and their simplified versions were also evaluated using established readability metrics. Although a manual evaluation of the result showed that the implemented rules generally worked as intended on the sentences that were targeted, the results from the user evaluation and readability metrics did not show improvements. We believe that further additions to the rule set, targeting a wider range of linguistic structures, have the potential to improve the results.
|
4 |
Exploring Automatic Synonym Generation for Lexical Simplification of Swedish Electronic Health Records
Jänich, Anna, January 2023
Electronic health records (EHRs) are used in Sweden's healthcare systems to store patients' medical information. Patients in Sweden have the right to access and read their health records. Unfortunately, the language used in EHRs is very complex and presents a challenge for readers who lack medical knowledge. Simplifying the language used in EHRs could facilitate the transfer of information between medical staff and patients. This project investigates the possibility of generating Swedish medical synonyms automatically. These synonyms are intended to be used in future systems for lexical simplification that can enhance the readability of Swedish EHRs and simplify medical terminology. Current publicly available Swedish corpora that provide synonyms for medical terminology are insufficient in size to be utilized in a system for lexical simplification. To overcome the obstacle of insufficient corpora, machine learning models are trained to generate synonyms and terms that convey medical concepts in a more understandable way. With the purpose of establishing a foundation for analyzing complex medical terms, a simple mechanism for Complex Word Identification (CWI) is implemented. The mechanism relies on matching strings and substrings from a pre-existing corpus containing hand-curated medical terms in Swedish. To find a suitable strategy for generating medical synonyms automatically, seven different machine learning models are queried for synonym suggestions for 50 complex sample terms. To explore the effect of different input data, we trained our models on different datasets with varying sizes. Three of the seven models are based on BERT and four of them are based on Word2Vec. For each model, results for the 50 complex sample terms are generated and raters with medical knowledge are asked to assess whether the automatically generated suggestions could be considered synonyms. 
The results vary between the different models and seem to be connected to the amount and quality of the data they have been trained on. Furthermore, the raters involved in judging the synonyms exhibit great disagreement, revealing the complexity and subjectivity of the task to find suitable and widely accepted medical synonyms. The method and models applied in this project do not succeed in creating a stable source of suitable synonyms. The chosen BERT approach based on Masked Language Modelling cannot reliably generate suitable synonyms due to the limitation of generating one term per synonym suggestion only. The Word2Vec models demonstrate some weaknesses due to the lack of context consideration. Despite the fact that the current performance of our models in generating automatic synonym suggestions is not entirely satisfactory, we have observed a promising number of accurate suggestions. This gives us reason to believe that with enhanced training and a larger amount of input data consisting of Swedish medical text, the models could be improved and eventually effectively applied.
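The string and substring matching used for Complex Word Identification could look roughly like the sketch below. The term list, the function name `find_complex_words`, and the sample sentence are hypothetical. The abstract does not specify the direction of the substring match, so the sketch assumes that a token is flagged when it equals a curated term or contains one.

```python
from typing import List, Set

# Hypothetical stand-in for the hand-curated Swedish medical term list.
MEDICAL_TERMS: Set[str] = {"hypertoni", "fraktur", "anemi"}


def find_complex_words(text: str, terms: Set[str]) -> List[str]:
    """Return tokens that equal a term or contain one as a substring."""
    complex_words = []
    for token in text.lower().split():
        token = token.strip(".,;:!?()")
        # Exact match first, then substring match (e.g. inside compounds).
        if token in terms or any(term in token for term in terms):
            complex_words.append(token)
    return complex_words


flagged = find_complex_words(
    "Patienten har hypertoni och en höftfraktur.", MEDICAL_TERMS
)
```

Substring matching matters for Swedish because medical terms often appear inside compounds, as with "höftfraktur" here. It can also over-flag words that merely happen to contain a term.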
|