Semantisk spegling : En implementation för att synliggöra semantiska relationer i tvåspråkiga data / Semantic mirroring: An implementation for making semantic relations visible in bilingual data

Andersson, Sebastian January 2004
Semantic theories in traditional linguistics have mainly focused on the relation between words and the properties or objects that the words stand for. These theories have rarely been empirically grounded; rather, they are the product of individual theorists' reasoning, exemplified with a small number of words. For use in translation or machine translation, a word's meaning can instead be defined through its relation to other languages. Translation of text also leaves analyzable material behind, in the form of the original text and its translation, which opens up the possibility of empirically grounded semantic relations. Semantic mirroring is a method for finding monolingual semantic relations from bilingual translation data. By exploiting the fact that words are ambiguous in different ways in the source and target languages, semantic relations between words in the source language can be found through their relations to the target language. In this thesis, semantic mirroring has been implemented and applied to bilingual (Swedish and English) dictionary data. Since the monolingual relations in semantic mirroring are derived from another language, this has also been exploited to extract bilingual semantic relations. The result has been compared with existing synonym dictionaries, evaluated qualitatively, and compared with the original data. The results are of varying quality but still demonstrate the potential of the method and the possibility of using the result as a lexical resource in, for example, lexicography.
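The round-trip step at the heart of semantic mirroring is compact enough to sketch. Below is a minimal illustration, not the thesis implementation, with invented toy dictionaries: a word is translated into the target language and back, and the words that come back are candidate synonyms.

```python
# Minimal sketch of one round of semantic mirroring (illustrative only).
# Two toy dictionaries map Swedish words to English translations and back.

sv_to_en = {
    "fast": ["firm", "fixed"],
    "bestämd": ["firm", "definite"],
    "stadig": ["firm", "steady"],
}
en_to_sv = {
    "firm": ["fast", "bestämd", "stadig"],
    "fixed": ["fast"],
    "definite": ["bestämd"],
    "steady": ["stadig"],
}

def mirror(word, src, tgt):
    """Collect source-language words that share a translation with `word`.

    Mirroring a word into the target language and back exploits the fact
    that the two languages are ambiguous in different ways: words returned
    by the round trip are candidate synonyms of the input word.
    """
    candidates = set()
    for translation in src.get(word, []):
        for back in tgt.get(translation, []):
            if back != word:
                candidates.add(back)
    return candidates

print(mirror("fast", sv_to_en, en_to_sv))  # {'bestämd', 'stadig'}
```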

Classification into Readability Levels : Implementation and Evaluation

Larsson, Patrik January 2006
The main use for a readability classification model is as an integrated part of an information retrieval system. By matching the user's readability requirements to documents with the corresponding readability, the classification model can further improve the results of, for example, a search engine. This thesis presents a new solution for classification into readability levels for Swedish. The results from the thesis are a number of classification models. The models were induced by training a Support Vector Machine (SVM) classifier on features established by previous research as good measurements of readability. The features were extracted from a corpus annotated with three readability levels. Natural Language Processing tools for tagging and parsing were used to analyze the corpus and enable the extraction of the features. Empirical testing of different feature combinations was performed to optimize the classification model. The classification models yield good and stable classification. The best model obtained a precision score of 90.21% and a recall score of 89.56% on the test set, which is equal to an F-score of 89.88. The thesis also presents suggestions for further development and potential areas of use.
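The reported numbers are internally consistent: with precision P = 90.21% and recall R = 89.56%, F = 2PR/(P + R) ≈ 89.88. The training setup itself can be sketched roughly as follows, assuming scikit-learn; the feature names and values are invented, and the thesis's actual features and tooling may differ.

```python
# Rough sketch of the approach, assuming scikit-learn (not the thesis code).
# Feature values and names are invented; the thesis uses readability
# measures established by previous research.
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_score, recall_score, f1_score

# Each row: [avg_sentence_length, avg_word_length, share_of_long_words]
X_train = [[8.3, 3.9, 0.11], [12.1, 4.2, 0.18], [24.7, 5.6, 0.34]]
y_train = ["easy", "medium", "hard"]  # three readability levels

clf = LinearSVC().fit(X_train, y_train)

X_test, y_test = [[22.0, 5.4, 0.31], [9.0, 4.0, 0.12]], ["hard", "easy"]
y_pred = clf.predict(X_test)
for name, score in [("precision", precision_score), ("recall", recall_score),
                    ("F1", f1_score)]:
    # F1 is the harmonic mean of precision and recall: 2PR / (P + R).
    print(name, score(y_test, y_pred, average="weighted", zero_division=0))
```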

Utveckling av ett svensk-engelskt lexikon inom tåg- och transportdomänen / Development of a Swedish-English lexicon in the train and transport domain

Axelsson, Hans, Blom, Oskar January 2006
This paper describes the process of building a machine translation lexicon for use in the train and transport domain with the machine translation system MATS. The lexicon consists of a Swedish part, an English part and links between them, and is derived from a Trados translation memory which is split into a training part (90%) and a testing part (10%). The task is carried out mainly by using existing word linking software and recycling previous machine translation lexicons from other domains. To do this, a method is developed where the focus lies on automation by means of both existing and self-developed software, in combination with manual interaction. The domain-specific lexicon is then extended with a domain-neutral core lexicon and a less domain-neutral general lexicon. The different lexicons are automatically and manually evaluated through machine translation on the test corpus. The automatic evaluation of the largest lexicon yielded a NEVA score of 0.255 and a BLEU score of 0.190. In the manual evaluation, 34% of the segments were correctly translated, 37% were not correct but perfectly understandable, and 29% were difficult to understand.
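The thesis relies on existing word linking software, so the following is only a hedged illustration of one common word-linking idea, Dice-coefficient scoring over co-occurrence in sentence-aligned data, with invented sample sentences; it is not the MATS tooling.

```python
# Illustrative sketch (not the MATS tooling): extract candidate Swedish-English
# word links from sentence-aligned translation memory data using the Dice
# coefficient over co-occurrence counts. The sample data is invented.
from collections import Counter
from itertools import product

aligned = [
    ("tåget avgår från spår två", "the train departs from track two"),
    ("tåget är försenat", "the train is delayed"),
]

sv_count, en_count, pair_count = Counter(), Counter(), Counter()
for sv_sent, en_sent in aligned:
    sv_words, en_words = set(sv_sent.split()), set(en_sent.split())
    sv_count.update(sv_words)
    en_count.update(en_words)
    pair_count.update(product(sv_words, en_words))

def dice(sv, en):
    """Dice coefficient: 2 * joint count / sum of marginal counts."""
    return 2 * pair_count[(sv, en)] / (sv_count[sv] + en_count[en])

print(dice("tåget", "train"))  # 1.0: co-occur in both sentence pairs
print(dice("tåget", "track"))  # lower: co-occur in only one pair
```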

A Pipeline for Automatic Lexical Normalization of Swedish Student Writings

Liu, Yuhan January 2018
In this thesis, we aim to explore the combination of different lexical normalization methods and provide a practical lexical normalization pipeline for Swedish student writings within the framework of SWEGRAM (Näsman et al., 2017). An important feature of the implementation is that the pipeline design takes the unique morphological and phonological characteristics of the Swedish language into account. This kind of localization makes the system more robust for Swedish, at the cost of being less applicable to other languages in similar tasks. The core of the localization lies in a phonetic algorithm designed specifically for Swedish and a compound processing step for Swedish compounding. The proposed pipeline consists of four steps: preprocessing, identification of out-of-vocabulary words, generation of normalization candidates, and candidate selection. For each step we use different approaches. We perform experiments on the Uppsala Corpus of Student Writings (UCSW) (Megyesi et al., 2016) and evaluate the results in terms of precision, recall and accuracy measures. The techniques applied to the raw data and their impact on the final result are presented. In our evaluation, we show that the pipeline can be useful in the lexical normalization task and that our phonetic algorithm is effective for the Swedish language.
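A minimal sketch of the four stages might look as follows. The toy lexicon, the use of difflib for candidate generation, and the trivial selection step are all stand-ins; in particular, the thesis's Swedish-specific phonetic algorithm and compound processing are not reproduced here.

```python
# Minimal sketch of the four pipeline stages described above (illustrative;
# the lexicon and the Swedish-specific phonetic step are stand-ins here).
from difflib import get_close_matches

LEXICON = {"också", "mycket", "skola", "roligt"}  # toy in-vocabulary list

def preprocess(text):
    return text.lower().split()                     # 1. preprocessing

def oov(tokens):
    return [t for t in tokens if t not in LEXICON]  # 2. OOV identification

def candidates(token):
    # 3. candidate generation; a real system would add a Swedish phonetic
    # key and compound splitting here.
    return get_close_matches(token, LEXICON, n=3, cutoff=0.6)

def select(cands):
    return cands[0] if cands else None              # 4. candidate selection

tokens = preprocess("mycke roligt oxå")
print({t: select(candidates(t)) for t in oov(tokens)})
# e.g. {'mycke': 'mycket', 'oxå': None}
```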

On High-Dimensional Transformation Vectors

Feuchtmüller, Sven January 2018
No description available.

Semantic Text Matching Using Convolutional Neural Networks

Wang, Run Fen January 2018
Semantic text matching is a fundamental task for many applications in Natural Language Processing (NLP). Traditional methods using term frequency-inverse document frequency (TF-IDF) to match exact words in documents have one strong drawback: TF-IDF is unable to capture semantic relations between closely related words, which leads to disappointing matching results. Neural networks have recently been used for various applications in NLP and have achieved state-of-the-art performance on many tasks. Recurrent Neural Networks (RNN) have been tested on text classification and text matching, but did not yield any remarkable results, since RNNs work more effectively on short texts than on long documents. In this paper, Convolutional Neural Networks (CNN) are applied to match texts in a semantic aspect. Word embedding representations of two texts are used as inputs to the CNN to extract the semantic features between the two texts, and a score is given as output indicating how certain the CNN model is that they match. The results show that after some tuning of the parameters, the CNN model produces accuracy, precision, recall and F1-scores all over 80%. This is a great improvement over the previous TF-IDF results, and further improvements could be made by using dynamic word vectors, better pre-processing of the data, generating larger and more feature-rich data sets, and further tuning of the parameters.
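A rough PyTorch sketch of this kind of matcher is given below; all layer sizes are assumptions and the architecture is only indicative of the described setup, not the thesis model.

```python
# Illustrative PyTorch sketch of the described setup (not the thesis model):
# embed two texts, run each through a shared 1-D convolution, pool, and
# score the match from the concatenated features. All sizes are assumptions.
import torch
import torch.nn as nn

class CNNMatcher(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=50, n_filters=16, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=kernel)
        self.out = nn.Linear(2 * n_filters, 1)

    def encode(self, ids):
        x = self.emb(ids).transpose(1, 2)        # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))             # (batch, n_filters, L')
        return x.max(dim=2).values               # max-pool over positions

    def forward(self, a, b):
        feats = torch.cat([self.encode(a), self.encode(b)], dim=1)
        return torch.sigmoid(self.out(feats))    # match probability

model = CNNMatcher()
a = torch.randint(0, 1000, (2, 12))  # two pairs of token-id sequences
b = torch.randint(0, 1000, (2, 12))
print(model(a, b).shape)             # torch.Size([2, 1])
```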

Blending Words or: How I Learned to Stop Worrying and Love the Blendguage : A computational study of lexical blending in Swedish

Ek, Adam January 2018
This thesis investigates Swedish lexical blends. A lexical blend is defined as the concatenation of two words, where at least one word has been reduced. Lexical blends are approached from two perspectives. First, the thesis investigates lexical blends as they appear in the Swedish language. It is found that there is a significant statistical relationship between the two source words in terms of orthographic, phonemic and syllabic length and frequency in a reference corpus. Furthermore, some uncommon lexical blends created from pronouns and interjections are described, as are the semantic construction of lexical blends and their similarity to other word formation processes. Secondly, the thesis develops a model which predicts the source words of lexical blends, using logistic regression. The evaluation shows that, using a ranking approach, the correct source words are the highest-ranking word pair in 32.2% of the cases. In the top 10 ranking word pairs, the correct word pair is found in 60.6% of the cases. The results are lower than in previous studies, but the number of blends used is also smaller. It is shown that lexical blends which overlap are easier to predict than lexical blends which do not overlap. Using feature ablation, it is shown that semantic and frequency-related features are the most important for the prediction of source words.
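Candidate generation for source-word prediction can be illustrated as follows. This sketch uses an invented vocabulary and a toy frequency scorer in place of the thesis's logistic regression over richer features.

```python
# Illustrative sketch of candidate generation for blend source words (not the
# thesis system, which ranks candidates with logistic regression over richer
# features). Vocabulary and frequencies are invented.
VOCAB = {"motor": 900, "hotell": 400, "motell": 50, "mopped": 5, "moped": 300}

def candidate_pairs(blend, vocab):
    """Propose (w1, w2) pairs where a prefix of w1 plus a suffix of w2
    can be concatenated to form the blend, allowing reduction of both."""
    pairs = []
    for i in range(1, len(blend)):
        prefix, suffix = blend[:i], blend[i:]
        w1s = [w for w in vocab if w.startswith(prefix) and w != blend]
        w2s = [w for w in vocab if w.endswith(suffix) and w != blend]
        pairs += [(w1, w2) for w1 in w1s for w2 in w2s if w1 != w2]
    return pairs

def rank(pairs, vocab):
    # Toy scorer: corpus frequency of both source words; the thesis also
    # uses orthographic, phonemic, syllabic and semantic features.
    return sorted(set(pairs), key=lambda p: vocab[p[0]] + vocab[p[1]],
                  reverse=True)

# 'motell' = 'motor' + 'hotell'; the correct pair ranks first here.
print(rank(candidate_pairs("motell", VOCAB), VOCAB)[:3])
```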

Unsupervised Normalisation of Historical Spelling : A Multilingual Evaluation

Bergman, Nicklas January 2018
Historical texts are an important resource for researchers in the humanities. However, standard NLP tools typically perform poorly on them, mainly due to the spelling variations present in such texts. One possible solution is to normalise the spelling variations to equivalent contemporary word forms before using standard tools. Weighted edit distance has previously been used for such normalisation, improving over the results of algorithms based on standard edit distance. Aligned training data is needed to extract weights, but there is a lack of such data, so an unsupervised method for extracting edit distance weights is desirable. This thesis presents a multilingual evaluation of an unsupervised method for extracting edit distance weights for normalisation of historical spelling variations. The model is evaluated for English, German, Hungarian, Icelandic and Swedish. The results are mixed and show a high variance across data sets. The method generally performs better than normalisation based on standard edit distance but, as expected, does not quite reach the results of a model trained on aligned data. The results show an increase in normalisation accuracy compared to standard edit distance normalisation for all languages except German, which shows a slightly reduced accuracy, and Swedish, which shows results similar to the standard edit distance normalisation.
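Weighted edit distance itself is the standard dynamic program with a learned substitution table in place of the uniform cost of 1. The sketch below shows the distance computation only, with an invented weight, and skips the unsupervised weight extraction that is the thesis's actual contribution.

```python
# Minimal weighted edit distance (illustrative; the thesis extracts the
# weights without supervision, which is the hard part this sketch skips).
def weighted_edit_distance(source, target, sub_weights, default=1.0):
    """DP over the standard edit-distance recurrence, but substitution
    costs come from a learned table instead of a uniform cost of 1."""
    m, n = len(source), len(target)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + default          # deletion
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + default          # insertion
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = source[i - 1], target[j - 1]
            sub = 0.0 if a == b else sub_weights.get((a, b), default)
            d[i][j] = min(d[i - 1][j] + default,       # delete a
                          d[i][j - 1] + default,       # insert b
                          d[i - 1][j - 1] + sub)       # substitute a -> b
    return d[m][n]

# Historical 'v' often corresponds to modern 'u', so make that swap cheap.
weights = {("v", "u"): 0.1}
print(weighted_edit_distance("hvs", "hus", weights))  # 0.1 instead of 1.0
```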

Menings- och dokumentklassificering för identifiering av meningar / Sentence and document classification for identification of sentences

Paulson, Jörgen, Huynh, Peter January 2018
This thesis examines how well sentence classification and document classification techniques work for selecting the sentences that contain the variables used in experiments described in medical documents. For sentence classification, finite-state machines and keywords are used; for document classification, linear SVM and Random forest. The text features selected are LIX (a readability index) and word count. The features are taken from an existing data set created by Abrahamsson (T.B.D) from articles collected for this study. This data set is then used for document classification. The document classification techniques are evaluated on their ability to distinguish between four document types: scientific articles with experiments, scientific articles without experiments, scientific articles with meta-analyses, and documents that are not scientific articles. These documents are processed with sentence classification to examine how well it finds sentences that contain definitions of variables. The results of the experiment indicated that the sentence classification techniques were not suitable for this purpose, due to low precision. For document classification, Random forest was best suited but had trouble distinguishing between the different types of scientific articles.
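The two text features are simple to compute. The sketch below uses the standard LIX definition, average sentence length plus the percentage of words longer than six characters; the tokenization details are assumptions.

```python
# Illustrative computation of the two text features used above: LIX
# (readability index) and word count. Standard LIX definition: average
# sentence length plus the percentage of words longer than six characters.
import re

def text_features(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    long_words = [w for w in words if len(w) > 6]
    lix = len(words) / len(sentences) + 100 * len(long_words) / len(words)
    return {"lix": round(lix, 1), "word_count": len(words)}

print(text_features("Dokumentet beskriver experimentet. "
                    "Variablerna definieras här."))
# {'lix': 86.3, 'word_count': 6}
```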

Depending on VR : Rule-based Text Simplification Based on Dependency Relations

Johansson, Vida January 2017
The amount of text that is written and made available increases all the time. However, it is not readily accessible to everyone. The goal of the research presented in this thesis was to develop a system for automatic text simplification based on dependency relations, develop a set of simplification rules for the system, and evaluate the performance of the system. The system was built on a previous tool, and developments were made to ensure that the system could perform the operations necessary for the rules included in the rule set. The rule set was developed by manually adapting the rules to a set of training texts. The evaluation method used was a classification task with both objective measures (precision and recall) and a subjective measure (correctness). The performance of the system was compared to that of a system based on constituency relations. The results showed that the current system scored higher on both precision (96% compared to 82%) and recall (86% compared to 53%), indicating that the syntactic information provided by dependency relations is sufficient to perform text simplification. Further evaluation should account for how helpful the text simplification produced by the current system is for target readers.
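A dependency-based simplification rule can be illustrated on a hand-made parse. The rule below, deleting appositive subtrees, is an invented example in the spirit of the approach, not one of the thesis's actual rules.

```python
# Illustrative sketch of a dependency-based simplification rule (not the
# thesis rule set): delete the subtree of any appositive modifier. Tokens
# are (id, form, head, deprel) tuples; the parse is hand-made for the demo.
tokens = [
    (1, "Picasso", 5, "nsubj"),
    (2, ",", 3, "punct"),
    (3, "konstnären", 1, "appos"),  # "the artist", apposition to Picasso
    (4, ",", 3, "punct"),
    (5, "målade", 0, "root"),       # "painted"
    (6, "tavlan", 5, "obj"),        # "the painting"
]

def subtree(root_id, tokens):
    """Collect a token id and all ids that transitively depend on it."""
    ids = {root_id}
    changed = True
    while changed:
        changed = False
        for tid, _, head, _ in tokens:
            if head in ids and tid not in ids:
                ids.add(tid)
                changed = True
    return ids

def drop_appositions(tokens):
    removed = set()
    for tid, _, _, deprel in tokens:
        if deprel == "appos":
            removed |= subtree(tid, tokens)
    return [t for t in tokens if t[0] not in removed]

print(" ".join(form for _, form, _, _ in drop_appositions(tokens)))
# Picasso målade tavlan
```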
