Global ETD Search

71	On High-Dimensional Transformation Vectors Feuchtmüller, Sven January 2018 (has links) No description available. word embeddings transformation vectors
72	Semantic Text Matching Using Convolutional Neural Networks Wang, Run Fen January 2018 (has links) Semantic text matching is a fundamental task for many applications in Natural Language Processing (NLP). Traditional methods that use term frequency-inverse document frequency (TF-IDF) to match exact words in documents have one strong drawback: TF-IDF is unable to capture semantic relations between closely related words, which leads to disappointing matching results. Neural networks have recently been used for various applications in NLP and have achieved state-of-the-art performance on many tasks. Recurrent Neural Networks (RNN) have been tested on text classification and text matching but did not achieve any remarkable results, because RNNs work more effectively on short texts than on long documents. In this paper, Convolutional Neural Networks (CNN) are applied to match texts semantically. The model uses word embedding representations of two texts as inputs to the CNN to extract the semantic features between the two texts, and outputs a score indicating how certain the CNN model is that they match. The results show that after some tuning of the parameters the CNN model could produce accuracy, prediction, recall and F1-scores all over 80%. This is a great improvement over the previous TF-IDF results, and further improvements could be made by using dynamic word vectors, better pre-processing of the data, generating larger and more feature-rich data sets, and further tuning of the parameters. Text matching CNN TF-IDF Word embedding Word2vec NLP
73	Blending Words or: How I Learned to Stop Worrying and Love the Blendguage : A computational study of lexical blending in Swedish Ek, Adam January 2018 (has links) This thesis investigates Swedish lexical blends. A lexical blend is defined as the concatenation of two words, where at least one word has been reduced. Lexical blends are approached from two perspectives. First, the thesis investigates lexical blends as they appear in the Swedish language. It is found that there is a significant statistical relationship between the two source words in terms of orthographic, phonemic and syllabic length and frequency in a reference corpus. Furthermore, some uncommon lexical blends created from pronouns and interjections are described. Lexical blends are also described in terms of their semantic construction and their similarity to other word formation processes. Secondly, the thesis develops a model which predicts the source words of lexical blends. To predict the source words, a logistic regression model is used. The evaluation shows that, using a ranking approach, the correct source words are the highest-ranking word pair in 32.2% of the cases. The correct word pair is found among the top 10 ranking word pairs in 60.6% of the cases. The results are lower than in previous studies, but the number of blends used is also smaller. It is shown that lexical blends which overlap are easier to predict than lexical blends which do not overlap. Using feature ablation, it is shown that semantic and frequency-related features are the most important for the prediction of source words. Lexical blends regression word formation feature ablation ranking
74	Unsupervised Normalisation of Historical Spelling : A Multilingual Evaluation Bergman, Nicklas January 2018 (has links) Historical texts are an important resource for researchers in the humanities. However, standard NLP tools typically perform poorly on them, mainly due to the spelling variations present in such texts. One possible solution is to normalise the spelling variations to equivalent contemporary word forms before using standard tools. Weighted edit distance has previously been used for such normalisation, improving over the results of algorithms based on standard edit distance. Aligned training data is needed to extract weights, but there is a lack of such data. An unsupervised method for extracting edit distance weights is therefore desirable. This thesis presents a multilingual evaluation of an unsupervised method for extracting edit distance weights for normalisation of historical spelling variations. The model is evaluated for English, German, Hungarian, Icelandic and Swedish. The results are mixed and show a high variance depending on the different data sets. The method generally performs better than normalisation based on standard edit distance but as expected does not quite reach up to the results of a model trained on aligned data. The results show an increase in normalisation accuracy compared to standard edit distance normalisation for all languages except German, which shows a slightly reduced accuracy, and Swedish, which shows similar results to the standard edit distance normalisation. spelling normalisation historical spelling normalisation
75	Menings- och dokumentklassficering för identifiering av meningar / Sentence and document classification for identification of sentences Paulson, Jörgen, Huynh, Peter January 2018 (has links) This degree project investigates how well sentence classification and document classification techniques work for selecting sentences that contain the variables used in experiments described in medical documents. For sentence classification, finite-state machines and keywords are used; for document classification, linear SVM and Random forest are used. The text features selected are LIX (readability index) and word count. The text features are taken from an existing data set created by Abrahamsson (T.B.D) from articles collected for this study. This data set is then used for document classification. What is examined for the document classification techniques is their ability to distinguish between four types of documents: scientific articles with experiments, scientific articles without experiments, scientific articles with meta-analyses, and documents that are not scientific articles. These documents are processed with sentence classification to examine how well it finds sentences that contain definitions of variables. The results of the experiment indicated that the sentence classification techniques were not adequate for this purpose because of low precision. For document classification, Random forest was best suited but had problems distinguishing between different types of scientific articles. natural language processing classification finite state automata språkteknologi klassificering tillståndsmaskiner Computer and Information Sciences Data- och informationsvetenskap
76	Depending on VR : Rule-based Text Simplification Based on Dependency Relations Johansson, Vida January 2017 (has links) The amount of text that is written and made available increases all the time. However, it is not readily accessible to everyone. The goal of the research presented in this thesis was to develop a system for automatic text simplification based on dependency relations, develop a set of simplification rules for the system, and evaluate the performance of the system. The system was built on a previous tool and developments were made to ensure that the system could perform the operations necessary for the rules included in the rule set. The rule set was developed by manual adaptation of the rules to a set of training texts. The evaluation method used was a classification task with both objective measures (precision and recall) and a subjective measure (correctness). The performance of the system was compared to that of a system based on constituency relations. The results showed that the current system scored higher on both precision (96% compared to 82%) and recall (86% compared to 53%), indicating that the syntactic information dependency relations provide is sufficient to perform text simplification. Further evaluation should account for how helpful the text simplification produced by the current system is for target readers. text simplification dependency relations simplification rules digital inclusion
77	Thoughts don't have Colour, do they? : Finding Semantic Categories of Nouns and Adjectives in Text Through Automatic Language Processing / Generering av semantiska kategorier av substantiv och adjektiv genom automatisk textbearbetning Fallgren, Per January 2017 (has links) Not all combinations of nouns and adjectives are possible and some are clearly more frequent than others. With this in mind, this study aims to construct semantic representations of the two parts of speech, based on how they occur with each other. By investigating these ideas via automatic natural language processing paradigms, the study aims to find evidence for a semantic mutuality between nouns and adjectives; this notion suggests that the semantics of a noun can be captured by its corresponding adjectives, and vice versa. Furthermore, a set of proposed categories of adjectives and nouns, based on the ideas of Gärdenfors (2014), is presented that hypothetically should fall in line with the produced representations. Four evaluation methods were used to analyze the result, ranging from subjective discussion of nearest neighbours in vector space to accuracy generated from manual annotation. The result provided some evidence for the hypothesis, which suggests that further research is of value. semantic representations semantic categories word vectors adjective noun pair
78	Dependency Parsing and Dialogue Systems : an investigation of dependency parsing for commercial application Adams, Allison January 2017 (has links) In this thesis, we investigate dependency parsing for commercial application, namely for future integration in a dialogue system. To do this, we conduct several experiments on dialogue data to assess parser performance on this domain, and to improve this performance over a baseline. This work makes the following contributions: first, the creation and manual annotation of a gold-standard data set for dialogue data; second, a thorough error analysis of the data set, comparing neural network parsing to traditional parsing methods on this domain; and finally, various domain adaptation experiments showing how parsing on this data set can be improved over a baseline. We further show that dialogue data is characterized by questions in particular, and suggest a method for improving overall parsing on these constructions. dependency parsing dialogue systems error analysis
79	Större chans att klara det? : En specialpedagogisk studie av 10 ungdomars syn på hur datorstöd har påverkat deras språk, lärande och skolsituation. Hansson, Britt January 2008 (has links) In this study, 10 young people were interviewed about their experiences of using a computer with speech synthesis and recorded books. They were asked in which situations the tools had been useful, or had been experienced as a hindrance, in their learning and school situation. Because of major difficulties at school, the young people had been lent a laptop by the school, which they used both at home and at school. Together with parents and teachers, they received guidance at the municipality's Skoldatatek. The starting point of the study, taken from a sociocultural perspective, was that language develops when it is used. Schools are to offer an up-to-date education, and pupils with school difficulties have the right to support. How this support should be designed can create a dilemma at the individual school. Support aimed directly at the individual can be perceived as treating school difficulties as a problem borne by the pupil, which must not occur in "a school for all". Given this dilemma, it was important to investigate the young people's experiences of support, development and obstacles, in order to understand whether these lead to pupils being singled out and excluded. The results showed that the young people felt more motivated with their computer tools, which compensated for their difficulties and suited their different learning styles. The young people said they had become more confident writers and readers thanks to increased language use. Their accounts also show the necessity of support from teachers and parents. The results indicate that alternative tools for learning could contribute to greater goal attainment in a school for all, with pedagogical diversity. datorstöd specialpedagogik skoldatatek alternativa verktyg datoranvändning kompensation
80	Semantisk spegling : En implementation för att synliggöra semantiska relationer i tvåspråkiga data Andersson, Sebastian January 2004 (has links) Semantic theories in traditional linguistics have mainly focused on the relation between words and the properties or objects that the words stand for. These theories have seldom been empirically grounded; rather, they are the result of individual theorists' reflections, exemplified with a handful of words. For use in translation or machine translation, the meaning of a word can instead be defined by its relation to other languages. Translation of text also leaves behind analysable material in the form of source text and translation, which opens the possibility of empirically grounded semantic relations. One method for trying to find monolingual semantic relations from bilingual translation data is semantic mirroring. By exploiting the fact that words are ambiguous in different ways in the source language and the target language, semantic relations between words in the source language can be found from their relation to the target language. In this thesis, semantic mirroring has been implemented and applied to bilingual (Swedish and English) dictionary data. Because the monolingual relations in semantic mirroring are derived by way of another language, this has been exploited in the work to derive bilingual semantic relations as well. The results have been compared with existing synonym lexicons, evaluated qualitatively, and compared with the original data. The results are of varying quality but nevertheless show the potential of the method and the possibility of using the results as a lexical resource in, for example, lexicography. Interdisciplinary studies språkteknologi lexikografi semantik ordbok synonym TVÄRVETENSKAP Social Sciences Interdisciplinary
