61

Tree Transformations in Inductive Dependency Parsing

Nilsson, Jens January 2007 (has links)
This licentiate thesis deals with automatic syntactic analysis, or parsing, of natural languages. A parser constructs the syntactic analysis, which it learns by looking at correctly analyzed sentences, known as training data. The general topic concerns manipulations of the training data in order to improve parsing accuracy.

Several studies using constituency-based theories for natural languages in such automatic and data-driven syntactic parsing have shown that training data, annotated according to a linguistic theory, often needs to be adapted in various ways in order to achieve an adequate, automatic analysis. A linguistically sound constituent structure is not necessarily well suited for learning and parsing with existing data-driven methods. Modifications to the constituency-based trees in the training data, and corresponding modifications to the parser output, have successfully been applied to increase parsing accuracy. This thesis investigates whether similar modifications, in the form of tree transformations applied to training data annotated with dependency-based structures, can improve accuracy for data-driven dependency parsers. To this end, two types of tree transformations are in focus.

The first concerns non-projectivity. The full potential of dependency parsing can only be realized if non-projective constructions are allowed, but these pose a problem for projective dependency parsers; non-projective parsers, on the other hand, tend among other things to be slower. In order to maintain the benefits of projective parsing, a tree transformation technique for recovering non-projectivity while using a projective parser is presented here.

The second type of transformation concerns linguistic phenomena that are possible but hard for a parser to learn, given a certain choice of dependency analysis. This study concentrates on two such phenomena, coordination and verb groups, for which tree transformations are applied in order to improve parsing accuracy in case the original structure does not coincide with a structure that is easy to learn.

Empirical evaluations are performed using treebank data from various languages, and using more than one dependency parser. The results show that the benefit of these tree transformations, used in preprocessing and postprocessing, is to a large extent independent of language, treebank and parser.
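Transformations of this kind presuppose a way of detecting non-projective arcs in the first place. Below is a minimal sketch of such a projectivity check, assuming a sentence encoded as a simple head list; the encoding and function names are illustrative and not the thesis's actual implementation.

```python
def is_projective_arc(heads, h, d):
    """An arc h -> d is projective iff every token strictly between
    h and d is a descendant of h. heads[i] is the head of token i,
    with 0 as the artificial root; heads must encode a tree."""
    lo, hi = min(h, d), max(h, d)
    for k in range(lo + 1, hi):
        a = k
        while a not in (0, h):   # climb the head chain towards the root
            a = heads[a]
        if a != h:
            return False
    return True

def nonprojective_arcs(heads):
    """All arcs (head, dependent) that a projective parser cannot produce."""
    return [(heads[d], d) for d in range(1, len(heads))
            if not is_projective_arc(heads, heads[d], d)]

# Toy tree where the arc 4 -> 2 spans the root token 3
print(nonprojective_arcs([0, 3, 4, 0, 3]))   # -> [(4, 2)]
```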
62

A Study on Text Classification Methods and Text Features

Danielsson, Benjamin January 2019 (has links)
When it comes to the task of classification, the data used for training is the most crucial part, and how this data is processed and presented to the classifier plays an equally important role. This thesis investigates the performance of multiple classifiers depending on the features used, the type of classes to classify and the optimization of said classifiers. The classifiers of interest are support vector machines (SMO) and multilayer perceptrons (MLP); the features tested are word vector spaces and text complexity measures, along with principal component analysis (PCA) on the complexity measures. The features are created from the Stockholm-Umeå Corpus (SUC) and DigInclude, a dataset containing standard and easy-to-read sentences. For the SUC dataset the classifiers attempted to classify texts into nine different text categories, while for the DigInclude dataset the sentences were classified as either standard or simplified. The classification tasks on the DigInclude dataset showed poor performance in all trials. The SUC dataset showed the best performance when using SMO in combination with word vector spaces. Comparing the SMO classifier on the text complexity measures with and without PCA showed that performance was largely unchanged between the two, although not using PCA performed slightly better.
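As an illustration of the PCA-plus-SVM setup described above, here is a minimal sketch using scikit-learn as a stand-in (SMO is the name of Weka's SVM trainer, so the exact tooling differs); the feature matrix and labels are random placeholders, not the SUC data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
complexity = rng.random((200, 30))      # placeholder: 30 text complexity measures
labels = rng.integers(0, 9, size=200)   # placeholder: nine SUC text categories

# Linear SVM with and without PCA on the complexity features
with_pca = make_pipeline(PCA(n_components=10), SVC(kernel="linear"))
without_pca = SVC(kernel="linear")
for name, clf in [("with PCA", with_pca), ("without PCA", without_pca)]:
    scores = cross_val_score(clf, complexity, labels, cv=5)
    print(name, "mean accuracy: %.3f" % scores.mean())
```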
63

Word embeddings and Patient records : The identification of MRI risk patients

Kindberg, Erik January 2019 (has links)
Identifying risks ahead of MRI examinations is a cumbersome and time-consuming process at the Linköping University Hospital radiology clinic. The hospital staff often have to search through large amounts of unstructured patient data to find information about implants. Word embeddings have been identified as a possible tool to speed up this process. The purpose of this thesis is to evaluate this method, which is done by training a Word2Vec model on patient journal data and analyzing the close neighbours of key search words by calculating cosine similarity. The 50 closest neighbours of each search word are categorized and annotated as relevant or not to the task of identifying risk patients ahead of MRI examinations. 10 search words were explored, leading to a total of 500 terms being annotated. In total, 14 different categories were observed in the results, of which 8 were considered relevant. Out of the 500 terms, 340 (68%) were considered relevant. In addition, 48 implant models could be observed, which are particularly interesting because if a patient has an implant, hospital staff need to determine its exact model and the MRI conditions for that model. Overall these findings point towards a positive answer to the aim of the thesis, although further development is needed.
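A minimal sketch of this neighbour extraction using gensim's Word2Vec; the toy sentences below stand in for the confidential patient records, and the search word is illustrative.

```python
from gensim.models import Word2Vec

# Toy stand-in for tokenized patient notes; the real records are confidential
sentences = [
    ["patient", "has", "a", "pacemaker", "implant"],
    ["pacemaker", "model", "verified", "before", "mri"],
    ["no", "implant", "noted", "before", "mri", "scan"],
] * 50   # repeated so the toy model has enough occurrences to train on

model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, seed=1)

# Nearest neighbours of a search word by cosine similarity, as in the thesis
for term, sim in model.wv.most_similar("pacemaker", topn=50):
    print("%-10s %.3f" % (term, sim))
```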
64

Is Simple Wikipedia simple? : A study of readability and guidelines

Isaksson, Fabian January 2018 (has links)
Creating easy-to-read text is a task that has traditionally been carried out manually, but with advancing research in natural language processing, automatic systems for text simplification are being developed. These systems often need training data that is parallel-aligned, and for several years Simple Wikipedia has been the main source for this data. In the current study, several readability measures have been tested on a popular simplification corpus. A selection of guidelines from Simple Wikipedia has also been operationalized and tested. The results imply that adherence to the guidelines is not greater in Simple Wikipedia than in standard Wikipedia. There are, however, differences in the readability measures: the syntactic structures of Simple Wikipedia seem to be less complex than those of standard Wikipedia. A continuation of this study would be to examine other readability measures and to evaluate the guidelines not covered within the current work.
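The abstract does not name the individual measures, so purely as an illustrative assumption, here is one classic readability measure of the kind such studies test, the LIX index (mean sentence length plus the percentage of long words):

```python
import re

def lix(text):
    """LIX readability index: mean sentence length plus the percentage
    of long words (more than six characters). Higher means harder."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÅÄÖåäö]+", text)
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

print(lix("The cat sat on the mat. Readability measures quantify difficulty."))  # 45.0
```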
65

Hierarchical text classification of fiction books : With Thema subject categories

Reinaudo, Alice January 2019 (has links)
Categorizing books and literature of any genre and subject area is a vital task for publishers, who seek to distribute their books to the appropriate audiences. Different countries commonly use different subject categorization schemes, which makes international book trading more difficult due to the need to categorize books from scratch once they reach another country. A solution to this problem has been proposed in the form of an international standard called Thema, which encompasses thousands of hierarchical subject categories. However, because this scheme is quite recent, many books published before its creation have yet to be assigned subject categories, and it is often the case that even recent books are not categorized. In this work, methods for automatic categorization of books are investigated, based on multinomial Naive Bayes and Facebook's classifier fastText. The results show some promise for both classifiers, but overall, due to data imbalance and a very long training time that made it difficult to use more data, it is not possible to determine with certainty which classifier is actually best.
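A minimal sketch of the multinomial Naive Bayes baseline using scikit-learn; the blurbs and the two top-level Thema codes are toy placeholders, whereas the thesis works with the full Thema hierarchy and real catalogue data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy blurbs with top-level Thema codes (F = Fiction, W = Lifestyle & Leisure)
texts = ["a thrilling detective story set in stockholm",
         "a cookbook full of simple pasta recipes",
         "a space opera adventure across the galaxy",
         "step-by-step recipes for baking rustic bread"]
labels = ["F", "W", "F", "W"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["a mystery novel about a detective"]))   # -> ['F']
```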
66

Större chans att klara det? : En specialpedagogisk studie av 10 ungdomars syn på hur datorstöd har påverkat deras språk, lärande och skolsituation.

Hansson, Britt January 2008 (has links)
In this study, 10 adolescents were interviewed about their experiences of using a computer with speech synthesis and recorded books. They were asked in which situations the tools had been useful, or had felt like a hindrance, in their learning and school situation. Because of severe difficulties in school, the adolescents had been lent a laptop by their school, which they used both at home and in school. Together with their parents and teachers, they received guidance at the municipality's Skoldatatek (school computer resource centre). The starting point of the study, from a sociocultural perspective, is that language develops when it is used. Schools must offer an up-to-date education, and pupils with difficulties in school have the right to support. How this support should be designed can create a dilemma at the individual school, since support directed at the individual pupil can be taken to imply that school difficulties are a problem carried by the pupil, which must not occur in "a school for all". Given this dilemma, it was important to investigate the adolescents' experiences of support, development and obstacles, in order to understand whether these lead to singling out and exclusion. The results showed that the adolescents felt more motivated with their computer tools, which compensated for their difficulties and suited their different learning styles. The adolescents said they had become more confident writers and readers thanks to increased language use. Their accounts also show how essential the support from teachers and parents is. The results indicate that alternative learning tools could contribute to greater goal attainment in a school for all, with pedagogical diversity.
67

Semantisk spegling : En implementation för att synliggöra semantiska relationer i tvåspråkiga data

Andersson, Sebastian January 2004 (has links)
Semantic theories in traditional linguistics have mainly focused on the relation between words and the properties or objects they denote. These theories have rarely been empirically grounded; rather, they have been the product of individual theorists' reflections, exemplified with a handful of words. For use in translation or machine translation, the meaning of a word can instead be defined by its relation to other languages. Translation of text, moreover, leaves analyzable material behind in the form of source text and translation, which opens the possibility of empirically grounded semantic relations. One method for deriving monolingual semantic relations from bilingual translation data is semantic mirroring. By exploiting the fact that words are ambiguous in different ways in the source and target languages, semantic relations between words in the source language can be found via their relations to the target language. In this thesis, semantic mirroring has been implemented and applied to bilingual (Swedish and English) dictionary data. Since the monolingual relations in semantic mirroring are derived via another language, this has also been exploited to derive bilingual semantic relations. The result has been compared with existing thesauri, evaluated qualitatively, and compared with the original data. The results are of varying quality but nevertheless show the potential of the method and the possibility of using the result as a lexical resource in, for example, lexicography.
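A minimal sketch of the core mirroring step, assuming the bilingual dictionary is given as plain translation sets (the toy entries are illustrative, not the thesis's dictionary data): a source word is mapped to its translations, and source words that share a translation become synonym candidates.

```python
# Toy Swedish -> English dictionary as translation sets
sv2en = {
    "fråga":    {"question", "issue", "matter"},
    "ärende":   {"matter", "case", "errand"},
    "spörsmål": {"question", "issue"},
}

# Invert the dictionary to get the English -> Swedish direction
en2sv = {}
for sv, ens in sv2en.items():
    for en in ens:
        en2sv.setdefault(en, set()).add(sv)

def mirror(word):
    """Swedish word -> its translations in English -> back into Swedish.
    Source words sharing a translation become synonym candidates."""
    candidates = set()
    for en in sv2en.get(word, ()):
        candidates |= en2sv[en]
    candidates.discard(word)
    return candidates

print(mirror("fråga"))   # -> {'ärende', 'spörsmål'}
```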
68

Classification into Readability Levels : Implementation and Evaluation

Larsson, Patrik January 2006 (has links)
The main use for a readability classification model is as an integrated part of an information retrieval system: by matching the user's demands on readability to documents with the corresponding readability, the classification model can further improve the results of, for example, a search engine. This thesis presents a new solution for classification into readability levels for Swedish. The results of the thesis are a number of classification models, induced by training a Support Vector Machines classifier on features that previous research has established as good measurements of readability. The features were extracted from a corpus annotated with three readability levels; Natural Language Processing tools for tagging and parsing were used to analyze the corpus and enable the extraction of the features. Empirical testing of different feature combinations was performed to optimize the classification models. The models yield a good and stable classification: the best model obtained a precision of 90.21% and a recall of 89.56% on the test set, which corresponds to an F-score of 89.88. The thesis also presents suggestions for further development and potential areas of application.
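The reported F-score is the standard harmonic mean of precision and recall, which the quoted figures reproduce:

```python
p, r = 90.21, 89.56
f = 2 * p * r / (p + r)   # harmonic mean of precision and recall
print(round(f, 2))        # -> 89.88
```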
69

Utveckling av ett svensk-engelskt lexikon inom tåg- och transportdomänen

Axelsson, Hans, Blom, Oskar January 2006 (has links)
This paper describes the process of building a machine translation lexicon for use in the train and transport domain with the machine translation system MATS. The lexicon consists of a Swedish part, an English part and links between them, and is derived from a Trados translation memory which is split into a training part (90%) and a test part (10%). The task is carried out mainly by using existing word-linking software and by recycling previous machine translation lexicons from other domains. To this end, a method is developed with a focus on automation, by means of both existing and self-developed software in combination with manual interaction. The domain-specific lexicon is then extended with a domain-neutral core lexicon and a less domain-neutral general lexicon. The different lexicons are evaluated automatically and manually through machine translation on the test corpus. The automatic evaluation of the largest lexicon yielded a NEVA score of 0.255 and a BLEU score of 0.190. In the manual evaluation, 34% of the segments were correctly translated, 37% were not correct but perfectly understandable, and 29% were difficult to understand.
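A small sketch of the kind of automatic evaluation reported above, using NLTK's corpus-level BLEU; the tokenized segment pairs are placeholders rather than the Trados data, and the NEVA metric is not shown.

```python
from nltk.translate.bleu_score import corpus_bleu

# Placeholder test segments: each hypothesis is scored against its reference
references = [[["the", "train", "departs", "from", "platform", "two"]],
              [["check", "the", "brake", "system", "before", "departure"]]]
hypotheses = [["the", "train", "leaves", "from", "platform", "two"],
              ["check", "the", "brake", "system", "before", "leaving"]]

print("BLEU: %.3f" % corpus_bleu(references, hypotheses))
```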
70

A Pipeline for Automatic Lexical Normalization of Swedish Student Writings

Liu, Yuhan January 2018 (has links)
In this thesis, we aim to explore the combination of different lexical normalization methods and provide a practical lexical normalization pipeline for Swedish student writings within the framework of SWEGRAM (Näsman et al., 2017). An important feature of our implementation is that the pipeline design takes into account the unique morphological and phonological characteristics of the Swedish language. This kind of localization makes the system more robust for Swedish, at the cost of being less applicable to other languages in similar tasks. The core of the localization lies in a phonetic algorithm we designed specifically for Swedish and in a compound-processing step for the Swedish compounding phenomenon. The proposed pipeline consists of four steps: preprocessing, identification of out-of-vocabulary words, generation of normalization candidates, and candidate selection. For each step we use different approaches. We perform experiments on the Uppsala Corpus of Student Writings (UCSW) (Megyesi et al., 2016) and evaluate the results in terms of precision, recall and accuracy. The techniques applied to the raw data and their impact on the final result are presented. In our evaluation, we show that the pipeline can be useful in the lexical normalization task and that our phonetic algorithm is effective for the Swedish language.
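A schematic sketch of the four steps, with deliberately simplified stand-ins: the tiny vocabulary, the first-letter candidate generator and the string-similarity ranking replace the thesis's dictionary, Swedish-specific phonetic algorithm and compound handling.

```python
import difflib

VOCAB = {"skolan", "eleven", "läser", "boken", "i"}    # toy in-vocabulary list

def preprocess(text):                                  # step 1: tokenize
    return text.lower().split()

def oov(tokens):                                       # step 2: find out-of-vocabulary words
    return [t for t in tokens if t not in VOCAB]

def candidates(word):                                  # step 3: generate candidates
    # stand-in for phonetic-key lookup: same initial letter
    return [v for v in VOCAB if v[0] == word[0]]

def select(word, cands):                               # step 4: pick the best candidate
    # stand-in for the phonetic + compound-aware ranking
    return max(cands, key=lambda c: difflib.SequenceMatcher(None, word, c).ratio())

tokens = preprocess("Eleven lässer boken i skollan")
for w in oov(tokens):
    print(w, "->", select(w, candidates(w)))           # lässer -> läser, skollan -> skolan
```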
