
Fundamental Frequency Strategies at Voiceless Segments / Grundtonsstrategier vid tonlösa segment

von Kartaschew, Filip January 2007
Prosody models that can be used in, for example, speech synthesis are often based on analyses of speech consisting solely of voiced segments. Before a voiceless consonant, the fundamental frequency (F0) contour of a vowel segment lacks a possible continuation and also becomes shorter. This is usually adjusted by truncating the F0 contour. Earlier studies have briefly shown that, beyond truncation, differences in the F0 contour of vowels can arise depending on whether the following segment is voiced or voiceless. Building on these studies, this thesis investigates the F0 contour in Swedish sentences. The results of this study likewise show that different F0 strategies are used, and that truncation is not enough to explain what happens to the F0 contour in these contexts. Overall, the results indicate that it seems important for the speakers to preserve the information carried by the F0 contour in the form of its maximum and minimum values, and to maintain falls and rises as far as possible.

Word Alignment by Re-using Parallel Phrases

Holmqvist, Maria January 2008
In this thesis we present the idea of using parallel phrases for word alignment. Each parallel phrase is extracted from a set of manual word alignments and contains a number of source and target words and their corresponding alignments. If a parallel phrase matches a new sentence pair, its word alignments can be applied to the new sentence. There are several advantages to using phrases for word alignment. First, longer text segments include more context and are more likely to produce correct word alignments than shorter segments or single words. More importantly, the use of longer phrases makes it possible to generalise words in the phrase by replacing them with parts-of-speech or other grammatical information. In this way, the number of words covered by the extracted phrases can go beyond the words and phrases that were present in the original set of manually aligned sentences. We present experiments with phrase-based word alignment on three types of English–Swedish parallel corpora: a software manual, a novel and proceedings of the European Parliament. In order to find a balance between improved coverage and high alignment accuracy, we investigated different properties of generalised phrases to identify which types of phrases are likely to produce accurate alignments on new data. Finally, we compared phrase-based word alignments to state-of-the-art statistical alignment, with encouraging results. We show that phrase-based word alignments can be used to enhance statistical word alignment. To evaluate word alignments, an English–Swedish reference set for the Europarl corpus was constructed. The guidelines for producing this reference alignment are presented in the thesis.
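As a rough, hypothetical illustration of the matching idea described in this abstract (not the thesis's actual implementation), the sketch below applies a stored parallel phrase, optionally generalised with part-of-speech placeholders, to a new sentence pair; the data format and all names are assumptions.

```python
# Sketch of phrase-based word alignment, assuming a parallel phrase is stored
# as (source tokens, target tokens, alignment links), where tokens may be
# either literal words or POS placeholders such as "NN". Hypothetical format.

def token_matches(pattern_tok, word, pos):
    """A pattern token matches either the literal word or its POS tag."""
    return pattern_tok == word or pattern_tok == pos

def apply_phrase(phrase, src, src_pos, tgt, tgt_pos):
    """If the phrase matches at some offset in both source and target,
    return its alignment links shifted to sentence positions."""
    p_src, p_tgt, links = phrase
    for i in range(len(src) - len(p_src) + 1):
        if not all(token_matches(p, src[i + k], src_pos[i + k])
                   for k, p in enumerate(p_src)):
            continue
        for j in range(len(tgt) - len(p_tgt) + 1):
            if all(token_matches(p, tgt[j + k], tgt_pos[j + k])
                   for k, p in enumerate(p_tgt)):
                return [(i + a, j + b) for a, b in links]
    return []

# Toy generalised phrase: English "the NN" aligned to a single Swedish noun.
phrase = (["the", "NN"], ["NN"], [(0, 0), (1, 0)])
links = apply_phrase(phrase,
                     ["open", "the", "file"], ["VB", "DT", "NN"],
                     ["öppna", "filen"], ["VB", "NN"])
print(links)  # [(1, 1), (2, 1)]
```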

MaltParser -- An Architecture for Inductive Labeled Dependency Parsing

Hall, Johan January 2006
This licentiate thesis presents a software architecture for inductive labeled dependency parsing of unrestricted natural language text, which achieves a strict modularization of parsing algorithm, feature model and learning method such that these parameters can be varied independently. The architecture is based on the theoretical framework of inductive dependency parsing by Nivre (2006) and has been realized in MaltParser, a system that supports several parsing algorithms and learning methods, for which complex feature models can be defined in a special description language. Special attention is given in this thesis to learning methods based on support vector machines (SVM).

The implementation is validated in three sets of experiments using data from three languages (Chinese, English and Swedish). First, we check whether the implementation realizes the underlying architecture. The experiments show that the MaltParser system outperforms the baseline and satisfies the basic constraints of well-formedness. Furthermore, the experiments show that it is possible to vary parsing algorithm, feature model and learning method independently. Secondly, we focus on the special properties of the SVM interface. It is possible to reduce the learning and parsing time without sacrificing accuracy by dividing the training data into smaller sets, according to the part-of-speech of the next token in the current parser configuration. Thirdly, the last set of experiments presents a broad empirical study that compares SVM to memory-based learning (MBL) with five different feature models, where all combinations have gone through parameter optimization for both learning methods. The study shows that SVM outperforms MBL for more complex and lexicalized feature models with respect to parsing accuracy. There are also indications that SVM, with a splitting strategy, can achieve faster parsing than MBL. The parsing accuracy achieved is the highest reported for the Swedish data set and very close to the state of the art for Chinese and English.
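The split-by-POS strategy mentioned above can be sketched as follows. This is a simplified reconstruction using scikit-learn rather than MaltParser's own LIBSVM integration, and the instance format is an assumption.

```python
# Sketch of the splitting strategy: instead of one large SVM, train one
# smaller SVM per part-of-speech of the next token in the parser
# configuration. Simplified reconstruction; MaltParser's real interface differs.
from collections import defaultdict
from sklearn.svm import SVC

def train_split_models(instances):
    """instances: list of (next_token_pos, feature_vector, transition_label)."""
    buckets = defaultdict(list)
    for pos, feats, label in instances:
        buckets[pos].append((feats, label))
    models = {}
    for pos, data in buckets.items():
        X = [feats for feats, _ in data]
        y = [label for _, label in data]
        # Quadratic polynomial kernel, a common choice for this kind of model.
        models[pos] = SVC(kernel="poly", degree=2).fit(X, y)
    return models

def predict_transition(models, pos, feats):
    # Fall back to an arbitrary model if the POS was unseen in training.
    model = models.get(pos) or next(iter(models.values()))
    return model.predict([feats])[0]
```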

Improved Automatic Text Simplification by Manual Training / Förbättrad automatisk textförenkling genom manuell träning

Rennes, Evelina January 2015
The purpose of this thesis was the further development of a rule set used in an automatic text simplification system, and the exploration of whether it is possible to improve the performance of a rule-based text simplification system by manual training. A first rule set was developed from a thorough literature review, and the rule refinement was performed by manually adapting the first rule set to a set of training texts. When no more changes were added to the set of rules, the training was considered complete, and the two sets were applied to a test set for evaluation. This thesis evaluated the performance of a text simplification system as a classification task, by the use of objective metrics: precision and recall. The comparison of the rule sets revealed a clear improvement of the system, since precision increased from 45% to 82%, and recall increased from 37% to 53%. Both recall and precision were improved after training for the majority of the rules, with a few exceptions. All rule types resulted in a higher score on correctness for R2, the refined rule set. Automatic text simplification systems targeting real-life readers need to account for qualitative aspects, which have not been considered in this thesis. Future evaluation should, in addition to quantitative metrics such as precision, recall, and complexity metrics, also account for the experience of the reader.
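Treating simplification as a classification task amounts to counting, per rule application site, whether the rule fired and whether it should have. A minimal sketch of the metrics, with an assumed label format:

```python
# Minimal precision/recall computation for rule applications, assuming each
# item is labelled with whether the rule fired and whether it should have.
def precision_recall(decisions):
    """decisions: list of (applied: bool, should_apply: bool) pairs."""
    tp = sum(1 for a, s in decisions if a and s)
    fp = sum(1 for a, s in decisions if a and not s)
    fn = sum(1 for a, s in decisions if not a and s)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: 3 correct applications, 1 spurious, 2 missed.
print(precision_recall([(True, True)] * 3 + [(True, False)] + [(False, True)] * 2))
# -> (0.75, 0.6)
```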

Implementation and Evaluation of a Term Aligner in Java / Implementation och utvärdering av termlänkare i Java

Axelsson, Robin January 2013
Aligning parallel terms in a parallel corpus can be done by aligning all words and phrases in the corpus and then performing term extraction on the aligned set of word pairs. Alternatively, term extraction in the source and target texts can be performed separately, and the resulting term candidates can then be aligned, forming aligned parallel terms. This thesis describes an implementation of a term aligner that is applied to extracted term candidates in both the source and the target texts. The term aligner uses statistical measures, the tool Giza++ and heuristics in the search for alignments. The evaluation reveals that the best results are obtained when the term alignment relies heavily on the Giza++ tool and the Levenshtein heuristic.
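A hedged sketch of how a term aligner might combine a Giza++-style translation probability with the Levenshtein heuristic when scoring candidate term pairs; the interpolation weight and the score format are assumptions, and a real system would read the probabilities from Giza++ output.

```python
# Sketch: score a candidate term pair by combining a translation probability
# (as produced by Giza++) with normalised Levenshtein similarity.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def align_score(src_term, tgt_term, trans_prob, w=0.5):
    """Interpolate translation probability and string similarity (weight assumed)."""
    sim = 1 - levenshtein(src_term, tgt_term) / max(len(src_term), len(tgt_term))
    return w * trans_prob + (1 - w) * sim

# Cognates score high even with a modest translation probability.
print(align_score("terminal", "terminalen", trans_prob=0.4))  # 0.6
```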

Separating the Signal from the Noise: Predicting the Correct Entities in Named-Entity Linking

Perkins, Drew January 2020
In this study, I constructed a named-entity linking system that maps between contextual word embeddings and knowledge graph embeddings to predict correct entities. To establish a named-entity linking system, I first applied named-entity recognition to identify the entities of interest. I then performed candidate generation via locality-sensitive hashing (LSH), where a candidate group of potential entities was created for each identified entity. Afterwards, a named-entity disambiguation component selected the most probable candidate. By concatenating contextual word embeddings and knowledge graph embeddings in the disambiguation component, I present a novel approach to named-entity linking. I conducted the experiments with the Kensho-Derived Wikimedia Dataset and the AIDA CoNLL-YAGO Dataset; the former was used for deployment and the latter is a benchmark dataset for entity linking tasks. Three deep learning models were evaluated on the named-entity disambiguation component with different context embeddings. The evaluation was treated as a classification task, where I trained my models to select the correct entity from a list of candidates. By optimizing named-entity linking through this methodology, the entire system can be used in recommendation engines, achieving a high F1 of 86% on the former dataset. With the benchmark dataset, the proposed method achieves an F1 of 79%.
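The disambiguation step can be sketched roughly as follows, assuming the contextual and knowledge graph embeddings are already computed; the dimensions, the linear scorer standing in for the trained deep models, and all names are illustrative assumptions.

```python
# Sketch of the disambiguation idea: concatenate a mention's contextual
# embedding with each candidate's knowledge-graph embedding and score the
# pairs. Dimensions and random data are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
ctx_dim, kg_dim = 768, 200           # e.g. BERT-sized and KG-sized vectors

def candidate_features(mention_vec, candidate_vecs):
    """One concatenated feature vector per candidate entity."""
    return np.stack([np.concatenate([mention_vec, c]) for c in candidate_vecs])

def score_candidates(features, weights, bias):
    """A linear scorer standing in for the trained deep model."""
    logits = features @ weights + bias
    return int(np.argmax(logits))

mention = rng.normal(size=ctx_dim)
candidates = rng.normal(size=(5, kg_dim))       # 5 candidates, e.g. from LSH
w = rng.normal(size=ctx_dim + kg_dim)
best = score_candidates(candidate_features(mention, candidates), w, 0.0)
print("predicted candidate index:", best)
```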

How Do We Make Reading Easier When Reading Is So Boring? : A Study of Text Simplification and How People with Dyslexia Experience Texts / Hur gör vi läsning lättare när det är så tråkigt att läsa? : En studie om textförenkling och hur personer med dyslexi upplever texter

Mahne, Niklas January 2022
This study investigated how automatic text simplification is experienced by, and helps, secondary school students with dyslexia. The participants read a total of four texts on different topics over two sessions, where one text was always a simplified version and the other an original text. After reading the texts, the participants answered reading comprehension questions to gauge how much of the text they had understood, and then took part in a semi-structured interview about their experience of the text they had read. The comprehension results were then split by which version of the texts the participants had read, and the interview answers were examined in a thematic analysis. The results showed no major difference between the texts; both versions worked equally well.

Unsupervised multilingual distractor generation for fill-in-the-blank questions

Han, Zhe January 2022
Fill-in-the-blank multiple choice questions (MCQs) play an important role in education, but generating them manually is quite resource-consuming, so automatic generation has gradually turned into an attractive NLP task. Within this task, question creation itself has become a mainstream NLP research topic, while distractor (wrong alternative) generation (DG) still remains out of the spotlight. Although several studies on distractor generation have been conducted in recent years, there is little previous work on languages other than English. The goal of this thesis is to generate multilingual distractors in Chinese, Arabic, German, and English across domains. The initial step is to construct small multilingual scientific datasets (En, Zh, Ar, and De) and general datasets (Zh and Ar) from scratch. Since labelled multilingual datasets are limited, unsupervised experiments based on WordNet, word embeddings, transformer-based models, translation methods, and domain adaptation are conducted to generate the corresponding candidate distractors. The performance of the methods is then evaluated against our newly created datasets using three metrics. The results show that monolingual transformer-based methods together with translation-based methods outperform the other approaches on the multilingual datasets, except for German, which reaches its highest score only through the translation-based method. Distractor generation is simplest to implement for the English datasets and most difficult for the Arabic datasets.
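One of the unsupervised strategies mentioned, choosing distractors among embedding-space neighbours of the correct answer, can be sketched as below; the toy vectors stand in for pretrained multilingual embeddings, and the function names are assumptions.

```python
# Sketch of embedding-based distractor generation: candidates are the
# nearest neighbours of the answer word, excluding the answer itself.
# Toy vectors; a real system would load pretrained multilingual embeddings.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def generate_distractors(answer, vocab_vecs, k=3):
    """vocab_vecs: dict word -> vector, assumed to contain the answer."""
    target = vocab_vecs[answer]
    scored = [(w, cosine(target, v)) for w, v in vocab_vecs.items()
              if w != answer]
    scored.sort(key=lambda x: -x[1])
    return [w for w, _ in scored[:k]]

vecs = {"oxygen": np.array([1.0, 0.1]), "hydrogen": np.array([0.9, 0.2]),
        "nitrogen": np.array([0.8, 0.3]), "poetry": np.array([-0.5, 1.0])}
print(generate_distractors("oxygen", vecs, k=2))  # ['hydrogen', 'nitrogen']
```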

Detecting Dissimilarity in Discourse on Social Media

Mineur, Mattias January 2022
A lot of interaction between humans takes place on social media. Groups and communities are sometimes formed, both intentionally and unintentionally. These interactions generate a large quantity of text data. This project aims to detect dissimilarity in discourse between communities on social media using a distributed approach. A data set of tweets was used to test and evaluate the method. Tweets produced by two communities were extracted from the data set. Two natural language processing techniques were used to vectorise the tweets for each community: LIWC, a dictionary-based method built on knowledge from professionals in linguistics and psychology, and BERT, an embedding model that uses machine learning to represent words and sentences as vectors of real numbers. These vectors were then used as representations of the text to measure the similarity of discourse between the communities. Both distance and similarity were measured. It was concluded that none of the combinations of measure and vectorisation method that were tried could detect a dissimilarity in discourse on social media for the sample data set.
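The core measurement can be sketched in a few lines: pool each community's tweet vectors into a single representation and compare the two. The random vectors below stand in for LIWC or BERT output.

```python
# Sketch of the comparison step: mean-pool each community's tweet vectors
# (LIWC category counts or BERT embeddings) and measure cosine distance.
import numpy as np

def community_vector(tweet_vectors):
    """One representation per community: the mean of its tweet vectors."""
    return np.mean(tweet_vectors, axis=0)

def cosine_distance(u, v):
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
community_a = rng.normal(size=(100, 64))   # 100 tweets, 64-dim vectors each
community_b = rng.normal(size=(100, 64))
d = cosine_distance(community_vector(community_a), community_vector(community_b))
print(f"cosine distance between communities: {d:.3f}")
```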
