1

Intonation and sentence type interpretation in Greek : A production and perception approach

Kotsifas, Dimitrios January 2009
This thesis examines the intonation patterns of Modern Greek with regard to different interpretations of the sentence types (declarative, interrogative, imperative).

14 utterances were produced by Greek native speakers (2 men and 2 women) so as to express various speech acts: STATEMENT, QUESTION, COMMAND and REQUEST.

The F0 curve of each utterance was acquired with the Wavesurfer tool, enabling an analysis of the pitch movements and their alignments.

After the F0 curves were analyzed and plotted in Excel, they could be compared and grouped, yielding 5 different intonation patterns. A second-level comparison, based on the observation that some F0 curves differed only in the final pitch movement, reduced these to 3 fundamental categories of intonation patterns. Category I is characterized by a rising pitch movement aligned to the onset of the stressed syllables; it includes only sentences that denote STATEMENT, so we can call it the STATEMENT category. Category II is characterized by a dipping pitch movement aligned to the head of the utterance, that is, the stressed syllable of the verb or of a particle that signifies negation (/min/, /den/); sentences meaning COMMAND or REQUEST belong to this category. Lastly, Category III consists of peaking pitch movements aligned to the initial and final stressed syllables; interrogative sentences belong to this category regardless of their interpretation.

A secondary goal of the thesis is to examine to what extent intonation can be a reliable criterion for the "correct" interpretation of a sentence. Since the ratio between the number of utterances (14) and the number of intonation patterns (5) is not 1:1, misunderstandings among speakers are always possible; this presumption is largely confirmed by the results of our perception test conducted with Greek native speakers. They were able to identify most of the speech acts expressed by the most common (default) sentence type (i.e. imperative sentences for COMMAND and interrogative sentences for QUESTION), but had difficulty with other combinations, such as interrogative sentences denoting something other than QUESTION, e.g. REQUEST or STATEMENT. Finally, a perception test conducted with Flemish speakers (native speakers of a language other than Greek) showed that they were more successful with sentences that meant STATEMENT or QUESTION, but could hardly identify an interrogative sentence that meant anything other than QUESTION, and they also confused COMMAND with REQUEST. This implies that the intonation used to convey different interpretations is largely language-dependent.

In conclusion, this study offers a description of the intonation patterns (based on pitch movements) of the 3 sentence types with 4 different interpretations. Our findings show that intonation in some cases (i.e. sentences that express COMMAND or STATEMENT) seems to be structure-independent, and in others structure-dependent (cf. the interrogative sentences). Additionally, the fact that negation can play an important role in the choice of intonation pattern (as shown for COMMAND and STATEMENT) could be considered a structure-dependent feature of intonation. This contrasts with the approach used for many years in traditional grammar, according to which structure alone (the sentence type) defines the meaning to be conveyed.
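The three-way categorization above lends itself to a simple decision procedure. The following sketch is purely illustrative and not from the thesis: the triple representation of pitch around each stressed syllable and all decision rules are assumptions made for the example.

```python
# Illustrative sketch (not from the thesis): a toy heuristic that assigns an
# F0 contour to one of the three intonation categories described above.
# Each stressed syllable is represented by a (pre, on, post) pitch triple;
# this representation and the rules below are invented for illustration.

def classify_contour(f0_at_stressed):
    """Map a sequence of (pre, on, post) pitch triples, one per stressed
    syllable, to Category I, II or III."""
    def movement(pre, on, post):
        if on > pre and post >= on:
            return "rise"
        if on < pre and post > on:
            return "dip"
        if on > pre and post < on:
            return "peak"
        return "level"

    moves = [movement(*triple) for triple in f0_at_stressed]
    first, last = moves[0], moves[-1]
    if first == "peak" and last == "peak":
        return "III (interrogative)"      # peaks at initial and final stress
    if first == "dip":
        return "II (command/request)"     # dip aligned to the utterance head
    if all(m in ("rise", "level") for m in moves):
        return "I (statement)"            # rises aligned to stress onsets
    return "unclassified"

# Example: rising movements throughout -> Category I
print(classify_contour([(110, 130, 135), (120, 140, 145)]))
```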
2

Grundtonsstrategier vid tonlösa segment

von Kartaschew, Filip January 2007
Prosody models, which can be used in speech synthesis among other applications, are often based on analyses of speech consisting solely of voiced segments. Before a voiceless consonant, the fundamental frequency (F0) curve of a vowel segment lacks a possible continuation and is also shorter. This is usually handled by truncating the F0 curve. Earlier studies have briefly shown that differences beyond truncation can arise in the F0 curve of a vowel depending on whether the following segment is voiced or voiceless. Taking these studies as its starting point, this thesis investigates the F0 curve in Swedish sentences. The results of this study likewise show that different F0 strategies are used, and that truncation is not sufficient to explain what happens to the F0 curve in these contexts. In general, the results indicate that it seems important for the subjects to preserve the information carried by the F0 curve in the form of its maximum and minimum values, and that falls and rises are maintained as far as possible.
3

Word Alignment by Re-using Parallel Phrases

Holmqvist, Maria January 2008
In this thesis we present the idea of using parallel phrases for word alignment. Each parallel phrase is extracted from a set of manual word alignments and contains a number of source and target words and their corresponding alignments. If a parallel phrase matches a new sentence pair, its word alignments can be applied to the new sentence. There are several advantages to using phrases for word alignment. First, longer text segments include more context and are more likely to produce correct word alignments than shorter segments or single words. More importantly, the use of longer phrases makes it possible to generalize words in the phrase by replacing them with parts of speech or other grammatical information. In this way, the number of words covered by the extracted phrases can go beyond the words and phrases present in the original set of manually aligned sentences. We present experiments with phrase-based word alignment on three types of English–Swedish parallel corpora: a software manual, a novel and proceedings of the European Parliament. In order to find a balance between improved coverage and high alignment accuracy, we investigated different properties of generalized phrases to identify which types of phrases are likely to produce accurate alignments on new data. Finally, we have compared phrase-based word alignments to state-of-the-art statistical alignment, with encouraging results, and we show that phrase-based word alignments can be used to enhance statistical word alignment. To evaluate word alignments, an English–Swedish reference set for the Europarl corpus was constructed. The guidelines for producing this reference alignment are presented in the thesis.
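The core idea, that a stored phrase carries its internal word links and projects them onto any new sentence pair it matches, can be sketched in a few lines. This is an illustration under invented data structures, not Holmqvist's implementation:

```python
# A minimal sketch of phrase-based word alignment as described above: a
# stored parallel phrase carries its internal word links, and whenever both
# sides match a new sentence pair, those links are projected onto the new
# sentences. The phrase representation and example data are invented.

def find_sub(seq, sub):
    """Return the start index of `sub` in `seq`, or -1 if absent."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1

def apply_phrase(phrase, src_sent, trg_sent):
    """Project a phrase's word links onto a new sentence pair."""
    s = find_sub(src_sent, phrase["src"])
    t = find_sub(trg_sent, phrase["trg"])
    if s < 0 or t < 0:
        return []           # phrase does not match this sentence pair
    return [(s + i, t + j) for (i, j) in phrase["links"]]

# Phrase extracted from manually aligned data (invented example):
phrase = {"src": ["the", "file"], "trg": ["filen"], "links": [(0, 0), (1, 0)]}
src = ["open", "the", "file", "now"]
trg = ["öppna", "filen", "nu"]
print(apply_phrase(phrase, src, trg))   # -> [(1, 1), (2, 1)]
```

Generalization as described in the abstract would amount to letting some positions match on part-of-speech tags instead of surface words.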
4

Effects of open and directed prompts on filled pauses and utterance production

Eklund, Robert, Wirén, Mats January 2010
This paper describes an experiment in which open and directed prompts were alternated when collecting speech data for the deployment of a call-routing application. The experiment tested whether open and directed prompts resulted in any differences with respect to the filled pauses exhibited by the callers, which is interesting in the light of the "many-options" hypothesis of filled-pause production. The experiment also investigated the effects of the prompts on the form and meaning of the callers' utterances.
5

Recycling Translations : Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing

Tiedemann, Jörg January 2003
The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing.

Five parallel corpora containing more than 35 million words in 60 languages have been collected within co-operative projects. All corpora are sentence aligned and parts of them have been analyzed automatically and annotated with linguistic markup.

Lexical data are extracted from the corpora by means of word alignment. Two automatic word alignment systems have been developed, the Uppsala Word Aligner (UWA) and the Clue Aligner. UWA implements an iterative "knowledge-poor" word alignment approach using association measures and alignment heuristics. The Clue Aligner provides an innovative framework for the combination of statistical and linguistic resources in aligning single words and multi-word units. Both aligners have been applied to several corpora. Detailed evaluations of the alignment results have been carried out for three of them using fine-grained evaluation techniques.

A corpus processing toolbox, Uplug, has been developed. It includes the implementation of UWA and is freely available for research purposes. A new version, Uplug II, includes the Clue Aligner. It can be used via an experimental web interface (UplugWeb).

Lexical data extracted by the word aligners have been applied to different tasks in computational lexicography and machine translation. The use of word alignment in monolingual lexicography has been investigated in two studies. In a third study, the feasibility of using the extracted data in interactive machine translation has been demonstrated. Finally, extracted lexical data have been used for enhancing the lexical components of two machine translation systems.
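A "knowledge-poor" approach of the kind UWA implements relies on association measures computed from co-occurrence counts. The sketch below illustrates that general idea with the Dice coefficient on an invented toy corpus; it is not UWA's actual code:

```python
# Illustrative sketch of an association-based word aligner in the spirit of
# a "knowledge-poor" approach (not UWA itself): count co-occurrences of word
# pairs in sentence-aligned data, score them with the Dice coefficient, and
# greedily link the highest-scoring pairs. The toy bitext is invented.
from collections import Counter
from itertools import product

bitext = [
    (["the", "house"], ["huset"]),
    (["the", "car"], ["bilen"]),
    (["a", "house"], ["ett", "hus"]),
]

src_freq, trg_freq, pair_freq = Counter(), Counter(), Counter()
for src, trg in bitext:
    for w in set(src):
        src_freq[w] += 1
    for w in set(trg):
        trg_freq[w] += 1
    for s, t in product(set(src), set(trg)):
        pair_freq[(s, t)] += 1

def dice(s, t):
    # 2 * joint frequency / sum of marginal frequencies
    return 2 * pair_freq[(s, t)] / (src_freq[s] + trg_freq[t])

# Greedy linking: accept pairs in descending Dice order, one link per word.
candidates = sorted(pair_freq, key=lambda p: (-dice(*p), p))
linked_s, linked_t, links = set(), set(), []
for s, t in candidates:
    if s not in linked_s and t not in linked_t and dice(s, t) > 0.5:
        links.append((s, t, round(dice(s, t), 2)))
        linked_s.add(s)
        linked_t.add(t)
print(links)
# -> [('a', 'ett', 1.0), ('car', 'bilen', 1.0), ('house', 'huset', 0.67)]
```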
6

Creation of a customised character recognition application

Sandgren, Frida January 2005
This master’s thesis describes the work of creating a customised optical character recognition (OCR) application, intended for use in the digitisation of theses submitted to Uppsala University in the 18th and 19th centuries. For this purpose, the open source software Gamera has been used for recognition and classification of the characters in the documents. The software provides specific algorithms for the analysis of heritage documents and is designed to be used as a tool for creating domain-specific (i.e. customised) recognition applications.

By using the Gamera classifier training interface, classifier data was created which reflects the characters in these particular theses. The data can then be used in the automatic recognition of ‘new’ characters, by loading it into one of Gamera’s classifiers. The output of Gamera is a set of classified glyphs (i.e. small images of characters), stored in an XML-based format.

However, as OCR typically involves translating images of text into a machine-readable format, a complementary OCR module was needed. For this purpose, an external Gamera module for page segmentation was modified and used.

In addition, a script for controlling the OCR process was created, which initiates the page segmentation on Gamera-classified glyphs. The result is written to text files.

Finally, in a test of recognition accuracy, one of the theses was used for creating the training data and for testing. The results of the test show an average accuracy rate of 82%, and indicate the need for a better pre-processing module that removes more noise from the images and recognises different character sizes before the images are run through the OCR process.
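Glyph classification of the kind Gamera performs is essentially nearest-neighbour classification over feature vectors extracted from character images. The toy sketch below shows that general idea only; it deliberately avoids the Gamera API, and the feature values and labels are invented:

```python
# Toy sketch of kNN glyph classification, the general technique underlying
# tools like Gamera; this is NOT the Gamera API. Each glyph is reduced to a
# feature vector (here: two invented features), and a new glyph receives the
# majority label of its nearest training neighbours.
import math
from collections import Counter

training = [
    ((0.82, 0.10), "e"),   # (feature1, feature2) -> character label
    ((0.80, 0.12), "e"),
    ((0.30, 0.75), "f"),
    ((0.28, 0.70), "f"),
]

def classify(vec, k=3):
    # Sort training glyphs by Euclidean distance and vote among the top k.
    dists = sorted(training, key=lambda item: math.dist(vec, item[0]))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(classify((0.79, 0.11)))   # -> 'e'
```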
7

Classification into Readability Levels : Implementation and Evaluation

Larsson, Patrik January 2006
A readability classification model is mainly used as an integrated part of an information retrieval system. By matching the user's demands on readability to documents with the corresponding readability, the classification model can further improve the results of, for example, a search engine. This thesis presents a new solution for classification into readability levels for Swedish. The results of the thesis are a number of classification models. The models were induced by training a Support Vector Machines classifier on features established by previous research as good measurements of readability. The features were extracted from a corpus annotated with three readability levels. Natural language processing tools for tagging and parsing were used to analyze the corpus and enable the extraction of the features. Empirical tests of different feature combinations were performed to optimize the classification model. The classification models yield good and stable classification. The best model obtained a precision of 90.21% and a recall of 89.56% on the test set, which corresponds to an F-score of 89.88. The thesis also presents suggestions for further development and potential areas of application.
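As a quick sanity check of the reported figures: the F-score is the harmonic mean of precision and recall, so the three reported values are mutually consistent.

```python
# The F-score is the harmonic mean of precision and recall; this snippet
# simply reproduces that arithmetic for the figures reported above.
precision, recall = 90.21, 89.56
f_score = 2 * precision * recall / (precision + recall)
print(round(f_score, 2))   # -> 89.88
```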
8

Utveckling av ett svensk-engelskt lexikon inom tåg- och transportdomänen

Axelsson, Hans, Blom, Oskar January 2006
This paper describes the process of building a machine translation lexicon for use in the train and transport domain with the machine translation system MATS. The lexicon consists of a Swedish part, an English part and links between them, and is derived from a Trados translation memory which is split into a training part (90%) and a testing part (10%). The task is carried out mainly by using existing word linking software and by recycling previous machine translation lexicons from other domains. To this end, a method is developed where the focus lies on automation by means of both existing and self-developed software, in combination with manual interaction. The domain-specific lexicon is then extended with a domain-neutral core lexicon and a less domain-neutral general lexicon. The different lexicons are evaluated automatically and manually through machine translation on the test corpus. The automatic evaluation of the largest lexicon yielded a NEVA score of 0.255 and a BLEU score of 0.190. In the manual evaluation, 34% of the segments were correctly translated, 37% were not correct but perfectly understandable, and 29% were difficult to understand.
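The BLEU figure reported above is a standard corpus-level measure; an evaluation over tokenized test segments can be run along the following lines with NLTK. This is not the setup used in the paper (and NEVA has no comparably common off-the-shelf implementation); the example segments are invented:

```python
# Sketch of a corpus-level BLEU evaluation over tokenized segments using
# NLTK; not the evaluation setup used in the paper. The segments are
# invented stand-ins for translation-memory test data.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [                     # one list of reference translations per segment
    [["release", "the", "brake", "valve"]],
    [["check", "the", "coupling"]],
]
hypotheses = [                     # machine-translated output, tokenized
    ["release", "the", "brake", "valve"],
    ["control", "the", "coupling"],
]

# Smoothing avoids zero scores when short segments miss higher n-grams.
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(round(score, 3))
```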
9

Controlled Languages in Software User Documentation

Steensland, Henrik, Dervisevic, Dina January 2005
In order to facilitate comprehensibility and translation, the language used in software user documentation must be standardized. If the terminology and language rules are standardized and consistent, the time and cost of translation will be reduced. For this reason, controlled languages have been developed: subsets of other languages, purposely limited by restricting the terminology and grammar that is allowed.

The purpose of this thesis is to investigate how using a controlled language can improve the comprehensibility and translatability of software user documentation written in English. In order to reach our goal, we performed a case study at IFS AB. We specify a number of research questions that help satisfy some of the goals of IFS and, when generalized, fulfill the goal of this thesis.

A major result of our case study is a list of sixteen controlled-language rules. Examples of these rules include a limit on the maximum number of words allowed in a sentence, and control of when the author is allowed to use past participles. We based our controlled-language rules on existing controlled languages, style guides, research reports, and the opinions of technical writers at IFS.

When we applied these rules to different user documentation texts at IFS, we managed to increase the readability score of each of the texts. Also, in an assessment test of readability and translatability, the rewritten versions were chosen in 85% of the cases by experienced technical writers at IFS.

Another result of our case study is a prototype application that shows that it is possible to develop and use a software checker to help authors write documentation according to our suggested controlled-language rules.
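A checker of the kind described can start out very small. The sketch below enforces two of the rule types mentioned, a sentence-length limit and a flag for likely past participles; it is not the IFS prototype, and the 25-word limit and the "-ed" suffix heuristic are assumptions (a real checker would use a POS tagger rather than a suffix test):

```python
# Minimal sketch of a controlled-language checker in the spirit of the
# prototype described above (not the IFS tool). The word limit and the
# suffix heuristic for past participles are illustrative assumptions.
import re

MAX_WORDS = 25   # assumed sentence-length limit

def check_sentence(sentence):
    issues = []
    words = sentence.split()
    if len(words) > MAX_WORDS:
        issues.append(f"sentence has {len(words)} words (max {MAX_WORDS})")
    for w in words:
        # Naive "-ed" test; a real checker would consult a POS tagger.
        if re.fullmatch(r"[A-Za-z]+ed", w) and w.lower() not in {"need", "speed"}:
            issues.append(f"possible past participle: {w!r}")
    return issues

print(check_sentence("The file is opened by the system."))
# -> ["possible past participle: 'opened'"]
```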
