Global ETD Search

1	Discourse in Statistical Machine Translation
Hardmeier, Christian
January 2014
This thesis addresses the technical and linguistic aspects of discourse-level processing in phrase-based statistical machine translation (SMT). Connected texts can have complex text-level linguistic dependencies across sentences that must be preserved in translation. However, the models and algorithms of SMT are pervaded by locality assumptions. In a standard SMT setup, no model has more complex dependencies than an n-gram model. The popular stack decoding algorithm exploits this fact to implement efficient search with a dynamic programming technique. This is a serious technical obstacle to discourse-level modelling in SMT.
From a technical viewpoint, the main contribution of our work is the development of a document-level decoder based on stochastic local search that translates a complete document as a single unit. The decoder starts with an initial translation of the document, created randomly or by running a stack decoder, and refines it with a sequence of elementary operations. After each step, the current translation is scored by a set of feature models with access to the full document context and its translation. We demonstrate the viability of this decoding approach for different document-level models.
From a linguistic viewpoint, we focus on the problem of translating pronominal anaphora. After investigating the properties and challenges of the pronoun translation task both theoretically and by studying corpus data, we present a neural network model for cross-lingual pronoun prediction. This network jointly performs anaphora resolution and pronoun prediction and is trained on bilingual corpus data only, with no need for manual coreference annotations. The network is then integrated as a feature model in the document-level SMT decoder and tested in an English–French SMT system. We show that the pronoun prediction network model more adequately represents discourse-level dependencies for less frequent pronouns than a simpler maximum entropy baseline with separate coreference resolution.
By creating a framework for experimenting with discourse-level features in SMT, this work contributes to a long-term perspective that strives for more thorough modelling of complex linguistic phenomena in translation. Our results on pronoun translation shed new light on a challenging but essential problem in machine translation that is as yet unsolved.
Keywords: Statistical machine translation; Discourse-level machine translation; Document decoding; Local search; Pronominal anaphora; Pronoun translation; Neural networks
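
The decoding loop described in the abstract above (start from an initial document translation, apply elementary operations, rescore with document-level features) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names `score_document` and `local_search`, the toy scoring feature that rewards word reuse across sentences, the single "replace one sentence" operation, and the hill-climbing acceptance rule. The abstract does not say how the actual decoder decides whether to keep a change.

```python
import random
from collections import Counter


def score_document(doc):
    """Toy document-level feature (assumed): reward words reused across sentences."""
    counts = Counter(word for sentence in doc for word in set(sentence.split()))
    return sum(c - 1 for c in counts.values())


def local_search(candidates, steps=200, seed=0):
    """Hill-climb over whole-document translations.

    candidates[i] is a list of alternative translations for sentence i.
    """
    rng = random.Random(seed)
    # Initial state: a randomly chosen translation for every sentence.
    state = [rng.choice(options) for options in candidates]
    best = score_document(state)
    for _ in range(steps):
        # Elementary operation (assumed): swap the translation of one sentence.
        i = rng.randrange(len(candidates))
        proposal = list(state)
        proposal[i] = rng.choice(candidates[i])
        new_score = score_document(proposal)
        # Keep the change only if the document-level score does not get worse.
        if new_score >= best:
            state, best = proposal, new_score
    return state, best


# Hypothetical data: two sentences, each with two candidate translations.
candidates = [
    ["the bank approved the loan", "the shore approved the loan"],
    ["the bank is closed today", "the riverside is closed today"],
]
translation, score = local_search(candidates)
print(translation, score)
```

Because the score is computed over the whole document, a choice made in one sentence can be rewarded for agreeing with another sentence, which is the kind of cross-sentence dependency a sentence-by-sentence stack decoder cannot score directly.
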
2	Linguistic sexism : A study of sexist language in a British online newspaper
Demberg, Rebecca
January 2014
The aim of this study is to investigate the occurrence of sexist language use in the British online newspaper The Daily Mail. The material consists of 162 articles that were analysed using feminist stylistics. The scope of the study was limited to selected features from feminist stylistics at word level and discourse level. The features of linguistic sexism analysed were the use of gendered generic words, the naming of females and males, and how female and male characters are described. The gender of the journalists was also analysed to examine whether it affected the language use in terms of sexism.
The results show that linguistic sexism is expressed to some extent at both word level and discourse level. At word level, linguistic sexism is expressed in the generic use of some masculine words, in differences in how first names and surnames are used to refer to women and men, and in the use of titles. At the level of discourse, linguistic sexism is expressed in differences in how women and men are referred to in terms of their relationship to others and in terms of appearance. The gender of the journalist had no significant effect on the language use in terms of sexism.
Considering the limited material of the study, the results might not be suitable for generalisations. The results are nonetheless interesting, and it can be concluded that the toolkit of feminist stylistics is relevant to this day and that linguistic sexism exists to some extent in the online version of The Daily Mail.
Keywords: describing characters; discourse level; feminist stylistics; generic words; newspaper language; sexism; The Daily Mail; titles; word level; General Language Studies and Linguistics
