Global ETD Search

91	Surface Realisation from Knowledge Bases / Bases de connaissances et réalisation de surface Gyawali, Bikash 20 January 2016 (has links) Natural Language Generation aims to produce texts in a human language from a set of non-linguistic data. It generally comprises three main subtasks: (i) selecting and organising a subset of the input data; (ii) determining the words used to verbalise the input data; and (iii) assembling these words into a natural language text. The last subtask is known as Surface Realisation (SR). In my thesis, I study the SR task when the input data are extracted from Knowledge Bases (KB). I present two new approaches to surface realisation from knowledge bases: a supervised approach and a weakly supervised approach. In the supervised approach, I present a corpus-based method for inducing a grammar from a parallel corpus of texts and data. I show that the induced grammar is compact and general enough to handle the test data. In the weakly supervised approach, I explore a method for surface realisation from KB data that does not require a parallel corpus. Instead, I build a corpus of domain-related texts and use it to identify possible lexicalisations of KB symbols and their verbalisation patterns. I evaluate the generated sentences and analyse the issues involved in learning from non-aligned corpora. In each of these approaches, the proposed methods are generic and can easily be adapted for input from other ontologies / Natural Language Generation is the task of automatically producing natural language text to describe information present in non-linguistic data.
It involves three main subtasks: (i) selecting the relevant portion of the input data; (ii) determining the words that will be used to verbalise the selected data; and (iii) mapping these words into natural language text. The last of these is known as Surface Realisation (SR). In my thesis, I study the SR task in the context of input data coming from Knowledge Bases (KB). I present two novel approaches to surface realisation from knowledge bases: a supervised approach and a weakly supervised approach. In the first, supervised approach, I present a corpus-based method for inducing a Feature Based Lexicalized Tree Adjoining Grammar from a parallel corpus of text and data. I show that the induced grammar is compact and generalises well over the test data, yielding results that are close to those produced by a handcrafted symbolic approach and that outperform an alternative statistical approach. In the weakly supervised approach, I explore a method for surface realisation from KB data that does not require a parallel corpus. Instead, I build a corpus from heterogeneous sources of domain-related text and use it to identify possible lexicalisations of KB symbols and their verbalisation patterns. I evaluate the output sentences and analyse the issues relevant to learning from non-parallel corpora. In both approaches, the proposed methods are generic and can easily be adapted for input from other ontologies for which a parallel or non-parallel corpus exists. Réalisation de Surface Bases de Connaissances Apprentissage de Grammaire Interface Syntaxe/Sémantique Approches guidées par les Corpus Surface Realisation Knowledge Bases Grammar based Surface Realisation Grammar Learning Syntax/Semantics linking Corpus based approaches 006.35
92	Patterns of growing standardisation and interference in interpreted German discourse Dose, Stephanie 30 November 2010 (has links) This study compares simultaneously interpreted German speech with non-interpreted German discourse in order to determine whether interpreted language is characterised by either of the laws that have been found to feature in translated text, i.e. the law of growing standardisation and the law of interference. It is hypothesised that interpreters typically exaggerate German communicative norms, thereby producing manifestations of growing standardisation. To test this hypothesis, comparative and parallel analyses are carried out using corpora of interpreted and non-interpreted discourse. During the comparative phase, two types of interpreted German speech are each compared with non-interpreted language and with each other to determine how interpreted speech differs from non-interpreted discourse. During the parallel analysis, the interpreted German segments are compared with their source-language counterparts to determine the reasons for the patterns discovered during the first phase. The results indicate that interpreters do not produce patterns similar to those that characterise translated text: neither the law of growing standardisation nor the law of interference is manifest in the data. Instead, a different feature, namely an increased degree of generalisation, is found in the interpreters’ output. This feature appears to result from strategies that enable interpreters to cope with the time, memory and linearity constraints inherent in simultaneous interpreting (SI). It can therefore be confirmed that interpreted German differs from non-interpreted German discourse in certain respects. / Linguistics and Modern Languages / M.A. (Linguistics) Simultaneous interpreting Growing standardisation Communicative norms Corpus-based interpreting studies Three-phase comparative analysis Interference Normalisation 430.141 German language -- Spoken German German language -- Discourse analysis German language -- Translating Interference (Linguistics)
93	'Eu quero cesárea!' ou 'Just cut it out!': análise crítica do discurso de relatos de parto normal após cesárea de mulheres brasileiras e estadunidenses à luz da linguística de corpus / 'Eu quero cesárea!' or 'Just cut it out!': A Corpus-based Critical Discourse Analysis of vaginal-birth-after-c-section stories by Brazilian and American women Luciana Carvalho Fonseca 27 November 2014 (has links) In Brazil, the overwhelming majority of first-time mothers want a vaginal birth as soon as they become pregnant; however, in more than half of cases the births are surgical. This mismatch between what is desired and what is actually achieved is not exclusive to Brazilian women but occurs in several Western countries. Through Critical Discourse Analysis (CDA) of stories of vaginal birth after caesarean (VBAC stories) informed by Corpus Linguistics (CL), we seek to elucidate the social problem of the mismatch between the type of experience desired and the experience obtained. The discourse of VBAC stories seems to us the ideal discourse for revealing the elements of this mismatch, since these stories address both the experience of the previous caesarean, which was unwanted and, as a rule, poorly indicated, and that of the desired and achieved birth. The theoretical-methodological framework adopted brings together CDA (Fairclough, 1989, 1992; Chouliaraki & Fairclough, 1999; Fairclough, 2003), CL (Stubbs, 1993; McEnery & Wilson, 1997, 2003; Tognini-Bonelli, 2001) and Corpus-based Critical Discourse Analysis (Baker et al., 2008; Baker, 2013; Baker & McEnery, 2005; Flowerdew, 2014). For the study, an electronic corpus in English and Portuguese was compiled. The corpus consists of texts written by women who experienced VBAC and includes no mediated texts (interviews and stories written by third parties were excluded).
The BRABA Corpus (electronic corpus of birth stories by Brazilian, American, British and Australian women) is divided into four subcorpora: Corpus BRA (93 stories, 250,807 words), Corpus EUA (101 stories, 225,736 words), Corpus UK (97 stories, 92,197 words) and Corpus AU (92 stories, 200,639 words). The first two subcorpora, Corpus BRA and Corpus EUA, were selected for this research, which investigates how identities and the birth experience are represented in the stories of Brazilian and American women and, through this investigation, seeks elements that elucidate the social problem. Electronic processing used AntConc 3.4.0w (Anthony, 2012) and CL tools (frequency lists, keyword lists, concordance lines, lexical patterns, etc.). The analysis was guided by the keywords corresponding to the participants involved and by the most statistically relevant collocates of these keywords. In Corpus BRA, the following were analysed: eu (collocates: desisto, renasci, mamava); bebê (encaixado, morrer/morresse, sexo, batimentos, alto); marido (companheiro, apoiou, cortou); doula (amada, obstetriz, querida, presença); médico (fofa/fofinha, mudar/mudei, cesarista, ginecologista, humanizada); anestesista; enfermeira (obstétrica/obstetra, cadê, soro, chamar); parteira (liguei/ligar, doula, casa); obstetriz (doula, toque). In Corpus EUA: I (wish, protested, lamented); baby (pound, girl, boy); midwife (certified, asst/assistant, student, assist); doula (hired, friend, called); nurse (practitioner, tells, triage); doctor (office, seen, comes); anesthesiologist; husband (poor, run, children). The analysis made it possible to elucidate the social problem in both societies and revealed discursive and cultural differences. The mismatch between the desired and the achieved experience is represented as having been caused by a succession of distinct events.
However, in both corpora, experiences are represented, and self-identity and other identities are discursively constructed, under the aegis of the features of modernity, notably ideologically exercised reflexivity. Yet this reflexivity operates not only as a way of sustaining relations of domination but, above all, as a way of transforming them. / In Brazil the vast majority of primiparous women, on discovering that they are pregnant, hope to have normal deliveries. However, in over half of such cases surgical deliveries ensue. This mismatch between what pregnant women desire and what they actually experience is not exclusive to Brazil but occurs in several Western countries. Through Corpus Linguistics (CL)-based Critical Discourse Analysis (CDA) of vaginal birth after c-section (VBAC) stories, we seek to shed light on the social problem of a mismatch between the desired experience and the actual experience. VBAC stories seemed to us the ideal discourse for revealing elements of this mismatch, since they address both the experience of an unwanted (and usually wrongly indicated) prior C-section and that of the desired, and achieved, delivery. The theoretical-methodological approach we have adopted brings together CDA (Fairclough, 1989, 1992; Chouliaraki & Fairclough, 1999; Fairclough, 2003), CL (Stubbs, 1993; McEnery & Wilson, 1997, 2003; Tognini-Bonelli, 2001) and Corpus-based Critical Discourse Analysis (Baker et al., 2008; Baker, 2013; Baker & McEnery, 2005; Flowerdew, 2014). An electronic corpus was compiled in English and Portuguese for this study. The corpus is made up of texts written by women who have experienced VBAC and includes no mediated texts (i.e. interviews and third-party reports).
The BRABA Corpus (Corpus of the birth stories of Brazilian, American, British and Australian women) comprises four subcorpora: Corpus BRA (93 stories, 250,807 words), Corpus USA (101 stories, 225,736 words), Corpus UK (97 stories, 92,197 words) and Corpus AU (92 stories, 200,639 words). The first two of these subcorpora, Corpus BRA and Corpus USA, were chosen for this study, which investigates how identities and birth experiences are represented in the accounts of Brazilian and American women and, through this investigation, seeks to uncover elements that shed light on the selected social problem. The computer processing used AntConc 3.4.0w (Anthony, 2012) and CL tools (frequency lists, keyword lists, concordance lines, etc.). Analysis was guided by keywords corresponding to the people mentioned in the stories and by the most statistically significant collocates of these keywords. From Corpus BRA the words were: eu (collocates: desisto, renasci, mamava); bebê (encaixado, morrer/morresse, sexo, batimentos, alto); marido (companheiro, apoiou, cortou); doula (amada, obstetriz, querida, presença); médico (fofa/fofinha, mudar/mudei, cesarista, ginecologista, humanizada); anestesista; enfermeira (obstétrica/obstetra, cadê, soro, chamar); parteira (liguei/ligar, doula, casa); obstetriz (doula, toque). From Corpus USA: I (wish, protested, lamented); baby (pound, girl, boy); midwife (certified, asst/assistant, student, assist); doula (hired, friend, called); nurse (practitioner, tells, triage); doctor (office, seen, comes); anesthesiologist; husband (poor, run, children). Analysis enabled this social problem to be laid bare in both societies, revealing discourse and cultural similarities and differences. The mismatch between the desired and the experienced outcomes is represented as having been caused by a succession of discrete events.
In both corpora, experiences are represented, and self-identity and other identities are constructed in discourse, under the aegis of features of modernity, above all reflexivity, which in the discourse of VBAC stories takes place through empowerment, understood as self-actualization through newly gathered knowledge and ensuing courses of action or measures (Giddens, 2002). Relatos de parto normal após cesárea Identidade Linguística de corpus Modernidade Corpus linguistics Corpus-based critical discourse analysis Identity Modernity
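Among the CL tools listed above, concordance lines are usually displayed as Key Word in Context (KWIC) lines: each occurrence of a search word is shown centred, with a few words of context on either side. The following Python sketch shows how such lines can be produced from raw text. It is not AntConc's implementation; the `kwic` function, its regex tokeniser and the `width` parameter are assumptions made for illustration.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return Key Word in Context lines: `width` tokens of left and right
    context around each case-insensitive occurrence of `keyword`."""
    # Assumed tokeniser: runs of word characters, allowing internal hyphens
    # and apostrophes. \w is Unicode-aware, so "cesárea" stays one token.
    tokens = re.findall(r"\w+(?:[-']\w+)*", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the node words line up.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


story = "My doula was amazing. I hired a doula because my first birth was hard."
for line in kwic(story, "doula", width=2):
    print(line)
```

Sorting such lines by the first word to the left or right of the node is what makes recurring collocates such as "hired" easy to spot by eye.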
94	A comparative analysis of stylistic devices in Shakespeare’s plays, Julius Caesar and Macbeth and their Xitsonga translations Baloyi, Mafemani Joseph 06 1900 (has links) The study adopts the theoretical framework of Descriptive Translation Studies to undertake a comparative analysis of stylistic devices in two of Shakespeare’s plays, Julius Caesar and Macbeth, and their Xitsonga translations. It contextualises its research aim and objectives after outlining a sequential account of theory development in the discipline of translation, and arrives at suitable tools for data collection and analysis. Through textual observation and reading notes, the study argues that researchers and scholars in the discipline agree on the pressing need for translation strategies but diverge in how they classify and apply them in translating and translation. This study maintains that translation strategies should be grouped into explicitation, normalisation and simplification, each of which is assigned specific translation procedures. The study demonstrates that explicitation and normalisation strategies are best suited to dealing with translation constraints at the microtextual level. The sampled excerpts from both plays were examined using an analytical framework based on subjective sameness within Skopos theory. The study acknowledges that there is no single way of translating a play from one culture to another. It also acknowledges that the translator appears unable to escape the influence of the source text, whose inherent cultural features make it unique.
With no sure way of managing stylistic devices as translation constraints, translation as a problem-solving process requires creativity, mastery of the language and style of the source-text author, and a power drive characterised by interlingual psychological balance of power and by knowledge power. These aspects help the translator to manage any translation brief better and to arrive at a product that is accessible, accurate and acceptable to the target readership. They also ensure that the translator maintains a balance between the two languages in contact, guarding against the domination of one language over the other. The study concludes that Skopos theory has greater influence in anticipating the context of the target readership, a factor that can introduce high risk when assessing the communicability conditions of the translated message. Conversely, when translators deal with stylistic devices by employing literal translation as a procedure of simplification, they aim only at simplifying the language and making it accessible for the sake of ‘accessibility’, and the result remains a product with communicative inadequacies. The study also concludes that translation is not merely transcoding but an activity that calls for the translator’s creativity in identifying and analysing the constraints encountered and deciding on the corresponding translation strategies. / African Languages / D. Litt. et Phil. (African Languages) Translation Stylistic devices Comparative analysis Equivalence-based translation studies Corpus-based translation studies Descriptive Translation Studies Explicitation Normalisation and Simplification 822.33 Shakespeare, William, 1564-1616. Macbeth Style, Literary Translating and interpreting
95	Desarrollo y evaluación de diferentes metodologías para la gestión automática del diálogo [Development and evaluation of different methodologies for automatic dialogue management] Griol Barres, David 07 May 2008 (has links) The main objective of this thesis is the study and development of different methodologies for dialogue management in spoken dialogue systems. The main challenge addressed in the thesis is the development of purely statistical methodologies for dialogue management, based on learning a model from a corpus of labelled dialogues. In this field, different approaches are presented for carrying out dialogue management, improving the statistical model and evaluating the dialogue system. For the practical implementation of these methodologies within a specific task, a corpus of dialogues had to be acquired and labelled. Having a large corpus of dialogues facilitated the training and evaluation of the dialogue management model developed. In addition, a complete dialogue system was implemented, which makes it possible to evaluate how the management methodologies perform in practice under real conditions of use. Several approaches are proposed for evaluating the dialogue management techniques: evaluation with real users; evaluation with the acquired corpus, for which training and test partitions were defined; and the use of user simulation techniques. The user simulator developed makes it possible to model the complete dialogue process statistically.
In the approach presented, both the selection of the system's response and the generation of the user's turn are modelled as a classification problem: the input encodes a set of variables representing the current state of the dialogue, and the classification yields the probability of selecting each of the responses (sequences of dialogue acts) defined for the user and the system, respectively. / Griol Barres, D. (2007). Desarrollo y evaluación de diferentes metodologías para la gestión automática del diálogo [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1956 Sistemas de diálogo Gestión de diálogo Métodos estadísticos Metodologías basadas en corpus Aprendizaje automático Simulación de usuarios Técnicas de evaluación Spoken dialog systems Dialog management Statistical modelling Corpus-based methodologies User simulation LENGUAJES Y SISTEMAS INFORMATICOS 120317 - Informática 1203 - Ciencia de los ordenadores
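The classification formulation described in this abstract can be illustrated with a deliberately small Python sketch. It assumes a hypothetical slot-filling task (`SLOTS`), encodes the dialogue state as a 0/1 vector, and estimates the probability of each system response by relative frequency over labelled (state, response) pairs. This counting estimator stands in for the classifier actually used in the thesis, which the abstract does not name, and each response label stands in for a whole sequence of dialogue acts.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical task slots (e.g. a train-timetable domain); the real state
# variables used in the thesis are not given in the abstract.
SLOTS = ("origin", "destination", "date")


def encode_state(filled: Set[str]) -> Tuple[int, ...]:
    """Encode the current dialogue state as a fixed-length 0/1 vector
    recording which slots the user has supplied so far."""
    return tuple(int(slot in filled) for slot in SLOTS)


class CountingDialogueManager:
    """Estimates P(response | state) by relative frequency over a labelled
    corpus of (state, response) pairs."""

    def __init__(self) -> None:
        self.counts: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)

    def train(self, corpus: List[Tuple[Set[str], str]]) -> None:
        for filled, response in corpus:
            self.counts[encode_state(filled)][response] += 1

    def distribution(self, filled: Set[str]) -> Dict[str, float]:
        seen = self.counts.get(encode_state(filled))
        if not seen:
            # Unseen state: a real system would need smoothing or a fallback.
            return {}
        total = sum(seen.values())
        return {resp: n / total for resp, n in seen.items()}

    def respond(self, filled: Set[str]) -> Optional[str]:
        """Pick the most probable response for the current state."""
        dist = self.distribution(filled)
        return max(dist, key=dist.get) if dist else None


manager = CountingDialogueManager()
manager.train([
    ({"origin"}, "ask_destination"),
    ({"origin"}, "ask_destination"),
    ({"origin"}, "confirm_origin"),
    ({"origin", "destination", "date"}, "give_timetable"),
])
print(manager.distribution({"origin"}))
print(manager.respond({"origin", "destination", "date"}))
```

The same structure could in principle be trained on user turns to act as a crude user simulator, mirroring the abstract's point that both sides of the dialogue are modelled as classification.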
96	The Perception of Lexical Similarities Between L2 English and L3 Swedish Utgof, Darja January 2008 (has links) The present study investigates lexical similarity perceptions by students of Swedish as a foreign language (L3) with a good yet non-native proficiency in English (L2). The general theoretical framework is provided by studies of transfer of learning and its specific instance, transfer in language acquisition. It is generally accepted that all previous linguistic knowledge facilitates the development of proficiency in a new language. However, a frequently reported phenomenon is that students perceive similarities between two systems differently from linguists and theoreticians of education. As a consequence, the full facilitative potential of transfer remains unused. The present research seeks to shed light on similarity perceptions, focusing on the comprehension of written text. To elucidate students’ views, a form involving similarity judgements and multiple-choice questions on formally similar items was designed, drawing on real language use as attested in corpora. 123 forms were distributed to six groups of international students, four of them studying Swedish at Level I and two at Level II. The test items vary in their degree of formal, semantic and functional similarity: from very close cognates, to similar words belonging to different word classes, to items exhibiting category membership and/or standing in a subordinate/superordinate relation to each other, to deceptive cognates. The author proposes expected similarity ratings and compares them with the results obtained. The objective measure of formal similarity is provided by a string-matching algorithm, Levenshtein distance. The similarity judgements indicate that intermediate similarity values can be considered problematic: ratings for somewhat similar items are usually lower than might be expected.
Moreover, a difference in grammatical meaning lowers similarity values significantly even when lexical meaning nearly coincides. Thus, the results indicate that, in order to use similarities to facilitate language learning, more attention should be paid to underlying similarities. master master's programme language culture general linguistics linguistics foreign language acquisition similarity formal similarity semantic similarity functional similarity transfer transfer of learning language acquisition form form-based research crosslinguistic influence one-year master english swedish lexical similarities false friends cognates deceptive cognates origin words of the same origin proto-germanic proto-indo-european competence performance comprehension corpus corpus-based data corpus-based research levenshtein distance objective similarity perceived similarity frequency prototype prototypicality proficiency context informed guess simchecker word recognition tachistoscopic experiments semantic correspondence similarity judgements eurocom surface transfer deep transfer lateral transfer English language Engelska språket Linguistic subjects Lingvistikämnen
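Levenshtein distance, the formal-similarity measure named in this abstract, is the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. A standard dynamic-programming sketch in Python follows. The conversion to a similarity score between 0 and 1 by dividing by the longer word's length is an assumption for illustration; the abstract does not say how, or whether, the thesis normalised the distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalised_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 for identical strings, 0 for maximally different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("winter", "vinter"))  # → 1
print(normalised_similarity("house", "hus"))
```

Only two rows of the dynamic-programming table are kept at a time, which is enough because each cell depends only on the current and previous rows.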
97	The Perception of Lexical Similarities Between L2 English and L3 Swedish Utgof, Darja January 2008 (has links) Same abstract and keywords as no. 96. Specific Languages Studier av enskilda språk General Language Studies and Linguistics
98	Lexical cohesion and register variation in translation: "The Merchant of Venice" in Afrikaans Kruger, Alet 03 1900 (has links) On the assumption that different registers of translated drama have different functions and therefore present information differently, the present study aims to identify textual features that distinguish an Afrikaans stage translation from a page translation of Shakespeare's The Merchant of Venice. The first issue addressed concerns the nature and extent of lexical cohesion in these two registers. The second concerns my contention that the dialogue of a stage translation is more "involved" (Biber 1988) than that of a page translation. The research was conducted within the overall Descriptive Translation Studies (DTS) paradigm, but the analytical frameworks used to accomplish these aims were derived from text linguistics and register variation studies, making this an interdisciplinary study. Aspects of Hoey's (1991) bonding model, in particular the classification of repetition links, were adapted to quantify lexical cohesion in the translations. Similarly, aspects of Biber's (1988) multi-dimensional approach to register variation were used to quantify linguistic features that signal involvement. The main finding of the study is that drama translation register (page or stage translation) does have a constraining effect on lexical cohesion and involved production. For Act IV of the play, the stage translation showed an overall higher density of lexical cohesion strategies. As for the involved production features analysed, the stage translation displayed more involvement overall than the page translation, to a statistically highly significant extent.
The features analysed here cluster together sufficiently to reveal that, in comparison with an Afrikaans page translation of a Shakespeare play, a recent stage translation displays a definite tendency towards a more oral, more involved and more situated style, no doubt reflecting a general modern trend towards creating more appropriate and accessible texts. / Linguistics / D. Litt. et Phil. (Translation Studies) Drama translation Lexical cohesion Lexical repetition Corpus linguistics Corpus translation studies Stage translation Descriptive translation studies Page translation Involved production Corpus-based translation research Involvement 418.02 Translating and interpreting Drama -- Translating Discourse analysis Register (Linguistics) Context (Linguistics) Cohesion (Linguistics) Computational linguistics
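Hoey's bonding model, adapted in this study, counts repetition links between sentences and treats sentence pairs that share enough links as bonded. The Python sketch below covers only simple lexical repetition of identical word forms (Hoey's model also recognises complex repetition, paraphrase and other link types). The stop list, the function names and the bonding threshold of three links are assumptions for illustration, not the procedure used in the thesis.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical minimal stop list; a real study would need a fuller list
# and lemmatisation to catch inflected repetitions.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "i", "you",
             "my", "your", "it", "in", "that"}


def content_words(sentence: str) -> Set[str]:
    """Lower-cased word forms of a sentence, minus punctuation and stop words."""
    words = {w.strip(".,;:!?'\"").lower() for w in sentence.split()}
    return words - STOPWORDS - {""}


def repetition_links(sentences: List[str]) -> Dict[Tuple[int, int], int]:
    """Count simple lexical repetition links (shared content-word forms)
    for every pair of sentences (i, j) with i < j."""
    words = [content_words(s) for s in sentences]
    return {(i, j): len(words[i] & words[j])
            for i, j in combinations(range(len(sentences)), 2)}


def bonded_pairs(links: Dict[Tuple[int, int], int],
                 threshold: int = 3) -> List[Tuple[int, int]]:
    """Sentence pairs sharing at least `threshold` links (assumed value: 3)."""
    return [pair for pair, n in links.items() if n >= threshold]


def links_per_sentence(links: Dict[Tuple[int, int], int], n_sentences: int) -> float:
    """A crude density figure for comparing cohesion across translations."""
    return sum(links.values()) / n_sentences if n_sentences else 0.0


speech = [
    "The quality of mercy is not strained.",
    "Mercy droppeth as the gentle rain.",
    "Mercy is twice blest: not strained, not forced.",
]
links = repetition_links(speech)
print(links)                # → {(0, 1): 1, (0, 2): 3, (1, 2): 1}
print(bonded_pairs(links))  # → [(0, 2)]
```

Running the same counts over the corresponding act of two translations gives directly comparable link densities, which is the kind of comparison the abstract reports for Act IV.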
99	Measuring Semantic Distance using Distributional Profiles of Concepts Mohammad, Saif 01 August 2008 (has links) Semantic distance is a measure of how close or distant in meaning two units of language are. A large number of important natural language problems, including machine translation and word sense disambiguation, can be viewed as semantic distance problems. The two dominant approaches to estimating semantic distance are the WordNet-based semantic measures and the corpus-based distributional measures. In this thesis, I compare them, both qualitatively and quantitatively, and identify the limitations of each. This thesis argues that semantic distance is essentially a property of concepts (rather than words) and that two concepts are semantically close if they occur in similar contexts. Instead of identifying the co-occurrence (distributional) profiles of words (distributional hypothesis), I argue that distributional profiles of concepts (DPCs) can be used to infer the semantic properties of concepts and indeed to estimate semantic distance more accurately. I propose a new hybrid approach to calculating semantic distance that combines corpus statistics and a published thesaurus (Macquarie Thesaurus). The algorithm estimates the DPCs using the categories in the thesaurus as very coarse concepts and, notably, without requiring any sense-annotated data. Even though using only about 1000 concepts to represent the vocabulary of a language seems drastic, I show that the method outperforms the state of the art in a number of natural language tasks. I show how cross-lingual DPCs can be created by combining text in one language with a thesaurus from another. Using these cross-lingual DPCs, we can solve problems in one, possibly resource-poor, language using a knowledge source from another, possibly resource-rich, language.
I show that the approach is also useful in tasks that inherently involve two or more languages, such as machine translation and multilingual text summarization. The proposed approach is computationally inexpensive, it can estimate both semantic relatedness and semantic similarity, and it can be applied to all parts of speech. Extensive experiments on ranking word pairs by semantic distance, real-word spelling correction, solving Reader's Digest word choice problems, determining word sense dominance, word sense disambiguation, and word translation show that the new approach is markedly superior to previous ones. Computational Linguistics Natural Language Processing Lexical semantics semantic distance distributional similarity semantic similarity semantic relatedness word concept co-occurrence matrix distributional profiles of concepts thesaurus corpus-based techniques word senses cross-lingual techniques word sense dominance word sense disambiguation wordnet 0984 0800
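The core idea of distributional profiles of concepts can be sketched in Python: each thesaurus category receives a profile of the words that co-occur with any of its member words, so no sense-annotated data is needed, and the distance between two words is assumed here to be taken from their closest pair of categories. This is a simplified illustration: the toy `THESAURUS` stands in for the Macquarie Thesaurus, and raw co-occurrence counts compared by cosine stand in for whatever weighting and distance measures the thesis actually uses.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Set


def concept_profiles(sentences: List[str], thesaurus: Dict[str, Set[str]],
                     window: int = 2) -> Dict[str, Counter]:
    """Profile of each thesaurus category: counts of the words occurring
    within `window` tokens of any word listed under that category."""
    profiles = {cat: Counter() for cat in thesaurus}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            for cat, members in thesaurus.items():
                if tok in members:
                    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                    for j in range(lo, hi):
                        if j != i:
                            profiles[cat][tokens[j]] += 1
    return profiles


def cosine(p: Counter, q: Counter) -> float:
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def word_distance(w1: str, w2: str, thesaurus: Dict[str, Set[str]],
                  profiles: Dict[str, Counter]) -> Optional[float]:
    """Distance between two words, taken from their closest pair of concepts."""
    senses1 = [c for c, members in thesaurus.items() if w1 in members]
    senses2 = [c for c, members in thesaurus.items() if w2 in members]
    if not senses1 or not senses2:
        return None  # word not covered by the thesaurus
    return min(1 - cosine(profiles[a], profiles[b])
               for a in senses1 for b in senses2)


# Toy stand-in for thesaurus categories (hypothetical).
THESAURUS = {"MONEY": {"bank", "cash"}, "RIVER": {"bank", "shore"}, "FOOD": {"bread"}}
corpus = ["i deposited cash at the bank", "we walked along the river shore"]
profiles = concept_profiles(corpus, THESAURUS)
print(word_distance("cash", "shore", THESAURUS, profiles))
```

Because "bank" is listed under both MONEY and RIVER, its contexts feed both profiles; with a realistically large corpus, the contexts of the many unambiguous members of each category are what give each profile its character.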
100	Exploring the use of parallel corpora in the compilation of specialised bilingual dictionaries of technical terms: a case study of English and isiXhosa Shoba, Feziwe Martha 07 1900 (has links) Text in English / Abstracts in English, isiXhosa and Afrikaans / The Constitution of the Republic of South Africa, Act 108 of 1996, mandates the state to take practical and positive measures to elevate the status and the use of indigenous languages. The implementation of this provision resulted in a growing demand for specialised translations in fields such as technology, science, commerce, law and finance. The lack of terminology and of resources such as specialised bilingual dictionaries in indigenous languages, particularly isiXhosa, remains a growing concern that hinders the translation and the intellectualisation of isiXhosa. A growing number of African scholars affirm the importance of specialised dictionaries in the African languages as tools for language and terminology development, so that African languages can be used in the areas of science and technology. Against this background, this study explored how parallel corpora can be interrogated using a bilingual concordancer, ParaConc, to extract bilingual terminology that can be used to create specialised bilingual dictionaries. A corpus-based approach was selected for its speed, efficiency and accuracy in extracting bilingual terms in their immediate contexts. To enhance the research outcomes, Descriptive Translation Studies (DTS) and Corpus-based Translation Studies (CTS) were used in a complementary manner. Because the study is interdisciplinary, the function theories of lexicography, which emphasise the functions and needs of users, were also applied.
The analysis and extraction of bilingual terminology for dictionary making succeeded through the use of ParaConc features including frequencies, hot word lists, hot words, the search facility and concordances (Key Word in Context). The findings revealed that the English-isiXhosa Parallel Corpus is a repository of translation equivalents and other information categories that can make specialised dictionaries more user-friendly and multifunctional. Frequency lists proved an effective method of selecting headwords for inclusion in a dictionary. The results also unravelled the complex functions of bilingual concordances, in which information on collocations and multiword units, sense distinctions and usage examples could easily be identified, indicating that this approach is more efficient than the traditional method. The study contributes to knowledge on corpus-based lexicography, the standardisation of finance terminology, resource development and the making of user-friendly dictionaries tailored to the different needs of users. / The Constitution of South Africa has empowered the Government to take visible steps to develop and improve the indigenous languages. This provision has led to an increase in the translation of English documents on technology, science, law, finance and the economy into previously marginalised languages such as isiXhosa. The scarcity of terminology and dictionaries, especially bilingual dictionaries containing specialised terminology, has been a major problem for translation. Many experts agree that such dictionaries are essential because they play a major role in developing the indigenous languages, in creating terminology, and in enabling the use of these languages in science and technology. 
This study therefore investigates the use of a corpus of English documents and their isiXhosa translations as a repository from which financial terminology can be extracted to help compile a bilingual dictionary. This computer-based research method was chosen because it is fast, the information drawn from the corpus is accurate, and the terminology in the corpus matches the subject matter of the documents directly, which makes it easy to find meanings and authentic examples. To enrich the research, the corpus method was supported by other selected approaches: Descriptive Translation Studies (DTS) and approaches to translation oriented towards the function of the translations and the type of their users. Lexicographic research approaches aimed at compiling usable dictionaries that serve most dictionary users, particularly in a multilingual nation, were also considered. The corpus was analysed and its terminology extracted using ParaConc, a computer tool designed for corpora in two or more languages. The findings clearly show that a corpus of translations is a repository of many kinds of words and information that can improve present-day dictionaries. This is because translators used a variety of strategies to produce translations, guided by translation principles and norms oriented towards the users of the translated documents. ParaConc's ability to sort words by frequency of occurrence and to provide statistical details revealed an appropriate way of selecting headwords for inclusion in a dictionary. The findings also revealed the wealth of information available in the KWIC display, information that is difficult to obtain with the traditional method of dictionary compilation. 
This study, which brings together corpus-based translation and the compilation of specialised dictionaries, will make an immense contribution to dictionary compilation in the indigenous languages in general and in isiXhosa in particular, easing the burden on lexicographers. Building and compiling bilingual financial dictionaries will expand the scarce terminology resources and produce dictionaries that are useful to most people. / The Constitution of the Republic of South Africa, Act 108 of 1996, mandates the state to take practical and positive measures to elevate the status and use of indigenous languages. The implementation of this provision has led to a growing demand for specialised translations in domains such as technology, science, commerce, law and finance. The lack of terminology and resources such as specialised dictionaries in indigenous languages, especially isiXhosa, is a growing concern that hampers the translation and intellectualisation of isiXhosa. A growing number of scholars in Africa emphasise the importance of specialised dictionaries in the African languages as instruments for language and terminology development, so that African languages can be used in science and technology. Against this background, the study investigated how parallel corpora can be searched with a bilingual concordancer (ParaConc) to extract bilingual terminology for use in developing specialised bilingual dictionaries. A corpus-based approach was chosen for the speed, efficiency and accuracy with which it can extract bilingual terms from their immediate contexts. Descriptive Translation Studies (DTS) and Corpus-based Translation Studies (CTS) were used in a complementary way to improve the research outcomes. 
As the study is interdisciplinary, the function theories of lexicography, which emphasise the functions and needs of users, were also applied. The analysis and extraction of bilingual terminology for dictionary development succeeded through the use of ParaConc features including frequencies, hot-word lists, hot words, the search function and concordances (Key Word in Context). The findings show that an English-isiXhosa parallel corpus is a source of translation equivalents and other information categories that can make specialised dictionaries more user-friendly and multifunctional. The frequency lists were identified as an effective method of selecting headwords for inclusion in a dictionary. The findings also unravelled the complex functions of bilingual concordancers, in which information on collocations and multiword units, sense distinctions and usage examples could easily be identified, indicating that this method is more efficient than the traditional method. The study contributes to the field of corpus-based lexicography, the standardisation of financial terminology, resource development and the development of user-friendly dictionaries tailored to the different needs of users. / Linguistics and Modern Languages / D. Litt. et Phil. (Linguistics (Translation Studies)) Parallel corpora Frequency Concordances Specialised bilingual dictionaries Corpus-based translation studies Descriptive translation studies Function theory of lexicography 423.963985 Technology -- Terminology
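The Shoba abstract credits frequency lists with selecting headwords and Key Word in Context (KWIC) concordances with exposing collocations and usage examples. The Python sketch below illustrates both ideas on a hypothetical three-segment, sentence-aligned English-isiXhosa corpus. It is not ParaConc and does not reproduce its features or interface; the corpus, stopword list and function names are assumptions for illustration, and the isiXhosa side is left as placeholders rather than invented translations.

```python
import re
from collections import Counter

# Hypothetical sentence-aligned corpus: (English segment, aligned isiXhosa segment).
# The isiXhosa side is placeholder text, not real translations.
CORPUS = [
    ("The bank charges interest on the loan.", "<isiXhosa segment 1>"),
    ("Interest rates affect the repayment of the loan.", "<isiXhosa segment 2>"),
    ("The budget shows income and expenditure.", "<isiXhosa segment 3>"),
]

# Hypothetical minimal stopword list; function words are not headword candidates.
STOPWORDS = {"the", "of", "on", "and"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_list(corpus, top_n: int = 5):
    """Rank candidate headwords by raw frequency on the English side."""
    counts = Counter(
        tok for src, _ in corpus for tok in tokenize(src) if tok not in STOPWORDS
    )
    return counts.most_common(top_n)


def kwic(corpus, keyword: str, width: int = 3):
    """KWIC rows (left context, keyword, right context, aligned segment)."""
    rows = []
    for src, tgt in corpus:
        toks = tokenize(src)
        for i, tok in enumerate(toks):
            if tok == keyword:
                left = " ".join(toks[max(0, i - width):i])
                right = " ".join(toks[i + 1:i + 1 + width])
                rows.append((left, tok, right, tgt))
    return rows


if __name__ == "__main__":
    print("Candidate headwords:", frequency_list(CORPUS))
    for left, kw, right, tgt in kwic(CORPUS, "interest"):
        print(f"{left:>20} [{kw}] {right:<20} | {tgt}")
```

Running the script ranks "interest" and "loan" highest and prints two KWIC lines for "interest", each paired with its aligned placeholder segment. In a real study, the frequency ranking would feed headword selection and the concordance lines would be read for collocations, sense distinctions and usage examples.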
