Global ETD Search

11	Indirect Influence of English on Kiswahili: The Case of Multiword Duplicates between Kiswahili and English Ochieng, Dunlop 22 October 2015 Some proverbs, idioms, nominal compounds, and slogans duplicate in form and meaning between several languages. An example between German and English is Liebe auf den ersten Blick and “love at first sight” (Flippo, 2009), whereas an example between Kiswahili and English is uchaguzi ulio huru na haki and “free and fair election.” Duplication of such strings of words between languages as different in descent and typology as Kiswahili and English is irregular. On this ground, Kiswahili academies and a number of Kiswahili experts assumed – prior to the present study – that the Kiswahili versions of the expressions derive from their congruent English counterparts. The assumption nonetheless lacked empirical evidence and discounted other potential causes of the phenomenon, i.e. analogical extension, nativism and cognitive metaphoricalization (Makkai, 1972; Land, 1974; Lakoff & Johnson, 1980b; Ruhlen, 1987; Lakoff, 1987; Gleitman and Newport, 1995). Against this background, we set out to investigate empirically what causes this formal and semantic duplication of strings of words (multiword expressions) between English and Kiswahili to a degree beyond chance expectations. To this end, we administered a checklist to 24 respondents, an interview to 43, an online questionnaire to 102, a translation test to 47 and a translationality test to 8. Online questionnaire respondents came from 21 regions of Tanzania, whereas respondents for the other tools came from Zanzibar, Dar es Salaam, Pwani, Lindi, Dodoma and Kigoma.
In addition, we analysed the Chemnitz Corpus of Swahili (CCS), the Helsinki Swahili Corpus (HSC), and the Corpus of Contemporary American English (COCA) for clues on the sources and trends of expressions exhibiting this characteristic between Kiswahili and English. Furthermore, we reviewed the Bible, dictionaries, encyclopaedias, books, articles, expression lists, wikis, and phrase books in pursuit of etymologies and histories of the concepts underlying the focus expressions. Our analysis shows that most of the Kiswahili versions of the focus expressions result from loan translation and loan rendition from English. We found that economic, political and technological changes, mostly induced by the liberalization policy of the 1990s in Tanzania, created lexical gaps in Kiswahili that needed to be filled. We discovered that Kiswahili, among other means, fills such gaps through loan translation and loan rendition of English phrases. Prototypical examples of notions whose English labels Kiswahili has translated word for word include “human rights”, “free and fair election”, “the World Cup” and “multiparty democracy”. We can conclude that Kiswahili finds it easier and more economical to translate the existing English labels for imported notions than to innovate original labels for the concepts. Even so, our analysis revealed that a few of the Kiswahili duplicate multiword expressions might result from nativism, cognitive metaphoricalization and analogy. We observed, for instance, that the formulation of figurative meanings follows a more or less similar pattern across human languages – the secondary meanings derive from source domains. As long as the source domains are common to many humans' environments, we found it plausible for certain multiword expressions to duplicate spontaneously between several human languages.
Academically, our study has demonstrated how multiword expressions that duplicate between several languages can be studied using primary data, corpora, documentary review and observation. In particular, the study has designed a framework for studying the sources of the expressions, as well as terminology for describing the phenomenon. What's more, the study has collected a number of expressions that duplicate between Kiswahili and English, which other researchers can use in similar studies. English Kiswahili Duplicate multiword expressions English on Kiswahili Loan expressions Indirect loan influence Widespread expressions Anglicism Englishzitation Borrowing of expressions Tanzania borrowings morphosyntax multiword expressions Swahili Englisch Tansania Morphosyntax Lehnwörter Mehrwortausdrücke ddc:820 Englisch Swahili Morphosyntax Tansania
12	Indirect Influence of English on Kiswahili: The Case of Multiword Duplicates between Kiswahili and English Ochieng, Dunlop 04 February 2015 info:eu-repo/classification/ddc/820 ddc:820
13	Alinhamento léxico utilizando técnicas híbridas discriminativas e de pós-processamento / Text alignment Schreiner, Paulo January 2010 O alinhamento léxico automático é uma tarefa essencial para as técnicas de tradução de máquina empíricas modernas. A abordagem gerativa não-supervisionado têm sido substituída recentemente por uma abordagem discriminativa supervisionada que facilite inclusão de conhecimento linguístico de uma diversidade de fontes. Dentro deste contexto, este trabalho descreve uma série alinhadores léxicos discriminativos que incorporam heurísticas de pós-processamento com o objetivo de melhorar o desempenho dos mesmos para expressões multi-palavra, que constituem um dos desafios da área de processamento de linguagens naturais atualmente. A avaliação é realizada utilizando um gold-standard obtido a partir da anotação de um corpus paralelo de legendas de filmes. Os alinhadores propostos apresentam um desempenho superior tanto ao obtido por uma baseline quanto ao obtido por um alinhador gerativo do estado-da-arte (Giza++), tanto no caso geral quanto para as expressões foco do trabalho. / Lexical alignment is an essential task for modern empirical machine translation techniques. The unsupervised generative approach is being replaced by a supervised, discriminative one that considerably facilitates the inclusion of linguistic knowledge from several sources. Given this context, the present work describes a series of discriminative lexical aligners that incorporate post-processing heuristics with the goal of improving the quality of the alignments of multiword expressions, which are one of the major challenges in natural language processing today. The evaluation is conducted using a gold standard obtained by annotating a parallel corpus of movie subtitles.
The aligners proposed show an alignment quality that is superior both to our baseline and to a state-of-the-art generative aligner (Giza++), for the general case as well as for the expressions that are the focus of this work. Linguística computacional Processamento : Linguagem natural Natural language processing Lexical alignment Machine learning Parallel corpora Multiword expressions UFRGS
14	A generic and open framework for multiword expressions treatment : from acquisition to applications Ramisch, Carlos Eduardo January 2012 The treatment of multiword expressions (MWEs), like take off, bus stop and big deal, is a challenge for NLP applications. This kind of linguistic construction is not only arbitrary but also much more frequent than one would initially guess. This thesis investigates the behaviour of MWEs across different languages, domains and construction types, proposing and evaluating an integrated methodological framework for their acquisition. There have been many theoretical proposals to define, characterise and classify MWEs. We adopt a generic definition stating that MWEs are word combinations which must be treated as a unit at some level of linguistic processing. They present a variable degree of institutionalisation, arbitrariness, heterogeneity and limited syntactic and semantic variability. There has been much research on automatic MWE acquisition in recent decades, and the state of the art covers a large number of techniques and languages. Other tasks involving MWEs, namely disambiguation, interpretation, representation and applications, have received less emphasis in the field. The first main contribution of this thesis is the proposal of an original methodological framework for automatic MWE acquisition from monolingual corpora. This framework is generic, language independent and integrated, and it comes with a freely available implementation, the mwetoolkit. It is composed of independent modules which may themselves use multiple techniques to solve a specific sub-task in MWE acquisition. The evaluation of MWE acquisition is modelled using four independent axes. We underline that the evaluation results depend on parameters of the acquisition context, e.g., nature and size of corpora, language and type of MWE, analysis depth, and existing resources.
The second main contribution of this thesis is the application-oriented evaluation of our methodology proposal in two applications: computer-assisted lexicography and statistical machine translation. For the former, we evaluate the usefulness of automatic MWE acquisition with the mwetoolkit for creating three lexicons: Greek nominal expressions, Portuguese complex predicates and Portuguese sentiment expressions. For the latter, we test several integration strategies in order to improve the treatment given to English phrasal verbs when translated by a standard statistical MT system into Portuguese. Both applications can benefit from automatic MWE acquisition, as the expressions acquired automatically from corpora can both speed up and improve the quality of the results. The promising results of previous and ongoing experiments encourage further investigation about the optimal way to integrate MWE treatment into other applications. Thus, we conclude the thesis with an overview of the past, ongoing and future work. Linguagem natural Linguística computacional Natural language processing Computational linguistics Multiword expressions Lexical acquisition Machine translation Lexicography Corpus linguistics
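As a rough illustration of what automatic MWE acquisition from a monolingual corpus can look like, the following minimal sketch extracts adjacent word pairs and ranks them by pointwise mutual information (PMI). It is not the mwetoolkit and does not use its interface; the function name `rank_bigrams_by_pmi`, the `min_count` threshold, and the choice of PMI as the association measure are all assumptions made for this example.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(sentences, min_count=2):
    """Rank adjacent word pairs by PMI (hypothetical helper, not mwetoolkit).

    `sentences` is a list of token lists. Pairs seen fewer than
    `min_count` times are dropped, since PMI is unreliable for rare events.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_tokens = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n_tokens
        p_y = unigrams[w2] / n_tokens
        scored.append(((w1, w2), math.log2(p_xy / (p_x * p_y))))

    # Highest association first: these are the MWE candidates.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


corpus = [
    ["the", "bus", "stop", "is", "near"],
    ["a", "bus", "stop", "sign"],
    ["the", "big", "deal", "is", "the", "deal"],
]
candidates = rank_bigrams_by_pmi(corpus)
```

In a realistic setting the candidate list would then be filtered and evaluated against existing resources, which is where parameters such as corpus size and MWE type come into play.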
19	[en] THE CORPUS NEVER LIES: ON THE IDENTIFICATION AND USE OF MULTIWORD EXPRESSIONS / [pt] O CÓRPUS NÃO MENTE JAMAIS: SOBRE A IDENTIFICAÇÃO E USO DE COMBINAÇÕES MULTIVOCABULARES DO TIPO VERBO MAIS SINTAGMA NOMINAL MILENA DE UZEDA GARRAO 22 August 2006 [pt] Muitos estudos recentes sobre a identificação e uso de combinações multivocabulares (CMs) adotam uma perspectiva representacionista do significado da palavra. Este estudo propõe que é muito mais interessante identificar as CMs por um olhar não-representacionista. A metodologia proposta foi testada em CMs do tipo V+SN, um padrão bastante freqüente no português do Brasil (PB). Trata-se de uma análise estatística com base em córpus que pode ser resumida em três etapas: 1) córpus robusto do PB como base de análise, 2) aplicação de um teste estatístico ao córpus, a saber, teste de Logaritmo de Verossimilhança (Banerjee e Pedersen, 2003), para detecção das CMs mais freqüentes com padrão V+SN (como tomar café) e exclusão de co-ocorrências sintáticas aleatórias dos mesmos itens lexicais, 3) aplicação de Medidas de Similaridade (Baeza-Yates e Ribeiro-Neto, 1999) entre todos os parágrafos contendo uma certa CM (por exemplo, fazer campanha) e todos os parágrafos contendo o substantivo fora da CM (campanha). Esta última etapa foi utilizada para avaliar o grau de composicionalidade da CM. Pôde-se concluir que quanto maior a similaridade entre os parágrafos contendo a CM e os parágrafos contendo o substantivo fora da expressão, maior será o grau de composicionalidade da CM. Por essa razão, este estudo tem um impacto tanto teórico quanto prático para a semântica. / [en] A considerable amount of recent research on defining the multiword expression (MWE) phenomenon rests on an underlying representational framework of word meaning. In this study we claim that it is much more interesting to view MWEs from a non-representational perspective.
By choosing this path, we avoid relying on time-consuming and controversial human intuitions for MWE identification and definition. Our methodology was tested on Brazilian Portuguese verbal phrases of the V+NP pattern. It is a statistically based corpus analysis that can be summed up in three sequential steps: 1) a robust corpus of Brazilian Portuguese as the basis of analysis, 2) application of a probabilistic test to the corpus, namely the Log Likelihood test (Banerjee and Pedersen, 2003), in order to spot the Portuguese MWEs of the V+NP pattern (such as tomar café) and disregard casual, not otherwise motivated syntactic co-occurrences of the same lexical items, 3) application of Similarity Measures (Baeza-Yates and Ribeiro-Neto, 1999) between all the paragraphs containing a certain MWE and all the paragraphs containing its separate noun. This last step is crucial for assessing the MWE's compositionality level. We conclude that the higher the similarity between the paragraphs containing the MWE (such as fazer campanha) and those containing its separate noun (campanha), the more compositional the MWE. Therefore, we believe that this work has both a practical and a theoretical impact on semantics. [pt] COMBINACOES MULTIVOCABULARES [en] MULTIWORD EXPRESSIONS [pt] COLOCACOES VERBAIS [en] VERBAL COLLOCATIONS [pt] LEXICOGRAFIA DE CORPUS [en] CORPUS LEXICOGRAPHY [pt] SEMANTICA DE CORPUS [en] CORPUS SEMANTICS
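Steps 2 and 3 of the methodology above can be sketched in a few lines. The abstract does not give the exact formulas used, so this example assumes a common formulation of the log-likelihood ratio (G²) over a 2x2 contingency table of verb and noun co-occurrence counts, and assumes cosine similarity over bag-of-words vectors as the similarity measure. The function names and the toy counts are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table (assumed formulation).

    k11: verb and noun together     k12: verb without the noun
    k21: noun without the verb      k22: neither
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = (
        (k11, rows[0], cols[0]),
        (k12, rows[0], cols[1]),
        (k21, rows[1], cols[0]),
        (k22, rows[1], cols[1]),
    )
    g2 = 0.0
    for observed, row_sum, col_sum in cells:
        if observed > 0:  # empty cells contribute nothing
            expected = row_sum * col_sum / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words (assumed measure)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Toy counts, invented for illustration only.
independent = log_likelihood(10, 10, 10, 10)
associated = log_likelihood(30, 5, 5, 1000)
```

Under this reading, a verb and noun pair with a high G² is kept as an MWE candidate, and a high cosine similarity between the paragraphs containing the MWE and those containing the bare noun would suggest a more compositional expression.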
20	Víceslovná pojmenování v italštině / Multiword expressions in Italian Jungwirthová, Klára January 2015 The main topic of this thesis is multiword expressions in the Italian language. The thesis is divided into two parts, a theoretical part and an empirical part. The theoretical part deals with multiword expressions, syntagmas and idiomatic expressions. The empirical part examines the connections between the constituents of multiword expressions. Four criteria are then applied to the multiword expressions (head inflection, insertion of the head's modifiers, pronominalisation of the head, and dislocation and topicalization of the head). These transformations are verified with the aid of corpora and questionnaires. The results of this research determine whether the multiword expressions more closely resemble syntagmas or idiomatic expressions.
