11 |
Indirect Influence of English on Kiswahili: The Case of Multiword Duplicates between Kiswahili and English
Ochieng, Dunlop (22 October 2015)
Some proverbs, idioms, nominal compounds, and slogans duplicate in form and meaning between several languages. An example between German and English is Liebe auf den ersten Blick and “love at first sight” (Flippo, 2009), whereas an example between Kiswahili and English is uchaguzi ulio huru na haki and “free and fair election.” Duplication of such strings of words between languages as different in descent and typology as Kiswahili and English is irregular. On this ground, Kiswahili academies and a number of Kiswahili experts assumed, prior to the present study, that the Kiswahili versions of the expressions are derived from their congruent English counterparts. The assumption nonetheless lacked empirical evidence and discounted other potential causes of the phenomenon, i.e. analogical extension, nativism and cognitive metaphoricalization (Makkai, 1972; Land, 1974; Lakoff & Johnson, 1980b; Ruhlen, 1987; Lakoff, 1987; Gleitman and Newport, 1995). Against this background, we set out to investigate empirically what causes this formal and semantic duplication of strings of words (multiword expressions) between English and Kiswahili to a degree beyond chance expectations.
To this end, we administered a checklist to 24 respondents, an interview to 43, an online questionnaire to 102, a translation test to 47 and a translationality test to 8. The online questionnaire respondents came from 21 regions of Tanzania, whereas the respondents for the other tools came from Zanzibar, Dar es Salaam, Pwani, Lindi, Dodoma and Kigoma. In addition, we analysed the Chemnitz Corpus of Swahili (CCS), the Helsinki Swahili Corpus (HSC), and the Corpus of Contemporary American English (COCA) for clues on the sources and trends of expressions exhibiting this characteristic between Kiswahili and English. Furthermore, we reviewed the Bible, dictionaries, encyclopaedias, books, articles, expression lists, wikis, and phrase books in pursuit of the etymologies and histories of the concepts underlying the focus expressions.
Our analysis shows that most of the Kiswahili versions of the focus expressions are a function of loan translation and loan rendition from English. We found that economic, political and technological changes, mostly induced by the liberalization policy of the 1990s in Tanzania, created lexical gaps in Kiswahili that needed to be filled. We discovered that Kiswahili, among other means, fills such gaps through loan translation and loan rendition of English phrases. Prototypical examples of notions whose English labels Kiswahili has translated word for word include “human rights”, “free and fair election”, “the World Cup” and “multiparty democracy”. We conclude that Kiswahili finds it easier and more economical to translate the existing English labels for imported notions than to coin original labels for the concepts.
Even so, our analysis revealed that a few of the Kiswahili duplicate multiword expressions might be a function of nativism, cognitive metaphoricalization and analogy. We observed, for instance, that the formulation of figurative meanings follows a more or less similar pattern across human languages, with the secondary meanings deriving from source domains. As long as the source domains are common to many humans' environments, we found it plausible for certain multiword expressions to duplicate spontaneously between several human languages.
Academically, our study has demonstrated how multiword expressions that duplicate between several languages can be studied using primary data, corpora, documentary review and observation. In particular, the study has designed a framework for studying the sources of the expressions, as well as terminology for describing the phenomenon. What's more, the study has collected a number of expressions that duplicate between Kiswahili and English, which other researchers can use in similar studies.
|
12 |
Indirect Influence of English on Kiswahili: The Case of Multiword Duplicates between Kiswahili and English
Ochieng, Dunlop (04 February 2015)
|
13 |
Alinhamento léxico utilizando técnicas híbridas discriminativas e de pós-processamento / Text alignment
Schreiner, Paulo (January 2010)
Automatic lexical alignment is an essential task for modern empirical machine translation techniques. The unsupervised generative approach is being replaced by a supervised, discriminative one that considerably facilitates the inclusion of linguistic knowledge from several sources. In this context, the present work describes a series of discriminative lexical aligners that incorporate post-processing heuristics with the goal of improving alignment quality for multiword expressions, which are currently one of the major challenges in natural language processing. The evaluation uses a gold standard obtained by annotating a parallel corpus of movie subtitles. The proposed aligners achieve an alignment quality superior both to our baseline and to a state-of-the-art generative aligner (Giza++), in the general case as well as for the expressions that are the focus of this work.
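As a rough illustration of what evaluating an aligner against a gold standard can look like, the sketch below computes precision, recall and alignment error rate (AER) over sets of word-index pairs. The abstract does not say which metrics the thesis reports, so the choice of AER, the function name `alignment_scores` and the toy links are all assumptions made for illustration only.

```python
def alignment_scores(predicted, sure, possible=None):
    """Score predicted word-alignment links against a gold standard.

    Each link is a (source_index, target_index) pair. `sure` holds the
    unambiguous gold links; `possible` optionally adds acceptable but
    non-obligatory links. All names here are hypothetical.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible or ()) | s  # every sure link is also a possible link

    precision = len(a & p) / len(a) if a else 0.0
    recall = len(a & s) / len(s) if s else 0.0
    if a or s:
        aer = 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
    else:
        aer = 0.0
    return precision, recall, aer


# Toy sentence pair with invented links: the aligner gets two of three right.
gold_sure = {(0, 0), (1, 1), (2, 2)}
predicted_links = {(0, 0), (1, 1), (2, 3)}
precision, recall, aer = alignment_scores(predicted_links, gold_sure)
print(f"precision={precision:.3f} recall={recall:.3f} AER={aer:.3f}")
```

Restricting the gold and predicted sets to links that touch an annotated multiword expression would give a separate score for the focus expressions, alongside the general case.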
|
14 |
A generic and open framework for multiword expressions treatment: from acquisition to applications
Ramisch, Carlos Eduardo (January 2012)
The treatment of multiword expressions (MWEs), like take off, bus stop and big deal, is a challenge for NLP applications. This kind of linguistic construction is not only arbitrary but also much more frequent than one would initially guess. This thesis investigates the behaviour of MWEs across different languages, domains and construction types, proposing and evaluating an integrated methodological framework for their acquisition. There have been many theoretical proposals to define, characterise and classify MWEs. We adopt a generic definition stating that MWEs are word combinations which must be treated as a unit at some level of linguistic processing. They present a variable degree of institutionalisation, arbitrariness, heterogeneity and limited syntactic and semantic variability. There has been much research on automatic MWE acquisition in recent decades, and the state of the art covers a large number of techniques and languages. Other tasks involving MWEs, namely disambiguation, interpretation, representation and applications, have received less emphasis in the field. The first main contribution of this thesis is the proposal of an original methodological framework for automatic MWE acquisition from monolingual corpora. This framework is generic, language independent and integrated, and it comes with a freely available implementation, the mwetoolkit. It is composed of independent modules which may themselves use multiple techniques to solve a specific sub-task in MWE acquisition. The evaluation of MWE acquisition is modelled using four independent axes. We underline that the evaluation results depend on parameters of the acquisition context, e.g., nature and size of corpora, language and type of MWE, analysis depth, and existing resources. The second main contribution of this thesis is the application-oriented evaluation of our proposed methodology in two applications: computer-assisted lexicography and statistical machine translation.
For the former, we evaluate the usefulness of automatic MWE acquisition with the mwetoolkit for creating three lexicons: Greek nominal expressions, Portuguese complex predicates and Portuguese sentiment expressions. For the latter, we test several integration strategies in order to improve the treatment given to English phrasal verbs when translated by a standard statistical MT system into Portuguese. Both applications can benefit from automatic MWE acquisition, as the expressions acquired automatically from corpora can both speed up the work and improve the quality of the results. The promising results of previous and ongoing experiments encourage further investigation into the optimal way to integrate MWE treatment into other applications. Thus, we conclude the thesis with an overview of past, ongoing and future work.
|
19 |
[en] THE CORPUS NEVER LIES: ON THE IDENTIFICATION AND USE OF MULTIWORD EXPRESSIONS / [pt] O CÓRPUS NÃO MENTE JAMAIS: SOBRE A IDENTIFICAÇÃO E USO DE COMBINAÇÕES MULTIVOCABULARES DO TIPO VERBO MAIS SINTAGMA NOMINAL
MILENA DE UZEDA GARRAO (22 August 2006)
A considerable amount of recent research on the identification and use of multiword expressions (MWEs) assumes a representational view of word meaning. In this study we argue that it is more productive to view MWEs from a non-representational perspective, which avoids relying on time-consuming and controversial human intuitions for MWE identification and definition. Our methodology was tested on verb plus noun phrase (V+NP) combinations, a very frequent pattern in Brazilian Portuguese. It is a statistical, corpus-based analysis that can be summed up in three steps: 1) a robust Brazilian Portuguese corpus serves as the basis of analysis; 2) a statistical test, the log-likelihood test (Banerjee and Pedersen, 2003), is applied to the corpus to detect the most frequent V+NP MWEs (such as tomar café) and to exclude casual syntactic co-occurrences of the same lexical items; 3) similarity measures (Baeza-Yates and Ribeiro-Neto, 1999) are computed between all the paragraphs containing a given MWE (for example, fazer campanha) and all the paragraphs containing its noun outside the MWE (campanha). This last step is used to assess the MWE's degree of compositionality. We conclude that the higher the similarity between the paragraphs containing the MWE and the paragraphs containing the noun outside the expression, the more compositional the MWE. For this reason, we believe that this work has both theoretical and practical impact on semantics.
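Steps 2 and 3 of the methodology above can be sketched in a few lines. This is a minimal illustration, assuming a standard log-likelihood ratio over a 2x2 contingency table of bigram counts and cosine similarity over bag-of-words vectors. The abstract names neither the exact formulation nor the similarity measure used, so both are assumptions, and the function names and toy paragraphs are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(c12, c1, c2, n):
    """Log-likelihood ratio (G^2) for a verb+noun pair.

    c12: count of the pair, c1: count of the verb in first position,
    c2: count of the noun in second position, n: total number of pairs.
    """
    observed = [[c12, c1 - c12],
                [c2 - c12, n - c1 - c2 + c12]]
    rows = [c1, n - c1]
    cols = [c2, n - c2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def bag_of_words(paragraphs):
    """Pool a list of paragraphs into one term-frequency vector."""
    bag = Counter()
    for paragraph in paragraphs:
        bag.update(paragraph.lower().split())
    return bag


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented toy paragraphs: contexts of "fazer campanha" versus contexts of
# "campanha" outside the expression.
with_mwe = ["o candidato vai fazer campanha na cidade"]
noun_only = ["a campanha do candidato na cidade foi longa"]
similarity = cosine(bag_of_words(with_mwe), bag_of_words(noun_only))
print(f"G2={log_likelihood(50, 100, 100, 1000):.2f} similarity={similarity:.2f}")
```

Under this reading, a high G^2 score marks a pair as a candidate MWE rather than a chance co-occurrence, and a high similarity between the two context sets suggests that the expression is more compositional.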
|
20 |
Víceslovná pojmenování v italštině / Multiword expressions in Italian
Jungwirthová, Klára (January 2015)
The main topic of this thesis is multiword expressions in the Italian language. The thesis is divided into two parts, a theoretical part and an empirical part. The theoretical part deals with multiword expressions, syntagmas and idiomatic expressions. The empirical part examines the connections between the constituents of multiword expressions. Four criteria are then applied to the multiword expressions: inflection of the head, insertion of modifiers of the head, pronominalisation of the head, and dislocation and topicalization of the head. These transformations are verified with the aid of corpora and questionnaires. Based on the results, the thesis determines whether the multiword expressions more closely resemble syntagmas or idiomatic expressions.
|