291 |
Investigating the effectiveness of available tools for translating into tshiVenda
Nemutamvuni, Mulalo Edward 11 1900
Text in English / Abstracts in English and Venda / This study investigated the effectiveness of the tools available for translating between English and Tshivenḓa. It addresses the lack of effective translation tools for this language pair. Tshivenḓa is one of South Africa’s minority languages, and the lack of effective translation tools for it negatively affects language practitioners’ work and puts translation quality assurance at risk. Translation tools, both computer-based and non-computer-based, abound for developed languages such as English and French. Based on the results of this research project, the researcher made recommendations that could remedy the situation. South Africa is a democratic country with a number of language-related policies, which creates a conducive context for stakeholders with a passion for language to develop Tshivenḓa fully in all dimensions. All languages have evolved, and all were once underdeveloped. This shows that the development of Tshivenḓa is also possible, as it was for Afrikaans, which did not exist before 1652 and has since evolved and overtaken all indigenous South African languages.
This study reviewed the literature on translation and translation tools, drawing on both published and unpublished sources. It used mixed methods research, i.e. quantitative and qualitative research methods, which complemented each other throughout the research. Data were gathered through questionnaires and interviews using both open-ended and closed-ended questions. Both purposive (judgemental) and snowball (chain) sampling were applied. Owing to the nature of mixed methods research, data were analysed through a combination of methods: guided by the analytic comparison approach, related data were grouped together during analysis and presentation, and both statistical and textual analyses were used. Themes were constructed to present the gathered data clearly. In the final chapters, the researcher discussed the findings and evaluated the entire research before making recommendations and drawing conclusions. / Iyi ṱhoḓisiso yo ita tsedzuluso nga ha kushumele kwa zwishumiswa zwi re hone zwine zwa shumiswa u pindulela u bva kha luambo lwa English u ya kha Tshivenḓa na u bva kha Tshivenḓa u ya kha English ndivho I ya u sedzulusa na u lavhelesa kushumele kwa izwi zwishumiswa uri zwi a thusa naa. Ino ṱhoḓisiso yo shumana na thaidzo ya ṱhahelelo ya zwishumiswa zwa u pindulela zwine zwa shumiswa musi hu tshi pindulelwa vhukati ha English na Tshivenḓa. Tshivenḓa ndi luṅwe lwa nyambo dza Afrika Tshipembe dzine dza ambiwa nga vhathu vha si vhanzhi. U shaea ha zwishumiswa zwa u pindulela zwine zwa shuma nga nḓila I thusaho zwi kwama mushumo wa vhashumi vha zwa nyambo nga nḓila I si yavhuḓi. Iyi nyimele I na mulingo u kwamaho khwaḽithi ya zwo pindulelwaho. Zwishumiswa zwa u pindulela, zwa thekhnoḽodzhi ya khomphiyutha na zwi sa shumisi thekhnoḽodzhi ya khomphiyutha zwo ḓalesa kha nyambo dzo bvelelaho u tou fana na kha English, French na dziṅwe.
Zwo sendeka kha mvelelo dza ino thandela ya ṱhoḓisiso, muṱoḓisisi o ita themendelo dzine dza nga fhelisa thaidzo ya nyimele. Afrika Tshipembe ndi shango ḽa demokirasi ḽine ḽa vha na mbekanyamaitele dzo vhalaho nga ha dzinyambo. Izwi zwi ita uri hu vhe na nyimele ine vhafaramikovhe vhane vha funesa nyambo vha kone u bveledza Tshivenḓa kha masia oṱhe. Zwavhukuma ndi zwa uri nyambo dzoṱhe dzi na mathomo nahone dzoṱhe dzo vha dzi songo bvelela. Izwi zwi ita uri zwi vhe khagala uri luambo lwa Tshivenḓa na lwone lu nga bveledzwa u tou fana na luambo lwa Afrikaans lwe lwa vha lu si ho ḽifhasini phanḓa ha ṅwaha wa 1652. Ulu luambo (Afrikaans) lwo vha hone shangoni lwa mbo bveledzwa lwa fhira nyambo dzoṱhe dza fhano hayani Afrika Tshipembe.
Kha ino ṱhoḓisiso ho vhaliwa maṅwalwa ane a amba nga ha u pindulela na nga ha zwishumiswa zwa u pindulela. Maṅwalwa e a vhalwa o wanala kha zwiko zwo kanḓiswaho na zwiko zwi songo kanḓiswaho. Ino ṱhoḓisiso yo shumisa ngona dza ṱhoḓisiso dzo ṱanganyiswaho, idzo ngona ndi khwanthithethivi na khwaḽithethivi. Idzi ngona dzo shumisana zwavhuḓisa kha ṱhoḓisiso yoṱhe. Data yo kuvhanganywa hu tshi khou shumiswa dzimbudziso na u tou vhudzisa hune afho ho shumiswa mbudziso dzo vuleaho na dzo valeaho. Ngona dza u nanga sambula muṱoḓisisi o shumisa khaṱulo yawe uri ndi nnyi ane a nga vha a na data yo teaho na u humbela vhavhudziswa uri vha bule vhaṅwe vhathu vha re na data yo teaho ino ṱhoḓisiso.
Tsenguluso ya data ho ṱanganyiswa ngona dza u sengulusa zwo itiswa ngauri ṱhoḓisiso ino yo ṱanganyisa ngona dza u ita ṱhoḓisiso. Sumbanḓila ho shumiswa tsenguluso ya mbambedzo kha u sengulusa data. Data ine ya fana yo vhewa fhethu huthihi musi hu tshi khou senguluswa na u vhiga. Tsenguluso I shumisaho mbalo/tshivhalo (khwanthithethivi) na I shumisaho maipfi kha ino ngudo dzo shumiswa. Ho vhumbiwa dziṱhoho u itela u ṱana data ye ya kuvhanganywa. Ngei kha ndima dza u fhedza, muṱodisisi o rera nga ha mawanwa, o ṱhaṱhuvha ṱhoḓisiso yoṱhe phanḓa ha u ita themendelo na u vhina. / African Languages / M.A. (African Languages)
|
292 |
Extraction de phrases parallèles à partir d’un corpus comparable avec des réseaux de neurones récurrents bidirectionnels
Grégoire, Francis 12 1900
No description available.
|
293 |
Étude sur l’influence du vocabulaire utilisé pour l’indexation des images en contexte de repérage multilingue
Ménard, Elaine 27 November 2008
Depuis quelques années, Internet est devenu un média incontournable pour la diffusion de ressources multilingues. Cependant, les différences linguistiques constituent souvent un obstacle majeur aux échanges de documents scientifiques, culturels, pédagogiques et commerciaux. En plus de cette diversité linguistique, on constate le développement croissant de bases de données et de collections composées de différents types de documents textuels ou multimédias, ce qui complexifie également le processus de repérage documentaire. En général, on considère l’image comme « libre » au point de vue linguistique. Toutefois, l’indexation en vocabulaire contrôlé ou libre (non contrôlé) confère à l’image un statut linguistique au même titre que tout document textuel, ce qui peut avoir une incidence sur le repérage.
Le but de notre recherche est de vérifier l’existence de différences entre les caractéristiques de deux approches d’indexation pour les images ordinaires représentant des objets de la vie quotidienne, en vocabulaire contrôlé et en vocabulaire libre, et entre les résultats obtenus au moment de leur repérage. Cette étude suppose que les deux approches d’indexation présentent des caractéristiques communes, mais également des différences pouvant influencer le repérage de l’image. Cette recherche permet de vérifier si l’une ou l’autre de ces approches d’indexation surclasse l’autre, en termes d’efficacité, d’efficience et de satisfaction du chercheur d’images, en contexte de repérage multilingue.
Afin d’atteindre le but fixé par cette recherche, deux objectifs spécifiques sont définis : identifier les caractéristiques de chacune des deux approches d’indexation de l’image ordinaire représentant des objets de la vie quotidienne pouvant influencer le repérage, en contexte multilingue et exposer les différences sur le plan de l’efficacité, de l’efficience et de la satisfaction du chercheur d’images à repérer des images ordinaires représentant des objets de la vie quotidienne indexées à l’aide d’approches offrant des caractéristiques variées, en contexte multilingue. Trois modes de collecte des données sont employés : l’analyse des termes utilisés pour l’indexation des images, la simulation du repérage d’un ensemble d’images indexées selon chacune des formes d’indexation à l’étude réalisée auprès de soixante répondants, et le questionnaire administré aux participants pendant et après la simulation du repérage. Quatre mesures sont définies pour cette recherche : l’efficacité du repérage d’images, mesurée par le taux de succès du repérage calculé à l’aide du nombre d’images repérées; l’efficience temporelle, mesurée par le temps, en secondes, utilisé par image repérée; l’efficience humaine, mesurée par l’effort humain, en nombre de requêtes formulées par image repérée et la satisfaction du chercheur d’images, mesurée par son autoévaluation suite à chaque tâche de repérage effectuée.
Cette recherche montre que sur le plan de l’indexation de l’image ordinaire représentant des objets de la vie quotidienne, les approches d’indexation étudiées diffèrent fondamentalement l’une de l’autre, sur le plan terminologique, perceptuel et structurel. En outre, l’analyse des caractéristiques des deux approches d’indexation révèle que si la langue d’indexation est modifiée, les caractéristiques varient peu au sein d’une même approche d’indexation. Finalement, cette recherche souligne que les deux approches d’indexation à l’étude offrent une performance de repérage des images ordinaires représentant des objets de la vie quotidienne différente sur le plan de l’efficacité, de l’efficience et de la satisfaction du chercheur d’images, selon l’approche et la langue utilisées pour l’indexation. / During the last few years, the Internet has become an indispensable medium for the dissemination of multilingual resources. However, language differences are often a major obstacle to the exchange of scientific, cultural, educational and commercial documents. Besides this linguistic diversity, many databases and collections now contain documents in various formats that can also adversely affect their retrieval process. In general, images are considered to be language-independent resources. Nevertheless, the image indexing process using either a controlled or uncontrolled vocabulary gives the image a linguistic status similar to any other textual document and thus leads to the same difficulties in their retrieval.
The goal of our research is to first identify the differences between the indexing approaches using a controlled and an uncontrolled vocabulary for ordinary images of everyday-life objects and to then differentiate between the results obtained at the time of image retrieval. This study supposes that the two indexing approaches show not only common characteristics, but also differences that can influence image retrieval. Thus, this research makes it possible to indicate if one of these indexing approaches surpasses the other in terms of effectiveness, efficiency, and satisfaction of the image searcher in a multilingual retrieval context.
For this study, two specific objectives are defined: to identify the characteristics of each approach used for ordinary image indexing of everyday-life objects that can affect image retrieval in a multilingual context; and to explore the differences between the two indexing approaches in terms of their effectiveness, their efficiency, and the satisfaction of the image searcher when trying to retrieve ordinary images of everyday-life objects indexed according to either approach in a multilingual retrieval context. Three methods of data collection are used: an analysis of the image indexing terms, a simulation of the retrieval of a set of images indexed according to each of the two indexing approaches conducted with sixty respondents, and a questionnaire submitted to the participants during and after the retrieval simulation. Four measures are defined in this research: the effectiveness of image retrieval, measured by the success rate calculated from the number of retrieved images; time efficiency, measured by the average time, in seconds, used to retrieve an image; human efficiency, measured by the human effort, expressed as the average number of queries needed to retrieve an image; and the satisfaction of the image searcher, measured by the participant's self-evaluation of the retrieval process after each completed task.
This research shows that, for the indexing of ordinary images representing everyday-life objects, the two approaches investigated are fundamentally distinct from the terminological, perceptual, and structural perspectives. Additionally, the analysis of the characteristics of the two indexing approaches reveals that when the indexing language changes, the characteristics vary little within the same indexing approach. Finally, this research underlines that the two indexing approaches differ in retrieval performance for ordinary images representing everyday-life objects, in terms of effectiveness, efficiency, and satisfaction of the image searcher, depending on the approach and the language used for indexing.
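The four measures described in this abstract can be illustrated with a minimal sketch. The record layout, the field names, the 1 to 5 satisfaction scale and the sample values below are all assumptions made for illustration; the abstract does not specify how the study logged its data or computed its averages.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RetrievalTask:
    """One simulated retrieval task (hypothetical record layout)."""
    found: bool        # was the target image retrieved?
    seconds: float     # time spent on the task
    queries: int       # number of queries formulated
    satisfaction: int  # self-evaluation after the task (1-5 scale assumed)


def summarize(tasks):
    """Compute the four measures over a list of tasks."""
    retrieved = sum(t.found for t in tasks)
    if retrieved == 0:
        raise ValueError("no image retrieved; efficiency is undefined")
    return {
        # success rate: retrieved images over attempted tasks
        "effectiveness": retrieved / len(tasks),
        # seconds used per retrieved image
        "time_efficiency": sum(t.seconds for t in tasks) / retrieved,
        # queries formulated per retrieved image
        "human_efficiency": sum(t.queries for t in tasks) / retrieved,
        # mean self-evaluation across all tasks
        "satisfaction": mean(t.satisfaction for t in tasks),
    }


# Invented sample data, not results from the study.
tasks = [
    RetrievalTask(True, 30.0, 2, 4),
    RetrievalTask(False, 60.0, 5, 2),
    RetrievalTask(True, 30.0, 1, 5),
    RetrievalTask(True, 60.0, 1, 5),
]
summary = summarize(tasks)
```

Dividing total time and total queries by the number of retrieved images is one reading of "per image retrieved"; averaging over successful tasks only would be an equally plausible alternative.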
|
294 |
An exploratory study of translations of the Dewey Decimal Classification system into South African languages
De Jager, Gert Johannes Jacobus 06 1900
This research investigated the feasibility of South African translations of Dewey Decimal Classification (DDC). The study provides an introductory overview of DDC throughout the world, followed by its use in South Africa. The introduction highlights shortcomings and possible solutions – of which translations seem to be the most ideal. This research involved a critical analysis of the literature on DDC translations, a documentary analysis and technology-based research in the form of Google translations and evaluation of parts of Abridged Edition 15 of DDC.
The critical analysis of the literature and the documentary analysis identified problems relating to translations, how translations deal with shortcomings in DDC, the fact that no literature exists on multilingual translations, and the process of translations (including the fact that this is an expensive endeavour). It also revealed information about sponsorship and the mixed translation model.
The technology-based research, which used Google Translate to translate parts of Abridged Edition 15 and then evaluated these translations, indicated that the Google translations were comprehensive and needed minimal editorial effort. It also paved the way for describing a possible workflow for South African translations and indicated that the parts already translated, as well as further Google translations, can expedite the translation process. A model for South African translations, based only on the cost of the Pansoft translation software, was proposed. The mixed model approach, where some languages are used as main languages (schedules, Relative Index terms and the like) and others for Relative Index terms only, was deemed the most appropriate in the South African context.
This led to the conclusion that DDC translations into ten of the official South African languages are indeed feasible. The research supports translations that keep the integrity of DDC intact, with possible expansions based on literary warrant. It is important, though, to get the support of the South African library community and authoritative bodies such as the National Library of South Africa and/or the Library and Information Association of South Africa (LIASA) to negotiate and sign a contract for these translations. / Information Science / D. Litt. et Phil. (Information Science)
|
295 |
Skoner en kleiner vertaalgeheues
Wolff, Friedel 10 1900
Rekenaars kan ’n nuttige rol speel in vertaling. Twee benaderings is vertaalgeheuestelsels en masjienvertaalstelsels. By hierdie twee tegnologieë word ’n vertaalgeheue gebruik: ’n tweetalige versameling vorige vertalings. Hierdie proefskrif bied metodes aan om die kwaliteit van ’n vertaalgeheue te verbeter.
’n Masjienleerbenadering word gevolg om foutiewe inskrywings in ’n vertaalgeheue te identifiseer. ’n Verskeidenheid leerkenmerke in drie kategorieë word aangebied: kenmerke wat verband hou met tekslengte, kenmerke wat deur kwaliteittoetsers soos vertaaltoetsers, ’n speltoetser en ’n grammatikatoetser bereken word, asook statistiese kenmerke wat met behulp van eksterne data bereken word.
Die evaluasie van vertaalgeheuestelsels is nog nie gestandaardiseer nie. In hierdie proefskrif word ’n verskeidenheid probleme met bestaande evaluasiemetodes uitgewys, en ’n verbeterde evaluasiemetode word ontwikkel.
Deur die foutiewe inskrywings uit ’n vertaalgeheue te verwyder, is ’n kleiner, skoner vertaalgeheue beskikbaar vir toepassings. Eksperimente dui aan dat so ’n vertaalgeheue beter prestasie behaal in ’n vertaalgeheuestelsel. As ondersteunende bewys vir die waarde van ’n skoner vertaalgeheue word ’n verbetering ook aangedui by die opleiding van ’n masjienvertaalstelsel. / Computers can play a useful role in translation. Two approaches are translation memory systems and machine translation systems. Both technologies use a translation memory: a bilingual collection of previous translations. This thesis presents methods to improve the quality of a translation memory.
A machine learning approach is followed to identify incorrect entries in a translation memory. A variety of learning features in three categories are presented: features associated with text length, features calculated by quality checkers such as translation checkers, a spell checker and a grammar checker, as well as statistical features computed with the help of external data.
The evaluation of translation memory systems is not yet standardised. This thesis points out a number of problems with existing evaluation methods, and an improved evaluation method is developed.
By removing the incorrect entries in a translation memory, a smaller, cleaner translation memory is available to applications. Experiments demonstrate that such a translation memory results in better performance in a translation memory system. As supporting evidence for the value of a cleaner translation memory, an improvement is also achieved in training a machine translation system. / School of Computing / Ph. D. (Rekenaarwetenskap)
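The first feature category in this abstract, features associated with text length, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the particular features, the thresholds and the sample entries are hypothetical, and the fixed-threshold filter is a simple stand-in for the machine learning classifier the thesis describes, not a reproduction of it.

```python
def length_features(source: str, target: str) -> dict:
    """Length-based features for one translation memory entry (illustrative)."""
    src_chars, tgt_chars = len(source), len(target)
    src_words, tgt_words = len(source.split()), len(target.split())
    return {
        "char_ratio": src_chars / max(tgt_chars, 1),
        "word_ratio": src_words / max(tgt_words, 1),
        "char_diff": abs(src_chars - tgt_chars),
    }


def looks_suspicious(source: str, target: str, low: float = 0.5, high: float = 2.0) -> bool:
    """Flag an entry whose character-length ratio falls outside [low, high].

    Stand-in for a trained classifier; the thresholds are arbitrary assumptions.
    """
    ratio = length_features(source, target)["char_ratio"]
    return not (low <= ratio <= high)


# Invented sample entries: the second pairs a short source with an unrelated long target.
memory = [
    ("Save the document.", "Stoor die dokument."),
    ("Cancel", "Hierdie dokument is reeds deur 'n ander gebruiker oopgemaak."),
]
cleaned = [(s, t) for s, t in memory if not looks_suspicious(s, t)]
```

In a learned setting, the dictionary returned by `length_features` would be one part of the feature vector, alongside the quality-checker and statistical features the abstract mentions.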
|
296 |
La traduction automatique statistique factorisée : une application à la paire de langues français - roumain / Factored phrase based statistical machine translation : a French - Romanian application
Laporte, Elena-Mirabela 13 June 2014
Un premier objectif de cette thèse est la constitution de ressources linguistiques pour un système de traduction automatique statistique factorisée français - roumain. Un deuxième objectif est l’étude de l’impact des informations linguistiques exploitées dans le processus d’alignement lexical et de traduction. Cette étude est motivée, d’une part, par le manque de systèmes de traduction automatique pour la paire de langues étudiées et, d’autre part, par le nombre important d’erreurs générées par les systèmes de traduction automatique actuels. Les ressources linguistiques requises par ce système sont des corpus parallèles alignés au niveau propositionnel et lexical. Ces corpus sont également segmentés lexicalement, lemmatisés et étiquetés au niveau morphosyntaxique. / Our first aim is to build linguistic resources for a French-Romanian factored phrase-based statistical machine translation system. Our second aim is to study the impact of the linguistic information exploited in the lexical alignment and translation process. This study is motivated, on the one hand, by the lack of such systems for the language pair studied and, on the other hand, by the high number of errors produced by current machine translation systems. The linguistic resources required by the system are parallel corpora that are tokenized, lemmatized, tagged, and aligned at the word and sentence levels.
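A tokenized, lemmatized and tagged corpus of the kind described here is often stored with each token carrying several factors. The sketch below assumes a pipe-separated `surface|lemma|tag` layout, a convention commonly used for factored corpora; the abstract does not say which format or tag set the thesis used, and the function names and sample sentence are hypothetical.

```python
def to_factored(tokens):
    """Join (surface, lemma, tag) triples into one factored line."""
    return " ".join("|".join(factors) for factors in tokens)


def from_factored(line):
    """Split a factored line back into (surface, lemma, tag) triples."""
    return [tuple(token.split("|")) for token in line.split()]


# Invented French example; the tags are illustrative, not the thesis tag set.
french = [
    ("les", "le", "DET"),
    ("chats", "chat", "NOUN"),
    ("dorment", "dormir", "VERB"),
]
line = to_factored(french)
```

Keeping the lemma and tag next to each surface form lets an alignment or translation model fall back on the less sparse factors when a surface form is rare.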
|
297 |
C2C: um chat bilíngue com apoio de senso comum
Sugiyama, Bruno Akio 21 October 2011
Previous issue date: 2011-10-21 / Financiadora de Estudos e Projetos / In this research, we describe how common sense knowledge combined with machine translation can support communication among people with different cultural backgrounds. To evaluate this possibility, we developed a bilingual chat called Culture-to-Chat, or C2C, which provides a communication channel and has resources that help its users compose messages in a second language. In the field of computer-mediated communication, people are crossing geographic borders and gaining opportunities to share experiences across cultures. Sharing information in a non-native language can be difficult for some users. Some computational tools that support communication use machine translation to aid users who need to work with different languages. C2C also uses this approach and adopts a semantic network of cultural knowledge, built collaboratively on the Web through the Open Mind Common Sense in Brazil project (OMCS-Br), to handle cultural expressions, in other words, terms whose meaning depends on the user’s culture. Following a user-centered design approach that focuses on prototyping, we present the development of C2C through low-, medium- and high-fidelity prototypes. To observe how this computational tool is used and to collect the opinions of target users, we performed a pilot study involving Brazilian and Canadian users. This study revealed possible enhancements to the tool and provided evidence that this chat contributes to communication between people with different cultural backgrounds. / Neste trabalho é descrito como o conhecimento de senso comum em conjunto com a tradução automática pode apoiar a comunicação entre pessoas de diferentes culturas.
Para verificar a viabilidade do senso comum foi desenvolvido um chat bilíngue chamado Culture-to-Chat ou C2C, que, além de prover um canal de comunicação, possui mecanismos que auxiliam o usuário na criação de mensagens em língua estrangeira. No campo da comunicação mediada por computador, percebe-se que as pessoas estão cruzando as fronteiras geográficas e tendo oportunidades de troca de experiências entre diferentes culturas. Para alguns usuários, essa troca de informações é feita em uma língua não nativa, o que pode ser uma tarefa difícil para eles. Algumas ferramentas computacionais que apoiam a comunicação propõem o uso de tradução automática para trabalhar com usuários falantes de diferentes línguas. O C2C, além de adotar tal estratégia, adota a rede semântica de conhecimento cultural construída colaborativamente via Web do projeto Open Mind Common Sense no Brasil (OMCS-Br) para trabalhar com expressões culturais, ou seja, termos cujo significado depende da cultura do usuário. Utilizando uma abordagem centrada no usuário focando-se em prototipação, o desenvolvimento do C2C e suas funcionalidades são apresentados por meio de protótipos de diferentes níveis de fidelidade (baixa, média e alta). Com o intuito de observar o uso dessa ferramenta computacional e coletar opiniões de usuários, foi realizado um estudo piloto envolvendo usuários brasileiros e canadenses. Tal estudo mostrou possíveis melhorias para a ferramenta e apontou indícios de que este chat contribui na comunicação entre pessoas de diferentes culturas.
|
298 |
Tradução automática estatística baseada em sintaxe e linguagens de árvores
Beck, Daniel Emilio 19 June 2012
Previous issue date: 2012-06-19 / Universidade Federal de Minas Gerais / Machine Translation (MT) is one of the classic Natural Language Processing (NLP) applications. The state of the art in MT is represented by statistical methods that aim to learn all the necessary linguistic knowledge automatically from large collections of texts (corpora). However, although the quality of statistical MT systems has improved considerably, recent advances have not been significant. For this reason, research in the area has sought to incorporate more explicit linguistic knowledge into these systems. One problem that purely statistical MT systems do not handle well is the correct treatment of syntactic phenomena. Thus, one research direction for incorporating linguistic knowledge into those systems is the addition of syntactic rules. To accomplish this, many methods and formalisms have been, and continue to be, studied. This text presents an investigation of methods that aim to advance the state of the art in statistical MT through models that consider syntactic information. The methods and formalisms studied are those used to deal with tree languages, mainly Tree Substitution Grammars (TSGs) and Tree-to-String (TTS) Transducers. This work yielded a greater understanding of the studied formalisms and their behavior when used in NLP applications. / A Tradução Automática (Machine Translation - MT) é uma das aplicações clássicas dentro do Processamento da Língua Natural (Natural Language Processing - NLP). O estado-da-arte em MT é representado por métodos estatísticos, que buscam aprender o conhecimento linguístico necessário de forma automática por meio de grandes coleções de textos (os corpora). Entretanto, ainda que se tenha avançado bastante em relação à qualidade de sistemas estatísticos de MT, hoje em dia esses avanços não estão sendo significativos.
Por conta disso, as pesquisas na área têm buscado formas de envolver mais conhecimento linguístico explícito nesses sistemas. Um dos problemas que não é bem resolvido por sistemas de MT puramente estatísticos é o correto tratamento de fenômenos sintáticos. Assim, uma das direções que as pesquisas tomam na hora de incorporar conhecimento linguístico a esses sistemas é através da adição de regras sintáticas. Para isso, uma série de métodos e formalismos foram e são estudados até hoje. Esse texto apresenta a investigação de métodos que se utilizam de informação sintática na tentativa de avançar no estado-da-arte da MT estatística. Foram utilizados métodos e formalismos que lidam com linguagens de árvores, em especial as Gramáticas de Substituição de Árvores (Tree Substitution Grammars - TSGs) e os Transdutores Árvore-para-String (Tree-to-String - TTS). Desta investigação, obteve-se maior entendimento sobre os formalismos estudados e seu comportamento em aplicações de NLP.
|
299 |
Étude sur l’influence du vocabulaire utilisé pour l’indexation des images en contexte de repérage multilingue
Ménard, Elaine 27 November 2008
No description available.
|
300 |
Learning and time : on using memory and curricula for language understanding
Gulcehre, Caglar 05 1900
No description available.
|