Spelling suggestions: "subject:"masjienvertaling"" "subject:"masjienverstellings""
1 |
Dataselektering en –manipulering vir statistiese Engels–Afrikaanse masjienvertaling / McKellar C.A.McKellar, Cindy. January 2011 (has links)
Die sukses van enige masjienvertaalsisteem hang grootliks van die hoeveelheid en kwaliteit van die beskikbare afrigtingsdata af. n Sisteem wat met foutiewe of lae–kwaliteit data afgerig is, sal uiteraard swakker afvoer lewer as n sisteem wat met korrekte of hoë–kwaliteit data afgerig is. In die geval van hulpbronarm tale waar daar min data beskikbaar is en data dalk noodgedwonge vertaal moet word vir die skep van parallelle korpora wat as afrigtingsdata kan dien, is dit dus baie belangrik dat die data wat vir vertaling gekies word, so gekies word dat dit teksgedeeltes insluit wat die meeste waarde tot die masjienvertaalsisteem sal bydra. Dit is ook in so n geval uiters belangrik om die beskikbare data so goed moontlik aan te wend.
Hierdie studie stel ondersoek in na metodes om afrigtingsdata te selekteer met die doel om n optimale masjienvertaalsisteem met beperkte hulpbronne af te rig. Daar word ook aandag gegee aan die moontlikheid om die gewigte van sekere gedeeltes van die afrigtingsdata te verhoog om sodoende die data wat die meeste waarde tot die masjienvertaalsisteem bydra te beklemtoon. Alhoewel hierdie studie spesifiek gerig is op metodes vir dataselektering en –manipulering vir die taalpaar Engels–Afrikaans, sou die metodes ook vir toepassing op ander taalpare gebruik kon word.
Die evaluasieproses dui aan dat beide die dataselekteringsmetodes, asook die aanpassing van datagewigte, n positiewe impak op die kwaliteit van die resulterende masjienvertaalsisteem het. Die uiteindelike sisteem, afgerig deur n kombinasie van verskillende metodes, toon n 2.0001 styging in die NIST–telling en n 0.2039 styging in die BLEU–telling. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
|
2 |
Dataselektering en –manipulering vir statistiese Engels–Afrikaanse masjienvertaling / McKellar C.A.McKellar, Cindy. January 2011 (has links)
Die sukses van enige masjienvertaalsisteem hang grootliks van die hoeveelheid en kwaliteit van die beskikbare afrigtingsdata af. n Sisteem wat met foutiewe of lae–kwaliteit data afgerig is, sal uiteraard swakker afvoer lewer as n sisteem wat met korrekte of hoë–kwaliteit data afgerig is. In die geval van hulpbronarm tale waar daar min data beskikbaar is en data dalk noodgedwonge vertaal moet word vir die skep van parallelle korpora wat as afrigtingsdata kan dien, is dit dus baie belangrik dat die data wat vir vertaling gekies word, so gekies word dat dit teksgedeeltes insluit wat die meeste waarde tot die masjienvertaalsisteem sal bydra. Dit is ook in so n geval uiters belangrik om die beskikbare data so goed moontlik aan te wend.
Hierdie studie stel ondersoek in na metodes om afrigtingsdata te selekteer met die doel om n optimale masjienvertaalsisteem met beperkte hulpbronne af te rig. Daar word ook aandag gegee aan die moontlikheid om die gewigte van sekere gedeeltes van die afrigtingsdata te verhoog om sodoende die data wat die meeste waarde tot die masjienvertaalsisteem bydra te beklemtoon. Alhoewel hierdie studie spesifiek gerig is op metodes vir dataselektering en –manipulering vir die taalpaar Engels–Afrikaans, sou die metodes ook vir toepassing op ander taalpare gebruik kon word.
Die evaluasieproses dui aan dat beide die dataselekteringsmetodes, asook die aanpassing van datagewigte, n positiewe impak op die kwaliteit van die resulterende masjienvertaalsisteem het. Die uiteindelike sisteem, afgerig deur n kombinasie van verskillende metodes, toon n 2.0001 styging in die NIST–telling en n 0.2039 styging in die BLEU–telling. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
|
3 |
Sintaktiese herrangskikking as voorprosessering in die ontwikkeling van Engels na Afrikaanse statistiese masjienvertaalsisteem / Marissa GrieselGriesel, Marissa January 2011 (has links)
Statistic machine translation to any of the resource scarce South African languages generally results in low quality output. Large amounts of training data are required to generate output of such a standard that it can ease the work of human translators when incorporated into a translation environment. Sufficiently large corpora often do not exist and other techniques must be researched to improve the quality of the output. One of the methods in international literature that yielded good improvements in the quality of the output applies syntactic reordering as pre-processing. This pre-processing aims at simplifying the decod-ing process as less changes will need to be made during translation in this stage. Training will also benefit since the automatic word alignments can be drawn more easily because the word orders in both the source and target languages are more similar. The pre-processing is applied to the source language training data as well as to the text that is to be translated. It is in the form of rules that recognise patterns in the tags and adapt the structure accordingly. These tags are assigned to the source language side of the aligned parallel corpus with a syntactic analyser. In this research project, the technique is adapted for translation from English to Afrikaans and deals with the reordering of verbs, modals, the past tense construct, construc-tions with “to” and negation. The goal of these rules is to change the English (source language) structure to better resemble the Afrikaans (target language) structure. A thorough analysis of the output of the base-line system serves as the starting point. The errors that occur in the output are divided into categories and each of the underlying constructs for English and Afrikaans are examined. This analysis of the output and the literature on syntax for the two languages are combined to formulate the linguistically motivated rules. The module that performs the pre-processing is evaluated in terms of the precision and the recall, and these two measures are then combined in the F-score that gives one number by which the module can be assessed. All three of these measures compare well to international standards. Furthermore, a compari-son is made between the system that is enriched by the pre-processing module and a baseline system on which no extra processing is applied. This comparison is done by automatically calculating two metrics (BLEU and NIST scores) and it shows very positive results. When evaluating the entire document, an increase in the BLEU score from 0,4968 to 0,5741 (7,7 %) and in the NIST score from 8,4515 to 9,4905 (10,4 %) is reported. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
|
4 |
Sintaktiese herrangskikking as voorprosessering in die ontwikkeling van Engels na Afrikaanse statistiese masjienvertaalsisteem / Marissa GrieselGriesel, Marissa January 2011 (has links)
Statistic machine translation to any of the resource scarce South African languages generally results in low quality output. Large amounts of training data are required to generate output of such a standard that it can ease the work of human translators when incorporated into a translation environment. Sufficiently large corpora often do not exist and other techniques must be researched to improve the quality of the output. One of the methods in international literature that yielded good improvements in the quality of the output applies syntactic reordering as pre-processing. This pre-processing aims at simplifying the decod-ing process as less changes will need to be made during translation in this stage. Training will also benefit since the automatic word alignments can be drawn more easily because the word orders in both the source and target languages are more similar. The pre-processing is applied to the source language training data as well as to the text that is to be translated. It is in the form of rules that recognise patterns in the tags and adapt the structure accordingly. These tags are assigned to the source language side of the aligned parallel corpus with a syntactic analyser. In this research project, the technique is adapted for translation from English to Afrikaans and deals with the reordering of verbs, modals, the past tense construct, construc-tions with “to” and negation. The goal of these rules is to change the English (source language) structure to better resemble the Afrikaans (target language) structure. A thorough analysis of the output of the base-line system serves as the starting point. The errors that occur in the output are divided into categories and each of the underlying constructs for English and Afrikaans are examined. This analysis of the output and the literature on syntax for the two languages are combined to formulate the linguistically motivated rules. The module that performs the pre-processing is evaluated in terms of the precision and the recall, and these two measures are then combined in the F-score that gives one number by which the module can be assessed. All three of these measures compare well to international standards. Furthermore, a compari-son is made between the system that is enriched by the pre-processing module and a baseline system on which no extra processing is applied. This comparison is done by automatically calculating two metrics (BLEU and NIST scores) and it shows very positive results. When evaluating the entire document, an increase in the BLEU score from 0,4968 to 0,5741 (7,7 %) and in the NIST score from 8,4515 to 9,4905 (10,4 %) is reported. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
|
5 |
Skoner en kleiner vertaalgeheuesWolff, Friedel 10 1900 (has links)
Rekenaars kan ’n nuttige rol speel in vertaling. Twee benaderings
is vertaalgeheuestelsels en masjienvertaalstelsels. By
hierdie twee tegnologieë word ’n vertaalgeheue gebruik—’n
tweetalige versameling vorige vertalings. Hierdie proefskrif
bied metodes aan om die kwaliteit van ’n vertaalgeheue te verbeter.
’n Masjienleerbenadering word gevolg om foutiewe inskrywings
in ’n vertaalgeheue te identifiseer. ’n Verskeidenheid leerkenmerke
in drie kategorieë word aangebied: kenmerke wat
verband hou met tekslengte, kenmerke wat deur kwaliteittoetsers
soos vertaaltoetsers, ’n speltoetser en ’n grammatikatoetser
bereken word, asook statistiese kenmerke wat met behulp van
eksterne data bereken word.
Die evaluasie van vertaalgeheuestelsels is nog nie gestandaardiseer
nie. In hierdie proefskrif word ’n verskeidenheid
probleme met bestaande evaluasiemetodes uitgewys, en ’n verbeterde
evaluasiemetode word ontwikkel.
Deur die foutiewe inskrywings uit ’n vertaalgeheue te verwyder,
is ’n kleiner, skoner vertaalgeheue beskikbaar vir toepassings.
Eksperimente dui aan dat so ’n vertaalgeheue beter
prestasie behaal in ’n vertaalgeheuestelsel. As ondersteunende
bewys vir die waarde van ’n skoner vertaalgeheue word ’n
verbetering ook aangedui by die opleiding van ’n masjienvertaalstelsel. / Computers can play a useful role in translation. Two approaches
are translation memory systems and machine translation
systems. With these two technologies a translation memory
is used— a bilingual collection of previous translations.
This thesis presents methods to improve the quality of a translation
memory.
A machine learning approach is followed to identify incorrect
entries in a translation memory. A variety of learning features
in three categories are presented: features associated with text
length, features calculated by quality checkers such as translation
checkers, a spell checker and a grammar checker, as well
as statistical features computed with the help of external data.
The evaluation of translation memory systems is not yet standardised.
This thesis points out a number of problems with existing
evaluation methods, and an improved evaluation method
is developed.
By removing the incorrect entries in a translation memory, a
smaller, cleaner translation memory is available to applications.
Experiments demonstrate that such a translation memory results
in better performance in a translation memory system.
As supporting evidence for the value of a cleaner translation
memory, an improvement is also achieved in training a machine
translation system. / School of Computing / Ph. D. (Rekenaarwetenskap)
|
Page generated in 0.083 seconds