Spelling suggestions: "subject:"statistiese masjiene"" "subject:"statistiese masjienleer""
1 |
Sintaktiese herrangskikking as voorprosessering in die ontwikkeling van Engels na Afrikaanse statistiese masjienvertaalsisteem / Marissa GrieselGriesel, Marissa January 2011 (has links)
Statistic machine translation to any of the resource scarce South African languages generally results in low quality output. Large amounts of training data are required to generate output of such a standard that it can ease the work of human translators when incorporated into a translation environment. Sufficiently large corpora often do not exist and other techniques must be researched to improve the quality of the output. One of the methods in international literature that yielded good improvements in the quality of the output applies syntactic reordering as pre-processing. This pre-processing aims at simplifying the decod-ing process as less changes will need to be made during translation in this stage. Training will also benefit since the automatic word alignments can be drawn more easily because the word orders in both the source and target languages are more similar. The pre-processing is applied to the source language training data as well as to the text that is to be translated. It is in the form of rules that recognise patterns in the tags and adapt the structure accordingly. These tags are assigned to the source language side of the aligned parallel corpus with a syntactic analyser. In this research project, the technique is adapted for translation from English to Afrikaans and deals with the reordering of verbs, modals, the past tense construct, construc-tions with “to” and negation. The goal of these rules is to change the English (source language) structure to better resemble the Afrikaans (target language) structure. A thorough analysis of the output of the base-line system serves as the starting point. The errors that occur in the output are divided into categories and each of the underlying constructs for English and Afrikaans are examined. This analysis of the output and the literature on syntax for the two languages are combined to formulate the linguistically motivated rules. The module that performs the pre-processing is evaluated in terms of the precision and the recall, and these two measures are then combined in the F-score that gives one number by which the module can be assessed. All three of these measures compare well to international standards. Furthermore, a compari-son is made between the system that is enriched by the pre-processing module and a baseline system on which no extra processing is applied. This comparison is done by automatically calculating two metrics (BLEU and NIST scores) and it shows very positive results. When evaluating the entire document, an increase in the BLEU score from 0,4968 to 0,5741 (7,7 %) and in the NIST score from 8,4515 to 9,4905 (10,4 %) is reported. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
|
2 |
Sintaktiese herrangskikking as voorprosessering in die ontwikkeling van Engels na Afrikaanse statistiese masjienvertaalsisteem / Marissa GrieselGriesel, Marissa January 2011 (has links)
Statistic machine translation to any of the resource scarce South African languages generally results in low quality output. Large amounts of training data are required to generate output of such a standard that it can ease the work of human translators when incorporated into a translation environment. Sufficiently large corpora often do not exist and other techniques must be researched to improve the quality of the output. One of the methods in international literature that yielded good improvements in the quality of the output applies syntactic reordering as pre-processing. This pre-processing aims at simplifying the decod-ing process as less changes will need to be made during translation in this stage. Training will also benefit since the automatic word alignments can be drawn more easily because the word orders in both the source and target languages are more similar. The pre-processing is applied to the source language training data as well as to the text that is to be translated. It is in the form of rules that recognise patterns in the tags and adapt the structure accordingly. These tags are assigned to the source language side of the aligned parallel corpus with a syntactic analyser. In this research project, the technique is adapted for translation from English to Afrikaans and deals with the reordering of verbs, modals, the past tense construct, construc-tions with “to” and negation. The goal of these rules is to change the English (source language) structure to better resemble the Afrikaans (target language) structure. A thorough analysis of the output of the base-line system serves as the starting point. The errors that occur in the output are divided into categories and each of the underlying constructs for English and Afrikaans are examined. This analysis of the output and the literature on syntax for the two languages are combined to formulate the linguistically motivated rules. The module that performs the pre-processing is evaluated in terms of the precision and the recall, and these two measures are then combined in the F-score that gives one number by which the module can be assessed. All three of these measures compare well to international standards. Furthermore, a compari-son is made between the system that is enriched by the pre-processing module and a baseline system on which no extra processing is applied. This comparison is done by automatically calculating two metrics (BLEU and NIST scores) and it shows very positive results. When evaluating the entire document, an increase in the BLEU score from 0,4968 to 0,5741 (7,7 %) and in the NIST score from 8,4515 to 9,4905 (10,4 %) is reported. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
|
3 |
Dataselektering en –manipulering vir statistiese Engels–Afrikaanse masjienvertaling / McKellar C.A.McKellar, Cindy. January 2011 (has links)
Die sukses van enige masjienvertaalsisteem hang grootliks van die hoeveelheid en kwaliteit van die beskikbare afrigtingsdata af. n Sisteem wat met foutiewe of lae–kwaliteit data afgerig is, sal uiteraard swakker afvoer lewer as n sisteem wat met korrekte of hoë–kwaliteit data afgerig is. In die geval van hulpbronarm tale waar daar min data beskikbaar is en data dalk noodgedwonge vertaal moet word vir die skep van parallelle korpora wat as afrigtingsdata kan dien, is dit dus baie belangrik dat die data wat vir vertaling gekies word, so gekies word dat dit teksgedeeltes insluit wat die meeste waarde tot die masjienvertaalsisteem sal bydra. Dit is ook in so n geval uiters belangrik om die beskikbare data so goed moontlik aan te wend.
Hierdie studie stel ondersoek in na metodes om afrigtingsdata te selekteer met die doel om n optimale masjienvertaalsisteem met beperkte hulpbronne af te rig. Daar word ook aandag gegee aan die moontlikheid om die gewigte van sekere gedeeltes van die afrigtingsdata te verhoog om sodoende die data wat die meeste waarde tot die masjienvertaalsisteem bydra te beklemtoon. Alhoewel hierdie studie spesifiek gerig is op metodes vir dataselektering en –manipulering vir die taalpaar Engels–Afrikaans, sou die metodes ook vir toepassing op ander taalpare gebruik kon word.
Die evaluasieproses dui aan dat beide die dataselekteringsmetodes, asook die aanpassing van datagewigte, n positiewe impak op die kwaliteit van die resulterende masjienvertaalsisteem het. Die uiteindelike sisteem, afgerig deur n kombinasie van verskillende metodes, toon n 2.0001 styging in die NIST–telling en n 0.2039 styging in die BLEU–telling. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
|
4 |
Dataselektering en –manipulering vir statistiese Engels–Afrikaanse masjienvertaling / McKellar C.A.McKellar, Cindy. January 2011 (has links)
Die sukses van enige masjienvertaalsisteem hang grootliks van die hoeveelheid en kwaliteit van die beskikbare afrigtingsdata af. n Sisteem wat met foutiewe of lae–kwaliteit data afgerig is, sal uiteraard swakker afvoer lewer as n sisteem wat met korrekte of hoë–kwaliteit data afgerig is. In die geval van hulpbronarm tale waar daar min data beskikbaar is en data dalk noodgedwonge vertaal moet word vir die skep van parallelle korpora wat as afrigtingsdata kan dien, is dit dus baie belangrik dat die data wat vir vertaling gekies word, so gekies word dat dit teksgedeeltes insluit wat die meeste waarde tot die masjienvertaalsisteem sal bydra. Dit is ook in so n geval uiters belangrik om die beskikbare data so goed moontlik aan te wend.
Hierdie studie stel ondersoek in na metodes om afrigtingsdata te selekteer met die doel om n optimale masjienvertaalsisteem met beperkte hulpbronne af te rig. Daar word ook aandag gegee aan die moontlikheid om die gewigte van sekere gedeeltes van die afrigtingsdata te verhoog om sodoende die data wat die meeste waarde tot die masjienvertaalsisteem bydra te beklemtoon. Alhoewel hierdie studie spesifiek gerig is op metodes vir dataselektering en –manipulering vir die taalpaar Engels–Afrikaans, sou die metodes ook vir toepassing op ander taalpare gebruik kon word.
Die evaluasieproses dui aan dat beide die dataselekteringsmetodes, asook die aanpassing van datagewigte, n positiewe impak op die kwaliteit van die resulterende masjienvertaalsisteem het. Die uiteindelike sisteem, afgerig deur n kombinasie van verskillende metodes, toon n 2.0001 styging in die NIST–telling en n 0.2039 styging in die BLEU–telling. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2011.
|
Page generated in 0.1248 seconds