  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Outomatiese genreklassifikasie vir hulpbronskaars tale [Automatic genre classification for resource-scarce languages] / Dirk Snyman

Snyman, Dirk Petrus January 2012
In text processing, metadata about a particular text plays an important role. Metadata is often generated by automatic text classification systems, which classify a text into one or more predefined classes or categories based on its contents. One of the dimensions by which a text can be classified is its genre. This study proposes the development of an automatic genre classification system in a resource-scarce environment. The study aims to: i) investigate the techniques and approaches that are generally used for automatic genre classification systems, and identify the best approach for Afrikaans (a resource-scarce language), ii) transfer this approach to other indigenous South African resource-scarce languages, and iii) investigate the effectiveness of technology recycling for closely related languages in a resource-scarce environment. To achieve the first goal, five machine learning approaches that are generally used for text classification were identified from the literature, together with five common approaches to feature extraction. Two different approaches to the identification of genre classes are presented. The machine learning, feature extraction and genre class identification approaches were used in a series of experiments to identify the best approach to genre classification for a resource-scarce language. The best combination is identified as the multinomial naïve Bayes algorithm, using a bag-of-words approach as features to classify texts into three abstract classes. This results in an f-score (performance measure) of 0.929, and it was subsequently shown that this approach can be successfully applied to other indigenous South African languages. To investigate the viability of technology recycling for genre classification systems for closely related languages, Dutch test data was classified using an Afrikaans genre classification system, and it is shown that this approach works well. As a pre-processing step, a machine translation system was used to translate the Dutch texts before classification, increasing the compatibility between Afrikaans and Dutch. This results in an f-score of 0.577, indicating that technology recycling between closely related languages has merit. This approach can be used to promote and fast-track the development of genre classification systems in a resource-scarce environment. / MA (Linguistics and Literary Theory), North-West University, Potchefstroom Campus, 2013
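The combination named in this abstract (multinomial naïve Bayes over bag-of-words features) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the thesis's implementation: the three class names, the toy English training texts, the whitespace tokeniser and the add-one smoothing value `alpha` are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on bag-of-words counts."""
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)  # class label -> word frequency table
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # naive whitespace tokeniser (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"vocab": vocab, "alpha": alpha, "classes": {}}
    for label, n_docs in docs_per_class.items():
        total_words = sum(word_counts[label].values())
        model["classes"][label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "counts": word_counts[label],
            # Denominator of the smoothed word likelihood for this class.
            "denom": total_words + alpha * len(vocab),
        }
    return model


def predict(model, text):
    """Return the class with the highest log posterior for the text."""
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, params in model["classes"].items():
        score = params["log_prior"]
        for tok in tokens:
            smoothed = params["counts"][tok] + model["alpha"]
            score += math.log(smoothed / params["denom"])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data with three hypothetical classes; not taken from the thesis.
train_docs = [
    "the court ruled that the act applies",
    "the minister said the budget will rise",
    "she walked slowly into the dark forest",
]
train_labels = ["legal", "news", "fiction"]
model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, "the court applies the act"))
```

Words that never occurred in training are ignored at prediction time; that is one common choice, and a real system would need a considered policy for unseen vocabulary.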
3

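Enkele tegnieke vir die ontwikkeling en benutting van etiketteringhulpbronne vir hulpbronskaars tale [Some techniques for the development and utilisation of tagging resources for resource-scarce languages] / A.C. Griebenow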

Griebenow, Annick January 2015
Because the development of resources in any language is an expensive process, many languages, including the indigenous languages of South Africa, can be classified as resource-scarce, or lacking in tagging resources. This study investigates and applies techniques and methodologies for optimising the use of available resources and improving the accuracy of a tagger, using Afrikaans as a resource-scarce language. It aims to i) determine whether combination techniques can be effectively applied to improve the accuracy of a tagger for Afrikaans, and ii) determine whether structural semi-supervised learning can be effectively applied to improve the accuracy of a supervised learning tagger for Afrikaans. To realise the first aim, existing methodologies for combining classification algorithms are investigated. Four taggers, trained using MBT, SVMlight, MXPOST and TnT respectively, are then combined into a combination tagger using weighted voting. Weights are calculated by means of total precision, tag precision and a combination of precision and recall. Although the combination of taggers does not consistently reduce the error rate relative to the baseline, it achieves an error rate reduction of up to 18.48% in some cases. To realise the second aim, existing semi-supervised learning algorithms, with specific focus on structural semi-supervised learning, are investigated. Structural semi-supervised learning is implemented by means of the SVD-ASO algorithm, which attempts to extract the shared structure of untagged data using auxiliary problems before training a tagger. The use of untagged data during the training of a tagger leads to an error rate reduction of 1.67% relative to the baseline. Even though the error rate reduction does not prove to be statistically significant in all cases, the results show that it is possible to improve the accuracy in some cases.
/ MSc (Computer Science), North-West University, Potchefstroom Campus, 2015
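The weighted-voting step described in this abstract can be illustrated with a small sketch. It assumes one global weight per tagger (the "total precision" variant); the weight values, tag names and tagger outputs below are made up for the example and are not results from the thesis. The `error_rate_reduction` helper uses the usual relative definition, which is assumed here rather than quoted from the source.

```python
from collections import defaultdict

# Made-up per-tagger weights for illustration only; not results from the thesis.
WEIGHTS = {"MBT": 0.9, "SVMlight": 0.8, "MXPOST": 0.7, "TnT": 0.6}


def weighted_vote(proposals, weights):
    """Pick the tag whose supporting taggers have the largest summed weight."""
    scores = defaultdict(float)
    for tagger, tag in proposals.items():
        scores[tag] += weights[tagger]
    return max(scores, key=scores.get)


def combine(outputs, weights):
    """Combine per-token tag sequences from several taggers into one sequence."""
    length = len(next(iter(outputs.values())))
    return [
        weighted_vote({name: tags[i] for name, tags in outputs.items()}, weights)
        for i in range(length)
    ]


def error_rate_reduction(baseline_accuracy, new_accuracy):
    """Relative reduction in error rate with respect to the baseline."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


# Hypothetical outputs of the four taggers for a three-token sentence.
outputs = {
    "MBT":      ["N", "V", "ADJ"],
    "SVMlight": ["N", "N", "N"],
    "MXPOST":   ["N", "V", "N"],
    "TnT":      ["N", "N", "ADV"],
}
combined = combine(outputs, WEIGHTS)
print(combined)
```

A per-tag weighting (the "tag precision" variant) would replace `weights[tagger]` with a lookup keyed on both the tagger and the proposed tag.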
5

Skoner en kleiner vertaalgeheues [Cleaner and smaller translation memories]

Wolff, Friedel 10 1900
Computers can play a useful role in translation. Two approaches are translation memory systems and machine translation systems. Both technologies use a translation memory: a bilingual collection of previous translations. This thesis presents methods to improve the quality of a translation memory. A machine learning approach is followed to identify incorrect entries in a translation memory. A variety of learning features in three categories are presented: features associated with text length; features calculated by quality checkers such as translation checkers, a spell checker and a grammar checker; and statistical features computed with the help of external data. The evaluation of translation memory systems is not yet standardised. This thesis points out a number of problems with existing evaluation methods and develops an improved evaluation method. Removing the incorrect entries from a translation memory makes a smaller, cleaner translation memory available to applications. Experiments demonstrate that such a translation memory results in better performance in a translation memory system. As supporting evidence for the value of a cleaner translation memory, an improvement is also achieved in training a machine translation system. / School of Computing / Ph. D. (Computer Science)
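As a rough illustration of the first feature category (features associated with text length), the sketch below computes a few length features for a translation memory entry and filters out entries whose character-length ratio falls outside a fixed band. The thesis trains a machine learning classifier over many features; the fixed thresholds `low` and `high`, the feature names and the sample entries here are assumptions standing in for that classifier.

```python
def length_features(source, target):
    """Length-based features for one (source, target) translation memory entry."""
    src_chars, tgt_chars = len(source), len(target)
    src_words, tgt_words = len(source.split()), len(target.split())
    return {
        "char_ratio": src_chars / max(tgt_chars, 1),
        "word_ratio": src_words / max(tgt_words, 1),
        "char_diff": abs(src_chars - tgt_chars),
    }


def looks_suspicious(source, target, low=0.5, high=2.0):
    """Flag an entry whose character-length ratio is outside [low, high].

    The thresholds are hypothetical; a trained classifier would replace this rule.
    """
    ratio = length_features(source, target)["char_ratio"]
    return not (low <= ratio <= high)


# Hypothetical English-Afrikaans entries; the second is deliberately misaligned.
entries = [
    ("Save the file", "Stoor die dokument"),
    ("Cancel", "Kanselleer die huidige aflaai en maak die venster toe"),
]
cleaned = [pair for pair in entries if not looks_suspicious(*pair)]
print(cleaned)
```

A length ratio alone cannot catch entries that are the right length but wrong in content, which is presumably why the abstract lists quality-checker and statistical features alongside it.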
