Global ETD Search

11	Neurale netwerke as moontlike woordafkappingstegniek vir Afrikaans [Neural networks as a possible hyphenation technique for Afrikaans] Fick, Machteld 09 1900
Text in Afrikaans / Summaries in Afrikaans and English / In Afrikaans, as in Dutch and German, compound words are written as one word. New words are therefore continually created by simply joining existing words. This makes word hyphenation during computer typesetting difficult, because the source of reference changes all the time. Several algorithms and techniques for hyphenation exist, but their results are not satisfactory. Afrikaans words with correct syllabification were extracted from the electronic version of the Handwoordeboek van die Afrikaanse Taal (HAT). A neural network (feedforward backpropagation) was trained with about 5 000 of these words. The neural network was refined by heuristically finding a suitable training algorithm and transfer function for the problem, as well as determining the optimal number of hidden layers and the number of neurons in each layer. The neural network was tested with 5 000 words not in the training data. It classified 97,56% of possible positions in these words correctly as either valid or invalid hyphenation points. Furthermore, 510 words from magazine articles were tested with the neural network, and 98,75% of possible positions were classified correctly.
/ Computing / M.Sc. (Operasionele Navorsing) / Neural networks; Backpropagation; Feed-forward; Training algorithm; Transfer function; Resilient backpropagation; Early termination; Encoding; Hyphenation; Syllabification; 410.285; Afrikaans language -- Syllabication; Afrikaans language -- Data processing; Syllabication -- Data processing; Neural networks (Computer science)
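The abstract above reports accuracy per possible position, that is, per gap between two adjacent letters, each classified as a valid or invalid hyphenation point. The following Python sketch shows one way such positions, their labels, and the position-level accuracy could be set up. The letter-window encoding, the padding character, the function names, and the example hyphenation "af-kap-ping" are illustrative assumptions, not the encoding used in the thesis.

```python
def split_labels(hyphenated: str) -> tuple[str, list[int]]:
    """Return the plain word and one 0/1 label per gap between letters.

    Label 1 means a hyphen is allowed in that gap. The input format
    ("af-kap-ping") is an assumption made for this sketch.
    """
    word = ""
    breaks = set()
    for ch in hyphenated:
        if ch == "-":
            breaks.add(len(word))  # a hyphen sits before the next letter
        else:
            word += ch
    labels = [1 if i in breaks else 0 for i in range(1, len(word))]
    return word, labels


def windows(word: str, k: int = 3) -> list[str]:
    """Return, for each gap, the k letters on either side (padded with '_').

    Such a window is one hypothetical input for a classifier; the thesis
    does not state its encoding in the abstract.
    """
    padded = "_" * k + word + "_" * k
    return [padded[i:i + 2 * k] for i in range(1, len(word))]


def position_accuracy(predicted: list[int], gold: list[int]) -> float:
    """Fraction of gaps whose predicted label matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


word, labels = split_labels("af-kap-ping")
contexts = windows(word, k=2)
```

Counting accuracy per gap rather than per word explains why the reported percentages are high: most gaps in a word are invalid hyphenation points, so a single misplaced hyphen changes only one of many labels.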
12	Phonological and morphological nativisation of English loans in Tonga Zivenge, William 01 1900
This thesis analyzes the phonological and morphological nativisation of English loans in the Tonga language. The contact situation between English and Tonga in Zimbabwe facilitates the transfer of lexical items between the two languages. Having long been one of the most widely used languages of the world, English has developed into the most influential donor of words to other languages such as Tonga. The infiltration of English words into the Tonga lexical inventory led to the adoption and subsequent nativisation of English words by native Tonga speakers. The main route by which English words enter Tonga is direct interaction between English and Tonga speakers, although words sometimes arrive via other languages such as Shona, Ndebele, Venda and Shangani. In the 21st century, English's contribution to the vocabulary of Tonga has become more widespread and now covers a large proportion of the language's lexical inventory. Because English is the medium of instruction in Zimbabwe and the language of technology, education, media, new administration, health, music, new religion and economic transactions, it is regarded as the high-variety language with coercive loaning powers. Words from English are then adopted and nativised in Tonga, since Tonga asserts itself as an independent language that can handle loans on its own. The main focus of this study is therefore to account for the phonological and morphological behaviour of, and changes in, English words that enter Tonga. Analyzing the phonological processes employed during the nativisation of loan words entails analyzing how Tonga speakers handle aspects of English such as diphthongs, triphthongs, consonant clusters, CVC syllable structure and individual sounds when repairing sequences that are unacceptable in Tonga. The research also accounts for the handling of morphological differences between the two languages. This entails looking at how competence and an ordered-rule framework are harmonized by Tonga speakers in repairing conflicting features at the morphological level. Since the two languages have different morphological patterns, the research analyzes the repair strategies used to handle singular and plural noun prefixes, tenses and particles, which are morphological components of words. The researcher appreciates that native Tonga speakers have robust intuitions about the proper way to nativise words.
/ African Languages / D.Litt. et Phil. (African Languages) / Phonological nativisation; Phonotactic constraints; Epenthesis; Vowel assimilation; Syllabification; Distinctive feature matrix; Diglossia; Bilingualism; Monolingualism; Vowel harmony; 496.39115; Tsonga language (Nyasa) -- Phonetics; English language -- Phonetics
13	'n Masjienleerbenadering tot woordafbreking in Afrikaans [A machine learning approach to hyphenation in Afrikaans] Fick, Machteld 06 1900
Text in Afrikaans / The aim of this study was to determine the level of success achievable with a purely pattern-based approach to hyphenation in Afrikaans. The machine learning techniques artificial neural networks, decision trees and the TeX algorithm were investigated, since they can be trained with patterns of letters from word lists for syllabification and decompounding. A lexicon of Afrikaans words was extracted from a corpus of electronic text. To obtain lists for syllabification and decompounding, the words in the lexicon were syllabified and the compound words were decomposed into their constituent parts. From each list of ±183 000 words, ±10 000 words were reserved as testing data and the rest was used as training data. A recursive algorithm for decompounding was developed. In this algorithm all words that match entries in a reference list (the lexicon) are extracted by string fitting from the beginning and the end of a word. Splitting points are then determined, based on word length, from the combinations of beginning and end words. The algorithm was expanded by addressing the shortcomings of this basic procedure. Artificial neural networks and decision trees were trained, and variations of both were examined to find optimal syllabification and decompounding models. Patterns for the TeX algorithm were generated with the program OPatGen. Testing showed that the TeX algorithm performed best on both the syllabification and the decompounding task, with 99,56% and 99,12% accuracy, respectively. It can therefore be used for hyphenation in Afrikaans with little risk of hyphenation errors in printed text. The performance of the artificial neural network was lower, but still acceptable, with 98,82% and 98,42% accuracy for syllabification and decompounding, respectively. The decision tree, with an accuracy of 97,91% on syllabification and 90,71% on decompounding, was found to be too risky to use for either task. A combined algorithm was developed in which words are first decompounded with the TeX algorithm and then syllabified with both the TeX algorithm and the neural network, after which the results are combined. This algorithm reduced the number of errors made by the TeX algorithm by 1,3% but missed more hyphens. Testing the algorithm on Afrikaans publications showed the risk of hyphenation errors to be ±0,02% for text assumed to have an average of ten words per line.
/ Decision Sciences / D. Phil. (Operational Research) / Woordafbreking; Lettergreepverdeling; Saamgesteldewoordverdeling; Stringpassing; Woordvlakakkuraatheid; Verdelingsgeleentheidsvlakakkuraatheid; Masjienleertegnieke; Neurale netwerke; Beslissingsbome; Algoritme; Hyphenation; Syllabification; Decompounding; String fitting; Word level accuracy; Splitting opportunity level accuracy; Machine learning; Neural networks; Decision trees; Algorithm; 410.285; Hyphen; Afrikaans language -- Syllabication; Afrikaans language -- Data processing; Syllabication -- Data processing; Neural networks (Computer science); Data compression (Computer science); Decision trees; Algorithms
