Global ETD Search

11	Outomatiese Setswana lemma-identifisering / Jeanetta Hendrina Brits Brits, Jeanetta Hendrina January 2006 Within the context of natural language processing, a lemmatiser is one of the most important core technology modules that has to be developed for a particular language. A lemmatiser reduces words in a corpus to the corresponding lemmas of the words in the lexicon. A lemma is defined as the meaningful base form from which other more complex forms (i.e. variants) are derived. Before a lemmatiser can be developed for a specific language, the concept "lemma" as it applies to that specific language should first be defined clearly. This study concludes that, in Setswana, only stems (and not roots) can act independently as words; therefore, only stems should be accepted as lemmas in the context of automatic lemmatisation for Setswana. Five of the seven parts of speech in Setswana could be viewed as closed classes, which means that these classes are not extended by means of regular morphological processes. The two other parts of speech (nouns and verbs) require the implementation of alternation rules to determine the lemma. Such alternation rules were formalised in this study for the purpose of developing a Setswana lemmatiser. The existing Setswana grammars were used as the basis for these rules, which made it possible to determine how precisely a formalisation of these grammars lemmatises Setswana words. The software developed by Van Noord (2002), FSA 6, is one of the best-known applications available for the development of finite state automata and transducers. Regular expressions based on the formalised morphological rules were used in FSA 6 to create finite state transducers. The code subsequently generated by FSA 6 was implemented in the lemmatiser. The metric that applies to the evaluation of the lemmatiser is precision. On a test corpus of 1 000 words, the lemmatiser obtained a precision of 70,92%. In another evaluation on 500 complex nouns and 500 complex verbs separately, the lemmatiser obtained 70,96% and 70,52% respectively. Expressed in numbers, the precision on 500 complex and simplex nouns was 78,45% and on complex and simplex verbs 79,59%. The quantitative achievement only gives an indication of the relative precision of the grammars. Nevertheless, it did offer analysed data with which the grammars were evaluated qualitatively. The study concludes with an overview of how these results might be improved in the future. / Thesis (M.A. (African Languages))--North-West University, Potchefstroom Campus, 2006. Computational linguistics; Setswana grammar; Setswana morphology; Lemmatisation; Stemming; Lemma; Natural language processing; Regular expression; Finite state automata; Finite state transducer; FSA 6
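As a rough illustration of the approach summarised in entry 11 (closed classes resolved by lookup, open classes by ordered alternation rules, evaluation by precision), the following is a minimal sketch in Python. It does not use FSA 6 and does not reproduce the thesis's Setswana rules: the lexicon, the regular-expression rules, and the function names `lemmatise` and `precision` are hypothetical, and the toy rules are English-like placeholders chosen only to keep the example readable.

```python
import re

# Hypothetical closed-class lexicon: these words are looked up, never rewritten.
CLOSED_CLASS = {"the": "the", "and": "and"}

# Hypothetical alternation rules (toy, English-like; NOT Setswana rules).
# Each rule is (pattern, replacement); the first matching rule wins.
RULES = [
    (re.compile(r"^(.+)ies$"), r"\1y"),          # stories -> story
    (re.compile(r"^(.+[^aeiou])ed$"), r"\1"),    # walked  -> walk
    (re.compile(r"^(.+)s$"), r"\1"),             # cats    -> cat
]


def lemmatise(word):
    """Return a lemma via lexicon lookup, then the first matching rule."""
    word = word.lower()
    if word in CLOSED_CLASS:
        return CLOSED_CLASS[word]
    for pattern, replacement in RULES:
        if pattern.match(word):
            return pattern.sub(replacement, word)
    return word  # no rule applies: treat the word as its own lemma


def precision(gold_pairs):
    """Fraction of (word, gold_lemma) pairs for which the lemma is correct."""
    if not gold_pairs:
        return 0.0
    correct = sum(1 for word, gold in gold_pairs if lemmatise(word) == gold)
    return correct / len(gold_pairs)


# Tiny hypothetical test set; "used" is deliberately mishandled by the rules.
GOLD = [("stories", "story"), ("walked", "walk"), ("cats", "cat"), ("used", "use")]
print(precision(GOLD))
```

Rule order matters in a sketch like this: a more specific pattern has to precede a more general one, and every word the rules mishandle lowers the precision figure, which is why a qualitative look at the errors is useful alongside the number.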
12	The corpus of Greek medical papyri and digital papyrology Reggiani, Nicola 20 April 2016 The ongoing project of digitising a corpus of ancient Greek texts on papyrus dealing with medical topics raises some problematic questions involving general issues of digital papyrology. The main electronic resource of papyrological texts, the Papyrological Navigator (papyri.info), has indeed been designed to host documentary items, while the special technical, even literary nature of medical papyri (which include, besides documents related to medicine, also handbooks, school books, and treatises by both known and unknown authors) requires new ways to treat the relevant data (paratextual devices such as diacritics, punctuation, abbreviations, layout features). Such issues are currently under discussion by the team in charge of the forthcoming Digital Corpus of Literary Papyri (DCLP), but further options need to be taken into consideration in order to develop a fully functional, interactive, dynamic database of ancient technical texts: in particular, this paper will present and discuss the potential of a multi-layer linguistic annotation (useful to fulfil the needs of a multifaceted technical language) and of a multitextual digital edition (helpful in consideration of the fragmentary condition of the texts and of their often problematic relationship with the known manuscript tradition). Greek medical papyri; digital encoding with TEI/EpiDoc; multi-layer linguistic annotation; treebanking; lemmatisation; ddc:930
13	The corpus of Greek medical papyri and digital papyrology: new perspectives from an ongoing project Reggiani, Nicola January 2016 The ongoing project of digitising a corpus of ancient Greek texts on papyrus dealing with medical topics raises some problematic questions involving general issues of digital papyrology. The main electronic resource of papyrological texts, the Papyrological Navigator (papyri.info), has indeed been designed to host documentary items, while the special technical, even literary nature of medical papyri (which include, besides documents related to medicine, also handbooks, school books, and treatises by both known and unknown authors) requires new ways to treat the relevant data (paratextual devices such as diacritics, punctuation, abbreviations, layout features). Such issues are currently under discussion by the team in charge of the forthcoming Digital Corpus of Literary Papyri (DCLP), but further options need to be taken into consideration in order to develop a fully functional, interactive, dynamic database of ancient technical texts: in particular, this paper will present and discuss the potential of a multi-layer linguistic annotation (useful to fulfil the needs of a multifaceted technical language) and of a multitextual digital edition (helpful in consideration of the fragmentary condition of the texts and of their often problematic relationship with the known manuscript tradition). info:eu-repo/classification/ddc/930 ddc:930
14	Vers une plateforme informatique pour l'expérimentation d'outils de classification Bokhabrine, Ayoub January 2019 No description available. Association; Class; Classification; Descriptor; Experimentation; Extraction; Hierarchical; Computer science; Itemsets; K-means; K-medoids; Lemmatisation; M-confidence; M-support; Word; Cleaning; Number; Tool; Partitioning; Platform; Rule; Segmentation; Text; Vocabulary
15	Die deelwoord in Afrikaans : perspektiewe vanuit ŉ kognitiewe gebruiksgebaseerde beskrywingsraamwerk / Anna Petronella Butler Butler, Anna Petronella January 2014 During an annotation project of 60 000 Afrikaans tokens by CTexT (North-West University), the developers had to answer difficult questions with regard to the annotation of the participle specifically. One of the main reasons for this difficulty is that the different sources that describe the participle in Afrikaans conflict with one another, so that the annotation would differ depending on which source is consulted. In order to clarify how the participle in Afrikaans should be annotated, the available literature was surveyed to determine the exact nature of the participle in Afrikaans. The descriptions of the participle in Afrikaans were further situated in the context of how participles are described in English and Dutch. The conclusion that was reached is that the participle form of the verb in Afrikaans should be distinguished from the periphrastic construction form of the verb that appears in the past and the passive constructions. Furthermore, this study determined to what extent a cognitive usage-based descriptive framework could contribute towards a better understanding of the participle in Afrikaans. The first conclusion that was reached is that a characterisation of the participle within this framework enables one to make conceptual sense of the morphological structure of the participle. The study shows how the morphological structure of the participle is responsible for the fact that the verbal character of the participle stays intact while the participle functions as another word class. Another conclusion that was reached regarding the characterisation of the past and passive constructions from a cognitive usage-based descriptive framework is that the framework makes it possible to distinguish conceptually between the periphrastic form of the verb and the participle form of the verb. Lastly, the study determined to what extent new insights into the participle in Afrikaans could lead to alternative lemmatisation and part-of-speech tagging of participles in the NCHLT corpus. The conclusion that was reached is that participles are primarily lemmatised satisfactorily. Proposals made to improve the lemmatisation protocol include: (i) distinguishing in the protocol between periphrastic forms of the verb and the participle form of the verb; (ii) repeating the guideline for the lemmatisation of compound verbs that was provided for verb lemmatisation under the lemmatisation guidelines for participles; (iii) adding more lexicalised adjectives to the existing list in the protocol; and (iv) suggesting a guideline that would allow one to consistently distinguish between participles that could function as adverbs as well as participles that could function as prepositions. The conclusion that was reached after the analysis of the part-of-speech protocol is that the part-of-speech tag set in Afrikaans does not allow for the specific attributes and values of participles to be taken into account. Participles in the Afrikaans tag set are tagged strictly according to the function of the word. Although such an approach is very practical, it results in a linguistically poorer part-of-speech tag that ignores the verbal character of the participle. An alternative strategy is therefore suggested for the part-of-speech tagging of participles in which the attributes and values of the verb tag are adapted. / MA (Linguistics and Literary Theory), North-West University, Potchefstroom Campus, 2014. Afrikaans; Participle; Present participle; Past participle; Cognitive grammar; Lemmatisation; Part-of-speech tagging; Participle form of the verb
