11 |
Outomatiese Setswana lemma-identifisering [Automatic Setswana lemma identification] / Jeanetta Hendrina Brits
Brits, Jeanetta Hendrina. January 2006
Within the context of natural language processing, a lemmatiser is one of the
most important core technology modules that have to be developed for a particular
language. A lemmatiser reduces words in a corpus to the corresponding lemmas
of the words in the lexicon.
A lemma is defined as the meaningful base form from which other more complex
forms (i.e. variants) are derived. Before a lemmatiser can be developed for a
specific language, the concept "lemma" as it applies to that specific language
should first be defined clearly. This study concludes that, in Setswana, only
stems (and not roots) can act independently as words; therefore, only stems
should be accepted as lemmas in the context of automatic lemmatisation for
Setswana.
Five of the seven parts of speech in Setswana could be viewed as closed
classes, which means that these classes are not extended by means of regular
morphological processes. The two other parts of speech (nouns and verbs) require
the implementation of alternation rules to determine the lemma. Such alternation
rules were formalised in this study for the purpose of developing a
Setswana lemmatiser, using the existing Setswana grammars as their basis.
This made it possible to determine how precisely a formalisation of these
existing grammars lemmatises Setswana words.
The software developed by Van Noord (2002), FSA 6, is one of the best-known
applications available for the development of finite state automata and transducers.
Regular expressions based on the formalised morphological rules were
used in FSA 6 to create finite state transducers. The code subsequently generated
by FSA 6 was implemented in the lemmatiser.
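As a rough illustration of how ordered rewrite rules can map a word form to a lemma, here is a minimal Python sketch. It does not use FSA 6 and does not reproduce the thesis's alternation rules: the rule list, the closed-class word list, the input strings and the function name `lemmatise` are all hypothetical, chosen only to show the shape of the approach.

```python
import re

# Hypothetical suffix-rewrite rules, for illustration only. They are NOT the
# alternation rules formalised in the thesis.
RULES = [
    (re.compile(r"ile$"), "a"),  # toy rule: rewrite a final "-ile" to "-a"
    (re.compile(r"iwa$"), "a"),  # toy rule: rewrite a final "-iwa" to "-a"
]

# Hypothetical closed-class lexicon: these words are returned unchanged.
CLOSED_CLASS = {"le", "mme"}


def lemmatise(word: str) -> str:
    """Return a candidate lemma by applying the first matching rewrite rule."""
    if word in CLOSED_CLASS:
        return word
    for pattern, replacement in RULES:
        candidate, substitutions = pattern.subn(replacement, word)
        if substitutions:
            return candidate
    # No rule applies: treat the word as already being in its base form.
    return word


print(lemmatise("rekile"))  # first toy rule applies
print(lemmatise("le"))      # closed-class word is left unchanged
```

A finite state transducer compiled from regular expressions expresses the same kind of mapping; the list of compiled patterns here is only a simple stand-in for that machinery.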
The metric used to evaluate the lemmatiser is precision. On a test
corpus of 1 000 words, the lemmatiser achieved a precision of 70,92%. In another
evaluation, on 500 complex nouns and 500 complex verbs separately, the lemmatiser
achieved 70,96% and 70,52% respectively. On 500 complex and simplex nouns the
precision was 78,45%, and on complex and simplex verbs it was 79,59%.
These quantitative results only give an indication of the relative
precision of the grammars. Nevertheless, they did provide analysed data with which
the grammars were evaluated qualitatively. The study concludes with an overview
of how these results might be improved in the future. / Thesis (M.A. (African Languages))--North-West University, Potchefstroom Campus, 2006.
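The precision figures above can be read with a small worked example. The sketch below assumes that precision here means the percentage of test words whose predicted lemma matches a gold-standard lemma; the abstract does not spell out the formula, and the function name `precision` and the sample data are hypothetical.

```python
def precision(predicted: list[str], gold: list[str]) -> float:
    """Percentage of predicted lemmas that equal the gold-standard lemma.

    Assumption: one predicted lemma per test word, compared position by position.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Hypothetical data: three of four predictions match the gold lemmas.
print(precision(["a", "b", "c", "d"], ["a", "b", "c", "x"]))  # 75.0
```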
|
12 |
The corpus of Greek medical papyri and digital papyrology
Reggiani, Nicola. 20 April 2016
The ongoing project of digitising a corpus of ancient Greek texts on papyrus dealing with medical topics raises some problematic questions involving general issues of digital papyrology. The main electronic resource of papyrological texts, the Papyrological Navigator (papyri.info), was designed to host documentary items, while the special technical, even literary nature of medical papyri (which include, besides documents related to medicine, also handbooks, school books, and treatises by both known and unknown authors) requires new ways to treat the relevant data (paratextual devices such as diacriticals, punctuation, abbreviations, layout features). Such issues are currently under discussion by the team in charge of the forthcoming Digital Corpus of Literary Papyri (DCLP), but further options need to be considered in order to develop a fully functional, interactive, dynamic database of ancient technical texts. In particular, this paper presents and discusses the potential of a multi-layer linguistic annotation (useful for meeting the needs of a multifaceted technical language) and of a multitextual digital edition (helpful given the fragmentary condition of the texts and their often problematic relationship with the known manuscript tradition).
|
13 |
The corpus of Greek medical papyri and digital papyrology: new perspectives from an ongoing project
Reggiani, Nicola. January 2016
The ongoing project of digitising a corpus of ancient Greek texts on papyrus dealing with medical topics raises some problematic questions involving general issues of digital papyrology. The main electronic resource of papyrological texts, the Papyrological Navigator (papyri.info), was designed to host documentary items, while the special technical, even literary nature of medical papyri (which include, besides documents related to medicine, also handbooks, school books, and treatises by both known and unknown authors) requires new ways to treat the relevant data (paratextual devices such as diacriticals, punctuation, abbreviations, layout features). Such issues are currently under discussion by the team in charge of the forthcoming Digital Corpus of Literary Papyri (DCLP), but further options need to be considered in order to develop a fully functional, interactive, dynamic database of ancient technical texts. In particular, this paper presents and discusses the potential of a multi-layer linguistic annotation (useful for meeting the needs of a multifaceted technical language) and of a multitextual digital edition (helpful given the fragmentary condition of the texts and their often problematic relationship with the known manuscript tradition).
|
14 |
Vers une plateforme informatique pour l'expérimentation d'outils de classification [Towards a software platform for experimenting with classification tools]
Bokhabrine, Ayoub. January 2019
No description available.
|
15 |
Die deelwoord in Afrikaans: perspektiewe vanuit ʼn kognitiewe gebruiksgebaseerde beskrywingsraamwerk [The participle in Afrikaans: perspectives from a cognitive usage-based descriptive framework] / Anna Petronella Butler
Butler, Anna Petronella. January 2014
During an annotation project of 60 000 Afrikaans tokens by CTexT (North-West University), the developers had to answer difficult questions about the annotation of the participle in particular. One of the main reasons for this difficulty is that the sources describing the participle in Afrikaans conflict with one another, so that the annotation would differ depending on which source is consulted.
To resolve the uncertainty about how the participle in Afrikaans should be annotated, the available literature was surveyed to determine the exact nature of the participle in Afrikaans. The descriptions of the participle in Afrikaans were further situated in the context of how participles are described in English and Dutch. The conclusion reached is that the participle form of the verb in Afrikaans should be distinguished from the periphrastic construction form of the verb that appears in the past and the passive constructions.
Furthermore, this study determined to what extent a cognitive usage-based descriptive framework could contribute towards a better understanding of the participle in Afrikaans. The first conclusion that was reached is that a characterisation of the participle within this framework enables one to make conceptual sense of the morphological structure of the participle. The study shows how the morphological structure of the participle is responsible for the fact that the verbal character of the participle stays intact while the participle functions as another word class. Another conclusion that was reached regarding the characterisation of the past and passive constructions from a cognitive usage-based descriptive framework is that the framework makes it possible to distinguish conceptually between the periphrastic form of the verb and the participle form of the verb.
Lastly, the study determined to what extent new insights into the participle in Afrikaans could lead to alternative lemmatisation and part-of-speech tagging of participles in the NCHLT corpus. The conclusion reached is that participles are, for the most part, lemmatised satisfactorily. Proposals made to improve the lemmatisation protocol include: (i) distinguishing in the protocol between periphrastic forms of the verb and the participle form of the verb; (ii) repeating, under the lemmatisation guidelines for participles, the guideline for the lemmatisation of compound verbs that was provided for verb lemmatisation; (iii) adding more lexicalised adjectives to the existing list in the protocol; and (iv) suggesting a guideline that would allow one to distinguish consistently between participles that could function as adverbs and participles that could function as prepositions.
The conclusion that was reached after the analysis of the part-of-speech protocol is that the part-of-speech tag set in Afrikaans does not allow for the specific attributes and values of participles to be taken into account. Participles in the Afrikaans tag set are tagged strictly according to the function of the word. Although such an approach is very practical, it results in a linguistically poorer part-of-speech tag that ignores the verbal character of the participle. An alternative strategy is therefore suggested for the part-of-speech tagging of participles in which the attributes and values of the verb tag are adapted. / MA (Linguistics and Literary Theory), North-West University, Potchefstroom Campus, 2014
|