41

[en] A DEPENDENCY TREE ARC FILTER / [pt] UM FILTRO PARA ARCOS EM ÁRVORES DE DEPENDÊNCIA

RENATO SAYAO CRYSTALLINO DA ROCHA 13 December 2018 (has links)
[pt] A tarefa de Processamento de Linguagem Natural consiste em analisar linguagens naturais de forma computacional, facilitando o desenvolvimento de programas capazes de utilizar dados falados ou escritos. Uma das tarefas mais importantes deste campo é a Análise de Dependência. Tal tarefa consiste em analisar a estrutura gramatical de frases visando extrair dados sobre suas relações de dependência. Em uma sentença, essas relações se apresentam em formato de árvore, onde todas as palavras são interdependentes. Devido ao seu uso em uma grande variedade de aplicações como Tradução Automática e Identificação de Papéis Semânticos, diversas pesquisas com diferentes abordagens são feitas nessa área visando melhorar a acurácia das árvores previstas. Uma das abordagens em questão consiste em encarar o problema como uma tarefa de classificação de tokens e dividi-la em três classificadores diferentes, um para cada sub-tarefa, e depois juntar seus resultados de forma incremental. As sub-tarefas consistem em classificar, para cada par de palavras que possuam relação pai-dependente, a classe gramatical do pai, a posição relativa entre os dois e a distância relativa entre as palavras. Porém, observando pesquisas anteriores nessa abordagem, notamos que o gargalo está na terceira sub-tarefa, a predição da distância entre os tokens. Redes Neurais Recorrentes são modelos que nos permitem trabalhar utilizando sequências de vetores, tornando viáveis problemas de classificação onde tanto a entrada quanto a saída do problema são sequenciais, fazendo delas uma escolha natural para o problema. Esse trabalho utiliza-se de Redes Neurais Recorrentes, em específico Long Short-Term Memory, para realizar a tarefa de predição da distância entre palavras que possuam relações de dependência como um problema de classificação sequence-to-sequence. Para sua avaliação empírica, este trabalho segue a linha de pesquisas anteriores e utiliza os dados do corpus em português disponibilizado pela Conference on Computational Natural Language Learning 2006 Shared Task. O modelo resultante alcança 95.27 por cento de precisão, resultado que é melhor do que o obtido por pesquisas feitas anteriormente para o modelo incremental. / [en] Natural Language Processing is concerned with analyzing natural language computationally. One of its most important tasks is Dependency Parsing, which consists of analyzing the grammatical structure of a sentence in order to learn, identify and extract information about its dependency structure. This structure can be represented as a tree, since every word in a sentence has a head-dependent relation to another word of the same sentence. Since Dependency Parsing is used in many applications such as Machine Translation, Semantic Role Labeling and Part-Of-Speech Tagging, researchers aiming to improve the accuracy of their models approach this task in many different ways. One of these approaches treats the task as a token classification problem, using a different classifier for each sub-task and joining their outputs incrementally. These sub-tasks consist of classifying, for each head-dependent pair, the Part-Of-Speech tag of the head, the relative position between the two words, and the distance between them. However, previous research using this approach shows that the bottleneck lies in the distance classifier. Recurrent Neural Networks are a kind of neural network that operates on sequences of vectors, allowing classification problems where both the input and the output are sequences, which makes them a natural choice for the problem at hand.
This work studies the use of Recurrent Neural Networks, specifically Long Short-Term Memory (LSTM) networks, for the head-dependent distance classification sub-task, cast as a sequence-to-sequence classification problem. To evaluate the approach empirically, this work follows previous research and uses the Portuguese corpus of the Conference on Computational Natural Language Learning 2006 Shared Task. The resulting model attains 95.27 percent precision, which improves on the results previously obtained with incremental models.
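To make the incremental approach concrete, here is a minimal sketch of the distance sub-task as per-token classification with a plain LSTM over word embeddings. The vocabulary size, layer widths and number of distance classes below are illustrative placeholders, not the configuration used in the thesis.

    # A minimal sketch (assumed configuration, not the thesis's): an LSTM that
    # labels each token of a sentence with a distance class, treating the
    # distance sub-task as sequence-to-sequence classification.
    import torch
    import torch.nn as nn

    class DistanceClassifier(nn.Module):
        def __init__(self, vocab_size=10_000, embed_dim=100, hidden_dim=128, num_classes=21):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, num_classes)  # one distance class per token

        def forward(self, token_ids):
            # token_ids: (batch, seq_len) -> logits: (batch, seq_len, num_classes)
            states, _ = self.lstm(self.embed(token_ids))
            return self.out(states)

    model = DistanceClassifier()
    dummy_batch = torch.randint(1, 10_000, (2, 12))   # two sentences of 12 token ids
    logits = model(dummy_batch)
    print(logits.argmax(dim=-1).shape)                # (2, 12): one class per token

In the thesis, the output of this classifier is combined with the outputs of the head POS and relative-position classifiers to assemble the dependency tree incrementally.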
42

Modèles exponentiels et contraintes sur les espaces de recherche en traduction automatique et pour le transfert cross-lingue / Log-linear Models and Search Space Constraints in Statistical Machine Translation and Cross-lingual Transfer

Pécheux, Nicolas 27 September 2016 (has links)
La plupart des méthodes de traitement automatique des langues (TAL) peuvent être formalisées comme des problèmes de prédiction, dans lesquels on cherche à choisir automatiquement l'hypothèse la plus plausible parmi un très grand nombre de candidats. Malgré de nombreux travaux qui ont permis de mieux prendre en compte la structure de l'ensemble des hypothèses, la taille de l'espace de recherche est généralement trop grande pour permettre son exploration exhaustive. Dans ce travail, nous nous intéressons à l'importance du design de l'espace de recherche et étudions l'utilisation de contraintes pour en réduire la taille et la complexité. Nous nous appuyons sur l'étude de trois problèmes linguistiques — l'analyse morpho-syntaxique, le transfert cross-lingue et le problème du réordonnancement en traduction — pour mettre en lumière les risques, les avantages et les enjeux du choix de l'espace de recherche dans les problèmes de TAL. Par exemple, lorsque l'on dispose d'informations a priori sur les sorties possibles d'un problème d'apprentissage structuré, il semble naturel de les inclure dans le processus de modélisation pour réduire l'espace de recherche et ainsi permettre une accélération des traitements lors de la phase d'apprentissage. Une étude de cas sur les modèles exponentiels pour l'analyse morpho-syntaxique montre paradoxalement que cela peut conduire à d'importantes dégradations des résultats, et cela même quand les contraintes associées sont pertinentes. Parallèlement, nous considérons l'utilisation de ce type de contraintes pour généraliser le problème de l'apprentissage supervisé au cas où l'on ne dispose que d'informations partielles et incomplètes lors de l'apprentissage, qui apparaît par exemple lors du transfert cross-lingue d'annotations. Nous étudions deux méthodes d'apprentissage faiblement supervisé, que nous formalisons dans le cadre de l'apprentissage ambigu, appliquées à l'analyse morpho-syntaxique de langues peu dotées en ressources linguistiques. Enfin, nous nous intéressons au design de l'espace de recherche en traduction automatique. Les divergences dans l'ordre des mots lors du processus de traduction posent un problème combinatoire difficile. En effet, il n'est pas possible de considérer l'ensemble factoriel de tous les réordonnancements possibles, et des contraintes sur les permutations s'avèrent nécessaires. Nous comparons différents jeux de contraintes et explorons l'importance de l'espace de réordonnancement dans les performances globales d'un système de traduction. Si un meilleur design permet d'obtenir de meilleurs résultats, nous montrons cependant que la marge d'amélioration se situe principalement dans l'évaluation des réordonnancements plutôt que dans la qualité de l'espace de recherche. / Most natural language processing tasks are modeled as prediction problems where one aims at finding the best scoring hypothesis from a very large pool of possible outputs. Even if algorithms are designed to leverage some kind of structure, the output space is often too large to be searched exhaustively. This work aims at understanding the importance of the search space and the possible use of constraints to reduce its size and complexity.
We report in this thesis three case studies which highlight the risks and benefits of manipulating the search space in learning and inference. When information about the possible outputs of a sequence labeling task is available, it may seem appropriate to include this knowledge in the system, so as to facilitate and speed up learning and inference. A case study on type constraints for CRFs, however, shows that using such constraints at training time is likely to drastically reduce performance, even when these constraints are both correct and useful at decoding time. Conversely, we also consider possible relaxations of the supervision space, as in the case of learning with latent variables, or when only partial supervision is available, which we cast as ambiguous learning. Such weakly supervised methods, together with cross-lingual transfer and dictionary crawling techniques, allow us to develop natural language processing tools for under-resourced languages. Finally, word order differences between languages pose several combinatorial challenges to machine translation, and the constraints on word reorderings have a great impact on the set of potential translations that is explored during search. We study reordering constraints that restrict the factorial space of permutations and explore the impact of reordering search space design on machine translation performance. However, we show that even though it might be desirable to design better reordering spaces, model and search errors currently remain the most important issues.
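To illustrate the type-constraint case study, the sketch below masks out, at decoding time, every tag that a lexicon disallows for a word before the best tag is chosen. The toy lexicon and random scores are assumptions made for the example; this is not the thesis's CRF implementation, where the same masking would be applied inside Viterbi decoding rather than greedily per token.

    # A greedy, per-token illustration of type constraints at decoding time: tags
    # the lexicon disallows for a word are masked out before choosing the best tag.
    import numpy as np

    TAGS = ["NOUN", "VERB", "ADJ", "DET"]
    ALLOWED = {"the": {"DET"}, "runs": {"VERB", "NOUN"}}  # toy tag dictionary

    def constrained_decode(words, scores):
        """scores: array of shape (len(words), len(TAGS)) with per-token tag scores."""
        best = []
        for i, word in enumerate(words):
            masked = scores[i].astype(float).copy()
            allowed = ALLOWED.get(word)               # None = word is unconstrained
            if allowed:
                for j, tag in enumerate(TAGS):
                    if tag not in allowed:
                        masked[j] = -np.inf           # rule this tag out entirely
            best.append(TAGS[int(np.argmax(masked))])
        return best

    words = ["the", "dog", "runs"]
    scores = np.random.randn(len(words), len(TAGS))
    print(constrained_decode(words, scores))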
43

Genetic Algorithms in the Brill Tagger : Moving towards language independence

Bjerva, Johannes January 2013 (has links)
The viability of using rule-based systems for part-of-speech tagging was revitalised when a simple rule-based tagger was presented by Brill (1992). This tagger is based on an algorithm which automatically derives transformation rules from a corpus, using an error-driven approach. In addition to performing on par with state-of-the-art stochastic systems for part-of-speech tagging, it has the advantage that the automatically derived rules can be presented in a human-readable format. In spite of its strengths, the Brill tagger is quite language-dependent and performs much better on languages similar to English than on languages with richer morphology. This issue is addressed in this paper by defining rule templates automatically, using a search optimised with Genetic Algorithms. This allows the Brill GA-tagger to explore a large space of templates, which in turn generate rules appropriate for the various target languages, and has the added advantage of removing the need for researchers to define rule templates manually. The Brill GA-tagger performs significantly better (p<0.001) than the standard Brill tagger on all 9 target languages (Chinese, Japanese, Turkish, Slovene, Portuguese, English, Dutch, Swedish and Icelandic), with an error rate reduction of between 2% and 15% for each language. / Da Brill (1992) presenterte sin enkle regelbaserte ordklasse-tagger ble det igjen aktuelt å bruke regelbaserte system for tagging av ordklasser. Taggerens grunnlag er en algoritme som automatisk lærer seg transformasjonsregler fra et korpus. I tillegg til at taggeren yter like bra som moderne stokastiske metoder for ordklasse-tagging har Brill-taggeren den fordelen at reglene den lærer seg kan presenteres i et format som lett kan oppfattes av mennesker. Til tross for sine styrker er Brill-taggeren relativt språkavhengig ettersom den fungerer mye bedre for språk som ligner engelsk enn språk med rikere morfologi. Denne oppgaven forsøker å løse dette problemet gjennom å definere regelmaler automatisk med et søk som er optimert med Genetiske Algoritmer. Dette lar Brill GA-taggeren søke gjennom et mye større område enn den ellers kunne ha gjort etter maler som i sin tur genererer regler som er tilpasset målspråket, hvilket også har fordelen at forskere ikke trenger å definere regelmaler manuelt. Brill GA-taggeren yter signifikant bedre (p<0.001) enn Brill-taggeren på alle 9 målspråk (Kinesisk, Japansk, Tyrkisk, Slovensk, Portugisisk, Engelsk, Nederlandsk, Svensk og Islandsk), med en feilprosent som er mellom 2% og 15% lavere i alle språk. / När Brill (1992) presenterade sin enkla regelbaserade ordklasstaggare blev det återigen aktuellt att använda regelbaserade system för taggning av ordklasser. Taggaren är baserad på en algoritm som automatiskt lär sig transformationsregler från en korpus. Bortsett från att taggaren fungerar lika bra som moderna stokastiska metoder för ordklasstaggning har den också fördelen att reglerna som den lär sig kan presenteras i ett format som lätt kan läsas av människor. Trots sina styrkor är Brill-taggeren relativt språkberoende i och med att den fungerar mycket bättre för språk som liknar engelska än för språk med rikare morfologi. Den här uppsatsen försöker att lösa detta problem genom att definiera regelmallar automatiskt med en sökning som är optimerad med Genetiska Algoritmer.
Detta gör att Brill GA-taggaren kan söka genom ett mycket större område än den annars skulle ha kunnat göra efter mallar som i sin tur genererar regler som är anpassade för målspråket. Detta har också fördelen att forskare inte behöver definiera regelmallar manuellt. Brill GA-taggeren får signifikant bättre träffsäkerhet (p<0.001) än Brill-taggeren på alla 9 målspråken (Kinesiska, Japanska, Turkiska, Slovenska, Portugisiska, Engelska, Nederländska, Svenska och Isländska), med en felprocent som är mellan 2% och 15% lägre för alla språk.
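For readers unfamiliar with the Brill tagger, the sketch below shows the shape of a single transformation rule of the kind it learns: a template such as "change tag A to tag B when the previous tag is C" is instantiated into a rule and applied to an initial tagging. The tags and the example sentence are invented for illustration; in the thesis, the Genetic Algorithm searches over the templates from which such rules are derived.

    # The shape of a Brill-style transformation rule: "change from_tag to to_tag
    # when the preceding token's tag is prev_tag". Tags and sentence are invented.
    from dataclasses import dataclass

    @dataclass
    class Rule:
        from_tag: str   # tag to change
        to_tag: str     # tag to change it into
        prev_tag: str   # context condition: tag of the preceding token

    def apply_rule(tags, rule):
        new_tags = list(tags)
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                new_tags[i] = rule.to_tag
        return new_tags

    tokens = ["to", "race", "fast"]
    initial = ["TO", "NOUN", "ADV"]          # "race" mistagged by the initial tagger
    rule = Rule(from_tag="NOUN", to_tag="VERB", prev_tag="TO")
    print(apply_rule(initial, rule))          # ['TO', 'VERB', 'ADV']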
44

Od slovesa ke jménu a předložkám. Departicipiální formy v češtině: forma, funkce, konkurence / From Verbs to Nouns and Prepositions. Departicipial Forms in Czech: Form, Function, Complementarity

Richterová, Olga January 2017 (has links)
The present work gives a rough overall picture of the behaviour of departicipial forms ending in -ící/-oucí (e.g. vedoucí - leading or leader, or fungující - functioning, working) in synchronic written Czech. In the literature, these forms are called participial adjectives, deverbal adjectives or derivatives of the present transgressive. The main focus of the dissertation is the word-class categorization of the analyzed forms, defined by the variety of functions that departicipial forms fulfill. Recommendations on part-of-speech membership are among the main outcomes of the work, and the description of the verbal, nominal or even preposition-like behaviour of the analyzed forms is one of its most prominent goals. The analysis is centered on the most frequent form, vedoucí (including the lexical unit vedoucí k - leading to). The preservation of prepositional valency was identified as one of the criteria of possible prepositionalization of these forms. Given the absence of reliable tagging, the work is mainly based on manual analyses of random corpus samples. Furthermore, it makes innovative use of a tool called 'p-kolokace' (p-collocations), which is based on two...
45

Cross-Lingual and Genre-Supervised Parsing and Tagging for Low-Resource Spoken Data

Fosteri, Iliana January 2023 (has links)
Dealing with low-resource languages is a challenging task because of the absence of sufficient data to train machine-learning models to make predictions for these languages. One way to deal with this problem is to use data from higher-resource languages, which enables the transfer of learning from these languages to the low-resource target ones. The present study focuses on dependency parsing and part-of-speech tagging of low-resource languages belonging to the spoken genre, i.e., languages whose treebank data is transcribed speech. These are the following: Beja, Chukchi, Komi-Zyrian, Frisian-Dutch, and Cantonese. Our approach involves investigating different types of transfer languages, employing MACHAMP, a state-of-the-art parser and tagger that uses contextualized word embeddings, in particular mBERT and XLM-R. The main idea is to explore how genre matching, language similarity, their combination, or neither of the two affects model performance in the aforementioned downstream tasks for our selected target treebanks. Our findings suggest that in order to capture speech-specific dependency relations, we need to incorporate at least some genre-matching source data, while source data matched for language similarity are a better candidate when the task at hand is part-of-speech tagging. We also explore the impact of multi-task learning in one of our proposed methods, but we observe only minor differences in model performance.
46

A Comparative Analysis of Text Usage and Composition in Goscinny's Le petit Nicolas, Goscinny's Astérix, and Albert Uderzo's Astérix

Meyer, Dennis Scott 05 March 2012 (has links) (PDF)
The goal of this thesis is to analyze the textual composition of René Goscinny's Astérix and Le petit Nicolas, demonstrating how they differ and why. Taking a statistical look at the comparative qualities of each series, the thesis highlights and compares the structural differences and similarities in language use between the two series and their respective media. Though one might expect more complicated language use in traditional text by virtue of its format, analysis of average word length, average sentence length, lexical diversity, the prevalence of specific forms (the passé composé, possessive pronouns, etc.), and preferred collocations (ils sont fous, ces Romains !) shows interesting results. Though Le petit Nicolas has longer sentences and more relative pronouns (and hence more clauses per sentence on average), Astérix has longer words and more lexical diversity. A similar comparison of the albums of Astérix written by Goscinny to those written by Uderzo, paying additional attention to the structural elements of each album (the use of narration and sound effects, for example), shows that Goscinny's fondness for reusing phrases is far greater than Uderzo's, and that the two have very different ideas of timing as expressed in narration boxes.
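As a rough illustration of the surface statistics compared in this analysis, the sketch below computes average word length, average sentence length, and lexical diversity (as a type-token ratio) from raw text. The naive tokenization and the sample sentence are assumptions for the example, not the thesis's actual preprocessing of the French texts.

    # Naive versions of the compared surface statistics: average word length,
    # average sentence length, and lexical diversity (type-token ratio).
    import re

    def text_stats(text):
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"\w+", text.lower())
        return {
            "avg_word_len": sum(len(w) for w in words) / len(words),
            "avg_sent_len": len(words) / len(sentences),
            "lexical_diversity": len(set(words)) / len(words),  # type-token ratio
        }

    sample = "Ils sont fous, ces Romains ! Astérix et Obélix rentrent au village."
    print(text_stats(sample))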
47

NATURAL LANGUAGE PROCESSING-BASED AUTOMATED INFORMATION EXTRACTION FROM BUILDING CODES TO SUPPORT AUTOMATED COMPLIANCE CHECKING

Xiaorui Xue (13171173) 29 July 2022 (has links)
The traditional manual code compliance checking process is time-consuming, costly, and error-prone, and has many shortcomings (Zhang & El-Gohary, 2015). Therefore, automated code compliance checking systems have emerged as an alternative. However, computer software cannot directly process regulatory information in unstructured building code texts. To support automated code compliance checking, building codes need to be transformed into a computer-processable, structured format. In particular, most existing automated code compliance checking systems can only check a limited number of building code requirements.

The transformation of building code requirements into a computer-processable, structured format is a natural language processing (NLP) task that requires part-of-speech (POS) tagging results on building codes that are more accurate than the state of the art. To address this need, this dissertation research provides a method to improve the performance of POS taggers through error-driven transformational rules that revise machine-tagged POS results. The rules fix errors in POS tagging in two steps. First, they locate errors in the POS tagging by their context. Second, they replace the erroneous POS tag with the correct POS tag stored in the rule. A dataset of POS-tagged building codes, the Part-of-Speech Tagged Building Codes (PTBC) dataset (Xue & Zhang, 2019), was published in the Purdue University Research Repository (PURR). Testing on the dataset showed that the method corrected 71.00% of errors in POS tagging results for building codes, increasing POS tagging accuracy on building codes from 89.13% to 96.85%.

This dissertation research also provides a new POS tagger tailored to building codes. The proposed POS tagger combines neural network models and error-driven transformational rules. The neural network model contains a pre-trained model and one or more trainable neural layers, and was trained and fine-tuned on the PTBC dataset (Xue & Zhang, 2019). A high-performance POS tagger for building codes was identified, using one bidirectional Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) trainable layer, a BERT-Cased-Base pre-trained model, and 50 epochs of training. This model achieved 91.89% precision without error-driven transformational rules and 95.11% precision with them, outperforming the previously most advanced POS tagger, which reached 89.82% precision on building codes.

Other automated information extraction methods were also developed in this dissertation. Some automated code compliance checking systems represent building codes as logic clauses and use pattern-matching-based rules to convert building codes from natural language text to logic clauses (Zhang & El-Gohary, 2017). A ruleset expansion method was developed that can expand the range of checkable building codes of such systems by expanding their pattern-matching-based ruleset. The ruleset expansion method guarantees: (1) backward compatibility with the building codes that the ruleset was already able to process, and (2) forward compatibility with building codes that the ruleset may need to process in the future. The method was validated on Chapters 5 and 10 of the International Building Code 2015 (IBC 2015), with Chapter 10 used as the training dataset and Chapter 5 as the testing dataset. A gold standard of logic clauses was published in the Logic Clause Representation of Building Codes (LCRBC) dataset (Xue & Zhang, 2021), and the expanded pattern-matching-based rules were published in the dissertation (Appendix A). Compared to the baseline ruleset, the expanded ruleset increased the precision, recall, and F1-score of predicate-level logic clause generation by 10.44%, 25.72%, and 18.02%, to 95.17%, 96.60%, and 95.88%, respectively.

Most existing automated code compliance checking research has focused on regulatory information stored in the running text of building codes. However, a comprehensive automated code compliance checking process should also be able to check regulatory information stored elsewhere, such as in tables. Therefore, this dissertation research provides a semi-automated information extraction and transformation method for processing tabular information in building codes. The proposed method can semi-automatically detect the layout of a table and store the extracted information in a database, which automated code compliance checking systems can then query for the regulatory information in the corresponding table. The algorithm's initial implementation accurately processed 91.67% of the tables in the testing dataset, composed of tables in Chapter 10 of IBC 2015. After iterative upgrades, the updated method correctly processed all tables in the testing dataset.
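The following sketch illustrates the general shape of the tagger architecture described above: a pre-trained BERT-Cased-Base encoder, one trainable bidirectional LSTM layer, and a token-level classification head. The tag inventory size, LSTM width, and sample sentence are placeholder assumptions; training on the PTBC dataset and the error-driven transformational rules that post-correct the output are not reproduced here.

    # A sketch of the described architecture under assumed sizes. Requires the
    # torch and transformers packages; the pre-trained weights are downloaded on
    # first use.
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class BuildingCodeTagger(nn.Module):
        def __init__(self, num_tags=45, lstm_hidden=256):   # 45 is a placeholder tag-set size
            super().__init__()
            self.encoder = AutoModel.from_pretrained("bert-base-cased")
            self.lstm = nn.LSTM(self.encoder.config.hidden_size, lstm_hidden,
                                batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(2 * lstm_hidden, num_tags)

        def forward(self, input_ids, attention_mask):
            hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
            states, _ = self.lstm(hidden)
            return self.classifier(states)        # (batch, seq_len, num_tags)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    batch = tokenizer(["Exit doors shall be openable from the inside."], return_tensors="pt")
    tagger = BuildingCodeTagger()
    logits = tagger(batch["input_ids"], batch["attention_mask"])
    print(logits.argmax(dim=-1))                  # one tag index per wordpiece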
48

Die deelwoord in Afrikaans : perspektiewe vanuit ʼn kognitiewe gebruiksgebaseerde beskrywingsraamwerk / Anna Petronella Butler

Butler, Anna Petronella January 2014 (has links)
During a project in which 60 000 Afrikaans tokens were annotated by CTexT (North-West University), the developers had to answer difficult questions regarding the annotation of the participle in particular. One of the main reasons for this difficulty is that the available descriptions of the participle in Afrikaans conflict with one another, so that different sources would lead to different annotations. In order to resolve the uncertainty about how the participle in Afrikaans should be annotated, the available literature was surveyed to determine the exact nature of the Afrikaans participle. These descriptions were further situated in the context of how participles are described in English and Dutch. The conclusion reached is that the participle form of the verb in Afrikaans should be distinguished from the periphrastic construction form of the verb that appears in past-tense and passive constructions. Furthermore, this study determined to what extent a cognitive usage-based descriptive framework could contribute to a better understanding of the participle in Afrikaans. The first conclusion is that a characterisation of the participle within this framework makes it possible to make conceptual sense of its morphological structure. The study shows how the morphological structure of the participle is responsible for the fact that its verbal character stays intact while the participle functions as another word class. A further conclusion regarding the characterisation of past and passive constructions within a cognitive usage-based framework is that the framework makes it possible to distinguish conceptually between the periphrastic form of the verb and the participle form of the verb. Lastly, the study determined to what extent these new insights into the participle in Afrikaans could lead to alternative lemmatisation and part-of-speech tagging of participles in the NCHLT corpus. The conclusion reached is that participles are, for the most part, lemmatised satisfactorily. Proposals made to improve the lemmatisation protocol include: (i) distinguishing in the protocol between periphrastic forms of the verb and the participle form of the verb; (ii) repeating, under the lemmatisation guidelines for participles, the guideline for the lemmatisation of compound verbs that was provided for verb lemmatisation; (iii) adding more lexicalised adjectives to the existing list in the protocol; and (iv) providing a guideline that allows one to distinguish consistently between participles that can function as adverbs and participles that can function as prepositions. The conclusion reached after the analysis of the part-of-speech protocol is that the Afrikaans part-of-speech tag set does not allow the specific attributes and values of participles to be taken into account: participles are tagged strictly according to the function of the word. Although such an approach is very practical, it results in a linguistically poorer part-of-speech tag that ignores the verbal character of the participle. An alternative strategy is therefore suggested for the part-of-speech tagging of participles, in which the attributes and values of the verb tag are adapted. / MA (Linguistics and Literary Theory), North-West University, Potchefstroom Campus, 2014
50

Apolonios Dyskolos a jeho spis Peri antonymias. Úvodní studie a komentovaný překlad části textu. / Apolonios Dyskolos and his Treatise On Pronouns. Introductory Study and Translation of Part of the Text with Commentary.

Hřibal, Jan January 2012 (has links)
The introductory study presents an overall picture of Apollonius Dyscolus, the most significant ancient Greek grammarian. It deals with his person, with important technical issues concerning his work, and particularly with his study of grammar, focusing on his fundamental grammatical classifications and the structure of his thought. The introductory study is accompanied by a translation of a few introductory chapters (GG II,1,1,3.1-17.17) of Apollonius' treatise Περὶ ἀντωνυμίας (Peri antonymias, On Pronouns), and by a scholarly commentary on the translation.
