1. Minimize Exponence: Economy Effects on a Model of the Morphosyntactic Component of the Grammar. Siddiqi, Daniel A. (January 2006)
Working within the morphosyntactic framework of Distributed Morphology (DM; Halle and Marantz 1993, 1994) within the Minimalist Program (Chomsky 1995), this dissertation proposes a new economy constraint on the grammar, MINIMIZE EXPONENCE, which selects the derivation that realizes all its interpretable features with the fewest morphemes. The purpose of this proposal is to capture the conflicting needs of the grammar to be both maximally contrastive and maximally efficient.

I show that the constraint MINIMIZE EXPONENCE has a number of effects on analyses of morphosyntactic phenomena. I propose that, in order to satisfy MINIMIZE EXPONENCE, the roots in a derivation fuse with the functional heads projected above them, resulting in a simplex head that contains both a root and interpretable features. Following the tenets of DM, this head is now a target for the process of Vocabulary insertion. Since the target node contains both content and functional information, Vocabulary Items (VIs) can likewise be specified for both types of information. This allows VIs such as eat and ate to compete with each other. This competition of forms linked to the same root allows for a new model of root allomorphy within the framework of DM. In this model, following proposals by Pfau (2000), VIs that realize roots participate in competition in the same way as VIs that realize abstract morphemes. Since root VIs participate in competition and are specified for both content and formal features, the need for licensing through secondary exponence, as proposed by Harley and Noyer (2000), is removed from the framework. Further, since eat and ate in this model are different VIs with different specifications that compete with each other for insertion, this model of root allomorphy also eliminates the need for the readjustment rules proposed by Halle and Marantz (1993, 1994) and elaborated on by Marantz (1997).
This new model of root allomorphy allows for an account of the blocking of regular inflection in English nominal compounds (e.g. *rats-catcher), which was problematic for theorists working within DM, given the tenets of the framework.

I also show that the fusion of roots and functional elements driven by MINIMIZE EXPONENCE allows for a new account of subcategorization. The model of subcategorization presented here falls out of the following facts: 1) arguments are introduced by functional heads; 2) those heads fuse with the root they are projected above, resulting in a node containing both the root and the features of the functional heads; 3) since the node now contains both the root and the formal features, the corresponding VI can be specified for both; 4) VIs that realize roots can also be specified for compatibility or incompatibility with the features of the functional heads that license argument structure. The result is an underspecification model of subcategorization that predicts a number of behaviors of verbs with respect to their argument structure that are difficult for a full-specification model to account for, including polysemy (I ran the ball to Mary) and structural coercion (I thought the book to Mary).
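The competition the abstract describes can be sketched programmatically. The following is a minimal illustration, not the dissertation's formalism: a fused node carries both a root and interpretable features, and the most highly specified VI whose features are a subset of the node's features wins insertion (the Subset Principle of DM). The feature labels and VI entries are invented for the example.

```python
# Illustrative sketch of subset-based Vocabulary insertion over a fused
# node; the feature labels and the three VIs below are hypothetical.

FUSED_NODE = frozenset({"ROOT:EAT", "past"})

# Hypothetical Vocabulary Items: (phonological form, feature specification).
VOCABULARY = [
    ("ate", frozenset({"ROOT:EAT", "past"})),
    ("eat", frozenset({"ROOT:EAT"})),
    ("-ed", frozenset({"past"})),
]

def insert(node, vocabulary):
    """Return the most highly specified VI whose features are a subset
    of the node's features (the Subset Principle)."""
    candidates = [(form, spec) for form, spec in vocabulary if spec <= node]
    if not candidates:
        return None
    # The winner realizes the most features of the node.
    return max(candidates, key=lambda fs: len(fs[1]))[0]

print(insert(FUSED_NODE, VOCABULARY))               # "ate" beats "eat" + "-ed"
print(insert(frozenset({"ROOT:EAT"}), VOCABULARY))  # "eat"
```

Because eat and ate are separate VIs competing for the same fused node, no readjustment rule mapping eat to ate is needed.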
2. Automatic syntactic analysis of learner English. Huang, Yan (January 2019)
Automatic syntactic analysis is essential for extracting useful information from large-scale learner data for linguistic research and natural language processing (NLP). Currently, researchers use standard POS taggers and parsers developed on native-language data to analyze learner language. Investigation of how such systems perform on learner data is needed to develop strategies for minimizing cross-domain effects. Furthermore, POS taggers and parsers are developed for generic NLP purposes and may not be useful for identifying specific syntactic constructs such as subcategorization frames (SCFs). SCFs have attracted much research attention because they provide unique insight into the interplay between lexical and structural information. An automatic SCF identification system adapted to learner language is needed to facilitate research on L2 SCFs.

In this thesis, we first provide a comprehensive evaluation of standard POS taggers and parsers on learner and native English. We show that the common practice of constructing a gold standard by manually correcting the output of a system can introduce bias into the evaluation, and we suggest a method to control for this bias. We also quantitatively evaluate the impact of fine-grained learner errors on POS tagging and parsing, identifying the most influential learner errors. Furthermore, we show that the performance of probabilistic POS taggers and parsers on native English can predict their performance on learner English.

Secondly, we develop an SCF identification system for learner English. We train a machine learning model on both native and learner English data. The system can label individual verb occurrences in learner data with one of 49 distinct SCFs. Our evaluation shows that the system reaches an F1 score of 84%. We then demonstrate that this level of accuracy is adequate for linguistic research.
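As a small illustration of the evaluation figure quoted above: when every verb occurrence receives exactly one SCF label, micro-averaged precision and recall coincide with accuracy, and so does F1. The frame labels below are invented and are not the thesis's 49-frame inventory.

```python
# Hypothetical gold and predicted SCF labels for five verb occurrences.
gold = ["NP", "NP_PP", "NP", "S", "NP_PP"]
pred = ["NP", "NP",    "NP", "S", "NP_PP"]

correct = sum(g == p for g, p in zip(gold, pred))
precision = correct / len(pred)   # labelled occurrences that are right
recall = correct / len(gold)      # gold occurrences that are recovered
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.8 on this toy data; the thesis reports 0.84
```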
We design the first multidimensional SCF diversity metrics and investigate how SCF diversity changes with L2 proficiency on a large learner corpus. Our results show that as L2 proficiency develops, learners tend to use more diverse SCF types with greater taxonomic distance; more advanced learners also use different SCF types more evenly and locate the verb tokens of the same SCF type further away from each other. Furthermore, we demonstrate that the proposed SCF diversity metrics contribute a unique perspective to the prediction of L2 proficiency beyond existing syntactic complexity metrics.
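The diversity dimensions described above (how many SCF types a learner uses, how evenly they are used, and how far apart tokens of the same type occur) can be approximated with common stand-in metrics. The exact definitions used in the thesis are not reproduced here; this sketch uses type count, normalized Shannon entropy for evenness, and mean index distance between consecutive same-type tokens for dispersion, with invented frame labels.

```python
import math
from collections import Counter, defaultdict

def scf_diversity(frames):
    """Three illustrative diversity dimensions over a sequence of SCF
    labels: type count, evenness, and within-type token dispersion."""
    counts = Counter(frames)
    n_types = len(counts)
    n = len(frames)
    # Evenness: Shannon entropy normalized by the maximum for n_types.
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    evenness = entropy / math.log(n_types) if n_types > 1 else 0.0
    # Dispersion: mean distance between consecutive tokens of a type.
    positions = defaultdict(list)
    for i, frame in enumerate(frames):
        positions[frame].append(i)
    gaps = [b - a for idx in positions.values() for a, b in zip(idx, idx[1:])]
    dispersion = sum(gaps) / len(gaps) if gaps else 0.0
    return n_types, evenness, dispersion

print(scf_diversity(["NP", "NP", "NP_PP", "NP", "S", "NP_PP"]))
```

On this view, a more proficient learner's text would score higher on all three dimensions at once, which is what makes the metrics multidimensional.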
3. Polyvalent Verbs. Vogel, Ralf (13 July 1998)
Polyvalente Verben können mit unterschiedlichen Konstituentenmengen kombiniert sein, wobei deren Zahl und Art variieren. In den meisten Grammatikschulen sind Verben zentral für syntaktische Gestalt und semantische Interpretation von Sätzen. Sie bestimmen über ihre Subkategorisierungsrahmen, wie viele Komplemente welchen Typs im Satz realisiert werden. Daher ist Polyvalenz ein unerwartetes Phänomen. Eine Diskussion verschiedener Ansätze der generativen Grammatik ergibt, dass Subkategorisierung für die Erklärung von Polyvalenz ungeeignet ist. Im zweiten Kapitel wird ein Modell für die konzeptuell-semantische Interpretation von Verben und Sätzen entwickelt, das dem Rechnung trägt: In Sätzen mit polyvalenten Verben bedingen die Komplemente des Verbs zusammen mit dem Verb die konzeptuell-semantische Interpretation. Die thematische Interpretation wird als inferentieller Prozess angesehen, der keinen Spezialfall allgemeiner konzeptuell-semantischer Interpretationsprozesse darstellt, sondern vielmehr in diese eingebunden ist. / Polyvalent verbs can be combined with different sets of complements; the variation concerns both the number and the type of the complements. In most grammar-theoretical frameworks, verbs are of crucial importance for the syntactic structure and semantic interpretation of clauses: they determine, via their subcategorization frames, how many complements of which type are realized. Polyvalence is therefore an unexpected phenomenon. A discussion of several approaches in generative grammar leads to the claim that subcategorization is not well suited to explaining polyvalence. In the second chapter, a model of the conceptual-semantic interpretation of verbs and clauses is developed that takes polyvalence into account: the conceptual-semantic interpretation of clauses with polyvalent verbs is determined by the verb and its complements together.
Thematic interpretation is viewed as an inferential process that is not a special case of general conceptual-semantic interpretation processes but is embedded within them.
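To make the problem concrete, here is a minimal sketch of the rigid subcategorization view the thesis argues against: a lexical entry that enumerates a verb's licit complement sets necessarily rejects any polyvalent use it does not list. The verbs and frames are invented for illustration.

```python
# Hypothetical subcategorization lexicon: each verb lists the complement
# sets it licenses, as tuples of category labels.
SUBCAT = {
    "devour": [("NP",)],       # strictly transitive
    "eat":    [(), ("NP",)],   # optionally transitive
}

def licensed(verb, complements):
    """Accept a clause only if the verb's entry lists exactly this
    complement set."""
    return tuple(complements) in SUBCAT.get(verb, [])

print(licensed("eat", ["NP"]))        # True: the frame is listed
# A polyvalent use with an unlisted complement set is simply rejected,
# even when speakers accept it, which is the problem the thesis starts from.
print(licensed("eat", ["NP", "PP"]))  # False
```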
4. Αυτόματη μάθηση συντακτικών εξαρτήσεων και ανάπτυξη γραμματικών της ελληνικής γλώσσας / Learning of syntactic dependencies and development of Modern Greek grammars. Κερμανίδου, Κάτια Λήδα (25 June 2007)
Η παρούσα διατριβή έχει ως σκοπό της, πρώτον, την ανάκτηση συντακτικής πληροφορίας (αναγνώριση συμπληρωμάτων ρημάτων, ανάκτηση πλαισίων υποκατηγοριοποίησης (ΠΥ) ρημάτων, αναγνώριση των ορίων και του είδους των προτάσεων) αυτόματα μέσα από ελληνικά και αγγλικά σώματα κειμένων με την χρήση ποικίλων και καινοτόμων τεχνικών μηχανικής μάθησης και, δεύτερον, την θεωρητική περιγραφή της ελληνικής σύνταξης μέσω τυπικών γλωσσολογικών φορμαλισμών, όπως η γραμματική Ενοποίησης και η γραμματική Φραστικής Δομής Οδηγούμενη από τον Κύριο Όρο. Η διατριβή κινήθηκε πάνω στους εξής καινοτόμους άξονες: 1. Η προεπεξεργασία των σωμάτων κειμένων βασίστηκε σε ελάχιστους γλωσσολογικούς πόρους για να είναι δυνατή η μεταφορά των μεθόδων σε γλώσσες φτωχές σε υποδομή. 2. Η αντιμετώπιση του θορύβου που υπεισέρχεται στα δεδομένα εξ αιτίας της χρήσης ελάχιστων πόρων πραγματοποιείται με Μονόπλευρη Δειγματοληψία. Εντοπίζονται αυτόματα παραδείγματα δεδομένων που δεν προσφέρουν στην μάθηση και αφαιρούνται. Τα τελικά δεδομένα είναι πιο καθαρά και η απόδοση της μάθησης βελτιώνεται πολύ. 3. Αποδεικνύεται η χρησιμότητα της εξαχθείσας πληροφορίας. Η χρησιμότητα των συμπληρωμάτων φαίνεται από την αύξηση της απόδοσης της διαδικασίας ανάκτησης ΠΥ με την χρήση τους. Η χρησιμότητα των εξαγόμενων ΠΥ φαίνεται από την αύξηση της απόδοσης ενός ρηχού συντακτικού αναλυτή με την χρήση τους. 4. Οι μέθοδοι εφαρμόζονται και στα Αγγλικά και στα Ελληνικά για να φανεί η μεταφερσιμότητά τους σε διαφορετικές γλώσσες και για να πραγματοποιηθεί μια ενδιαφέρουσα σχετική σύγκριση ανάμεσα στις δύο γλώσσες. Τα αποτελέσματα είναι πολύ ενθαρρυντικά, συγκρίσιμα με, και σε πολλές περιπτώσεις καλύτερα από, προσεγγίσεις που χρησιμοποιούν εξελιγμένα εργαλεία προεπεξεργασίας. 
/ The thesis aims, firstly, at the acquisition of syntactic information (detection of verb complements, acquisition of verb subcategorization frames (SFs), detection of the boundaries and the semantic type of clauses) automatically from Modern Greek and English text corpora, using various state-of-the-art and novel machine learning techniques, and, secondly, at the theoretical description of Greek syntax through formal grammatical theories such as Unification Grammar and Head-driven Phrase Structure Grammar. The thesis is based on the following novel axes: 1. Corpus pre-processing has been limited to minimal linguistic resources to ensure the portability of the presented methodologies to languages poorly equipped with resources. 2. Due to the low pre-processing level, a significant amount of noise appears in the data, which is dealt with using One-sided Sampling: examples that do not contribute to the learning process are detected and removed, leaving a cleaner final data set and significantly improving learning performance. 3. The importance of the acquired information is demonstrated: the importance of complements is shown by the improvement in the performance of the SF acquisition process after incorporating complement information, and the importance of the acquired SF lexicon is shown by the increase in performance of a shallow syntactic parser into which it is incorporated. 4. The methods are applied to Modern Greek and to English to show their portability across languages and to allow for an interesting rough comparison between the two. The results are very satisfactory, comparable to, and in some cases better than, approaches utilizing sophisticated resources for pre-processing.
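The noise-filtering idea behind One-sided Sampling can be sketched as follows. This is a simplified, Tomek-link-style illustration in one dimension, not the exact procedure used in the thesis, and the data points are invented: a majority-class example whose nearest neighbour carries the opposite label is treated as noise and removed, while minority-class examples are always kept.

```python
# Simplified one-sided filter: keep every minority example; drop a
# majority example whose nearest neighbour has a different label.
# One-dimensional toy data; the real method works on feature vectors.

def nearest(i, xs):
    """Index of the nearest other point to xs[i]."""
    return min((j for j in range(len(xs)) if j != i),
               key=lambda j: abs(xs[i] - xs[j]))

def one_sided_filter(xs, ys, minority=1):
    keep = [i for i in range(len(xs))
            if ys[i] == minority or ys[i] == ys[nearest(i, xs)]]
    return [xs[i] for i in keep], [ys[i] for i in keep]

# A majority-class intruder (5.05, label 0) sits among minority examples
# and is removed; everything else survives.
xs = [0.0, 0.1, 0.2, 5.0, 5.05, 5.1]
ys = [0,   0,   0,   1,   0,    1]
print(one_sided_filter(xs, ys))
```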