Noun phrases convey key information in communication and are of interest in many NLP tasks. A base NP is defined as the headword of a noun phrase together with its left-hand modifiers. In this thesis, we identify base NPs in English and French Universal Dependencies (UD) treebanks using an RNN architecture.

The data for this thesis consist of three multi-layered treebanks in which each sentence is annotated in both the constituency and the dependency formalisms. To build our training data, we find base NPs in the constituency layers and project them onto the dependency layer by labeling the corresponding tokens. As input features, we devised 18 configurations of features available in the UD annotation. We train RNN models with LSTM and GRU cells for different numbers of epochs on each of these feature configurations.

Tested on monolingual and bilingual test sets, our models achieved satisfactory token-based F1 scores (92.70% on English, 94.87% on French, and 94.29% on the bilingual test set). The most predictive feature configuration was found to be pos_dep_parent_child_morph, which covers 1) the dependency relations of the current token, its syntactic head, and its leftmost and rightmost syntactic dependents; 2) the PoS tags of these tokens; and 3) the morphological features of the current token.
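The pos_dep_parent_child_morph configuration described in the abstract can be illustrated with a small feature extractor. This is a minimal sketch, not the thesis's implementation: it assumes each sentence is already parsed into CoNLL-U-style token records (ID, UPOS, HEAD, DEPREL, FEATS). The `Token` class, the function signature, the feature key names, the `NONE` placeholder for a missing head or dependent, and the choice to take the leftmost and rightmost dependents over all of a token's dependents are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Token:
    """Hypothetical CoNLL-U-style token record (only the fields used here)."""
    id: int                  # 1-based position in the sentence (CoNLL-U ID)
    upos: str                # universal PoS tag
    head: int                # ID of the syntactic head; 0 means the token is the root
    deprel: str              # dependency relation to the head
    feats: Dict[str, str] = field(default_factory=dict)  # morphological features


def pos_dep_parent_child_morph(sent: List[Token], i: int) -> Dict[str, str]:
    """Sketch of the feature configuration for the token at list index i.

    Collects PoS tag and dependency relation for the token itself, its head,
    and its leftmost and rightmost dependents, plus the token's morphology.
    """
    tok = sent[i]
    by_id = {t.id: t for t in sent}
    head: Optional[Token] = by_id.get(tok.head)  # None when tok is the root
    # Assumption: "leftmost/rightmost" is taken over all dependents, in word order.
    children = sorted((t for t in sent if t.head == tok.id), key=lambda t: t.id)
    left = children[0] if children else None
    right = children[-1] if children else None

    feats: Dict[str, str] = {}
    for role, t in (("self", tok), ("head", head), ("lchild", left), ("rchild", right)):
        feats[f"{role}_pos"] = t.upos if t else "NONE"
        feats[f"{role}_dep"] = t.deprel if t else "NONE"
    # Morphological features of the current token only, in CoNLL-U FEATS order.
    feats["self_morph"] = "|".join(f"{k}={v}" for k, v in sorted(tok.feats.items())) or "_"
    return feats


# Usage on a hand-built example sentence: "The big dog barked ."
example = [
    Token(1, "DET", 3, "det"),
    Token(2, "ADJ", 3, "amod"),
    Token(3, "NOUN", 4, "nsubj", {"Number": "Sing"}),
    Token(4, "VERB", 0, "root", {"Mood": "Ind", "Tense": "Past", "VerbForm": "Fin"}),
    Token(5, "PUNCT", 4, "punct"),
]
features = pos_dep_parent_child_morph(example, 2)  # features for "dog"
print(features)
```

In a full pipeline, dictionaries like these would still have to be mapped to embeddings or one-hot vectors before being fed to the LSTM or GRU layers; that encoding step is not shown here.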
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-412778 |
Date | January 2020 |
Creators | Wang, Tonghe |
Publisher | Uppsala universitet, Institutionen för lingvistik och filologi |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |