Global ETD Search

61	Toward Annotation Efficiency in Biased Learning Settings for Natural Language Processing Effland, Thomas January 2023 The goal of this thesis is to improve the feasibility of building applied NLP systems for more diverse and niche real-world use-cases of extracting structured information from text. A core factor in determining this feasibility is the cost of manually annotating enough unbiased labeled data to achieve a desired level of system accuracy, and our goal is to reduce this cost. We focus on reducing this cost by making contributions in two directions: (1) easing the annotation burden by leveraging high-level expert knowledge in addition to labeled examples, thus making approaches more annotation-efficient; and (2) mitigating known biases in cheaper, imperfectly labeled real-world datasets so that we may use them to our advantage. A central theme of this thesis is that high-level expert knowledge about the data and task can allow for biased labeling processes that focus experts on only manually labeling aspects of the data that cannot be easily labeled through cheaper means. This combination allows for more accurate models with less human effort. We conduct our research on this general topic through three diverse problems with immediate applications to real-world settings. First, we study an applied problem in biased text classification. We encounter a rare-event text classification system that has been deployed for several years. We are tasked with improving this system's performance using only the severely biased incidental feedback provided by the experts over years of system use. We develop a method that combines importance weighting and an unlabeled data imputation scheme that exploits the selection-bias of the feedback to train an unbiased classifier without requiring additional labeled data. We experimentally demonstrate that this method considerably improves the system performance.
Second, we tackle an applied problem in named entity recognition (NER) concerning learning tagging models from data that have very low recall for annotated entities. To solve this issue we propose a novel loss, the Expected Entity Ratio (EER), that uses an uncertain estimate of the proportion of entities in the data to counteract the false-negative bias in the data, encouraging the model to have the correct ratio of entities in expectation. We justify the principles of our approach by providing theory that shows it recovers the true tagging distribution under mild conditions. Additionally we provide extensive empirical results that show it to be practically useful. Empirically, we find that it meets or exceeds performance of state-of-the-art baselines across a variety of languages, annotation scenarios, and amounts of labeled data. We also show that, when combined with our approach, a novel sparse annotation scheme can outperform exhaustive annotation for modest annotation budgets. Third, we study the challenging problem of syntactic parsing in low-resource languages. We approach the problem from a cross-lingual perspective, building on a state-of-the-art transfer-learning approach that underperforms on ``distant'' languages that have little to no representation in the training corpus. Motivated by the field of syntactic typology, we introduce a general method called Expected Statistic Regularization (ESR) to regularize the parser on distant languages according to their expected typological syntax statistics. We also contribute general approaches for estimating the loss supervision parameters from the task formalism or small amounts of labeled data. We present seven broad classes of descriptive statistic families and provide extensive experimental evidence showing that using these statistics for regularization is complementary to deep learning approaches in low-resource transfer settings. 
In conclusion, this thesis contributes approaches for reducing the annotation cost of building applied NLP systems through the use of high-level expert knowledge to impart additional learning signal on models and cope with cheaper biased data. We publish implementations of our methods and results, so that they may facilitate future research and applications. It is our hope that the frameworks proposed in this thesis will help to democratize access to NLP for producing structured information from text in wider-reaching applications by making them faster and cheaper to build. Computer science Statistics Artificial intelligence Grammar, Comparative and general--Syntax
62	Perception and predication : a synchronic and diachronic analysis of Dutch descriptive perception verbs as evidential copular verbs Poortvliet, Marjolein January 2018 Descriptive perception verbs have failed to receive a uniform analysis in previous verb classifications (cf. Chomsky 1965, Rogers 1974, Hengeveld 1992, Levin 1993, Van Eynde et al. 2014). This thesis argues that the descriptive perception verbs in Dutch (i.e. eruitzien 'look', klinken 'sound', voelen 'feel', ruiken 'smell', and smaken 'taste') should be classified as copular verbs, much like lijken 'seem' and schijnen 'seem'. This classification is supported by both the synchronic and diachronic behaviour of these verbs in Dutch. Synchronically, proposing that Germanic copular verbs (as opposed to copulas) are defined by their syntax rather than their (empty) semantics, I discuss that the Dutch descriptive perception verbs behave like stereotypical copular verbs: they require a predicative complement, usually in the form of an adjective. Semantically, the Dutch descriptive perception verbs are much like the copular verbs blijken 'turn out', lijken 'seem' and schijnen 'seem' in terms of epistemicity and evidentiality. Diachronically, I hypothesize that the Dutch descriptive perception verbs have evolved from one of the following two origins: either from intransitive verbs (as is the case for klinken and ruiken), much like English remain, through grammaticalization processes of semantic bleaching and reanalysis; or from cognitive perception verbs (as is the case of eruitzien and voelen), as found in Latin, Japanese and Zulu, through the process of argument reordering. The origin of smaken is not clear, and is left for future research. I show that other Germanic evidential copular verbs (i.e. lijken, schijnen 'seem', scheinen 'seem', seem) have developed diachronically in a uniform fashion, suggesting the following grammaticalization path: from a lexical verb to a copular verb, to taking a that-complement, an infinitival complement or a like-complement, and eventually being used in parenthetical constructions. The results of this thesis indicate that the Dutch descriptive perception verbs are only at the beginning of this grammaticalization path, but are on their way to becoming grammaticalized evidential copular verbs.
63	The acquisition of English articles by Mandarin-speaking learners: an optimality-theoretic syntax account Hu, Yuxiu, Lucille., 胡玉秀. January 2011 published_or_final_version / Linguistics / Doctoral / Doctor of Philosophy Optimality theory (Linguistics) English language - Article.
64	A minimalist analysis of expletive daar (“there”) and dit (“it”) constructions in Afrikaans De Bruin, Jeané 03 1900 Thesis (MA (General Linguistics))--University of Stellenbosch, 2011. / Bibliography / ENGLISH ABSTRACT: This study deals with syntactic aspects of expletive daar (“there”) and dit (“it”) constructions in Afrikaans. Previous analyses of these constructions have mostly been of a non-formalistic nature (e.g. Barnes 1984; Donaldson 1993; Du Plessis 1977; Ponelis 1979, 1993). The present study investigates the properties of Afrikaans expletive constructions within the broad theoretical framework of Minimalist Syntax. Four recent minimalist analyses of expletive constructions in English, Dutch and German are set out, namely those proposed by Bowers (2002), Felser and Rupp (2001), Richards and Biberauer (2005), and Radford (2009). Against this background, an analysis is proposed of transitive, non-passive unaccusative, passive unaccusative, and unergative expletive constructions in Afrikaans. Throughout, the focus is on whether the devices available within Minimalist Syntax, and specifically the Expletive Conditions proposed by Radford (2009), provide an adequate framework in which the relevant facts of Afrikaans can be described and explained. Where required, modifications to the devices in question are proposed. / AFRIKAANSE OPSOMMING: Hierdie studie handel oor sintaktiese aspekte van ekspletiewe daar- en dit-konstruksies in Afrikaans. Vorige analises van dié konstruksies was grootliks nie-formalisties van aard (bv. Barnes 1984; Donaldson 1993; Du Plessis 1977; Ponelis 1979, 1993). Die huidige studie ondersoek die eienskappe van Afrikaanse ekspletiewe konstruksies binne die breë teoretiese raamwerk van Minimalistiese Sintaksis.
Vier onlangse minimalistiese analises van ekspletiewe konstruksies in Engels, Nederlands en Duits word uiteengesit, naamlik dié wat voorgestel is deur Bowers (2002), Felser en Rupp (2001), Richards en Biberauer (2005), en Radford (2009). Teen hierdie agtergrond word ’n analise voorgestel van transitiewe, nie-passiewe onakkusatiewe, passiewe onakkusatiewe, en onergatiewe ekspletiewe konstruksies in Afrikaans. Die fokus is deurgaans op die vraag of die meganismes wat beskikbaar is binne Minimalistiese Sintaksis, en spesifiek die drie Ekspletiewe Voorwaardes wat voorgestel word deur Radford (2009), ’n toereikende raamwerk bied waarbinne die tersaaklike feite van Afrikaans beskryf en verklaar kan word. Waar nodig, word aanpassings aan die betrokke meganismes voorgestel. Afrikaans syntax Expletive constructions Minimalism Theses -- Linguistics Dissertations -- Linguistics Minimalist theory (Linguistics) General linguistics
65	「主之謂」: 上古漢語動詞名物化研究. / 主之謂: 上古漢語動詞名物化研究 / 上古漢語動詞名物化研究 / CUHK electronic theses & dissertations collection / "Zhu zhi wei": shang gu Han yu dong ci ming wu hua yan jiu. / Zhu zhi wei: shang gu Han yu dong ci ming wu hua yan jiu / Shang gu Han yu dong ci ming wu hua yan jiu January 2013 陳遠秀. / "2013年9月". / "2013 nian 9 yue". / Thesis (M.Phil.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 90-94). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract in Chinese and English. / Chen Yuanxiu. Grammar, Comparative and general--Verb Grammar, Comparative and general--Noun Grammar, Comparative and general--Syntax Typology (Linguistics) Chinese language Chinese language--To 600
66	Locality principles and the acquisition of syntactic knowledge Berwick, Robert Cregar January 1982 Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1982. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Bibliography: leaves 439-451. / by Robert Cregar Berwick. / Ph.D. Parsifal (Computer program) Computational linguistics Grammar, Comparative and general Syntax LISP (Computer program language) Relational grammar Language acquisition
67	Scrambling in Afrikaans. Louw, Frederik Wilhelm. January 2012 ‘Scrambling’ languages allow arguments in a given sentence to be ordered in a variety of ways while leaving the grammatical roles of these arguments unchanged. West Germanic languages like German, Dutch, Yiddish, and West Flemish exhibit, to different extents, scrambling properties (Haider, 2006; Grewendorf, 2005; De Hoop, 2003). One well established assumption is that a prerequisite for scrambling is a rich (overt) case morphology: Grammatical relations need to be overtly marked on arguments in order for them to freely permute (Haider, 2006; Mahajan, 2003). Afrikaans, like other West Germanic languages, also allows a certain degree of flexibility (Molnárfi, 2002; Biberauer & Richards, 2006; Conradie, 2007; Huddlestone, 2010). Generally, however, it is assumed to be much more rigid than a richly inflected language like German, in part because Afrikaans is the most morphologically ‘impoverished’ of all the West Germanic languages (Molnárfi, 2002; Biberauer & Richards, 2006; Huddlestone, 2010). In this thesis, I draw attention to certain double object constructions in Afrikaans that allow German-like flexibility without German-like morphology. Afrikaans allows the indirect and direct object of particular verbs to optionally invert their canonical order in finite embedded sentences without V-raising. I propose an analysis within a minimalist framework that accounts for the flexibility exhibited by these constructions. / Thesis (M.A.)-University of KwaZulu-Natal, Durban, 2012. Generative grammar. Language and languages--Variation. Afrikaans language--Syntax. Germanic languages--Syntax. Theses--Linguistics.
68	From Alfred Schutz to Machine Learning: Temporal Orientation, Meaning and Social Action Cleveland, Jonathan January 2023 This dissertation offers a novel quantitative method for assessing an actor's subjective temporal orientation. Our method involves the use of supervised machine learning techniques in concert with natural language processing tools and linguistic principles. We suggest our method may offer a clandestine technique for extracting aspects of an actor’s temporal orientations from right behind their back. This capacity occurs because of the unique ways time references are reflected in language syntax. This reflection does not simply occur in face-to-face spoken interactions, but also resides in recorded vocal transcripts and within textual documents articulated by speakers for a social audience (e.g., political speeches). From a social theory point of view, we argue that our technique can help objectify some of the major links theorists have long made between the temporal features of mind, subjective meaning, and social processes. Temporal orientation has long been defined as a tripartite mental process. Edmund Husserl famously defined this process as involving retention (a mental focus on the past), presentation (a mental focus on the present) or protention (a mental envisioning of the future). From a pure phenomenology perspective, Husserl’s innovation was to link this mental interlocking process with meaning-making. For Husserl, it was directly through an actor’s temporal orientation that meaning became variably constituted and the problem of subjectivity emerged. From a sociological point of view, it is primarily through Alfred Schutz’s formulation of social phenomenology that Husserl’s tripartite system was opened to accommodate the influence of the social in meaning-making. This opening has possessed a long-standing contradiction. For Schutz, endogenous social structure could affect where an actor temporally orients.
The resulting implication is that social structure could have a direct effect on how actors assign specific meanings in social systems. Even more, social structure could facilitate shared temporal orientations among actors. However, Schutz also promoted the idea that different temporal orientations could explain how different meanings could be assigned to the same social object by disparate actors. This possibility served as the centerpiece of Schutz’s well-known methodological critique of Max Weber’s direct linkage between subjective meaning, motive, and empathetic based interpretations of social action. To carry out our efforts to quantify how the subjective processes of temporal orientation appear to be influenced by endogenous social processes, we employed our algorithm on three different text-based data sets. We suggest these datasets possess strong reflections of the social world. The first dataset entails a collection of matched Twitter tweets that correspond to Trump’s reelection bid and Biden’s challenge during the 2020 period. In this dataset, our method illustrates how both candidates appear to have different temporal orientations despite being bounded by a similar social event. We suggest this finding may reflect the relationship between what Schutz called inner duration and the influence of external stocks of knowledge (i.e., external structures). The second dataset corresponds to a recorded conversational transcript of the Cuban missile crisis, taken from President Kennedy’s Executive Committee of the National Security Council (ExComm) on the 6th of October in 1962. Using our algorithm, we offer objective measures of homogeneous temporal orientations of committee members that are consistent with meso-group conformity. We suggest that our method may offer a novel way of measuring group conformity in general. The third dataset consists of the State of the Union Corpora (SOU).
In this dataset, we apply our algorithm to identify changes in temporal orientation occurring among a single President’s entire collection of SOU speeches. Furthermore, we compare the average temporal orientation of the Presidents in relation to various social categories, such as party affiliation and societal events. The scope of the Presidents inventoried for temporal orientation is restricted from Eisenhower to Biden. Sociology Machine learning Grammar, Comparative and general--Syntax Social classes Social structure Subjectivity Cuban Missile Crisis (1962) Schutz, Alfred, 1899-1959 Weber, Max, 1864-1920 Husserl, Edmund, 1859-1938
69	Projection principle as a source of constituent agreement in syntax : the case of Tshivenda Govhola, Annah Thomani January 2022 Thesis (M.A. (Translation and Linguistics Studies)) -- University of Limpopo, 2022 / The aim of this study was to examine the notion of projection, as underpinned by the Projection Principle, between the subject, the verb, the object, the adjective and the adverb in Tshivenḓa. Data were collected through participant observation, wherein the researcher collected data in the form of clauses and sentences in Tshivenḓa. This study found that verbs and subject prefixes are predicates which project arguments in sentences. These arguments are characterised both linguistically and in the form of word realities. The study further found that Tshivenḓa is a pro drop language because the adjectival argument prefix can locate the subject argument in absentia. In turn, subject arguments and adjectival arguments carry the same class nominal prefix. The projection of elements of a sentence in Tshivenḓa identifies grammatical relations between constituents. Lastly, it is recommended that studies of a similar nature should be conducted in other African languages to establish how elements of a clause or sentence cohere as informed by the Projection Principle. Syntax Predicates Constituents Principle Pro drop Agreement Prefix Argument Projection Nominals TshiVenda language
70	Estudo morfossintatico do Asurini do Xingu / Study of the morphosyntax of the Asurini of Xingu language Pereira, Antonia Alves 13 August 2018 Orientador: Lucy Seki / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Estudos da Linguagem / Made available in DSpace on 2018-08-13T08:50:10Z (GMT). Previous issue date: 2009 / Resumo: Esta tese é um estudo da morfossintaxe da língua Asuriní do Xingu (família Tupi-Guaraní), falada pelos asuriní que residem no Posto Indígena Kwatinemu, no município de Altamira, estado do Pará. A análise pretendeu dar uma visão geral da língua e apresentar aspectos socioculturais de seu povo. Dessa forma, além da morfologia e da sintaxe, partes centrais da tese, procuramos também apresentar a fonologia no nível segmental, pois essa parte era essencial para a continuidade do estudo da língua nos níveis morfológicos e sintáticos. Em conformidade com nossos objetivos, a tese encontra-se dividida em seis capítulos. O capítulo 1 trata de aspectos históricos e socioculturais do grupo, o 2 trata da fonologia no nível segmental, o capítulo 3 discute as classes de palavras da língua, apresentando os critérios para a sua divisão. O capítulo 4 trata de fenômenos relacionados a subconstituintes da oração, nele são discutidos aspectos como a marcação de caso na língua, a oposição nome /verbo x argumento/ predicado, além disso, é mostrada a estrutura dos sintagmas nominal e verbal da língua. O capítulo 5 trata das orações independentes e de como é feita sua classificação. E o capítulo 6 trata das sentenças complexas, que compreendem as coordenadas e as subordinadas. / Abstract: This thesis is a study of the morphosyntax of the Asuriní of Xingu language (Tupi-Guarani family), spoken by the Asuriní who reside at the Posto Indígena Kwatinemu in the municipality of Altamira, Pará State, Brazil.
Chapter 1 summarizes the historical and sociological background of the group. Chapter 2 presents the segmental phonology of the language. Chapter 3 discusses word classes and gives criteria for class division. Chapter 4 deals with phenomena related to sentence constituents, including case marking, the noun/verb vs. argument/predicate opposition, and the structure of noun and verb phrases. Chapter 5 deals with independent clauses and their classification. Chapter 6 describes coordination and subordination in complex sentences. Complex sentences are classified into sub-types, and their morphological and syntactic structure is described. / Doutorado / Linguas Indigenas / Doutor em Linguística Lingua asurini - Gramática Gramática comparada e geral - Fonologia Gramática comparada e geral - Sintaxe Asurini Language
