1 |
Identifying gender ideology in web content debates about feminism / Martinson, Anna M. January 2009
Thesis (Ph.D.)--Indiana University, School of Library and Information Science, 2009. / Title from PDF t.p. (viewed on Feb. 4, 2010). Source: Dissertation Abstracts International, Volume: 70-04, Section: A, page: 1075. Adviser: Susan C. Herring.
2 |
Computational approaches to linguistic consensus / Wang, Jun. January 2006
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2006. / Source: Dissertation Abstracts International, Volume: 68-02, Section: A, page: 0386. Adviser: Les Gasser. Includes bibliographical references (leaves 102-105). Available on microfilm from ProQuest Information and Learning.
3 |
Leveraging Structure for Effective Question Answering / Bonadiman, Daniele. 25 September 2020
In this thesis, we focus on Answer Sentence Selection (A2S), the core task of retrieval-based question answering. A2S consists of selecting the sentences that answer a user query from a collection of documents retrieved by a search engine. Over more than two decades, many machine-learning solutions have been proposed for this task, ranging from simple approaches based on manual feature engineering to more complex Structural Tree Kernel models and, more recently, neural network architectures.
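To make the task concrete, A2S can be framed as ranking: score every candidate sentence against the question, then sort. The minimal sketch below assumes a hypothetical bag-of-words cosine scorer, the kind of simple lexical baseline that predates the models discussed here; it is not any of the architectures proposed in this thesis, and the example question and candidates are invented.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric tokens only (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(question: str, candidates: list[str]) -> list[tuple[str, float]]:
    # Score every candidate sentence against the question and sort best-first.
    q = Counter(tokenize(question))
    scored = [(s, cosine(q, Counter(tokenize(s)))) for s in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


question = "Who wrote the novel Dracula?"
candidates = [
    "Dracula is an 1897 Gothic horror novel by Bram Stoker.",
    "Bram Stoker was born in Dublin.",
    "The novel Dracula was written by Bram Stoker.",
]
for sentence, score in rank_candidates(question, candidates):
    print(f"{score:.3f}  {sentence}")
```

Note that "wrote" and "written" never match under exact token overlap; bridging lexical gaps like this is one motivation for learned representations.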
Neural architectures in particular require little human effort, as they can automatically extract relevant features from plain text. Their development brought improvements in many areas of A2S, reaching unprecedented results and substantially increasing accuracy on almost all benchmark datasets for the task. However, this has come at the cost of a huge increase in the number of parameters and in the computational cost of the models. The large number of parameters has led to two drawbacks: the models require massive amounts of data to train effectively, and huge computational power to sustain an acceptable number of transactions per second in a production environment. Current state-of-the-art techniques for A2S use huge Transformer architectures pre-trained on massive amounts of data, such as BERT, with up to 340 million parameters. BERT and related models in the same family, such as RoBERTa, are general architectures, i.e., they can be applied to many NLP tasks without any architectural change.
In contrast to this trend, we focus on specialized architectures for A2S that can effectively encode both the local structure of the question and answer candidate and global information, i.e., the structure of the task and the context in which the answer candidate appears.
In particular, we propose solutions that effectively encode both the local and the global structure of A2S in efficient neural network models. (i) We encode syntactic information in a fast CNN architecture, exploiting the capability of Structural Tree Kernels to encode syntactic structure. (ii) We propose an efficient model that uses semantic relational information between question and answer candidates by pre-training word representations on a relational knowledge base. (iii) We further extend this efficient approach to encode each answer candidate's contextual information by encoding all answer candidates in their original context. Lastly, (iv) we propose a solution to encode task-specific structure when it is available, for example, in the community Question Answering task.
The final model, which encodes these different aspects of the task, achieves state-of-the-art performance on A2S compared with other efficient architectures. It is more efficient than attention-based architectures and outperforms BERT by two orders of magnitude in transactions per second during both training and testing: when training on a single GPU, it processes 700 questions per second, compared with 6 for BERT.
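As a quick arithmetic check of the "two orders of magnitude" claim, using only the throughput figures reported for training on a single GPU:

```latex
\frac{700~\text{questions/s}}{6~\text{questions/s}} \approx 117 \approx 10^{2.07}
```

That is, a speed-up of roughly 117 times, just over $10^2$.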
4 |
An Evaluation of Existing Light Stemming Algorithms for Arabic Keyword Searches / Rogerson, Brittany E. 17 November 2008
The field of Information Retrieval recognizes the importance of stemming for improving retrieval effectiveness. When applied to searches conducted in Arabic, stemming increases the relevance of the documents returned and broadens searches to encompass the general meaning of a word rather than the word itself. Because Arabic relies mainly on triconsonantal roots for verb forms and derives nouns by adding affixes, words that share the same consonants are closely related in meaning. Stemming therefore lets a search focus on the meaning of a term and closely related terms rather than on exact character matches. This paper discusses the strengths of light stemming and the best techniques and components for algorithmic, affix-based stemmers used in Arabic keyword searching.
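A light stemmer strips a small set of frequent prefixes and suffixes without trying to recover the full triconsonantal root. The sketch below illustrates that idea only; its affix lists and minimum-length thresholds are assumptions loosely in the spirit of published light stemmers such as light10, not the specific algorithms evaluated in this paper.

```python
# Hypothetical affix lists for illustration; real light stemmers differ in detail.
CONJUNCTION = "و"  # "and", written attached to the following word
ARTICLE_PREFIXES = ["بال", "كال", "فال", "لل", "ال"]  # checked longest first
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str) -> str:
    # 1. Strip a leading conjunction if at least three letters remain.
    if word.startswith(CONJUNCTION) and len(word) - 1 >= 3:
        word = word[1:]
    # 2. Strip at most one definite-article prefix if at least two letters remain.
    for prefix in ARTICLE_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    # 3. Try each suffix once, in list order, keeping at least two letters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word


for w in ["الكتاب", "والكتاب", "الكتابة", "المعلمات"]:
    print(w, "->", light_stem(w))
```

The first three surface forms all reduce to كتاب and the last to معلم, so a keyword search on the stem matches related forms rather than one exact character string.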
5 |
Multi-stage modeling of HTML documents / Levering, Ryan Reed. January 2004
Thesis (M.S.)--State University of New York at Binghamton, Department of Computer Science, 2004. / Includes bibliographical references.
6 |
Word length and the principle of least effort: language as an evolving, efficient code for information transfer / Kanwal, Jasmeen Kaur. January 2018
In 1935 the linguist George Kingsley Zipf made a now-classic observation about the relationship between a word's length and its frequency: the more frequent a word is, the shorter it tends to be. He claimed that this 'Law of Abbreviation' is a universal structural property of language. The Law of Abbreviation has since been documented in a wide range of human languages, and extended to animal communication systems and even computer programming languages. Zipf hypothesised that this universal design feature arises as a result of individuals optimising form-meaning mappings under competing pressures to communicate accurately but also efficiently: his famous Principle of Least Effort. In this thesis, I present a novel set of studies which provide direct experimental evidence for this explanatory hypothesis. Using a miniature artificial language learning paradigm, I show in Chapter 2 that language users optimise form-meaning mappings in line with the Law of Abbreviation only when pressures for accuracy and efficiency both operate during a communicative task. These results are robust across different methods of data collection: one version of the experiment was run in the lab, and another was run online, using a novel method I developed which allows participants to engage in dyadic interaction through a web-based interface. In Chapter 3, I address the growing body of work suggesting that a word's predictability in context may be an even stronger determinant of its length than its frequency alone. For instance, Piantadosi et al. (2011) show that, in synchronic corpora across many languages, shorter words have a lower average surprisal (i.e., tend to appear in more predictive contexts) than longer words. We hypothesise that the same communicative pressures posited by the Principle of Least Effort, when acting on speakers in situations where context manipulates the information content of words, can give rise to these lexical distributions.
Adapting the methodology developed in Chapter 2, I show that participants use shorter words in more predictive contexts only when subject to the competing pressures for accurate and efficient communication. In a second experiment, I show that participants are more likely to use shorter words for meanings with a lower average surprisal. These results suggest that communicative pressures acting on individuals during language use can lead to the re-mapping of a lexicon to align with 'Uniform Information Density', the principle that information content ought to be evenly spread across an utterance, such that shorter linguistic units carry less information than longer ones. Over generations, linguistic behaviour such as that observed in the experiments reported here may bring entire lexicons into alignment with the Law of Abbreviation and Uniform Information Density. For this to happen, a diachronic process which leads to permanent lexical change is necessary. However, crucial evidence for this process (decreasing word length as a result of increasing frequency over time) has never before been systematically documented in natural language. In Chapter 4, I conduct the first large-scale diachronic corpus study investigating the relationship between word length and frequency over time, using the Google Books Ngrams corpus and three different word lists covering both English and French. Focusing on words which have both long and short variants (e.g., info/information), I show that the frequency of a word lemma may influence the rate at which the shorter variant gains in popularity. This suggests that the lexicon as a whole may indeed be gradually evolving towards greater efficiency. Taken together, the behavioural and corpus-based evidence presented in this thesis supports the hypothesis that communicative pressures acting on language users are at least partially responsible for the frequency-length and surprisal-length relationships found universally across lexicons.
More generally, the approach taken in this thesis promotes a view of language as, among other things, an evolving, efficient code for information transfer.
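The two predictors of word length discussed here, frequency and average surprisal in context, can both be estimated from a corpus. The sketch below assumes a toy corpus and a bigram model in which the context is just the previous word, with unsmoothed maximum-likelihood probabilities; real studies use higher-order n-gram models over very large corpora. A word's average surprisal is the mean of -log2 P(w | previous word) over its occurrences.

```python
import math
from collections import Counter


def word_stats(tokens: list[str]) -> dict[str, tuple[int, int, float]]:
    """Return {word: (length, frequency, average bigram surprisal in bits)}.

    Words that never occur after another token (no context) are omitted.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # How often each word serves as the context (first element) of a bigram.
    contexts = Counter(tokens[:-1])
    surprisals: dict[str, list[float]] = {}
    for prev, word in zip(tokens, tokens[1:]):
        # Maximum-likelihood estimate of P(word | prev), no smoothing.
        p = bigrams[(prev, word)] / contexts[prev]
        surprisals.setdefault(word, []).append(-math.log2(p))
    return {w: (len(w), unigrams[w], sum(s) / len(s)) for w, s in surprisals.items()}


tokens = "the cat saw the dog and the cat ran".split()
for word, (length, freq, avg) in sorted(word_stats(tokens).items()):
    print(f"{word:>4}  len={length}  freq={freq}  avg_surprisal={avg:.2f} bits")
```

Across a real word list, one would then correlate length with frequency (the Law of Abbreviation) and with average surprisal, typically with a rank correlation.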
7 |
Der Einfluss der Informationsstruktur auf das Verständnis von Aktiv- und Passivsätzen im ungestörten Spracherwerb (The influence of information structure on German-speaking children's comprehension of active and passive sentences) / Meinhardt, Miriam. January 2010
Children acquire passive constructions later than most other syntactic structures. The present study investigates whether this can be explained by information-structural factors. Earlier studies attributed problems in acquiring the passive voice to, among other things, its low input frequency or to specific syntactic characteristics of passive sentences. However, none of these accounts has been able to fully explain the late age of acquisition of passive structures.
Active sentences, the canonical, unmarked structure in German, can be used in any discourse context, whereas passive sentences are used almost exclusively when the patient of the described action is GIVEN in the context and/or serves as the TOPIC of the sentence. Passive sentences are therefore not information-structurally appropriate in every context.
Because their syntactic abilities are less developed, children find it harder than adults to process sentences that do not occur in an information-structurally appropriate context. The present study examines the influence of context on the sentence comprehension of German-speaking children. Children aged 3;0 to 4;11 were presented with active or passive sentences preceded by an information-structurally appropriate, inappropriate, or neutral context sentence.
As expected, the children comprehended active sentences better than passive sentences, and 4-year-olds performed better than 3-year-olds. There was a tendency for 3-year-olds to comprehend passive sentences better but active sentences worse when the subject of the sentence was GIVEN in the context. However, in contrast to a comparable study with English-speaking children (Gourley and Catlin, 1978), there were no statistically significant context effects in any test condition. In addition, the German-speaking children comprehended passive sentences better overall, and active sentences worse overall, than English-speaking children in other studies.
The results are explained with the Competition Model (MacWhinney and Bates, 1987) and Stromswold's (2002) theory of language processing. The study also discusses why the German-speaking children showed different language comprehension abilities from English-speaking children.
8 |
Set-valued extensions of fuzzy logic classification theorems / Ornelas, Gilbert. January 2007
Thesis (M.S.)--University of Texas at El Paso, 2007. / Title from title screen. Vita. CD-ROM. Includes bibliographical references. Also available online.
9 |
The role of syntactic appropriateness and frequency in word recognition / Farrar, William Thomas. January 1900
Thesis (Ph. D.)--University of California, Santa Cruz, 1993. / Typescript. Includes bibliographical references (68-71).
10 |
Extrakce informací z biomedicínských textů (Information Extraction from Biomedical Texts) / Knoth, Petr. January 2008
Recently, there has been much effort to make biomedical knowledge, typically stored in scientific articles, more accessible and interoperable. However, the unstructured nature of such texts makes it difficult to apply knowledge discovery and inference techniques. Annotating information units in these texts with semantic information is the first step toward making this knowledge machine-analyzable. In this work, we first study methods for automatic information extraction from natural language text. We then discuss the main advantages and disadvantages of state-of-the-art information extraction systems and, as a result, adopt a machine learning approach to automatically learn extraction patterns in our experiments. Unfortunately, machine learning techniques often require a large amount of training data, which can be laborious to gather. To address this problem, we investigate weakly supervised, or bootstrapping, techniques. Finally, we show in our experiments that our machine learning methods performed reasonably well and significantly better than the baseline. Moreover, in the weakly supervised learning task we were able to substantially reduce the amount of labeled data needed to train the extraction system.
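The bootstrapping idea can be sketched as a loop that alternates between learning patterns from known examples and applying those patterns to find new examples. This is a hypothetical illustration, not the system described in the thesis: entities are assumed to be found by dictionary lookup, a "pattern" is assumed to be the exact token sequence between two entity mentions, and the sentences, entity names, and seed pair are invented.

```python
from itertools import combinations


def mentions(tokens, entities):
    # Positions of tokens that are known entity names (assumed dictionary lookup).
    return [i for i, tok in enumerate(tokens) if tok in entities]


def candidate_pairs(sentences, entities):
    # Yield (left entity, right entity, middle context) for every ordered mention pair.
    for tokens in sentences:
        for i, j in combinations(mentions(tokens, entities), 2):
            yield tokens[i], tokens[j], tuple(tokens[i + 1 : j])


def bootstrap(sentences, entities, seeds, max_iter=5):
    known = set(seeds)
    patterns = set()
    for _ in range(max_iter):
        pairs = list(candidate_pairs(sentences, entities))
        # Learn: contexts observed between pairs we already believe in.
        patterns |= {mid for a, b, mid in pairs if (a, b) in known}
        # Apply: any pair sharing a learned context becomes a new instance.
        new = {(a, b) for a, b, mid in pairs if mid in patterns} - known
        if not new:
            break
        known |= new
    return known, patterns


sentences = [s.split() for s in [
    "p53 activates MDM2 in tumour cells",
    "p53 activates BAX after DNA damage",
    "p53 can induce BAX",
    "E2F1 can induce TP73",
    "MDM2 binds E2F1",
]]
entities = {"p53", "MDM2", "BAX", "E2F1", "TP73"}
known, patterns = bootstrap(sentences, entities, {("p53", "MDM2")})
print(sorted(known))
print(sorted(patterns))
```

Starting from one seed pair, the loop finds (p53, BAX) via "activates", learns "can induce" from that pair, and then finds (E2F1, TP73). The same mechanism can drift toward unintended relations, which is why practical bootstrapping systems usually score or filter patterns.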