1 |
The Effects of Cue Diagnosticity on Accuracy of Judgments of Text Learning: Evidence Regarding the Cue Utilization Hypothesis and Momentary Accessibility
Baker, Julie Marie 15 July 2008
No description available.
|
2 |
The Impact of the Retrieval Text Set for Text Sentiment Classification With the Retrieval-Augmented Language Model REALM / Effekten av hämtningstextsetet för sentimenttextklassificering med den hämtningsförstärkta språkmodellen REALM
Blommegård, Oscar January 2023
Large Language Models (LLMs) have demonstrated impressive results across various language technology tasks. By training on large corpora of diverse text collections from the internet, these models learn to process text effectively, allowing them to acquire comprehensive world knowledge. However, this knowledge is stored implicitly in the parameters of the model, and it is necessary to train ever-larger networks to capture more information. Retrieval-augmented language models have been proposed as a way of improving the interpretability and adaptability of standard language models by utilizing a separate retrieval text set at prediction time. These models have demonstrated state-of-the-art results on knowledge-intensive tasks such as question answering and fact-checking. However, their effectiveness in text classification remains unexplored. This study investigates the impact of the retrieval text set on the performance of the retrieval-augmented language model REALM for sentiment text classification tasks. The results indicate that the addition of retrieval text data fails to improve the prediction capabilities of REALM for sentiment text classification tasks. This outcome is mainly due to the difference in functionality of the retrieval mechanism during pre-training and fine-tuning. During pre-training, the neural knowledge retriever focuses on retrieving factual knowledge such as dates, cities and names to enhance the prediction of the model. During fine-tuning, the retriever aims to retrieve texts that can strengthen the prediction of the text sentiment classification task. The findings suggest that retrieval models may hold limited potential to enhance performance for text sentiment classification tasks.
|
3 |
Uncovering and Managing the Impact of Methodological Choices for the Computational Construction of Socio-Technical Networks from Texts
Diesner, Jana 01 September 2012
This thesis is motivated by the need for scalable and reliable methods and technologies that support the construction of network data based on information from text data. Ultimately, the resulting data can be used for answering substantive and graph-theoretical questions about socio-technical networks.
One main limitation of constructing network data from text data is that validating the resulting network data can be difficult or even infeasible, e.g., in the cases of covert, historical and large-scale networks. This thesis addresses this problem by identifying the impact that coding choices, which must be made when extracting network data from text data, have on the structure of networks and on network analysis results. My findings suggest that conducting reference resolution on text data can alter the identity and weight of 76% of the nodes and 23% of the links, and can cause major changes in the value of commonly used network metrics. Also, performing reference resolution prior to relation extraction leads to the retrieval of completely different sets of key entities in comparison to not applying this pre-processing technique. Based on the outcome of the presented experiments, I recommend strategies for avoiding or mitigating the identified issues in practical applications.
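The effect of reference resolution on node identity and weight can be illustrated with a toy example. The following is a minimal sketch, not the thesis's actual pipeline: it assumes mentions have already been extracted per sentence, links entities that co-occur in a sentence, and uses a hypothetical hand-written alias table (`ALIASES`) in place of a real reference resolution component.

```python
from collections import Counter
from itertools import combinations

# Hypothetical alias table: maps surface mentions to one canonical entity.
ALIASES = {"Dr. Smith": "Jane Smith", "Smith": "Jane Smith"}


def build_network(sentences, resolve=False):
    """Link every pair of entities mentioned in the same sentence.

    Returns (node_weights, edge_weights) as Counters.
    """
    nodes, edges = Counter(), Counter()
    for mentions in sentences:
        if resolve:
            # Replace each mention by its canonical name when one is known.
            mentions = [ALIASES.get(m, m) for m in mentions]
        unique = sorted(set(mentions))
        nodes.update(unique)
        edges.update(combinations(unique, 2))
    return nodes, edges


# Toy input: one list of extracted mentions per sentence.
SENTENCES = [
    ["Jane Smith", "Acme"],
    ["Dr. Smith", "Acme"],
    ["Smith", "Bob Lee"],
]

raw_nodes, raw_edges = build_network(SENTENCES)
res_nodes, res_edges = build_network(SENTENCES, resolve=True)
print(len(raw_nodes), len(res_nodes))  # node counts without and with resolution
```

In this toy input, resolution merges three separate mention nodes into a single, more heavily weighted node, which is the kind of change in node identity and weight the abstract reports at scale.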
When extracting socio-technical networks from texts, the set of relevant node classes might go beyond the classes that are typically supported by tools for named entity extraction. I address this lack of technology by developing an entity extractor that combines an ontology for socio-technical networks, which originates from the social sciences, is theoretically grounded and has been empirically validated in prior work, with a supervised machine learning technique based on probabilistic graphical models. Beyond showing that the resulting prediction models achieve state-of-the-art accuracy rates, I also describe the process of integrating these models into an existing and publicly available end-user product. As a result, users can apply these models to new text data in a convenient fashion.
While a plethora of methods for building network data from information explicitly or implicitly contained in text data exists, there is a lack of research on how the resulting networks compare with respect to their structure and properties. This also applies to networks that can be extracted by using the aforementioned entity extractor as part of the relation extraction process. I address this knowledge gap by comparing the networks extracted by using this process to network data built with three alternative methods: text coding based on thesauri that associate text terms with node classes, the construction of network data from meta-data on texts, such as keywords and index terms, and building network data in collaboration with subject matter experts. The outcomes of these comparative analyses suggest that thesauri generated with the entity extractor developed for this thesis need adjustments with respect to particular categories and types of errors. I provide tools and strategies to assist with these refinements. My results also show that once these changes have been made, and in contrast to manually constructed thesauri, the prediction models generalize with acceptable accuracy to other domains (news wire data, scientific writing, emails) and writing styles (formal, casual). The comparisons of networks constructed with different methods show that ground truth data built by subject matter experts are hardly reproduced by any automated method that analyzes text bodies, and even less so by exploiting existing meta-data from text corpora. Thus, aiming to reconstruct social networks from text data leads to largely incomplete networks. Synthesizing the findings from this work, I outline which types of information on socio-technical networks are best captured by which network data construction method, and how to best combine these methods in order to gain a more comprehensive view of a network.
When both text data and relational data are available as sources of information on a network, people have previously integrated these data by enhancing social networks with content nodes that represent salient terms from the text data. I present a methodological advancement to this technique and test its performance on the datasets used for the previously mentioned evaluation studies. By using this approach, multiple types of behavioral data, namely interactions between people as well as their language use, can be taken into account. I conclude that extracting content nodes from groups of structurally equivalent agents can be an appropriate strategy for enabling the comparison of the content that people produce, perceive or disseminate. These equivalence classes can represent a variety of social roles and social positions that network members occupy. At the same time, extracting content nodes from groups of structurally coherent agents can be suitable for enabling the enhancement of social networks with content nodes. The results from applying the latter approach to text data include a comparison of the outcome of topic modeling, an efficient and unsupervised information extraction technique, to the outcomes of alternative methods, including entity extraction based on supervised machine learning. My findings suggest that key entities from meta-data knowledge networks might serve as proper labels for unlabeled topics. Also, unsupervised and supervised learning lead to the retrieval of similar entities as highly likely members of highly likely topics and as key nodes from text-based knowledge networks, respectively.
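The idea of extracting content nodes from groups of structurally equivalent agents can be sketched as follows. This is an illustrative simplification, not the method developed in the thesis: it assumes that two agents count as equivalent only when their neighbor sets are identical, and that each agent's text has already been reduced to a list of terms. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def equivalence_classes(adjacency):
    """Group agents that have exactly the same set of neighbors."""
    groups = defaultdict(list)
    for agent, neighbors in adjacency.items():
        groups[frozenset(neighbors)].append(agent)
    return [sorted(members) for members in groups.values()]


def content_nodes(classes, terms_by_agent, top_n=1):
    """Pool the terms of each class and keep its most frequent ones."""
    result = {}
    for members in classes:
        counts = Counter()
        for agent in members:
            counts.update(terms_by_agent.get(agent, []))
        result[tuple(members)] = [term for term, _ in counts.most_common(top_n)]
    return result


# Toy social network: ann and ben talk to the same people, as do carl and dana.
ADJACENCY = {
    "ann": {"carl", "dana"},
    "ben": {"carl", "dana"},
    "carl": {"ann", "ben"},
    "dana": {"ann", "ben"},
}
# Toy term lists standing in for each agent's text data.
TERMS = {
    "ann": ["budget", "merger"],
    "ben": ["budget"],
    "carl": ["launch"],
    "dana": ["launch", "press"],
}

classes = equivalence_classes(ADJACENCY)
content = content_nodes(classes, TERMS)
print(content)
```

Each resulting key is one equivalence class, and its value lists the salient terms that could be attached to that class as content nodes, which makes the content of different social positions directly comparable.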
In summary, the contributions made with this thesis help people to collect, manage and analyze rich network data at any scale. This is a precondition for asking substantive and graph-theoretical questions, testing hypotheses, and advancing theories about networks. This thesis uses an interdisciplinary and computationally rigorous approach to work towards this goal, thereby advancing the intersection of network analysis, natural language processing and computing.
|
4 |
Knihovna algoritmů pro šifrování textu / Library of Algorithms for Text Ciphering
Vozák, Petr January 2011
This thesis deals with text ciphering. It first describes the basic theoretical background of cryptology and the basic classification of cryptographic algorithms, followed by a brief history of encryption from its beginnings to the present. It then discusses the theoretical description of ciphering methods and their implementation details. All basic types of conventional encryption algorithms and some modern ciphering methods are included: substitution, transposition, steganographic, and combined encryption systems. The result of this thesis is a library of algorithms for text ciphering in Java with a sample application that demonstrates its functionality.
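Two of the conventional cipher families named above, substitution and transposition, can be illustrated briefly. The sketch below is written in Python for compactness and is not taken from the thesis's Java library; the function names are hypothetical. It shows a Caesar shift as the simplest substitution cipher and a keyless columnar scheme as a simple transposition cipher.

```python
import string


def caesar(text, shift):
    """Substitution: shift each ASCII letter by `shift` positions."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # leave digits, spaces and punctuation unchanged
    return "".join(out)


def columnar_encrypt(text, cols):
    """Transposition: write row by row into `cols` columns, read by column."""
    return "".join(text[c::cols] for c in range(cols))


def columnar_decrypt(cipher, cols):
    """Invert columnar_encrypt by rebuilding the columns and reading by row."""
    rows, extra = divmod(len(cipher), cols)
    columns, pos = [], 0
    for c in range(cols):
        # The first `extra` columns hold one more character than the rest.
        length = rows + (1 if c < extra else 0)
        columns.append(cipher[pos:pos + length])
        pos += length
    return "".join(columns[i % cols][i // cols] for i in range(len(cipher)))


secret = caesar("Hello, World!", 3)
scrambled = columnar_encrypt("HELLOWORLD", 3)
print(secret, scrambled, columnar_decrypt(scrambled, 3))
```

A combined system of the kind the abstract mentions could be obtained by applying both steps in sequence; neither cipher is secure by modern standards, and both are shown only to make the two families concrete.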
|