171 |
Lights, Camera, BERT! : Autonomizing the Process of Reading and Interpreting Swedish Film Scripts
Henzel, Leon January 2023 (has links)
In this thesis, the autonomization of extracting information from PDFs of Swedish film scripts through various machine learning techniques and named entity recognition (NER) is explored. Furthermore, it is explored whether the labeled data needed for the NER tasks can be reduced to some degree, with the goal of saving time. The autonomization process is split into two subsystems: one for extracting larger chunks of text, and one for extracting relevant information as named entities from some of the larger text chunks using NER. The methods explored for accelerating the labeling time for NER are active learning and self learning. For active learning, three methods are explored: Logprob and Word Entropy as uncertainty-based active learning methods, and active learning by processing surprisal (ALPS) as a diversity-based method. For self learning, Logprob and Word Entropy are used, as they are uncertainty-based sampling methods. The results find that ALPS is the highest-performing active learning method when it comes to saving time on labeling data for NER. For self learning, Word Entropy proved a successful method, whereas Logprob could not sufficiently be used for self learning. The entire script-reading system is evaluated by competing against a human extracting information from a film script, where the human and the system compete on time and accuracy. Accuracy is defined as a custom F1-score based on the F1-score for NER. Overall, the system performs orders of magnitude faster than the human, while still retaining fairly high accuracy. The system for extracting named entities had quite low accuracy, which is hypothesised to be mainly due to high data imbalance and too little diversity in the training data.
Faculty of Science and Technology, Uppsala universitet, Uppsala. Supervisor: Björn Mosten. Subject reader: Maria Andrína Fransisco Rodriguez
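To make the uncertainty-based sampling concrete, below is a minimal sketch of Word Entropy-style selection for NER active learning. It assumes a model exposing per-token label probabilities (the `predict_token_probs` callable) and a mean-entropy aggregation over the sentence; both are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of word-entropy uncertainty sampling for NER active learning.
# predict_token_probs(sentence) is assumed to return one label-probability
# distribution per token; it is a placeholder, not the thesis model.
import numpy as np

def token_entropy(prob_dist):
    """Shannon entropy of one token's predicted label distribution."""
    p = np.clip(np.asarray(prob_dist, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def sentence_uncertainty(token_prob_dists):
    """Aggregate token entropies into a sentence-level uncertainty score."""
    return float(np.mean([token_entropy(p) for p in token_prob_dists]))

def select_for_labeling(unlabeled_sentences, predict_token_probs, budget=50):
    """Rank unlabeled sentences by uncertainty and return the top `budget` ones."""
    scored = [
        (sentence_uncertainty(predict_token_probs(sent)), sent)
        for sent in unlabeled_sentences
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sent for _, sent in scored[:budget]]
```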
|
172 |
NLP i sökmotorer : Informationssökning och språkteknologi / NLP in search engines : Information retrieval and language technology
Friberg, Jens January 2024 (has links)
Search engines have become an increasingly important part of how people manage information to meet different needs. In a pressured situation this is put to the test, whether it concerns an acute crisis or simply the time pressure of modern life. In such a situation it is important to be able to find the right information easily. This work examined how the search engines of three municipalities in western Sweden perform for users trying to find their way back to a specific document, one of the municipality's web pages. The scenario assumed users who were already familiar with the content, for example because a helpful acquaintance had read parts of the text to them, because they had a printout of the page from earlier, or because they had visited the page before and remembered parts of the content. Regardless of background, the situation was one where the information became relevant and the user needed up-to-date details. The purpose was to compare the municipal search engines' performance with that of a custom-built search engine that used NLP methods. NLP (Natural Language Processing) refers to the use of linguistic methods in system development. The study was conducted quantitatively by assigning a point scale based on how high up in the search results the sought-after web page appeared. The results showed how each municipality performed alongside the results from the custom search engine. By separating the searches according to which NLP methods were used (individually or in combination), the effectiveness of the specific methods could be clarified. The results showed clear shortcomings in all of the examined municipal search engines, while the NLP-supported search engine consistently performed better. The municipal search engines gave similar results, both in overall performance and in which query terms worked better, as well as in other details. The work showed that NLP methods can give better results, above all because the municipal search engines were deficient.
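As a rough illustration of this kind of evaluation and of an NLP-backed retriever, here is a minimal sketch of a rank-based point scale together with a TF-IDF search function. The exact point scale, preprocessing, and corpus used in the thesis are not specified, so both functions are assumptions.

```python
# Minimal sketch: rank-based scoring of a target page in a result list, plus a
# tiny TF-IDF retriever standing in for the "NLP-supported" search engine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_score(results, target_url, max_points=10):
    """Award more points the higher the target page appears; 0 if absent."""
    for position, url in enumerate(results, start=1):
        if url == target_url:
            return max(max_points - (position - 1), 1)
    return 0

def tfidf_search(query, documents, top_k=10):
    """Rank documents (dict of url -> text) for a query by TF-IDF similarity."""
    urls = list(documents)
    vectorizer = TfidfVectorizer(lowercase=True)
    doc_matrix = vectorizer.fit_transform(documents[u] for u in urls)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    ranked = sorted(zip(urls, scores), key=lambda x: x[1], reverse=True)
    return [url for url, _ in ranked[:top_k]]
```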
|
173 |
Knowledge Extraction from Biomedical Literature with Symbolic and Deep Transfer Learning Methods
Ramponi, Alan 30 June 2021 (has links)
The available body of biomedical literature is increasing at a high pace, exceeding researchers' ability to promptly leverage this knowledge-rich body of information. Despite the outstanding progress in natural language processing (NLP) observed in the past few years, current technological advances in the field mainly concern newswire and web texts, and do not directly translate into good performance on highly specialized domains such as biomedicine, due to linguistic variation at the surface, syntactic and semantic levels. Given the advances in NLP, the challenges the biomedical domain exhibits, and the explosive growth of biomedical knowledge currently being published, in this thesis we contribute to the biomedical NLP field by providing efficient means for extracting semantic relational information from biomedical literature texts. To this end, we made the following contributions towards the real-world adoption of knowledge extraction methods to support biomedicine: (i) we propose a symbolic high-precision biomedical relation extraction approach to reduce the time-consuming manual curation efforts of extracted relational evidence (Chapter 3), (ii) we conduct a thorough cross-domain study to quantify the drop in performance of deep learning methods for biomedical edge detection, shedding light on the importance of linguistic varieties in biomedicine (Chapter 4), and (iii) we propose a fast and accurate end-to-end solution for biomedical event extraction, leveraging sequential transfer learning and multi-task learning, making it a viable approach for real-world large-scale scenarios (Chapter 5). We then outline the conclusions by highlighting challenges and providing future research directions for the field.
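A minimal sketch of the multi-task idea is shown below: one shared encoder trained jointly on two token-level extraction heads. The architecture, label sets, and the treatment of edge detection as a token-level task are illustrative simplifications, not the thesis models.

```python
# Minimal sketch of multi-task learning for extraction: a shared encoder with two
# heads whose losses are summed during training. Sizes and heads are placeholders.
import torch
import torch.nn as nn

class MultiTaskExtractor(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256,
                 n_trigger_labels=10, n_edge_labels=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.trigger_head = nn.Linear(2 * hidden_dim, n_trigger_labels)
        self.edge_head = nn.Linear(2 * hidden_dim, n_edge_labels)

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        return self.trigger_head(states), self.edge_head(states)

# Joint training sums the per-task losses so the shared encoder learns from both.
model = MultiTaskExtractor(vocab_size=30000)
criterion = nn.CrossEntropyLoss()
tokens = torch.randint(0, 30000, (4, 32))      # toy batch: 4 sentences, 32 tokens
trigger_gold = torch.randint(0, 10, (4, 32))
edge_gold = torch.randint(0, 5, (4, 32))
trigger_logits, edge_logits = model(tokens)
loss = (criterion(trigger_logits.transpose(1, 2), trigger_gold)
        + criterion(edge_logits.transpose(1, 2), edge_gold))
loss.backward()
```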
|
174 |
COGNITION, MODALITY, AND LANGUAGE IN HEALTHY YOUNG ADULTS
Finley, Ann, 0000-0003-3368-3912 12 1900 (has links)
Measures drawn from language samples (e.g., discourse measures) are used in clinical and research settings as functional measures of language and cognitive abilities. In narrative elicitation tasks, discourse measures reliably vary by the type of prompt used to collect a language sample. Additionally, language features tend to vary along with communicative context, topic, and modality (e.g., oral vs. written). However, until recent years, technology had not advanced sufficiently to support large-scale study of spoken language data. In this project, we used natural language processing and machine learning methods to examine the intersection of discourse measures, language modality, and cognition (i.e., working memory) in healthy young adults. In Experiment 1, we used a computational approach to examine discourse measures in spoken and written English. We achieved >90% accuracy in binary classification (spoken vs. written). In Experiment 2, we took a behavioral approach, studying working memory and narrative discourse measures in a cohort of healthy young adults. We predicted that working memory would predict informativity in participants’ narrative language samples. We found mixed results for our two measures of informativity, the Measure of Textual Lexical Diversity (MTLD) and Shannon entropy. We attributed the observed differences between these two measures to the fact that, while both serve to measure new or unique information, MTLD indexes additional linguistic information (e.g., semantic, lexical), whereas Shannon entropy is based on word co-occurrence statistics. We interpret our overall results as support for the potential utility of machine learning in language research and for future research and clinical applications. / Communication Sciences
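For illustration, here is a minimal sketch of one informativity measure named above, Shannon entropy, computed over unigram word frequencies. Note this is a common simplified formulation; the abstract describes the measure in terms of word co-occurrence statistics, and MTLD would come from a dedicated lexical diversity tool.

```python
# Minimal sketch: Shannon entropy of the word distribution in a tokenized sample.
import math
from collections import Counter

def shannon_entropy(words):
    """Entropy (in bits) of the unigram word distribution of a sample."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

sample = "the cat sat on the mat and the dog sat too".split()
print(round(shannon_entropy(sample), 3))  # higher values -> more new/unique information
```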
|
175 |
Evaluation of BERT-like models for small scale ad-hoc information retrieval / Utvärdering av BERT-liknande modeller för småskalig ad-hoc informationshämtning
Roos, Daniel January 2021 (has links)
Measuring semantic similarity between two sentences is an ongoing research field with big leaps being taken every year. This thesis looks at using modern methods of semantic similarity measurement for an ad-hoc information retrieval (IR) system. The main challenge tackled was answering the question "What happens when you don’t have situation-specific data?". Using encoder-based transformer architectures pioneered by Devlin et al., which excel at fine-tuning to situation-specific domains, this thesis shows how well the presented methodology can work and makes recommendations for future attempts at similar domain-specific tasks. It also shows an example of how a web application can be created to make use of these fast-learning architectures.
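A minimal sketch of encoder-based semantic search in this spirit, using the sentence-transformers library, follows. The model name, corpus, and query are placeholders rather than the thesis setup.

```python
# Minimal sketch: rank a small corpus against a query with a BERT-like sentence encoder.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-like sentence encoder

corpus = [
    "How to reset a forgotten account password.",
    "Office opening hours during public holidays.",
    "Steps for submitting a travel expense report.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = "I forgot my password, how do I change it?"
query_embedding = model.encode(query, convert_to_tensor=True)

hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))
```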
|
176 |
MEDICAL EVENT TIMELINE GENERATION FROM CLINICAL NARRATIVES
Raghavan, Preethi 05 September 2014 (has links)
No description available.
|
177 |
Algorithms and Resources for Scalable Natural Language Generation
Pfeil, Jonathan W. 01 September 2016 (has links)
No description available.
|
178 |
Enhancing Text Readability Using Deep Learning Techniques
Alkaldi, Wejdan 20 July 2022 (has links)
In the information era, reading has become increasingly important for keeping up with the growing amount of knowledge. The ability to read a document varies from person to person depending on their skills and knowledge. It also depends on the readability level of the text and whether it matches the reader’s level. In this thesis, we propose a system that uses state-of-the-art machine learning and deep learning techniques to classify and simplify a text while taking the reader’s reading level into consideration. The system classifies any text into its readability level. If the text’s readability level is higher than the reader’s level, i.e. too difficult to read, the system performs text simplification to meet the desired readability level. The classification and simplification models are trained on data annotated with readability levels from the Newsela corpus. The trained simplification model operates at the sentence level, simplifying a given text to match a specific readability level. Moreover, the trained classification model is used to classify additional unlabelled sentences from the Wikipedia corpus and the Mechanical Turk corpus in order to enrich the text simplification dataset. The augmented dataset is then used to improve the quality of the simplified sentences. The system generates simplified versions of a text based on the desired readability levels. This can help people with low literacy read and understand any documents they need. It can also be beneficial to educators who assist readers with different reading levels.
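A minimal sketch of the classify-then-simplify flow described above follows. The classifier features, toy labels, and the simplify() stub are illustrative assumptions, not the trained Newsela models.

```python
# Minimal sketch: a readability classifier gates a (stubbed) simplification model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: sentences annotated with readability levels (higher = harder).
train_sentences = ["The cat sat on the mat.",
                   "Photosynthesis converts light energy into chemical energy.",
                   "The dog ran fast.",
                   "Quantum entanglement defies classical locality assumptions."]
train_levels = [1, 4, 1, 5]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(train_sentences, train_levels)

def simplify(sentence, target_level):
    """Placeholder for a trained sentence-level simplification model."""
    return f"[simplified to level {target_level}] {sentence}"

def adapt_text(sentences, reader_level):
    adapted = []
    for sentence in sentences:
        predicted_level = classifier.predict([sentence])[0]
        if predicted_level > reader_level:      # only simplify text above the reader's level
            sentence = simplify(sentence, reader_level)
        adapted.append(sentence)
    return adapted

print(adapt_text(["Quantum entanglement defies classical locality assumptions."],
                 reader_level=2))
```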
|
179 |
Stock Price Movement Prediction Using Sentiment Analysis and Machine Learning
Wang, Jenny Zheng 01 June 2021 (links) (PDF)
Stock price prediction is of strong interest but a challenging task for both researchers and investors. Recently, sentiment analysis and machine learning have been adopted in stock price movement prediction. In particular, retail investors’ sentiment from online forums has shown its power to influence the stock market. In this paper, a novel system was built to predict stock price movement for the following trading day. The system includes a web scraper, an enhanced sentiment analyzer, a machine learning engine, an evaluation module, and a recommendation module. The system can automatically select the best prediction model from four state-of-the-art machine learning models (Long Short-Term Memory, Support Vector Machine, Random Forest, and Extreme Gradient Boosting) based on the acquired data and the models’ performance. Moreover, stock market lexicons were created using large-scale text mining on the Yahoo Finance Conversation boards and natural language processing. Experiments using the top 30 stocks on the Yahoo users’ watchlists and a randomly selected stock from NASDAQ were performed to examine the system performance and the proposed methods. The experimental results show that incorporating sentiment analysis can improve the prediction for stocks with a large daily discussion volume. The Long Short-Term Memory model outperformed the other machine learning models when using both price and sentiment analysis as inputs. In addition, the Extreme Gradient Boosting (XGBoost) model achieved the highest accuracy using the price-only feature on low-volume stocks. Last but not least, the models using the enhanced sentiment analyzer outperformed the VADER sentiment analyzer by 1.96%.
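A minimal sketch of an LSTM that combines price and sentiment features to predict next-day movement, in the spirit of the system above; the window size, feature set, and architecture are assumptions rather than the paper's configuration.

```python
# Minimal sketch: LSTM over a window of [daily return, daily sentiment] features,
# trained to predict whether the price moves up the next trading day.
import torch
import torch.nn as nn

class MovementLSTM(nn.Module):
    def __init__(self, n_features=2, hidden_dim=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):               # x: (batch, days, features)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])       # logit for "price goes up tomorrow"

model = MovementLSTM()
window = torch.randn(8, 10, 2)          # 8 samples, 10-day window, [return, sentiment]
labels = torch.randint(0, 2, (8, 1)).float()
loss = nn.BCEWithLogitsLoss()(model(window), labels)
loss.backward()
```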
|
180 |
Explainable Neural Claim Verification Using Rationalization
Gurrapu, Sai Charan 15 June 2022 (links)
The dependence on Natural Language Processing (NLP) systems has grown significantly in the last decade. Recent advances in deep learning have enabled language models to generate high-quality text at the same level as human-written text. If this growth continues, it can potentially lead to increased misinformation, which is a significant challenge. Although claim verification techniques exist, they lack proper explainability. Numerical scores such as attention weights and LIME, and visualization techniques such as saliency heat maps, are insufficient because they require specialized knowledge. It is inaccessible and challenging for the nonexpert to understand black-box NLP systems. We propose a novel approach called ExClaim for explainable claim verification using NLP rationalization. We demonstrate that our approach can not only predict a verdict for the claim but also justify and rationalize its output as a natural language explanation (NLE). We extensively evaluate the system using statistical and Explainable AI (XAI) metrics to ensure the outcomes are valid, verified, and trustworthy to help reinforce human-AI trust. We propose a new subfield in XAI called Rational AI (RAI) to improve research progress on rationalization and NLE-based explainability techniques. Ensuring that claim verification systems are assured and explainable is a step towards trustworthy AI systems and ultimately helps mitigate misinformation. / Master of Science / The dependence on Natural Language Processing (NLP) systems has grown significantly in the last decade. Recent advances in deep learning have enabled text generation models to generate high-quality text on par with human-written text. If this growth continues, it can potentially lead to increased misinformation, which is a major societal challenge. Although claim verification techniques exist, they lack proper explainability. It is difficult for the average user to understand the model's decision-making process. Numerical scores and visualization techniques exist to provide explainability, but they are insufficient because they require specialized domain knowledge. This makes it inaccessible and challenging for the nonexpert to understand black-box NLP systems. We propose a novel approach called ExClaim for explainable claim verification using NLP rationalization. We demonstrate that our approach can not only predict a verdict for the claim but also justify and rationalize its output as a natural language explanation (NLE). We extensively evaluate the system using statistical and Explainable AI (XAI) metrics to ensure the outcomes are valid, verified, and trustworthy to help reinforce human-AI trust. We propose a new subfield in XAI called Rational AI (RAI) to improve research progress on rationalization and NLE-based explainability techniques. Ensuring that claim verification systems are assured and explainable is a step towards trustworthy AI systems and ultimately helps mitigate misinformation.
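A minimal sketch of a rationale-then-verdict flow along these lines follows: select the evidence sentence most relevant to the claim (a crude "rationale"), classify the claim against it, and emit a short natural-language explanation. The word-overlap heuristic and the off-the-shelf zero-shot NLI model are stand-ins, not the ExClaim implementation.

```python
# Minimal sketch: pick a rationale sentence, classify the claim against it with a
# zero-shot NLI model, and return the verdict plus a natural-language explanation.
from transformers import pipeline

nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def verify(claim, evidence_sentences):
    # Naive rationale selection by word overlap; a learned rationalizer would replace this.
    claim_words = set(claim.lower().split())
    rationale = max(evidence_sentences,
                    key=lambda s: len(claim_words & set(s.lower().split())))
    result = nli(rationale,
                 candidate_labels=["supported", "refuted"],
                 hypothesis_template=f"The claim '{claim}' is {{}}.")
    verdict = result["labels"][0]
    explanation = f"The claim appears {verdict} because the evidence states: \"{rationale}\""
    return verdict, explanation

print(verify("Vitamin C cures the common cold.",
             ["Controlled trials show vitamin C does not cure the common cold.",
              "Vitamin C is found in citrus fruits."]))
```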
|