171

Lights, Camera, BERT! : Autonomizing the Process of Reading and Interpreting Swedish Film Scripts

Henzel, Leon January 2023 (has links)
In this thesis, the autonomization of extracting information from PDFs of Swedish film scripts through various machine learning techniques and named entity recognition (NER) is explored. Furthermore, it is explored whether the labeled data needed for the NER tasks can be reduced to some degree, with the goal of saving time. The autonomization process is split into two subsystems, one for extracting larger chunks of text and one for extracting relevant information through named entities from some of the larger text chunks using NER. The methods explored for accelerating the labeling time for NER are active learning and self-learning. For active learning, three methods are explored: Logprob and Word Entropy as uncertainty-based active learning methods, and Active Learning by Processing Surprisal (ALPS) as a diversity-based method. For self-learning, Logprob and Word Entropy are used, as they are uncertainty-based sampling methods. The results show that ALPS is the highest-performing active learning method when it comes to saving time on labeling data for NER. For self-learning, Word Entropy proved a successful method, whereas Logprob could not sufficiently be used for self-learning. The entire script-reading system is evaluated by competing against a human extracting information from a film script, where the human and the system compete on time and accuracy. Accuracy is defined as a custom F1-score based on the F1-score for NER. Overall, the system performs orders of magnitude faster than the human, while still retaining fairly high accuracy. The system for extracting named entities had quite low accuracy, which is hypothesised to be mainly due to high data imbalance and too little diversity in the training data. / Faculty of Science and Technology, Uppsala University, place of publication: Uppsala. Supervisor: Björn Mosten. Subject reader: Maria Andrína Fransisco Rodriguez.
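The uncertainty-based acquisition strategy mentioned above (Word Entropy) can be illustrated with a short sketch: score each unlabeled sentence by the average entropy of the model's per-token label distributions and send the highest-scoring sentences for annotation. The function and variable names below are illustrative assumptions, not code from the thesis.

```python
import math

def word_entropy(token_label_probs):
    """Average per-token entropy of predicted NER label distributions.

    token_label_probs: for one sentence, a list of dicts mapping label -> probability.
    """
    entropies = []
    for probs in token_label_probs:
        entropies.append(-sum(p * math.log(p) for p in probs.values() if p > 0))
    return sum(entropies) / len(entropies) if entropies else 0.0

def select_for_labeling(unlabeled_sentences, predict_label_probs, budget=50):
    """Pick the `budget` most uncertain sentences for manual annotation."""
    scored = [(word_entropy(predict_label_probs(s)), s) for s in unlabeled_sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:budget]]
```

Logprob would instead rank sentences by the (low) log-probability of the model's most likely label sequence, while ALPS replaces the uncertainty score with a diversity criterion.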
172

NLP i sökmotorer : Informationssökning och språkteknologi / NLP in search engines : Information retrieval and language technology

Friberg, Jens January 2024 (has links)
Search engines have become an increasingly important part of how people manage information to meet various needs. In a pressured situation this is put to the test, whether it concerns an acute crisis or simply the time pressure of modern life. In such a situation it is important to be able to find the right information easily. This thesis examined how the search engines of three municipalities in western Sweden perform for users trying to find their way back to a specific document, one of the municipality's web pages. The scenario involved users who were already familiar with the content, for example because a helpful acquaintance had read parts of the text to them, because they had the page printed out from earlier, or because they had visited the page before and remembered parts of its content. Whatever the background, the situation was one in which the information became relevant and the user needed up-to-date details. The purpose was to compare the performance of the municipal search engines with a purpose-built search engine that used NLP methods. NLP, Natural Language Processing, refers to the use of linguistic methods in systems development. The study was carried out quantitatively by defining a point scale for how high in the search results the sought-after web page appeared. The results show how each municipality performed alongside the results of the purpose-built search engine. By separating the searches according to which NLP methods were used (individually or in combination), the effectiveness of the specific methods could be clarified. The results revealed clear shortcomings in all of the examined municipal search engines, while the NLP-supported search engine consistently performed better. The examined municipal search engines gave similar results, both in terms of overall performance and in terms of which search terms worked better, as well as other details. The work showed that NLP methods can give better results, above all because the municipal search engines were deficient.
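A rank-based evaluation of the kind described above can be sketched as follows; the linear point scale in this example is an assumption for illustration, not the exact scale used in the thesis.

```python
def rank_score(result_urls, target_url, max_rank=10):
    """Score one query: the higher the target page appears in the results, the more points.

    Returns 0 if the target page is not among the first `max_rank` results.
    """
    for rank, url in enumerate(result_urls[:max_rank], start=1):
        if url == target_url:
            return max_rank - rank + 1
    return 0

# Example: the sought-after page is the third result out of ten -> 8 points.
print(rank_score(["a.html", "b.html", "target.html", "d.html"], "target.html"))
```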
173

Knowledge Extraction from Biomedical Literature with Symbolic and Deep Transfer Learning Methods

Ramponi, Alan 30 June 2021 (has links)
The available body of biomedical literature is increasing at a high pace, exceeding the ability of researchers to promptly leverage this knowledge-rich amount of information. Despite the outstanding progress in natural language processing (NLP) observed in the past few years, current technological advances in the field mainly concern newswire and web texts, and do not directly translate into good performance on highly specialized domains such as biomedicine, due to linguistic variation at the surface, syntactic and semantic levels. Given the advances in NLP, the challenges the biomedical domain exhibits, and the explosive growth of biomedical knowledge currently being published, in this thesis we contribute to the biomedical NLP field by providing efficient means for extracting semantic relational information from biomedical literature texts. To this end, we made the following contributions towards the real-world adoption of knowledge extraction methods to support biomedicine: (i) we propose a symbolic high-precision biomedical relation extraction approach to reduce the time-consuming manual curation effort for extracted relational evidence (Chapter 3), (ii) we conduct a thorough cross-domain study to quantify the drop in performance of deep learning methods for biomedical edge detection, shedding light on the importance of linguistic varieties in biomedicine (Chapter 4), and (iii) we propose a fast and accurate end-to-end solution for biomedical event extraction, leveraging sequential transfer learning and multi-task learning, making it a viable approach for real-world large-scale scenarios (Chapter 5). We conclude by highlighting challenges and providing future research directions in the field.
174

COGNITION, MODALITY, AND LANGUAGE IN HEALTHY YOUNG ADULTS

Finley, Ann, 0000-0003-3368-3912 12 1900 (has links)
Measures drawn from language samples (e.g., discourse measures) are used in clinical and research settings as functional measures of language and cognitive abilities. In narrative elicitation tasks, discourse measures reliably vary by the type of prompt used to collect a language sample. Additionally, language features tend to vary along with communicative context, topic, and modality (e.g., oral vs. written). However, until recent years, technology had not advanced sufficiently to support large-scale study of spoken language data. In this project, we used natural language processing and machine learning methods to examine the intersection of discourse measures, language modality, and cognition (i.e., working memory) in healthy young adults. In Experiment 1, we used a computational approach to examine discourse measures in spoken and written English, achieving >90% accuracy in binary classification (spoken vs. written). In Experiment 2, we took a behavioral approach, studying working memory and narrative discourse measures in a cohort of healthy young adults. We hypothesized that working memory would predict informativity in participants' narrative language samples. We found mixed results for our two measures of informativity, the Measure of Textual Lexical Diversity (MTLD) and Shannon entropy. We attributed the observed differences between these two measures to the fact that, while both measure new or unique information, MTLD indexes additional linguistic information (e.g., semantic, lexical), whereas Shannon entropy is based on word co-occurrence statistics. We interpret our overall results as support for the potential utility of machine learning in language research and for future research and clinical implementations. / Communication Sciences
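The Shannon-entropy measure of informativity mentioned above can be computed directly from word frequencies in a language sample; the following is a minimal illustration, not the study's exact pipeline.

```python
import math
from collections import Counter

def shannon_entropy(words):
    """Shannon entropy (in bits) of the word distribution in a language sample."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

sample = "the dog chased the ball and the dog caught the ball".split()
print(round(shannon_entropy(sample), 3))  # more repetition -> lower entropy, less new information
```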
175

Evaluation of BERT-like models for small scale ad-hoc information retrieval / Utvärdering av BERT-liknande modeller för småskalig ad-hoc informationshämtning

Roos, Daniel January 2021 (has links)
Measuring semantic similarity between two sentences is an ongoing research field with big leaps being taken every year. This thesis looks at using modern methods of semantic similarity measurement for an ad-hoc information retrieval (IR) system. The main challenge tackled was answering the question "What happens when you don’t have situation-specific data?". Using encoder-based transformer architectures pioneered by Devlin et al., which excel at fine-tuning to situationally specific domains, this thesis shows just how well the presented methodology can work and makes recommendations for future attempts at similar domain-specific tasks. It also shows an example of how a web application can be created to make use of these fast-learning architectures.
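A minimal version of the retrieval setup described above is to embed the query and the candidate documents with a BERT-like sentence encoder and rank by cosine similarity. The sketch below assumes the sentence-transformers package and a generic pre-trained model; the thesis evaluates its own selection of BERT-like models rather than this particular one.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

model = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-like sentence encoder

documents = ["How to reset my password", "Opening hours of the library", "VPN setup guide"]
query = "I forgot my login credentials"

doc_emb = model.encode(documents)      # shape: (num_docs, dim)
query_emb = model.encode([query])[0]   # shape: (dim,)

# Cosine similarity between the query and every document, best match first.
scores = doc_emb @ query_emb / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(query_emb))
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```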
176

MEDICAL EVENT TIMELINE GENERATION FROM CLINICAL NARRATIVES

Raghavan, Preethi 05 September 2014 (has links)
No description available.
177

Algorithms and Resources for Scalable Natural Language Generation

Pfeil, Jonathan W. 01 September 2016 (has links)
No description available.
178

Enhancing Text Readability Using Deep Learning Techniques

Alkaldi, Wejdan 20 July 2022 (has links)
In the information era, reading is becoming increasingly important for keeping up with the growing amount of knowledge. The ability to read a document varies from person to person depending on their skills and knowledge, and it also depends on the readability level of the text and whether it matches the reader's level. In this thesis, we propose a system that uses state-of-the-art machine learning and deep learning techniques to classify and simplify a text while taking the reader's reading level into consideration. The system classifies any text to its equivalent readability level. If the text's readability level is higher than the reader's level, i.e. too difficult to read, the system performs text simplification to meet the desired readability level. The classification and simplification models are trained on data annotated with readability levels from the Newsela corpus. The trained simplification model operates at the sentence level to simplify a given text to match a specific readability level. Moreover, the trained classification model is used to classify additional unlabelled sentences from the Wikipedia corpus and the Mechanical Turk corpus in order to enrich the text simplification dataset. The augmented dataset is then used to improve the quality of the simplified sentences. The system generates simplified versions of a text based on the desired readability levels. This can help people with low literacy to read and understand any documents they need, and it can also benefit educators who assist readers with different reading levels.
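The classify-then-simplify control flow described above can be sketched as follows; the classifier and simplifier are placeholders standing in for the models the thesis trains on Newsela data.

```python
def adapt_text(text, reader_level, classify_level, simplify_sentence):
    """Return the text unchanged if it is easy enough for the reader,
    otherwise simplify it sentence by sentence down to the reader's level.

    classify_level(text) -> int readability level (higher = harder)
    simplify_sentence(sentence, target_level) -> simplified sentence
    """
    if classify_level(text) <= reader_level:
        return text
    sentences = text.split(". ")  # naive sentence split, for illustration only
    return ". ".join(simplify_sentence(s, reader_level) for s in sentences)
```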
179

Functional linguistic based motivations for a conversational software agent

Panesar, Kulvinder 07 October 2020 (has links)
This chapter discusses a linguistically orientated model of a conversational software agent (CSA) (Panesar 2017) framework sensitive to natural language processing (NLP) concepts and the levels of adequacy of a functional linguistic theory (LT). We discuss the relationship between NLP and knowledge representation (KR), and connect this with the goals of a linguistic theory (Van Valin and LaPolla 1997), in particular Role and Reference Grammar (RRG) (Van Valin Jr 2005). We debate the advantages of RRG and consider its fitness and computational adequacy. We present a design of a computational model of the linking algorithm that utilises a speech act construction as a grammatical object (Nolan 2014a, Nolan 2014b) and the sub-model of belief, desire and intentions (BDI) (Rao and Georgeff 1995). This model has been successfully implemented in software, using the resource description framework (RDF), and we highlight some implementation issues that arose at the interface between language and knowledge representation (Panesar 2017). / The full text of this article will be released for public view at the end of the publisher embargo on 27 Sep 2024.
180

Stock Price Movement Prediction Using Sentiment Analysis and Machine Learning

Wang, Jenny Zheng 01 June 2021 (has links) (PDF)
Stock price prediction is of strong interest to both researchers and investors but remains a challenging task. Recently, sentiment analysis and machine learning have been adopted for stock price movement prediction. In particular, retail investors' sentiment from online forums has shown its power to influence the stock market. In this paper, a novel system was built to predict stock price movement for the following trading day. The system includes a web scraper, an enhanced sentiment analyzer, a machine learning engine, an evaluation module, and a recommendation module. The system can automatically select the best prediction model from four state-of-the-art machine learning models (Long Short-Term Memory, Support Vector Machine, Random Forest, and Extreme Gradient Boosting) based on the acquired data and the models' performance. Moreover, stock market lexicons were created using large-scale text mining on the Yahoo Finance conversation boards and natural language processing. Experiments using the top 30 stocks on Yahoo users' watchlists and a randomly selected stock from NASDAQ were performed to examine the system's performance and the proposed methods. The experimental results show that incorporating sentiment analysis can improve the prediction for stocks with a large daily discussion volume. The Long Short-Term Memory model outperformed the other machine learning models when using both price and sentiment analysis as inputs. In addition, the Extreme Gradient Boosting (XGBoost) model achieved the highest accuracy using the price-only feature on low-volume stocks. Last but not least, the models using the enhanced sentiment analyzer outperformed the VADER sentiment analyzer by 1.96%.
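Automatic selection of the best prediction model, as described above, amounts to training each candidate and keeping the one with the best validation score. The scikit-learn sketch below is a generic illustration with a reduced model list, not the system's actual configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def select_best_model(X_train, y_train, X_val, y_val):
    """Train each candidate on the training split and keep the most accurate one."""
    candidates = {
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "svm": SVC(kernel="rbf"),
        # The described system also uses LSTM and XGBoost models; they are omitted
        # here to keep the sketch dependency-free.
    }
    best_name, best_model, best_acc = None, None, -1.0
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        acc = accuracy_score(y_val, model.predict(X_val))
        if acc > best_acc:
            best_name, best_model, best_acc = name, model, acc
    return best_name, best_model, best_acc
```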
