About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
671

How to improve Swedish sentiment polarity classification using context analysis

Nilsson, Ludvig, Djerf, Olle January 2021 (has links)
This thesis considers sentiment polarity analysis in Swedish. Despite being the most widely spoken of the Nordic languages, less research in sentiment has been conducted in this area compared to neighboring languages. As such, this is a largely exploratory project using techniques that have shown positive results for other languages. We perform a comparison of techniques applied to a CNN to existing Swedish and multilingual variations of the state-of-the-art BERT model. We find that the preprocessing techniques do in fact benefit our CNN model, but still do not match the results of fine-tuned BERT models. We conclude that a Swedish-specific BERT model can outperform the generic multilingual ones, but only under certain conditions.
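
The abstract does not name a specific model or training setup; as a rough illustration only, the sketch below fine-tunes a Swedish BERT checkpoint for binary polarity with Hugging Face Transformers. The model name, example sentences, and hyperparameters are assumptions, not taken from the thesis.

    # Hypothetical sketch: fine-tuning a Swedish BERT for binary sentiment polarity.
    # Model name and training details are assumptions, not from the thesis.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_name = "KB/bert-base-swedish-cased"  # assumed Swedish BERT checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Toy labeled examples (0 = negative, 1 = positive); real work would use a corpus.
    texts = ["Filmen var fantastisk!", "Det här var slöseri med tid."]
    labels = torch.tensor([1, 0])

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    outputs = model(**batch, labels=labels)  # cross-entropy loss over the two classes
    outputs.loss.backward()
    optimizer.step()
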
672

Understand me, do you? : An experiment exploring the natural language understanding of two open source chatbots

Olofsson, Linnéa, Patja, Heidi January 2021 (has links)
What do you think of when you hear the word chatbot? A helpful assistant when booking flight tickets? Maybe a frustrating encounter with a company’s customer support, or smart technologies that will eventually take over your job? The field of chatbots is under constant development and bots are increasingly taking a place in our everyday lives, but how well do they really understand us humans?  The objective of this thesis is to investigate how capable two open-source chatbots are of understanding human language when given input containing spelling errors, synonyms, or faulty syntax. The study further investigates whether the bots get better at identifying the user’s intention when supplied with more training data to base their analysis on.  Two different chatbot frameworks, Botpress and Rasa, were used to carry out this experiment. The two bots were created with basic configurations and trained using the same data. The chatbots underwent three rounds of training and testing, where they were given additional training and asked control questions to see if they managed to interpret the correct intent. All tests were documented and scores were calculated to create comparable data. The results from these tests showed that both chatbots performed well when it came to simpler spelling errors and syntax variations. Their understanding of more complex spelling errors was lower in the first testing phase but increased with more training data. Synonyms followed a similar pattern, but showed a minor tendency to become overconfident, producing incorrect results with high confidence in the last phase. The scores pointed to both chatbots getting better at understanding the input when receiving additional training. In conclusion, both chatbots showed signs of understanding language variations when given minimal training, but achieved significantly better results when provided with more data. The potential to create a bot with a substantial understanding of human language is evident from these results, even for developers with no prior experience of creating chatbots, especially given the vast possibilities for customising a chatbot.
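
As a concrete illustration of this kind of probing, the sketch below sends clean, misspelled, and synonym-substituted variants of an utterance to a locally running Rasa server and records the predicted intent and confidence. The endpoint is Rasa's standard /model/parse REST API; the utterances, port, and setup are invented for illustration and are not the thesis's test data.

    # Illustrative probe of a locally running Rasa server's intent classifier;
    # assumes `rasa run --enable-api` is serving on port 5005 (not from the thesis).
    import requests

    variants = [
        "I want to book a flight",         # clean input
        "I wnat to bok a flihgt",          # spelling errors
        "I wish to reserve a plane trip",  # synonyms
    ]

    for text in variants:
        resp = requests.post("http://localhost:5005/model/parse", json={"text": text})
        intent = resp.json()["intent"]
        print(f"{text!r} -> {intent['name']} (confidence {intent['confidence']:.2f})")
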
673

Enhancing Text Readability Using Deep Learning Techniques

Alkaldi, Wejdan 20 July 2022 (has links)
In the information era, reading becomes ever more important for keeping up with the growing amount of knowledge. The ability to read a document varies from person to person depending on their skills and knowledge, and it also depends on whether the readability level of the text matches the reader’s level. In this thesis, we propose a system that uses state-of-the-art machine learning and deep learning to classify and simplify a text while taking the reader’s reading level into consideration. The system classifies any text to its equivalent readability level. If the text’s readability level is higher than the reader’s level, i.e. too difficult to read, the system performs text simplification to meet the desired readability level. The classification and simplification models are trained on data annotated with readability levels in the Newsela corpus. The trained simplification model operates at the sentence level, simplifying a given text to match a specific readability level. Moreover, the trained classification model is used to classify additional unlabelled sentences from the Wikipedia and Mechanical Turk corpora in order to enrich the text simplification dataset. The augmented dataset is then used to improve the quality of the simplified sentences. The system generates simplified versions of a text based on the desired readability levels. This can help people with low literacy read and understand the documents they need, and it can also benefit educators who assist readers with different reading levels.
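
A minimal sketch of the augmentation idea described here, assuming a simple TF-IDF classifier as a stand-in for the thesis's deep models: a readability classifier trained on labeled sentences pseudo-labels unlabeled ones to enlarge the simplification training set. The sentences and levels below are invented, not Newsela data.

    # Toy sketch of the augmentation step: a readability classifier trained on
    # labeled sentences pseudo-labels unlabeled ones. Data here is illustrative.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    labeled_sentences = [
        "The cat sat on the mat.",                                     # easy
        "Photosynthesis converts light energy into chemical energy.",  # hard
    ]
    levels = [1, 4]  # readability levels, e.g. Newsela-style grades

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(labeled_sentences, levels)

    # Pseudo-label unlabeled sentences to enlarge the simplification training set.
    unlabeled = ["Mitochondria are the powerhouse of the cell."]
    print(clf.predict(unlabeled))
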
674

Studies on Fundamental Problems in Event-Level Language Analysis / イベントレベルの言語解析における基礎的課題に関する研究

Kiyomaru, Hirokazu 23 March 2022 (has links)
Kyoto University / New-system doctoral program / Doctor of Informatics / Kō No. 24029 / Jōhaku No. 785 / 新制||情||133 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Sadao Kurohashi, Professor Tatsuya Kawahara, Professor Hisashi Kashima / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
675

Enhancing Morphological Analysis and Example Sentence Extraction for Japanese Language Learning / 日本語学習のための形態素解析と例文抽出の高度化

Tolmachev, Arseny 23 March 2022 (has links)
Additional degree program noted: Collaborative Graduate Program in Design / Kyoto University / New-system doctoral program / Doctor of Informatics / Kō No. 24033 / Jōhaku No. 789 / 新制||情||134 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Sadao Kurohashi, Professor Tatsuya Kawahara, Professor Takashi Kusumi / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
676

NLP-based Failure log Clustering to Enable Batch Log Processing in Industrial DevOps Setting

Homayouni, Ali January 2022 (has links)
The rapid development, updating, and maintenance of industrial software systems have increased the necessity for software artifact testing. Some medium and large industries are forced to automate the test analysis process due to the proliferation of test data. The examination of test results can be automated by grouping them into subsets composed of comparable test outcomes and analysing them in batches. In this setting, the first step is to identify a precise and reliable categorization mechanism based on structural similarities and error categories. In addition, since the errors and the number of subgroups are not specified in advance, a method that does not require prior knowledge of the target subsets should be used. Given this description, clustering is an appropriate method for separating test results. This work presents an approach for grouping test results and accelerating the test analysis process by applying multiple clustering algorithms (K-means, Agglomerative, DBSCAN, Fuzzy c-means, and Spectral) to test results from industrial contexts and comparing their runtime and output quality. The unstructured, textual character of the test results is one of the primary obstacles in this study, necessitating feature extraction; consequently, this study employs three distinct approaches (TF-IDF, FastText, and BERT). This research was conducted through a series of trials in a controlled and isolated environment, using test process results from Westermo Technologies AB as part of the AIDOaRT Project, in order to establish a suitable way of clustering industrial test results. The conclusion of this thesis is that K-means and Agglomerative clustering yield the highest performance and evaluation scores; however, K-means is superior in terms of execution time and speed. In addition, a Focus Group meeting was organized to qualitatively examine the results from the perspective of engineers and experts; from their perspective, clustering the results increases the speed of test analysis and decreases the review workload.
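
As a sketch of one configuration from this comparison, the snippet below clusters a few invented failure-log lines using TF-IDF features and K-means, and reports a silhouette score. It is illustrative only; the log lines and the number of clusters are assumptions and none of the Westermo data is used.

    # Minimal sketch of one configuration compared in the thesis: TF-IDF features
    # with K-means. Log lines and k are illustrative, not industrial data.
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import silhouette_score

    logs = [
        "ERROR timeout waiting for device eth0",
        "ERROR timeout waiting for device eth1",
        "FAIL assertion mismatch in routing table",
        "FAIL assertion mismatch in arp table",
    ]

    X = TfidfVectorizer().fit_transform(logs)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_, silhouette_score(X, km.labels_))
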
677

Descriptive Labeling of Document Clusters / Deskriptiv märkning av dokumentkluster

Österberg, Adam January 2022 (has links)
Labeling is the process of giving a set of data a descriptive name. This thesis dealt with documents that come with no additional information, and aimed at clustering them using topic modeling and labeling them using Wikipedia as a secondary source. Labeling documents is a new field with many potential solutions, and this thesis examined one method in a practical setting. Unstructured data was preprocessed and clustered using a topic model. Frequent words from each cluster were used to generate a search query sent to Wikipedia, where titles and categories from the most relevant pages were stored as candidate labels. Each candidate label was scored based on the frequency of common cluster words among the candidate labels, weighted in proportion to the relevance of the original Wikipedia article; relevance was based on the order of appearance in the search results. The five labels with the highest scores were chosen to describe the cluster. The clustered documents consisted of exam questions that students use to practice before a course exam. Each question in a cluster was scored by someone experienced in the relevant topic, who evaluated whether one of the five labels correctly described the content. The method proved unreliable, with only one course receiving labels considered descriptive for most of its questions. A significant problem was that the data were closely related, with all documents belonging to one overarching category rather than a dataset containing independent topics. However, for one dataset, 80% of the documents received a descriptive label, indicating that labeling using secondary sources has potential but needs to be investigated further.
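
A rough sketch of the labeling step as described, assuming the public Wikipedia search API: top cluster words form a search query, returned titles become candidate labels, and candidates are scored by rank-discounted word overlap. The cluster words and the exact scoring formula are illustrative stand-ins for the thesis's implementation.

    # Rough sketch of the labeling step: top cluster words become a Wikipedia
    # search query, and returned titles become candidate labels.
    import requests

    cluster_words = ["entropy", "probability", "distribution", "information"]

    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "list": "search",
            "srsearch": " ".join(cluster_words),
            "format": "json",
        },
        headers={"User-Agent": "cluster-labeling-sketch/0.1"},
    )
    results = resp.json()["query"]["search"]

    # Weight each candidate title by cluster-word overlap, discounted by search rank.
    scores = {}
    for rank, page in enumerate(results, start=1):
        title = page["title"]
        overlap = sum(w in title.lower() for w in cluster_words)
        scores[title] = overlap / rank

    top5 = sorted(scores, key=scores.get, reverse=True)[:5]
    print(top5)
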
678

Hierarchical Joint Entity Recognition and Relation Extraction of Contextual Entities in Family History Records

Segrera, Daniel 08 March 2023 (has links) (PDF)
Entity extraction is an important step in document understanding. Higher accuracy on fine-grained entities can be achieved by combining the utility of Named Entity Recognition (NER) and Relation Extraction (RE) models. In this paper, a cascading model is proposed that implements NER and RE. This model utilizes relations between entities to infer context-dependent fine-grained named entities in text corpora. The RE module runs independently of the NER module, which reduces error accumulation from sequential steps. This process improves the fine-grained NER F1-score of the existing state of the art from 0.4753 to 0.8563 on our data, albeit on a strictly limited domain. This provides the potential for further applications in historical document processing. These applications will enable automated searching of historical documents, such as those used in economics research and family history.
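
To make the cascade concrete, the toy sketch below uses hand-written rules as stand-ins for the learned NER and RE models: a coarse entity pass and an independent relation pass, whose output then refines entity types. Nothing here reflects the paper's actual models or data; it only illustrates the cascade structure.

    # Toy illustration of the cascade: a coarse NER pass, an independent RE pass,
    # then relation-informed refinement of entity types into fine-grained ones.
    import re

    def coarse_ner(text):
        # Coarse pass: every capitalized token is a PERSON candidate.
        return [(m.group(), "PERSON") for m in re.finditer(r"\b[A-Z][a-z]+\b", text)]

    def relation_extraction(text):
        # Independent RE pass: a simple "X, son of Y" pattern.
        m = re.search(r"(\w+), son of (\w+)", text)
        return [("child_of", m.group(1), m.group(2))] if m else []

    def refine(entities, relations):
        # Relations contextualize coarse entities into fine-grained types.
        roles = {}
        for rel, child, parent in relations:
            if rel == "child_of":
                roles[child], roles[parent] = "CHILD", "FATHER"
        return [(e, roles.get(e, t)) for e, t in entities]

    text = "Baptized this day, William, son of Thomas."
    print(refine(coarse_ner(text), relation_extraction(text)))
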
679

An Evaluation of a Linguistically Motivated Conversational Software Agent Framework

Panesar, Kulvinder 05 October 2020 (has links)
This paper presents a critical evaluation framework for a linguistically motivated conversational software agent (CSA). The CSA prototype investigates the integration, intersection and interface of language, knowledge, and speech act constructions (SAC) based on a grammatical object, together with sub-models of belief, desire and intention (BDI) and dialogue management (DM) for natural language processing (NLP). A long-standing issue within NLP CSA systems is refining the accuracy of interpretation so as to provide realistic dialogue that supports human-to-computer communication. The prototype comprises three phase models: (1) a linguistic model based on a functional linguistic theory, Role and Reference Grammar (RRG); (2) an Agent Cognitive Model with two inner models, (a) a knowledge representation model and (b) a planning model underpinned by BDI concepts, intentionality and rational interaction; and (3) a dialogue model. The evaluation strategy for this Java-based prototype is multi-approach, driven by grammatical testing (English-language utterances), software engineering and agent practice. A set of evaluation criteria is grouped per phase model, and the testing framework aims to test the interface, intersection and integration of all phase models. The empirical evaluations demonstrate that the CSA is a proof of concept, demonstrating RRG's fitness for describing and explaining phenomena, language processing and knowledge, and its computational adequacy. By contrast, the evaluations identify the complexity of lower-level computational mappings from natural language and agent to ontology, with semantic gaps that are further addressed by a lexical bridging solution.
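
A deliberately simplified sketch of how the three phase models might interface, with trivial stand-ins for the RRG parse, the BDI update, and the dialogue manager. It illustrates the data flow between the phase models only, not the prototype's actual design, which is Java-based and far richer.

    # Invented stand-ins: parse -> cognitive (BDI) update -> dialogue response.
    from dataclasses import dataclass, field

    @dataclass
    class AgentState:
        beliefs: dict = field(default_factory=dict)
        intentions: list = field(default_factory=list)

    def linguistic_model(utterance):
        # Stand-in for the RRG parse: recognize a speech act and its content.
        if utterance.endswith("?"):
            return {"speech_act": "question", "content": utterance.rstrip("?")}
        return {"speech_act": "assertion", "content": utterance.rstrip(".")}

    def cognitive_model(sac, state):
        # BDI update: assertions become beliefs, questions become intentions.
        if sac["speech_act"] == "assertion":
            state.beliefs[sac["content"]] = True
            return "acknowledge"
        state.intentions.append(("answer", sac["content"]))
        return "answer"

    def dialogue_model(plan, sac, state):
        if plan == "acknowledge":
            return "Noted."
        return "Yes." if sac["content"] in state.beliefs else "I don't know."

    state = AgentState()
    for utt in ["the meeting is at noon.", "the meeting is at noon?"]:
        sac = linguistic_model(utt)
        print(dialogue_model(cognitive_model(sac, state), sac, state))
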
680

Improving Automatic Transcription Using Natural Language Processing

Kiefer, Anna 01 March 2024 (has links) (PDF)
Digital Democracy is a CalMatters and California Polytechnic State University initiative to promote transparency in state government by increasing access to the California legislature. While Digital Democracy is made up of many resources, one foundational step of the project is obtaining accurate, timely transcripts of California Senate and Assembly hearings. The information extracted from these transcripts provides crucial data for subsequent steps in the pipeline. In the context of Digital Democracy, upleveling is when humans verify, correct, and annotate the transcript results after the legislative hearings have been automatically transcribed. The upleveling process is done with the assistance of a software application called the Transcription Tool. The human upleveling process is the most costly and time-consuming step of the Digital Democracy pipeline. In this thesis, we hypothesize that we can make significant reductions to the time needed for upleveling by using Natural Language Processing (NLP) systems and techniques. The main contribution of this thesis is engineering a new automatic transcription pipeline. Specifically, this thesis integrates a new automatic speech recognition service, a new speaker diarization model, additional text post-processing changes, and a new process for speaker identification. To evaluate the system’s improvements, we measure the accuracy and speed of the newly integrated features and record editor upleveling time both before and after the additions.
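
One step such a pipeline needs is merging diarization output with ASR segments. The sketch below assigns each transcript segment a speaker by looking up the diarization turn containing the segment's midpoint; the timestamps and labels are invented, since the specific services and models the thesis integrates are not detailed here.

    # Sketch of a pipeline step implied above: assigning diarized speaker turns
    # to ASR transcript segments by midpoint lookup. All data is illustrative.
    asr_segments = [
        {"start": 0.0, "end": 4.2, "text": "The committee will come to order."},
        {"start": 4.5, "end": 7.9, "text": "Thank you, Madam Chair."},
    ]
    diarization_turns = [
        {"start": 0.0, "end": 4.3, "speaker": "SPEAKER_00"},
        {"start": 4.3, "end": 8.0, "speaker": "SPEAKER_01"},
    ]

    def speaker_at(t, turns):
        for turn in turns:
            if turn["start"] <= t < turn["end"]:
                return turn["speaker"]
        return "UNKNOWN"

    for seg in asr_segments:
        midpoint = (seg["start"] + seg["end"]) / 2
        print(speaker_at(midpoint, diarization_turns), seg["text"])
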
