571

Enhancing Relevant Region Classifying

Karlsson, Thomas January 2011 (has links)
In this thesis we present a new way of extracting relevant data from texts. We use the method presented by Patwardhan and Riloff (2007), with improvements of our own. Our approach modifies the input to the support vector machine to construct a self-trained relevant sentence classifier. This classifier is used to identify relevant sentences in the MUC-4 terrorism corpus. We modify the input by removing stopwords, converting words to their stems, and only using words that occur at least three times in the corpus. We also changed how each word is weighted, using TF x IDF as the weighting function. By using the relevant sentence classifier together with domain-relevant extraction patterns, we achieved higher performance on the MUC-4 terrorism corpus than the original model.
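A minimal sketch of the pipeline this abstract describes, in Python with scikit-learn (the thesis does not specify an implementation); the toy sentences stand in for the MUC-4 data, which is not reproduced here:

```python
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS
from sklearn.svm import LinearSVC

stemmer = PorterStemmer()

def preprocess(sentence):
    # Remove stopwords, then reduce each remaining word to its stem.
    return " ".join(
        stemmer.stem(w) for w in sentence.lower().split()
        if w not in ENGLISH_STOP_WORDS
    )

# Toy stand-ins for MUC-4 sentences; 1 = relevant, 0 = irrelevant.
sentences = [
    "the bomb attack killed two people",
    "the attack destroyed the embassy",
    "a bomb exploded near the embassy",
    "the attack on the embassy killed a guard",
    "the weather was sunny today",
    "officials met to discuss the weather",
]
labels = [1, 1, 1, 1, 0, 0]

# min_df=3 keeps only terms occurring at least three times in the corpus;
# TfidfVectorizer applies the TF x IDF weighting described above.
vectorizer = TfidfVectorizer(min_df=3)
X = vectorizer.fit_transform(preprocess(s) for s in sentences)

classifier = LinearSVC().fit(X, labels)
print(classifier.predict(vectorizer.transform([preprocess("a bomb attack near the embassy")])))
```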
572

Gamers with the Purpose of Language Resource Acquisition : Personas and Scenarios for the players of Language Resourcing Games-With-A-Purpose

Droutsas, Nikolaos January 2021 (has links)
Ethical, cheap, and scalable, purposeful games leverage player entertainment to incentivise contributors in language resourcing. However, discourse is scarce around the enjoyability of these games, whose playerbases are divided between a tiny minority of reliable contributors and a vast majority of inconsistent contributors. This study aims to deepen the discourse around design possibilities tailored to the unevenly contributing playerbases of such games by building on player-reported data to create three engaging personas and narrative scenarios. Using Pruitt and Grudin's way of weighing feature suitability in persona-focused design, social incentives and majority voting are indicated as the most and least prominent features, respectively. Indeed, the weight of the primary persona, representing 3.5% of the playerbase, is 72%, exceeding the combined weight, 56%, of the remaining 96.5% of the playerbase. Sticking to the original definition of purposeful games is essential for any gaming approach to crowdsourced data collection to remain ethical, cheap, and scalable.
573

Multipurpose Case-Based Reasoning System, Using Natural Language Processing

Augustsson, Christopher January 2021 (has links)
Working as a field technician of any sort can often be a challenging task. You frequently find yourself alone with a machine you have limited knowledge about, and the only support you have is the user manuals. As a result, it is not uncommon for companies to aid technicians with a knowledge base, often built around a shared SharePoint site. Unfortunately, these sites quickly become cluttered with so much information that users are left overwhelmed. Case-based reasoning (CBR), a form of problem-solving technology, uses previous cases to help users solve new problems they encounter, which could benefit the field technician. But for a CBR system to work with a wide variety of machines, the system must be dynamic in nature and handle multiple data types. By developing a prototype focusing on case retrieval, based on .NET Core and MySQL, this report lays the foundation for a highly dynamic CBR system that uses natural language processing to map case attributes during case retrieval. Using datasets from UCI and Kaggle, the system's accuracy is validated, and on a dataset created explicitly for this report, the system proves to be robust.
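A toy sketch of the case-retrieval step in a CBR system; it is written in Python rather than the prototype's .NET Core/MySQL stack, and the case attributes, similarity measure, and data are all illustrative assumptions:

```python
from difflib import SequenceMatcher

# Each stored case pairs a problem description with a known solution.
case_base = [
    {"machine": "pump A3", "symptom": "motor overheats under load",
     "solution": "replace cooling fan"},
    {"machine": "pump A3", "symptom": "no pressure at outlet",
     "solution": "clear clogged intake valve"},
]

def similarity(a, b):
    # Crude textual similarity standing in for the NLP attribute mapping.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def retrieve(query, k=1):
    # Score each stored case against the new problem description and
    # return the k most similar ones (the "retrieve" phase of CBR).
    scored = sorted(case_base,
                    key=lambda c: similarity(query, c["symptom"]),
                    reverse=True)
    return scored[:k]

print(retrieve("the motor gets very hot when running"))
```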
574

Improving Solr search with Natural Language Processing : An NLP implementation for information retrieval in Solr / Att förbättra Solr med Natural Language Processing

Lager, Adam January 2021 (has links)
The field of AI is emerging fast, and institutions and companies are pushing the limits of what is possible. Natural Language Processing is a branch of AI where the goal is to understand human speech and/or text. Here, this technology is used to improve an inverted-index-based full-text search engine, Solr. Solr is open source and has integrated OpenNLP, making it a suitable choice for these kinds of operations. NLP-enabled Solr showed great results compared to the Solr configuration currently running on the systems: while NLP-Solr was slightly worse in terms of precision, it excelled at recall and at returning the correct documents.
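A hedged sketch of querying a Solr core through its standard /select handler from Python; the core name "documents" and the field "text" are assumptions, and the OpenNLP analysis described above would be configured in the core's schema at index time rather than in this client code:

```python
import requests

# Assumed local Solr instance and core name.
SOLR_URL = "http://localhost:8983/solr/documents/select"

def search(query, rows=10):
    # "q" is the query string; recent Solr versions return JSON by default.
    params = {"q": f"text:({query})", "rows": rows}
    response = requests.get(SOLR_URL, params=params)
    response.raise_for_status()
    return response.json()["response"]["docs"]

for doc in search("natural language processing"):
    print(doc.get("id"))
```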
575

Autoformalization of Mathematical Proofs from Natural Language to Proof Assistants

Cunningham, Garett 04 May 2022 (has links)
No description available.
576

Email Classification with Machine Learning and Word Embeddings for Improved Customer Support

Rosander, Oliver, Ahlstrand, Jim January 2018 (has links)
Classifying emails into distinct labels can have a great impact on customer support. By using machine learning to label emails, the system can set up queues containing emails of a specific category. This enables support personnel to handle requests more quickly and easily by selecting a queue that matches their expertise. This study aims to improve on the manually defined rule-based algorithm currently implemented at a large telecom company by using machine learning. The proposed model should have a higher F1-score and classification rate. Integrating or migrating from a manually defined rule-based model to a machine learning model should also reduce administrative and maintenance work, and make the model more flexible. Using the frameworks TensorFlow, Scikit-learn and Gensim, the authors conduct five experiments to test the performance of several common machine learning algorithms, text representations, and word embeddings, and how they work together. A web-based interface was implemented which can classify emails into 33 different labels with a 0.91 F1-score using a Long Short-Term Memory network. The authors conclude that Long Short-Term Memory networks outperform non-sequential models such as Support Vector Machines and AdaBoost when predicting labels for emails.
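An illustrative Keras sketch of an LSTM classifier along these lines; only the 33-label output comes from the abstract, while the vocabulary size, sequence length, and layer sizes are assumptions:

```python
import tensorflow as tf

VOCAB_SIZE = 20000   # assumed vocabulary size
NUM_LABELS = 33      # number of email categories, from the abstract

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),        # word embeddings
    tf.keras.layers.LSTM(64),                          # sequential encoder
    tf.keras.layers.Dense(NUM_LABELS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.build(input_shape=(None, 200))  # assumed max email length of 200 tokens
model.summary()
# model.fit(x_train, y_train, ...) would train on tokenized, padded emails.
```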
577

Simulating Expert Clinical Comprehension: Adapting Latent Semantic Analysis to Accurately Extract Clinical Concepts From Psychiatric Narrative

Cohen, Trevor, Blatter, Brett, Patel, Vimla 01 December 2008 (has links)
Cognitive studies reveal that less-than-expert clinicians are less able to recognize meaningful patterns of data in clinical narratives. Accordingly, psychiatric residents early in training fail to attend to information that is relevant to diagnosis and the assessment of dangerousness. This manuscript presents a cognitively motivated methodology for the simulation of expert ability to organize relevant findings supporting intermediate diagnostic hypotheses. Latent Semantic Analysis is used to generate a semantic space from which meaningful associations between psychiatric terms are derived. Diagnostically meaningful clusters are modeled as geometric structures within this space and compared to elements of psychiatric narrative text using semantic distance measures. A learning algorithm is defined that alters components of these geometric structures in response to labeled training data. Extraction and classification of relevant text segments is evaluated against expert annotation, with system-rater agreement approximating rater-rater agreement. A range of biomedical informatics applications for these methods is suggested.
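A small sketch of the LSA machinery this abstract relies on, using scikit-learn; the toy clinical snippets and the two-dimensional space are illustrative only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "patient reports auditory hallucinations and paranoid ideation",
    "paranoid delusions with command hallucinations noted",
    "no suicidal ideation, mood stable, sleeping well",
    "patient denies hallucinations, mood remains stable",
]

# Term-document matrix, then a low-rank projection: the LSA semantic space.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)
svd = TruncatedSVD(n_components=2)
docs_lsa = svd.fit_transform(X)

# Cosine similarity in the reduced space serves as the semantic distance
# measure for comparing narrative segments with diagnostic clusters.
print(cosine_similarity(docs_lsa[:1], docs_lsa[1:]))
```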
578

Locating SQL Injection Vulnerabilities in Java Byte Code Using Natural Language Techniques

Jackson, Kevin A., Bennett, Brian T. 01 October 2018 (has links)
With so much of our daily lives relying on digital devices like personal computers and cell phones, there is a growing demand for code that not only functions properly but is also secure and keeps user data safe. Ensuring this is no easy task, however, and many developers lack the skills or resources to ensure their code is secure. Many code analysis tools have been written to find vulnerabilities in newly developed code, but this technology tends to produce many false positives and is still unable to identify all problems, so other methods of finding software vulnerabilities automatically are required. This proof-of-concept study applied natural language processing to Java byte code to locate SQL injection vulnerabilities in a Java program. Preliminary findings show that, due to the high number of terms in the dataset, single decision trees will not produce a suitable model for locating SQL injection vulnerabilities, while random forest structures proved more promising. Still, further work is needed to determine the best classification tool.
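A hedged sketch of the general approach: opcode sequences treated as text, vectorized with n-grams, and classified with a random forest. The opcode strings and labels are fabricated toy data, not the study's dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

# Each sample is the opcode stream of one method; 1 = SQL injection present.
methods = [
    "aload_0 ldc invokevirtual invokeinterface areturn",
    "aload_0 aload_1 invokedynamic invokevirtual areturn",
    "new dup ldc invokespecial astore aload invokevirtual",
    "iload_1 iload_2 iadd ireturn",
]
labels = [1, 1, 0, 0]

# n-grams over opcodes mirror the natural-language treatment of byte code.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(methods)

clf = RandomForestClassifier(n_estimators=100).fit(X, labels)
print(clf.predict(vectorizer.transform(["aload_0 ldc invokevirtual areturn"])))
```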
579

Symbolic Semantic Memory in Transformer Language Models

Morain, Robert Kenneth 16 March 2022 (has links)
This paper demonstrates how transformer language models can be improved by giving them access to relevant structured data extracted from a knowledge base. The knowledge base preparation process and modifications to transformer models are explained. We evaluate these methods on language modeling and question answering tasks. The results show that even simple knowledge augmentation leads to a 73% reduction in validation loss. These methods also significantly outperform common ways of improving language models, such as increasing the model size or adding more data.
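A minimal sketch of knowledge augmentation in this spirit: structured facts are serialized and prepended to the model input. The fact format, the KB lookup, and the use of GPT-2 via Hugging Face transformers are assumptions, not the paper's actual method:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical knowledge base of serialized triples.
knowledge_base = {
    "Marie Curie": ["Marie Curie | field | physics",
                    "Marie Curie | award | Nobel Prize"],
}

def augment(question, entity):
    # Prepend the retrieved facts so the model can attend to them.
    facts = " . ".join(knowledge_base.get(entity, []))
    return f"{facts} . {question}"

prompt = augment("What field did Marie Curie work in?", "Marie Curie")
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```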
580

Automatic language identification of short texts

Avenberg, Anna January 2020 (has links)
The world is growing more connected through the use of online communication, exposing software and humans to all the world's languages. While devices can understand and share raw data between themselves and with humans, the information itself is not expressed in a monolithic format. This causes issues both in human-to-computer interaction and in human-to-human communication. Automatic language identification (LID) is a field within artificial intelligence and natural language processing that strives to solve part of these issues by identifying languages from text, sign language and speech. One of the challenges is identifying the short pieces of text found online, such as messages, comments and posts on social media, because of the small amount of information they carry. The goal of this thesis has been to build a machine learning model that can identify the language of such short pieces of text. A long short-term memory (LSTM) model was built and benchmarked against Facebook's fastText model. The LSTM model reached an accuracy of around 95%, while the fastText model used for comparison reached 97%. The LSTM model struggled more with texts shorter than 50 characters than with longer texts, and its classification performance was also relatively poor for similar languages, such as Croatian and Serbian. Both models reached accuracies above 94%, which can be considered high, depending on how it is evaluated. There is, however, much room for improvement and possible future work: looking further into texts shorter than 50 characters, evaluating the model's softmax output vector values, and handling similar languages.
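A sketch of a fastText baseline like the one benchmarked above; the training file name and hyperparameters are assumptions (fastText expects one `__label__<lang> <text>` example per line):

```python
import fasttext

# Character n-grams (minn/maxn) are what let fastText cope with short texts.
model = fasttext.train_supervised(
    input="train.txt",  # hypothetical labeled training file
    minn=2, maxn=4, epoch=25,
)

labels, probs = model.predict("det här är en kort text")
print(labels, probs)
```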
