571 |
A thesis that writes itself : On the threat of AI-generated essays within academia
Olsson, August, Engelbrektsson, Oscar January 2022
Historically, cheating in universities has been limited to smuggling notes into exams, unauthorized cooperation, plagiarism and using ghost writers. Recent improvements in natural language processing now allow students to easily generate text that is both unique and, in many ways, indistinguishable from what a human would write. These texts can then be submitted with little to no risk of being caught by anti-cheating software. A multitude of such text generators is currently available online, varying in ease of use, cost and capabilities. They are capable enough to generate unique text that evades the plagiarism tools employed by universities. Given the combination of relatively cheap pricing, ease of use, pressure to perform well in school and low risk of detection, it is not difficult to imagine that students will use tools like these to cheat. This thesis mainly focuses on whether humans can differentiate AI-generated essays from human-written ones and what countermeasures can be used to hinder the use of such essays. In an experiment, teachers at Halmstad University were given human-written and AI-generated texts and asked to guess the source of each text presented. The experiment could not demonstrate that teachers are able to differentiate AI-generated text from human-written text. This thesis also surveys the currently available detection methods for AI-generated text and determines that they are not sufficient in their current form. Lastly, this thesis showcases alternative examination methods that could be used instead of essay-style examinations.
|
572 |
Summarization and keyword extraction on customer feedback data : Comparing different unsupervised methods for extracting trends and insight from text
Skoghäll, Therése, Öhman, David January 2022
Over the last few months, Polestar has more than doubled its amount of customer feedback, and the amount is forecast to increase even more. Manually reading this feedback is expensive and time-consuming, so there is a need to analyse it automatically. The company wants to understand the customer and extract the trends and topics that concern consumers in order to improve the customer experience. Natural Language Processing has developed immensely over the last few years, and new state-of-the-art language models have pushed the boundaries in all types of benchmark tasks. In this thesis, three different extractive summarization models and three different keyword extraction methods were tested and evaluated, using two different quantitative measures and human evaluation, for extracting information from text. The thesis shows that extractive summarization models with a Transformer-based text representation are best at capturing the context in a text. Based on the quantitative results and the company's needs, TextRank with a Transformer-based embedding was chosen as the final extractive summarization model. For keyword extraction, the best overall model was YAKE!, based on the quantitative measure and human validation.
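As an illustration of the TextRank idea named in this abstract, the following is a minimal sketch, not the thesis implementation. It assumes a bag-of-words cosine similarity as a stand-in for the Transformer-based embedding, and the function names, the damping factor of 0.85 and the sample feedback sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def sentence_vector(sentence):
    # Assumption: word counts stand in for a Transformer sentence embedding.
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    vectors = [sentence_vector(s) for s in sentences]
    # Weighted similarity graph between sentences, without self-loops.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    # Power iteration of the PageRank-style update.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=1):
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(top)]


feedback = [
    "The charging cable is too short.",
    "The charging cable is too short and the app is slow.",
    "The app is slow.",
    "Delivery was on time.",
]
print(summarize(feedback, k=1))
```

Swapping `sentence_vector` and `cosine` for an embedding model would leave the ranking loop unchanged, which is the part of the method the sketch is meant to show.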
|
573 |
Information extraction from text recipes in a web format / Informationsextraktion ur textrecept i webbformat
Storby, Johan January 2016
Searching the Internet for recipes to find interesting ideas for meals to prepare is becoming increasingly popular. It can, however, be difficult to find a recipe for a dish that can be prepared with the items someone has available at home. This thesis presents a solution to part of that problem. It investigates a method for extracting the various parts of a recipe from the Internet in order to save them and build a searchable database where users can search for recipes based on the ingredients they have available. The system works for both English and Swedish and is able to identify both languages. This is a problem within Natural Language Processing and its subfield Information Extraction. To solve the Information Extraction problem, rule-based techniques based on Named Entity Recognition, Content Extraction and general rule-based extraction are used. The results indicate generally good but not flawless functionality. For English, the rule-based algorithm achieved an F1-score of 83.8% for ingredient identification and 94.5% for identification of cooking instructions, and an accuracy of 88.0% and 96.4% for cooking time and number of portions, respectively. For Swedish, ingredient identification worked slightly better, but the other parts worked slightly worse. The results are comparable to those of other similar methods and can hence be considered good; they are, however, not good enough for the system to be used independently without a supervising human.
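To make the rule-based extraction and the F1-score in this abstract concrete, here is a small sketch under stated assumptions. The single pattern "quantity, unit, name", the unit list and the sample recipe lines are hypothetical and are not taken from the thesis, whose rules are not given in the abstract.

```python
import re

# Hypothetical rule: an ingredient line looks like "<quantity> <unit> <name>".
UNITS = r"(?:cups?|tbsp|tsp|g|kg|ml|dl|l)"
INGREDIENT_RULE = re.compile(
    rf"^\s*(\d+(?:[.,/]\d+)?)\s*({UNITS})\s+(.+?)\s*$", re.IGNORECASE
)


def extract_ingredients(lines):
    """Return the ingredient names from lines that match the rule."""
    found = []
    for line in lines:
        match = INGREDIENT_RULE.match(line)
        if match:
            found.append(match.group(3).lower())
    return found


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 (their harmonic mean)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


recipe_lines = [
    "2 cups flour",
    "1 tsp salt",
    "Preheat the oven to 200 degrees.",
    "3 eggs",          # no unit, so the single rule above misses it
    "100 g butter",
]
gold_ingredients = ["flour", "salt", "eggs", "butter"]

predicted = extract_ingredients(recipe_lines)
print(predicted)
print(precision_recall_f1(predicted, gold_ingredients))
```

The missed "3 eggs" line lowers recall while precision stays perfect, which is the kind of trade-off an F1-score summarises in one number.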
|
574 |
Efficiency analysis of verbal radio communication in air combat simulation / Effektivitetsanalys av verbal radiokommunikation i luftstridssimulering
Lilja, Hanna January 2016
Efficient communication is an essential part of cooperative work, and no less so in the case of radio communication during air combat. With time being a limited resource and the consequences of a misunderstanding potentially fatal, there is little room for negligence. This work is an exploratory study that combines data mining, machine learning, natural language processing and visual analytics to investigate the possibilities of using radio traffic data from air combat simulations for human performance evaluation. Both temporal and linguistic properties of the communication were analyzed, with several promising graphical results. Additionally, utterance classification was successfully attempted, with mean precision and recall both over 0.9. It is hoped that more complex and more highly automated data-based communication analysis can be built upon the results presented in this report.
|
575 |
Tool for linguistic quality evaluation of student texts / Verktyg för språklig utvärdering av studenttexter
Kärde, Wilhelm January 2015
Spell checkers are nowadays common in most editors. A student writing an essay in school will often have access to a spell checker. However, the feedback from a spell checker seldom correlates with the feedback from a teacher. One reason is that the teacher evaluates a text on more aspects. The teacher will, as opposed to the spell checker, evaluate a text based on aspects such as genre adaptation, structure and word variation. This thesis evaluates how well those aspects translate to NLP (Natural Language Processing) and implements those that translate well in a rule-based solution called Granska.
|
576 |
Compositional Matrix-Space Models: Learning Methods and Evaluation
Asaadi, Shima 13 October 2020
There has been a lot of research on machine-readable representations of words for natural language processing (NLP). One mainstream paradigm for the word meaning representation comprises vector-space models obtained from the distributional information of words in the text. Machine learning techniques have been proposed to produce such word representations for computational linguistic tasks. Moreover, the representation of multi-word structures, such as phrases, in vector space can arguably be achieved by composing the distributional representation of the constituent words. To this end, mathematical operations have been introduced as composition methods in vector space. An alternative approach to word representation and semantic compositionality in natural language has been compositional matrix-space models. In this thesis, two research directions are considered. In the first, considering compositional matrix-space models, we explore word meaning representations and semantic composition of multi-word structures in matrix space. The main motivation for working on these models is that they have shown superiority over vector-space models regarding several properties. The most important property is that the composition operation in matrix-space models can be defined as standard matrix multiplication; in contrast to common vector space composition operations, this is sensitive to word order in language. We design and develop machine learning techniques that induce continuous and numeric representations of natural language in matrix space. The main goal in introducing representation models is enabling NLP systems to understand natural language to solve multiple related tasks. Therefore, first, different supervised machine learning approaches to train word meaning representations and capture the compositionality of multi-word structures using the matrix multiplication of words are proposed. 
The performance of matrix representation models learned by machine learning techniques is investigated in solving two NLP tasks, namely, sentiment analysis and compositionality detection. Then, learning techniques for matrix-space models are proposed that introduce generic, task-agnostic representation models, also called word matrix embeddings. In these techniques, word matrices are trained using the distributional information of words in a given text corpus. We show the effectiveness of these models in the compositional representation of multi-word structures in natural language.
The second research direction in this thesis explores effective approaches for evaluating the capability of semantic composition methods in capturing the meaning representation of compositional multi-word structures, such as phrases. A common evaluation approach is examining the ability of the methods in capturing the semantic relatedness between linguistic units. The underlying assumption is that the more accurately a method of semantic composition can determine the representation of a phrase, the more accurately it can determine the relatedness of that phrase with other phrases. To apply the semantic relatedness approach, gold standard datasets have been introduced. In this thesis, we identify the limitations of the existing datasets and develop a new gold standard semantic relatedness dataset, which addresses the issues of the existing datasets. The proposed dataset allows us to evaluate meaning composition in vector- and matrix-space models.
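The word-order property mentioned in this abstract can be shown with a small sketch. The 2x2 word matrices and 2-dimensional word vectors below are made-up toy values, not learned representations, and vector addition is used as one common example of an order-insensitive composition.

```python
from functools import reduce


def matmul(a, b):
    """Standard matrix multiplication on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def compose_matrices(words, lexicon):
    # Phrase meaning = product of the word matrices, left to right.
    return reduce(matmul, (lexicon[w] for w in words))


def compose_vectors(words, lexicon):
    # Phrase meaning = element-wise sum of the word vectors.
    return [sum(values) for values in zip(*(lexicon[w] for w in words))]


# Toy values chosen only for illustration.
WORD_MATRICES = {
    "dog": [[1, 2], [0, 1]],
    "bites": [[0, 1], [1, 0]],
    "man": [[2, 0], [1, 1]],
}
WORD_VECTORS = {"dog": [1, 0], "bites": [0, 1], "man": [1, 1]}

phrase_a = "dog bites man".split()
phrase_b = "man bites dog".split()

# Matrix products differ between the two word orders ...
print(compose_matrices(phrase_a, WORD_MATRICES))
print(compose_matrices(phrase_b, WORD_MATRICES))
# ... while vector sums are identical.
print(compose_vectors(phrase_a, WORD_VECTORS))
print(compose_vectors(phrase_b, WORD_VECTORS))
```

Because matrix multiplication is not commutative, the two phrases receive different representations, whereas addition cannot tell them apart.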
|
577 |
Enhancing Relevant Region Classifying
Karlsson, Thomas January 2011
In this thesis we present a new way of extracting relevant data from texts. We use the method presented in the paper by Patwardhan and Riloff (2007), with improvements of our own. Our approach modifies the input to the support vector machine to construct a self-trained relevant sentence classifier. This classifier is used to identify relevant sentences in the MUC-4 terrorism corpus. We modify the input by removing stopwords, converting words to their stems and using only words that occur at least three times in the corpus. We also changed how each word is weighted, using TF x IDF as the weighting function. By using the relevant sentence classifier together with domain-relevant extraction patterns, we achieved higher performance on the MUC-4 terrorism corpus than the original model.
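The preprocessing steps listed in this abstract can be sketched as follows. This is an illustration under assumptions, not the thesis code: the stopword list is a tiny sample, `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer, and TF x IDF is taken here as term count times log(N / document frequency), one common variant.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (assumption, not the list used in the thesis).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "was", "were", "by", "to"}


def crude_stem(word):
    # Hypothetical stand-in for a real stemmer: strip one common suffix.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
    return [crude_stem(t) for t in tokens if t and t not in STOPWORDS]


def tfidf_features(sentences, min_count=3):
    docs = [preprocess(s) for s in sentences]
    # Keep only stems that occur at least min_count times in the corpus.
    corpus_counts = Counter(t for d in docs for t in d)
    vocab = {t for t, c in corpus_counts.items() if c >= min_count}
    doc_freq = Counter(t for d in docs for t in set(d) if t in vocab)
    n = len(docs)
    features = []
    for d in docs:
        tf = Counter(t for t in d if t in vocab)
        features.append({t: tf[t] * math.log(n / doc_freq[t]) for t in tf})
    return features


sentences = [
    "The attack damaged three buildings.",
    "Rebels claimed the attacks.",
    "Police were attacked in the capital.",
    "The weather was sunny.",
]
for feature_vector in tfidf_features(sentences):
    print(feature_vector)
```

Each resulting dictionary is a sparse feature vector of the kind that could be passed to a support vector machine; only the stem "attack" survives the frequency cut-off in this toy corpus.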
|
578 |
Gamers with the Purpose of Language Resource Acquisition : Personas and Scenarios for the players of Language Resourcing Games-With-A-Purpose
Droutsas, Nikolaos January 2021
Ethical, cheap, and scalable, purposeful games leverage player entertainment to incentivise contributors in language resourcing. However, discourse is scarce around the enjoyability of these games, whose playerbases are divided between a tiny minority of reliable contributors and a vast majority of inconsistent contributors. This study aims to deepen the discourse around design possibilities tailored to the unevenly contributing playerbases of such games by building on player-reported data to create three engaging personas and narrative scenarios. Using Pruitt and Grudin’s way of weighing feature suitability in persona-focused design, social incentives and majority voting are indicated as the most and least prominent features, respectively. Indeed, the weight of the primary persona, representing 3.5% of the playerbase, is 72%, more than double the combined weight, 56%, of the remaining 96.5% of the playerbase. Sticking to the original definition of purposeful games is essential for any gaming approach to crowdsourced data collection to remain ethical, cheap, and scalable.
|
579 |
Multipurpose Case-Based Reasoning System, Using Natural Language Processing
Augustsson, Christopher January 2021
Working as a field technician of any sort can often be challenging. You frequently find yourself alone with a machine you have limited knowledge about, and the only support you have is the user manuals. As a result, it is not uncommon for companies to aid technicians with a knowledge base that often revolves around some share point. Unfortunately, these share points quickly get cluttered with too much information, leaving the user overwhelmed. Case-based reasoning (CBR), a form of problem-solving technology, uses previous cases to help users solve new problems they encounter, which could benefit the field technician. But for a CBR system to work with a wide variety of machines, it must be dynamic in nature and handle multiple data types. By developing a prototype focused on case retrieval, based on .NET Core and MySQL, this report lays the foundation for a highly dynamic CBR system that uses natural language processing to map case attributes during case retrieval. In addition, the system's accuracy is validated using datasets from UCI and Kaggle, and a dataset created explicitly for this report shows the system to be robust.
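Case retrieval over mixed data types, as described in this abstract, might look roughly like the sketch below. It is written in Python for illustration even though the prototype is based on .NET Core and MySQL, and every name, the similarity formulas (inverse distance for numbers, token overlap for text) and the sample cases are assumptions rather than details from the report.

```python
def attribute_similarity(a, b):
    """Similarity in [0, 1] for one pair of attribute values."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        # Numeric attributes: closer values give higher similarity.
        return 1.0 / (1.0 + abs(a - b))
    # Text attributes: Jaccard overlap of lowercase tokens, a simple
    # stand-in for the natural language processing step.
    tokens_a = set(str(a).lower().split())
    tokens_b = set(str(b).lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def case_similarity(query, case):
    # Compare only the attributes that the query and the case share.
    shared = [key for key in query if key in case["problem"]]
    if not shared:
        return 0.0
    total = sum(attribute_similarity(query[k], case["problem"][k]) for k in shared)
    return total / len(shared)


def retrieve(query, case_base, k=1):
    """Return the k stored cases most similar to the query."""
    ranked = sorted(case_base, key=lambda c: case_similarity(query, c), reverse=True)
    return ranked[:k]


case_base = [
    {"problem": {"machine": "hydraulic press", "symptom": "oil leak near pump",
                 "pressure_bar": 180},
     "solution": "Replace pump seal"},
    {"problem": {"machine": "conveyor belt", "symptom": "belt slips under load",
                 "pressure_bar": 0},
     "solution": "Tighten belt tension"},
]
query = {"machine": "hydraulic press", "symptom": "pump oil leak", "pressure_bar": 175}

print(retrieve(query, case_base)[0]["solution"])
```

Averaging over only the shared attributes is one way to let cases for different machine types, with different attribute sets, live in the same case base.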
|
580 |
Improving Solr search with Natural Language Processing : An NLP implementation for information retrieval in Solr / Att förbättra Solr med Natural Language Processing
Lager, Adam January 2021
The field of AI is developing fast, and institutions and companies are pushing the limits of what is possible. Natural Language Processing is a branch of AI whose goal is to understand human speech and/or text. In this work, the technology is used to improve Solr, a full-text search engine built on an inverted index. Solr is open source and has integrated OpenNLP, making it a suitable choice for these kinds of operations. NLP-enabled Solr showed great results compared to the Solr currently running on the systems: although NLP-Solr was slightly worse in terms of precision, it excelled at recall and at returning the correct documents.
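For readers unfamiliar with the inverted index mentioned in this abstract, the following is a minimal sketch of the data structure. It does not use or imitate the Solr or OpenNLP APIs; the tokenizer, the AND-only query semantics and the three sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; an NLP pipeline would go here instead.
    tokens = (t.strip(".,;:!?").lower() for t in text.split())
    return [t for t in tokens if t]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "Solr is a full-text search engine.",
    2: "OpenNLP provides tokenization and tagging.",
    3: "Solr integrates OpenNLP for language analysis.",
}
index = build_index(docs)
print(search(index, "Solr OpenNLP"))
print(search(index, "solr"))
```

Linguistic normalisation of the kind an NLP component adds would typically change what `tokenize` emits, so that more documents match a query; that is one plausible reason for higher recall at a small cost in precision.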
|