141 |
Evaluation of the correlation between test cases dependency and their semantic text similarity. Andersson, Filip (January 2020)
An important step in developing software is to test the system thoroughly. Testing software requires generating test cases, which can reach large numbers and must be executed in the correct order. Certain information is critical for scheduling the test cases in the correct order, and it isn't always available; getting the schedule right therefore demands a lot of manual work and valuable resources. By instead analyzing the test specifications, it could be possible to detect the functional dependencies between test cases. This study presents a natural language processing (NLP) based approach and performs cluster analysis on a set of test cases to evaluate the correlation between test case dependencies and their semantic similarities. After an initial feature selection, the similarities between test cases are calculated with the cosine distance function. The result of the similarity calculation is then clustered using the HDBSCAN clustering algorithm. The clusters represent relations between test cases: test cases with close similarities are placed in the same cluster, as they are expected to share dependencies. The clusters are then validated against a ground truth containing the correct dependencies. The result is an F-score of 0.7741. The approach in this study is applied to an industrial testing project at Bombardier Transportation in Sweden.
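As a rough sketch of the pipeline this abstract describes, not the thesis's actual implementation, the snippet below vectorizes invented test specifications with TF-IDF (one plausible feature selection), computes pairwise cosine distances, and clusters them with HDBSCAN; all texts and parameters are illustrative assumptions.

```python
# A minimal sketch, assuming TF-IDF features; the test specifications
# below are invented placeholders, not Bombardier data.
# Requires: pip install scikit-learn hdbscan
import numpy as np
import hdbscan
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

test_specs = [
    "Apply emergency brake and verify brake pressure drops",
    "Release emergency brake and verify brake pressure is restored",
    "Verify brake pressure sensor reading after emergency stop",
    "Open passenger doors and verify door indicator lights up",
    "Close passenger doors and verify door indicator turns off",
]

# Feature extraction: one TF-IDF vector per test specification.
features = TfidfVectorizer(stop_words="english").fit_transform(test_specs)

# Pairwise cosine distance matrix; HDBSCAN accepts precomputed distances.
distances = cosine_distances(features).astype(np.float64)

clusterer = hdbscan.HDBSCAN(metric="precomputed", min_cluster_size=2)
labels = clusterer.fit_predict(distances)
print(labels)  # same label => expected to share dependencies; -1 => noise
```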
|
142 |
Analysis of textual user reviews of a selected group of products / Analýza textových používateľských hodnotení vybranej skupiny produktov. Valovič, Roman (January 2019)
This work focuses on the design of a system that identifies frequently discussed product features in product reviews, summarizes them, and displays them to the user in terms of sentiment. The work deals with natural language processing, with a specific focus on the Czech language. The reader is introduced to text preprocessing methods and their impact on the quality of the analysis results. The identification of the most discussed product features is carried out by cluster analysis using the K-Means algorithm, under the assumption that sufficiently internally homogeneous clusters will represent the individual features of the products. A newer area explored in this work is the representation of documents using the word embeddings technique, and its potential for using vector space as input for machine learning algorithms.
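A hypothetical sketch of the clustering step described above: word vectors are learned from a toy review corpus and grouped with K-Means, so that each cluster approximates one discussed product feature. The corpus, cluster count, and hyper-parameters are invented.

```python
# Hedged sketch: cluster word vectors with K-Means so that internally
# homogeneous clusters approximate discussed product features.
# Requires: pip install gensim scikit-learn
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# Invented toy review corpus (tokenized sentences).
reviews = [
    ["battery", "lasts", "two", "days"],
    ["screen", "resolution", "is", "sharp"],
    ["battery", "drains", "quickly"],
    ["display", "screen", "too", "dim"],
]

# Learn word embeddings from the review corpus.
model = Word2Vec(reviews, vector_size=50, min_count=1, seed=42)
words = model.wv.index_to_key
vectors = model.wv[words]

# Cluster the embedding space; each cluster is a candidate product feature.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(vectors)
for word, label in zip(words, kmeans.labels_):
    print(label, word)
```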
|
143 |
Will Svenska Akademiens Ordlista Improve Swedish Word Embeddings? Ahlberg, Ellen (January 2022)
Unsupervised word embedding methods are frequently used for natural language processing applications. However, these unsupervised methods overlook known lexical relations that could be valuable for capturing accurate semantic word relations. This thesis aims to explore whether Swedish word embeddings can benefit from prior linguistic knowledge. Four knowledge graphs extracted from Svenska Akademiens ordlista (SAOL) are incorporated during the training process using the Probabilistic Word Embeddings with Laplacian Priors (PELP) model. The four implemented PELP models are compared with baseline results to evaluate the use of side information. The results suggest that various lexical relations in SAOL are of interest for generating more accurate Swedish word embeddings.
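The PELP model itself is not reproduced here; as a loose illustration of how lexical relations can act as side information, the sketch below computes a standard graph-Laplacian penalty, trace(W^T L W), which equals the summed squared distances between embeddings of related words. The edges, sizes, and random embeddings are all assumptions.

```python
# Hedged sketch: a graph-Laplacian penalty over embeddings, a common way
# to encode lexical relations as side information. NOT the thesis's PELP
# implementation; the graph and dimensions are invented.
import numpy as np

edges = [(0, 1), (1, 2)]  # e.g., related word pairs from a lexicon like SAOL
n_words, dim = 4, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(n_words, dim))  # word embedding matrix

# Graph Laplacian L = D - A built from the lexical relations.
A = np.zeros((n_words, n_words))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# The penalty trace(W^T L W) equals the sum over edges of ||w_i - w_j||^2,
# so minimizing it pulls related words' embeddings together.
penalty = np.trace(W.T @ L @ W)
print(penalty)
```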
|
144 |
News aggregator with sentiment analysis / Nyhetsaggregator med sentimentanalys. Carlsson, Claude; Germer, Edvin (January 2022)
Since mental ill-health is rising in society and research has no clear answer as to why, we have formulated our own hypothesis in this project about why we see this trend: because news services profit from publishing negative articles, more people end up consuming negative news. The goal of the project is to build a news aggregator that performs sentiment analysis on current news from Aftonbladet, Expressen, DN, and SVT, categorizing the news as positive, neutral, or negative. The news is collected with a custom-built web scraper that fetches articles from each source. The articles are then uploaded to a database and preprocessed for machine learning. For classification of news articles in the general news feed, we trained a neural network. We also developed our own lexicon-based model, unique to each user, to predict user-specific sentiment. The result is a custom-designed website with a general news feed as well as a personalized feed for registered users, where a slider controls which type of news, and from which news sites, the user wants to see. The website also presents statistics on, among other things, how positive, neutral, and negative news is distributed across the different news sites.
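A minimal sketch of a lexicon-based sentiment model of the kind described, assuming per-word integer scores and a simple sum-and-threshold rule; the (Swedish) lexicon entries and threshold are invented.

```python
# Hedged sketch of a per-user lexicon-based sentiment classifier;
# the lexicon and threshold are illustrative assumptions.
user_lexicon = {"kris": -2, "seger": 2, "skandal": -3, "rekord": 1}

def classify(headline: str, threshold: float = 0.5) -> str:
    """Sum per-word lexicon scores and map the total to a sentiment label."""
    score = sum(user_lexicon.get(w.lower(), 0) for w in headline.split())
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

print(classify("Skandal och kris i regeringen"))  # -> "negative"
```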
|
145 |
Towards the creation of a Clinical Summarizer. Gunnarsson, Axel (January 2022)
While Electronic Medical Records provide extensive information about patients, the vast amounts of data make it difficult to quickly retrieve the information needed to make accurate assessments and decisions directly concerning patients' health. This search process is naturally time-consuming and forces health professionals to focus on a labor-intensive task that diverts their attention from the main task of applying their knowledge to save lives. With the general aim of relieving professionals of this task of finding information needed for an operational decision, this thesis explores the use of a general BERT model for extractive summarization of Swedish medical records, investigating its capability to extract sentences that convey important information to MRI physicists. To achieve this, a domain-expert evaluation of medical histories was performed, creating the reference summaries used for model evaluation. Three implementations are included in this study, one of which is TextRank, a prominent unsupervised approach to extractive summarization. The other two are based on clustering and rely on BERT to encode the text. The implementations are evaluated using ROUGE metrics. The results support the use of a general BERT model for extractive summarization of medical records. Furthermore, the results are discussed in relation to the collected reference summaries, leading to a discussion about potential improvements to the domain-expert evaluation, as well as possibilities for future work on the summarization of clinical documents.
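A hedged sketch of the clustering-based extractive approach: sentences are embedded with a BERT-style encoder and the sentence nearest each K-Means centroid is kept. The model name, the toy record, and the summary length are assumptions, not the thesis's setup.

```python
# Hedged sketch of BERT-based extractive summarization by clustering.
# Requires: pip install sentence-transformers scikit-learn
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

sentences = [  # invented example record, not real patient data
    "Patient has a pacemaker implanted in 2015.",
    "Follow-up visit scheduled next month.",
    "MRI contraindicated due to implanted device.",
    "Patient reports mild headaches.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-style encoder
emb = model.encode(sentences)

k = 2  # assumed summary length in sentences
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(emb)

# For each cluster, keep the sentence closest to the centroid.
summary_idx = sorted({
    int(np.argmin(np.linalg.norm(emb - c, axis=1)))
    for c in km.cluster_centers_
})
print([sentences[i] for i in summary_idx])
```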
|
146 |
Information extraction from text recipes in a web format / Informationsextraktion ur textrecept i webbformat. Storby, Johan (January 2016)
Searching the Internet for recipes to find interesting ideas for meals to prepare is getting increasingly popular. It can, however, be difficult to find a recipe for a dish that can be prepared with the items someone has available at home. In this thesis, a solution to part of that problem is presented. The thesis investigates a method for extracting the various parts of a recipe from the Internet in order to save them and build a searchable database of recipes, where users can search for recipes based on the ingredients they have available. The system works for both English and Swedish and is able to identify both languages. This is a problem within natural language processing and its subfield information extraction. To solve the information extraction problem, rule-based techniques based on named entity recognition, content extraction, and general rule-based extraction are used. The results indicate generally good but not flawless functionality. For English, the rule-based algorithm achieved an F1-score of 83.8% for ingredient identification, 94.5% for identification of cooking instructions, and an accuracy of 88.0% and 96.4% for cooking time and number of portions, respectively. For Swedish, the ingredient identification worked slightly better, but the other parts worked slightly worse. The results are comparable to those of other similar methods and can hence be considered good; they are, however, not good enough for the system to be used independently, without a supervising human.
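A small illustration of the rule-based idea, assuming regular-expression patterns for cooking time, portions, and quantity-unit-ingredient triples; the recipe text and patterns are invented and far simpler than a production extractor.

```python
# Hedged sketch of rule-based recipe extraction with invented patterns.
import re

recipe = """Creamy pasta. Serves 4 portions.
Cook for about 25 minutes.
Ingredients: 400 g pasta, 2 dl cream, 1 clove garlic."""

# Cooking time: first number followed by a minutes keyword.
time_match = re.search(r"(\d+)\s*min", recipe, re.IGNORECASE)
cooking_time = int(time_match.group(1)) if time_match else None

# Portions: "serves N" or "N portions".
portions_match = re.search(r"serves\D*(\d+)|(\d+)\s*portion", recipe,
                           re.IGNORECASE)
portions = None
if portions_match:
    portions = int(next(g for g in portions_match.groups() if g))

# Ingredients: a simple quantity-unit-name pattern.
ingredients = re.findall(r"(\d+)\s*(g|dl|clove)s?\s+(\w+)", recipe)

print(cooking_time, portions, ingredients)
# -> 25 4 [('400', 'g', 'pasta'), ('2', 'dl', 'cream'), ('1', 'clove', 'garlic')]
```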
|
147 |
ClarQue: Chatbot Recognizing Ambiguity in the Conversation and Asking Clarifying Questions. Mody, Shreeya Himanshu (31 July 2020)
Recognizing when we need more information and asking clarifying questions are integral to communication in our day-to-day life. They help us complete our mental model of the world and eliminate confusion. Chatbots need this technique to collaborate meaningfully with humans. We have investigated a process to generate an automated system that mimics this human communication behavior using knowledge graphs, weights, an ambiguity test, and a response generator. Given input dialog text, and based on the chatbot's knowledge about the world and the user, the system decides whether it has enough information or requires more. Based on that decision, the chatbot generates an output dialog text: an answer if a question was asked, a statement if there are no doubts, or, if there is any ambiguity, a clarifying question. The effectiveness of these features is backed up by an empirical study suggesting that they are very useful in a chatbot, not only for crucial information retrieval but also for keeping the flow and context of the conversation intact.
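A toy sketch of the ambiguity test, assuming the knowledge graph is a flat dictionary: when a mention matches more than one entity, the bot asks a clarifying question instead of answering. All entities and wording are invented.

```python
# Hedged sketch of an ambiguity test over a toy knowledge graph.
knowledge_graph = {
    "john": {"works_at": "hospital"},
    "john smith": {"works_at": "bank"},
}

def respond(mention: str) -> str:
    """Answer if the mention is unambiguous; otherwise ask for clarification."""
    matches = [k for k in knowledge_graph if mention in k]
    if not matches:
        return "I don't know about that yet."
    if len(matches) > 1:  # ambiguity detected: generate a clarifying question
        return f"Do you mean {' or '.join(matches)}?"
    entity = matches[0]
    return f"{entity} works at {knowledge_graph[entity]['works_at']}."

print(respond("john"))  # -> "Do you mean john or john smith?"
```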
|
148 |
Unsupervised Topic Modeling to Improve Stormwater Investigations. Arvidsson, David (January 2022)
Stormwater investigations are an important part of the detail plan that companies and industries need to write. The detail plan is used to show that an area is well suited for, among other things, construction. Writing these detail plans is a costly and time-consuming process, and it is not uncommon for them to be rejected. This is because it is difficult to find information about the criteria that need to be met and what needs to be addressed within the investigation. This thesis aims to make the problem less ambiguous by applying the topic modeling algorithm LDA (latent Dirichlet allocation) to identify the structure of stormwater investigations. Moreover, sentences that contain words from the topic modeling are extracted to give each word a perspective on how it can be used in the context of writing a stormwater investigation. Finally, a knowledge graph is created from the extracted topics and sentences. The results of this study indicate that topic modeling and NLP (natural language processing) can be used to identify the structure of stormwater investigations. Furthermore, they can also be used to extract useful information that can serve as guidance when learning about and writing stormwater investigations.
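An illustrative sketch, assuming scikit-learn's LDA stands in for the thesis's topic model: topics are fitted on invented stand-in documents, top words per topic are listed, and sentences containing those words are extracted for context.

```python
# Hedged sketch of LDA topic extraction; documents and parameters are
# invented stand-ins for stormwater investigation texts.
# Requires: pip install scikit-learn
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "infiltration capacity of the soil must be assessed",
    "runoff from paved surfaces is led to a retention pond",
    "the retention pond delays peak runoff flow",
    "soil infiltration reduces total runoff volume",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()

for t, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-3:][::-1]]  # top 3 topic words
    print(f"topic {t}:", top)
    # Extract sentences containing a top topic word, for context.
    print([d for d in docs if any(w in d for w in top)])
```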
|
149 |
Integration of formal and informal information sources for identifying the geographic focus of text / Integración de fuentes de información formales e informales para la identificación del foco geográfico en el texto. Peregrino Torregrosa, Fernando (23 September 2016)
No description available.
|
150 |
Word Vector Representations using Shallow Neural Networks. Adewumi, Oluwatosin (January 2021)
This work highlights some important factors to consider when developing word vector representations and data-driven conversational systems. Neural network methods for creating word embeddings have gained more prominence than their older, count-based counterparts. However, there are still challenges, such as prolonged training time and the need for more data, especially with deep neural networks. Shallow neural networks have the advantage of less complexity; however, they also face challenges, such as sub-optimal hyper-parameter combinations that produce sub-optimal models. This work therefore investigates the following research questions: "How importantly do hyper-parameters influence word embeddings' performance?" and "What factors are important for developing ethical and robust conversational systems?" In answering the questions, various experiments were conducted using different datasets in different studies. The first study empirically investigates various hyper-parameter combinations for creating word vectors and their impact on two natural language processing (NLP) downstream tasks: named entity recognition (NER) and sentiment analysis (SA). The study shows that the optimal performance of embeddings for downstream NLP tasks depends on the task at hand. It also shows that certain combinations give strong performance across the tasks chosen for the study. Furthermore, it shows that reasonably smaller corpora are sufficient, or in some cases even produce better models, and take less time to train and load. This is important, especially now that environmental considerations play a prominent role in ethical research. Subsequent studies build on the findings of the first and explore hyper-parameter combinations for Swedish and English embeddings on the downstream NER task. The second study presents a new Swedish analogy test set for the evaluation of Swedish embeddings. Furthermore, it shows that character n-grams are useful for Swedish, a morphologically rich language. The third study shows that broad coverage of topics in a corpus appears to be important for producing better embeddings, and that noise may be helpful in certain instances, though it is generally harmful. Hence, a relatively smaller corpus can show better performance than a larger one, as demonstrated in the work with the smaller Swedish Wikipedia corpus against the Swedish Gigaword. The final study argues, in answering the second question and from the point of view of the philosophy of science, that the near-elimination of unwanted bias in training data, together with the use of fora like peer review, conferences, and journals to provide the necessary avenues for criticism and feedback, is instrumental for the development of ethical and robust conversational systems.
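A minimal sketch of a hyper-parameter sweep over a shallow embedding model (gensim's word2vec), using an invented toy corpus and an intrinsic similarity probe as a stand-in for the downstream NER and SA evaluations.

```python
# Hedged sketch of a word2vec hyper-parameter sweep; corpus and grid
# values are invented for illustration.
# Requires: pip install gensim
from itertools import product
from gensim.models import Word2Vec

corpus = [
    ["stockholm", "is", "the", "capital", "of", "sweden"],
    ["oslo", "is", "the", "capital", "of", "norway"],
    ["sweden", "borders", "norway"],
] * 50  # repeat the toy corpus so training has something to learn from

grid = {"window": [2, 5], "vector_size": [25, 50], "sg": [0, 1]}
for window, size, sg in product(*grid.values()):
    model = Word2Vec(corpus, window=window, vector_size=size, sg=sg,
                     min_count=1, epochs=10, seed=1)
    # Intrinsic proxy check; real studies would score NER/SA downstream.
    sim = model.wv.similarity("sweden", "norway")
    print(f"window={window} size={size} sg={sg} sim={sim:.3f}")
```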
|