Global ETD Search

331	Natural Language Processing for Swedish Nuclear Power Plants : A study of the challenges of applying Natural language processing in Operations and Maintenance and how BERT can be used in this industry Kåhrström, Felix January 2022 (has links) In this study, the current use of natural language processing in Swedish and international nuclear power plants has been investigated through semi-structured interviews. Furthermore, natural language processing techniques have been studied to find out how text data can be analyzed and utilized to aid operations and maintenance in the Swedish nuclear power plant industry. The state-of-the-art transformer model BERT was used to analyze text data from operations at a Swedish nuclear power plant. This study has not managed to find any current implementations of natural language processing techniques for operations and maintenance in Swedish nuclear power plants. Natural language processing does exist in examples such as embedded search functionalities internally or chatbots on the customer side, but these fall outside the scope of this project. Some international actors have successfully implemented natural language processing for the classification of text data such as corrective action programs. Furthermore, it was observed that the lingo and jargon in the nuclear power plant industry differ between utilities as well as from the native language. To tackle this, models further trained on domain-specific data could be beneficial to better analyze the text data and solve natural language processing tasks. As the data used in this study was unlabeled, expert input from the nuclear domain is required for a proper analysis of the results. Working for a more data-driven industry would be valuable for the implementation of natural language processing. / In this study, the current use of natural language processing (NLP) in Swedish and international nuclear power plants has been investigated through semi-structured interviews. 
Furthermore, NLP has been studied to find out how text data can be analyzed and used to facilitate operations and maintenance in the Swedish nuclear power industry. The transformer model BERT was used to analyze text data from operations at a Swedish nuclear power plant. This study has not managed to find any current implementations of NLP for operations and maintenance in Swedish nuclear power plants. NLP exists as embedded internal search functions or customer-facing chatbots, but these are outside the scope of this project. Some international actors have successfully implemented NLP for the classification of text data such as corrective action programs. Furthermore, it was observed that the language and jargon in the nuclear power industry differ between plants and from everyday language. Training the models on domain-specific data could allow them to perform better. Since the data used in this study was unlabeled, expert input from the nuclear field is required for a correct analysis of the results. Working toward a more data-driven industry would be valuable for the implementation of NLP. / Feasibility Study on Artificial Intelligence Technologies in Nuclear Applications NLP Natural language processing Machine learning BERT Nuclear power plant Computer Sciences Datavetenskap (datalogi)
332	Incels: Frustrated and Angry due to Deprivation of Intimacy : A Case Study of the Radicalisation Trajectories of an Online Community on a Fringe Social Media Platform Kiss, Aron January 2022 (has links) Technological advancements and affordability enable the voicing of social injustice, feelings of deprivation, and oppression. Spatial barriers no longer pose obstacles to connecting with like-minded (or dissimilar) others to define and refine ingroup and outgroup. Some scholars anticipate that the internet liberates the discussion of opinions, while others claim that social networking platforms play a role in the polarisation of the public by creating echo chambers. However, it is recognised that ideas, ideologies, and social movements spread across the internet at an unprecedented pace. Connecting with others with whom one shares deprivation in a support network offers a sense of belonging. A broad scholarly literature addresses opinion polarisation and potential radicalisation on online social media platforms. However, quantifying radicalisation trajectories in fringe online communities like the misogynist incels is still to be done. In this thesis I study the online presence of the incel community. Incels are mostly young men who feel stigmatised and need to hide their incel existence. Incels voice their feelings of deprivation of a relationship and sex with a willing partner. This unfulfilled masculinity and sense of entitlement to sex cause frustration and anger, which are vented in online forums, blaming primarily women and feminism. Calls for action toward social change, even for violence, are common. However, incels do not unanimously consider violence a solution; many demonstrate the tame side of the so-called blackpilled mindset, the acceptance of powerlessness, and nihilism. Regardless, some scholars view the community as potentially dangerous to society, labelling them as terrorists. 
This study investigates whether participating registered users of the Incels.is website display increasing tendency toward expressing utterances with the themes of misogyny, harassment, nihilism, and moral outrage in their posted messages, and whether users gradually become more aligned with the general perception of incels in previous scholarly work. In other words, this work tests whether active participation increases the frequency of utterances of misogyny, harassment, and moral outrage, thus demonstrating a radicalisation tendency or increased nihilism. To answer the research question, I first scraped the Incels.is website, and retained ~5.38M posts published over 4 years for analysis. Next, a subset of posts was manually labelled to train a supervised text classification model (BERT). Finally, the results of the classification task were complemented with Ordinary Least Squares regression (n = 4623). The analyses uncover temporal user-level radicalisation trajectories, and increased nihilism. More specifically, the duration of active participation (in days) and the number of posted messages positively predict the count of moral outrage, misogynistic, harassing, and nihilistic content. fringe social media platforms echo chambers marginalised groups unfulfilled masculinity polarisation radicalisation incel NLP Social Sciences Interdisciplinary
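The regression step in the abstract above can be illustrated with a minimal sketch of ordinary least squares for a single predictor. The function name `ols_fit` and the toy numbers are hypothetical and are not taken from the thesis, which used more than one predictor (duration of participation and number of posts) on n = 4623 users.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: days of active participation vs. count of flagged posts.
days_active = [10, 20, 30, 40]
flagged_posts = [3, 5, 7, 9]
intercept, slope = ols_fit(days_active, flagged_posts)
print(round(intercept, 3), round(slope, 3))
```

A positive slope in such a fit is what "positively predicts" means in the abstract; whether the thesis used raw counts or transformed variables is not stated here.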
333	Granskning av examensarbetesrapporter med IBM Watson molntjänster Eriksson, Patrik, Wester, Philip January 2018 (has links) Cloud services are one of today's fastest-expanding fields. Companies such as Amazon, Google, Microsoft and IBM offer these cloud services in various forms. As this field progresses, the natural question arises: “What can you do with the technology today?” The technology offers scalability for hardware usage and user demands, which is attractive to developers and companies. This thesis tries to examine the applicability of cloud services by combining it with the question: “Is it possible to make an automated thesis examiner?” By narrowing down the services to IBM Watson web services, this thesis's main question reads “Is it possible to make an automated thesis examiner using IBM Watson?” Hence the goal of this thesis was to create an automated thesis examiner. The project used a modified version of Bunge’s technological research method. Among the first steps, a definition of a software thesis examiner for student theses was created. An empirical study of the Watson services that seemed relevant from the literature study followed. These empirical studies allowed a deeper understanding of the services’ practices and boundaries. From these implications and the definition of a software thesis examiner for student theses, an idea of how to build and implement an automated thesis examiner was created. Most of IBM Watson’s services were thoroughly evaluated, except for the Machine Learning service, which would have been studied further had time not run out. This project found the Watson web services useful in many cases but did not find a service that was well suited for thesis examination. 
Although the goal was not reached, this thesis researched the Watson web services and can be used to improve understanding of their applicability, and for future implementations that aim to meet the provided definition. / Cloud services are one of the fastest-developing fields today. Companies such as Amazon, Google, Microsoft and IBM provide these services in several forms. As development gathers pace, the natural question arises: “What can you do with this technology today?” The technology offers scalability in the hardware used and the number of users, which is attractive to developers and companies. This thesis tries to answer how cloud services can be used by combining it with the question “Is it possible to create an automated thesis report examiner?” By limiting the investigation to IBM Watson cloud services, the work mainly tries to answer the main question “Is it possible to create an automated thesis report examiner with Watson cloud services?” The goal of the work was therefore to create an automated thesis report examiner. The project followed a modified version of Bunge’s technological research method, in which the first step was to create a definition of a software thesis report examiner, followed by an investigation of the Watson cloud services considered relevant from the literature study. These were then examined further in an empirical study. The empirical studies built an understanding of the services’ applicability and limitations, in order to map how they can be used in an automated thesis report examiner. Most services were treated thoroughly, except Machine Learning, which would have needed further investigation had time not run out. The project shows that Watson cloud services are useful but not perfectly suited to examining thesis reports. 
Although the goal was not reached, Watson cloud services were investigated, which can provide an understanding of their usefulness and of future implementations that aim to meet the created definition. Natural Language Processing; NLP Computer and Information Sciences Data- och informationsvetenskap
334	Predictive maintenance using NLP and clustering support messages Yilmaz, Ugur January 2022 (has links) Communication with customers is a major part of customer experience as well as a great source for data mining. More businesses are engaging with consumers via text messages. Before 2020, 39% of businesses already used some form of text messaging to communicate with their consumers. Many more were expected to adopt the technology after 2020[1]. Email response rates are merely 8%, compared to a response rate of 45% for text messaging[2]. A significant portion of this communication involves customer enquiries or support messages sent in both directions. According to estimates, more than 80% of today’s data is stored in an unstructured format (such as text, image, audio, or video) [3], with a significant portion of it being stated in ambiguous natural language. When analyzing such data, qualitative data analysis techniques are usually employed. In order to facilitate the automated examination of huge corpora of textual material, researchers have turned to natural language processing techniques[4]. In light of the statistics above, Billogram[5] has decided that support messages between creditors and recipients can be mined for predictive maintenance purposes, such as early identification of an outlier like a bug, defect, or wrongly built feature. As a one-sentence goal definition, Billogram is looking for an answer to “why are people reaching out to begin with?” This thesis project discusses implementing unsupervised clustering of support messages using natural language processing methods, as well as performance metrics of the results, to answer Billogram’s question. The research also covers intent recognition of clustered messages in two different ways, one automatic and one semi-manual; the results are discussed and compared. The first approach, LDA with manual intent assignment, has 100 topics and a coherence score of 0.293. 
On the other hand, the second approach produced 158 clusters with UMAP and HDBSCAN, while intent recognition was automatic. Creating clusters will help identify issues which can be subjects of increased focus, automation, or even down-prioritizing. Therefore, this research lands in the predictive maintenance[9] area. This study, which will get better over time with more iterations in the company, also contains the preliminary work for “labeling” or “describing” clusters and their intents. Predictive maintenance support messages NLP unsupervised clustering intent recognition LDA UMAP HDBSCAN BERT Swedish BERT(KB-BERT) Billogram
335	Deep Learning Classification and Model Explainability for Prediction of Mental Health Patients Emergency Department Visit / Emergency Department Resource Prediction Using Explainable Deep Learning Rashidiani, Sajjad January 2022 (has links) The rate of Emergency Department (ED) visits due to mental health and drug abuse among children and youth has been increasing for more than a decade and is projected to become the leading cause of ED visits. Identifying high-risk patients well before an ED visit will enable mental health care providers to better predict ED resource utilization, improve their service, and ultimately reduce the risk of a future ED visit. Many studies in the literature utilized medical history to predict future hospitalization. However, in mental health care, the medical history of new patients is not always available from the first visit and it is crucial to identify high risk patients from the beginning as the rate of drop-out is very high in mental health treatment. In this study, a new approach of creating a text representation of questionnaire data for deep learning analysis is proposed. Employing this new text representation has enabled us to use transfer learning and develop a deep Natural Language Processing (NLP) model that estimates the possibility of 6-month ED visit among children and youth using mental health patient reported outcome measures (PROM). The proposed method achieved an Area Under Receiver Operating Characteristic Curve of 0.75 for classification of 6-month ED visit. In addition, a novel method was proposed to identify the words that carry the highest amount of information related to the outcome of the deep NLP models. This measurement of word information using Entropy Gain increases the explainability of the model by providing insight to the model attention. Finally, the results of this method were analyzed to explain how the deep NLP model achieved a high classification performance. 
/ Dissertation / Master of Applied Science (MASc) / In this document, an Artificial Intelligence (AI) approach for predicting 6-month Emergency Department (ED) visits is proposed. In this approach, the questionnaires gathered from children and youth admitted to an outpatient or inpatient clinic are converted to a text representation called Textionnaire. Next, AI is utilized to analyze the Textionnaire and predict the possibility of a future ED visit. This method was successful about 75% of the time. In addition to the AI solution, an explainability component is introduced to explain how the natural language processing algorithm identifies the high-risk patients. Deep Learning Transfer Learning Natural Language Processing Readmission Prediction Emergency Department Visit Questionnaire Patient Reported Outcome Measure PROM NLP Explainability Artificial Intelligence Machine Learning
336	Training Neural Models for Abstractive Text Summarization Kryściński, Wojciech January 2018 (has links) Abstractive text summarization aims to condense long textual documents into a short, human-readable form while preserving the most important information from the source document. A common approach to training summarization models is to use maximum likelihood estimation with the teacher forcing strategy. Despite its popularity, this method has been shown to yield models with suboptimal performance at inference time. This work examines how using alternative, task-specific training signals affects the performance of summarization models. Two novel training signals are proposed and evaluated as part of this work. The first is a novelty metric that measures the overlap between n-grams in the summary and the summarized article. The second uses a discriminator model to distinguish human-written summaries from generated ones on a word-level basis. Empirical results show that using the mentioned metrics as rewards for policy gradient training yields significant performance gains measured by ROUGE scores, novelty scores and human evaluation. / Abstractive text summarization aims to shorten long text documents into a condensed, human-readable form while preserving the most important information in the source document. A common approach to training summarization models is to use maximum likelihood estimation with the teacher forcing strategy. Despite its popularity, this method has been shown to yield models with suboptimal performance at inference. This work examines how the use of alternative, task-specific training signals affects the summarization model's performance. Two new training signals are proposed and evaluated as part of this work. The first, which is a new metric, measures the overlap between n-grams in the summary and the summarized article. 
The second uses a discriminator model to distinguish human-written summaries from generated ones at the word level. Empirical results show that using the mentioned metrics as rewards for policy gradient training yields significant performance gains measured by ROUGE score, novelty score and human evaluation. machine learning deep learning text summarization natural language processing neural networks recurrent neural networks reinforcement learning generative adversarial networks gans abstractive text summarization nlp Computer Sciences Datavetenskap (datalogi)
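The novelty signal mentioned in the abstract above can be sketched as follows. The abstract only says the metric measures n-gram overlap between summary and article; the exact definition used here (the fraction of summary n-grams that do not occur in the article) and the whitespace tokenization are assumptions made for illustration, and the function names are hypothetical.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novelty(summary, article, n=2):
    """Fraction of the summary's n-grams that never appear in the article.

    Assumed definition: 0.0 means fully copied, 1.0 means fully novel.
    """
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        return 0.0
    article_ngrams = set(ngrams(article.lower().split(), n))
    novel = sum(1 for gram in summary_ngrams if gram not in article_ngrams)
    return novel / len(summary_ngrams)


article = "the cat sat on the mat"
summary = "the cat lay on the mat"
print(novelty(summary, article, n=2))  # 2 of 5 bigrams are novel -> 0.4
```

A score like this could serve as a reward in policy gradient training, as the abstract describes, although how the thesis scales or combines it with ROUGE is not stated here.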
337	Large language models as an interface to interact with API tools in natural language Tesfagiorgis, Yohannes Gebreyohannes, Monteiro Silva, Bruno Miguel January 2023 (has links) In this research project, we aim to explore the use of Large Language Models (LLMs) as an interface to interact with API tools in natural language. Bubeck et al. [1] shed some light on how LLMs could be used to interact with API tools. Since then, new versions of LLMs have been launched, and the question of how reliable an LLM can be in this task remains unanswered. The main goal of our thesis is to investigate the designs of the available system prompts for LLMs, identify the best-performing prompts, and evaluate the reliability of different LLMs when using the best-identified prompts. We will employ a multiple-stage controlled experiment: a literature review in which we reveal the available system prompts used in the scientific community and open-source projects; then, using F1-score as a metric, we will analyse the precision and recall of the system prompts, aiming to select the best-performing system prompts for interacting with API tools; and at a later stage, we compare a selection of LLMs with the best-performing prompts identified earlier. From these experiments, we find that AI-generated system prompts perform better with GPT-4 than the current prompts used in open-source projects and the literature, that zero-shot prompts perform better in this specific task with GPT-4, and that a good system prompt for one model does not generalize well to other models. Large language model (LLM) Natural Language Processing (NLP) GPT-4 Llama-2 Palm Application Programming Interface (API). Engineering and Technology Teknik och teknologier Computer Sciences Datavetenskap (datalogi)
338	Natural Language Processing for Improving Search Query Results : Applied on The Swedish Armed Force's Profession Guide Harju Schnee, Andreas January 2023 (has links) Text has been the historical way of preserving and acquiring knowledge, and text data today is a growing part of the digital footprint, together with the need to query this data for information. Seeking information is a constant, ongoing process and a crucial part of many systems all around us. The ability to perform fast and effective searches is a must when dealing with vast amounts of data. This thesis implements an information retrieval system based on the Swedish Defence Force's profession guide, with the aim of producing a system that retrieves relevant professions based on user-defined queries of varying size. A number of Natural Language Processing techniques are investigated and implemented. To transform the gathered profession descriptions, a document embedding model, doc2vec, was implemented, resulting in document vectors that are compared to find similarities between documents. The final system was evaluated by domain experts, represented by active military personnel, who quantified the relevancy of the profession retrievals into a measurable performance. The system managed to retrieve relevant information for 46.6% and 56.6% of the long- and short-text inputs respectively. The result is a much more generalized and capable system compared to the search function available at the profession guide today. natural language processing NLP machine learning ML artificial intelligence AI language model information retrieval system document embedding text representation text data augmentation Information Systems
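The comparison of document vectors described in the abstract above can be sketched as a similarity ranking. The abstract does not say which similarity measure was used; cosine similarity is assumed here because it is a common choice for embedding vectors. The profession names and three-dimensional vectors are hypothetical stand-ins, not doc2vec output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings for three profession descriptions and one query.
doc_vecs = {
    "pilot": [0.9, 0.1, 0.0],
    "medic": [0.1, 0.8, 0.3],
    "mechanic": [0.5, 0.5, 0.2],
}
query_vec = [1.0, 0.2, 0.0]
for name, score in rank_documents(query_vec, doc_vecs):
    print(name, round(score, 3))
```

In a real system the query vector would be inferred by the same embedding model that produced the document vectors, so that both live in the same space.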
339	Understanding School Shootings Using Qualitatively-Informed Natural Language Processing Do, Quan K 01 January 2023 (has links) (PDF) Prior literature has investigated the connection between school shootings and factors of familial trauma and mental health. Specifically, experiences related to parental suicide, physical or sexual abuse, neglect, marital violence, or severe bullying have been associated with a propensity for carrying out a mass shooting. Given that prior research has shown common histories among school shooters, it follows that a person's violent tendencies can be revealed by their previous communications with others, thus aiding in predicting an individual's proclivity for school shootings. However, previous literature has drawn no conclusions from online posts made by the shooters prior to the mass shootings. This thesis applies NVivo-supported thematic analysis and Natural Language Processing (NLP) to study school shootings by comparing the online speech patterns of known school terrorists versus those of non-violent extremists and ordinary teenagers online. Findings indicate that out of all the possible NLP indicators, conversation, HarmVice, negative tone, and conflict are the most suitable school shooting indicators. Ordinary people score eight times higher than known school shooters and online extremists in conversation. Known shooters score more than 14 times higher in HarmVice than both ordinary people and online extremists. Known shooters also score higher in negative tone (1.37 times higher than ordinary people and 1.78 times higher than online extremists) and conflict (more than three times higher than ordinary people and 1.8 times higher than online extremists). The implications for domestic violence prediction and prevention can be used to protect citizens inside educational infrastructure by linking the flagged accounts to the schools or colleges that they attend. 
Further research is needed to determine the severity of emotional coping displayed in online posts, as well as the amount of information and frequency with which weapons and killing are discussed. NLP qualitative analysis school shootings gun violence extremists school violence mass shooting natural language processing NVivo LWIC Moral Foundation Community-Based Research Mental and Social Health
340	Sentiment Analysis Of IMDB Movie Reviews : A comparative study of Lexicon based approach and BERT Neural Network model Domadula, Prashuna Sai Surya Vishwitha, Sayyaparaju, Sai Sumanwita January 2023 (has links) Background: Movies have become an important marketing and advertising tool that can influence consumer behaviour and trends. Reading film reviews is an important part of watching a movie, as it can help viewers gain a general understanding of the film. It can also provide filmmakers with feedback on how their work is being received. Sentiment analysis is a method of determining whether a review has positive or negative sentiment, and this study investigates a machine learning method for classifying sentiment from film reviews. Objectives: This thesis aims to perform comparative sentiment analysis on textual IMDb movie reviews using lexicon-based and BERT neural network models. Different performance evaluation metrics are then used to identify the most effective learning model. Methods: This thesis employs a quantitative research technique, with data analysed using traditional machine learning. The labelled data set comes from an online website called Kaggle (https://www.kaggle.com/datasets), which contains movie review information. Algorithms like the lexicon-based approach and the BERT neural networks are trained using the chosen IMDb movie reviews data set. To discover which model performs best at predicting sentiment, the constructed models will be assessed on the test set using evaluation metrics such as accuracy, precision, recall and F1 score. Results: In the conducted experiments, the BERT neural network model is the most efficient algorithm in classifying the IMDb movie reviews into positive and negative sentiments. This model achieved the highest accuracy score of 90.67% over the trained data set, followed by the BoW model with an accuracy of 79.15%, whereas the TF-IDF model has 78.98% accuracy. 
The BERT model has the best precision and recall, with 0.88 and 0.92 respectively, followed by both the BoW and TF-IDF models. The BoW model has a precision and recall of 0.79, and the TF-IDF model has a precision of 0.79 and a recall of 0.78. The BERT model also has the highest F1 score of 0.88, followed by the BoW model with an F1 score of 0.79, whereas TF-IDF has 0.78. Conclusions: Among the two models evaluated, the lexicon-based approach and the BERT transformer neural network, the BERT neural network is the most efficient, having a good performance score based on the measured performance criteria. Bag of Words(BoW) Deep Learning IMDb Movie Reviews Machine Learning Natural Language Processing(NLP) Sentiment Analysis Computer Sciences Datavetenskap (datalogi)
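The evaluation metrics named in the abstract above follow standard definitions, sketched below for a binary classifier. The function names and the confusion counts are hypothetical; only the BoW and TF-IDF precision and recall values are taken from the abstract. How the thesis aggregated its scores (per class or averaged) is not stated, so this is an illustration of the formulas rather than a reproduction of its evaluation.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts for one classifier.
print(precision_recall_f1(tp=8, fp=2, fn=2))

# Precision and recall as reported in the abstract for BoW and TF-IDF.
print(round(f1_from(0.79, 0.79), 2))  # BoW    -> 0.79
print(round(f1_from(0.79, 0.78), 2))  # TF-IDF -> 0.78
```

Both rounded values agree with the F1 scores the abstract reports for the BoW and TF-IDF models.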
