Global ETD Search

81	Using Natural Language Processing and Machine Learning for Analyzing Clinical Notes in Sickle Cell Disease Patients Khizra, Shufa January 2018 (has links) No description available. Computer Science Sickle Cell Disease SCD cTAKES Natural Language Processing NLP Logistic Regression Random Forest Support Vector Machines Multinomial Naive Bayes
82	Machine Learning for Activity Recognition of Dumpers Axelsson, Henrik, Wass, Daniel January 2019 (has links) The construction industry has lagged behind other industries in productivity growth. Earth-moving sites, and other operations where dumpers are used, are no exception. Such projects lack convenient and accurate solutions for utilization mapping and for tracking mass flows, both of which currently rely mainly on manual activity tracking. This study aims to provide insight into how autonomous systems for activity tracking of dumpers can contribute to productivity at earth-moving sites. Autonomous systems available on the market cannot be deployed across dumper fleets of mixed manufacturers and model years, so this study examines the possibility of using machine-learning-based activity recognition in a system built on smartphones mounted in the driver’s cabin. Three machine learning algorithms (naive Bayes, random forest and a feed-forward backpropagation neural network) are trained and tested on data collected by smartphone sensors. The conclusion is that machine learning models, particularly the neural network and random forest, trained on data from a standard smartphone can estimate a dumper’s activities with a high degree of certainty. Finally, a market analysis is presented that rates the innovation opportunity for a potential end product as high. Civil engineering earth-moving dumper machine learning naive bayes random forest neural networks smartphone sensors accelerometer gyroscope. Computer and Information Sciences Data- och informationsvetenskap
83	Employee Turnover Prediction - A Comparative Study of Supervised Machine Learning Models Kovvuri, Suvoj Reddy, Dommeti, Lydia Sri Divya January 2022 (has links) Background: Employees are an essential resource in every organization. For several reasons, organizations neglect their employees, which leads to employee turnover. Employee turnover causes considerable losses to the organization. Machine learning algorithms applied to the data at hand can predict an employee’s future in an organization. Objectives: The aim of this thesis is to conduct a comparison study of supervised machine learning algorithms, namely Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost, for predicting an employee’s future in a company. The models are assessed with evaluation metrics to find the most effective model for the data at hand. Methods: The thesis uses a quantitative research approach, and the data is analyzed using statistical analysis. The labeled data set comes from Kaggle and includes information on employees at a company. The data set is used to train the algorithms. The resulting models are evaluated on the test set using Accuracy, Precision, Recall, F1 Score, and the ROC curve to determine which model performs best at predicting employee turnover. Results: None of the studied features in the data set has a significant impact on turnover. The XGBoost classifier has the highest mean accuracy at 85.3%, followed by the Random Forest classifier at 83%; both outperform the other two algorithms. The XGBoost classifier also has the highest precision at 0.88, followed by the Random Forest classifier at 0.82. Both the Random Forest classifier and the XGBoost classifier showed a Recall score of 0.69. The XGBoost classifier had the highest F1 Score at 0.77, followed by the Random Forest classifier at 0.75. In the ROC curve, the XGBoost classifier had a higher area under the curve (AUC), at 0.88. Conclusions: Among the four machine learning algorithms studied (Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost), the XGBoost classifier performs best across the tested performance metrics. No feature was found to have a major effect on employee turnover. Machine Learning Employee Turnover Prediction Supervised Learning Models Logistic Regression Naive Bayes Classifier Random Forest Classifier XGBoost Computer Sciences Datavetenskap (datalogi)
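As a quick consistency check on the figures reported in entry 83, the F1 score can be recomputed from the stated precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only: the function name `f1_score` is an assumption of this example and is not taken from the thesis, and the inputs are the rounded values quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall values quoted in the abstract of entry 83.
reported = {
    "XGBoost": (0.88, 0.69),
    "Random Forest": (0.82, 0.69),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.2f}")
```

With these inputs the recomputed values round to 0.77 and 0.75, which agrees with the F1 scores stated in the abstract.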
84	Classifying Urgency : A Study in Machine Learning for Classifying the Level of Medical Emergency of an Animal’s Situation Strallhofer, Daniel, Ahlqvist, Jonatan January 2018 (has links) This paper explores the use of Naive Bayes as well as Linear Support Vector Machines to classify a text by its level of medical emergency. The primary source of test data is customer data from an online veterinarian service. The aspects explored are whether a single text gives enough information for a medical decision to be made and whether alternative data gathering processes would be preferable. Past research has shown that text classifiers based on Naive Bayes and SVMs can often give good results. We show how to optimize the results so that important decisions can be made on the basis of these classifications. Optimal data gathering procedures are part of this optimization process. The business applications of such a venture are also discussed, since implementing such a system in an online medical service may affect customer flow, goodwill, cost/revenue, and online competitiveness. Medical Urgency Veterinarian Text Classification Machine Learning Multinomial Naive Bayes Linear Support Vector Classification Edge cases Data gathering process
85	Optimising Machine Learning Models for Imbalanced Swedish Text Financial Datasets: A Study on Receipt Classification : Exploring Balancing Methods, Naive Bayes Algorithms, and Performance Tradeoffs Hu, Li Ang, Ma, Long January 2023 (has links) This thesis investigates imbalanced Swedish text financial datasets, specifically receipt classification using machine learning models. The study explores the effectiveness of under-sampling and over-sampling methods for Naive Bayes algorithms in a controlled experiment carried out in collaboration with Fortnox. The balancing methods are compared on accuracy, Matthews correlation coefficient (MCC), F1 score, precision, and recall. The findings contribute to Swedish text classification by providing insights into balancing methods. The thesis examines balancing methods and parameter tuning of machine learning models for imbalanced datasets. Multinomial Naive Bayes (MultiNB) algorithms in natural language processing (NLP) are studied, with potential application in image classification for assessing industrial thin component deformation. Experiments show that balancing methods significantly affect MCC and recall, with a tradeoff between recall, MCC, and accuracy. Smaller alpha values generally improve accuracy. Two combinations of the Synthetic Minority Oversampling Technique (SMOTE) and Tomek links, an algorithm for removing links developed in 1976 by Ivan Tomek, are considered. Applying Tomek links first and then SMOTE (TomekSMOTE) yields promising accuracy improvements. Due to time constraints, training with over-sampling by SMOTE followed by cleaning with Tomek links (SMOTETomek) is incomplete. The best MCC is achieved when $\alpha$ is 0.01 on imbalanced datasets. Imbalanced datasets Swedish text financial datasets Accuracy Matthews correlation coefficient Recall Multinomial Naive Bayes SMOTE TomekLinks Performance optimization Computer Sciences Datavetenskap (datalogi)
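Entry 85 compares accuracy with the Matthews correlation coefficient (MCC) on imbalanced data. The sketch below shows why the two can disagree. It is a minimal illustration, not the thesis's code: the function names and all confusion-matrix counts are hypothetical values chosen for this example.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced test set: 5 positives, 95 negatives.
# Classifier A always predicts the majority (negative) class.
always_negative = dict(tp=0, tn=95, fp=0, fn=5)
# Classifier B finds 3 of the 5 positives at the cost of 5 false alarms.
some_positives = dict(tp=3, tn=90, fp=5, fn=2)

for label, counts in [("A", always_negative), ("B", some_positives)]:
    print(label, f"accuracy={accuracy(**counts):.2f}", f"MCC={mcc(**counts):.2f}")
```

In this made-up case classifier A has the higher accuracy but an MCC of zero, because it never identifies a minority-class example, while classifier B scores lower on accuracy and higher on MCC.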
86	AUTOMATED IMAGE LOCALIZATION AND DAMAGE LEVEL EVALUATION FOR RAPID POST-EVENT BUILDING ASSESSMENT Xiaoyu Liu (13989906) 25 October 2022 (has links) Image data remains an important tool for post-event building assessment and documentation. After each natural hazard event, significant efforts are made by teams of engineers to visit the affected regions and collect useful image data. In general, a global positioning system (GPS) can provide useful spatial information for localizing image data. However, it is challenging to collect such information when images are captured in places where GPS signals are weak or interrupted, such as the indoor spaces of buildings. An inability to document the images’ locations would hinder the analysis, organization, and documentation of these images as they lack sufficient spatial context. This problem becomes more urgent to solve for the inspection mission covering a large area, like a community. To address this issue, the objective of this research is to generate a tool to automatically process the image data collected during such a mission and provide the location of each image. Towards this goal, the following tasks are performed. First, I develop a methodology to localize images and link them to locations on a structural drawing (Task 1). Second, this methodology is extended to be able to process data collected from a large scale area, and perform indoor localization for images collected on each of the indoor floors of each individual building (Task 2). Third, I develop an automated technique to render the damage condition decision of buildings by fusing the image data collected within (Task 3). The methods developed through each task have been evaluated with data collected from real world buildings. This research may also lead to automated assessment of buildings over a large scale area. post-event building assessment indoor localization visual odometry 3D reconstruction information fusion naive Bayes fusion optimization
87	COVID-19: Анализ эмоциональной окраски сообщений в социальных сетях (на материале сети «Twitter») : магистерская диссертация / COVID-19: Social network sentiment analysis (based on the material of "Twitter" messages) Денисова, П. А., Denisova, P. A. January 2021 (has links) This work studies sentiment analysis of social network texts, using tweets from the Twitter social network as an example. The research material consisted of 818,224 messages collected for 17 keywords, of which 89,025 tweets contained the words "COVID-19" and "Coronavirus". The first part considers general theoretical and methodological issues: the concept of sentiment analysis is introduced, and various approaches to classifying the sentiment of texts are analyzed. Among text classification methods, particular attention is given to the Naive Bayes classifier, which shows high accuracy. The features of sentiment analysis in social networks during epidemics and disease outbreaks are studied. The procedure and algorithm for analyzing the sentiment of a text are described. Much attention is paid to sentiment analysis of texts in Python using the TextBlob library. A software-as-a-service (SaaS) tool is also chosen; it allows real-time sentiment analysis of texts without requiring extensive experience in machine learning and natural language processing, in contrast to the Python programming language. The second part of the study begins with sampling, i.e. defining the keywords used to search for and export the necessary tweets. For this purpose, the Coronavirus Corpus is used, which is designed to reflect the social, cultural and economic consequences of the coronavirus (COVID-19) in 2020 and beyond. The dynamics of the usage of topic words during 2020 are analyzed, and an analogy is drawn between the frequency of their usage and the events taking place. Next, the selected keywords are used to search for tweets and, based on the data obtained, sentiment analysis of the messages is carried out using TextBlob, a Python library created for processing textual data, and the Brand24 online service. A comparison of the two tools shows that their results are similar. The study helps to understand public sentiment about the COVID-19 outbreak quickly and in real time, thereby contributing to the understanding of developing events. This work can also be used as a model for determining the emotional state of Internet users in various situations. MASTER'S THESIS COVID-19 PANDEMIC CORONAVIRUS TWITTER SENTIMENT ANALYSIS TEXTBLOB NAIVE BAYES CLASSIFIER BRAND24
88	Simulating ADS-B vulnerabilities by imitating aircrafts : Using an air traffic management simulator / Simulering av ADS-B sårbarheter genom imitering av flygplan : Med hjälp av en flyglednings simulator Boström, Axel, Börjesson, Oliver January 2022 (has links) Air traffic communication is one of the most vital systems for air traffic management controllers. It is used every day to allow millions of people to travel safely and efficiently across the globe. However, many of the systems considered industry standard are used without any encryption or authentication, which makes them vulnerable to various wireless attacks. This thesis investigates vulnerabilities in an air traffic management system called ADS-B. The structure and theory behind this system are described, as well as the reasons why ADS-B is unencrypted. Two attacks are then implemented and performed in an open-source air traffic management simulator called openScope. ADS-B data from these attacks is gathered and combined with actual ADS-B data from genuine aircraft. The collected data is cleaned and used for machine learning, where three different algorithms are applied to detect attacks. Based on our findings, in which two of the three machine learning algorithms were able to detect 99.99% of the attacks, we propose that machine learning algorithms should be used to improve ADS-B security. We also think that educating air traffic controllers on how to detect and handle attacks is an important part of the future of air traffic management. ADS-B ATC Impersonation attack Sybil attack Air Traffic Communication openScope Security Machine learning K nearest neighbour Naive Bayes Decision tree Other Computer and Information Science Annan data- och informationsvetenskap
89	The Influence of Artificial Intelligence on Education: Sentiment Analysis on YouTube Comments : What is people´s sentiment on ChatGPT for educational purposes? Rodríguez Roldán, Javier January 2024 (has links) The use of artificial intelligence (AI), especially ChatGPT, has increased exponentially in recent years, and AI-based tools are now being used in several fields, including education. The literature on AI in education (AIEd), covering how it has been used and its potential uses, opportunities and challenges, was reviewed, as was the literature on sentiment analysis of social media, in order to evaluate the best approach. Since education might face notable changes due to this technology, assessing how people feel about this potential change in the paradigm is essential. Sentiment analysis was performed on YouTube comments on videos related to ChatGPT, the most popular AI tool for education among learners and educators. It was found that 62.1% of the sample had a positive feeling regarding this technology for educational purposes, whereas 19.4% had a negative sentiment and 18.5% were neutral. To contribute to the literature on sentiment analysis of YouTube comments, the two most used and best-performing algorithms were applied to this task: Naive Bayes and Support Vector Machine (SVM). The results show that Naive Bayes achieved an accuracy of 61.30%, whereas SVM achieved 71.79%. Education YouTube Artificial Intelligence Chatbot ChatGPT Sentiment Analysis Lexicon-based VADER Machine Learning Naive Bayes SVM Information Systems
90	Detection of bullying with MachineLearning : Using Supervised Machine Learning and LLMs to classify bullying in text Yousef, Seif-Alamir, Svensson, Ludvig January 2024 (has links) In recent years, bullying has become an increasing problem, particularly in academic settings. This degree project examines the use of supervised machine learning techniques to identify bullying in text data from school surveys provided by the Friends Foundation. It evaluates traditional algorithms such as Logistic Regression, Naive Bayes, SVM, and Convolutional neural networks (CNN), alongside a Retrieval-Augmented Generation (RAG) model using Llama 3. The primary goal is to achieve high recall on texts that describe bullying while also considering precision, which is reflected in the use of the F3-score. The SVM model emerged as the most effective among the traditional methods, achieving the highest F3-score of 0.83. Although the RAG model showed promising recall, it suffered from very low precision, resulting in a slightly lower F3-score of 0.79. The study also addresses challenges such as the small and imbalanced dataset, and emphasizes the importance of retaining stop words to maintain context in the text data. The findings highlight the potential of advanced machine learning models to significantly assist in bullying detection, given adequate resources and further refinement. Natural Language Processing Bullying Text Classification SVM Logistic Regression Naive Bayes Convolutional Neural Networks Retrieval-augmented generation GPT-4o Computer Sciences Datavetenskap (datalogi)
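Entry 90 reports F3-scores, that is, the F-beta score with beta = 3, which weights recall more heavily than precision. The sketch below uses the standard F-beta formula to show the effect. The function name `f_beta` and the precision and recall inputs are hypothetical; the abstract does not report the models' individual precision and recall.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: recall is treated as beta times as important as precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # convention when both precision and recall are 0
    return (1 + b2) * precision * recall / denom


# Hypothetical model with low precision but high recall.
precision, recall = 0.4, 0.9

print(f"F1 = {f_beta(precision, recall, beta=1):.2f}")
print(f"F3 = {f_beta(precision, recall, beta=3):.2f}")
```

For the same hypothetical model, F3 comes out well above F1, which is why a metric of this kind suits a task where missing a bullying case is considered worse than raising a false alarm.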

Search results