561 |
Machine Learning for Activity Recognition of Dumpers
Axelsson, Henrik, Wass, Daniel January 2019 (has links)
The construction industry has lagged behind other industries in productivity growth rate. Earth-moving sites, and other operations where dumpers are used, are no exceptions. Such projects lack convenient and accurate solutions for utilization mapping and tracking of mass flows, both of which currently rely mainly on manual activity tracking. This study intends to provide insights into how autonomous systems for activity tracking of dumpers can contribute to productivity at earthmoving sites. Autonomous systems available on the market are not implementable across dumper fleets of various manufacturers and model years, so this study examines the possibilities of using activity recognition by machine learning for a system based on smartphones mounted in the driver’s cabin. Three machine learning algorithms (naive Bayes, random forest and a feed-forward backpropagation neural network) are trained and tested on data collected by smartphone sensors. The conclusions are that machine learning models, and particularly the neural network and random forest algorithms, trained on data from a standard smartphone are able to estimate a dumper’s activities with a high degree of certainty. Finally, a market analysis is presented, identifying the innovation opportunity for a potential end-product as high.
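A minimal sketch of the kind of feature extraction such a smartphone-based activity classifier rests on — windowing the accelerometer magnitude signal into mean/standard-deviation features — might look like the following. The window size, step, and toy signals are assumptions for illustration, not the thesis's actual pipeline.

```python
import math

def window_features(signal, window=50, step=25):
    """Split a 1-D accelerometer magnitude signal into overlapping
    windows and compute (mean, standard deviation) per window."""
    feats = []
    for start in range(0, len(signal) - window + 1, step):
        w = signal[start:start + window]
        mean = sum(w) / window
        var = sum((x - mean) ** 2 for x in w) / window
        feats.append((mean, math.sqrt(var)))
    return feats

# Hypothetical signals: a still phone vs. a vibrating one (e.g. hauling)
still = [9.81] * 100                                     # gravity only
hauling = [9.81 + (-1) ** i * 0.5 for i in range(100)]   # oscillation
print(window_features(still)[0])    # near-zero std: idle-like window
print(window_features(hauling)[0])  # higher std: active window
```

Feature tuples like these would then be fed to the classifiers (naive Bayes, random forest, neural network) with activity labels supplied during data collection.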
|
562 |
An Application of Multi-Level Bayesian Negative Binomial Models with Mixed Effects on Motorcycle Crashes in Ohio
Flask, Thomas V. 08 May 2012 (has links)
No description available.
|
563 |
Employee Turnover Prediction - A Comparative Study of Supervised Machine Learning Models
Kovvuri, Suvoj Reddy, Dommeti, Lydia Sri Divya January 2022 (has links)
Background: In every organization, employees are an essential resource. For several reasons, organizations neglect their employees, which leads to employee turnover. Employee turnover causes considerable losses to the organization. Using machine learning algorithms on the data at hand, an employee’s future in an organization can be predicted. Objectives: The aim of this thesis is to conduct a comparative study utilizing supervised machine learning algorithms, namely Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost, to predict an employee’s future in a company. The models are assessed using evaluation metrics in order to discover the most efficient model for the data at hand. Methods: The quantitative research approach is used in this thesis, and data is analyzed using statistical analysis. The labeled data set comes from Kaggle and includes information on employees at a company. The data set is used to train the algorithms. The resulting models are evaluated on the test set using evaluation measures including Accuracy, Precision, Recall, F1 Score, and the ROC curve to determine which model performs best at predicting employee turnover. Results: Among the studied features in the data set, no single feature has a significant impact on turnover. Upon analyzing the results, the XGBoost classifier has the best mean accuracy at 85.3%, followed by the Random Forest classifier at 83%, ahead of the other two algorithms. The XGBoost classifier has the best precision at 0.88, followed by the Random Forest classifier at 0.82. Both the Random Forest classifier and the XGBoost classifier showed a Recall score of 0.69. The XGBoost classifier had the highest F1 Score at 0.77, followed by the Random Forest classifier at 0.75. In the ROC analysis, the XGBoost classifier had the higher area under the curve (AUC), 0.88.
Conclusions: Among the four studied machine learning algorithms, Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost, the XGBoost classifier is the best performer with respect to the tested performance metrics. No feature was found to majorly affect employee turnover.
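The evaluation measures reported above are all derived from the confusion matrix of a binary classifier. A minimal sketch (illustrative only, not the thesis's code; labels here are hypothetical, 1 = employee leaves):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Toy example: 6 employees, predicted vs. actual turnover
acc, prec, rec, f1 = classification_metrics([1, 1, 1, 0, 0, 0],
                                            [1, 1, 0, 1, 0, 0])
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

The precision/recall split matters here: high precision means few employees are wrongly flagged as leavers, while high recall means few actual leavers are missed.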
|
564 |
Classifying Urgency : A Study in Machine Learning for Classifying the Level of Medical Emergency of an Animal’s Situation
Strallhofer, Daniel, Ahlqvist, Jonatan January 2018 (has links)
This paper explores the use of Naive Bayes as well as linear Support Vector Machines in order to classify a text based on its level of medical emergency. The primary source of testing is an online veterinarian service’s customer data. The aspects explored are whether a single text gives enough information for a medical decision to be made and whether there are alternative data-gathering processes that would be preferable. Past research has shown that text classifiers based on Naive Bayes and SVMs can often give good results. We show how to optimize the results so that important decisions can be made with these classifications as a basis. Optimal data-gathering procedures are part of this optimization process. The business applications of such a venture are also discussed, since implementing such a system in an online medical service will possibly affect customer flow, goodwill, cost/revenue, and online competitiveness.
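A multinomial naive Bayes text classifier of the kind studied here can be sketched in a few lines with add-one smoothing. The toy training sentences and class names below are invented for illustration; the thesis's real training data is the veterinary service's customer texts.

```python
import math
from collections import Counter

class NaiveBayesText:
    """Minimal multinomial naive Bayes with add-one smoothing."""
    def fit(self, texts, labels):
        self.classes = set(labels)
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = set(w for c in self.counts.values() for w in c)
        return self

    def predict(self, text):
        best, best_lp = None, -math.inf
        for c in self.classes:
            total = sum(self.counts[c].values())
            lp = math.log(self.prior[c])
            for w in text.lower().split():
                # add-one smoothed word likelihood
                lp += math.log((self.counts[c][w] + 1)
                               / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# Hypothetical toy training data for urgent vs. routine cases
clf = NaiveBayesText().fit(
    ["not eating collapsed bleeding", "bleeding heavily not moving",
     "mild itching", "small scratch healing well"],
    ["urgent", "urgent", "routine", "routine"])
print(clf.predict("dog bleeding and not eating"))  # prints: urgent
```

An SVM would replace the per-class word likelihoods with a learned separating hyperplane over the same bag-of-words features.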
|
565 |
Optimising Machine Learning Models for Imbalanced Swedish Text Financial Datasets: A Study on Receipt Classification : Exploring Balancing Methods, Naive Bayes Algorithms, and Performance Tradeoffs
Hu, Li Ang, Ma, Long January 2023 (has links)
This thesis investigates imbalanced Swedish-text financial datasets, specifically receipt classification using machine learning models. The study explores the effectiveness of under-sampling and over-sampling methods for Naive Bayes algorithms, in a controlled experiment conducted in collaboration with Fortnox. Evaluation metrics compare the balancing methods with respect to accuracy, Matthews correlation coefficient (MCC), F1 score, precision, and recall. The findings contribute to Swedish text classification by providing insights into balancing methods. The report examines balancing methods and parameter tuning for machine learning models on imbalanced datasets. Multinomial Naive Bayes (MultiNB) algorithms for natural language processing (NLP) are studied, with potential application in image classification for assessing the deformation of thin industrial components. Experiments show that the balancing methods significantly affect MCC and recall, with a recall-MCC-accuracy tradeoff. Smaller alpha values generally improve accuracy. Two combined pipelines are considered, pairing the Synthetic Minority Oversampling Technique (SMOTE) with Tomek link removal (an algorithm developed in 1976 by Ivan Tomek): applying Tomek first and then SMOTE (TomekSMOTE), or SMOTE first and then Tomek (SMOTETomek). TomekSMOTE yields promising accuracy improvements; due to time constraints, SMOTETomek training is incomplete. The best MCC on the imbalanced datasets is achieved when alpha is 0.01.
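Two of the ingredients above — the MCC metric and class balancing — can be sketched directly. Random under-sampling stands in here for the thesis's SMOTE/Tomek pipelines, which require a dedicated library; both functions are illustrative sketches, not the thesis's code.

```python
import math
import random

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return ((tp * tn - fp * fn) / denom) if denom else 0.0

def undersample(samples, labels, seed=0):
    """Random under-sampling: trim every class down to the size of
    the smallest class (one simple balancing strategy)."""
    rng = random.Random(seed)
    by_class = {}
    for s, l in zip(samples, labels):
        by_class.setdefault(l, []).append(s)
    n = min(len(v) for v in by_class.values())
    out = []
    for l, v in by_class.items():
        out += [(s, l) for s in rng.sample(v, n)]
    return out
```

Unlike plain accuracy, MCC stays near zero for a classifier that simply predicts the majority class, which is why it is the metric of interest on imbalanced receipt data.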
|
566 |
AUTOMATED IMAGE LOCALIZATION AND DAMAGE LEVEL EVALUATION FOR RAPID POST-EVENT BUILDING ASSESSMENT
Xiaoyu Liu (13989906) 25 October 2022 (has links)
<p>Image data remains an important tool for post-event building assessment and documentation. After each natural hazard event, significant efforts are made by teams of engineers to visit the affected regions and collect useful image data. In general, a global positioning system (GPS) can provide useful spatial information for localizing image data. However, it is challenging to collect such information when images are captured in places where GPS signals are weak or interrupted, such as the indoor spaces of buildings. An inability to document the images’ locations hinders the analysis, organization, and documentation of these images, as they lack sufficient spatial context. This problem becomes more urgent for inspection missions covering a large area, such as an entire community. To address this issue, the objective of this research is to create a tool that automatically processes the image data collected during such a mission and provides the location of each image. Toward this goal, the following tasks are performed. First, I develop a methodology to localize images and link them to locations on a structural drawing (Task 1). Second, this methodology is extended to process data collected over a large-scale area and to perform indoor localization for images collected on each floor of each individual building (Task 2). Third, I develop an automated technique to render a damage condition decision for each building by fusing the image data collected within it (Task 3). The methods developed in each task have been evaluated with data collected from real-world buildings. This research may also lead to automated assessment of buildings over a large-scale area.</p>
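The abstract does not specify how Task 3's fusion works; as a hedged illustration only, a simple vote-based rule for turning per-image damage predictions into a single building-level decision could look like this (damage scale and threshold are invented):

```python
from collections import Counter

def building_damage_level(image_predictions, min_votes=2):
    """Fuse per-image damage predictions (0 = none .. 3 = severe) into
    one building-level decision: the highest level supported by at
    least `min_votes` images, falling back to the most common level.
    A toy stand-in for the fusion technique of Task 3."""
    counts = Counter(image_predictions)
    supported = [lvl for lvl, n in counts.items() if n >= min_votes]
    return max(supported) if supported else max(counts, key=counts.get)

# Hypothetical predictions from five images of the same building
print(building_damage_level([0, 1, 1, 3, 3]))  # prints: 3
```

Requiring multiple supporting images guards against a single misclassified photo driving the building-level decision.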
|
567 |
Qu'est-il arrivé au taux d'intérêt neutre canadien après la crise financière de 2008?
Rocheleau, William 09 November 2022
According to Dorich et al. (2017), the Bank of Canada estimates that the nominal neutral rate of interest, which was around 5% in the 1990s, fell to just under 4% around the mid-2000s. This paper aims to confirm or refute the conjecture that the decline in the neutral rate of interest in Canada continued after the financial crisis of 2008. This conjecture was put forward by some economists because of the low inflation observed after the 2008 financial crisis despite an extremely low policy rate (close to 0%). Several Bank of Canada economists have already worked on this issue, including Mendes (2014) and Dorich et al. (2017). In order to confirm or refute this conjecture, the neutral rate of interest in Canada after the financial crisis of 2008 is estimated in this work using a rigorous econometric analysis. More precisely, a structural break model coupled with a Bayesian approach with Markov chain Monte Carlo is used. This paper draws on certain specifications postulated in Laubach and Williams (2003) as well as the methodology stated in Gordon and Bélanger (1996). The quarterly data used in this work cover the period from the first quarter of 1993 to the fourth quarter of 2019. In light of the results obtained, it is possible to confirm the conjecture that there was a drop in the neutral rate of interest in Canada after the 2008 economic crisis. Indeed, the simulation carried out seems to place the nominal neutral rate of interest in a range of 4.00% to 4.25% for the period from the first quarter of 1993 to the second and third quarters of 2008, while it seems to lie in a range of 0.65% to 1.00% for the period from the second and third quarters of 2008 to the fourth quarter of 2019.
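As an illustration of the structural-break idea only (not the thesis's actual Bayesian MCMC specification), a posterior over the break date for a mean-shift model with known noise and a uniform prior on the break can be computed by enumeration, using segment sample means as plug-in estimates:

```python
import math

def break_point_posterior(y, sigma=0.5):
    """Toy posterior over the break date k (break after observation k)
    for a mean-shift Gaussian model with known noise sd `sigma` and a
    uniform prior on k. Segment means are plugged in for the levels."""
    logliks = []
    for k in range(1, len(y)):
        m1 = sum(y[:k]) / k
        m2 = sum(y[k:]) / (len(y) - k)
        ll = -sum((v - m1) ** 2 for v in y[:k]) / (2 * sigma ** 2)
        ll -= sum((v - m2) ** 2 for v in y[k:]) / (2 * sigma ** 2)
        logliks.append(ll)
    mx = max(logliks)                       # stabilize the exponentials
    w = [math.exp(l - mx) for l in logliks]
    z = sum(w)
    return [x / z for x in w]               # posterior over k = 1..n-1

# Hypothetical series: rate near 4% before the break, near 1% after
rates = [4.1, 4.0, 4.2, 3.9, 1.0, 0.9, 1.1, 0.8]
post = break_point_posterior(rates)
print(max(range(len(post)), key=post.__getitem__) + 1)  # prints: 4
```

A full treatment would instead sample the break date and both levels jointly with MCMC, but the enumeration above conveys how the data concentrate posterior mass on the true break.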
|
568 |
COVID-19: Анализ эмоциональной окраски сообщений в социальных сетях (на материале сети «Twitter») : магистерская диссертация / COVID-19: Social network sentiment analysis (based on the material of "Twitter" messages)
Денисова, П. А., Denisova, P. A. January 2021 (has links)
The work is devoted to the sentiment analysis of messages in the Twitter social network. The research material comprised 818,224 messages collected for 17 keywords, of which 89,025 tweets contained the words "COVID-19" and "Coronavirus". In the first part, theoretical and methodological issues are considered: the concept of sentiment analysis is introduced and various approaches to text classification are analyzed. In text classification, particular attention is given to the Naive Bayes classifier, which shows high accuracy. The features of sentiment analysis in social networks during epidemics and disease outbreaks are studied, and the procedure and algorithm for analyzing the sentiment of a text are described. Much attention is paid to sentiment analysis in Python using the TextBlob library; in addition, a SaaS (software as a service) tool is chosen that allows real-time sentiment analysis without the extensive machine learning and natural language processing experience that the Python route requires. The second part of the study begins with sampling, i.e. the definition of the keywords by which the necessary tweets are searched for and exported. For this purpose the Coronavirus Corpus is used, designed to reflect the social, cultural and economic consequences of the coronavirus (COVID-19) in 2020 and beyond. The dynamics of topic-word usage during 2020 are analyzed and an analogy is drawn between the frequency of their usage and the events taking place.
Next, the selected keywords are used to search for tweets and, based on the data obtained, sentiment analysis of the messages is carried out using the Python library TextBlob, created for processing textual data, and the Brand24 online service. Comparing these tools shows that the results are similar. The study helps to understand public sentiment about the COVID-19 outbreak quickly and in real time, thereby contributing to the understanding of developing events. This work can also be used as a model for determining the emotional state of Internet users in various situations.
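A lexicon-based polarity score of the kind TextBlob returns can be sketched as follows. The six-word lexicon below is a toy stand-in; TextBlob's real lexicon is far larger and also models intensifiers and negation.

```python
def polarity(text, lexicon=None):
    """Average word polarity in [-1, 1], mimicking the style of score
    TextBlob's sentiment property returns (toy lexicon only)."""
    lexicon = lexicon or {"good": 0.7, "great": 0.8, "safe": 0.5,
                          "bad": -0.7, "sick": -0.7, "fear": -0.8}
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical tweets about the outbreak
print(polarity("Great news, vaccines are safe!"))   # positive score
print(polarity("I fear getting sick."))             # negative score
```

Averaging per-word scores is what makes such analysis fast enough to track public mood over hundreds of thousands of tweets in near real time.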
|
569 |
Some Bayesian Methods in the Estimation of Parameters in the Measurement Error Models and Crossover Trial
Wang, Guojun 31 March 2004 (has links)
No description available.
|
570 |
Empirical Hierarchical Modeling and Predictive Inference for Big, Spatial, Discrete, and Continuous Data
Sengupta, Aritra 17 December 2012 (has links)
No description available.
|