Global ETD Search

1	STUDENT ATTENTIVENESS CLASSIFICATION USING GEOMETRIC MOMENTS AIDED POSTURE ESTIMATION Gowri Kurthkoti Sridhara Rao (14191886) 30 November 2022 Body posture provides enough information about a person's current state of mind. This idea is used to implement a system that gives lecturers feedback on how engaging a class has been by identifying the attentiveness levels of students. This is carried out using posture information extracted with the help of Mediapipe. A novel method of extracting features from the key points returned by Mediapipe is proposed. Classification with geometric-moments-aided features performs better than classification with general distance and angle features. To extend single-person pose classification to multi-person pose classification, object detection is implemented. Feedback on the entire lecture is generated and provided as the output of the system. Computer vision : Pose Classification Mediapipe Geometric Moments Object detection Attentiveness classification Random Forest Classifier Retina-net
2	Low-Power Wireless Sensor Node with Edge Computing for Pig Behavior Classifications Xu, Yuezhong 25 April 2024 A wireless sensor node (WSN) system, capable of sensing animal motion and transmitting motion data wirelessly, is an effective and efficient way to monitor pigs' activity. However, sampling and transmitting raw sensor data consumes so much power that WSN batteries have to be frequently charged or replaced. The proposed work solves this issue through a WSN edge computing solution, in which a Random Forest Classifier (RFC) is trained and implemented on the WSNs. Running the RFC on the WSNs does not in itself save power, but the RFC predicts animal behavior so that the WSNs can adaptively adjust the data sampling frequency to reduce power consumption. In addition, the WSNs can save power by sending RFC predictions instead of raw sensor data, which means transmitting less data. The proposed RFC classifies common animal activities (eating, drinking, laying, standing, and walking) with an F-1 score of 93%. WSN power consumption is reduced by 25% with edge computing intelligence, compared to a WSN that samples and transmits raw sensor data periodically at 10 Hz. / Master of Science / A wireless sensor node (WSN) system that detects animal movement and wirelessly transmits this data is a valuable tool for monitoring pigs' activity. However, the process of sampling and transmitting raw sensor data consumes a significant amount of power, leading to frequent recharging or replacement of WSN batteries. To address this issue, our proposed solution integrates edge computing into WSNs, utilizing a Random Forest Classifier (RFC). The RFC is trained and deployed within the WSNs to predict animal behavior, allowing for adaptive adjustment of data sampling frequency to reduce power consumption. Additionally, by transmitting RFC predictions instead of raw sensor data, WSNs can conserve power by transmitting less data. 
Our RFC can accurately classify common animal activities, such as eating, drinking, laying, standing, and walking, achieving an F-1 score of 93%. With the integration of edge computing intelligence, WSN power consumption is reduced by 25% compared to traditional WSNs that periodically sample and transmit raw sensor data at 10 Hz. Wireless Sensor Node (WSN) Edge Computing Random Forest Classifier (RFC) Bluetooth Low Energy (BLE) Bio-sensing.
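The adaptive sampling idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the per-behavior rates in SAMPLING_HZ and the helper names are hypothetical, and only the 10 Hz baseline and the five behavior labels come from the abstract.

```python
BASELINE_HZ = 10.0  # fixed periodic sampling rate mentioned in the abstract

# Hypothetical mapping: the abstract does not state which rate each behavior uses.
SAMPLING_HZ = {
    "laying": 1.0,
    "standing": 2.0,
    "eating": 5.0,
    "drinking": 5.0,
    "walking": 10.0,
}


def next_sampling_rate(predicted_behavior):
    """Pick the next sampling rate from the classifier's latest prediction.

    Unknown labels fall back to the baseline rate so that no motion is missed.
    """
    return SAMPLING_HZ.get(predicted_behavior, BASELINE_HZ)


def mean_rate(predictions):
    """Average sampling rate over a sequence of predicted behaviors."""
    return sum(next_sampling_rate(b) for b in predictions) / len(predictions)


# Example: a hypothetical animal that mostly rests samples far below 10 Hz on average.
day = ["laying"] * 6 + ["standing"] * 2 + ["eating", "walking"]
average_hz = mean_rate(day)
```

A lower average sampling rate is only one of the two savings the abstract describes; the other comes from transmitting a short class label instead of raw sensor data.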
3	Predicting Risk Level in Life Insurance Application : Comparing Accuracy of Logistic Regression, Decision Tree, Random Forest and Linear Support Vector Classifiers Karthik Reddy, Pulagam, Veerababu, Sutapalli January 2023 Background: Over the last decade, there has been a significant rise in the life insurance industry. Every life insurance application is associated with some level of risk, which determines the premium charged. The process of evaluating this level of risk for a life insurance application is time-consuming. In the present scenario, it is hard for the insurance industry to process millions of life insurance applications. One potential approach is to use machine learning to establish a framework for evaluating the level of risk associated with a life insurance application. Objectives: The aim of this thesis is to perform two comparison studies. The first study aims to compare the accuracy of the logistic regression classifier, decision tree classifier, random forest classifier and linear support vector classifier for evaluating the level of risk associated with a life insurance application. The second study aims to identify the impact of changes in the dataset on the accuracy of these selected classification models. Methods: The chosen approach was an experimentation methodology to attain the aim of the thesis and address its research questions. The experimentation involved comparing four ML algorithms, namely the LRC, DTC, RFC and Linear SVC. These algorithms were trained, validated and tested on two datasets. A new dataset was created by replacing the "BMI" variable with the "Life Expectancy" variable. The four selected ML algorithms were compared based on their performance metrics, which included accuracy, precision, recall and f1-score. Results: Among the four selected machine learning algorithms, the random forest classifier attained the highest accuracy, with 53.79% and 52.80% on the unmodified and modified datasets respectively. 
Hence, it was the most accurate algorithm for predicting risk level in life insurance applications. The second-best algorithm was the decision tree classifier, with 51.12% and 50.79% on the unmodified and modified datasets. The selected models attained higher accuracies when they were trained, validated and tested with the unmodified dataset. Conclusions: The random forest classifier scored the highest accuracy among the four selected algorithms on both the unmodified and modified datasets. The selected models attained higher accuracies when they were trained, validated and tested with the unmodified dataset than with the modified dataset. Therefore, the unmodified dataset is more suitable for predicting risk level in life insurance applications. Decision Tree Classifier Logistic Regression Machine Learning Random Forest Classifier Linear Support Vector Classifier Computer Sciences Datavetenskap (datalogi)
4	High-Dimensional Classification Models with Applications to Email Targeting / Högdimensionella klassificeringsmetoder med tillämpning på målgruppsinriktning för e-mejl Pettersson, Anders January 2015 Email communication is valuable for any modern company, since it offers an easy means of spreading important information, advertising new products, features or offers, and much more. Being able to identify which customers would be interested in certain information would make it possible to significantly improve a company's email communication and thereby avoid customers starting to ignore messages and unnecessary badwill being created. This thesis focuses on trying to target customers by applying statistical learning methods to historical data provided by the music streaming company Spotify. An important aspect was the high dimensionality of the data, which places certain demands on the applied methods. A binary classification model was created, where the target was whether or not a customer will open the email. Two approaches were used to target the customers, logistic regression, both with and without regularization, and a random forest classifier, chosen for their ability to handle the high dimensionality of the data. The performance of the suggested models was then evaluated on both a training set and a test set using statistical validation methods, such as cross-validation, ROC curves and lift charts. The models were studied under both large-sample and high-dimensional scenarios. The high-dimensional scenario is when the number of observations, N, is of the same order as the number of features, p, and the large-sample scenario is when N ≫ p. Lasso-based variable selection was performed for both these scenarios, to study the informative value of the features. This study demonstrates that it is possible to greatly improve the opening rate of emails by targeting users, even in the high-dimensional scenario. 
The results show that increasing the amount of training data more than a thousandfold will only improve the performance marginally. Rather, efficient customer targeting can be achieved by using a few highly informative variables selected by the Lasso regularization. / Companies can use email to easily spread important information, advertise new products or offers, and much more, but too many emails can cause customers to lose interest in the content, generate badwill, and make future communication impossible. Being able to identify which customers are interested in specific content would make it possible to significantly improve a company's use of email as a communication channel. This study focuses on identifying customers using statistical learning applied to historical data provided by the music streaming company Spotify. A binary classification model was chosen, in which the response variable described whether or not the customer opened the email. Two methods were used to try to identify the customers most likely to open the emails: logistic regression, both with and without regularization, and a random forest classifier, chosen for their ability to handle high-dimensional data. The methods were then evaluated on both a training set and a test set using several statistical validation methods, such as cross-validation and ROC curves. The models were studied under both large-sample and high-dimensional scenarios. The high-dimensional scenario is one in which the number of observations, N, is of similar size to the number of explanatory variables, p, and the large-sample scenario is one in which N ≫ p. Lasso-based variable selection was performed for both scenarios to study the informative value of the explanatory variables. 
This study shows that it is possible to significantly improve the email open rate by selecting customers, even when only small amounts of data are used. The results show that an enormous increase in the number of training observations will only marginally improve the models' ability to identify customers. Statistical learning logistic regression random forest classifier customer relationship management customer targeting. Statistisk inlärning logistisk regression random forest klassificerare kundrelationshantering kundinriktning. Mathematical Analysis Matematisk analys
5	Automatic Feature Extraction for Human Activity Recognition on the Edge Cleve, Oscar, Gustafsson, Sara January 2019 This thesis evaluates two methods for automatic feature extraction to classify the accelerometer data of periodic and sporadic human activities. The first method selects features using individual hypothesis tests, and the second uses a random forest classifier as an embedded feature selector. The hypothesis test was combined with a correlation filter in this study. Both methods used the same initial pool of automatically generated time series features. A decision tree classifier was used to perform the human activity recognition task for both methods. The possibility of running the developed model on a processor with limited computing power was taken into consideration when selecting methods for evaluation. The classification results showed that the random forest method was good at prioritizing among features. With 23 features selected, it had a macro average F1 score of 0.84 and a weighted average F1 score of 0.93. The first method, however, only had a macro average F1 score of 0.40 and a weighted average F1 score of 0.63 when using the same number of features. In addition to the classification performance, this thesis studies the potential business benefits that automation of feature extraction can result in. / This study evaluates two methods that automatically extract features to classify accelerometer data from periodic and sporadic human activities. The first method selects features using individual hypothesis tests, and the second uses a random forest classifier as an embedded feature selector. The hypothesis test method was combined with a correlation filter in this study. Both methods used the same initial collection of automatically generated features. A decision tree classifier was used to classify the human activities for both methods. 
The possibility of running the final model on a processor with limited hardware capacity was taken into account when the study's methods were chosen. The classification results showed that the random forest method was good at prioritizing among features. With 23 selected features, it achieved a macro average F1 score of 0.84 and a weighted average F1 score of 0.93. The hypothesis test method resulted in a macro average F1 score of 0.40 and a weighted average F1 score of 0.63 when the same number of features was selected. In addition to results related to the classification problem, this study also examines potential business benefits of automatic feature extraction. Human Activity Recognition Automatic Feature Extraction Automatic Feature Selection Automated Machine Learning Random Forest Classifier Hypothesis Test Computer and Information Sciences Data- och informationsvetenskap
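The abstract above reports both a macro average and a weighted average F1 score, and the two can differ considerably when classes are imbalanced. The following is a minimal sketch of how the two averages are computed; the labels and predictions are made up for illustration and are not data from the thesis.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of the per-class F1 scores weighted by class support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical data: a frequent periodic activity and a rare sporadic one.
y_true = ["walk"] * 8 + ["fall"] * 2
y_pred = ["walk"] * 8 + ["walk", "fall"]

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
```

In this toy case the rare class is recognized less reliably, so the macro average is lower than the weighted average, which is the same direction as the gap between the two figures reported in the abstract.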
6	Employee Turnover Prediction - A Comparative Study of Supervised Machine Learning Models Kovvuri, Suvoj Reddy, Dommeti, Lydia Sri Divya January 2022 Background: In every organization, employees are an essential resource. For several reasons, employees are neglected by organizations, which leads to employee turnover. Employee turnover causes considerable losses to the organization. Using machine learning algorithms and the data at hand, a prediction of an employee’s future in an organization is made. Objectives: The aim of this thesis is to conduct a comparison study using supervised machine learning algorithms, namely Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost, to predict an employee’s future in a company. The models are assessed using evaluation metrics in order to find the most effective model for the data at hand. Methods: The quantitative research approach is used in this thesis, and data is analyzed using statistical analysis. The labeled data set comes from Kaggle and includes information on employees at a company. The data set is used to train the algorithms. The created models are evaluated on the test set using evaluation measures including Accuracy, Precision, Recall, F1 Score, and ROC curve to determine which model performs best at predicting employee turnover. Results: Among the studied features in the data set, there is no feature that has a significant impact on turnover. Upon analyzing the results, the XGBoost classifier has the best mean accuracy at 85.3%, followed by the Random Forest classifier at 83%, both ahead of the other two algorithms. The XGBoost classifier has the best precision at 0.88, followed by the Random Forest Classifier at 0.82. Both the Random Forest classifier and the XGBoost classifier showed a Recall score of 0.69. The XGBoost classifier had the highest F1 Score at 0.77, followed by the Random Forest classifier at 0.75. 
In the ROC curve, the XGBoost classifier had the higher area under the curve (AUC) at 0.88. Conclusions: Among the four studied machine learning algorithms, Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost, the XGBoost classifier performs best across the tested performance metrics. No feature is found to majorly affect employee turnover. Machine Learning Employee Turnover Prediction Supervised Learning Models Logistic Regression Naive Bayes Classifier Random Forest Classifier XGBoost Computer Sciences Datavetenskap (datalogi)
7	Machine Learning Methods for Segmentation of Complex Metal Microstructure Features Fredriksson, Daniel January 2022 Machine learning is a growing topic with possibilities that seem endless and with growing areas of application. The field of metallography today is highly dependent on the operators’ knowledge and technical equipment to perform segmentation and analysis of the microstructure. Depending on experts is both costly and very time-consuming. Some automatic segmentation is possible using SEM, but not for all materials, and having to depend on only one machine will create a bottleneck. In this thesis, a traditional supervised machine learning model has been built with a Random Forest (RF) classifier. The model performs automatic segmentation of complex microstructure features from images taken using light optical and scanning electron microscopes. Two types of material, High-Strength-Low-Alloy (HSLA) steel with in-grain carbides and grain boundary carbides, and nitrocarburized steel with different amounts of porosity, were analyzed in this work. Using a bank of feature extractors together with labeled ground truth data, one model for each material was trained and used for the segmentation of new data. The model trained for the HSLA steel was able to effectively segment and analyze the carbides with a small amount of training. The model could separate the two types of carbides, which is not possible with traditional thresholding. However, the model trained on nitrocarburized steel had difficulty detecting the porosity. The result was, however, improved with a different approach to the labeling. The result implies that further development can be made to improve the model. / Machine learning is a growing field where the possibilities seem endless, with expanding areas of application. 
The field of metallography today depends to a large extent on the operator's knowledge and the technical instruments available to perform segmentation and analysis of the microstructure. Some automatic segmentation is possible using SEM, but not for all materials, and having to depend on a single machine will create a bottleneck. In this thesis, a traditional supervised machine learning model has been created with a Random Forest classifier. The model performs automatic segmentation of complex microstructures in images from both light optical and scanning electron microscopes. Two types of material, High-Strength-Low-Alloy (HSLA) steel with carbides and grain boundary carbides, and nitrocarburized steel with varying amounts of porosity, were analyzed in this work. Using a bank of feature extractors together with annotated ground truth data, one model was trained for each material and used to segment new image data. The model trained for the HSLA steel was able to effectively segment and analyze the carbides with a small amount of training. The model could separate the two types of carbides, which has not been possible with traditional thresholding. The model trained for the nitrocarburized steel, however, had difficulty detecting the porosity. The result could, however, be improved with a different approach to the annotation. The result indicates that further development can be done to improve the final outcome. Machine learning Metallography Automatic segmentation Complex microstructures Random Forest classifier. Maskininlärning Metallografi Automatisk segmentering Komplex mikrostruktur Random Forest klassificerare Other Materials Engineering Annan materialteknik
8	Comparision of Machine Learning Algorithms on Identifying Autism Spectrum Disorder Aravapalli, Naga Sai Gayathri, Palegar, Manoj Kumar January 2023 Background: Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder that affects social communication, behavior, and cognitive development. Patients with autism have a variety of difficulties, such as sensory impairments, attention issues, learning disabilities, mental health issues like anxiety and depression, as well as motor and learning issues. The World Health Organization (WHO) estimates that one in 100 children has ASD. Although ASD cannot be completely treated, early identification of its symptoms might lessen its impact. Early identification of ASD can significantly improve the outcome of interventions and therapies, so it is important to identify the disorder early. Machine learning algorithms can help in predicting ASD. In this thesis, Support Vector Machine (SVM) and Random Forest (RF) are the algorithms used to predict ASD. Objectives: The main objective of this thesis is to build and train models using machine learning (ML) algorithms, both with the default parameters and with hyperparameter tuning, and to find the most accurate model for predicting whether or not a person has ASD, based on a comparison of the two experiments. Methods: Experimentation is the method chosen to answer the research questions. Experimentation helped in finding the most accurate model to predict ASD. The experimentation involved data preparation, with splitting of the data and feature selection applied to the dataset. Two experiments followed: in the first, the models were trained with the default parameters to obtain the performance metrics, and in the second, the models were trained with hyperparameter tuning. Based on the comparison, the most accurate model was applied to predict ASD. 
Results: In this thesis, we chose two algorithms, SVM and RF, to train the models. After experimentation and training of the models with hyperparameter tuning, SVM obtained the highest accuracy and F1 scores on the test data, 96% and 97% respectively, compared to the other model, RF. Conclusions: The models were trained using two ML algorithms, SVM and RF, and two experiments were conducted. In experiment 1, the models were trained using default parameters, and accuracy and F1 scores were obtained for the test data. In experiment 2, the models were trained using hyperparameter tuning, and performance metrics such as accuracy and F1 score were obtained for the test data. By comparing the performance metrics, we came to the conclusion that SVM is the most accurate algorithm for predicting ASD. Autism Spectrum Disorder (ASD) Classification Data pre-processing Feature selection Machine learning algorithms Random Forest Classifier Support Vector Classifier. Computer Engineering Datorteknik Computer Sciences Datavetenskap (datalogi)
9	Analyzing Radial Basis Function Neural Networks for predicting anomalies in Intrusion Detection Systems / Utvärdera prestanda av radiella basfunktionsnätverk för intrångsdetekteringssystem Kamat, Sai Shyamsunder January 2019 In the 21st century, information is the new currency. With the omnipresence of devices connected to the internet, humanity can instantly access any information. However, there are certain cybercrime groups which steal the information. An Intrusion Detection System (IDS) monitors a network for suspicious activities and alerts its owner about an undesired intrusion. These commercial IDSs react after detecting intrusion attempts. With cyber attacks becoming increasingly complex, it is expensive to wait for the attacks to happen and respond later. It is crucial for network owners to employ IDSs that preemptively differentiate a harmless data request from a malicious one. Machine Learning (ML) can solve this problem by recognizing patterns in internet traffic to predict the behaviour of network users. This project studies how effectively a Radial Basis Function Neural Network (RBFN) with a Deep Learning architecture can impact intrusion detection. On the basis of the existing framework, it asks how well an RBFN can predict malicious intrusion attempts, especially when compared to contemporary detection practices. Here, an RBFN is a multi-layered neural network model that uses a radial basis function to transform input traffic data. Once transformed, it is possible to separate the various traffic data points using a single straight line in extradimensional space. The outcome of the project indicates that the proposed method is severely affected by limitations. For example, the model needs to be fine-tuned over several trials to achieve a desired accuracy. 
The results of the implementation show that the RBFN correctly predicts various cyber attacks, such as web attacks, infiltrations, brute force, SSH, etc., as well as normal internet behaviour, on average 80% of the time. Other algorithms in an identical testbed are more than 90% accurate. Despite the lower accuracy, the RBFN model is more than 94% accurate at recording specific kinds of attacks such as Port Scans and BotNet malware. One possible solution is to restrict this model to predicting only malware attacks and to use a different machine learning algorithm for other attacks. / In the 21st century, information is the new currency. With the omnipresence of devices connected to the internet, humanity has access to information in an instant. However, there are certain groups that use methods to steal information for personal gain via the internet. An intrusion detection system (IDS) monitors a network for suspicious activities and alerts its owner if an undesired intrusion has occurred. Commercial IDSs react after detecting an intrusion attempt. Attacks are becoming increasingly complex, and it can be expensive to wait for them to happen and react later. It is crucial for network owners to use IDSs that can preemptively distinguish harmless data use from harmful use. Machine learning can solve this problem. It can analyze all existing data on internet traffic, recognize patterns, and predict user behaviour. This project aims to study how effectively Radial Basis Function Neural Networks (RBFN) with a deep learning architecture can impact intrusion detection. From this perspective, the question is how well an RBFN can predict malicious intrusion attempts, especially in comparison with existing detection methods. Here, an RBFN is defined as a multi-layer neural network model that uses a radial basis function to transform data so that it becomes linearly separable. 
After a survey of contemporary literature and the identification of a named dataset, a quantitative research methodology with performance indicators was used to evaluate the RBFN's performance. A Random Forest Classifier algorithm was also used for comparison. The results were obtained after a series of parameter fine-tunings of the models. The results show that the RBFN is correct in predicting anomalous internet behaviour on average 80% of the time. Other algorithms in the literature are described as more than 90% accurate. The proposed RBFN model is, however, very accurate at recording specific kinds of attacks such as Port Scans and BotNet malware. The outcome of the project shows that the proposed method is severely affected by limitations. For example, the model needs to be fine-tuned over several trials to achieve the desired accuracy. One possible solution is to restrict this model to predicting only malware attacks and to use other machine learning algorithms for other attacks. anomaly cyber security evaluation machine learning radial basis function random forest classifier supervised learning anomali cybersäkerhet utvärdering maskininlärning radialbaserad funktion slumpmässig skogsklassificering övervakad inlärning Computer and Information Sciences Data- och informationsvetenskap
10	A PROBABILISTIC MACHINE LEARNING FRAMEWORK FOR CLOUD RESOURCE SELECTION ON THE CLOUD Khan, Syeduzzaman 01 January 2020 The execution of scientific applications on the Cloud comes with great flexibility, scalability, cost-effectiveness, and substantial computing power. Market-leading Cloud service providers such as Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP) offer various general-purpose, memory-intensive, and compute-intensive Cloud instances for the execution of scientific applications. The scientific community, especially small research institutions and undergraduate universities, faces many hurdles while conducting high-performance computing research in the absence of large dedicated clusters. The Cloud provides a lucrative alternative to dedicated clusters; however, the wide range of Cloud computing choices makes instance selection difficult for end-users. This thesis aims to simplify Cloud instance selection for end-users by proposing a probabilistic machine learning framework that allows users to select a suitable Cloud instance for their scientific applications. This research builds on the previously proposed A2Cloud-RF framework, which recommends high-performing Cloud instances by profiling the application and the selected Cloud instances. The framework produces a set of objective scores called the A2Cloud scores, which denote the compatibility level between the application and the selected Cloud instances. When used alone, the A2Cloud scores become increasingly unwieldy with an increasing number of tested Cloud instances. Additionally, the framework only examines the raw application performance and does not consider the execution cost to guide resource selection. To improve the usability of the framework and assist with economical instance selection, this research adds two Naïve Bayes (NB) classifiers that consider both the application’s performance and execution cost. 
These NB classifiers include: 1) NB with a Random Forest Classifier (RFC) and 2) a standalone NB module. Naïve Bayes with a Random Forest Classifier (RFC) augments the A2Cloud-RF framework's final instance ratings with the execution cost metric. In the training phase, the classifier builds the frequency and probability tables. The classifier recommends a Cloud instance based on the highest posterior probability for the selected application. The standalone NB classifier is constructed from the generated A2Cloud score (an intermediate result from the A2Cloud-RF framework) and the execution cost metric. The NB classifier forms a frequency table and probability (prior and likelihood) tables. To recommend a Cloud instance for a test application, the classifier calculates the posterior probability for each of the Cloud instances and recommends the Cloud instance with the highest posterior probability. This study executes eight real-world applications on 20 Cloud instances from AWS, Azure, GCP, and Linode. We train the NB classifiers using 80% of this dataset and employ the remaining 20% for testing. The testing yields more than 90% recommendation accuracy for the chosen applications and Cloud instances. Because of the imbalanced nature of the dataset and the multi-class nature of the classification, we consider the confusion matrix (true positive, false positive, true negative, and false negative) and the F1 score, with scores above 0.9, to describe the model performance. The final goal of this research is to make Cloud computing an accessible resource for conducting high-performance scientific executions by enabling users to select an effective Cloud instance from across multiple providers. Cloud computing Cloud resource selection K Means Machine learning Naive Bayes Random forest classifier Computer engineering Computer Engineering Computer Sciences Data Storage Systems Engineering Other Computer Engineering Other Computer Sciences
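The Naïve Bayes procedure described in the abstract above (frequency table, prior and likelihood tables, then a recommendation by highest posterior probability) can be sketched roughly as follows. This is a minimal categorical Naïve Bayes under stated assumptions, not the thesis's implementation: the discretized feature values ("high"/"low" for score level and cost level), the instance names, and the Laplace smoothing parameter alpha are all hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Build the frequency tables: class counts and per-feature value counts."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values = defaultdict(set)              # feature index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def posteriors(model, row, alpha=1.0):
    """Return normalized posterior probabilities for every class.

    alpha is a Laplace smoothing term (an assumption, not from the abstract)
    that keeps unseen feature values from zeroing out a class.
    """
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total  # prior
        for i, value in enumerate(row):
            # smoothed likelihood of this feature value given the class
            p *= (feature_counts[(i, label)][value] + alpha) / (n + alpha * len(values[i]))
        scores[label] = p
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


def recommend(model, row):
    """Recommend the class (here: a Cloud instance) with the highest posterior."""
    post = posteriors(model, row)
    return max(post, key=post.get)


# Hypothetical training data: (score level, cost level) -> best instance.
rows = [("high", "low"), ("high", "low"), ("high", "high"),
        ("low", "low"), ("low", "high"), ("low", "high")]
labels = ["instance_a"] * 3 + ["instance_b"] * 3

model = train_nb(rows, labels)
choice = recommend(model, ("high", "low"))
```

With real data, each row would hold the discretized A2Cloud score and execution cost for one application run, and the label would be the instance judged best for that run; how the thesis discretizes these values is not stated in the abstract.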
