131
Product Similarity Matching for Food Retail using Machine Learning / Produktliknande matchning för livsmedel med maskininlärning. Kerek, Hanna. January 2020.
Product similarity matching for food retail is studied in this thesis. The goal is to find products that are similar but not necessarily of the same brand, so that they can serve as replacements for a product that is out of stock or not carried by a specific store. The aim of the thesis is to examine which machine learning model is best suited to perform the product similarity matching. The product data used for training the models were name, description, nutrients, weight and filters (labels, for example organic). Product similarity matching was performed pairwise, and the similarity between products was measured by Jaccard distance for text attributes and relative difference for numeric values. Random Forest, Logistic Regression and Support Vector Machines were tested and compared to a baseline. The baseline computed the Jaccard distance for the product names only and classified pairs based on a threshold value of that distance. The results were measured by accuracy, F-measure and AUC score. Random Forest performed best on all evaluation metrics, and Logistic Regression, Random Forest and Support Vector Machines all performed better than the baseline.
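A minimal sketch of the pairwise set-up described above, assuming a toy product table: Jaccard distance is computed on tokenized text attributes, relative difference on numeric ones, a Random Forest classifies the resulting pair features, and the name-only Jaccard threshold serves as the baseline. The attribute names and products are invented for illustration and are not the thesis data.

```python
from sklearn.ensemble import RandomForestClassifier

def jaccard_distance(a, b):
    # 1 minus |intersection| / |union| over whitespace tokens
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)

def relative_difference(x, y):
    return abs(x - y) / max(abs(x), abs(y), 1e-9)

def pair_features(p, q):
    # p and q are dicts with hypothetical keys: name, description, weight, energy_kcal
    return [
        jaccard_distance(p["name"], q["name"]),
        jaccard_distance(p["description"], q["description"]),
        relative_difference(p["weight"], q["weight"]),
        relative_difference(p["energy_kcal"], q["energy_kcal"]),
    ]

# Toy labeled pairs: 1 = acceptable substitute, 0 = not.
products = [
    {"name": "organic oat milk", "description": "oat drink for coffee", "weight": 1000, "energy_kcal": 46},
    {"name": "oat drink original", "description": "plain oat drink", "weight": 1000, "energy_kcal": 45},
    {"name": "dark chocolate bar", "description": "70 percent cocoa", "weight": 100, "energy_kcal": 550},
]
pairs = [(0, 1, 1), (0, 2, 0), (1, 2, 0)]
X = [pair_features(products[i], products[j]) for i, j, _ in pairs]
y = [label for _, _, label in pairs]

# Baseline from the abstract: threshold the Jaccard distance of the names alone.
baseline_pred = [int(row[0] < 0.5) for row in X]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("baseline:", baseline_pred, "random forest:", list(model.predict(X)))
```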
132
Analys av luftkvaliteten på Hornsgatan med hjälp av maskininlärning utifrån trafikflödesvariabler / Air Quality Analysis on Hornsgatan using Machine Learning with regards to Traffic Flow. Teurnberg, Ellinor. January 2023.
This study investigates the relationship between several air pollutants and different vehicle variables, such as vehicle model year, fuel type and vehicle type, on Hornsgatan in Stockholm. The study intends to answer which factors have the greatest impact on air quality. The implementation is based on the two machine learning algorithms Random Forest and Support Vector Regression, which are compared based on R² and RMSE. The models created with Random Forest outperform Support Vector Regression for the various air pollutants. The best performing model was the carbon monoxide model, which had an R² value of 99.7%. The model that gave predictions with the lowest R² value, 68.4%, was the model for nitrogen dioxide. Overall, the results were good in relation to previous studies. Based on these models, the impact of the variables is discussed, along with measures that could be introduced in the City of Stockholm and on Hornsgatan to improve air quality.
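A small sketch, on synthetic data, of the model comparison the study describes: Random Forest versus Support Vector Regression, evaluated with R² and RMSE. The feature names and the target are invented stand-ins for the traffic-flow variables and pollutant concentrations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
n = 1000
# Hypothetical traffic-flow features, e.g. shares of diesel, petrol and heavy vehicles.
X = rng.uniform(0, 1, size=(n, 3))
# Synthetic pollutant concentration with a nonlinear dependence plus noise.
y = 40 * X[:, 0] + 10 * X[:, 1] ** 2 + rng.normal(0, 2, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for name, model in [("Random Forest", RandomForestRegressor(random_state=0)),
                    ("Support Vector Regression", SVR(C=10.0, epsilon=0.1))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name}: R2 = {r2_score(y_te, pred):.3f}, RMSE = {rmse:.2f}")
```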
133
Analys av luftkvaliteten på Hornsgatan med hjälp av maskininlärning utifrån trafikflödesvariabler / Air Quality Analysis on Hornsgatan using Machine Learning with regards to Traffic Flow Variables. Treskog, Paulina, Teurnberg, Ellinor. January 2023.
This study investigates the relationship between several air pollutants and different vehicle variables, such as vehicle model year, fuel type and vehicle type, on Hornsgatan in Stockholm. The study intends to answer which factors have the greatest impact on air quality. The implementation is based on the two machine learning algorithms Random Forest and Support Vector Regression, which are compared based on R² and RMSE. The models created with Random Forest outperform Support Vector Regression for the various air pollutants. The best performing model was the carbon monoxide model, which had an R² value of 99.7%. The model that gave predictions with the lowest R² value, 68.4%, was the model for nitrogen dioxide. Overall, the results were good in relation to previous studies. Based on these models, the impact of the variables is discussed, along with measures that could be introduced in the City of Stockholm and on Hornsgatan to improve air quality.
134
Cost-Sensitive Learning-based Methods for Imbalanced Classification Problems with Applications. Razzaghi, Talayeh. 01 January 2014.
Analysis and predictive modeling of massive datasets is an extremely significant problem that arises in many practical applications. The task of predictive modeling becomes even more challenging when data are imperfect or uncertain. Real data are frequently affected by outliers, uncertain labels, and uneven class distributions (imbalanced data). Such uncertainties create bias and make predictive modeling an even more difficult task. In the present work, we introduce a cost-sensitive learning (CSL) method to deal with the classification of imperfect data. Most traditional approaches to classification demonstrate poor performance in an environment with imperfect data. We propose the use of CSL with the Support Vector Machine, which is a well-known data mining algorithm. The results reveal that the proposed algorithm produces more accurate classifiers and is more robust with respect to imperfect data. Furthermore, we explore the best performance measures for tackling imperfect data, along with addressing real problems in quality control and business analytics.
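A hedged illustration of the cost-sensitive idea above, not the dissertation's implementation: an SVM is given per-class misclassification costs on a synthetic imbalanced data set, and the results are compared with imbalance-aware metrics.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import f1_score, balanced_accuracy_score

# Synthetic data with a 95/5 class imbalance.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = SVC().fit(X_tr, y_tr)                              # cost-insensitive baseline
weighted = SVC(class_weight="balanced").fit(X_tr, y_tr)    # errors on the minority class cost more

for name, model in [("plain SVM", plain), ("cost-sensitive SVM", weighted)]:
    pred = model.predict(X_te)
    print(f"{name}: F1 = {f1_score(y_te, pred):.3f}, "
          f"balanced accuracy = {balanced_accuracy_score(y_te, pred):.3f}")
```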
135
An Efficient Ranking and Classification Method for Linear Functions, Kernel Functions, Decision Trees, and Ensemble Methods. Glass, Jesse Miller. January 2020.
Structural algorithms incorporate the interdependence of outputs into the prediction, the loss, or both. Frank-Wolfe optimization of pairwise losses and Gaussian conditional random fields for multivariate output regression are two such structural algorithms. Pairwise losses are standard 0-1 classification surrogate losses applied to pairs of features and outputs, resulting in improved ranking performance (area under the ROC curve, average precision, and F1 score) at the cost of increased learning complexity. In this dissertation, it is proven that the balanced 0-1 loss SVM and the pairwise SVM have the same dual loss, and that the pairwise dual coefficient domain is a subdomain of the dual coefficient domain of the balanced 0-1 loss SVM with bias. This provides a theoretical advancement in the understanding of the pairwise loss, which is exploited to develop a novel ranking algorithm that is fast and memory efficient, with state-of-the-art ranking metric performance across eight benchmark data sets. Various practical advancements are also made in multivariate output regression. The learning time for Gaussian conditional random fields is greatly reduced, and the parameter domain is expanded to enable repulsion between outputs. Lastly, a novel multivariate regression method is presented that keeps the desirable elements of GCRF and infuses them into a local regression model, improving mean squared error and reducing learning complexity. / Computer and Information Science
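A brief sketch of the pairwise-loss construction mentioned above: the ranking problem is recast as binary classification on feature differences of (positive, negative) pairs and solved with an off-the-shelf linear SVM. This illustrates the construction only; it is not the dissertation's Frank-Wolfe solver or its dual analysis, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(0, 0.5, 300) > 0).astype(int)

# Build one training example per (positive, negative) pair from the feature differences;
# the sign is alternated so the pairwise problem stays balanced.
pos, neg = X[y == 1], X[y == 0]
diffs, labels = [], []
for i, p in enumerate(pos):
    for j, q in enumerate(neg):
        sign = 1 if (i + j) % 2 == 0 else -1
        diffs.append(sign * (p - q))
        labels.append(sign)

rank_svm = LinearSVC(fit_intercept=False, C=1.0).fit(np.array(diffs), np.array(labels))
scores = X @ rank_svm.coef_.ravel()   # higher score should mean ranked above
print("AUC of the pairwise model:", round(roc_auc_score(y, scores), 3))
```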
136
An evolutionary Pentagon Support Vector finder method. Mousavi, S.M.H., Vincent, Charles, Gherman, T. 02 March 2020.
In dealing with big data, we need effective algorithms; effectiveness that depends, among other things, on the ability to remove outliers from the data set, especially when dealing with classification problems. To this aim, support vector finder algorithms have been created to retain only the most important data in the data pool. Nevertheless, existing classification algorithms, such as Fuzzy C-Means (FCM), suffer from the drawback of setting the initial cluster centers imprecisely. In this paper, we avoid existing shortcomings and aim to find and remove unnecessary data in order to speed up the final classification task without losing vital samples and without harming final accuracy; in this sense, we present a unique approach for finding support vectors, named the evolutionary Pentagon Support Vector (PSV) finder method. The originality of the current research lies in using geometrical computations and evolutionary algorithms to build a more effective system, which has the advantage of higher accuracy on some data sets. The proposed method is subsequently tested with seven benchmark data sets and the results are compared to those obtained from performing classification on the original data (classification before and after PSV) under the same conditions. The testing returned promising results.
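The abstract describes shrinking a data set to its most informative samples before classification. The sketch below illustrates only that general idea, keeping points whose nearest neighbours contain the opposite class and training an SVM on the reduced set; it does not implement the pentagon geometry or the evolutionary search of the PSV method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Call a training point "boundary-like" if any of its 10 nearest neighbours has the other label.
nn = NearestNeighbors(n_neighbors=11).fit(X_tr)
_, idx = nn.kneighbors(X_tr)                     # first neighbour is the point itself
boundary = np.array([(y_tr[i[1:]] != y_tr[i[0]]).any() for i in idx])

acc_full = SVC().fit(X_tr, y_tr).score(X_te, y_te)
acc_reduced = SVC().fit(X_tr[boundary], y_tr[boundary]).score(X_te, y_te)
print(f"kept {boundary.mean():.0%} of the training data; "
      f"accuracy full = {acc_full:.3f}, reduced = {acc_reduced:.3f}")
```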
137
Deep Learning One-Class Classification With Support Vector Methods. Hampton, Hayden D. 01 January 2024.
Through the specialized lens of one-class classification, anomalies (irregular observations that uncharacteristically diverge from normative data patterns) are comprehensively studied. This dissertation focuses on advancing boundary-based methods in one-class classification, a critical approach to anomaly detection. These methodologies delineate optimal decision boundaries, thereby facilitating a distinct separation between normal and anomalous observations. Encompassing traditional approaches such as the One-Class Support Vector Machine and Support Vector Data Description, recent adaptations in deep learning offer rich ground for innovation in anomaly detection. This dissertation proposes three novel deep learning methods for one-class classification, aiming to enhance the efficacy and accuracy of anomaly detection in an era where data volume and complexity present unprecedented challenges. The first two methods are designed for tabular data and take a least squares perspective. Formulating these optimization problems within a least squares framework offers notable advantages: it facilitates the derivation of closed-form solutions for critical gradients that largely influence the optimization procedure, and it circumvents the prevalent issue of degenerate or uninformative solutions, a challenge often associated with these types of deep learning algorithms. The third method is designed for second-order tensors. It has certain computational advantages and alleviates the need for vectorization, which can lead to loss of structural information when spatial or contextual relationships exist in the data. The performance of the three proposed methods is demonstrated with simulation studies and real-world datasets. Compared to kernel-based one-class classification methods, the proposed deep learning methods achieve significantly better performance under the settings considered.
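A short sketch of the classical boundary-based baseline the dissertation builds on, the One-Class SVM (Support Vector Data Description behaves similarly with an RBF kernel); the proposed deep and tensor variants are not reproduced here, and the normal and anomalous samples are synthetic.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(500, 2))   # only "normal" data are seen at training time
normal_test = rng.normal(0.0, 1.0, size=(100, 2))
anomalies = rng.normal(4.0, 1.0, size=(100, 2))      # shifted cluster acting as anomalies

# nu upper-bounds the fraction of training points allowed outside the learned boundary.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal_train)
print("flagged anomalous among normal test points:", (ocsvm.predict(normal_test) == -1).mean())
print("flagged anomalous among true anomalies:   ", (ocsvm.predict(anomalies) == -1).mean())
```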
138
Solving support vector machine classification problems and their applications to supplier selection. Kim, Gitae. January 1900.
Doctor of Philosophy / Department of Industrial & Manufacturing Systems Engineering / Chih-Hang Wu / Recently, interdisciplinary (management, engineering, science, and economics) collaborative research has been growing to achieve synergy and to reinforce the weaknesses of each discipline. In line with this trend, this research combines three topics: mathematical programming, data mining, and supply chain management. A new pegging algorithm is developed for solving the continuous nonlinear knapsack problem. An efficient approach is proposed for solving the ν-support vector machine classification problem in the field of data mining, with the new pegging algorithm used to solve the subproblem of the support vector machine problem. For supply chain management, this research proposes an efficient integrated approach for the supplier selection problem, in which the support vector machine is applied to select potential suppliers within the integrated solving procedure.
In the first part of this research, a new pegging algorithm solves the continuous nonlinear knapsack problem with box constraints. The problem is to minimize a convex and differentiable nonlinear function subject to one equality constraint and box constraints. A pegging algorithm needs to calculate the primal variables in order to check the bounds on the variables at each iteration, which is frequently a time-consuming task. The newly proposed dual bound algorithm checks the bounds of the Lagrange multipliers without explicitly calculating the primal variables at each iteration. In addition, the cost of calculating the dual solution at each iteration is reduced by a proposed new method for updating the solution.
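For intuition, the sketch below states a concrete instance of that problem, a separable quadratic objective with one equality constraint and box constraints, and solves it by bisection on the Lagrange multiplier of the equality constraint; this is the textbook alternative to pegging, not the thesis's dual bound algorithm.

```python
import numpy as np

def quadratic_knapsack(a, c, b, lo, hi, tol=1e-10):
    # minimize sum_i 0.5*a_i*x_i**2 - c_i*x_i  subject to  sum_i x_i = b,  lo_i <= x_i <= hi_i
    # For a fixed multiplier lam of the equality constraint, the box-constrained minimizer is:
    x_of = lambda lam: np.clip((c - lam) / a, lo, hi)
    # sum(x_of(lam)) is nonincreasing in lam, so bisect until the equality constraint holds.
    lam_lo = np.min(c - a * hi) - 1.0   # here every x_i sits at its upper bound
    lam_hi = np.max(c - a * lo) + 1.0   # here every x_i sits at its lower bound
    while lam_hi - lam_lo > tol:
        lam = 0.5 * (lam_lo + lam_hi)
        if x_of(lam).sum() > b:
            lam_lo = lam
        else:
            lam_hi = lam
    return x_of(0.5 * (lam_lo + lam_hi))

a = np.array([2.0, 1.0, 4.0])
c = np.array([3.0, 2.0, 5.0])
x = quadratic_knapsack(a, c, b=2.0, lo=np.zeros(3), hi=np.ones(3))
print(x, x.sum())   # the allocation sums to b and respects the box constraints
```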
In the second part, this research proposes several streamlined solution procedures for the ν-support vector machine classification problem. The main solution procedure is a matrix splitting method. The method proposed in this research is a specialized matrix splitting method combined with the gradient projection method, a line search technique, and the incomplete Cholesky decomposition method; it can use a variety of methods for the line search and for parameter updating. Moreover, large-scale problems are solved with the incomplete Cholesky decomposition and several efficient implementation techniques.
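A usage-level sketch of the ν-SVM classification problem referred to above, relying on the off-the-shelf NuSVC solver rather than the matrix splitting procedure developed in the thesis; it only illustrates how ν bounds the fraction of margin errors from above and the fraction of support vectors from below, on synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import NuSVC

X, y = make_classification(n_samples=600, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

for nu in (0.1, 0.3, 0.5):
    model = NuSVC(nu=nu, kernel="rbf").fit(X_tr, y_tr)
    sv_fraction = len(model.support_) / len(X_tr)
    print(f"nu = {nu}: test accuracy = {model.score(X_te, y_te):.3f}, "
          f"fraction of support vectors = {sv_fraction:.2f}")
```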
To apply the research findings to real-world problems, this research develops an efficient integrated approach for supplier selection problems using the support vector machine and mixed integer programming. Supplier selection is an essential step in the procurement process. For companies aiming to maximize profits and reduce costs, supplier selection requires identifying satisfactory suppliers and allocating proper orders to the selected suppliers. In the early stage of supplier selection, a company can use support vector machine classification to choose potential qualified suppliers according to specific criteria. However, the company may not need to purchase from all qualified suppliers. Once the company determines the amount of raw materials and components to purchase, it selects final suppliers and orders optimal quantities from them at the final stage of the process. A mixed integer programming model is used to determine the final suppliers and allocate optimal orders at this stage.
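A hedged, self-contained sketch of that two-stage idea: an SVM screens candidate suppliers on a few criteria, and a small mixed integer program then selects final suppliers and order quantities. The criteria, costs, capacities, and the use of the PuLP modelling library are all assumptions for illustration, not the thesis's models or data.

```python
from sklearn.svm import SVC
from pulp import LpProblem, LpMinimize, LpVariable, lpSum   # assumes the PuLP package is available

# Stage 1: screen suppliers with an SVM trained on historical (criteria -> qualified) labels.
# The criteria are hypothetical: [quality score, on-time delivery rate, price index].
history_X = [[0.90, 0.95, 1.00], [0.60, 0.70, 0.80], [0.80, 0.90, 1.10],
             [0.50, 0.60, 0.90], [0.85, 0.92, 1.00], [0.55, 0.65, 1.20]]
history_y = [1, 0, 1, 0, 1, 0]
screen = SVC(kernel="linear").fit(history_X, history_y)

candidates = {"A": [0.88, 0.93, 1.05], "B": [0.58, 0.66, 0.85], "C": [0.82, 0.91, 0.95]}
qualified = [s for s, feats in candidates.items() if screen.predict([feats])[0] == 1]

# Stage 2: mixed integer program over the qualified suppliers only.
unit_cost = {"A": 9.0, "B": 7.5, "C": 8.0}
fixed_cost = {"A": 200, "B": 120, "C": 300}
capacity = {"A": 400, "B": 600, "C": 500}
demand = 700

qty = {s: LpVariable(f"qty_{s}", lowBound=0) for s in qualified}
use = {s: LpVariable(f"use_{s}", cat="Binary") for s in qualified}
prob = LpProblem("order_allocation", LpMinimize)
prob += lpSum(unit_cost[s] * qty[s] + fixed_cost[s] * use[s] for s in qualified)  # total cost
prob += lpSum(qty[s] for s in qualified) == demand                                # meet demand
for s in qualified:
    prob += qty[s] <= capacity[s] * use[s]   # order only from suppliers that are selected
prob.solve()

print("qualified suppliers:", qualified)
print({s: (int(use[s].value()), qty[s].value()) for s in qualified})
```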
139
Classification of Hate Tweets and Their Reasons using SVM. Tarasova, Natalya. January 2016.
This study focused on finding the hate tweets posted by the customers of three mobile operators, Verizon, AT&T and Sprint, and identifying the reasons for their dissatisfaction. The timelines containing a hate tweet were collected and studied for the presence of an explanation. A machine learning approach was employed using four categories: Hate, Reason, Explanatory and Other. The classification was conducted with a one-versus-all approach using the Support Vector Machines algorithm implemented in the LIBSVM tool. The study resulted in two methodologies: the Naive Method (NM) and the Partial Timeline Method (PTM). The Naive Method is binary in the sense that it asks, for every tweet, whether it belongs to the Hate class, and it relies only on a feature space consisting of the most representative words chosen with the Akaike Information Criterion. PTM asks the same question but only for tweets posted within ±30 minutes of the hate tweet, exploiting the fact that the majority of the explanations were posted within that one-hour window. The accuracy of PTM was found to be higher than that of NM, and PTM saves time and memory by analysing fewer tweets. At the same time, this implies a trade-off between relevance and completeness: PTM offers a more accurate classification, while NM offers a more exhaustive one. / Opponent: Kristina Wettainen
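A minimal sketch of the one-versus-all SVM set-up described above, using scikit-learn instead of the LIBSVM tool from the study; the toy tweets and plain TF-IDF features stand in for the AIC-selected word features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = [
    "I hate this carrier, worst service ever",                   # Hate
    "dropped calls all day, that is why I am leaving",           # Reason
    "the outage was caused by a damaged cable",                  # Explanatory
    "just upgraded my phone, loving the camera",                 # Other
    "absolutely furious with this company",                      # Hate
    "billed twice this month, that is the reason I complain",    # Reason
]
labels = ["Hate", "Reason", "Explanatory", "Other", "Hate", "Reason"]

# One binary SVM per category, applied to every tweet (one-versus-all).
clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
clf.fit(tweets, labels)
print(clf.predict(["so angry at this operator", "no signal since monday, that is why"]))
```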
140
A Dynamic Behavioral Biometric Approach to Authenticate Users Employing Their Fingers to Interact with Touchscreen Devices. Ponce, Arturo. 01 May 2015.
The use of mobile devices has extended to all areas of human life and has changed the way people work and socialize. Mobile devices are susceptible to being lost, stolen, or compromised. Several approaches have been adopted to protect the information stored on these devices, one of which is user authentication. The two most popular methods of user authentication are knowledge-based and token-based methods, but each presents its own problems.
Biometric authentication methods have emerged in recent years as a way to deal with these problems. They use an individual's unique characteristics for identification and have proven somewhat effective in authenticating users. However, biometric authentication methods present problems of their own. For example, they are not 100% effective in identifying users, some are not well perceived by users, others require too much computational effort, and others require special equipment or special postures from the user. Ultimately, their implementation can result in unauthorized use of the device or in the user being annoyed by the authentication process.
New ways of interacting with mobile devices have emerged in recent years, which makes it necessary for authentication methods to adapt to these changes and take advantage of them. For example, the use of touchscreens has become prevalent in mobile devices, so biometric authentication methods need to adapt to them. One important aspect to consider when adopting these new methods is their acceptance by users. The Technology Acceptance Model (TAM) states that system use is a response that can be predicted by user motivation.
This work presents an authentication method that can continuously verify the user's identity, which can help prevent unauthorized use of a device or access to sensitive information. The goal was to authenticate people while they used their fingers to interact with their touchscreen mobile devices during ordinary tasks such as vertical and horizontal scrolling. The approach used six biometric traits for authentication, and the combination of those traits allowed authentication at both the beginning and the end of a finger stroke. Support Vector Machines were employed, and the best results obtained show Equal Error Rate values of around 35%. These results demonstrate the potential of the approach to verify a person's identity.
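A hedged illustration of the evaluation described above: an SVM scores genuine and impostor stroke samples, and the Equal Error Rate is read off the ROC curve where the false accept and false reject rates coincide. The six stroke features are synthetic stand-ins, not the thesis's feature set or data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
# Six hypothetical per-stroke features (e.g. pressure, speed, length) for the owner vs. others.
owner = rng.normal(0.0, 1.0, size=(300, 6))
others = rng.normal(0.8, 1.2, size=(300, 6))
X = np.vstack([owner, others])
y = np.array([1] * 300 + [0] * 300)               # 1 = genuine user, 0 = impostor
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

svm = SVC(probability=True).fit(X_tr, y_tr)
scores = svm.predict_proba(X_te)[:, 1]

fpr, tpr, _ = roc_curve(y_te, scores)             # false accept rate vs. true accept rate
fnr = 1 - tpr                                     # false reject rate
eer = fpr[np.nanargmin(np.abs(fpr - fnr))]        # where the two error rates cross
print(f"Equal Error Rate: {eer:.2%}")
```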
Additionally, this work tested the acceptance of the approach among participants, which can influence its eventual adoption. An acceptance level of 80% was obtained, which compares favorably with other behavioral biometric approaches.