Global ETD Search

11	A Combined Approach to Handle Multi-class Imbalanced Data and to Adapt Concept Drifts using Machine Learning Tumati, Saini 05 October 2021 No description available. Computer Science Imbalanced datasets Multi-class imbalanced datasets Oversampling Concept drifts Machine learning ensemble learning
12	Multi-Class Imbalanced Learning for Time Series Problem: An Industrial Case Study Andersson, Melanie January 2020 Classification problems with multiple classes and imbalanced sample sizes pose a different challenge from binary classification problems. Methods have been proposed to handle imbalanced learning; however, most of them are designed specifically for binary classification. Multi-class imbalance imposes additional challenges when applied to time series classification problems such as weather classification. In this thesis, we introduce, apply and evaluate a new algorithm for handling multi-class imbalanced problems involving time series data. The proposed algorithm is designed to handle both multi-class imbalance and time series classification, and is inspired by the Imbalanced Fuzzy-Rough Ordered Weighted Average Nearest Neighbor Classification algorithm. Its feasibility is studied through an empirical evaluation on a telecom use case at Ericsson, Sweden, where data from commercial microwave links is used for weather classification. The proposed algorithm is compared with the model currently used at Ericsson, a one-dimensional convolutional neural network, as well as with three other deep learning models. The empirical evaluation indicates that the weather-classification performance of the proposed algorithm is comparable to that of the current solution; the two are the best-performing models in the study. Time Series Imbalanced Learning Weather Classification Microwave Link Multi-Class Classification Engineering and Technology
13	Evaluating machine learning strategies for classification of large-scale Kubernetes cluster logs Sarika, Pawan January 2022 Kubernetes is a free, open-source container orchestration system for deploying and managing Docker containers that host microservices. Its cluster logs are extremely helpful for determining the root cause of a failure. However, as systems become more complex, locating failures becomes more difficult and time-consuming. This study aims to identify classification algorithms that classify the given log data accurately while requiring fewer computational resources. Because the data set is large, we begin with expert-based feature selection to reduce its size. TF-IDF feature extraction is then performed, and finally five classification algorithms (SVM, KNN, random forest, gradient boosting and MLP) are compared using several metrics. The results show that random forest achieves good accuracy while requiring fewer computational resources than the other algorithms. Kubernetes logs feature selection feature extraction multi-class classification Computational cost Computer Sciences
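The TF-IDF step mentioned above can be sketched in plain Python. This is a minimal illustration, not the study's pipeline: it assumes whitespace tokenization, non-empty log lines, and the unsmoothed weighting tf = count / line length, idf = log(N / df). The function name `tfidf` and the sample log lines are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf = raw count / document length; idf = log(N / df) (unsmoothed).
    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical Kubernetes-style log lines
logs = [
    "pod failed image pull error",
    "pod started successfully",
    "node disk pressure eviction pod",
]
vecs = tfidf(logs)
```

A term such as `pod` that occurs in every line gets weight 0, so TF-IDF down-weights boilerplate tokens before a classifier such as random forest is trained on the resulting vectors.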
14	Boosting for Learning From Imbalanced, Multiclass Data Sets Abouelenien, Mohamed 12 1900 In many real-world applications, it is common to have an uneven number of examples among multiple classes. This data imbalance usually complicates the learning process, especially for the minority classes, and results in degraded performance. Boosting methods have been proposed to handle the imbalance problem, but they need long training times and require diversity among the classifiers of the ensemble to achieve improved performance. Additionally, extending boosting to multi-class data sets is not straightforward. Applications that suffer from imbalanced multi-class data include face recognition, where tens of classes exist, and capsule endoscopy, which suffers from massive imbalance between the classes. This dissertation introduces RegBoost, a new boosting framework to address imbalanced, multi-class problems. The method applies a weighted stratified sampling technique and incorporates a regularization term that accommodates multi-class data sets and automatically determines the error bound of each base classifier. The regularization parameter penalizes the classifier when it misclassifies instances that were correctly classified in the previous iteration, and additionally reduces the bias towards majority classes. Experiments are conducted on 12 diverse data sets with moderate to high imbalance ratios. The results demonstrate superior performance of the proposed method compared with several state-of-the-art algorithms for imbalanced, multi-class classification problems. More importantly, the improvement in minority-class sensitivity obtained with RegBoost is accompanied by an improvement in overall accuracy across all classes. With unpredictability regularization, a diverse group of classifiers is created, and the maximum accuracy improvement exceeds 24%.
Using stratified undersampling, RegBoost achieves the best efficiency, reducing computational cost by more than 50%. The efficiency gain grows as the volume of training data increases. Boosting multi-class classifications stratified sampling regularization parameter imbalanced data sets
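The weighted stratified sampling step can be illustrated with a small sketch. It is an assumption-laden illustration, not RegBoost itself: the function `weighted_stratified_sample` is hypothetical, sampling is assumed to be with replacement, and the regularization term that sets each base classifier's error bound is not modeled.

```python
import random
from collections import defaultdict


def weighted_stratified_sample(labels, weights, per_class, seed=0):
    """Draw `per_class` indices from every class (with replacement),
    choosing examples in proportion to their current boosting weight."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    sample = []
    for label in sorted(by_class):
        members = by_class[label]
        member_weights = [weights[i] for i in members]
        # every class contributes the same number of draws, so a majority
        # class is effectively undersampled when per_class is small
        sample.extend(rng.choices(members, weights=member_weights, k=per_class))
    return sample


# Hypothetical imbalanced labels with uniform initial boosting weights
labels = ["normal"] * 8 + ["lesion"] * 2
weights = [0.1] * 10
idx = weighted_stratified_sample(labels, weights, per_class=2)
```

Because each round trains on `per_class` examples per class instead of the full data set, the base classifiers see a balanced and smaller sample; examples with zero weight are never drawn.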
15	Multi-Class Classification for Predicting Customer Satisfaction: Application of machine learning methods to predict customer satisfaction at IKEA Backerholm, Stina, Börjesjö, Malin January 2023 Gaining a comprehensive understanding of the features that contribute to customer satisfaction after contact with IKEA's Remote Customer Meeting Points (RCMPs) is essential for implementing effective remedial measures in the future. The aim of this project is to investigate whether it is possible to find key features that influence customer satisfaction and to use them to predict customer satisfaction. The task is approached as a multi-class classification problem, with the objective of classifying the observations into five distinct levels of customer satisfaction. Three models, Multinomial Logistic Regression, Random Forest and Extreme Gradient Boosting, were used to investigate these possibilities. Based on the methods used and the available data, the results indicate that it is currently not feasible to accurately identify key features or predict customer satisfaction.
Multi-Class Classification Imbalanced Data Machine Learning Mathematics
16	Contributions to Efficient Statistical Modeling of Complex Data with Temporal Structures Hu, Zhihao 03 March 2022 This dissertation focuses on three research projects: neighborhood vector autoregression for multivariate time series, uncertainty quantification for agent-based modeling of networked anagram games, and a scalable algorithm for multi-class classification. The first project studies the modeling of multivariate time series, with applications in the environmental sciences and other areas. A neighborhood vector autoregression (NVAR) model is proposed to efficiently analyze high-dimensional multivariate time series. The time series are assumed to have underlying distances among them, based on the inherent setting of the problem. When this distance matrix is available or can be obtained, the proposed NVAR method is shown to provide computationally efficient and theoretically sound estimates of the model parameters. The performance of the proposed method is compared with other existing approaches in both simulation studies and a real application to stream nitrogen data. The second project focuses on group anagram games, in which players are given letters and try to form as many words as possible. Enhanced agent behavior models for networked group anagram games are built, exercised and evaluated under an uncertainty quantification framework. Specifically, the game data are clustered by player skill level (forming words, requesting letters and replying to requests), multinomial logistic regressions are fitted for the transition probabilities, and the uncertainty is quantified within each cluster. The result is a model in which players are assigned different numbers of neighbors and different skill levels. Simulations of ego agents with neighbors are conducted to demonstrate the efficacy of the proposed methods.
The third project aims to develop efficient and scalable algorithms for multi-class classification that balance prediction accuracy and computing efficiency, especially in high-dimensional settings. Traditional multinomial logistic regression becomes slow in high-dimensional settings where the number of classes (M) and the number of features (p) are large. Our algorithms are computationally efficient and scale to data of even higher dimension. Simulation and case study results demonstrate that our algorithms have a large advantage in computing efficiency over traditional multinomial logistic regression while maintaining comparable prediction performance. / Doctor of Philosophy / In many data-centric applications, data have complex structures involving temporal dependence and high dimensionality. Modeling complex data with temporal structures has attracted great attention in applications such as environmental sciences, network sciences, data mining, neuroscience and economics. However, modeling such data is challenging because of their large uncertainty and dimensionality. This dissertation focuses on modeling and prediction of complex data with temporal structures. Three types of complex data are modeled: nitrogen levels in multiple streams are modeled jointly, human actions in networked group anagram games are modeled with their uncertainty quantified, and data with many class labels are classified. Different models are proposed, and they are shown to be efficient through simulations and case studies. Neighborhood Vector Autoregression Multivariate Time Series Uncertainty Quantification Agent-Based Modeling Multi-Class Prediction
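The neighborhood idea behind NVAR can be sketched as a sparsity pattern: series i is regressed only on lagged values of series that lie within some distance of it, so the number of lag-1 coefficients shrinks from n² to the number of neighboring pairs. The radius-based neighborhood, the lag order of 1 and all function names are assumptions for illustration; the dissertation's actual neighborhood definition and estimator are not reproduced here.

```python
def neighborhood_mask(dist, radius):
    """mask[i][j] is True if series j's lagged value may appear in the
    equation for series i, i.e. the two series are within `radius`."""
    n = len(dist)
    return [[dist[i][j] <= radius for j in range(n)] for i in range(n)]


def regressors(mask):
    """For each series, the indices of the lagged series it is regressed on."""
    return [[j for j, keep in enumerate(row) if keep] for row in mask]


def lag1_param_count(mask):
    """Number of free lag-1 coefficients (a full VAR(1) would have n * n)."""
    return sum(sum(row) for row in mask)


# Hypothetical example: four stream gauges spaced 1 km apart along a river
positions = [0.0, 1.0, 2.0, 3.0]
dist = [[abs(a - b) for b in positions] for a in positions]
mask = neighborhood_mask(dist, radius=1.0)
```

Here each gauge keeps only itself and its adjacent gauges, giving 10 lag-1 coefficients instead of 16. Each equation could then be fitted on its own small regressor set, which is the kind of saving that makes a neighborhood restriction attractive as n grows.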
17	Unární klasifikátor obrazových dat / Unary Classification of Image Data Beneš, Jiří January 2021 This thesis introduces classification algorithms. It divides classifiers into unary, binary and multi-class classifiers, describes the different types, and compares individual classifiers and their areas of use. For unary classifiers, practical examples and a list of the architectures used are given. A dedicated chapter compares the effect of hyperparameters on the quality of unary classification for individual architectures. The submission also includes a practical implementation of a unary classifier.
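Unary (one-class) classification learns from examples of a single target class and rejects anything unlike them. A toy sketch of the idea, assuming feature vectors have already been extracted from the images: the hypothetical `CentroidUnaryClassifier` accepts a point when its Euclidean distance to the training centroid is within a quantile of the training distances. It is not one of the architectures compared in the thesis.

```python
import math


class CentroidUnaryClassifier:
    """Toy one-class classifier: accepts points close to the training centroid."""

    def __init__(self, quantile=0.95):
        self.quantile = quantile

    def fit(self, points):
        n = len(points)
        dim = len(points[0])
        # mean of the target-class feature vectors
        self.centroid = [sum(p[d] for p in points) / n for d in range(dim)]
        dists = sorted(math.dist(p, self.centroid) for p in points)
        # threshold: the k-th smallest training distance, k set by the quantile
        k = min(n - 1, int(self.quantile * n))
        self.threshold = dists[k]
        return self

    def predict(self, point):
        """True if `point` looks like the target class, False otherwise."""
        return math.dist(point, self.centroid) <= self.threshold


# Hypothetical 2-D feature vectors of the target class only
clf = CentroidUnaryClassifier(quantile=0.95).fit([(0, 0), (1, 0), (0, 1), (1, 1)])
```

Only positive examples are used in `fit`; the quantile trades false rejections of target images against false acceptances of outliers. Requires Python 3.8+ for `math.dist`.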
18	Identifying Plankton from Grayscale Silhouette Images Kramer, Kurt A 27 October 2005 From continuous silhouette images of marine plankton produced by SIPPER, a device developed by the Marine Sciences Department, individual plankton images were extracted, features were derived, and classification was performed. Plankton recognition experiments were performed on Support Vector Machine parameter tuning, Fourier descriptors and feature selection. Several groups of features were implemented: moments, granulometric features, Fourier transform for texture, intensity histograms, Fourier descriptors for contour, convex hull, and Eigen ratio. The Fourier descriptors were implemented in three variants: sampling, averaging and hybrid (a mix of sampling and averaging). The feature selection experiments used a modified WRAPPER approach, of which several variants were explored, including Best Case Next, Forward and Backward, and Beam Search. Feature selection significantly reduced the number of features required while maintaining the same level of classification accuracy, which reduced processing time for training and classification. SIPPER Feature selection Feature calculation Active learning Support vector machine SVM Multi-class American Studies Arts and Humanities
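The contour Fourier descriptors listed above can be sketched as follows. The boundary is treated as a complex signal z_k = x_k + i·y_k, its discrete Fourier transform is taken, and the magnitudes are normalized so the descriptors do not depend on position, size, rotation or starting point. Normalizing by the first harmonic and the name `fourier_descriptors` are assumptions; the thesis's sampling, averaging and hybrid variants are not reproduced, and the contour is assumed to be already resampled to a fixed number of points.

```python
import cmath
import math


def fourier_descriptors(contour, n_coeffs=4):
    """Magnitude-based Fourier descriptors of a closed contour.

    contour: (x, y) boundary points in traversal order, already resampled
    to a fixed count (assumed here).
    """
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    if n_coeffs + 1 > n:
        raise ValueError("need more boundary points than coefficients")
    # DFT coefficients 0..n_coeffs of the complex boundary signal
    coeffs = [
        sum(z[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n)) / n
        for u in range(n_coeffs + 1)
    ]
    scale = abs(coeffs[1])
    # skip coeffs[0] (position), divide by |coeffs[1]| (size), and keep
    # magnitudes only (rotation and starting point)
    return [abs(coeffs[u]) / scale for u in range(1, n_coeffs + 1)]


# Hypothetical 8-point square contour and three transformed copies
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
moved = [(3 * x + 5, 3 * y - 2) for x, y in square]  # scaled and shifted
turned = [(-y, x) for x, y in square]                 # rotated 90 degrees
shifted = square[2:] + square[:2]                     # new starting point
```

All four contours yield the same descriptor vector, which is what lets a classifier such as an SVM compare plankton outlines regardless of where, how large, or at what angle they appear in the image.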
19	System for Identifying Plankton from the SIPPER Instrument Platform Kramer, Kurt A. 29 October 2010 Plankton imaging systems such as SIPPER produce large quantities of data in the form of plankton images from a variety of classes. A system called PICES was developed to quickly extract, classify and manage the millions of images produced during a single one-week research cruise. A new, fast technique for parameter tuning and feature selection for Support Vector Machines using Wrappers was created. This technique allows faster feature selection while maintaining, and sometimes improving, classification accuracy. It also gives the user greater flexibility in managing class contents in existing training libraries. Support vector machines are binary classifiers; they can implement multi-class classification by training one classifier for each possible pair of classes (one-versus-one) or one classifier per class (one-versus-all). Feature selection normally searches for a single set of features to be used by all of the binary classifiers. This ignores the fact that features that discriminate well between two particular classes may not do well for other class combinations, so the feature selection process may leave these features out of the common set used by all support vector machines. Experiments show that selecting features for each binary class combination can improve overall classification accuracy and reduce the time required to train a multi-class support vector machine. Another benefit of this approach is that significantly less time is required for feature selection when classes are added to the training data: the features selected for existing class combinations remain valid, so feature selection only needs to be run for the newly added combinations.
This work resulted in PICES, a user-friendly, GUI-based system that helps manage the classification of over 55 million plankton images split among 180 classes. PICES embodies an improved means of performing Wrapper-based feature selection that produces classifiers which train faster and are just as accurate, sometimes more so, while reducing feature selection time. Marine Science PICES Machine Learning Feature Selection Support Vector Machine SVM Multi-Class Pair-Wise American Studies Arts and Humanities
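The bookkeeping behind per-pair feature selection can be sketched as below. With K classes a one-versus-one SVM has K(K-1)/2 binary classifiers, and adding one class introduces only K new pairs, so only those need a fresh feature search (for 180 classes, 16,110 pairs in total, and 180 new ones when a 181st class is added). The class names and the `select_features` callback are hypothetical placeholders for the Wrapper search; the sketch is not PICES code.

```python
from itertools import combinations


def class_pairs(classes):
    """All one-versus-one pairs; sorting keeps each pair's key stable."""
    return set(combinations(sorted(classes), 2))


def pairs_needing_selection(old_classes, new_classes):
    """Pairs whose per-pair feature subset has not been computed yet."""
    return class_pairs(new_classes) - class_pairs(old_classes)


def update_feature_table(table, old_classes, new_classes, select_features):
    """Run (hypothetical) per-pair feature selection only for new pairs;
    entries for existing pairs are reused unchanged."""
    for pair in pairs_needing_selection(old_classes, new_classes):
        table[pair] = select_features(pair)
    return table


# Hypothetical training library before and after adding one class
old = ["copepod", "diatom", "larvacean"]
new = old + ["shrimp"]
```

Keying the table by sorted class pairs means a newly added class never changes the keys of existing pairs, so their stored feature subsets stay valid.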
20	Multi-class recognition using pair-wise classifiers / Daugelio klasių atpažinimas naudojant klasifikatorius poroms Kybartas, Rimantas 01 October 2010 Many solutions exist for the task of multi-class recognition. Unfortunately, the recommendations they offer are not always consistent. Most are based on empirical experiments, and the statistical properties of the data are often ignored. As a result, designers face open questions: which method should be used and when, and how reliable is a chosen method for a given multi-class recognition task? This dissertation analyzes two-stage multi-class decision methods. In the first stage, pair-wise classifiers, which can better exploit the statistical properties of each pair of classes, are trained. In the second stage, a fusion rule combines the first-stage results to produce the final classification decision. The research examines the complexity of pair-wise classifiers, the training data size and the precision with which method quality is estimated. This precision depends heavily on the data and on the number of experiments performed (data permutation, division into training and test data). It is shown that the declared superiority of some published algorithms over known ones is not reliable because of the low precision of the estimates. A detailed comparison of well-known multi-class classification methods is performed, and a new pair-wise classifier fusion method, based on a similar method used for multi-class classifier fusion, is presented. General recommendations for designers of multi-class classification systems are provided. Methods are proposed that reduce the classification error by adjusting the fusion of pair-wise classifiers so that the algorithm is not...
Informatics Multi-class classification Single layer perceptron Pair-wise classification
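A minimal sketch of the two-stage scheme, assuming the simplest fusion rule (majority voting over all pair-wise decisions) and a nearest-mean rule standing in for the first-stage pair-wise classifiers (the record's keywords mention single-layer perceptrons). Neither is the new fusion method proposed in the dissertation; all names and class means are hypothetical.

```python
from collections import Counter
from itertools import combinations


def fuse_by_voting(classes, pairwise_decide, x):
    """Second stage: every pair-wise classifier casts one vote."""
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, x)] += 1
    # max() keeps the first class listed among tied vote counts
    return max(classes, key=lambda c: votes[c])


# Hypothetical first stage: 1-D nearest-mean rule for each pair of classes
MEANS = {"A": 0.0, "B": 5.0, "C": 10.0}


def nearest_mean(a, b, x):
    return a if abs(x - MEANS[a]) <= abs(x - MEANS[b]) else b
```

For x = 6.0 the pair-wise decisions are B (A vs B), C (A vs C) and B (B vs C), so B wins with two votes. Voting ignores how confident each pair-wise classifier is, which is one reason more refined fusion rules are studied.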