Global ETD Search

11	Dynamic Committees for Handling Concept Drift in Databases (DCCD) AlShammeri, Mohammed 07 November 2012 (has links) Concept drift refers to a problem that is caused by a change in the data distribution in data mining. This leads to reduction in the accuracy of the current model that is used to examine the underlying data distribution of the concept to be discovered. A number of techniques have been introduced to address this issue, in a supervised learning (or classification) setting. In a classification setting, the target concept (or class) to be learned is known. One of these techniques is called “Ensemble learning”, which refers to using multiple trained classifiers in order to get better predictions by using some voting scheme. In a traditional ensemble, the underlying base classifiers are all of the same type. Recent research extends the idea of ensemble learning to the idea of using committees, where a committee consists of diverse classifiers. This is the main difference between the regular ensemble classifiers and the committee learning algorithms. Committees are able to use diverse learning methods simultaneously and dynamically take advantage of the most accurate classifiers as the data change. In addition, some committees are able to replace their members when they perform poorly. This thesis presents two new algorithms that address concept drifts. The first algorithm has been designed to systematically introduce gradual and sudden concept drift scenarios into datasets. In order to save time and avoid memory consumption, the Concept Drift Introducer (CDI) algorithm divides the number of drift scenarios into phases. The main advantage of using phases is that it allows us to produce a highly scalable concept drift detector that evaluates each phase, instead of evaluating each individual drift scenario. We further designed a novel algorithm to handle concept drift. Our Dynamic Committee for Concept Drift (DCCD) algorithm uses a voted committee of hypotheses that vote on the best base classifier, based on its predictive accuracy. The novelty of DCCD lies in the fact that we employ diverse heterogeneous classifiers in one committee in an attempt to maximize diversity. DCCD detects concept drifts by using the accuracy and by weighing the committee members by adding one point to the most accurate member. The total loss in accuracy for each member is calculated at the end of each point of measurement, or phase. The performance of the committee members are evaluated to decide whether a member needs to be replaced or not. Moreover, DCCD detects the worst member in the committee and then eliminates this member by using a weighting mechanism. Our experimental evaluation centers on evaluating the performance of DCCD on various datasets of different sizes, with different levels of gradual and sudden concept drift. We further compare our algorithm to another state-of-the-art algorithm, namely the MultiScheme approach. The experiments indicate the effectiveness of our DCCD method under a number of diverse circumstances. The DCCD algorithm generally generates high performance results, especially when the number of concept drifts is large in a dataset. For the size of the datasets used, our results showed that DCCD produced a steady improvement in performance when applied to small datasets. Further, in large and medium datasets, our DCCD method has a comparable, and often slightly higher, performance than the MultiScheme technique. The experimental results also show that the DCCD algorithm limits the loss in accuracy over time, regardless of the size of the dataset. Data Mining Machine Learning Concept Drift Concept Shift Non-Stationary Environments Ensemble Learning Learning Committees Dynamic Committees
12	Machine learning methods for the estimation of weather and animal-related power outages on overhead distribution feeders Kankanala, Padmavathy January 1900 (has links) Doctor of Philosophy / Department of Electrical and Computer Engineering / Sanjoy Das and Anil Pahwa / Because a majority of day-to-day activities rely on electricity, it plays an important role in daily life. In this digital world, most of the people’s life depends on electricity. Without electricity, the flip of a switch would no longer produce instant light, television or refrigerators would be nonexistent, and hundreds of conveniences often taken for granted would be impossible. Electricity has become a basic necessity, and so any interruption in service due to disturbances in power lines causes a great inconvenience to customers. Customers and utility commissions expect a high level of reliability. Power distribution systems are geographically dispersed and exposure to environment makes them highly vulnerable part of power systems with respect to failures and interruption of service to customers. Following the restructuring and increased competition in the electric utility industry, distribution system reliability has acquired larger significance. Better understanding of causes and consequences of distribution interruptions is helpful in maintaining distribution systems, designing reliable systems, installing protection devices, and environmental issues. Various events, such as equipment failure, animal activity, tree fall, wind, and lightning, can negatively affect power distribution systems. Weather is one of the primary causes affecting distribution system reliability. Unfortunately, as weather-related outages are highly random, predicting their occurrence is an arduous task. To study the impact of weather on overhead distribution system several models, such as linear and exponential regression models, neural network model, and ensemble methods are presented in this dissertation. The models were extended to study the impact of animal activity on outages in overhead distribution system. Outage, lightning, and weather data for four different cities in Kansas of various sizes from 2005 to 2011 were provided by Westar Energy, Topeka, and state climate office at Kansas State University weather services. Models developed are applied to estimate daily outages. Performance tests shows that regression and neural network models are able to estimate outages well but failed to estimate well in lower and upper range of observed values. The introduction of committee machines inspired by the ‘divide & conquer” principle overcomes this problem. Simulation results shows that mixture of experts model is more effective followed by AdaBoost model in estimating daily outages. Similar results on performance of these models were found for animal-caused outages. Overhead distribution system Distribution reliability Weather & animal-related outages Ensemble learning methods Electrical Engineering (0544)
13	Identification of Flying Drones in Mobile Networks using Machine Learning / Identifiering av flygande drönare i mobila nätverk med hjälp av maskininlärning Alesand, Elias January 2019 (has links) Drone usage is increasing, both in recreational use and in the industry. With it comes a number of problems to tackle. Primarily, there are certain areas in which flying drones pose a security threat, e.g., around airports or other no-fly zones. Other problems can appear when there are drones in mobile networks which can cause interference. Such interference comes from the fact that radio transmissions emitted from drones can travel more freely than those from regular UEs (User Equipment) on the ground since there are few obstructions in the air. Additionally, the data traffic sent from drones is often high volume in the form of video streams. The goal of this thesis is to identify so-called "rogue drones" connected to an LTE network. Rogue drones are flying drones that appear to be regular UEs in the network. Drone identification is a binary classification problem where UEs in a network are classified as either a drone or a regular UE and this thesis proposes machine learning methods that can be used to solve it. Classifications are based on radio measurements and statistics reported by UEs in the network. The data for the work in this thesis is gathered through simulations of a heterogenous LTE network in an urban scenario. The primary idea of this thesis is to use a type of cascading classifier, meaning that classifications are made in a series of stages with increasingly complex models where only a subset of examples are passed forward to subsequent stages. The motivation for such a structure is to minimize the computational requirements at the entity making the classifications while still being complex enough to achieve high accuracy. The models explored in this thesis are two-stage cascading classifiers using decision trees and ensemble learning techniques. It is found that close to 60% of the UEs in the dataset can be classified without errors in the first of the two stages. The rest is forwarded to a more complex model which requires more data from the UEs and can achieve up to 98% accuracy. Drones Machine Learning LTE Mobile networks Radio Decision tree Ensemble learning Communication Systems Kommunikationssystem
14	A Graph Theoretic Clustering Algorithm based on the Regularity Lemma and Strategies to Exploit Clustering for Prediction Trivedi, Shubhendu 30 April 2012 (has links) The fact that clustering is perhaps the most used technique for exploratory data analysis is only a semaphore that underlines its fundamental importance. The general problem statement that broadly describes clustering as the identification and classification of patterns into coherent groups also implicitly indicates it's utility in other tasks such as supervised learning. In the past decade and a half there have been two developments that have altered the landscape of research in clustering: One is improved results by the increased use of graph theoretic techniques such as spectral clustering and the other is the study of clustering with respect to its relevance in semi-supervised learning i.e. using unlabeled data for improving prediction accuracies. In this work an attempt is made to make contributions to both these aspects. Thus our contributions are two-fold: First, we identify some general issues with the spectral clustering framework and while working towards a solution, we introduce a new algorithm which we call "Regularity Clustering" which makes an attempt to harness the power of the Szemeredi Regularity Lemma, a remarkable result from extremal graph theory for the task of clustering. Secondly, we investigate some practical and useful strategies for using clustering unlabeled data in boosting prediction accuracy. For all of these contributions we evaluate our methods against existing ones and also apply these ideas in a number of settings. Machine Learning Graph Mining Unsupervised Learning Ensemble Learning Semi-Supervised Learning Regularity Lemma Graph Partitioning
15	Apprentissage Ensembliste, Étude comparative et Améliorations via Sélection Dynamique / Ensemble Learning, Comparative Analysis and Further Improvements with Dynamic Ensemble Selection Narassiguin, Anil 04 May 2018 (has links) Les méthodes ensemblistes constituent un sujet de recherche très populaire au cours de la dernière décennie. Leur succès découle en grande partie de leurs solutions attrayantes pour résoudre différents problèmes d'apprentissage intéressants parmi lesquels l'amélioration de l'exactitude d'une prédiction, la sélection de variables, l'apprentissage de métrique, le passage à l'échelle d'algorithmes inductifs, l'apprentissage de multiples jeux de données physiques distribués, l'apprentissage de flux de données soumis à une dérive conceptuelle, etc... Dans cette thèse nous allons dans un premier temps présenter une comparaison empirique approfondie de 19 algorithmes ensemblistes d'apprentissage supervisé proposé dans la littérature sur différents jeux de données de référence. Non seulement nous allons comparer leurs performances selon des métriques standards de performances (Exactitude, AUC, RMS) mais également nous analyserons leur diagrammes kappa-erreur, la calibration et les propriétés biais-variance. Nous allons aborder ensuite la problématique d'amélioration des ensembles de modèles par la sélection dynamique d'ensembles (dynamic ensemble selection, DES). La sélection dynamique est un sous-domaine de l'apprentissage ensembliste où pour une donnée d'entrée x, le meilleur sous-ensemble en terme de taux de réussite est sélectionné dynamiquement. L'idée derrière les approches DES est que différents modèles ont différentes zones de compétence dans l'espace des instances. La plupart des méthodes proposées estime l'importance individuelle de chaque classifieur faible au sein d'une zone de compétence habituellement déterminée par les plus proches voisins dans un espace euclidien. Nous proposons et étudions dans cette thèse deux nouvelles approches DES. La première nommée ST-DES est conçue pour les ensembles de modèles à base d'arbres de décision. Cette méthode sélectionne via une métrique supervisée interne à l'arbre, idée motivée par le problème de la malédiction de la dimensionnalité : pour les jeux de données avec un grand nombre de variables, les métriques usuelles telle la distance euclidienne sont moins pertinentes. La seconde approche, PCC-DES, formule la problématique DES en une tâche d'apprentissage multi-label avec une fonction coût spécifique. Ici chaque label correspond à un classifieur et une base multi-label d'entraînement est constituée sur l'habilité de chaque classifieur de classer chaque instance du jeu de données d'origine. Cela nous permet d'exploiter des récentes avancées dans le domaine de l'apprentissage multi-label. PCC-DES peut être utilisé pour les approches ensemblistes homogènes et également hétérogènes. Son avantage est de prendre en compte explicitement les corrélations entre les prédictions des classifieurs. Ces algorithmes sont testés sur un éventail de jeux de données de référence et les résultats démontrent leur efficacité faces aux dernières alternatives de l'état de l'art / Ensemble methods has been a very popular research topic during the last decade. Their success arises largely from the fact that they offer an appealing solution to several interesting learning problems, such as improving prediction accuracy, feature selection, metric learning, scaling inductive algorithms to large databases, learning from multiple physically distributed data sets, learning from concept-drifting data streams etc. In this thesis, we first present an extensive empirical comparison between nineteen prototypical supervised ensemble learning algorithms, that have been proposed in the literature, on various benchmark data sets. We not only compare their performance in terms of standard performance metrics (Accuracy, AUC, RMS) but we also analyze their kappa-error diagrams, calibration and bias-variance properties. We then address the problem of improving the performances of ensemble learning approaches with dynamic ensemble selection (DES). Dynamic pruning is the problem of finding given an input x, a subset of models among the ensemble that achieves the best possible prediction accuracy. The idea behind DES approaches is that different models have different areas of expertise in the instance space. Most methods proposed for this purpose estimate the individual relevance of the base classifiers within a local region of competence usually given by the nearest neighbours in the euclidean space. We propose and discuss two novel DES approaches. The first, called ST-DES, is designed for decision tree based ensemble models. This method prunes the trees using an internal supervised tree-based metric; it is motivated by the fact that in high dimensional data sets, usual metrics like euclidean distance suffer from the curse of dimensionality. The second approach, called PCC-DES, formulates the DES problem as a multi-label learning task with a specific loss function. Labels correspond to the base classifiers and multi-label training examples are formed based on the ability of each classifier to correctly classify each original training example. This allows us to take advantage of recent advances in the area of multi-label learning. PCC-DES works on homogeneous and heterogeneous ensembles as well. Its advantage is to explicitly capture the dependencies between the classifiers predictions. These algorithms are tested on a variety of benchmark data sets and the results demonstrate their effectiveness against competitive state-of-the-art alternatives Apprentissage ensembliste Sélection dynamique Multi-label Ensemble learning Dynamic ensemble selection Multi-label 004
16	Effective Linear-Time Feature Selection Pradhananga, Nripendra January 2007 (has links) The classification learning task requires selection of a subset of features to represent patterns to be classified. This is because the performance of the classifier and the cost of classification are sensitive to the choice of the features used to construct the classifier. Exhaustive search is impractical since it searches every possible combination of features. The runtime of heuristic and random searches are better but the problem still persists when dealing with high-dimensional datasets. We investigate a heuristic, forward, wrapper-based approach, called Linear Sequential Selection, which limits the search space at each iteration of the feature selection process. We introduce randomization in the search space. The algorithm is called Randomized Linear Sequential Selection. Our experiments demonstrate that both methods are faster, find smaller subsets and can even increase the classification accuracy. We also explore the idea of ensemble learning. We have proposed two ensemble creation methods, Feature Selection Ensemble and Random Feature Ensemble. Both methods apply a feature selection algorithm to create individual classifiers of the ensemble. Our experiments have shown that both methods work well with high-dimensional data. filter wrapper feature selection attribute selection ensemble learning machine learning Linear Feature Selection
17	Machine Learning Methods For Opponent Modeling In Games Of Imperfect Information Sirin, Volkan 01 September 2012 (has links) (PDF) This thesis presents a machine learning approach to the problem of opponent modeling in games of imperfect information. The efficiency of various artificial intelligence techniques are investigated in this domain. A sequential game is called imperfect information game if players do not have all the information about the current state of the game. A very popular example is the Texas Holdem Poker, which is used for realization of the suggested methods in this thesis. Opponent modeling is the system that enables a player to predict the behaviour of its opponent. In this study, opponent modeling problem is approached as a classification problem. An architecture with different classifiers for each phase of the game is suggested. Neural Networks, K-Nearest Neighbors (KNN) and Support Vector Machines are used as classifier. For modeling a particular player, KNN is found to be most successful amongst all, with a prediction accuracy of 88%. An ensemble learning system is proposed for modeling different playing styles and unknown ones. Computational complexity and parallelization of some calculations are also provided. QA General 15707
18	Dynamic Committees for Handling Concept Drift in Databases (DCCD) AlShammeri, Mohammed 07 November 2012 (has links) Concept drift refers to a problem that is caused by a change in the data distribution in data mining. This leads to reduction in the accuracy of the current model that is used to examine the underlying data distribution of the concept to be discovered. A number of techniques have been introduced to address this issue, in a supervised learning (or classification) setting. In a classification setting, the target concept (or class) to be learned is known. One of these techniques is called “Ensemble learning”, which refers to using multiple trained classifiers in order to get better predictions by using some voting scheme. In a traditional ensemble, the underlying base classifiers are all of the same type. Recent research extends the idea of ensemble learning to the idea of using committees, where a committee consists of diverse classifiers. This is the main difference between the regular ensemble classifiers and the committee learning algorithms. Committees are able to use diverse learning methods simultaneously and dynamically take advantage of the most accurate classifiers as the data change. In addition, some committees are able to replace their members when they perform poorly. This thesis presents two new algorithms that address concept drifts. The first algorithm has been designed to systematically introduce gradual and sudden concept drift scenarios into datasets. In order to save time and avoid memory consumption, the Concept Drift Introducer (CDI) algorithm divides the number of drift scenarios into phases. The main advantage of using phases is that it allows us to produce a highly scalable concept drift detector that evaluates each phase, instead of evaluating each individual drift scenario. We further designed a novel algorithm to handle concept drift. Our Dynamic Committee for Concept Drift (DCCD) algorithm uses a voted committee of hypotheses that vote on the best base classifier, based on its predictive accuracy. The novelty of DCCD lies in the fact that we employ diverse heterogeneous classifiers in one committee in an attempt to maximize diversity. DCCD detects concept drifts by using the accuracy and by weighing the committee members by adding one point to the most accurate member. The total loss in accuracy for each member is calculated at the end of each point of measurement, or phase. The performance of the committee members are evaluated to decide whether a member needs to be replaced or not. Moreover, DCCD detects the worst member in the committee and then eliminates this member by using a weighting mechanism. Our experimental evaluation centers on evaluating the performance of DCCD on various datasets of different sizes, with different levels of gradual and sudden concept drift. We further compare our algorithm to another state-of-the-art algorithm, namely the MultiScheme approach. The experiments indicate the effectiveness of our DCCD method under a number of diverse circumstances. The DCCD algorithm generally generates high performance results, especially when the number of concept drifts is large in a dataset. For the size of the datasets used, our results showed that DCCD produced a steady improvement in performance when applied to small datasets. Further, in large and medium datasets, our DCCD method has a comparable, and often slightly higher, performance than the MultiScheme technique. The experimental results also show that the DCCD algorithm limits the loss in accuracy over time, regardless of the size of the dataset. Data Mining Machine Learning Concept Drift Concept Shift Non-Stationary Environments Ensemble Learning Learning Committees Dynamic Committees
19	Automatic Image Annotation By Ensemble Of Visual Descriptors Akbas, Emre 01 August 2006 (has links) (PDF) Automatic image annotation is the process of automatically producing words to de- scribe the content for a given image. It provides us with a natural means of semantic indexing for content based image retrieval. In this thesis, two novel automatic image annotation systems targeting di&amp / #64256 / erent types of annotated data are proposed. The &amp / #64257 / rst system, called Supervised Ensemble of Visual Descriptors (SEVD), is trained on a set of annotated images with prede&amp / #64257 / ned class labels. Then, the system auto- matically annotates an unknown sample depending on the classi&amp / #64257 / cation results. The second system, called Unsupervised Ensemble of Visual Descriptors (UEVD), assumes no class labels. Therefore, the annotation of an unknown sample is accomplished by unsupervised learning based on the visual similarity of images. The available auto- matic annotation systems in the literature mostly use a single set of features to train a single learning architecture. On the other hand, the proposed annotation systems utilize a novel model of image representation in which an image is represented with a variety of feature sets, spanning an almost complete visual information comprising color, shape, and texture characteristics. In both systems, a separate learning entity is trained for each feature set and these entities are gathered under an ensemble learning approach. Empirical results show that both SEVD and UEVD outperform some of the state-of-the-art automatic image annotation systems in equivalent experimental setups. QA General 15707
20	Toward The Frontiers Of Stacked Generalization Architecture For Learning Mertayak, Cuneyt 01 September 2007 (has links) (PDF) In pattern recognition, &ldquo / bias-variance&rdquo / trade-off is a challenging issue that the scientists has been working to get better generalization performances over the last decades. Among many learning methods, two-layered homogeneous stacked generalization has been reported to be successful in the literature, in different problem domains such as object recognition and image annotation. The aim of this work is two-folded. First, the problems of stacked generalization are attacked by a proposed novel architecture. Then, a set of success criteria for stacked generalization is studied. A serious drawback of stacked generalization architecture is the sensitivity to curse of dimensionality problem. In order to solve this problem, a new architecture named &ldquo / unanimous decision&rdquo / is designed. The performance of this architecture is shown to be comparably similar to two layered homogeneous stacked generalization architecture in low number of classes while it performs better than stacked generalization architecture in higher number of classes. Additionally, a new success criterion for two layered homogeneous stacked generalization architecture is proposed based on the individual properties of the used descriptors and it is verified in synthetic datasets. QA Computer Software 76.75-76.765

Search results