Global ETD Search

1	A Comparative Study of Ensemble Active Learning Alabdulrahman, Rabaa January 2014 Data Stream mining is an important emerging topic in the data mining and machine learning domain. In a Data Stream setting, the data arrive continuously and often at a fast pace. Examples include credit card transaction records, surveillance video streams, network event logs, and telecommunication records. Such types of data bring new challenges to the data mining research community. Specifically, a number of researchers have developed techniques in order to build accurate classification models against such Data Streams. Ensemble Learning, where a number of so-called base classifiers are combined in order to build a model, has shown some promise. However, a number of challenges remain. Often, the class labels of the arriving data are incorrect or missing. Furthermore, Data Stream algorithms may benefit from an online learning paradigm, where a small amount of newly arriving data is used to learn incrementally. To this end, the use of Active Learning, where the user is in the loop, has been proposed as a way to extend Ensemble Learning. Here, the hypothesis is that Active Learning would increase the performance, in terms of accuracy, ensemble size, and the time it takes to build the model. This thesis tests the validity of this hypothesis. Namely, we explore whether augmenting Ensemble Learning with an Active Learning component benefits the Data Stream Learning process. Our analysis indicates that this hypothesis does not necessarily hold for the datasets under consideration. That is, the accuracies of Active Ensemble Learning are not statistically significantly higher than when using normal Ensemble Learning. Rather, Active Learning may even cause an increase in error rate. Further, Active Ensemble Learning actually results in an increase in the time taken to build the model. 
However, our results indicate that Active Ensemble Learning builds accurate models against much smaller ensemble sizes, when compared to the traditional Ensemble Learning algorithms. Further, the models we build are constructed against small and incrementally growing training sets, which may be very beneficial in a real time Data Stream setting. Data Streams Ensemble Learning Active Learning Active Ensemble Learning
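The abstract above does not say which query strategy the thesis uses, so the following is only a minimal sketch of one common way to combine an ensemble with Active Learning: query-by-committee with vote entropy. The function names `vote_entropy` and `select_queries`, and the representation of base classifiers as plain callables, are assumptions made for illustration.

```python
import math
from collections import Counter


def vote_entropy(committee, x):
    """Disagreement of the committee on x, measured as the entropy of its votes.

    committee: list of callables mapping an example to a class label
    (a hypothetical stand-in for trained base classifiers).
    """
    votes = Counter(clf(x) for clf in committee)
    total = sum(votes.values())
    return -sum((c / total) * math.log(c / total) for c in votes.values())


def select_queries(committee, pool, budget):
    """Return the `budget` unlabeled examples the committee disagrees on most.

    These are the examples that would be sent to the user for labeling.
    """
    ranked = sorted(pool, key=lambda x: vote_entropy(committee, x), reverse=True)
    return ranked[:budget]
```

Examples on which every base classifier agrees get an entropy of zero and are never queried before contested ones, which is how a labeling budget is concentrated on a small, incrementally growing training set.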
2	Distributed boosting algorithms Thompson, Simon Giles January 1999 No description available. 005 Machine learning; Ensemble learning
3	A Novel Ensemble Machine Learning for Robust Microarray Data Classification. Peng, Yonghong January 2006 Microarray data analysis and classification has demonstrated convincingly that it provides an effective methodology for the diagnosis of diseases and cancers. Although much research has been performed on applying machine learning techniques for microarray data classification during the past years, it has been shown that conventional machine learning techniques have intrinsic drawbacks in achieving accurate and robust classifications. This paper presents a novel ensemble machine learning approach for the development of robust microarray data classification. Unlike conventional ensemble learning techniques, the approach presented begins by generating a pool of candidate base classifiers through gene sub-sampling and then selects a subset of appropriate base classifiers, based on classifier clustering, to construct the classification committee. Experimental results have demonstrated that the classifiers constructed by the proposed method outperform not only the classifiers generated by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods (bagging and boosting). Microarray data Machine learning Ensemble learning Classification
4	Weakly Selective Training induces Specialization within Populations of Sensory Neurons Hillmann, Julia 11 January 2016 No description available. 570 Ensemble Learning Neural Networks Neuronal Specialization Biologie (PPN619462639)
5	Inferring Gene Regulatory Networks from Expression Data using Ensemble Methods Slawek, Janusz 01 May 2014 High-throughput technologies for measuring gene expression have made the inference of genome-wide Gene Regulatory Networks an active field of research. Reverse-engineering of systems of transcriptional regulations has become an important challenge in molecular and computational biology. Because such systems model dependencies between genes, they are important for understanding cell behavior, and can potentially turn observed expression data into new biological knowledge and practical applications. In this dissertation we introduce a set of algorithms that infer networks of transcriptional regulations from a variety of expression profiles with superior accuracy compared to the state-of-the-art techniques. The proposed methods make use of ensembles of trees, which have become popular in many scientific fields, including genetics and bioinformatics. However, originally they were motivated from the perspective of classification, regression, and feature selection theory. In this study we exploit their relative variable importance measure as an indication of the presence or absence of a regulatory interaction between genes. We further analyze their predictions on a set of universally recognized benchmark expression data sets, and achieve favorable results in comparison with the state-of-the-art algorithms. Bioinformatics Gene Regulatory Networks Network Inference Ensemble Learning Boosting Engineering
6	Penalised regression for high-dimensional data : an empirical investigation and improvements via ensemble learning Wang, Fan January 2019 In a wide range of applications, datasets are generated for which the number of variables p exceeds the sample size n. Penalised likelihood methods are widely used to tackle regression problems in these high-dimensional settings. In this thesis, we carry out an extensive empirical comparison of the performance of popular penalised regression methods in high-dimensional settings and propose new methodology that uses ensemble learning to enhance the performance of these methods. The relative efficacy of different penalised regression methods in finite-sample settings remains incompletely understood. Through a large-scale simulation study, consisting of more than 1,800 data-generating scenarios, we systematically consider the influence of various factors (for example, sample size and sparsity) on method performance. We focus on three related goals (prediction, variable selection and variable ranking) and consider six widely used methods. The results are supported by a semi-synthetic data example. Our empirical results complement existing theory and provide a resource to compare performance across a range of settings and metrics. We then propose a new ensemble learning approach for improving the performance of penalised regression methods, called STructural RANDomised Selection (STRANDS). The approach, which builds and improves upon the Random Lasso method, consists of two steps. In both steps, we reduce dimensionality by repeated subsampling of variables. We apply a penalised regression method to each subsampled dataset and average the results. In the first step, subsampling is informed by variable correlation structure, and in the second step, by variable importance measures from the first step. STRANDS can be used with any sparse penalised regression approach as the "base learner". 
In simulations, we show that STRANDS typically improves upon its base learner, and demonstrate that taking account of the correlation structure in the first step can help to improve the efficiency with which the model space may be explored. We propose another ensemble learning method to improve the prediction performance of Ridge Regression in sparse settings. Specifically, we combine Bayesian Ridge Regression with a probabilistic forward selection procedure, where inclusion of a variable at each stage is probabilistically determined by a Bayes factor. We compare the prediction performance of the proposed method to penalised regression methods using simulated data.
7	A probabilistic perspective on ensemble diversity Zanda, Manuela January 2010 We study diversity in classifier ensembles from a broader perspective than the 0/1 loss function, the main reason being that the bias-variance decomposition of the 0/1 loss function is not unique, and therefore the relationship between ensemble accuracy and diversity is still unclear. In the parallel field of regression ensembles, where the loss function of interest is the mean squared error, this decomposition not only exists, but it has been shown that diversity can be managed via the Negative Correlation (NC) framework. In the field of probabilistic modelling the expected value of the negative log-likelihood loss function is given by its conditional entropy; this result suggests that interaction information might provide some insight into the trade-off between accuracy and diversity. Our objective is to improve our understanding of classifier diversity by focusing on two different loss functions: the mean squared error and the negative log-likelihood. In a study of mean squared error functions, we reformulate the Tumer & Ghosh model for the classification error as a regression problem, and we show how the NC learning framework can be deployed to manage diversity in classification problems. In an empirical study of classifiers that minimise the negative log-likelihood loss function, we discuss model diversity as opposed to error diversity in ensembles of Naive Bayes classifiers. We observe that diversity in low-variance classifiers has to be structurally inferred. We apply interaction information to the problem of monitoring diversity in classifier ensembles. We present empirical evidence that interaction information can capture the trade-off between accuracy and diversity, and that diversity occurs at different levels of interactions between base classifiers. 
We use interaction information properties to build ensembles of structurally diverse averaged Augmented Naive Bayes classifiers. Our empirical study shows that this novel ensemble approach is computationally more efficient than an accuracy based approach and at the same time it does not negatively affect the ensemble classification performance. 006.3
8	Dynamic Committees for Handling Concept Drift in Databases (DCCD) AlShammeri, Mohammed 07 November 2012 Concept drift refers to a problem that is caused by a change in the data distribution in data mining. This leads to reduction in the accuracy of the current model that is used to examine the underlying data distribution of the concept to be discovered. A number of techniques have been introduced to address this issue, in a supervised learning (or classification) setting. In a classification setting, the target concept (or class) to be learned is known. One of these techniques is called “Ensemble learning”, which refers to using multiple trained classifiers in order to get better predictions by using some voting scheme. In a traditional ensemble, the underlying base classifiers are all of the same type. Recent research extends the idea of ensemble learning to the idea of using committees, where a committee consists of diverse classifiers. This is the main difference between the regular ensemble classifiers and the committee learning algorithms. Committees are able to use diverse learning methods simultaneously and dynamically take advantage of the most accurate classifiers as the data change. In addition, some committees are able to replace their members when they perform poorly. This thesis presents two new algorithms that address concept drifts. The first algorithm has been designed to systematically introduce gradual and sudden concept drift scenarios into datasets. In order to save time and avoid memory consumption, the Concept Drift Introducer (CDI) algorithm divides the number of drift scenarios into phases. The main advantage of using phases is that it allows us to produce a highly scalable concept drift detector that evaluates each phase, instead of evaluating each individual drift scenario. We further designed a novel algorithm to handle concept drift. 
Our Dynamic Committee for Concept Drift (DCCD) algorithm uses a voted committee of hypotheses that vote on the best base classifier, based on its predictive accuracy. The novelty of DCCD lies in the fact that we employ diverse heterogeneous classifiers in one committee in an attempt to maximize diversity. DCCD detects concept drifts by using the accuracy and by weighing the committee members by adding one point to the most accurate member. The total loss in accuracy for each member is calculated at the end of each point of measurement, or phase. The performance of the committee members is evaluated to decide whether a member needs to be replaced or not. Moreover, DCCD detects the worst member in the committee and then eliminates this member by using a weighting mechanism. Our experimental evaluation centers on evaluating the performance of DCCD on various datasets of different sizes, with different levels of gradual and sudden concept drift. We further compare our algorithm to another state-of-the-art algorithm, namely the MultiScheme approach. The experiments indicate the effectiveness of our DCCD method under a number of diverse circumstances. The DCCD algorithm generally generates high performance results, especially when the number of concept drifts is large in a dataset. For the size of the datasets used, our results showed that DCCD produced a steady improvement in performance when applied to small datasets. Further, in large and medium datasets, our DCCD method has a comparable, and often slightly higher, performance than the MultiScheme technique. The experimental results also show that the DCCD algorithm limits the loss in accuracy over time, regardless of the size of the dataset. Data Mining Machine Learning Concept Drift Concept Shift Non-Stationary Environments Ensemble Learning Learning Committees Dynamic Committees
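The phase-based bookkeeping described in the abstract above (score every member per phase, give one point to the most accurate member, replace the worst) can be illustrated with a toy sketch. This is not the DCCD algorithm itself: the class name `Committee`, the `end_phase` method, the point-weighted vote, and the rule of always swapping in a supplied spare member are assumptions chosen to keep the example short.

```python
from collections import Counter


class Committee:
    """Toy dynamic committee of heterogeneous classifiers (illustrative only)."""

    def __init__(self, members):
        # members: dict mapping a name to a callable x -> label
        self.members = dict(members)
        self.points = {name: 0 for name in self.members}

    def predict(self, x):
        # Weighted vote: each member counts once plus its accumulated points.
        votes = Counter()
        for name, clf in self.members.items():
            votes[clf(x)] += 1 + self.points[name]
        return votes.most_common(1)[0][0]

    def end_phase(self, batch, spare=None):
        """Score members on a labeled batch of (x, y) pairs at the end of a phase.

        The most accurate member gains one point. If a spare (name, clf)
        is given, it replaces the least accurate member.
        """
        accuracy = {
            name: sum(clf(x) == y for x, y in batch) / len(batch)
            for name, clf in self.members.items()
        }
        best = max(accuracy, key=accuracy.get)
        worst = min(accuracy, key=accuracy.get)
        self.points[best] += 1
        if spare is not None and worst != best:
            del self.members[worst]
            del self.points[worst]
            new_name, new_clf = spare
            self.members[new_name] = new_clf
            self.points[new_name] = 0
        return accuracy
```

Because members are arbitrary callables, the committee can mix different model types, which is the heterogeneity the abstract emphasizes. A real implementation would replace a member only when its accuracy loss crosses some threshold rather than unconditionally.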
9	Machine learning methods for the estimation of weather and animal-related power outages on overhead distribution feeders Kankanala, Padmavathy January 1900 Doctor of Philosophy / Department of Electrical and Computer Engineering / Sanjoy Das and Anil Pahwa / Because a majority of day-to-day activities rely on electricity, it plays an important role in daily life. In this digital world, most people’s lives depend on electricity. Without electricity, the flip of a switch would no longer produce instant light, television or refrigerators would be nonexistent, and hundreds of conveniences often taken for granted would be impossible. Electricity has become a basic necessity, and so any interruption in service due to disturbances in power lines causes a great inconvenience to customers. Customers and utility commissions expect a high level of reliability. Power distribution systems are geographically dispersed, and their exposure to the environment makes them a highly vulnerable part of power systems with respect to failures and interruption of service to customers. Following the restructuring and increased competition in the electric utility industry, distribution system reliability has acquired larger significance. Better understanding of causes and consequences of distribution interruptions is helpful in maintaining distribution systems, designing reliable systems, installing protection devices, and addressing environmental issues. Various events, such as equipment failure, animal activity, tree fall, wind, and lightning, can negatively affect power distribution systems. Weather is one of the primary causes affecting distribution system reliability. Unfortunately, as weather-related outages are highly random, predicting their occurrence is an arduous task. To study the impact of weather on overhead distribution systems, several models, such as linear and exponential regression models, a neural network model, and ensemble methods, are presented in this dissertation. 
The models were extended to study the impact of animal activity on outages in overhead distribution systems. Outage, lightning, and weather data for four different cities in Kansas of various sizes from 2005 to 2011 were provided by Westar Energy, Topeka, and state climate office at Kansas State University weather services. Models developed are applied to estimate daily outages. Performance tests show that regression and neural network models are able to estimate outages well but fail to estimate well in the lower and upper range of observed values. The introduction of committee machines inspired by the “divide and conquer” principle overcomes this problem. Simulation results show that the mixture of experts model is the most effective, followed by the AdaBoost model, in estimating daily outages. Similar results on performance of these models were found for animal-caused outages. Overhead distribution system Distribution reliability Weather & animal-related outages Ensemble learning methods Electrical Engineering (0544)
10	Identification of Flying Drones in Mobile Networks using Machine Learning / Identifiering av flygande drönare i mobila nätverk med hjälp av maskininlärning Alesand, Elias January 2019 Drone usage is increasing, both in recreational use and in the industry. With it comes a number of problems to tackle. Primarily, there are certain areas in which flying drones pose a security threat, e.g., around airports or other no-fly zones. Other problems can appear when there are drones in mobile networks which can cause interference. Such interference comes from the fact that radio transmissions emitted from drones can travel more freely than those from regular UEs (User Equipment) on the ground since there are few obstructions in the air. Additionally, the data traffic sent from drones is often high volume in the form of video streams. The goal of this thesis is to identify so-called "rogue drones" connected to an LTE network. Rogue drones are flying drones that appear to be regular UEs in the network. Drone identification is a binary classification problem where UEs in a network are classified as either a drone or a regular UE and this thesis proposes machine learning methods that can be used to solve it. Classifications are based on radio measurements and statistics reported by UEs in the network. The data for the work in this thesis is gathered through simulations of a heterogeneous LTE network in an urban scenario. The primary idea of this thesis is to use a type of cascading classifier, meaning that classifications are made in a series of stages with increasingly complex models where only a subset of examples are passed forward to subsequent stages. The motivation for such a structure is to minimize the computational requirements at the entity making the classifications while still being complex enough to achieve high accuracy. The models explored in this thesis are two-stage cascading classifiers using decision trees and ensemble learning techniques. 
It is found that close to 60% of the UEs in the dataset can be classified without errors in the first of the two stages. The rest is forwarded to a more complex model which requires more data from the UEs and can achieve up to 98% accuracy. Drones Machine Learning LTE Mobile networks Radio Decision tree Ensemble learning Communication Systems Kommunikationssystem
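The two-stage cascading structure described in the abstract above can be sketched as follows. This is a generic illustration under stated assumptions, not the thesis's models: the class name `Cascade`, the confidence thresholds `low` and `high`, and the use of plain callables that return a drone probability are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Cascade:
    """Two-stage cascading binary classifier (illustrative sketch).

    stage1: cheap model returning an estimated P(drone) for an example.
    stage2: more complex model, only consulted when stage1 is unsure.
    low/high: hypothetical confidence thresholds for stage1.
    """
    stage1: Callable[[Sequence[float]], float]
    stage2: Callable[[Sequence[float]], float]
    low: float = 0.1
    high: float = 0.9

    def predict(self, x: Sequence[float]) -> Tuple[int, int]:
        """Return (label, stage_used), where label 1 means drone."""
        p = self.stage1(x)
        if p <= self.low:
            return 0, 1  # confidently a regular UE, decided in stage 1
        if p >= self.high:
            return 1, 1  # confidently a drone, decided in stage 1
        # Uncertain examples are forwarded to the more complex model.
        return int(self.stage2(x) >= 0.5), 2
```

Only the examples that fall between the two thresholds incur the cost of the second model, which is the computational saving that motivates a cascade.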

Search results