Global ETD Search

111	SVM-Based Negative Data Mining to Binary Classification Jiang, Fuhua 03 August 2006 (has links) Properties of the training data set, such as its size, distribution, and number of attributes, contribute significantly to the generalization error of a learning machine. A poorly distributed data set is prone to produce a partially overfitted model. The two approaches proposed in this dissertation for binary classification enhance useful data information by mining negative data. First, an error-driven compensating hypothesis approach is based on Support Vector Machines (SVMs) with (1+k)-iteration learning, in which the base learning hypothesis is iteratively compensated k times. This approach produces a new hypothesis on a new data set in which each label is a transformation of the label from the negative data set, and it produces further positive and negative child data subsets in subsequent iterations. The procedure refines the base hypothesis with the k child hypotheses created in the k iterations. A prediction method is also proposed that traces the relationship between the negative subsets and the testing data set using a vector similarity technique. Second, a statistical negative example learning approach, based on theoretical analysis, improves the performance of the base learning algorithm (the learner) by creating one or two additional hypotheses, audit and booster, to mine the negative examples output by the learner. The learner employs a regular Support Vector Machine to classify the main examples and to recognize which examples are negative. The audit works on the negative training data created by the learner to predict whether an instance is negative. The booster is applied when the audit is not accurate enough to judge the learner correctly; it works on the training data subsets on which the learner and the audit disagree. The classifier for testing is the combination of learner, audit and booster. 
The classifier for testing a specific instance returns the learner's result if the audit acknowledges the learner's result or the learner agrees with the audit's judgment; otherwise it returns the booster's result. The error of the classifier is reduced to O(e^2), compared with the error O(e) of the base learning algorithm. Data partition Data classification Vector similarity Multiple passes learning Machine learning Bagging Boosting Support vector machines Data preparation Computer Sciences
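The combination rule described in entry 111 can be sketched roughly as follows. This is a minimal illustration, not the dissertation's implementation: the names `combined_predict`, `toy_learner`, `toy_audit_accepts` and `toy_booster` are hypothetical, labels are assumed to be +1 or -1, and the audit is simplified to a yes/no judgment on the learner's output.

```python
from typing import Callable, Sequence

Label = int  # assumed encoding: +1 or -1
Instance = Sequence[float]


def combined_predict(
    x: Instance,
    learner: Callable[[Instance], Label],
    audit_accepts: Callable[[Instance, Label], bool],
    booster: Callable[[Instance], Label],
) -> Label:
    """Return the learner's label if the audit accepts it, else the booster's."""
    y_learner = learner(x)
    if audit_accepts(x, y_learner):
        return y_learner
    return booster(x)


# Hypothetical stand-ins for trained models, used only to exercise the rule.
def toy_learner(x: Instance) -> Label:
    return 1 if x[0] > 0 else -1


def toy_audit_accepts(x: Instance, y: Label) -> bool:
    # Assumption: the audit trusts the learner only far from its decision boundary.
    return abs(x[0]) >= 1.0


def toy_booster(x: Instance) -> Label:
    return 1 if x[1] > 0 else -1
```

For example, `combined_predict([0.5, -1.0], toy_learner, toy_audit_accepts, toy_booster)` falls through to the booster because the toy audit rejects the learner's output near the boundary.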
112	Ensembles of Artificial Neural Networks: Analysis and Development of Design Methods Torres Sospedra, Joaquín 30 September 2011 (has links) This thesis focuses on the analysis and development of ensembles of neural networks. An ensemble is a system in which a set of heterogeneous artificial neural networks is generated in order to outperform classifiers based on a single network. This thesis differs from other work on ensembles of neural networks [1, 2, 3, 4, 5, 6, 7] in its organization, which is as follows. First, a comparison of ensemble methods is introduced in order to provide a ranked list of the best ensemble methods in the literature. This comparison is split into two studies, which form two chapters of the thesis. Another important step in building ensembles of neural networks is how to combine the information provided by the networks in the ensemble. The literature offers several alternatives for accurately combining the information provided by the heterogeneous set of networks. For this reason, a comparison of combiners is also introduced in this thesis. Furthermore, ensembles of neural networks are only one kind of Multiple Classifier System (MCS) based on neural networks. There are other ways to generate an MCS based on neural networks that are quite different from ensembles. The most important of these systems are Stacked Generalization and Mixture of Experts. These two systems are also analysed in this thesis, and new alternatives are proposed. One result of the comparative research is a deep understanding of the field of ensembles, so new ensemble methods and combiners can be designed after analysing the results of that research. 
Concretely, two new ensemble methods, a new ensemble methodology called Cross-Validated Boosting, and two reordering algorithms are proposed in this thesis. The best overall results are obtained by the proposed ensemble methods. Finally, all the experiments were carried out on a common experimental setup. The experiments were repeated ten times on nineteen different datasets from the UCI repository in order to validate the results. Moreover, the procedure used to set specific parameters is very similar across all the experiments. The main contributions are: 1) an experimental setup that can be reused for further comparisons; 2) a guide to selecting the most appropriate methods to build and combine ensembles and multiple classifier systems; 3) new methods for building ensembles and other multiple classifier systems. Ensemble Neural Networks Multilayer Feedforward Mixture Stacked Combination Multiple Classifier Systems Cross-Validation Boosting Reordering 004
113	A Pattern Classification Approach Boosted With Genetic Algorithms Yalabik, Ismet 01 June 2007 (has links) (PDF) Ensemble learning is a multiple-classifier machine learning approach that produces and combines collections of statistical classifiers to build a classifier more accurate than the individual classifiers. Bagging, boosting and voting methods are the basic examples of ensemble learning. In this thesis, a novel boosting technique is proposed that aims to solve some of the problems of AdaBoost, a well-known boosting algorithm. The proposed systems find an elegant way of boosting a set of classifiers successively to form a better classifier than each ensembled classifier. The AdaBoost algorithm employs a greedy search over the hypothesis space to find a good suboptimal solution. This work instead proposes an evolutionary search with genetic algorithms. Empirical results show that classification with boosted evolutionary computing outperforms AdaBoost in equivalent experimental environments. Classification, Pattern Recognition
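For context on the greedy baseline mentioned in entry 113, one round of standard discrete AdaBoost can be sketched as below. This shows the textbook weight update only, not the thesis's genetic-algorithm variant; the function name `adaboost_round` is hypothetical, and labels and weak-classifier predictions are assumed to be +1 or -1.

```python
import math


def adaboost_round(weights, labels, predictions):
    """One discrete AdaBoost round: return (alpha, updated normalized weights).

    weights: current example weights, assumed to sum to 1
    labels, predictions: true labels and weak-classifier outputs in {+1, -1}
    """
    # Weighted error of the weak classifier on the current distribution.
    eps = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    if not 0.0 < eps < 0.5:
        raise ValueError("weak classifier must have weighted error in (0, 0.5)")
    # Vote weight of this weak classifier in the final ensemble.
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    # Misclassified examples are scaled up, correctly classified ones down.
    raw = [w * math.exp(-alpha * y * h)
           for w, y, h in zip(weights, labels, predictions)]
    total = sum(raw)
    return alpha, [w / total for w in raw]
```

After the update, the misclassified examples carry half of the total weight, which is what forces the next weak classifier to concentrate on them.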
114	On discriminative semi-supervised incremental learning with a multi-view perspective for image concept modeling Byun, Byungki 17 January 2012 (has links) This dissertation presents the development of a semi-supervised incremental learning framework with a multi-view perspective for image concept modeling. For reliable image concept characterization, having a large number of labeled images is crucial. However, the size of the training set is often limited due to the cost required for generating concept labels associated with objects in a large quantity of images. To address this issue, in this research, we propose to incrementally incorporate unlabeled samples into a learning process to enhance concept models originally learned with a small number of labeled samples. To tackle the sub-optimality problem of conventional techniques, the proposed incremental learning framework selects unlabeled samples based on an expected error reduction function that measures contributions of the unlabeled samples based on their ability to increase the modeling accuracy. To improve the convergence property of the proposed incremental learning framework, we further propose a multi-view learning approach that makes use of multiple features such as color, texture, etc., of images when including unlabeled samples. For robustness to mismatches between training and testing conditions, a discriminative learning algorithm, namely a kernelized maximal-figure-of-merit (kMFoM) learning approach, is also developed. Combining individual techniques, we conduct a set of experiments on various image concept modeling problems, such as handwritten digit recognition, object recognition, and image spam detection to highlight the effectiveness of the proposed framework. Discriminative learning Semi-supervised learning Incremental learning Image modeling Multi-view learning Machine learning Supervised learning (Machine learning) Boosting (Algorithms)
115	Semi-Supervised Learning for Object Detection Rosell, Mikael January 2015 (has links) Many automotive safety applications in modern cars make use of cameras and object detection to analyze the surrounding environment. Pedestrians, animals and other vehicles can be detected and safety actions can be taken before dangerous situations arise. To detect occurrences of the different objects, these systems are traditionally trained to learn a classification model using a set of images that carry labels corresponding to their content. To obtain high performance with a variety of object appearances, the required amount of data is very large. Acquiring unlabeled images is easy, while the manual work of labeling is both time-consuming and costly. Semi-supervised learning refers to methods that utilize both labeled and unlabeled data, a situation that is highly desirable if it can lead to improved accuracy and at the same time alleviate the demand of labeled data. This has been an active area of research in the last few decades, but few studies have investigated the performance of these algorithms in larger systems. In this thesis, we investigate if and how semi-supervised learning can be used in a large-scale pedestrian detection system. With the area of application being automotive safety, where real-time performance is of high importance, the work is focused around boosting classifiers. Results are presented on a few publicly available UCI data sets and on a large data set for pedestrian detection captured in real-life traffic situations. By evaluating the algorithms on the pedestrian data set, we add the complexity of data set size, a large variety of object appearances and high input dimension. 
It is possible to find situations in low dimensions where an additional set of unlabeled data can be used successfully to improve a classification model, but the results show that it is hard to efficiently utilize semi-supervised learning in large-scale object detection systems. The results are hard to scale to large data sets of higher dimensions as pair-wise computations are of high complexity and proper similarity measures are hard to find. semi-supervised learning object detection pedestrian detection boosting machine learning supervised learning adaboost semiboost regboost self-learning
116	Polisstudent i akademiska skriftspråksvärldar. : En studie av polisstudenters kritiska förhållningssätt i deras självständiga arbeten. / Police student in the world of academic writing - : a study of police students' critical approach in their degree projects Pappinen Hillert, Anna January 2014 (has links) This master’s essay presents a study designed to investigate twelve police students’ ability to handle academic writing conventions and to show critical-analytical competence in their degree projects, which are written as part of the education at Sweden’s three police academies. The education of police officers is not an academic one, and the students’ main focus is therefore on writing texts in the field of police discourse. At the same time, scientific principles and critical thinking are emphasized in their syllabus, but the question of how students handle the encounter with academic discourse has so far not been investigated. This, therefore, is the aim of the present study. The texts have been analyzed according to the academic writing conventions of referencing, citation, hedging and boosting. By studying how these conventions are applied, the writers’ stances become visible, which makes it possible to discern in what ways they remain critical-analytical to person, theory and content. The study shows that police students can handle academic writing conventions to varying degrees, which means that their texts differ in how well they function in an academic context. The fact that they do not fully master these conventions also makes it more difficult for the students to signal a critical stance, and only occasionally does their critical-analytical competence show. The difficulties displayed by the police students can thus be viewed as general within the field of academic discourse. At the same time, the study shows that the students do acquire a certain competence in textual as well as writing conventions, which they are able to transfer from police discourse. 
This, in turn, constitutes the didactic consequence of the study. If this were to be brought up as part of the educational program, the writing of a degree project might raise police students’ awareness of different genres and further develop the competence necessary in their profession. Police students academic writing critical-analytical competence citation referencing hedging boosting stance Specific Languages Studier av enskilda språk
117	Regularizace a výběr proměnných v regresních modelech / Regularization and variable selection in regression models Lahodová, Kateřina January 2017 (has links) This diploma thesis focuses on regularization and variable selection in regression models. The basics of penalised likelihood and generalized linear models, and their evaluation and comparison based on prediction quality and variable selection, are described. The LASSO and LARS methods for variable selection in normal linear regression are briefly introduced. The main topic of the thesis is the method called Boosting. The general Boosting algorithm is introduced, including functional gradient descent, followed by the selection of a base procedure, especially the componentwise linear least squares method. Two specific applications of the general Boosting algorithm are introduced, with derivations of some important characteristics: AdaBoost for data with a conditional binomial distribution and L2Boosting for a conditional normal distribution. Finally, a simulation study comparing the LASSO, LARS and L2Boosting methods was conducted. It shows that LASSO and LARS are more suitable for variable selection, whereas L2Boosting is better suited to prediction on new data.
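L2Boosting with a componentwise linear least squares base procedure, as named in entry 117, can be sketched in a few lines. This is an illustrative version under stated assumptions rather than the thesis's code: the function name `l2boost`, the step length `nu` and the step count `n_steps` are hypothetical choices, and the predictor columns are assumed to be centered so that no per-component intercept is needed.

```python
def l2boost(X, y, n_steps=100, nu=0.1):
    """Componentwise L2Boosting; returns (intercept, coefficients).

    X: list of rows (each a list of floats), columns assumed centered
    y: list of responses
    """
    n, p = len(y), len(X[0])
    intercept = sum(y) / n
    coef = [0.0] * p
    fitted = [intercept] * n
    cols = [[row[j] for row in X] for j in range(p)]
    sumsq = [sum(v * v for v in col) for col in cols]
    for _ in range(n_steps):
        # Residuals are the negative gradient of the squared-error loss.
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        best_j, best_gain, best_b = None, 0.0, 0.0
        for j in range(p):
            if sumsq[j] == 0.0:
                continue
            dot = sum(v * r for v, r in zip(cols[j], resid))
            gain = dot * dot / sumsq[j]  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_gain, best_b = j, gain, dot / sumsq[j]
        if best_j is None:
            break  # no predictor reduces the residuals further
        # Shrunken update of the single selected coefficient.
        coef[best_j] += nu * best_b
        fitted = [f + nu * best_b * v for f, v in zip(fitted, cols[best_j])]
    return intercept, coef
```

Because only one coefficient changes per step, stopping early leaves many coefficients at exactly zero, which is why the method can be compared with LASSO and LARS on variable selection.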
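118	Extracting Rules from Trained Machine Learning Models with Applications in Bioinformatics / 機械学習モデルからの知識抽出と生命情報学への応用 Liu, Pengyu 24 May 2021 (has links) Kyoto University / Doctorate by coursework (new system) / Doctor of Informatics / Degree No. 甲第23397号 / 情博第766号 / 新制\|\|情\|\|131 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Tatsuya Akutsu, Prof. Akihiro Yamamoto, Prof. Hisashi Kashima / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Machine learning Neural networks Boolean functions Rule extraction Dynamic programming Dicer cleavage site Gradient boosting machine 007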
119	Meta-learning / Meta-learning Hovorka, Martin January 2008 (has links) The goal of this work is to become acquainted with and study meta-learning methods, implement an algorithm, and compare it with other machine learning methods.
120	Automatic Prediction of Human Age based on Heart Rate Variability Analysis using Feature-Based Methods Al-Mter, Yusur January 2020 (has links) Heart rate variability (HRV) is the time variation between adjacent heartbeats. This variation is regulated by the autonomic nervous system (ANS) and its two branches, the sympathetic and parasympathetic nervous systems. HRV is considered an essential clinical tool for estimating the imbalance between the two branches, and hence an indicator of age and cardiac-related events. This thesis focuses on ECG recordings made during nocturnal rest to estimate the influence of HRV in predicting the age decade of healthy individuals. Time-domain, frequency-domain, and non-linear methods are explored to extract the HRV features. Three feature-based methods (support vector machine (SVM), random forest, and extreme gradient boosting (XGBoost)) were employed, and the overall test accuracy achieved in capturing the actual class was relatively low (lower than 30%). The SVM classifier had the lowest performance, while random forest and XGBoost performed slightly better. Although the difference is negligible, the random forest had the highest test accuracy, approximately 29%, using a subset of ten optimal HRV features. Furthermore, to validate the findings, the original dataset was shuffled and used as a test set, and the performance was compared with other related research outputs. Supervised learning Classification Ensemble Support Vector Machines Heart Rate Variability Extreme Gradient Boosting Random Forest Computer and Information Sciences Data- och informationsvetenskap
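As an illustration of what time-domain HRV feature extraction can look like, the sketch below computes two widely used measures, SDNN and RMSSD, from a series of RR intervals. The abstract of entry 120 does not say which features the thesis used, so these two measures, the function names, and the millisecond unit are assumptions made for the example.

```python
import math
import statistics


def sdnn(rr_ms):
    """Sample standard deviation of the RR intervals (in ms)."""
    return statistics.stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (in ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

Features of this kind would form one row of the input table for a classifier such as an SVM, a random forest, or XGBoost.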
