Global ETD Search

281	A Boosted-Window Ensemble Elahi, Haroon January 2014 Context. The problem of obtaining predictions from stream data involves training on the labeled instances and suggesting the class values for the unseen stream instances. The nature of data-stream environments makes this task complicated. The large number of instances, the possibility of changes in the data distribution, the presence of noise, and drifting concepts are just some of the factors that add complexity to the problem. Various supervised-learning algorithms have been designed by putting together efficient data-sampling, ensemble-learning, and incremental-learning methods. The performance of an algorithm depends on the chosen methods. This leaves an opportunity to design new supervised-learning algorithms by using different combinations of constituent methods. Objectives. This thesis proposes a fast and accurate supervised-learning algorithm for performing predictions on data streams. The algorithm is called Boosted-Window Ensemble (BWE) and is built using the mixture-of-experts technique. BWE uses Sliding Window, Online Boosting, and incremental learning for data sampling, ensemble learning, and maintaining a state consistent with the current stream data, respectively. In this regard, a sliding window method is introduced. This method uses partial updates for sliding the window over the data stream and is called Partially-Updating Sliding Window (PUSW). An investigation is carried out to compare two variants of sliding window and three different ensemble-learning methods in order to choose the superior methods. Methods. The thesis uses an experimental approach for evaluating the Boosted-Window Ensemble (BWE). CPU-time and prediction accuracy are used as performance indicators, where CPU-time is the execution time in seconds. The benchmark algorithms are Accuracy-Updated Ensemble1 (AUE1), Accuracy-Updated Ensemble2 (AUE2), and Accuracy-Weighted Ensemble (AWE).
The experiments use nine synthetic and five real-world datasets for generating performance estimates. The Asymptotic Friedman test and the Wilcoxon Signed-Rank test are used for hypothesis testing. The Wilcoxon-Nemenyi-McDonald-Thompson test is used for post-hoc analysis. Results. The hypothesis testing suggests that: 1) for both the synthetic and the real-world datasets, the Boosted-Window Ensemble (BWE) has significantly lower CPU-time values than two benchmark algorithms (Accuracy-Updated Ensemble1 (AUE1) and Accuracy-Weighted Ensemble (AWE)); 2) BWE achieves prediction accuracy similar to that of AUE1 and AWE on the synthetic datasets; 3) BWE achieves prediction accuracy similar to that of the three benchmark algorithms on the real-world datasets. Conclusions. Experimental results demonstrate that the proposed algorithm can be as accurate as the state-of-the-art benchmark algorithms when obtaining predictions from stream data. The results further show that the use of the Partially-Updating Sliding Window has resulted in lower CPU-time for BWE compared with the chunk-based sliding window method used in AUE1, AUE2, and AWE. Stream Mining Supervised-learning by classification Online learning algorithms Ensemble Methods Boosting Computer Sciences Datavetenskap (datalogi)
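The abstract of entry 281 names the Friedman test as the basis for comparing algorithms across datasets. As a rough illustration of what that test computes, the sketch below ranks the algorithms within each dataset and forms the standard Friedman chi-square statistic. The function names and the CPU-time figures are hypothetical and are not taken from the thesis; the sketch applies no tie correction and does not compute a p-value.

```python
def average_ranks(row):
    """Rank the values in one row (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one (a tie).
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared_rank
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction).

    scores: one row per dataset, one column per algorithm.
    """
    n = len(scores)          # number of datasets
    k = len(scores[0])       # number of algorithms
    if n == 0 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a non-empty rectangular table with at least 2 columns")
    rank_sums = [0.0] * k
    for row in scores:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    sum_of_squares = sum(r * r for r in rank_sums)
    return 12.0 * sum_of_squares / (n * k * (k + 1)) - 3.0 * n * (k + 1)


# Hypothetical CPU times in seconds: 4 datasets (rows) x 3 algorithms (columns).
cpu_times = [
    [1.2, 2.5, 3.1],
    [0.9, 1.8, 2.2],
    [4.0, 5.5, 6.1],
    [2.1, 2.9, 3.3],
]
print(friedman_statistic(cpu_times))  # 8.0, the maximum N * (k - 1) for a fully consistent ordering
```

A large statistic relative to the chi-square distribution with k - 1 degrees of freedom suggests that the algorithms do not all perform alike, which is the point at which a post-hoc test would be applied.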
282	Measure-based Learning Algorithms : An Analysis of Back-propagated Neural Networks Khalid, Fahad January 2008 In this thesis we present a theoretical investigation of the feasibility of using a problem-specific inductive bias for back-propagated neural networks. We argue that if a learning algorithm is biased towards optimizing a certain performance measure, it is plausible to assume that it will generate a higher performance score when evaluated using that particular measure. We use the term measure function for a multi-criteria evaluation function that can also be used as an inherent function in learning algorithms, in order to customize the bias of a learning algorithm for a specific problem. Hence the term measure-based learning algorithms. We discuss different characteristics of the most commonly used performance measures and establish similarities among them. The characteristics of individual measures and the established similarities are then correlated to the characteristics of the backpropagation algorithm, in order to explore the applicability of introducing a measure function to back-propagated neural networks. Our study shows that there are certain characteristics of the error back-propagation mechanism and the inherent gradient search method that limit the set of measures that can be used for the measure function. Also, we highlight the significance of taking the representational bias of the neural network into account when developing methods for measure-based learning. The overall analysis of the research shows that measure-based learning is a promising area of research with potential for further exploration. We suggest directions for future research that might help realize measure-based neural networks.
/ The study is an investigation into the feasibility of using a generic inductive bias for backpropagation artificial neural networks, which could incorporate any one or a combination of problem-specific performance metrics to be optimized. We have identified several limitations of both the standard error backpropagation mechanism and the inherent gradient search approach. These limitations suggest exploring methods other than backpropagation, as well as using global search methods instead of gradient search. Also, we emphasize the importance of taking the representational bias of the neural network into consideration, since only a combination of both procedural and representational bias can provide highly optimal solutions. Supervised learning Inductive Bias Artificial Neural Networks Computer Sciences Datavetenskap (datalogi)
283	Performance evaluation based on data from code reviews Andrej, Sekáč January 2016 Context. Modern code review tools such as Gerrit have made available large amounts of code review data from open source projects as well as commercial projects. Code reviews are used to keep the quality of produced source code under control, but the stored data could also be used to evaluate the software development process. Objectives. This thesis uses machine learning methods to approximate a review expert’s performance evaluation function. Due to the limited size of the labelled data sample, this work uses semi-supervised machine learning methods and measures their influence on performance. We also propose features and analyse their relevance to development performance evaluation. Methods. This thesis uses Radial Basis Function networks as the regression algorithm for the performance evaluation approximation and Metric Based Regularisation as the semi-supervised learning method. For the analysis of the feature set and goodness of fit we use statistical tools together with manual analysis. Results. The semi-supervised learning method achieved accuracy similar to that of the supervised versions of the algorithm. The feature analysis showed that there is a significant negative correlation between the performance evaluation and three other features. A manual verification of the learned models on unlabelled data achieved 73.68% accuracy. Conclusions. We have not managed to prove that the semi-supervised learning method used performs better than supervised learning methods. The analysis of the feature set suggests that the number of reviewers, the ratio of comments to the change size, and the number of code lines modified in later parts of development are, with high probability, relevant to the performance evaluation task.
The achieved model accuracy of close to 75% leads us to believe that, considering the limited size of the labelled data set, our work provides a solid base for further improvements in the performance evaluation approximation. software verification semi-supervised learning regression analysis development performance evaluation Computer Sciences Datavetenskap (datalogi)
284	Apprentissage de données génomiques multiples pour le diagnostic et le pronostic du cancer / Learning from multiple genomic information in cancer for diagnosis and prognosis Moarii, Matahi 26 June 2015 De nombreuses initiatives ont été mises en place pour caractériser d'un point de vue moléculaire de grandes cohortes de cancers à partir de diverses sources biologiques dans l'espoir de comprendre les altérations majeures impliquées durant la tumorogénèse. Les données mesurées incluent l'expression des gènes, les mutations et variations de copy-number, ainsi que des signaux épigénétiques tels que la méthylation de l'ADN. De grands consortiums tels que “The Cancer Genome Atlas” (TCGA) ont déjà permis de rassembler plusieurs milliers d'échantillons cancéreux mis à la disposition du public. Nous contribuons dans cette thèse à analyser d'un point de vue mathématique les relations existant entre les différentes sources biologiques, valider et/ou généraliser des phénomènes biologiques à grande échelle par une analyse intégrative de données épigénétiques et génétiques. En effet, nous avons montré dans un premier temps que la méthylation de l'ADN était un marqueur substitutif intéressant pour jauger du caractère clonal entre deux cellules et permettait ainsi de mettre en place un outil clinique des récurrences de cancer du sein plus précis et plus stable que les outils actuels, afin de permettre une meilleure prise en charge des patients. D'autre part, nous avons dans un second temps permis de quantifier d'un point de vue statistique l'impact de la méthylation sur la transcription. Nous montrons l'importance d'incorporer des hypothèses biologiques afin de pallier au faible nombre d'échantillons par rapport au nombre de variables. Enfin, nous montrons l'existence d'un phénomène biologique lié à l'apparition d'un phénotype d'hyperméthylation dans plusieurs cancers.
Pour cela, nous adaptons des méthodes de régression en utilisant la similarité entre les différentes tâches de prédictions afin d'obtenir des signatures génétiques communes prédictives du phénotype plus précises. En conclusion, nous montrons l'importance d'une collaboration biologique et statistique afin d'établir des méthodes adaptées aux problématiques actuelles en bioinformatique. / Several initiatives have been launched recently to investigate the molecular characterisation of large cohorts of human cancers with various high-throughput technologies in order to understand the major biological alterations related to tumorigenesis. The information measured includes gene expression, mutations, copy-number variations, as well as epigenetic signals such as DNA methylation. Large consortiums such as “The Cancer Genome Atlas” (TCGA) have already gathered thousands of cancerous and non-cancerous samples and made them publicly available. In this thesis we contribute to the statistical analysis of the relationships between the different biological sources, and to the validation and/or large-scale generalisation of biological phenomena using an integrative analysis of genetic and epigenetic data. Firstly, we show the role of DNA methylation as a surrogate biomarker of clonality between cells, which would provide a powerful clinical tool for devising appropriate treatments for specific patients with breast cancer relapses. In addition, we developed systematic statistical analyses to assess the significance of DNA methylation variations on gene expression regulation. We highlight the importance of adding prior knowledge to compensate for the small number of samples in comparison with the number of variables. In return, we show the potential of bioinformatics to infer new interesting biological hypotheses. Finally, we tackle the existence of a universal biological phenomenon related to the hypermethylator phenotype.
Here, we adapt regression techniques using the similarity between the different prediction tasks to obtain robust genetic predictive signatures that are common to all cancers and allow for better prediction accuracy. In conclusion, we highlight the importance of a biological and computational collaboration in order to establish methods appropriate to the current issues in bioinformatics, which will in turn provide new biological insights. Apprentissage supervisé Apprentissage non-Supervisé Données à grande dimension Supervised Analysis Unsupervised Analysis High-Dimensional Data 610.28
285	Automated Essay Scoring : Scoring Essays in Swedish Smolentzov, Andre January 2013 Good writing skills are essential in the education system at all levels. However, the evaluation of essays is labor intensive and can entail a subjective bias. Automated Essay Scoring (AES) is a tool that may be able to save teacher time and provide more objective evaluations. There are several successful AES systems for essays in English that are used in large-scale tests. Supervised machine learning algorithms are the core component in developing these systems. In this project four AES systems were developed and evaluated. The AES systems were based on standard supervised machine learning software, i.e., LDAC, SVM with RBF kernel, SVM with polynomial kernel, and Extremely Randomized Trees. The training data consisted of 1500 high school essays that had been scored by the students' teachers and by blind raters. To evaluate the AES systems, the agreement between the blind raters' scores and the AES scores was compared to the agreement between the blind raters' and the teachers' scores. On average, the agreement between blind raters and the AES systems was better than between blind raters and teachers. The AES based on LDAC software had the best agreement, with a quadratic weighted kappa value of 0.475. In comparison, the teachers and blind raters had a value of 0.391. However, the AES results do not meet the required minimum agreement of a quadratic weighted kappa of 0.7 as defined by the US-based nonprofit organization Educational Testing Service. / I have developed and evaluated four systems for automated essay scoring (AES). LDAC, SVM with RBF kernel, SVM with polynomial kernel, and "Extremely Randomized Trees", which are standard classifier software packages, were used as the basis for building each AES system. Automated Essay Scoring Swedish Essays supervised machine learning General Language Studies and Linguistics
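Entry 285 reports agreement as quadratic weighted kappa. The sketch below shows one common way to compute that measure for two raters who assign integer scores. The function name, the score encoding (integers from 0 to num_categories - 1), and the example scores are assumptions made for illustration and do not come from the thesis.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_categories):
    """Quadratic weighted kappa for two equal-length lists of integer scores.

    Scores are assumed to be integers in the range 0..num_categories-1.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of essays")
    if num_categories < 2:
        raise ValueError("at least two score categories are required")
    k = num_categories
    n = len(rater_a)

    # Observed agreement matrix: observed[i][j] counts essays scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal score histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2    # quadratic disagreement penalty
            expected = hist_a[i] * hist_b[j] / n    # count expected under independence
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters used one and the same score for every essay.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a 3-point scale for four essays.
blind_rater = [0, 1, 2, 2]
system = [0, 1, 1, 2]
print(round(quadratic_weighted_kappa(blind_rater, system, 3), 3))  # 0.8
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the 0.475 and 0.391 reported in the abstract both indicate moderate agreement at best.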
286	Evaluering och optimering av automatisk beståndsindelning Brehmer, Dan January 2016 Forest stand delineation is largely a manual process that requires a great deal of time. Over the past 20 years, techniques such as Airborne Laser Scanning (ALS) have made the process more efficient by generating laser data that makes it possible to create easily interpreted images of forest areas. Forest attributes such as tree height, tree density, and ground elevation can be extracted from laser and image data. The purpose of the study was to evaluate which attributes were most relevant for distinguishing forest stands in a system that divided forest into stands automatically. Classification models were used to analyse the relevance of the attributes. Experts were interviewed and literature was studied. During the study, the system's algorithms were modified with the ambition of raising its results to a satisfactory level. The study showed that attributes linked to forest management have the greatest relevance in automatic stand delineation. Despite the modifications and the use of relevant attributes, the study was unable to demonstrate that the system could function as a stand-alone solution for forest stand delineation. However, the resulting stand delineation was suitable for use as a complement to manual stand delineation. Beståndsindelning Airborne Laser Scanning ALS Supervised Learning Random Forest Computer Sciences Datavetenskap (datalogi)
287	Context-based Human Activity Recognition Using Multimodal Wearable Sensors Bharti, Pratool 17 November 2017 In the past decade, Human Activity Recognition (HAR) has become an important part of the day-to-day life of many people. Activity recognition has wide applications in health care, remote monitoring of elders, sports, biometric authentication, e-commerce and more. Each HAR application needs a unique approach to provide solutions driven by the context of the problem. In this dissertation, we primarily discuss two applications of HAR in different contexts. First, we design a novel approach for in-home, fine-grained activity recognition using multimodal wearable sensors on multiple body positions, along with very small Bluetooth beacons deployed in the environment. State-of-the-art in-home activity recognition schemes with wearable devices are mostly capable of detecting coarse-grained activities (sitting, standing, walking, or lying down), but cannot distinguish complex activities (sitting on the floor versus on the sofa or bed). Such schemes are not effective for emerging critical healthcare applications – for example, in remote monitoring of patients with Alzheimer's disease, Bulimia, or Anorexia – because these applications require a more comprehensive, contextual, and fine-grained recognition of complex daily user activities. Second, we introduce Watch-Dog – a self-harm activity recognition engine, which attempts to infer self-harming activities from accelerometer data sensed by wearable sensors worn on a subject's wrist. In the United States, there are more than 35,000 reported suicides every year, approximately 1,800 of them among psychiatric inpatients. Staff perform intermittent or continuous observations in order to prevent such tragedies, but a study of 98 articles over time showed that 20% to 62% of suicides happened while inpatients were on an observation schedule.
Reducing the instances of suicide among inpatients is a problem of critical importance to both patients and healthcare providers. Watch-Dog uses a supervised learning algorithm to build a model that can discriminate harmful activities from non-harmful ones. The system is not only very accurate but also energy efficient. Apart from these two HAR systems, we also demonstrate the difference in activity patterns between older and younger age groups. For this experiment, we used 5 activities of daily living (ADL). Based on our findings, we recommend a context-aware, age-specific HAR model as a better solution than models that mix all ages. Additionally, we find that personalized models for each individual elderly person classify better than mixed models. Wearables in Healthcare Wearable Sensing Supervised Machine Learning Artificial Intelligence and Robotics Medicine and Health Sciences
288	Active Cleaning of Label Noise Using Support Vector Machines Ekambaram, Rajmadhan 19 June 2017 Large scale datasets collected using non-expert labelers are prone to labeling errors. Errors in the given labels or label noise affect the classifier performance, classifier complexity, class proportions, etc. It may be that a relatively small, but important class needs to have all its examples identified. Typical solutions to the label noise problem involve creating classifiers that are robust or tolerant to errors in the labels, or removing the suspected examples using machine learning algorithms. Finding the label noise examples through a manual review process is largely unexplored due to the cost and time factors involved. Nevertheless, we believe it is the only way to create a label noise free dataset. This dissertation proposes a solution exploiting the characteristics of the Support Vector Machine (SVM) classifier and the sparsity of its solution representation to identify uniform random label noise examples in a dataset. Application of this method is illustrated with problems involving two real-world large scale datasets. This dissertation also presents results for datasets that contain adversarial label noise. A simple extension of this method to a semi-supervised learning approach is also presented. The results show that most mislabels are quickly and effectively identified by the approaches developed in this dissertation. Mislabeled Examples SVM Semi-supervised Learning Adversarial Label Noise Finding Malwares Computer Sciences
289	Development of a Supervised Multivariate Statistical Algorithm for Enhanced Interpretability of Multiblock Analysis. / Utveckling av en algoritm för förbättrad tolkningsbarhet av övervakad multivariat statistisk simultan analys av flera designmatriser. Petters, Patrik January 2017 In modern biological research, OMICs techniques, such as genomics, proteomics or metabolomics, are often employed to gain deep insights into metabolic regulations and biochemical perturbations in response to a specific research question. To gain complementary biologically relevant information, multiOMICs, i.e., several different OMICs measurements on the same specimen, is becoming increasingly frequent. To take full advantage of this complementarity, joint analysis of such multiOMICs data is necessary, but this is as yet an underdeveloped area. In this thesis, a theoretical background is given on general component-based methods for dimensionality reduction such as PCA, PLS for single-block analysis, and multiblock PLS for co-analysis of OMICs data. This is followed by a rotation of an unsupervised analysis method. The aim of this method is to divide dimensionality-reduced data into block-distinct and common variance partitions, using the DISCO-SCA approach. Finally, an algorithm for a similar rotation of a supervised (PLS) solution is presented using data available in the literature. To the best of our knowledge, this is the first time that such an approach for rotation of a supervised analysis into block-distinct and common partitions has been developed and tested. This newly developed DISCO-PLS algorithm clearly showed an increased potential for visualisation and interpretation of data, compared to standard PLS. This is shown by biplots of observation scores and multiblock variable loadings. PCA PLS supervised multiblock analysis common and distinctive variation Natural Sciences Naturvetenskap
290	A Biologically Plausible Supervised Learning Method for Spiking Neurons with Real-world Applications Guo, Lilin 07 November 2016 Learning is central to infusing intelligence into any biologically inspired system. This study introduces a novel Cross-Correlated Delay Shift (CCDS) learning method for spiking neurons with the ability to learn and reproduce arbitrary spike patterns in a supervised fashion, with applicability to spatiotemporal information encoded at the precise timing of spikes. By integrating the cross-correlated term, axonal and synapse delays, the CCDS rule is proven to be both biologically plausible and computationally efficient. The proposed learning algorithm is evaluated in terms of reliability, adaptive learning performance, generality to different neuron models, learning in the presence of noise, effects of its learning parameters and classification performance. The results indicate that the proposed CCDS learning rule greatly improves classification accuracy when compared to the standards reached with the Spike Pattern Association Neuron (SPAN) learning rule and the Tempotron learning rule. Network structure is the crucial part for any application domain of Artificial Spiking Neural Network (ASNN). Thus, temporal learning rules in multilayer spiking neural networks are investigated. As extensions of single-layer learning rules, the multilayer CCDS (MutCCDS) is also developed. Correlated neurons are connected through fine-tuned weights and delays. In contrast to the multilayer Remote Supervised Method (MutReSuMe) and multilayer tempotron rule (MutTmptr), the newly developed MutCCDS shows better generalization ability and faster convergence. The proposed multilayer rules provide an efficient and biologically plausible mechanism, describing how delays and synapses in the multilayer networks are adjusted to facilitate learning.
Interictal spikes (IS) are morphologically defined brief events observed in electroencephalography (EEG) records from patients with epilepsy. The detection of IS remains an essential task for 3D source localization as well as in developing algorithms for seizure prediction and guided therapy. In this work, we present a new IS detection method using the Wavelet Encoding Device (WED) method together with the CCDS learning rule and a specially designed Spiking Neural Network (SNN) structure. The results confirm the ability of such an SNN to achieve good performance in automatically detecting such events from multichannel EEG records. Spiking Neural Network Supervised Learning Interictal Spike Detection Bioelectrical and Neuroengineering Biomedical Signal Processing

Search results