51

Machine learning to detect anomalies in datacenter

Lindh, Filip January 2019 (has links)
This thesis investigates the possibility of using anomaly detection on performance data of virtual servers in a datacenter to detect malfunctioning servers. Using anomaly detection can potentially reduce the time a server is malfunctioning, as the server can be detected and checked before the error has a significant impact. Several approaches and methods were applied and evaluated on one virtual server: the K-nearest neighbor algorithm, the support-vector machine, the K-means clustering algorithm, self-organizing maps, the CPU-memory usage ratio under a Gaussian model, and time series analysis using a neural network and linear regression. The evaluation and comparison of the methods were mainly based on errors reported during the time period in which they were tested: the better the detected anomalies matched the reported errors, the higher the score a method received. It turned out that anomalies in performance data could be linked to real errors in the server to some extent, which supports the possibility of using anomaly detection on performance data to detect malfunctioning servers. The simplest method, the ratio between memory usage and CPU usage, was the most successful one, detecting the most errors; however, the anomalies were often detected just after the error had been reported. The support vector machine was more successful at detecting anomalies before they were reported. The proportion of anomalies played a large role, however, and the K-nearest neighbor algorithm received a higher score when the proportion of anomalies was higher.
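A minimal sketch (not the thesis code) of the simplest approach described above, the CPU-to-memory usage ratio under a Gaussian model; the column names, the baseline split, and the 3-sigma threshold are assumptions.

```python
# Sketch: flag samples whose CPU/memory-usage ratio falls far outside a
# Gaussian fitted on a healthy baseline window. Assumed names and thresholds.
import numpy as np

def fit_gaussian_baseline(cpu, mem, eps=1e-9):
    """Fit mean/std of the CPU-to-memory usage ratio on known-good data."""
    ratio = cpu / (mem + eps)
    return ratio.mean(), ratio.std()

def detect_anomalies(cpu, mem, mu, sigma, k=3.0, eps=1e-9):
    """Return a boolean mask: True where the ratio deviates more than k sigma."""
    ratio = cpu / (mem + eps)
    return np.abs(ratio - mu) > k * sigma

# Usage with synthetic performance counters
rng = np.random.default_rng(0)
cpu_base = rng.normal(40, 5, 1000)          # percent CPU, healthy period
mem_base = rng.normal(60, 5, 1000)          # percent memory, healthy period
mu, sigma = fit_gaussian_baseline(cpu_base, mem_base)

cpu_new = np.concatenate([rng.normal(40, 5, 95), rng.normal(95, 2, 5)])
mem_new = np.concatenate([rng.normal(60, 5, 95), rng.normal(20, 2, 5)])
mask = detect_anomalies(cpu_new, mem_new, mu, sigma)
print("anomalous samples:", np.flatnonzero(mask))
```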
52

Application of support vector machines and autoregressive moving average models in electromyography signal classification

Barretto, Mateus Ymanaka 10 December 2007 (has links)
The diagnosis of neuromuscular diseases is attained by the combined use of several tools. Among these tools, clinical electromyography provides key information for the diagnosis. In the literature, the application of some classifiers (linear discriminant and artificial neural networks) to a variety of electromyography parameters (number of phases, turns and zero crossings; median frequency; auto-regressive coefficients) has provided promising results. Nevertheless, the need for a large number of auto-regressive coefficients guided this Master's thesis toward the use of a smaller number of auto-regressive moving-average coefficients. The classification task (into normal, neuropathic, or myopathic) was achieved by support vector machines, a recently proposed type of artificial neural network. This work's objective was to study whether low-order auto-regressive moving-average (ARMA) models can substitute for high-order auto-regressive models, in combination with support vector machines, for diagnostic purposes. Results show that support vector machines perform better than Fisher linear discriminants, and that ARMA(1,11) and ARMA(1,12) models provide high classification rates (81.5%), close to the maximum obtained with auto-regressive models of order 39. We therefore recommend the use of support vector machines and ARMA(1,11) or ARMA(1,12) models for the classification of 800 ms needle electromyography signals acquired at 25 kHz.
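A minimal sketch of the kind of pipeline described above, assuming ARMA(1,11) coefficients as features and an RBF-kernel SVM; the data here are synthetic stand-ins and the fitting details differ from the thesis.

```python
# Sketch: represent each EMG segment by its ARMA(1,11) coefficients,
# then classify the coefficient vectors with an SVM.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def arma_features(segment, order=(1, 0, 11)):
    """AR and MA coefficients of an ARMA model fitted to one EMG segment."""
    res = ARIMA(segment, order=order).fit()
    return np.concatenate([res.arparams, res.maparams])

# Synthetic stand-in for 800 ms segments sampled at 25 kHz (much shorter here)
rng = np.random.default_rng(1)
segments = [rng.standard_normal(500) for _ in range(20)]
labels = rng.integers(0, 3, size=20)        # 0 normal, 1 neuropathic, 2 myopathic

X = np.vstack([arma_features(s) for s in segments])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, labels)
print(clf.predict(X[:3]))
```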
53

Support Vector Machine and Application in Seizure Prediction

Qiu, Simeng 04 1900 (has links)
Nowadays, machine learning (ML) is used in a wide range of areas, from engineering to business. In this thesis, we first present several kernel-based machine learning methods for solving classification, regression, and clustering problems; these perform well but also have limitations. We give examples of each method and analyze its advantages and disadvantages in different scenarios. We then focus on one of the most popular classification methods, the Support Vector Machine (SVM), introducing its basic theory, its advantages, and the scenarios in which it is used for classification. We also explain a convenient approach for solving SVM problems called Sequential Minimal Optimization (SMO). Moreover, the one-class SVM can be understood in a different way, as Support Vector Data Description (SVDD), a well-known non-linear model; SVDD can be solved by combining a Gaussian RBF kernel with SMO. Finally, we compare the performance of the SVM-SMO and SVM-SVDD implementations. For the application part, we used SVM to perform seizure forecasting in canine epilepsy, comparing the results of different methods, such as random forests, extremely randomized trees, and SVM, for classifying preictal (pre-seizure) and interictal (between-seizure) data. We conclude that SVM has the best performance.
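A hedged sketch of the model comparison described in the application part, using synthetic features in place of the canine EEG data; the feature representation and hyperparameters are assumptions.

```python
# Sketch: compare SVM, random forest, and extremely randomized trees on a
# binary preictal-vs-interictal task, mirroring the comparison in the abstract.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 16))          # stand-in for per-clip EEG features
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.standard_normal(200) > 0).astype(int)

models = {
    "svm": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC = {score:.3f}")
```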
54

Fault Classification and Location Identification on Electrical Transmission Network Based on Machine Learning Methods

Venkatesh, Vidya 01 January 2018 (has links)
The power transmission network is the most important link in a country's energy system, as it carries large amounts of power at high voltages from generators to substations. A modern power system is a complex network and requires a high-speed, precise, and reliable protective system. Faults in a power system are unavoidable, and overhead transmission line faults are generally more frequent than faults in other major components. They not only affect the reliability of the system but also have a widespread impact on end users. Additionally, protection becomes harder as transmission line configurations grow more complex. Therefore, predicting faults (type and location) with high accuracy increases the operational stability and reliability of the power system and helps to avoid major power failures. Furthermore, proper operation of the protective relays (e.g., reclosing relays) requires the correct determination of the fault type as quickly as possible. With the advent of the smart grid, digital technology allows sensors to be deployed along transmission lines to collect live fault data, which contain useful information for analyzing the disturbances that occur on transmission lines. In this thesis, the application of machine learning algorithms for fault classification and location identification on transmission lines is explored. Such algorithms have the ability to "learn" from data without being explicitly programmed and can adapt independently when exposed to new data. The work presented makes the following contributions: 1) Two different architectures are proposed that adapt to any N-terminal transmission line. 2) The proposed models do not require a large dataset or a high sampling frequency; additionally, they can be trained quickly and generalize well to the problem. 3) The first architecture is based on decision trees, chosen for their simplicity and easy visualization, which had not been used earlier for this purpose. Its fault location method uses a traveling-wave-based approach; the method performs better than the expected accuracy, and the fault location error is less than ±1%. 4) The second architecture uses a single support vector machine to classify ten types of shunt faults and a regression model for fault location, which eliminates manual work. This architecture was tested on real data and proved better than the first; its regression model has a fault location error of less than ±1% for both three- and two-terminal lines. 5) Both architectures are tested on real fault data, which gives substantial evidence of their applicability.
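A minimal sketch in the spirit of the second architecture above (one SVM classifier for the ten shunt fault types plus a regression model for fault location); the features and data here are synthetic assumptions, not the thesis's real measurements.

```python
# Sketch: multi-class SVM for fault type, SVR for fault distance along the line.
import numpy as np
from sklearn.svm import SVC, SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 12))              # stand-in for per-phase V/I features
fault_type = rng.integers(0, 10, size=500)      # ten shunt fault classes
location_pct = rng.uniform(0, 100, size=500)    # fault distance, % of line length

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, fault_type)
reg = make_pipeline(StandardScaler(), SVR(kernel="rbf")).fit(X, location_pct)

x_new = X[:1]
print("predicted fault type:", clf.predict(x_new)[0])
print("predicted location (% of line):", round(float(reg.predict(x_new)[0]), 2))
```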
55

Learning a symbolic vocabulary for object detection in images

Gadat, Sebastien 17 December 2004 (has links) (PDF)
We study the fundamental problem of selecting descriptive variables of a signal, a selection dedicated to various tasks such as the classification of objects in an image. We first define a probability law on the descriptive variables of the signal and use a gradient descent algorithm, first exact and then stochastic, to identify a good probability distribution on these variables. We then give various applications to object classification (handwritten digits, face detection, spam detection, ...). In a second part, we implement a reflected diffusion algorithm on the space of probability measures, and then a reflected diffusion with jumps, to make it easier to evolve the variable space as well as the learned probability. This second approach requires particular care with the stochastic simulations, which are studied as clearly as possible. We conclude with some experiments in the same domains as before.
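A loose, hedged sketch of the first idea above, learning a probability law over descriptive variables by a stochastic-gradient-style update; this is a REINFORCE-like approximation, and the exact and reflected-diffusion algorithms of the thesis are not reproduced here.

```python
# Sketch: maintain a probability vector over features, sample subsets from it,
# score them with a classifier, and nudge the probabilities toward good subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=30, n_informative=4,
                           random_state=0)
n_feat, k, eta = X.shape[1], 5, 0.5
p = np.full(n_feat, 1.0 / n_feat)                # probability law over the variables
baseline = 0.5

rng = np.random.default_rng(6)
for step in range(200):
    subset = rng.choice(n_feat, size=k, replace=False, p=p)
    reward = cross_val_score(LinearSVC(dual=False, max_iter=5000),
                             X[:, subset], y, cv=3).mean()
    indicator = np.zeros(n_feat)
    indicator[subset] = 1.0 / k
    p = p + eta * (reward - baseline) * (indicator - p)   # stochastic update
    p = np.clip(p, 1e-6, None)
    p /= p.sum()                                  # stay on the probability simplex
    baseline = 0.9 * baseline + 0.1 * reward

print("most useful variables:", np.argsort(-p)[:5])
```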
56

Support Vector Machines: Training and Applications

Osuna, Edgar, Freund, Robert, Girosi, Federico 01 March 1997 (has links)
The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Labs. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function and Multi-Layer Perceptron classifiers. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle. The derivation of Support Vector Machines, its relationship with SRM, and its geometrical insight are discussed in this paper. Training an SVM is equivalent to solving a quadratic programming problem with linear and box constraints in a number of variables equal to the number of data points. When the number of data points exceeds a few thousand, the problem becomes very challenging, because the quadratic form is completely dense and the memory needed to store the problem grows with the square of the number of data points. Therefore, training problems arising in some real applications with large data sets are impossible to load into memory and cannot be solved using standard non-linear constrained optimization algorithms. We present a decomposition algorithm that can be used to train SVMs over large data sets. The main idea behind the decomposition is the iterative solution of sub-problems and the evaluation of optimality conditions, which are used both to generate improved iterates and to establish the stopping criteria for the algorithm. We present previous approaches, as well as results and important details of our implementation of the algorithm, using a second-order variant of the Reduced Gradient Method as the solver of the sub-problems. As an application of SVMs, we present preliminary results obtained by applying SVMs to the problem of detecting frontal human faces in real images.
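A minimal decomposition sketch under strong simplifying assumptions: the bias term is dropped so the dual has only box constraints, the working set is picked by largest KKT violation, and each sub-problem is handed to a generic bounded optimizer rather than the reduced-gradient solver used in the paper.

```python
# Sketch: iteratively optimize the (bias-free) SVM dual over small working sets,
# keeping the remaining variables fixed, until no KKT violations remain.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_svm_decomposition(X, y, C=1.0, q=10, iters=50, tol=1e-3):
    n = len(y)
    K = rbf_kernel(X, X)
    Q = (y[:, None] * y[None, :]) * K        # dense here; handled blockwise at scale
    alpha = np.zeros(n)
    for _ in range(iters):
        grad = 1.0 - Q @ alpha               # gradient of the dual objective
        viol = np.where((alpha < C) & (grad > tol), grad, 0.0) \
             + np.where((alpha > 0) & (grad < -tol), -grad, 0.0)
        if viol.max() < tol:
            break
        B = np.argsort(-viol)[:q]            # working set: worst KKT violators
        N = np.setdiff1d(np.arange(n), B)
        lin = 1.0 - Q[np.ix_(B, N)] @ alpha[N]

        def neg_dual(a_B):                   # sub-problem over the working set only
            return 0.5 * a_B @ Q[np.ix_(B, B)] @ a_B - lin @ a_B

        res = minimize(neg_dual, alpha[B], bounds=[(0, C)] * len(B),
                       method="L-BFGS-B")
        alpha[B] = res.x
    return alpha

# Toy usage on a small two-class problem
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1, 1, (40, 2)), rng.normal(1, 1, (40, 2))])
y = np.array([-1.0] * 40 + [1.0] * 40)
alpha = train_svm_decomposition(X, y)
print("support vectors:", int((alpha > 1e-6).sum()))
```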
57

Geometric Tolerancing of Cylindricity Utilizing Support Vector Regression

Lee, Keun Joo 01 January 2009 (has links)
In an age where quick turnaround times and high-speed manufacturing methods are becoming more important, quality assurance is a consistent bottleneck in production. With the development of cheap and fast computer hardware, it has become viable to use machine vision for the collection of data points from a machined part. The generation of these large sets of sample points has created a need for a comprehensive algorithm that provides accurate results while being computationally efficient. The currently established methods are least-squares (LSQ) and non-linear programming (NLP). The LSQ method is often deemed too inaccurate and is prone to providing bad results, while the NLP method is computationally taxing. A novel method of using support vector regression (SVR) to solve the NP-hard problem of evaluating the cylindricity of machined parts is proposed. This method was evaluated against LSQ and NLP in both accuracy and CPU processing time. An open-source, user-modifiable programming package was developed to test the model. Analysis of the test results shows the novel SVR algorithm to be a viable alternative for evaluating cylindricity in real-world manufacturing.
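An illustrative sketch, not the thesis algorithm: epsilon-SVR fits the measured radius as a smooth function of position on the cylinder, and the spread of the radial residuals is reported as a rough cylindricity estimate; the parameter values and the min-zone interpretation are assumptions.

```python
# Sketch: SVR over (cos(theta), sin(theta), z) models the measured radius,
# and the max-minus-min residual approximates a cylindricity zone.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(5)
n = 400
theta = rng.uniform(0, 2 * np.pi, n)          # angular position of each probe point
z = rng.uniform(0, 50.0, n)                   # height along the cylinder axis (mm)
r = 10.0 + 0.02 * np.sin(3 * theta) + 0.005 * rng.standard_normal(n)  # measured radius

X = np.column_stack([np.cos(theta), np.sin(theta), z])
svr = SVR(kernel="rbf", C=10.0, epsilon=0.001).fit(X, r)

residual = r - svr.predict(X)                 # local radial deviation from the fit
cylindricity_estimate = residual.max() - residual.min()
print(f"estimated cylindricity zone: {cylindricity_estimate:.4f} mm")
```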
58

Kernel-Based Data Mining Approach with Variable Selection for Nonlinear High-Dimensional Data

Baek, Seung Hyun 01 May 2010 (has links)
In statistical data mining research, datasets often have nonlinearity and high dimensionality. It has become difficult to analyze such datasets in a comprehensive manner using traditional statistical methodologies. Kernel-based data mining is one of the most effective statistical methodologies for investigating a variety of problems in areas including pattern recognition, machine learning, bioinformatics, chemometrics, and statistics. In particular, statistically sophisticated procedures that emphasize the reliability of results and computational efficiency are required for the analysis of high-dimensional data. In this dissertation, first, a novel wrapper method called SVM-ICOMP-RFE, based on a hybrid of the support vector machine (SVM) and recursive feature elimination (RFE) with an information-theoretic measure of complexity (ICOMP), is introduced and developed to classify high-dimensional data sets and to carry out subset selection of the variables in the original data space, finding the subset that best discriminates between groups; RFE ranks variables based on the ICOMP criterion. Second, a dual-variable functional support vector machine approach is proposed. The proposed approach uses both the first and second derivatives of the degradation profiles. A modified floating search algorithm for repeated variable selection, with newly added degradation path points, is presented to find a few good variables while reducing the computation time for on-line implementation. Third, a two-stage scheme for the classification of near infrared (NIR) spectral data is proposed. In the first stage, the proposed multi-scale vertical energy thresholding (MSVET) procedure is used to reduce the dimension of the high-dimensional spectral data. In the second stage, a few important wavelet coefficients are selected using the proposed SVM gradient-recursive feature elimination (RFE). Fourth, a novel methodology for discriminant analysis based on a human decision-making process, called PDCM, is proposed. The proposed methodology consists of three basic steps emulating the thinking process: perception, decision, and cognition. In these steps, two concepts, support vector machines for classification and information complexity, are integrated to evaluate learning models.
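A minimal sketch of SVM-based recursive feature elimination; the ICOMP ranking criterion from the dissertation is not reproduced, and scikit-learn's plain weight-based RFE is used as a stand-in.

```python
# Sketch: rank and eliminate variables recursively using a linear SVM's weights.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)
selector = RFE(LinearSVC(C=1.0, dual=False, max_iter=5000),
               n_features_to_select=5, step=5)
selector.fit(X, y)
print("selected variables:", np.flatnonzero(selector.support_))
```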
59

A Novel Hybrid Dimensionality Reduction Method using Support Vector Machines and Independent Component Analysis

Moon, Sangwoo 01 August 2010 (has links)
Due to the increasing demand for high-dimensional data analysis in applications such as electrocardiogram signal analysis and gene expression analysis for cancer detection, dimensionality reduction has become a viable way to extract essential information from data, so that high-dimensional data can be represented in a more condensed form with much lower dimensionality, both improving classification accuracy and reducing computational complexity. Conventional dimensionality reduction methods can be categorized into stand-alone and hybrid approaches. A stand-alone method utilizes a single criterion, either supervised or unsupervised, whereas a hybrid method integrates both. Compared with stand-alone dimensionality reduction methods, the hybrid approach is promising because it simultaneously exploits the supervised criterion for better classification accuracy and the unsupervised criterion for better data representation. However, several issues challenge the efficiency of the hybrid approach, including (1) the difficulty of finding a subspace that seamlessly integrates both criteria in a single hybrid framework, (2) the robustness of performance on noisy data, and (3) nonlinear data representation capability. This dissertation presents a new hybrid dimensionality reduction method that seeks a projection by optimizing both structural risk (the supervised criterion) from the Support Vector Machine (SVM) and data independence (the unsupervised criterion) from Independent Component Analysis (ICA). The projection from SVM contributes directly to classification performance from a supervised perspective, whereas the projection constructed by maximizing independence among features with ICA improves classification accuracy indirectly, through better intrinsic data representation, from an unsupervised perspective. For the linear dimensionality reduction model, I introduce orthogonality to interrelate the projections from SVM and ICA, while a redundancy removal process eliminates some of the projection vectors from SVM, leading to more effective dimensionality reduction. The orthogonality-based linear hybrid dimensionality reduction method is then extended to an uncorrelatedness-based algorithm with nonlinear data representation capability. In the proposed approach, SVM and ICA are integrated into a single framework through an uncorrelated subspace based on a kernel implementation. Experimental results show that the proposed approaches give higher classification performance, with better robustness, in relatively lower dimensions than conventional methods on high-dimensional datasets.
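A loose illustration of the hybrid idea, not the dissertation's method: a linear SVM weight vector supplies the supervised direction, FastICA directions supply the unsupervised ones, and orthogonalization interrelates the two before stacking them into a single projection.

```python
# Sketch: combine an SVM direction with ICA directions, made mutually orthogonal,
# into one projection matrix for dimensionality reduction.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import FastICA
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           random_state=0)
w = LinearSVC(C=1.0, dual=False, max_iter=5000).fit(X, y).coef_[0]
w = w / np.linalg.norm(w)                        # supervised (structural-risk) direction

ica = FastICA(n_components=4, random_state=0).fit(X)
directions = [w]
for v in ica.components_:                        # unsupervised (independence) directions
    for u in directions:                         # Gram-Schmidt against kept directions
        v = v - (v @ u) * u
    if np.linalg.norm(v) > 1e-8:
        directions.append(v / np.linalg.norm(v))

P = np.vstack(directions)                        # rows form the hybrid projection
X_reduced = X @ P.T
print("reduced shape:", X_reduced.shape)
```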
60

Screening Web Breaks in a Pressroom by Soft Computing

Ahmad, Alzghoul January 2008 (has links)
Web breaks are considered one of the most significant runnability problems in a pressroom. This work concerns the analysis of the relation between various parameters (variables) characterizing the paper, the printing press, and the printing process, and the occurrence of web breaks. A large number of variables, 61 in total, obtained off-line as well as measured online during the printing process, are used in the investigation. Each paper reel is characterized by a vector x of 61 components.

Two main approaches are explored. The first one treats the problem as a data classification task into "break" and "non-break" classes; the procedures of classifier training, the selection of relevant input variables, and the selection of hyper-parameters of the classifier are aggregated into one process based on genetic search. The second approach combines genetic-search-based variable selection with data mapping into a low-dimensional space; the genetic search process results in a variable set providing the best mapping according to some quality function.

The empirical study was performed using data collected at a pressroom in Sweden. The total number of data points available for the experiments was 309, of which only 37 represent web break cases. The results of the investigations have shown that the linear relations between the independent variables and the web break frequency are not strong.

Three important groups of variables were identified, namely Lab data (variables characterizing paper properties, measured off-line in a paper mill lab), Ink registry (variables characterizing operator actions aimed at adjusting the ink registry), and Web tension. We found that the most important variables are: Ink registry Y LS MD (adjustments of yellow ink registry in machine direction on the lower paper side), Air permeability (characterizes paper porosity), Paper grammage, Elongation MD, and four variables characterizing web tension: Moment mean, Min sliding Mean, Web tension variance, and Web tension mean.

The proposed methods were helpful in finding the variables influencing the occurrence of web breaks and can also be used for solving other industrial problems.
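A minimal sketch of the first approach, a genetic search for variable subsets wrapped around a classifier; the data are synthetic, and the population size, rates, and the SVM stand-in classifier are assumptions rather than the thesis's actual configuration of the 61 press and paper variables.

```python
# Sketch: evolve binary variable-subset masks, scoring each by cross-validated
# accuracy of a classifier trained on the selected columns only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.standard_normal((309, 61))               # 309 reels, 61 candidate variables
y = (X[:, 3] - 0.8 * X[:, 10] + rng.standard_normal(309) > 1.5).astype(int)  # "break"

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    clf = SVC(kernel="rbf", class_weight="balanced")
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, 61))          # population of variable subsets
for generation in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(-scores)[:10]]      # keep the better half
    cut = rng.integers(1, 60, size=10)
    children = np.array([np.concatenate([parents[i][:c], parents[(i + 1) % 10][c:]])
                         for i, c in enumerate(cut)])      # one-point crossover
    flip = rng.random(children.shape) < 0.02                # mutation
    children = np.where(flip, 1 - children, children)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected variables:", np.flatnonzero(best))
```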
