  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

A Balanced Secondary Structure Predictor

Islam, Md Nasrul 15 May 2015 (has links)
Secondary structure (SS) refers to the local spatial organization of the polypeptide backbone atoms of a protein. Accurate prediction of SS is a vital clue for resolving the 3D structure of a protein. SS has three components: helix (H), beta strand (E), and coil (C). Most SS predictors are imbalanced: their accuracy in predicting helix and coil is high but significantly lower for beta strand. The objective of this thesis is to develop a balanced SS predictor that achieves good accuracy in all three SS components. We propose a novel approach to this problem that combines a genetic algorithm (GA) with a support vector machine (SVM). We prepared two test datasets (CB471 and N295) to compare the performance of our predictor with SPINE X. The overall accuracy of our predictor was 76.4% on CB471 and 77.2% on N295, while SPINE X gave 76.5% overall accuracy on both test datasets.
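The imbalance the abstract describes is easiest to see with per-state accuracy rather than a single overall number. The sketch below is illustrative only (the sequences are made up, not data from the thesis): it scores each SS state separately, exposing a predictor that does well on helix and coil but poorly on beta strand.

```python
# Per-state accuracy for secondary-structure prediction (illustrative only;
# the toy sequences below are made up, not data from the thesis).
def per_class_accuracy(true_ss, pred_ss, classes="HEC"):
    """Return {state: fraction of positions of that state predicted correctly}."""
    acc = {}
    for c in classes:
        idx = [i for i, t in enumerate(true_ss) if t == c]
        if idx:
            acc[c] = sum(pred_ss[i] == true_ss[i] for i in idx) / len(idx)
    return acc

true_ss = "HHHHEEEECCCC"
pred_ss = "HHHHCCEECCCC"  # helix and coil perfect, half the beta positions wrong
print(per_class_accuracy(true_ss, pred_ss))  # → {'H': 1.0, 'E': 0.5, 'C': 1.0}
```

A "balanced" predictor in the thesis's sense is one whose three per-state numbers are all high, not just the overall average.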
42

Detecting exoplanets with machine learning : A comparative study between convolutional neural networks and support vector machines

Tiensuu, Jacob, Linderholm, Maja, Dreborg, Sofia, Örn, Fredrik January 2019 (has links)
In this project two machine learning methods, the Support Vector Machine (SVM) and the Convolutional Neural Network (CNN), are studied to determine which performs best on a labeled data set containing time series of light intensity from extrasolar stars. The main difficulty is that the data set contains many more stars without exoplanets than stars with orbiting exoplanets. This causes a so-called imbalanced data set, which in this case is mitigated by, for example, mirroring the curves of stars with an orbiting exoplanet and adding them to the set. To improve the results further, some preprocessing is done before applying the methods to the data set. For the SVM, feature extraction and the Fourier transform of the time series are important measures, but further preprocessing alternatives are investigated. For the CNN method the time series are both detrended and smoothed, giving two inputs for the same light curve. All code is implemented in Python. Of all the validation parameters, recall is considered the main priority, since it is more important to find all exoplanets than to find all non-exoplanets. CNN turned out to be the best performing method for the chosen configurations, with a recall of 1.000, which exceeds the SVM's recall of 0.800. Considering the second validation parameter, precision, CNN is also the best performing method, with a precision of 0.769 over the SVM's 0.571.
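The class-balancing trick described above — mirroring positive light curves — amounts to time-reversing each positive sample and appending it as a new positive. A minimal sketch, with toy lists standing in for the actual light-curve data used in the project:

```python
# Balance an imbalanced data set by time-reversing ("mirroring") each
# positive-class curve and adding it as an extra positive sample.
# The curves here are toy lists, not the light-curve data from the project.
def augment_by_mirroring(curves, labels, positive=1):
    aug_curves, aug_labels = list(curves), list(labels)
    for curve, label in zip(curves, labels):
        if label == positive:
            aug_curves.append(curve[::-1])  # mirrored copy of the curve
            aug_labels.append(positive)
    return aug_curves, aug_labels

curves = [[1.0, 0.9, 1.0], [1.0, 1.0, 0.8]]
labels = [0, 1]  # only the second star has an orbiting exoplanet
curves2, labels2 = augment_by_mirroring(curves, labels)
print(len(curves2), labels2)  # → 3 [0, 1, 1]
```

Mirroring works here because a transit dip is (approximately) as physically plausible played backwards as forwards, so the mirrored curve is a legitimate extra positive example.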
43

Machine learning to detect anomalies in datacenter

Lindh, Filip January 2019 (has links)
This thesis investigates the possibility of using anomaly detection on performance data of virtual servers in a datacenter to detect malfunctioning servers. Anomaly detection can potentially reduce the time a server spends malfunctioning, as the server can be detected and checked before the error has a significant impact. Several approaches and methods were applied and evaluated on one virtual server: the K-nearest neighbor algorithm, the support vector machine, the K-means clustering algorithm, self-organizing maps, a Gaussian model of the CPU-memory usage ratio, and time series analysis using a neural network and linear regression. The evaluation and comparison of the methods were mainly based on errors reported during the time period in which they were tested: the better the detected anomalies matched the reported errors, the higher the score a method received. It turned out that anomalies in performance data could, to some extent, be linked to real errors in the server, which enables the use of anomaly detection on performance data as a way to detect malfunctioning servers. The simplest method, looking at the ratio between memory usage and CPU usage, was the most successful one, detecting the most errors; however, the anomalies were often detected just after the error had been reported. The support vector machine was more successful at detecting anomalies before they were reported. The proportion of anomalies played a big role, however, and K-nearest neighbor received a higher score when the proportion of anomalies was higher.
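The simplest detector in the abstract — a Gaussian model of the usage ratio — can be sketched in a few lines. The data, the 2-sigma threshold, and the function name below are illustrative assumptions, not taken from the thesis:

```python
import statistics

# Minimal sketch of a Gaussian anomaly detector on a usage ratio:
# fit mean and standard deviation, flag samples far from the mean.
# Data and the 2-sigma threshold are illustrative, not from the thesis.
def gaussian_anomalies(ratios, n_sigma=2.0):
    mu = statistics.mean(ratios)
    sigma = statistics.stdev(ratios)
    return [i for i, r in enumerate(ratios) if abs(r - mu) > n_sigma * sigma]

ratios = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 5.0]  # last sample is anomalous
print(gaussian_anomalies(ratios))  # → [6]
```

A real deployment would fit the model on a sliding window of healthy history rather than on the batch that contains the anomaly, since a large outlier inflates the estimated sigma.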
44

Application of support vector machines and autoregressive moving average models in electromyography signal classification

Barretto, Mateus Ymanaka 10 December 2007 (has links)
The diagnosis of neuromuscular diseases is attained by the combined use of several tools. Among these tools, clinical electromyography provides key information for the diagnosis. In the literature, the application of some classifiers (linear discriminant and artificial neural networks) to a variety of electromyography parameters (number of phases, turns, and zero crossings; median frequency; autoregressive coefficients) has provided promising results. Nevertheless, the need for a large number of autoregressive coefficients guided this Master's thesis toward the use of a smaller number of autoregressive moving-average coefficients. The classification task (into normal, neuropathic, or myopathic) was achieved by support vector machines, a recently proposed type of artificial neural network. This work's objective was to study whether low-order autoregressive moving-average (ARMA) models can substitute for high-order autoregressive models, in combination with support vector machines, for diagnostic purposes. The results indicate that support vector machines perform better than the Fisher linear discriminant, and that ARMA(1,11) and ARMA(1,12) models provide high classification rates (81.5%), close to the maximum obtained with order-39 autoregressive models. We therefore recommend the use of support vector machines and ARMA(1,11) or ARMA(1,12) models for the classification of needle electromyography signals of 800 ms duration sampled at 25 kHz.
45

Support Vector Machine and Application in Seizure Prediction

Qiu, Simeng 04 1900 (has links)
Nowadays, machine learning (ML) is utilized in areas ranging from engineering to business. In this paper, we first present several kernel machine learning methods for solving classification, regression, and clustering problems; these perform well but also have limitations. We present examples of each method and analyze its advantages and disadvantages in different scenarios. We then focus on one of the most popular classification methods, the Support Vector Machine (SVM), introducing the basic theory, the advantages, and the scenarios in which SVMs are used for classification problems. We also explain a convenient approach to tackling SVM problems called Sequential Minimal Optimization (SMO). Moreover, the one-class SVM can be understood in a different way, as Support Vector Data Description (SVDD), a well-known non-linear model; SVDD can be solved using a Gaussian RBF kernel combined with SMO. Finally, we compare the SVM-SMO and SVM-SVDD implementations. On the application side, we use SVMs for seizure forecasting in canine epilepsy, comparing the results of different methods such as random forests, extremely randomized trees, and SVMs for classifying preictal (pre-seizure) and interictal (between-seizure) binary data. We conclude that the SVM has the best performance.
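The Gaussian RBF kernel mentioned above, k(x, z) = exp(-||x - z||² / (2σ²)), is the common ingredient of the SVM and SVDD variants discussed; it maps similarity to (0, 1], with identical points scoring 1. A minimal sketch (vectors and σ are illustrative):

```python
import math

# Gaussian RBF kernel: k(x, z) = exp(-||x - z||^2 / (2 * sigma^2)).
# Inputs and sigma are illustrative; SVDD/SMO solvers would call this
# for every pair of training points.
def rbf_kernel(x, z, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))

print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # → 1.0 (identical points)
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))  # distant points, near 0
```

Because the kernel depends only on the pairwise distance, SVDD with this kernel describes the data with a sphere in feature space, which corresponds to a flexible non-linear boundary in the input space.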
46

Fault Classification and Location Identification on Electrical Transmission Network Based on Machine Learning Methods

Venkatesh, Vidya 01 January 2018 (has links)
The power transmission network is the most important link in a country's energy system, carrying large amounts of power at high voltages from generators to substations. A modern power system is a complex network and requires a high-speed, precise, and reliable protective system. Faults in a power system are unavoidable, and overhead transmission line faults are generally more frequent than those of other major components. They not only affect the reliability of the system but also have a widespread impact on end users. Additionally, the difficulty of protecting transmission lines increases as the configurations get more complex. Therefore, predicting faults (type and location) with high accuracy increases the operational stability and reliability of the power system and helps avoid large power failures. Furthermore, proper operation of the protective relays (e.g., reclosing relays) requires determining the fault type correctly and as quickly as possible. With the advent of the smart grid, digital technology allows the deployment of sensors along transmission lines that collect live fault data containing useful information for analyzing disturbances on transmission lines. In this thesis, the application of machine learning algorithms for fault classification and location identification on transmission lines is explored. These algorithms can "learn" from data without being explicitly programmed and can independently adapt when exposed to new data. The work presented makes the following contributions: 1) Two different architectures are proposed, each adapting to any N-terminal transmission line. 2) The proposed models do not require a large dataset or a high sampling frequency; additionally, they can be trained quickly and generalize well. 3) The first architecture is based on decision trees, chosen for their simplicity and easy visualization, which have not been used earlier. Its fault location method uses a traveling-wave-based approach and was tested with better-than-expected performance, with a fault location error of less than ±1%. 4) The second architecture uses a single support vector machine to classify ten types of shunt faults and a regression model for fault location, eliminating manual work. This architecture was tested on real data and proved better than the first; its regression model has a fault location error of less than ±1% for both three- and two-terminal lines. 5) Both architectures are tested on real fault data, giving substantial evidence of their applicability.
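The two-terminal traveling-wave principle behind the location method reduces to one relation: with wave-arrival times tA and tB recorded at the ends of a line of length L and wave speed v, the fault lies at d = (L + v·(tA − tB)) / 2 from terminal A. A worked sketch with made-up numbers (not the thesis's test data):

```python
# Two-terminal traveling-wave fault location:
#   tA = t0 + d/v,  tB = t0 + (L - d)/v  =>  d = (L + v*(tA - tB)) / 2
# L in km, v in km/s, times in s. Numbers below are illustrative.
def fault_distance(line_len_km, wave_speed_km_s, t_a, t_b):
    return (line_len_km + wave_speed_km_s * (t_a - t_b)) / 2.0

v = 2.9e5          # typical surge propagation speed, ~0.97c on overhead lines
t0 = 0.001         # unknown fault inception time cancels out of the formula
d = fault_distance(100.0, v, t0 + 30.0 / v, t0 + 70.0 / v)
print(d)  # → 30.0 km from terminal A (up to floating-point error)
```

Note that the unknown inception time t0 cancels in tA − tB, which is why only the arrival-time difference needs to be measured, given synchronized clocks at the two terminals.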
47

Learning a symbolic vocabulary for object detection in images

Gadat, Sebastien 17 December 2004 (has links) (PDF)
We study the fundamental problem of selecting descriptive variables of a signal, a selection dedicated to various tasks such as the classification of objects in an image. We first define a probability distribution over the descriptive variables of the signal and use a gradient descent algorithm, first exact and then stochastic, to identify the right probability distribution over these variables. We then give various applications to object classification (handwritten digits, face detection, spam detection, etc.). In a second part, we implement a reflected diffusion algorithm on the space of probabilities, and then a reflected diffusion with jumps, to make it easier to evolve the variable space as well as the learned probability distribution. This second approach requires particular care in the stochastic simulations, which are studied as clearly as possible. We conclude with some experiments in the same domains as before.
48

Support Vector Machines: Training and Applications

Osuna, Edgar, Freund, Robert, Girosi, Federico 01 March 1997 (has links)
The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Labs. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function, and Multi-Layer Perceptron classifiers. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle. The derivation of Support Vector Machines, their relationship with SRM, and their geometrical insight are discussed in this paper. Training an SVM is equivalent to solving a quadratic programming problem with linear and box constraints in a number of variables equal to the number of data points. When the number of data points exceeds a few thousand the problem is very challenging, because the quadratic form is completely dense, so the memory needed to store the problem grows with the square of the number of data points. Therefore, training problems arising in some real applications with large data sets cannot be loaded into memory and cannot be solved using standard non-linear constrained optimization algorithms. We present a decomposition algorithm that can be used to train SVMs over large data sets. The main idea behind the decomposition is the iterative solution of sub-problems and the evaluation of optimality conditions, which are used both to generate improved iterates and to establish the stopping criteria for the algorithm. We present previous approaches, as well as results and important details of our implementation of the algorithm using a second-order variant of the Reduced Gradient Method as the solver of the sub-problems. As an application of SVMs, we present preliminary results obtained applying SVMs to the problem of detecting frontal human faces in real images.
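The paper's decomposition method targets the dense quadratic programming dual. By contrast, the toy below trains a linear SVM in the primal by sub-gradient descent on the hinge loss; it only illustrates what "training an SVM" computes, and is not the authors' algorithm. Data and hyperparameters are made up:

```python
# Toy linear SVM trained by sub-gradient descent on the regularized hinge
# loss: minimize lam*||w||^2/2 + mean(max(0, 1 - y*(w.x + b))).
# This illustrates what SVM training computes; it is NOT the dense-QP
# decomposition algorithm of the paper. Data and hyperparameters are made up.
def train_linear_svm(xs, ys, lr=0.1, lam=0.01, epochs=200):
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # y in {-1, +1}
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin: hinge sub-gradient active
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:           # outside the margin: only regularization acts
                w = [wi - lr * lam * wi for wi in w]
    return w, b

xs = [[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5]]
ys = [1, 1, -1, -1]  # linearly separable toy data
w, b = train_linear_svm(xs, ys)
predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
print([predict(x) for x in xs])  # → [1, 1, -1, -1]
```

The memory contrast with the paper's setting is the point: this primal sketch stores only w and b, whereas the dense dual QP stores a kernel matrix quadratic in the number of data points, which is what motivates decomposition.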
49

Geometric Tolerancing of Cylindricity Utilizing Support Vector Regression

Lee, Keun Joo 01 January 2009 (has links)
In an age where quick turnaround times and high-speed manufacturing methods are becoming more important, quality assurance is a consistent bottleneck in production. With the development of cheap and fast computer hardware, it has become viable to use machine vision to collect data points from a machined part. The generation of these large sets of sample points has necessitated a comprehensive algorithm that provides accurate results while being computationally efficient. The current established methods are least squares (LSQ) and non-linear programming (NLP). The LSQ method is often deemed too inaccurate and is prone to providing bad results, while the NLP method is computationally taxing. A novel method using support vector regression (SVR) to solve the NP-hard problem of evaluating the cylindricity of machined parts is proposed. This method was evaluated against LSQ and NLP in both accuracy and CPU processing time. An open-source, user-modifiable programming package was developed to test the model. Analysis of the test results shows the novel SVR algorithm to be a viable alternative among methods for evaluating cylindricity in real-world manufacturing.
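Once a reference axis is fixed, the cylindricity value itself is simple: the spread between the largest and smallest radial distances of the sampled points from that axis. The sketch below assumes the axis is the z-axis, which sidesteps the hard axis-fitting step (the part LSQ, NLP, and the proposed SVR method actually compete on); points are made up:

```python
import math

# Cylindricity for a FIXED axis (here assumed to be the z-axis): the
# difference between the max and min radial distances of the sampled
# points from that axis. Fitting the axis itself is the hard part that
# LSQ/NLP/SVR address; this sketch assumes it away. Points are made up.
def cylindricity_about_z(points):
    radii = [math.hypot(x, y) for x, y, _z in points]
    return max(radii) - min(radii)

points = [(1.00, 0.0, 0.0), (0.0, 1.02, 5.0), (-0.98, 0.0, 10.0), (0.0, -1.00, 15.0)]
print(round(cylindricity_about_z(points), 4))  # → 0.04
```

The full tolerancing problem minimizes this spread over all candidate axes, which is what makes it a non-convex, NP-hard fit rather than the one-liner above.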
50

Kernel-Based Data Mining Approach with Variable Selection for Nonlinear High-Dimensional Data

Baek, Seung Hyun 01 May 2010 (has links)
In statistical data mining research, datasets often exhibit nonlinearity and high dimensionality, making them difficult to analyze comprehensively with traditional statistical methodologies. Kernel-based data mining is one of the most effective statistical methodologies for investigating a variety of problems in areas including pattern recognition, machine learning, bioinformatics, chemometrics, and statistics. In particular, statistically sophisticated procedures that emphasize the reliability of results and computational efficiency are required for the analysis of high-dimensional data. In this dissertation, first, a novel wrapper method called SVM-ICOMP-RFE, which hybridizes a support vector machine (SVM) and recursive feature elimination (RFE) with an information-theoretic measure of complexity (ICOMP), is introduced and developed to classify high-dimensional data sets and to carry out subset selection of the variables in the original data space, finding the subset best suited for discriminating between groups; RFE ranks the variables based on the ICOMP criterion. Second, a dual-variable functional support vector machine approach is proposed that uses both the first and second derivatives of the degradation profiles. A modified floating search algorithm for repeated variable selection, with newly added degradation path points, is presented to find a few good variables while reducing the computation time for on-line implementation. Third, a two-stage scheme for the classification of near-infrared (NIR) spectral data is proposed. In the first stage, the proposed multi-scale vertical energy thresholding (MSVET) procedure is used to reduce the dimension of the high-dimensional spectral data. In the second stage, a few important wavelet coefficients are selected using the proposed SVM gradient-recursive feature elimination (RFE). Fourth, a novel methodology for discriminant analysis based on a human decision-making process, called PDCM, is proposed. It consists of three basic steps emulating the thinking process: perception, decision, and cognition. In these steps two concepts, support vector machines for classification and information complexity, are integrated to evaluate learning models.
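The recursive-feature-elimination loop underlying SVM-ICOMP-RFE can be sketched in miniature: repeatedly drop the feature with the smallest importance score until none remain, yielding an elimination order. Here the score is just a supplied weight magnitude; the dissertation ranks variables with the ICOMP criterion (and retrains the SVM between rounds), which this toy does not do:

```python
# Miniature RFE loop: repeatedly eliminate the feature with the smallest
# importance score. The score here is a fixed weight magnitude; the
# dissertation instead re-ranks with the ICOMP criterion after retraining
# an SVM at each round. Feature names and scores are made up.
def rfe_ranking(feature_names, scores):
    remaining = list(feature_names)
    s = dict(zip(feature_names, scores))
    elimination_order = []
    while remaining:
        worst = min(remaining, key=lambda f: abs(s[f]))
        remaining.remove(worst)
        elimination_order.append(worst)
    return elimination_order  # eliminated first = judged least informative

print(rfe_ranking(["f1", "f2", "f3"], [0.9, -0.1, 0.5]))  # → ['f2', 'f3', 'f1']
```

Reading the elimination order backwards gives a ranking of the variables, from which the best-discriminating subset of any desired size can be taken.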
