91 |
An Equivalence Between Sparse Approximation and Support Vector Machines. Girosi, Federico, 01 May 1997 (has links)
In the first part of this paper we show a similarity between the principle of Structural Risk Minimization (SRM) (Vapnik, 1982) and the idea of Sparse Approximation, as defined by Chen, Donoho and Saunders (1995) and Olshausen and Field (1996). Then we focus on two specific (approximate) implementations of SRM and Sparse Approximation, which have been used to solve the problem of function approximation. For SRM we consider the Support Vector Machine technique proposed by V. Vapnik and his team at AT&T Bell Labs, and for Sparse Approximation we consider a modification of the Basis Pursuit De-Noising algorithm proposed by Chen, Donoho and Saunders (1995). We show that, under certain conditions, these two techniques are equivalent: they give the same solution and they require the solution of the same quadratic programming problem.
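The sparsity property that links the two views can be seen directly in a trained Support Vector Machine: only the support vectors receive non-zero dual coefficients, so the solution is a sparse expansion over kernel basis functions. A minimal sketch (data and hyperparameters are illustrative assumptions, not from the paper):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 100).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + 0.05 * rng.standard_normal(100)

# Epsilon-insensitive SVR, an (approximate) SRM implementation: its solution
# is a sparse expansion over kernels, since only the support vectors
# receive non-zero dual coefficients.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(len(model.support_), "of", len(X), "basis functions used")
```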
|
92 |
A Note on Support Vector Machines Degeneracy. Rifkin, Ryan; Pontil, Massimiliano; Verri, Alessandro, 11 August 1999 (has links)
When training Support Vector Machines (SVMs) over non-separable data sets, one sets the threshold $b$ using any dual cost coefficient that is strictly between the bounds of $0$ and $C$. We show that there exist SVM training problems with dual optimal solutions with all coefficients at bounds, but that all such problems are degenerate in the sense that the "optimal separating hyperplane" is given by $\mathbf{w} = \mathbf{0}$, and the resulting (degenerate) SVM will classify all future points identically (to the class that supplies more training data). We also derive necessary and sufficient conditions on the input data for this to occur. Finally, we show that an SVM training problem can always be made degenerate by the addition of a single data point belonging to a certain unbounded polyhedron, which we characterize in terms of its extreme points and rays.
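The degenerate behaviour described here can be reproduced with an extreme toy case (the construction below is an illustration chosen for simplicity, not one of the paper's conditions): when every training input is identical, no separating direction exists, the optimal weight vector is zero, and all future points are assigned to the majority class.

```python
import numpy as np
from sklearn.svm import SVC

# All inputs identical, class +1 supplies more training data.
X = np.zeros((5, 2))
y = np.array([1, 1, 1, -1, -1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The hyperplane collapses (w = 0) and every future point goes to the
# majority class, exactly the degenerate behaviour described above.
print(np.abs(clf.coef_).max())
preds = clf.predict([[3.0, -1.0], [0.0, 0.0], [-2.5, 4.0]])
print(preds)
```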
|
93 |
Geometric Tolerancing of Cylindricity Utilizing Support Vector Regression. Lee, Keun Joo, 01 January 2009 (has links)
In an age where quick turnaround times and high-speed manufacturing methods are becoming more important, quality assurance is a consistent bottleneck in production. With the development of cheap and fast computer hardware, it has become viable to use machine vision for the collection of data points from a machined part. The generation of these large sets of sample points has created a need for a comprehensive algorithm that provides accurate results while being computationally efficient. Currently established methods are least-squares (LSQ) and non-linear programming (NLP). The LSQ method is often deemed too inaccurate and is prone to providing bad results, while the NLP method is computationally taxing. A novel method of using support vector regression (SVR) to solve the NP-hard problem of evaluating the cylindricity of machined parts is proposed. This method was evaluated against LSQ and NLP in both accuracy and CPU processing time. An open-source, user-modifiable programming package was developed to test the model. Analysis of the test results shows the novel SVR algorithm to be a viable alternative for evaluating cylindricity in real-world manufacturing.
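A hypothetical sketch of the SVR idea (the thesis's actual formulation is not reproduced here; the synthetic data, feature encoding, and hyperparameters are all assumptions): fit a smooth radius surface r(theta, z) to measured points and take the spread of the fitted surface as a cylindricity estimate.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic measurement of a nominally 10 mm radius cylinder with a small
# three-lobed form error plus sensor noise.
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 200)
z = rng.uniform(0, 50, 200)                                    # mm along axis
r = 10.0 + 0.02 * np.sin(3 * theta) + rng.normal(0, 0.005, 200)

# Encode the angle periodically so theta = 0 and theta = 2*pi coincide.
X = np.column_stack([np.sin(theta), np.cos(theta), z / 50.0])
svr = SVR(kernel="rbf", C=100.0, epsilon=0.005).fit(X, r)

# Cylindricity ~ width of the zone containing the fitted surface.
r_hat = svr.predict(X)
cylindricity = r_hat.max() - r_hat.min()
print(f"estimated cylindricity: {cylindricity:.4f} mm")
```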
|
94 |
Detection and characterization of 3D-signature phosphorylation site motifs and their contribution towards improved phosphorylation site prediction in proteins. Durek, Pawel; Schudoma, Christian; Weckwerth, Wolfram; Selbig, Joachim; Walther, Dirk, January 2009 (has links)
Background:
Phosphorylation of proteins plays a crucial role in the regulation and activation of metabolic and signaling pathways and constitutes an important target for pharmaceutical intervention. Central to the phosphorylation process is the recognition of specific target sites by protein kinases followed by the covalent attachment of phosphate groups to the amino acids serine, threonine, or tyrosine. The experimental identification as well as computational prediction of phosphorylation sites (P-sites) has proved to be a challenging problem. Computational methods have focused primarily on extracting predictive features from the local, one-dimensional sequence information surrounding phosphorylation sites.
Results:
We characterized the spatial context of phosphorylation sites and assessed its usability for improved phosphorylation site predictions. We identified 750 non-redundant, experimentally verified sites with three-dimensional (3D) structural information available in the Protein Data Bank (PDB) and grouped them according to their respective kinase family. We studied the spatial distribution of amino acids around phosphoserines, phosphothreonines, and phosphotyrosines to extract signature 3D-profiles. Characteristic spatial distributions of amino acid residue types around phosphorylation sites were indeed discernible, especially when kinase-family-specific target sites were analyzed. To test the added value of using spatial information for the computational prediction of phosphorylation sites, Support Vector Machines were applied using both sequence and structural information. When compared to sequence-only based prediction methods, a small but consistent performance improvement was obtained when the prediction was informed by 3D-context information.
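The comparison of sequence-only against sequence-plus-structure prediction amounts to training one SVM on the sequence features and another on the concatenated feature vectors. A sketch with synthetic stand-in features (the real features are derived from sequence windows and PDB structures; everything below is illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n = 300
seq_feats = rng.standard_normal((n, 20))   # stand-in: window composition
spatial = rng.standard_normal((n, 10))     # stand-in: residue counts in 3D shells
# Synthetic labels depending on both feature groups.
y = (seq_feats[:, 0] + 0.5 * spatial[:, 0] > 0).astype(int)

both = np.hstack([seq_feats, spatial])     # sequence + 3D context
acc_seq = SVC(kernel="rbf").fit(seq_feats, y).score(seq_feats, y)
acc_both = SVC(kernel="rbf").fit(both, y).score(both, y)
print(acc_seq, acc_both)
```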
Conclusion:
While local one-dimensional amino acid sequence information was observed to harbor most of the discriminatory power, spatial context information was identified as relevant for the recognition of kinases and their cognate target sites and can be used for an improved prediction of phosphorylation sites. A web-based service (Phos3D) implementing the developed structure-based P-site prediction method has been made available at http://phos3d.mpimp-golm.mpg.de.
|
95 |
Machine Learning and Graph Theory Approaches for Classification and Prediction of Protein Structure. Altun, Gulsah, 22 April 2008 (has links)
Recently, many methods have been proposed for classification and prediction problems in bioinformatics. One of these problems is protein structure prediction. Machine learning approaches and new algorithms have been proposed to solve this problem. Among the machine learning approaches, Support Vector Machines (SVM) have attracted a lot of attention due to their high prediction accuracy. Since protein data consist of sequence and structural information, another widely used approach for modeling this structured data is graphs. In computer science, graph theory has been widely studied; however, it has only recently been applied to bioinformatics. In this work, we introduce new algorithms based on statistical methods, graph theory concepts, and machine learning for the protein structure prediction problem. A new statistical method based on z-scores is introduced for seed selection in proteins. A new method based on finding common cliques in protein data for feature selection is also introduced, which reduces noise in the data. We also introduce new binary classifiers for the prediction of structural transitions in proteins. These new binary classifiers achieve much higher accuracy than current traditional binary classifiers.
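The z-score seed selection idea can be sketched in a few lines (the scoring function and threshold below are assumptions for illustration; the thesis defines its own): positions whose score deviates strongly from the mean become seeds.

```python
import numpy as np

def zscore_seeds(scores, threshold=2.0):
    """Return indices whose |z-score| exceeds the threshold (assumed rule)."""
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.mean()) / scores.std()
    return np.flatnonzero(np.abs(z) > threshold)

# Hypothetical per-position scores: position 4 is a clear outlier.
scores = [1.0, 1.1, 0.9, 1.0, 5.0, 1.05, 0.95, 1.0]
seeds = zscore_seeds(scores)
print(seeds)
```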
|
96 |
Prediction Of Protein Subcellular Localization Using Global Protein Sequence Feature. Bozkurt, Burcin, 01 August 2003 (has links) (PDF)
The problem of identifying genes in eukaryotic genomic sequences by computational methods has attracted considerable research attention in recent years.
Many early approaches to the problem focused on prediction of individual functional elements and compositional properties of coding and non-coding deoxyribonucleic acid (DNA) in entire eukaryotic gene structures. More recently, a number of approaches have been developed which integrate multiple types of information, including the structure, function, and genetic properties of proteins. Knowledge of the structure of a protein is essential for describing and understanding its function. In addition, the subcellular localization of a protein can be used to provide some characterization of it. In this study, a method for the prediction of protein subcellular localization based on primary sequence data is described. Primary sequence data for a protein is based on its amino acid sequence. The frequency value of each amino acid is computed at each position. The assigned frequencies are used in a new encoding scheme that conserves biological information based on the point accepted mutation (PAM) substitution matrix. This method can be used to predict the nuclear sequences, the cytosolic sequences, the mitochondrial targeting peptides (mTP), and the signal peptides (SP). For clustering purposes, besides well-known traditional techniques, principal component analysis (PCA) and self-organizing maps (SOM) are used. For classification purposes, support vector machines (SVM), a method of statistical learning theory recently introduced to bioinformatics, is used. The aim of combining feature extraction, clustering, and classification methods is to design an accurate system that predicts the subcellular localization of proteins presented to the system. Our scheme for combining several methods is a cascading (serial) combination: the output of one method serves as the input of the next.
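The per-position frequency step can be sketched as follows (assuming, as an illustration, that the input sequences are already aligned to equal length):

```python
import numpy as np

def position_frequencies(sequences, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Frequency of each amino acid at each position across the sequences."""
    idx = {aa: i for i, aa in enumerate(alphabet)}
    length = len(sequences[0])
    freq = np.zeros((length, len(alphabet)))
    for seq in sequences:
        for pos, aa in enumerate(seq):
            freq[pos, idx[aa]] += 1
    return freq / len(sequences)

# Toy alignment: 'M' is conserved at position 0; position 1 is 2/3 K, 1/3 R.
freqs = position_frequencies(["MKT", "MKS", "MRT"])
print(freqs.shape)
```

In the described pipeline, these frequencies would then be re-weighted by a PAM substitution matrix before clustering and SVM classification.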
|
97 |
Impacts of midpoint FACTS controllers on the coordination between generator phase backup protection and generator capability limits. Elsamahy, Mohamed Salah Kamel, 15 July 2011
The thesis reports the results of comprehensive studies carried out to explore the impact of midpoint FACTS Controllers (STATCOM and SVC) on the generator distance phase backup protection in order to identify important issues that protection engineers need to consider when designing and setting a generator protection system. In addition, practical, feasible and simple solutions to mitigate the adverse impact of midpoint FACTS Controllers on the generator distance phase backup protection are explored.
The results of these studies show that midpoint FACTS Controllers have an adverse effect on the generator distance phase backup protection. This adverse effect, which can be in the form of underreach, overreach or a time delay, varies according to the fault type, fault location and generator loading. Moreover, it has been found that the adverse effect of the midpoint FACTS Controllers extends to affect the coordination between the generator distance phase backup protection and the generator steady-state overexcited capability limit.
The Support Vector Machines classification technique is proposed as a replacement for the existing generator distance phase backup protection relay in order to alleviate potential problems. It has been demonstrated that this technique is a very promising solution, as it is fast, reliable and has a high performance efficiency. This will result in enhancing the coordination between the generator phase backup protection and the generator steady-state overexcited capability limit in the presence of midpoint FACTS Controllers.
The thesis also presents the results of investigations carried out to explore the impact of the generator distance phase backup protection relay on the generator overexcitation thermal capability. The results of these investigations reveal that with the relay settings according to the current standards, the generator is over-protected and the generator distance phase backup protection relay restricts the generator overexcitation thermal capability during system disturbances. This restriction does not allow the supply of the maximum reactive power of the generating unit during such events. The restriction on the generator overexcitation thermal capability caused by the generator distance phase backup protection relay highlights the necessity to revise the relay settings. The proposed solution in this thesis is to reduce the generator distance phase backup protection relay reach in order to provide secure performance during system disturbances.
|
98 |
Efficient Kernel Methods for Statistical Detection. Su, Wanhua, 20 March 2008 (has links)
This research is motivated by a drug discovery problem: the AIDS anti-viral database from the National Cancer Institute. The objective of the study is to develop effective statistical methods to model the relationship between the chemical structure of a compound and its activity against the HIV-1 virus. As a result, the structure-activity model can be used to predict the activity of new compounds and thus help identify those active chemical compounds that can be used as drug candidates. Since active compounds are generally rare in a compound library, we recognize the drug discovery problem as an application of the so-called statistical detection problem. In a typical statistical detection problem, we have data {Xi, Yi}, where Xi is the predictor vector of the ith observation and Yi ∈ {0, 1} is its class label. The objective of a statistical detection problem is to identify class-1 observations, which are extremely rare. Besides drug discovery, other applications of statistical detection include direct marketing and fraud detection.
We propose a computationally efficient detection method called LAGO, which stands for "locally adjusted GO estimator". The original idea is inspired by an ancient game known today as "GO". The construction of LAGO consists of two steps. In the first step, we estimate the density of class 1 with an adaptive bandwidth kernel density estimator. The kernel functions are located at and only at the class-1 observations. The bandwidth of the kernel function centered at a certain class-1 observation is calculated as the average distance between this class-1 observation and its K-nearest class-0 neighbors. In the second step, we adjust the density estimated in the first step locally according to the density of class 0. It can be shown that the amount of adjustment in the second step is approximately inversely proportional to the bandwidth calculated in the first step.
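The two steps above can be sketched in one dimension (the Gaussian kernel and the default K below are illustrative assumptions; the abstract does not fix them). The step-2 division by the local class-0 density, itself roughly inversely proportional to the bandwidth, approximately cancels the kernel's 1/r normalisation, leaving an unnormalised kernel of radius proportional to r:

```python
import numpy as np

def lago_score(x, x1, x0, K=5, alpha=1.0):
    """1-D LAGO-style score: x1 = class-1 points, x0 = class-0 points."""
    x = np.asarray(x, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    score = np.zeros_like(x)
    for c in np.asarray(x1, dtype=float):
        # Step 1: bandwidth = average distance to the K nearest class-0 points
        r = np.sort(np.abs(x0 - c))[:K].mean()
        # Step 2: the local adjustment cancels the 1/r normalisation,
        # leaving a kernel of radius alpha * r centred on the class-1 point
        score += np.exp(-0.5 * ((x - c) / (alpha * r)) ** 2)
    return score

x0 = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # class-0 background
x1 = np.array([5.0])                         # a single rare class-1 point
s = lago_score([5.0, 20.0], x1, x0)
print(s)  # high at the class-1 point, near zero far away
```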
Application to the NCI data demonstrates that LAGO is superior to methods such as K nearest neighbors and support vector machines.
One drawback of the existing LAGO is that it only provides a point estimate of a test point's probability of being class 1, ignoring the uncertainty of the model. In the second part of this thesis, we present a Bayesian framework for LAGO, referred to as BLAGO. This Bayesian approach enables quantification of uncertainty. Non-informative priors are adopted. The posterior distribution is calculated over a grid of (K, alpha) pairs by integrating out beta0 and beta1 using the Laplace approximation, where K and alpha are the two parameters used to construct the LAGO score. The parameters beta0 and beta1 are the coefficients of the logistic transformation that converts the LAGO score to the probability scale. BLAGO
provides proper probabilistic predictions that have support on (0,1) and captures uncertainty of the predictions as well. By avoiding Markov chain Monte Carlo algorithms and using the Laplace approximation, BLAGO is computationally very efficient. Without the need of cross-validation, BLAGO is even more computationally efficient than LAGO.
|
99 |
Road Sign Recognition based on Invariant Features using Support Vector Machine. Gilani, Syed Hassan, January 2007 (has links)
For the last two decades, researchers have been working on developing systems that can assist drivers in the best way possible and make driving safe. Computer vision has played a crucial part in the design of these systems. With the introduction of vision techniques, various autonomous and robust real-time traffic automation systems have been designed, such as traffic monitoring, traffic-related parameter estimation, and intelligent vehicles. Among these, automatic detection and recognition of road signs has become an interesting research topic. Such a system can alert drivers to signs they do not recognize before passing them. The aim of this research project is to present an intelligent road sign recognition system based on a state-of-the-art technique, the Support Vector Machine. The project is an extension of the work done at the ITS research platform at Dalarna University [25]. The focus of this research work is on the recognition of road signs. When classifying an image, its location, size, and orientation in the image plane are irrelevant features, and one way to remove this ambiguity is to extract features that are invariant under these transformations. These invariant features are then used in a Support Vector Machine for classification. The Support Vector Machine is a supervised learning machine that solves problems in higher dimensions with the help of kernel functions and is best known for classification problems.
|
100 |
Support Vector Machines for Classification applied to Facial Expression Analysis and Remote Sensing. Jottrand, Matthieu, January 2005 (has links)
The subject of this thesis is the application of Support Vector Machines to two totally different applications: facial expression recognition and remote sensing. The basic idea of kernel algorithms is to map input data into a higher-dimensional space, the feature space, in which linear operations on the data can be performed more easily. These operations in the feature space can be expressed in terms of the input data thanks to kernel functions. The Support Vector Machine is a classifier that uses this kernel method by computing, in the feature space and on the basis of examples of the different classes, hyperplanes that separate the classes. The hyperplanes in the feature space correspond to non-linear surfaces in the input space. Concerning facial expressions, the aim is to train and test a classifier able to recognise, from pictures of faces, which emotion (among these six: anger, disgust, fear, joy, sadness, and surprise) is expressed by the person in the picture. In this application, each picture has to be seen as a point in an N-dimensional space, where N is the number of pixels in the image. The second application is the detection of camouflage nets hidden in vegetation using a hyperspectral image taken from an aircraft. In this case the classification is computed for each pixel, represented by a vector whose elements are the values of the different frequency bands at this pixel.
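The kernel idea described above can be demonstrated on a toy problem (the dataset and kernels here are illustrative, not from the thesis): concentric circles admit no separating hyperplane in the input space, but an RBF kernel finds one in the feature space, which appears as a non-linear boundary in the input space.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: linearly inseparable in the 2-D input space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

acc_linear = SVC(kernel="linear").fit(X, y).score(X, y)
acc_rbf = SVC(kernel="rbf").fit(X, y).score(X, y)
print(f"linear: {acc_linear:.2f}  rbf: {acc_rbf:.2f}")
```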
|