31 
Prediction of Oxidation States of Cysteines and Disulphide Connectivity. Du, Aiguo, 27 November 2007.
Knowledge of cysteine oxidation states and disulfide bond connectivity is of great importance to protein chemistry and 3D structure. This research aims to find the most relevant features for predicting the oxidation states of cysteines and the disulfide bond connectivity of proteins. Models predicting the oxidation states of cysteines are developed with machine learning techniques such as Support Vector Machines (SVMs) and Associative Neural Networks (ASNNs). A record-high prediction accuracy for oxidation state, 95%, is achieved by incorporating into the SVM encoding the oxidation states of N-terminal cysteines, the flanking sequences of cysteines, and global information on the protein chain (number of cysteines, length of the chain, amino acid composition of the chain, etc.). This is 5% higher than current methods, which indicates that the oxidation states of amino-terminal cysteines are informative about the oxidation states of the other cysteines in the same protein chain. Satisfactory prediction results are also obtained with the newer and more inclusive SPX dataset, especially for chains with larger numbers of cysteines. Compared to literature methods, our approach is a one-step prediction system, which is easier to implement and use. A side-by-side comparison of SVMs and ASNNs indicates that SVMs outperform ASNNs on this particular problem. For the prediction of correct pairings of cysteines into disulfide bonds, we first study disulfide connectivity by calculating the local interaction potentials between the flanking sequences of cysteine pairs. The obtained interaction potential is further adjusted by coefficients related to the binding motif of enzymes during disulfide formation and by the linear distance between the cysteine pairs. Finally, a maximum weight matching algorithm is applied and the performance of the interaction potentials is evaluated. The overall prediction accuracy is unsatisfactory compared with the literature.
An SVM is then used to predict disulfide connectivity under the assumption that the oxidation states of the cysteines on the protein are known. Information on the binding region during disulfide formation, the distance between cysteine pairs, global information about the protein chain, and the flanking sequences around the cysteine pairs is included in the SVM encoding. The prediction results illustrate the advantage of using information about possible anchor regions.
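The maximum weight matching step mentioned above can be sketched as follows. This is an illustrative brute-force version (the function name and the dictionary-of-scores input are assumptions, not the thesis's implementation, which would use a proper matching algorithm); it is feasible only for the small cysteine counts typical of a single protein chain.

```python
def best_pairing(scores):
    """Brute-force maximum weight matching over an even number of
    cysteines. `scores[(i, j)]` holds the (symmetric) interaction
    potential for pairing cysteines i and j; the pairing with the
    largest total potential is returned."""
    n = 1 + max(max(p) for p in scores)
    indices = list(range(n))

    def matchings(items):
        # Recursively enumerate every perfect matching of `items`.
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for k, _ in enumerate(rest):
            for tail in matchings(rest[:k] + rest[k + 1:]):
                yield [(first, rest[k])] + tail

    def total(matching):
        # Look the score up in either key order, defaulting to 0.
        return sum(
            scores.get(p, scores.get((p[1], p[0]), 0.0)) for p in matching
        )

    return max(matchings(indices), key=total)
```

For four cysteines where the (0,2) and (1,3) potentials dominate, the function returns that pairing rather than the sequential one.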

32 
Active Learning with Semi-Supervised Support Vector Machines. Chinaei, Leila, January 2007.
A significant problem in many machine learning tasks is that it is time-consuming and costly to gather the labeled data needed to train the learning algorithm to a reasonable level of performance. In practice, it is often the case that a small amount of labeled data is available and that more unlabeled data can be labeled on demand at a cost. If the labeled data are obtained by a process outside the learner's control, the learner is passive. If the learner picks the data to be labeled, this becomes active learning, which has the advantage that the learner can choose data that yields specific information to speed up the learning process. Support Vector Machines (SVMs) have many properties that make them attractive as a learning algorithm for real-world applications, including classification tasks. Some researchers have proposed algorithms for active learning with SVMs, i.e. algorithms for choosing the next unlabeled instance for which to obtain a label. Their approaches are supervised in nature, since they do not consider all unlabeled instances when looking for the next instance. In this thesis, we propose three new algorithms for active learning with SVMs in a semi-supervised setting that takes advantage of all the unlabeled points. By reducing the number of labeling requests needed, the suggested approaches may yield considerable savings in classification problems where obtaining training data is expensive.
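A common active-learning query strategy for SVMs (not necessarily one of the three algorithms proposed in this thesis) is to query the unlabeled point closest to the current decision boundary, i.e. the point the classifier is least certain about. A minimal sketch for a fixed linear classifier, with hypothetical names:

```python
def closest_to_margin(w, b, unlabeled):
    """Return the index of the unlabeled point nearest the hyperplane
    w.x + b = 0, the standard margin-based query selection rule: the
    point with the smallest geometric distance to the boundary is the
    one whose label the current SVM is least certain of."""
    def distance(x):
        dot = sum(wi * xi for wi, xi in zip(w, x))
        norm = sum(wi * wi for wi in w) ** 0.5
        return abs(dot + b) / norm

    return min(range(len(unlabeled)), key=lambda i: distance(unlabeled[i]))
```

In a full active-learning loop, the selected point would be labeled, added to the training set, and the SVM retrained before the next query.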

33 
Convex Large Margin Training: Unsupervised, Semi-Supervised, and Robust Support Vector Machines. Xu, Linli, January 2007.
Support vector machines (SVMs) have been a dominant machine learning technique for more than a decade. The intuitive principle behind SVM training is to find the maximum-margin separating hyperplane for a given set of binary-labeled training data. Previously, SVMs have been applied primarily to supervised learning problems, where target class labels are provided with the data. Developing unsupervised extensions of SVMs, where no class labels are given, turns out to be a challenging problem. In this dissertation, I propose a principled approach to unsupervised and semi-supervised SVM training by formulating convex relaxations of the natural training criterion: find a (constrained) labeling that would yield an optimal SVM classifier on the resulting labeled training data. This relaxation yields a semidefinite program (SDP) that can be solved in polynomial time. The resulting training procedures can be applied to two-class and multi-class problems, and ultimately to the multivariate case, achieving high-quality results in each case. In addition to unsupervised training, I also consider the problem of reducing the outlier sensitivity of standard supervised SVM training. Here I show that a similar convex relaxation can be applied to improve the robustness of SVMs by explicitly suppressing outliers in the training process. The proposed approach achieves superior results to standard SVMs in the presence of outliers.
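The relaxation described above can be sketched in one standard form (following the usual max-margin clustering formulation; the bias and class-balance constraints are omitted for brevity, and the exact formulation in the dissertation may differ). The combinatorial criterion is

```latex
\min_{y \in \{-1,+1\}^n} \; \max_{0 \le \alpha \le C} \;
  \alpha^\top e \;-\; \tfrac{1}{2}\, \alpha^\top \bigl(K \circ y y^\top\bigr)\, \alpha ,
```

where $K$ is the kernel matrix and $\circ$ denotes the elementwise product. Replacing the rank-one matrix $y y^\top$ with a variable $M$ gives the convex relaxation

```latex
\min_{M \succeq 0, \; \operatorname{diag}(M) = e} \; \max_{0 \le \alpha \le C} \;
  \alpha^\top e \;-\; \tfrac{1}{2}\, \alpha^\top (K \circ M)\, \alpha ,
```

which is convex in $M$ because the inner maximum of linear functions of $M$ is convex, and can be cast as a semidefinite program solvable in polynomial time.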

36 
Text Categorization for E-Government Applications: The Case of the City Mayor's Mailbox. Kuo, Chiung-Jung, 29 August 2006.
The central government and most local governments in Taiwan have adopted email services through which citizens can request services or express their opinions over the Internet. Traditionally, these requests and opinions must be classified manually and routed to the appropriate departments for service rendering. However, with the ever-increasing number of requests and opinions received, this manual approach is time-consuming and has become impractical. In this study, we therefore apply text categorization techniques to automatically construct a classification mechanism, with the goal of establishing an efficient e-government service portal.
The purpose of this thesis is to investigate the effectiveness of different text categorization methods in automatically classifying service request and opinion emails sent to the Mayor's mailbox. Specifically, in each phase of text categorization learning, we adopt and evaluate two methods commonly employed in prior research. In the feature selection phase, both the maximal χ² statistic and the weighted-average χ² statistic are evaluated. We consider the binary and TF×IDF representation schemes in the document representation phase. Finally, we adopt the decision tree induction technique and the support vector machine (SVM) technique for inducing a text categorization model for our target e-government application. Our empirical evaluation shows that the text categorization method employing the maximal χ² statistic for feature selection, the binary representation scheme, and support vector machines as the underlying induction algorithm reaches an accuracy of 77.28%, with recall and precision rates both above 77%. This satisfactory classification effectiveness suggests that the text categorization approach can be employed to establish an effective and intelligent e-government service portal.
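The χ² feature-selection score used above can be sketched from the standard 2×2 contingency-table formula for a term/category pair (function names are illustrative, not the thesis's code):

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a term/category pair from a 2x2
    contingency table: a = in-category docs containing the term,
    b = out-of-category docs containing it, c = in-category docs
    lacking it, d = out-of-category docs lacking it."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # Independence (no association) gives 0; guard against empty margins.
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def max_chi_square(per_category):
    """The 'maximal chi-square' criterion: a term's overall score is
    its best chi-square over all categories."""
    return max(chi_square(*counts) for counts in per_category)
```

A term that appears in every in-category document and no out-of-category document gets the maximum score, while a term distributed independently of the category scores zero; terms are then ranked by this score and the top ones kept as features.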

37 
SVM-based Robust Template Design of Cellular Neural Networks and a Primary Study of Wilcoxon Learning Machines. Lin, Yih-Lon, 1 January 2007.
This thesis is divided into two parts. In the first part, the general problem of robust template decomposition with restricted weights for cellular neural networks (CNNs) implementing an arbitrary Boolean function is investigated. In the second part, a preliminary study of the novel Wilcoxon learning machines is presented.
In the first part of the thesis, on robust CNN template design, the geometric margin of a linear classifier with respect to a training data set, a notion borrowed from machine learning theory, is used to define the robustness of an uncoupled CNN implementing a linearly separable Boolean function. Consequently, so-called maximal-margin classifiers can be devised via support vector machines (SVMs) to provide the most robust template design for uncoupled CNNs implementing linearly separable Boolean functions. Some general properties of robust CNNs with or without restricted weights are discussed, and all robust CNNs with restricted weights are characterized. For an arbitrarily given Boolean function, we propose an algorithm, a generalized version of the well-known CFC algorithm, to find a sequence of robust uncoupled CNNs implementing the given Boolean function.
In the second part of the thesis, we investigate the novel Wilcoxon learning machines (WLMs). These learning machines were motivated by the Wilcoxon approach to linear regression in statistics, whose resulting linear regressors are, as is well known, quite robust against outliers. The Wilcoxon learning machines investigated in this thesis include the Wilcoxon Neural Network (WNN), the Wilcoxon Generalized Radial Basis Function Network (WGRBFN), the Wilcoxon Fuzzy Neural Network (WFNN), and the Kernel-based Wilcoxon Regressor (KWR).
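The Wilcoxon approach to regression replaces the squared-error criterion with a rank-based dispersion of the residuals. A sketch of the standard Jaeckel dispersion with Wilcoxon scores, which is the usual objective behind such regressors (the thesis's exact objective may differ):

```python
import math

def wilcoxon_dispersion(residuals):
    """Jaeckel's dispersion with Wilcoxon scores: each residual e_i is
    weighted by a(R(e_i)) = sqrt(12) * (R(e_i)/(n+1) - 1/2), where
    R(e_i) is its rank. The weights are bounded regardless of how
    large a residual is, which is what makes minimizing this
    criterion robust against outliers."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return sum(
        math.sqrt(12) * (ranks[i] / (n + 1) - 0.5) * residuals[i]
        for i in range(n)
    )
```

A Wilcoxon regressor minimizes this dispersion of the residuals over its parameters (e.g. the network weights) instead of the residual sum of squares; note that constant shifts of the residuals leave the dispersion unchanged, so an intercept must be estimated separately.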

38 
MaltParser: An Architecture for Inductive Labeled Dependency Parsing. Hall, Johan, January 2006.
This licentiate thesis presents a software architecture for inductive labeled dependency parsing of unrestricted natural language text, which achieves a strict modularization of parsing algorithm, feature model, and learning method such that these parameters can be varied independently. The architecture is based on the theoretical framework of inductive dependency parsing by Nivre (2006) and has been realized in MaltParser, a system that supports several parsing algorithms and learning methods and allows complex feature models to be defined in a special description language. Special attention is given in this thesis to learning methods based on support vector machines (SVMs).

The implementation is validated in three sets of experiments using data from three languages (Chinese, English, and Swedish). First, we check whether the implementation realizes the underlying architecture. The experiments show that the MaltParser system outperforms the baseline and satisfies the basic constraints of well-formedness, and that it is possible to vary parsing algorithm, feature model, and learning method independently. Second, we focus on the special properties of the SVM interface: it is possible to reduce learning and parsing time without sacrificing accuracy by dividing the training data into smaller sets according to the part of speech of the next token in the current parser configuration. Third, the last set of experiments presents a broad empirical study comparing SVMs to memory-based learning (MBL) with five different feature models, where all combinations have gone through parameter optimization for both learning methods. The study shows that SVMs outperform MBL for more complex and lexicalized feature models with respect to parsing accuracy. There are also indications that SVMs, with a splitting strategy, can achieve faster parsing than MBL. The parsing accuracy achieved is the highest reported for the Swedish data set and very close to the state of the art for Chinese and English.
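The splitting strategy described above, dividing training data by the part of speech of the next token, can be sketched as a simple bucketing step; the `"pos_next"` feature key and the instance format are illustrative assumptions, not MaltParser's actual data structures:

```python
from collections import defaultdict

def split_by_pos(instances):
    """Partition (feature_dict, transition) training instances by the
    POS tag of the next token in the parser configuration, so that one
    small classifier can be trained per bucket instead of a single
    large one, reducing both training and prediction time."""
    buckets = defaultdict(list)
    for features, transition in instances:
        buckets[features["pos_next"]].append((features, transition))
    return dict(buckets)
```

At parsing time, the same key lookup routes each configuration to the classifier trained on its bucket.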

39 
Computational Prediction of Transposon Insertion Sites. Ayat, Maryam, 4 April 2013.
Transposons are DNA segments that can move, or transpose, themselves to new positions within the genome of an organism. Biologists need to predict the preferred insertion sites of transposons to devise strategies in functional genomics and gene therapy studies. It has been found that the deformability of the local DNA structure at integration sites, called the V-step, is of significant importance in the target-site selection process. We considered the V-step profiles of insertion sites and developed predictors based on Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs). We trained our ANN and SVM predictors on Sleeping Beauty transposon data and used them to identify preferred individual insertion sites (each 12 bp in length) and preferred regions (each 100 bp in length). A five-fold cross-validation showed that (1) both the ANN and SVM predictors are more successful at recognizing preferred regions than preferred individual sites; (2) both predictors have excellent performance in finding the most preferred regions (more than 90% sensitivity and specificity); and (3) the SVM predictor outperforms the ANN predictor at recognizing preferred individual sites and regions. The SVM predictor has 83% sensitivity and 72% specificity in identifying preferred individual insertion sites, and 85% sensitivity and 90% specificity in recognizing preferred insertion regions.
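The five-fold cross-validation used above can be sketched as follows; this is a generic index-splitting helper (the function name is an assumption), not the thesis's evaluation code:

```python
def five_fold_splits(n_samples, n_folds=5):
    """Yield (train_indices, test_indices) pairs for k-fold
    cross-validation. Indices are dealt round-robin into folds, so
    every sample appears in exactly one test fold and in the training
    set of every other fold."""
    indices = list(range(n_samples))
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = set(folds[k])
        train = [i for i in indices if i not in test]
        yield train, folds[k]
```

Sensitivity and specificity would then be averaged over the five held-out folds; for sequence data, splitting by protein or chromosome rather than by individual site avoids leaking near-duplicate windows between folds.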

40 
End-to-End Single-rate Multicast Congestion Detection Using Support Vector Machines. Liu, Xiaoming, January 2008.
IP multicast is an efficient mechanism for simultaneously transmitting bulk data to multiple receivers. Many applications can benefit from multicast, such as audio and video conferencing, multiplayer games, multimedia broadcasting, distance education, and data replication. For either technical or policy reasons, IP multicast has still not been deployed in today's Internet. Congestion is one of the most important issues impeding the development and deployment of IP multicast and multicast applications.
