Global ETD Search

31	Prediction of Oxidation States of Cysteines and Disulphide Connectivity Du, Aiguo 27 November 2007 (has links) Knowledge of cysteine oxidation states and disulfide bond connectivity is of great importance to protein chemistry and 3-D structure determination. This research aims to find the most relevant features for predicting cysteine oxidation states and disulfide bond connectivity in proteins. Models predicting the oxidation states of cysteines are developed with machine learning techniques such as Support Vector Machines (SVMs) and Associative Neural Networks (ASNNs). A record-high prediction accuracy for oxidation state, 95%, is achieved by incorporating the oxidation states of N-terminus cysteines, the flanking sequences of cysteines and global information on the protein chain (number of cysteines, length of the chain, amino acid composition of the chain, etc.) into the SVM encoding. This is 5% higher than current methods. It indicates that the oxidation states of amino-terminal cysteines are informative about the oxidation states of other cysteines in the same protein chain. Satisfactory prediction results are also obtained with the newer and more inclusive SPX dataset, especially for chains with a higher number of cysteines. Compared to literature methods, our approach is a one-step prediction system, which is easier to implement and use. A side-by-side comparison of SVM and ASNN is conducted. The results indicate that SVM outperforms ASNN on this particular problem. For the prediction of correct pairings of cysteines to form disulfide bonds, we first study disulfide connectivity by calculating the local interaction potentials between the flanking sequences of the cysteine pairs. The obtained interaction potential is further adjusted by coefficients related to the binding motif of enzymes during disulfide formation and by the linear distance between the cysteine pairs. Finally, a maximum weight matching algorithm is applied and the performance of the interaction potentials is evaluated. Overall prediction accuracy is unsatisfactory compared with the literature. SVM is then used to predict the disulfide connectivity under the assumption that the oxidation states of the cysteines in the protein are known. Information on the binding region during disulfide formation, the distance between cysteine pairs, global information on the protein chain and the flanking sequences around the cysteine pairs are included in the SVM encoding. Prediction results illustrate the advantage of using possible anchor region information. protein disulphide oxidation states cysteines support vector machines Computer Sciences
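The matching step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes an even number of bonded cysteines and a precomputed score for every candidate pair; the function name `best_pairing`, the sequence positions and the scores are hypothetical and are not taken from the thesis. For clarity it searches all pairings exhaustively, which is only practical for a handful of cysteines; it is not the matching algorithm used by the author.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[int, int]


def best_pairing(
    cysteines: List[int], weight: Dict[FrozenSet[int], float]
) -> Tuple[float, List[Pair]]:
    """Return the perfect matching with the largest total pair weight.

    Exhaustive recursion: pair the first cysteine with each possible
    partner, solve the remainder, and keep the best total.
    """
    if len(cysteines) % 2 != 0:
        raise ValueError("an even number of cysteines is required")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_score, best_pairs = float("-inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        score, pairs = best_pairing(remaining, weight)
        score += weight[frozenset((first, partner))]
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs


# Hypothetical interaction scores for four cysteines at made-up positions.
scores = {
    frozenset((5, 12)): 0.2, frozenset((5, 30)): 0.9, frozenset((5, 41)): 0.1,
    frozenset((12, 30)): 0.3, frozenset((12, 41)): 0.8, frozenset((30, 41)): 0.4,
}
total, pairs = best_pairing([5, 12, 30, 41], scores)
```

With these made-up scores the selected pairing is (5, 30) and (12, 41), since no other pairing has a larger total. For longer chains a polynomial-time maximum weight matching routine would be the natural replacement.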
32	Active Learning with Semi-Supervised Support Vector Machines Chinaei, Leila January 2007 (has links) A significant problem in many machine learning tasks is that it is time-consuming and costly to gather the labeled data needed to train the learning algorithm to a reasonable level of performance. In practice, it is often the case that a small amount of labeled data is available and that more unlabeled data could be labeled on demand at a cost. If the labeled data is obtained by a process outside the control of the learner, then the learner is passive. If the learner picks the data to be labeled, then this becomes active learning. This has the advantage that the learner can pick data to gain specific information that will speed up the learning process. Support Vector Machines (SVMs) have many properties that make them attractive as a learning algorithm for many real-world applications, including classification tasks. Some researchers have proposed algorithms for active learning with SVMs, i.e. algorithms for choosing the next unlabeled instance to get a label for. Their approach is supervised in nature, since they do not consider all unlabeled instances while looking for the next instance. In this thesis, we propose three new algorithms for applying active learning to SVMs in a semi-supervised setting that takes advantage of the presence of all unlabeled points. By reducing the number of experiments needed, the suggested approaches might yield considerable savings in costly classification problems where obtaining the training data for a classifier is expensive. Active Learning Semi-Supervised Support Vector Machines Computer Science
33	Convex Large Margin Training - Unsupervised, Semi-supervised, and Robust Support Vector Machines Xu, Linli January 2007 (has links) Support vector machines (SVMs) have been a dominant machine learning technique for more than a decade. The intuitive principle behind SVM training is to find the maximum margin separating hyperplane for a given set of binary labeled training data. Previously, SVMs have been primarily applied to supervised learning problems, where target class labels are provided with the data. Developing unsupervised extensions to SVMs, where no class labels are given, turns out to be a challenging problem. In this dissertation, I propose a principled approach for unsupervised and semi-supervised SVM training by formulating convex relaxations of the natural training criterion: find a (constrained) labeling that would yield an optimal SVM classifier on the resulting labeled training data. This relaxation yields a semidefinite program (SDP) that can be solved in polynomial time. The resulting training procedures can be applied to two-class and multi-class problems, and ultimately to the multivariate case, achieving high quality results in each case. In addition to unsupervised training, I also consider the problem of reducing the outlier sensitivity of standard supervised SVM training. Here I show that a similar convex relaxation can be applied to improve the robustness of SVMs by explicitly suppressing outliers in the training process. The proposed approach can achieve superior results to standard SVMs in the presence of outliers. Artificial Intelligence Machine Learning Support Vector Machines Computer Science
36	Text Categorization for E-Government Applications: The Case of City Mayor's Mailbox Kuo, Chiung-Jung 29 August 2006 (has links) The central government and most local governments in Taiwan have adopted e-mail services that let citizens request services or express their opinions over the Internet. Traditionally, these requests and opinions must be manually classified and routed to the appropriate departments for service rendering. However, due to the ever-increasing number of requests and opinions received, manual classification is time-consuming and has become impractical. Therefore, in this study, we apply text categorization techniques to automatically construct a classification mechanism, in order to establish an efficient e-government service portal. The purpose of this thesis is to investigate the effectiveness of different text categorization methods in supporting automatic classification of the service request and opinion e-mails sent to the Mayor's mailbox. Specifically, in each phase of text categorization learning, we adopt and evaluate two methods commonly employed in prior research. In the feature selection phase, both the maximal χ² statistic method and the weighted average χ² statistic method are evaluated. In the document representation phase, we consider the Binary and TF×IDF representation schemes. Finally, we adopt the decision tree induction technique and the support vector machines (SVM) technique for inducing a text categorization model for our target e-government application. Our empirical evaluation results show that the text categorization method that employs the maximal χ² statistic method for feature selection, the Binary representation scheme, and support vector machines as the underlying induction algorithm can reach an accuracy rate of 77.28%, with recall and precision rates of more than 77%. Such satisfactory classification effectiveness suggests that the text categorization approach can be employed to establish an effective and intelligent e-government service portal. Decision Tree Induction Support Vector Machines E-government Text categorization
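The two feature selection criteria named in the abstract above can be sketched as follows. This is an illustration under assumptions, not the thesis's implementation: it uses the standard 2×2 contingency form of the χ² statistic for a term and a category, and it assumes that the "weighted average" variant weights each category's score by that category's share of the documents. The function names and the toy documents are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Doc = Tuple[Set[str], str]  # (set of terms in the document, category label)


def chi2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square for a 2x2 table.

    a: in category, has term      b: other categories, has term
    c: in category, lacks term    d: other categories, lacks term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def chi2_scores(docs: List[Doc], term: str) -> Tuple[float, float]:
    """Return (maximal chi-square, weighted average chi-square) for a term."""
    n = len(docs)
    class_sizes = Counter(label for _, label in docs)
    per_class = {}
    for label, size in class_sizes.items():
        a = sum(1 for words, lab in docs if lab == label and term in words)
        b = sum(1 for words, lab in docs if lab != label and term in words)
        per_class[label] = chi2(a, b, size - a, (n - size) - b)
    maximal = max(per_class.values())
    # Assumption: weights are the category priors P(c) = size / n.
    weighted = sum(class_sizes[lab] / n * s for lab, s in per_class.items())
    return maximal, weighted


# Hypothetical toy corpus of four mailbox messages in three departments.
toy_docs = [
    ({"road", "repair"}, "transport"),
    ({"road", "bus"}, "transport"),
    ({"refund", "form"}, "tax"),
    ({"road", "trees"}, "parks"),
]
max_score, avg_score = chi2_scores(toy_docs, "road")
```

Terms would then be ranked by one of the two scores and the top-ranked terms kept as features. The maximal variant favors terms that are highly indicative of a single category, while the weighted average favors terms that discriminate across the collection as a whole.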
37	SVM-based Robust Template Design of Cellular Neural Networks and Primary Study of Wilcoxon Learning Machines Lin, Yih-Lon 01 January 2007 (has links) This thesis is divided into two parts. In the first part, a general problem of robust template decomposition with restricted weights for cellular neural networks (CNNs) implementing an arbitrary Boolean function is investigated. In the second part, a preliminary study of the novel Wilcoxon learning machines is presented. In the first part of the thesis, on robust CNN template design, the geometric margin of a linear classifier with respect to a training data set, a notion borrowed from machine learning theory, is used to define the robustness of an uncoupled CNN implementing a linearly separable Boolean function. Consequently, the so-called maximal margin classifiers can be devised via support vector machines (SVMs) to provide the most robust template design for uncoupled CNNs implementing linearly separable Boolean functions. Some general properties of robust CNNs with or without restricted weights are discussed. Moreover, all robust CNNs with restricted weights are characterized. For an arbitrarily given Boolean function, we propose an algorithm, a generalized version of the well-known CFC algorithm, to find a sequence of robust uncoupled CNNs implementing the given Boolean function. In the second part of the thesis, we investigate the novel Wilcoxon learning machines (WLMs). The invention of these learning machines was motivated by the Wilcoxon approach to linear regression problems in statistics. The resulting linear regressors are quite robust against outliers, as is well known in statistics. The Wilcoxon learning machines investigated in this thesis include the Wilcoxon Neural Network (WNN), the Wilcoxon Generalized Radial Basis Function Network (WGRBFN), the Wilcoxon Fuzzy Neural Network (WFNN), and the Kernel-based Wilcoxon Regressor (KWR).
Cellular Neural Networks Support Vector Machines Wilcoxon Learning Machines
38	MaltParser -- An Architecture for Inductive Labeled Dependency Parsing Hall, Johan January 2006 (has links) This licentiate thesis presents a software architecture for inductive labeled dependency parsing of unrestricted natural language text, which achieves a strict modularization of parsing algorithm, feature model and learning method such that these parameters can be varied independently. The architecture is based on the theoretical framework of inductive dependency parsing by Nivre (2006) and has been realized in MaltParser, a system that supports several parsing algorithms and learning methods, for which complex feature models can be defined in a special description language. Special attention is given in this thesis to learning methods based on support vector machines (SVM). The implementation is validated in three sets of experiments using data from three languages (Chinese, English and Swedish). First, we check whether the implementation realizes the underlying architecture. The experiments show that the MaltParser system outperforms the baseline and satisfies the basic constraints of well-formedness. Furthermore, the experiments show that it is possible to vary parsing algorithm, feature model and learning method independently. Secondly, we focus on the special properties of the SVM interface. It is possible to reduce the learning and parsing time without sacrificing accuracy by dividing the training data into smaller sets, according to the part-of-speech of the next token in the current parser configuration. Thirdly, the last set of experiments presents a broad empirical study that compares SVM to memory-based learning (MBL) with five different feature models, where all combinations have gone through parameter optimization for both learning methods. The study shows that SVM outperforms MBL for more complex and lexicalized feature models with respect to parsing accuracy. There are also indications that SVM, with a splitting strategy, can achieve faster parsing than MBL. The parsing accuracy achieved is the highest reported for the Swedish data set and very close to the state of the art for Chinese and English. Dependency Parsing Support Vector Machines Machine Learning Language technology
39	Computational Prediction of Transposon Insertion Sites Ayat, Maryam 04 April 2013 (has links) Transposons are DNA segments that can move or transpose themselves to new positions within the genome of an organism. Biologists need to predict preferred insertion sites of transposons to devise strategies in functional genomics and gene therapy studies. It has been found that the deformability property of the local DNA structure of the integration sites, called Vstep, is of significant importance in the target-site selection process. We considered the Vstep profiles of insertion sites and developed predictors based on Artificial Neural Networks (ANN) and Support Vector Machines (SVM). We trained our ANN and SVM predictors with the Sleeping Beauty transposonal data, and used them for identifying preferred individual insertion sites (each 12bp in length) and regions (each 100bp in length). Running a five-fold cross-validation showed that (1) Both ANN and SVM predictors are more successful in recognizing preferred regions than preferred individual sites; (2) Both ANN and SVM predictors have excellent performance in finding the most preferred regions (more than 90% sensitivity and specificity); and (3) The SVM predictor outperforms the ANN predictor in recognizing preferred individual sites and regions. The SVM has 83% sensitivity and 72% specificity in identifying preferred individual insertion sites, and 85% sensitivity and 90% specificity in recognizing preferred insertion regions. Artificial Neural Networks Support Vector Machines Transposons Insertion Site Prediction
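The sensitivity and specificity figures quoted in the abstract above follow the usual definitions, which the short sketch below makes concrete. The function name and the example labels are hypothetical; the labels are made up for illustration and are not data from the thesis.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) for binary labels (1 = preferred site).

    sensitivity = TP / (TP + FN): share of true preferred sites recovered.
    specificity = TN / (TN + FP): share of non-preferred sites rejected.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up labels: 4 preferred and 6 non-preferred candidate sites.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(truth, predicted)
```

In a five-fold cross-validation such as the one described, these two rates would typically be computed on each held-out fold and then averaged.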
40	End-to-End Single-rate Multicast Congestion Detection Using Support Vector Machines. Liu, Xiaoming. January 2008 (has links) IP multicast is an efficient mechanism for simultaneously transmitting bulk data to multiple receivers. Many applications can benefit from multicast, such as audio and videoconferencing, multi-player games, multimedia broadcasting, distance education, and data replication. For either technical or policy reasons, IP multicast has still not been deployed in today's Internet. Congestion is one of the most important issues impeding the development and deployment of IP multicast and multicast applications. Multicast Congestion Detection Machine Learning Accumulation Measurement Support Vector Machines.

Search results