1

Customizing kernels in Support Vector Machines

Zhang, Zhanyang 18 May 2007 (has links)
Support Vector Machines (SVMs) have been used for classification and regression analysis. One important part of an SVM is its kernel. Although there are several widely used kernel functions, a carefully designed kernel can improve the accuracy of an SVM. We present two methods for customizing kernels: one combines existing kernels into new kernels, and the other performs feature selection. We give a theoretical analysis of the feature spaces induced by combined kernels. Furthermore, an experiment on a chemical data set showed improvements of a linear-Gaussian combined kernel over single kernels. Though the improvements are not universal, we present a new approach to creating kernels for SVMs.
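Since any positively weighted sum of positive semi-definite kernels is itself a valid kernel, the linear-Gaussian combination described above is easy to prototype. Below is a minimal sketch (not the author's code) using scikit-learn's SVC with a callable kernel; the mixing weight alpha and RBF width gamma are illustrative assumptions, not values from the thesis.

```python
import numpy as np
from sklearn.svm import SVC

def combined_kernel(X, Y, alpha=0.5, gamma=0.1):
    """Convex combination of a linear and a Gaussian (RBF) kernel.

    A weighted sum of two positive semi-definite kernels is again a
    valid kernel, so it can be passed to SVC as a callable.
    """
    linear = X @ Y.T
    # Squared Euclidean distances between all rows of X and Y.
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    rbf = np.exp(-gamma * sq_dists)
    return alpha * linear + (1.0 - alpha) * rbf

# Toy usage: the callable receives the raw feature matrices.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 5))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
clf = SVC(kernel=combined_kernel).fit(X_train, y_train)
print(clf.predict(X_train[:5]))
```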
2

Age and Gender Recognition for Speech Applications based on Support Vector Machines

Erokyar, Hasan 30 October 2014 (has links)
Automatic age and gender recognition for speech applications is important for a number of reasons. One is that it can improve human-machine interaction: for example, advertisements can be tailored to the age and gender of the person on the phone. It can also help identify suspects in criminal cases, or at least narrow down the number of suspects. Other uses include adapting waiting-queue music, where a different type of music can be played according to the caller's age and gender, and gathering age and gender statistics for a specific population. Machine learning is the branch of artificial intelligence that aims to learn from data. It has a long history, but due to limitations such as the cost of computation and inefficient algorithms, it was long not applied to speech recognition tasks. Only in the last decade have researchers started to apply these algorithms to real-world tasks such as speech recognition, computer vision, finance, banking, and robotics. In this thesis, age and gender recognition was performed using a popular machine learning algorithm, and the performance of the system was compared across features. The dataset consisted of real-life examples, so the system is adaptable to real-world applications. Digital signal processing techniques were used to remove noise and to extract features from the speech examples. The speech features used in this work were pitch frequency and cepstral representations, and the performance of the age and gender recognition system depends on the features used. The fundamental frequency was selected as the first speech feature, since it is the main differentiating factor between male and female speakers and also differs between age groups. To estimate it, the harmonic-to-subharmonic ratio method was used: the speech was divided into frames, the fundamental frequency of each frame was calculated, and the mean value over all frames was taken as the speaker's fundamental frequency. It turns out that fundamental frequency is a good discriminator not only of gender but also of age groups, simply because the age groups have distinct fundamental frequencies. Mel Frequency Cepstral Coefficients (MFCC) are a good feature for speech recognition, so they were selected as well; using MFCC, the age and gender recognition accuracies were satisfactory. As an alternative to MFCC, Shifted Delta Cepstral (SDC) coefficients were used. SDC features are extracted from MFCC, and their advantage is robustness to noisy data: they capture the essential information in noisy speech better. The experiments showed that SDC did not give better recognition rates, because the dataset did not contain much noise. Lastly, a combination of pitch and MFCC was used to obtain even better recognition rates. The final fused system has an overall recognition accuracy of 64.20% on the ELSDSR [32] speech corpus.
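The pitch-plus-MFCC pipeline can be sketched with standard tools. The snippet below is an illustration rather than the thesis implementation: it assumes the librosa library, uses librosa's pyin pitch tracker in place of the harmonic-to-subharmonic ratio method described above, and invents the sampling rate and classifier settings.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def utterance_features(path, sr=16000):
    """Mean pitch plus mean/std MFCCs for one utterance.

    Mirrors the thesis pipeline at a high level: frame-level pitch and
    cepstral features averaged over the utterance. (librosa's pyin
    tracker is used here instead of the harmonic-to-subharmonic ratio
    method described in the thesis.)
    """
    y, sr = librosa.load(path, sr=sr)
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    mean_f0 = np.nanmean(f0)  # mean pitch over voiced frames
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.hstack([mean_f0, mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical usage: paths and labels would come from a corpus list.
# X = np.vstack([utterance_features(p) for p in wav_paths])
# clf = SVC(kernel="rbf").fit(X, age_gender_labels)
```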
3

Efficient Algorithms for Structured Output Learning

Balamurugan, P January 2014 (has links) (PDF)
Structured output learning is the machine learning task of building a classifier to predict structured outputs. Structured outputs arise in several contexts in diverse applications like natural language processing, computer vision, bioinformatics and social networks. Unlike simple two- (or multi-)class outputs, which belong to a set of distinct or univariate categories, structured outputs are composed of multiple components with complex interdependencies amongst them. As an illustrative example, consider the natural language processing task of tagging a sentence with its corresponding part-of-speech tags. The part-of-speech tag sequence is an example of a structured output, as it is made up of multiple components whose interactions are governed by the underlying properties of the language. This thesis provides efficient solutions for different problems pertaining to structured output learning. The classifier for structured outputs is generally built by learning a suitable model from a set of training examples labeled with their associated structured outputs. Discriminative techniques like Structural Support Vector Machines (Structural SVMs) and Conditional Random Fields (CRFs) are popular alternatives developed for structured output learning. The thesis contributes towards developing efficient training strategies for structural SVMs. In particular, an efficient sequential optimization method is proposed for structural SVMs, which is faster than several competing methods. An extension of the sequential method to CRFs is also developed. The sequential method is adapted to a variant of structural SVM with linear cumulative loss. The thesis also presents a systematic empirical evaluation of the various training methods available for structured output learning, which will be useful to the practitioner. To train structural SVMs in the presence of a vast number of training examples without labels, the thesis develops a simple semi-supervised technique based on switching the labels of the components of the structured output. The proposed technique is general, and its efficacy is demonstrated using experiments on different benchmark applications. Another contribution of the thesis is towards the design of fast algorithms for sparse structured output learning. Efficient alternating optimization algorithms are developed for sparse classifier design. These algorithms are shown to achieve sparse models faster, when compared to existing methods.
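The POS-tagging example hinges on inference over interdependent components: training a structural SVM or CRF must repeatedly solve an argmax over all tag sequences. A minimal sketch of that subproblem for a chain model, with invented scores, is given below.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Argmax over tag sequences for a chain-structured model.

    emissions: (T, K) per-position tag scores; transitions: (K, K)
    scores for adjacent tag pairs. This is the inference subproblem
    that structural SVM / CRF training must solve repeatedly.
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # Best previous tag for each current tag.
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    tags = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        tags.append(int(back[t][tags[-1]]))
    return tags[::-1]

# Toy POS-style example with 3 tags over a 4-word sentence.
rng = np.random.default_rng(1)
print(viterbi_decode(rng.normal(size=(4, 3)), rng.normal(size=(3, 3))))
```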
4

Nouveaux Algorithmes pour l'Apprentissage de Machines à Vecteurs Supports sur de Grandes Masses de Données / New Algorithms for Learning Support Vector Machines on Large-Scale Data

Bordes, Antoine 09 February 2010 (has links) (PDF)
The Internet, along with all the modern digital means available for communicating, getting informed or being entertained, generates data in ever-increasing quantities. In fields as varied as information retrieval, bioinformatics, computational linguistics and digital security, automatic methods capable of organizing, classifying or transforming terabytes of data provide invaluable help. Machine learning deals with the design of algorithms for training such tools from learning examples. Using some of these methods to automate the processing of complex problems, particularly when the quantities of data involved are beyond what human operators can handle, seems inevitable. Unfortunately, most current learning algorithms, although effective on small databases, exhibit a computational complexity that makes them unusable on very large datasets. There is therefore a clear need in the machine learning community for methods that can be trained on large-scale training sets, and can thus cope with the colossal quantities of information generated daily. We develop these issues and challenges in Chapter 1. In this manuscript, we propose solutions to reduce the training time and memory requirements of learning algorithms without degrading their accuracy. We focus in particular on Support Vector Machines (SVMs), popular methods generally used for automatic classification tasks but adaptable to other applications. We describe SVMs in detail in Chapter 2. Then, in Chapter 3, we study the stochastic gradient descent learning process for linear SVMs. This leads us to define and study a new algorithm, SGD-QN. After that, we introduce a new learning procedure: the "Process/Reprocess" principle, and derive three algorithms that use it. The Huller and LaSVM are presented in Chapter 4; they learn SVMs for binary classification problems (deciding between two classes). For the more complex task of structured output prediction, we then modify the LaSVM algorithm in depth, which leads to the LaRank algorithm presented in Chapter 5. Our final contribution concerns the recent problem of learning under ambiguous supervision, for which we propose a new theoretical framework (and an associated algorithm) in Chapter 6, applied to the problem of semantic labeling of natural language. All the algorithms introduced in this thesis achieve state-of-the-art performance, particularly with respect to training speed. Most of them have been published in international journals or conference proceedings, and efficient implementations of each method have been made available. As far as possible, we describe our new algorithms in the most general terms to ease their application to new tasks, some of which we sketch in Chapter 7.
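The stochastic gradient process studied in Chapter 3 can be summarized in a few lines. The sketch below shows plain SGD on the L2-regularized hinge loss — the baseline that SGD-QN accelerates with diagonal curvature estimates; the step-size schedule and regularization constant are illustrative, not values from the thesis.

```python
import numpy as np

def sgd_linear_svm(X, y, lam=1e-4, epochs=5):
    """Plain SGD on the L2-regularized hinge loss.

    y must be in {-1, +1}. This is the baseline scheme studied in
    Chapter 3; SGD-QN additionally rescales each coordinate's step
    with diagonal curvature estimates (not shown here).
    """
    n, d = X.shape
    w, t = np.zeros(d), 0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)  # classic 1/(lambda*t) schedule
            w *= 1.0 - eta * lam   # gradient of the regularizer
            if y[i] * (w @ X[i]) < 1.0:  # margin violation
                w += eta * y[i] * X[i]   # hinge-loss subgradient step
    return w

# Toy usage on (nearly) linearly separable data.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
w = sgd_linear_svm(X, y)
print((np.sign(X @ w) == y).mean())
```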
5

From protein sequence to structural instability and disease

Wang, Lixiao January 2010 (has links)
A great challenge in bioinformatics is to accurately predict protein structure and function from the amino acid sequence, including annotation of protein domains, identification of disordered regions, and detection of protein stability changes resulting from amino acid mutations. The combination of bioinformatics, genomics and proteomics is essential for investigating biological, cellular and molecular aspects of disease, and can therefore greatly contribute to the understanding of protein structures and facilitate drug discovery. In this thesis, three machine learning methods applied to three different but related structural bioinformatics tasks are presented: profile Hidden Markov Models (HMMs) to identify remote sequence homologues on the basis of protein domains; Conditional Random Fields (CRFs) to predict order and disorder in proteins; and Support Vector Machines (SVMs) to detect protein stability changes due to single mutations. To facilitate studies of structural instability and disease, these methods are implemented in three web servers: FISH, OnD-CRF and ProSMS, respectively. For FISH, most of the work presented in the thesis focuses on the design and construction of the web server. The server is based on a collection of structure-anchored hidden Markov models (saHMMs), which are used to identify structural similarity at the protein domain level. For the order and disorder prediction server, OnD-CRF, I implemented two schemes to alleviate the imbalance between ordered and disordered amino acids in the training dataset. One prunes the protein sequences to obtain a balanced training dataset; the other seeks the optimal p-value cut-off for discriminating between ordered and disordered amino acids. Both schemes enhance the sensitivity of detecting disordered amino acids in proteins. In addition, the output of the OnD-CRF web server can also be used to identify flexible regions and to predict the effect of mutations on protein stability. For ProSMS, we propose, after careful evaluation of different methods, a homology-clustered and a non-clustered model for three-state classification of protein stability changes due to single amino acid mutations. Results for the non-clustered model reveal that sequence-only prediction accuracy is comparable to the accuracy based on protein 3D structure information. In the case of the clustered model, however, the prediction accuracy is significantly improved when protein tertiary structure information, in the form of local environmental conditions, is included. Comparing the prediction accuracies of the two models indicates that predicting the mutation stability of non-homologous proteins is still a challenging task. Benchmarking shows that, as stand-alone programs, these predictors are comparable or superior to previously established predictors. Combined into a program package, these mutually complementary predictors will facilitate the understanding of structural instability and disease from protein sequence.
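The two imbalance-handling schemes can be illustrated generically: pruning the majority class to balance the training set, and tuning a probability cut-off on held-out residues. The sketch below shows the idea only, on synthetic data; it is not the OnD-CRF code.

```python
import numpy as np

def prune_to_balance(X, y, rng):
    """Scheme 1: subsample the majority class down to the minority size."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.hstack([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

def best_cutoff(p_disorder, y_true, grid=np.linspace(0.05, 0.95, 19)):
    """Scheme 2: pick the probability threshold that maximizes
    balanced accuracy on held-out residues (1 = disordered)."""
    def balanced_acc(thr):
        pred = (p_disorder >= thr).astype(int)
        sens = (pred[y_true == 1] == 1).mean()  # disorder sensitivity
        spec = (pred[y_true == 0] == 0).mean()  # order specificity
        return 0.5 * (sens + spec)
    return max(grid, key=balanced_acc)

rng = np.random.default_rng(3)
y = (rng.random(1000) < 0.1).astype(int)              # 10% "disordered"
p = np.clip(0.3 * y + rng.random(1000) * 0.7, 0, 1)   # toy scores
Xb, yb = prune_to_balance(p.reshape(-1, 1), y, rng)
print(len(yb), best_cutoff(p, y))
```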
6

A Segment-based Approach to Classify Agricultural Lands Using Multi-temporal Kompsat-2 and Envisat ASAR Data

Ozdarici Ok, Asli 01 February 2012 (has links) (PDF)
Agriculture has an important role in Turkey; hence, automated approaches are crucial to maintain the sustainability of agricultural activities. The objective of this research is to classify eight crop types cultivated in the Karacabey Plain, located in north-west Turkey, using multi-temporal Kompsat-2 and Envisat ASAR satellite data. To fulfil this objective, the fused Kompsat-2 images were first segmented separately to delineate homogeneous agricultural patches. The segmentation results were evaluated using multiple goodness measures to find the optimum segments. Next, the multispectral single-date Kompsat-2 images, together with the Envisat ASAR data, were classified with the MLC and SVM algorithms. To combine the thematic information of the multi-temporal data set, probability maps were generated for each classification result, and the accuracies of the thematic maps were then evaluated in a segment-based manner. The results indicated that the segment-based approach based on the SVM method using the multispectral Kompsat-2 and Envisat ASAR data provided the best classification accuracies. The combined thematic maps of June-August and June-July-August provided the highest overall accuracy and kappa value, around 92% and 0.90 respectively, which was 4% better than the best result computed with the MLC method. The produced thematic maps were also evaluated in a field-based manner, and the analysis revealed that classification performance is directly proportional to the size of the agricultural fields.
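The multi-temporal combination step can be illustrated with arrays: per-date class-probability maps are fused, and each segment then receives the class with the highest mean probability over its pixels. The numpy sketch below assumes simple shapes and simple averaging; it is not the study's actual processing chain.

```python
import numpy as np

def combine_dates(prob_maps):
    """Average per-date class-probability maps.

    prob_maps: list of (H, W, n_classes) arrays, one per image date.
    Averaging is one simple way to fuse multi-temporal evidence.
    """
    return np.mean(prob_maps, axis=0)

def segment_labels(fused, segments):
    """Assign each segment the class with the highest mean probability
    over its pixels (a segment-based decision instead of per-pixel)."""
    labels = {}
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        labels[seg_id] = int(fused[mask].mean(axis=0).argmax())
    return labels

# Toy example: two dates, a 4x4 scene, 3 crop classes, 2 segments.
rng = np.random.default_rng(4)
dates = [rng.dirichlet(np.ones(3), size=(4, 4)) for _ in range(2)]
segments = np.repeat([[0, 0, 1, 1]], 4, axis=0)
print(segment_labels(combine_dates(dates), segments))
```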
7

Σύγχρονες τεχνικές στις διεπαφές ανθρώπινου εγκεφάλου - υπολογιστή / Modern techniques in brain-computer interfaces

Τσιλιγκιρίδης, Βασίλειος 16 June 2011 (has links)
Brain-Computer Interface (BCI) systems require real-time, efficient processing of the measurements of the user's electroencephalographic (EEG) signals in order to translate mental processes/intentions into control signals for external devices or systems. In this work, the theoretical background of the problem was studied and the main techniques in use today were briefly reviewed. In addition, a method was presented for classifying the mental intentions of left- and right-hand movement, applied to real medical data. The extraction of the features that differentiate the two states was based on time-frequency information, obtained by filtering the raw EEG data with causal Morlet wavelets, while two reliable methods were developed and compared for the subsequent classification of the features. The first builds Gaussian probabilistic models for each class of movement intention, with the final classification decision made by the naive Bayes classifier, while the second builds a classification model based on the theoretical framework of Support Vector Machines (SVMs). The goal of the binary classification problem is to decide, as quickly and reliably as possible, to which of the two classes a given mental intention belongs, so that the designed algorithm can support real-time feedback of the final decision to the user.
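The feature-and-classifier chain can be sketched as follows: band power from a complex Morlet wavelet per channel, then the two classifiers compared in the thesis. Note the sketch uses a standard (non-causal) Morlet for simplicity, and the sampling rate, frequency band, and trial data are invented.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def morlet_power(signal, fs, freq, n_cycles=7):
    """Band power of one EEG channel at `freq` Hz via a complex
    Morlet wavelet (Gaussian-windowed complex exponential)."""
    sigma_t = n_cycles / (2 * np.pi * freq)
    t = np.arange(-3 * sigma_t, 3 * sigma_t, 1 / fs)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    analytic = np.convolve(signal, wavelet, mode="same")
    return np.mean(np.abs(analytic) ** 2)

# Toy trials: mu-band (10 Hz) power on two channels as features.
fs, rng = 250, np.random.default_rng(5)
trials = rng.normal(size=(60, 2, fs))          # 60 trials, 2 ch, 1 s
y = rng.integers(0, 2, size=60)                # left vs right hand
X = np.array([[morlet_power(tr[ch], fs, 10.0) for ch in range(2)]
              for tr in trials])
for clf in (GaussianNB(), SVC(kernel="rbf")):
    print(type(clf).__name__, clf.fit(X, y).score(X, y))
```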
8

Textual data mining applications for industrial knowledge management solutions

Ur-Rahman, Nadeem January 2010 (has links)
In recent years knowledge has become an important resource for enhancing business, and many activities are required to manage these knowledge resources well and help companies remain competitive in industrial environments. The data available in most industrial setups are complex in nature, and many different data formats may be generated to track the progress of projects related to developing new products or providing better services to customers. Knowledge discovery from different databases requires considerable effort, and data mining techniques serve this purpose for structured data formats. If, however, the data are semi-structured or unstructured, the combined efforts of data and text mining technologies may be needed to produce useful results. This thesis focuses on discovering knowledge from semi-structured or unstructured data formats by applying textual data mining techniques to automate the classification of textual information into two categories or classes, which can then be used to help manage the knowledge available in multiple data formats. Applications of different data mining techniques for discovering valuable information and knowledge in the manufacturing and construction industries are explored as part of a literature review, and the application of text mining techniques to semi-structured or unstructured data is discussed in detail. A novel integration of different data and text mining tools is proposed in the form of a framework in which knowledge discovery and its refinement are performed through clustering and the Apriori association-rule mining algorithm. Finally, the hypothesis of achieving better classification accuracy is examined by applying the methodology to case study data in the form of Post Project Review (PPR) reports. The process of discovering useful knowledge, interpreting it and utilising it has been automated to classify the textual data into two classes.
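A generic two-class text pipeline along these lines can be sketched with TF-IDF features and k-means clustering; the thesis framework refines such groupings with association-rule mining. The sample documents below are invented placeholders standing in for PPR text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical snippets standing in for Post Project Review text.
docs = [
    "delivery was delayed by supplier quality issues",
    "client praised the on-time handover and build quality",
    "rework needed after failed inspection of foundations",
    "project finished under budget with positive feedback",
]

# TF-IDF turns free text into a sparse term-weight matrix.
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Unsupervised split into two classes (e.g. problem vs success
# reports); refinement with association rules is not shown here.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```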
9

Contributions à l'apprentissage grande échelle pour la classification d'images / Contributions to large-scale learning for image classification

Akata, Zeynep 06 January 2014 (has links)
La construction d'algorithmes classifiant des images à grande échelle est devenue une t^ache essentielle du fait de la difficulté d'effectuer des recherches dans les immenses collections de données visuelles non-etiquetées présentes sur Internet. L'objetif est de classifier des images en fonction de leur contenu pour simplifier la gestion de telles bases de données. La classification d'images à grande échelle est un problème complexe, de par l'importance de la taille des ensembles de données, tant en nombre d'images qu'en nombre de classes. Certaines de ces classes sont dites "fine-grained" (sémantiquement proches les unes des autres) et peuvent même ne contenir aucun représentant étiqueté. Dans cette thèse, nous utilisons des représentations à l'état de l'art d'images et nous concentrons sur des méthodes d'apprentissage efficaces. Nos contributions sont (1) un banc d'essai d'algorithmes d'apprentissage pour la classification à grande échelle et (2) un nouvel algorithme basé sur l'incorporation d'étiquettes pour apprendre sur des données peu abondantes. En premier lieu, nous introduisons un banc d'essai d'algorithmes d'apprentissage pour la classification à grande échelle, dans un cadre entièrement supervisé. Il compare plusieurs fonctions objectifs pour apprendre des classifieurs linéaires, tels que "un contre tous", "multiclasse", "classement", "classement avec pondération" par descente de gradient stochastique. Ce banc d'essai se conclut en un ensemble de recommandations pour la classification à grande échelle. Avec une simple repondération des données, la stratégie "un contre tous" donne des performances meilleures que toutes les autres. Par ailleurs, en apprentissage en ligne, un pas d'apprentissage assez petit s'avère suffisant pour obtenir des résultats au niveau de l'état de l'art. Enfin, l'arrêt prématuré de la descente de gradient stochastique introduit une régularisation qui améliore la vitesse d'entraînement ainsi que la capacité de régularisation. Deuxièmement, face à des milliers de classes, il est parfois difficile de rassembler suffisamment de données d'entraînement pour chacune des classes. En particulier, certaines classes peuvent être entièrement dénuées d'exemples. En conséquence, nous proposons un nouvel algorithme adapté à ce scénario d'apprentissage dit "zero-shot". Notre algorithme utilise des données parallèles, comme les attributs, pour incorporer les classes dans un espace euclidien. Nous introduisons par ailleurs une fonction pour mesurer la compatibilité entre image et étiquette. Les paramètres de cette fonction sont appris en utilisant un objectif de type "ranking". Notre algorithme dépasse l'état de l'art pour l'apprentissage "zero-shot", et fait preuve d'une grande flexibilité en permettant d'incorporer d'autres sources d'information parallèle, comme des hiérarchies. Il permet en outre une transition sans heurt du cas "zero-shot" au cas où peu d'exemples sont disponibles. / Building algorithms that classify images on a large scale is an essential task due to the difficulty in searching massive amount of unlabeled visual data available on the Internet. We aim at classifying images based on their content to simplify the manageability of such large-scale collections. Large-scale image classification is a difficult problem as datasets are large with respect to both the number of images and the number of classes. Some of these classes are fine grained and they may not contain any labeled representatives. 
In this thesis, we use state-of-the-art image representations and focus on efficient learning methods. Our contributions are (1) a benchmark of learning algorithms for large-scale image classification, and (2) a novel learning algorithm based on label embedding for learning with scarce training data. Firstly, we propose a benchmark of learning algorithms for large-scale image classification in the fully supervised setting. It compares several objective functions for learning linear classifiers, such as one-vs-rest, multiclass, ranking and weighted average ranking, optimized with stochastic gradient descent. The output of this benchmark is a set of recommendations for large-scale learning. We show experimentally that online learning is well suited to large-scale image classification. With simple data rebalancing, one-vs-rest performs better than all other methods. Moreover, in online learning, a small enough step size is sufficient for state-of-the-art performance. Finally, regularization through early stopping results in fast training and good generalization. Secondly, when dealing with thousands of classes, it is difficult to collect sufficient labeled training data for each class; for some classes we might not even have a single training example. We propose a novel algorithm for this zero-shot learning scenario. Our algorithm uses side information, such as attributes, to embed classes in a Euclidean space. We also introduce a function to measure the compatibility between an image and a label, whose parameters are learned using a ranking objective. Our algorithm outperforms the state of the art for zero-shot learning. It is flexible and can accommodate other sources of side information, such as hierarchies. It also allows for a smooth transition from zero-shot to few-shot learning.
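The label-embedding idea reduces to a bilinear compatibility F(x, c) = x^T W phi(c) between image features and attribute-based class embeddings, trained so the correct class outranks the others. The sketch below uses a plain pairwise hinge update and invented dimensions, not the thesis's exact weighted ranking objective.

```python
import numpy as np

def train_label_embedding(X, y, phi, epochs=10, lr=0.01, rng=None):
    """Learn W in F(x, c) = x @ W @ phi[c] with a pairwise ranking
    hinge: the true class should beat every other class by a margin."""
    rng = rng or np.random.default_rng(0)
    d, e = X.shape[1], phi.shape[1]
    W = np.zeros((d, e))
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            scores = X[i] @ W @ phi.T          # compatibility per class
            for c in range(len(phi)):
                if c != y[i] and scores[c] + 1.0 > scores[y[i]]:
                    # Push the true class up, the violating class down.
                    W += lr * np.outer(X[i], phi[y[i]] - phi[c])
    return W

# Toy setup: 5 classes described by 4 binary attributes; image
# features carry the class attributes plus pure-noise dimensions.
rng = np.random.default_rng(6)
phi = rng.integers(0, 2, size=(5, 4)).astype(float)
y = rng.integers(0, 5, size=100)
X = np.hstack([phi[y] + 0.1 * rng.normal(size=(100, 4)),
               rng.normal(size=(100, 4))])
W = train_label_embedding(X, y, phi, rng=rng)
print((np.argmax(X @ W @ phi.T, axis=1) == y).mean())
```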
