401 |
Hierarchical penalties for integrating knowledge into statistical models (original title: Pénalités hiérarchiques pour l'intégration de connaissances dans les modèles statistiques). Szafranski, Marie, 21 November 2008
Statistical learning aims to predict a phenomenon, but also to analyse or interpret it. In this thesis, we propose to guide the learning process by incorporating knowledge about how the features of a problem are organized. This knowledge is represented by a two-level tree structure, which defines distinct groups of features. We also assume that only a few (groups of) features are involved in discriminating the observations. The goal is therefore to bring out the relevant groups of features, as well as the significant features within those groups. To this end, we use a variational formulation of the adaptive-penalization type. We show that this formulation amounts to minimizing a problem regularized by a mixed norm. Relating these two approaches provides two viewpoints for studying the convexity and sparsity properties of the method. This work was carried out in both parametric and non-parametric function spaces. The value of the method is illustrated on brain-computer interface problems.
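As a rough illustration of mixed-norm regularization over groups of features, the sketch below computes an ℓ(p,q) group norm and the block soft-thresholding operator (the proximal operator of the ℓ1/ℓ2 norm), which zeroes out whole groups at once. This is a generic group-lasso construction assumed for illustration; the names `mixed_norm` and `group_soft_threshold` are hypothetical, and the thesis's adaptive-penalization formulation may define the two-level hierarchy differently.

```python
import numpy as np

def mixed_norm(w, groups, p=1.0, q=2.0):
    """Mixed norm (sum_g ||w_g||_q^p)^(1/p); p=1, q=2 gives the group-lasso norm."""
    inner = np.array([np.linalg.norm(w[g], ord=q) for g in groups])
    return float(np.sum(inner ** p) ** (1.0 / p))

def group_soft_threshold(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 (block soft-thresholding).

    Each group is shrunk toward zero by lam in Euclidean norm; groups whose
    norm is at most lam are set exactly to zero (group-level sparsity).
    """
    out = np.zeros_like(w, dtype=float)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * w[g]
    return out

w = np.array([3.0, 4.0, 0.0, 0.5])
groups = [[0, 1], [2, 3]]          # hypothetical two-group partition of the features
print(mixed_norm(w, groups))       # 5.0 + 0.5
print(group_soft_threshold(w, groups, lam=1.0))
```

With `lam=1.0`, the first group (norm 5) is shrunk toward zero and the second group (norm 0.5) is removed entirely.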
|
402 |
Efficient Kernel Methods For Large Scale Classification. Asharaf, S, 07 1900
Classification algorithms are widely used in many application domains. Most of these domains deal with massive collections of data and therefore demand classification algorithms that scale well with the size of the data sets involved. A classification algorithm is said to be scalable if its time and space requirements do not increase significantly (without compromising generalization performance) as the training set grows. The Support Vector Machine (SVM) is one of the most celebrated kernel-based classification methods in machine learning, and an SVM capable of handling large-scale classification problems would be an ideal candidate for many real-world applications. SVM training is usually formulated as a Quadratic Programming (QP) problem. Existing solution strategies for this problem have time and space complexity that is at least quadratic in the number of training points, which makes SVM training very expensive even on problems with a few thousand training examples.
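To make the quadratic cost concrete, the sketch below builds a dense RBF kernel (Gram) matrix of the kind many QP-based SVM solvers need in full or in cached pieces. The helper names and the `gamma` value are illustrative assumptions. With 8-byte floats, the full matrix for n = 10,000 points takes n² × 8 = 800,000,000 bytes (about 800 MB), and doubling n quadruples that figure.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Dense n x n matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(d2, 0.0, out=d2)  # clip tiny negative values caused by rounding
    return np.exp(-gamma * d2)

def kernel_matrix_bytes(n, bytes_per_entry=8):
    """Memory needed to store a dense n x n kernel matrix."""
    return n * n * bytes_per_entry

print(kernel_matrix_bytes(10_000))  # 800000000
```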
This thesis addresses the scalability of the training algorithms used in both two-class and multiclass Support Vector Machines, and proposes efficient training schemes that reduce the space and time requirements of SVM training. For large-scale two-class problems, the thesis discusses a) two selective-sampling-based training schemes for scaling non-linear SVMs and b) clustering-based approaches for handling unbalanced data sets with the Core Vector Machine. For large-scale multiclass problems, the thesis proposes the Multiclass Core Vector Machine (MCVM), a scalable SVM-based multiclass classifier. In MCVM, the multiclass SVM problem is shown to be equivalent to a Minimum Enclosing Ball (MEB) problem, which is then solved with a fast approximate MEB algorithm. Experiments were run on several large real-world data sets, such as the IJCNN1 and Acoustic data sets from the LIBSVM page, the Extended USPS data set from the CVM page, and the DARPA (US Department of Defense) network intrusion detection data sets used in the KDD Cup 99 contest. The empirical results show that the proposed schemes achieve good generalization performance with low time and space requirements, and scalability experiments on large training sets show that they scale well. Another contribution is a novel soft clustering scheme, Rough Support Vector Clustering (RSVC), which builds on the Soft Minimum Enclosing Ball (SMEB) problem. Experiments on a synthetic data set and on the real-world IRIS data set show that RSVC finds meaningful soft cluster abstractions.
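For intuition about approximate MEB solving, the sketch below implements the simple Bădoiu-Clarkson iteration in plain Euclidean space: the centre repeatedly steps toward the current farthest point, and after ⌈1/ε²⌉ steps the resulting ball has radius at most (1 + ε) times the optimum. This is a swapped-in textbook algorithm, not the CVM/MCVM solver from the thesis (which works in kernel feature space with core sets); the name `approx_meb` is hypothetical.

```python
import math
import numpy as np

def approx_meb(points, eps=0.1):
    """(1 + eps)-approximate minimum enclosing ball via Badoiu-Clarkson.

    Returns (centre, radius), where radius is the distance from the final
    centre to the farthest input point.
    """
    pts = np.asarray(points, dtype=float)
    c = pts[0].copy()
    n_iter = math.ceil(1.0 / eps ** 2)
    for i in range(1, n_iter + 1):
        # Move the centre a shrinking step toward the farthest point
        far = pts[np.argmax(np.linalg.norm(pts - c, axis=1))]
        c += (far - c) / (i + 1)
    radius = float(np.max(np.linalg.norm(pts - c, axis=1)))
    return c, radius

square = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
centre, r = approx_meb(square, eps=0.1)  # optimal radius is sqrt(2)
```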
|
403 |
Saliency of vertical road signage in road images: a feasibility study for a diagnostic tool (original title: Saillance de la signalisation verticale dans les images routières : étude de la faisabilité d'un outil de diagnostic). Simon, Ludovic, 07 December 2009
Road signage plays a role in the safety and operation of road infrastructure. Traffic signs must be salient enough to attract the driver's attention. In this thesis, we study the feasibility of an algorithm that automatically estimates the saliency of vertical signage in road images, captured by a vehicle-mounted camera, in order to diagnose road networks. Our approach relies on the confidence values of a learning algorithm, Support Vector Machines, to model the search saliency of a given object: a regulatory sign (or a set of them). We carried out a statistical study on data from a cognitive eye-tracking experiment. The correlation between the model and human visual performance in conditions close to driving demonstrates its suitability for measuring the saliency of vertical signage.
|
404 |
Automated Ice-Water Classification using Dual Polarization SAR Imagery. Leigh, Steve, January 2013
Mapping ice and open water in ocean bodies is important for numerous purposes including environmental analysis and ship navigation. The Canadian Ice Service (CIS) currently has several expert ice analysts manually generate ice maps on a daily basis. The CIS would like to augment their current process with an automated ice-water discrimination algorithm capable of operating on dual-pol synthetic aperture radar (SAR) images produced by RADARSAT-2. Automated methods can provide mappings in larger volumes, with more consistency, and in finer resolutions that are otherwise impractical to generate.
We have developed such an automated ice-water discrimination system, called MAGIC. The algorithm first classifies the HV scene using the glocal method, a hierarchical region-based classification method that incorporates spatial context into the classification model using a modified watershed segmentation and IRGS, a previously developed MRF classification algorithm. Second, a pixel-based classification is performed with a support vector machine (SVM) using a nonlinear RBF kernel, exploiting SAR grey-level co-occurrence matrix (GLCM) texture and backscatter features. Finally, the IRGS and SVM classification results are combined using the IRGS approach, with a modified energy function that accommodates the SVM pixel-based information.
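The GLCM texture features mentioned above can be sketched as follows: count how often grey level i is followed by grey level j at a fixed offset, normalize the counts, and summarize the matrix with statistics such as contrast, homogeneity, and energy. The offset, number of grey levels, and chosen statistics are assumptions for illustration; the abstract does not state which GLCM parameters MAGIC uses, and the input image is assumed to be already quantized to integers in `[0, levels)`.

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0, symmetric=True):
    """Normalized grey-level co-occurrence matrix for a non-negative offset (dy, dx)."""
    img = np.asarray(img, dtype=int)
    P = np.zeros((levels, levels), dtype=float)
    rows, cols = img.shape
    for r in range(rows - dy):
        for c in range(cols - dx):
            P[img[r, c], img[r + dy, c + dx]] += 1
    if symmetric:
        P = P + P.T  # count each pair in both directions
    return P / P.sum()

def glcm_features(P):
    """A few common Haralick-style statistics of a normalized GLCM."""
    i, j = np.indices(P.shape)
    return {
        "contrast": float(np.sum(P * (i - j) ** 2)),
        "homogeneity": float(np.sum(P / (1.0 + np.abs(i - j)))),
        "energy": float(np.sum(P ** 2)),
    }

checker = [[0, 1], [1, 0]]
print(glcm_features(glcm(checker, levels=2)))
```

Per-pixel texture features would typically be computed this way over a sliding window and concatenated with backscatter values before being passed to the SVM.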
The combined classifier was tested on 61 ground-truthed dual-pol RADARSAT-2 scenes of the Beaufort Sea containing a variety of ice types and water patterns across melt, summer, and freeze-up periods. The average leave-one-out classification accuracy with respect to these ground truths is 95.8%, and MAGIC attains an accuracy of 90% or above on 88% of the scenes. The MAGIC system is now under consideration by CIS for operational use.
|
405 |
Analysis And Classification Of Spelling Paradigm EEG Data And An Attempt For Optimization Of Channels Used. Yildirim, Asil, 01 December 2010
Brain-Computer Interfaces (BCIs) are systems that control devices using only brain signals. In a BCI system, different mental activities performed by the user are associated with different actions of the controlled device. The Spelling Paradigm is a BCI application that constructs words by identifying letters from P300 signals recorded with electrodes placed at various points on the scalp. Reducing the letter detection error rate and increasing letter detection speed are crucial for the Spelling Paradigm, because they make it easier for disabled people to express their needs with this application.
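In the common row/column variant of the P300 speller, the rows and columns of a character matrix are flashed repeatedly, classifier scores are summed per row and per column, and the selected letter lies at the intersection of the best row and best column. The sketch below assumes a 6 × 6 matrix and a hypothetical flash coding (ids 0-5 for rows, 6-11 for columns); the thesis abstract does not specify its matrix layout or coding, and `accumulate` and `pick_letter` are illustrative names.

```python
# Hypothetical 6 x 6 speller matrix (layout assumed for illustration)
MATRIX = ["ABCDEF", "GHIJKL", "MNOPQR", "STUVWX", "YZ1234", "56789_"]

def accumulate(flashes, n_rows=6, n_cols=6):
    """Sum classifier scores per row and column.

    flashes: iterable of (flash_id, score); ids < n_rows are rows,
    the rest are columns (assumed coding).
    """
    rows = [0.0] * n_rows
    cols = [0.0] * n_cols
    for flash_id, score in flashes:
        if flash_id < n_rows:
            rows[flash_id] += score
        else:
            cols[flash_id - n_rows] += score
    return rows, cols

def pick_letter(row_scores, col_scores, matrix=MATRIX):
    """Letter at the intersection of the highest-scoring row and column."""
    r = max(range(len(row_scores)), key=row_scores.__getitem__)
    c = max(range(len(col_scores)), key=col_scores.__getitem__)
    return matrix[r][c]

flashes = [(2, 1.0), (9, 0.8), (2, 0.5), (0, -0.3), (9, 0.7), (6, 0.2)]
print(pick_letter(*accumulate(flashes)))  # row 2, column 3 -> "P"
```

Averaging over more flash repetitions tends to lower the letter error rate at the cost of spelling speed, which is the trade-off the abstract describes.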
In this thesis, two methods, Support Vector Machine (SVM) and AdaBoost, are used for classification, with Classification and Regression Trees (CART) as the weak classifier for AdaBoost. In addition to the time-domain characteristics of P300 evoked potentials, their time-frequency characteristics are analyzed, using the Wigner-Ville Distribution to transform time-domain signals into the time-frequency domain. Classification results are observed to be better in the time domain. Furthermore, we search for the optimal subset of channels, that is, the one that models P300 signals with the minimum error rate. A channel selection method that uses both SVM and AdaBoost is proposed, and with it 12 channels are selected in the time domain. Finally, the effect of dimensionality reduction is analyzed using Principal Component Analysis (PCA) and AdaBoost.
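A minimal sketch of discrete AdaBoost with depth-1 trees (decision stumps) as weak learners is shown below. A stump is the simplest CART tree; the thesis used CART weak classifiers whose depth is not stated in the abstract, so the stump choice, the function names, and the number of rounds are assumptions. Labels are taken to be -1 (non-P300) and +1 (P300).

```python
import numpy as np

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for s in (1, -1):
                # Polarity s: predict s at or below the threshold, -s above it
                pred = np.where(X[:, f] <= t, s, -s)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, f, t, s)
    return best

def stump_predict(X, f, t, s):
    return np.where(X[:, f] <= t, s, -s)

def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost; y must contain labels in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(n_rounds):
        err, f, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        # Up-weight misclassified samples, down-weight correctly classified ones
        w = w * np.exp(-alpha * y * stump_predict(X, f, t, s))
        w /= w.sum()
        model.append((alpha, f, t, s))
    return model

def adaboost_predict(model, X):
    score = sum(a * stump_predict(X, f, t, s) for a, f, t, s in model)
    return np.where(score >= 0, 1, -1)
```

In a channel-selection setting, the features chosen by the stumps (and their weights `alpha`) give one rough indication of which channels carry discriminative information.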
|
406 |
Discovering Discussion Activity Flows in an On-line Forum Using Data Mining Techniques. Hsieh, Lu-shih, 22 July 2008
In the Internet era, more and more courses are taught through a course management system (CMS) or learning management system (LMS). In an asynchronous virtual learning environment, an instructor needs to stay aware of how forum discussions are progressing, and may intervene when necessary to facilitate students' learning. This research proposes a discussion forum activity flow tracking system, called FAFT (Forum Activity Flow Tracer), that automatically monitors the discussion activity flow of threaded forum postings in a CMS/LMS. As CMS/LMS platforms become more popular for supporting learning activities, FAFT can help instructors identify students' interaction types in discussion forums.
FAFT adopts modern data and text mining techniques to discover patterns in forum discussion activity flows, which instructors can use to facilitate online learning activities. FAFT consists of two subsystems: activity classification (AC) and activity flow discovery (AFD). A posting can be classified as an announcement, questioning, clarification, interpretation, conflict, or assertion. AC adopts a cascade model to classify the activity types of the posts in a discussion thread. An empirical evaluation of the classified types, on a repository of postings from senior high school earth science online courses, shows that AC can effectively facilitate the coding process and that the cascade model can handle the imbalanced distribution of discussion postings.
AFD adopts a hidden Markov model (HMM) to discover activity flows. A discussion activity flow can be presented as an HMM diagram, which an instructor can use to predict which type of discussion activity flow a thread is likely to follow. Empirical results of the HMM on an online forum for a senior high school earth science subject show that FAFT can effectively predict the type of a discussion activity flow. The proposed FAFT can therefore be embedded in a course management system to automatically predict the activity flow type of a discussion thread, reducing teachers' workload in managing online discussion forums.
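One common way to use HMMs for this kind of prediction is to train one HMM per flow type and assign a thread to the type whose model gives its sequence of posting types the highest likelihood. The sketch below implements the scaled forward algorithm and that maximum-likelihood decision. The one-model-per-type setup, the integer codes for posting types, and the function names are assumptions for illustration; the abstract does not describe how FAFT parameterizes or trains its HMM.

```python
import numpy as np

# Hypothetical integer codes for the six posting types named in the abstract
ACTIVITY_CODES = {"announcement": 0, "questioning": 1, "clarification": 2,
                  "interpretation": 3, "conflict": 4, "assertion": 5}

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of an observation sequence under an HMM (scaled forward pass).

    pi[i]   : initial probability of hidden state i
    A[i, j] : probability of moving from state i to state j
    B[i, o] : probability that state i emits observation o
    """
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    alpha = alpha / scale
    loglik = np.log(scale)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()          # scaling keeps alpha from underflowing
        alpha = alpha / scale
        loglik += np.log(scale)
    return float(loglik)

def classify_thread(obs, models):
    """Name of the flow-type HMM (pi, A, B) that best explains the thread."""
    return max(models, key=lambda name: forward_loglik(obs, *models[name]))
```

A thread would first be mapped to codes, for example `[ACTIVITY_CODES[t] for t in ("questioning", "clarification", "assertion")]`, before being scored.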
|
407 |
Constraint-driven RF test stimulus generation and built-in test. Akbay, Selim Sermet, 09 December 2009
With the explosive growth of wireless applications, the last decade has brought an ever-increasing test challenge for radio frequency (RF) circuits. While the design community has pushed ahead by extending CMOS processes to high-frequency wireless devices, test methodology has not advanced at the same pace. Consequently, testing such devices has become a major bottleneck in high-volume production, made worse by the growing need for tighter quality control.
RF devices undergo testing during the prototype phase and during high-volume manufacturing (HVM). The benchtop test equipment used throughout prototyping is very precise yet specialized for a subset of functionalities. HVM calls for a different kind of test paradigm that emphasizes throughput and sufficiency, during which the projected performance parameters are measured one by one for each device by automated test equipment (ATE) and compared against defined limits called specifications. The set of tests required for each product differs greatly in terms of the equipment required and the time taken to test individual devices. Together with signal integrity, precision, and repeatability concerns, the initial cost of RF ATE is prohibitively high. As more functionality and protocols are integrated into a single RF device, the required number of specifications to be tested also increases, adding to the overall cost of testing, both in terms of the initial and recurring operating costs.
In addition to the cost problem, RF testing poses another challenge when these components are integrated into package-level system solutions. In systems-on-packages (SOP), the test problems resulting from signal integrity, input/output (IO) bandwidth, and limited controllability and observability have initiated a paradigm shift in high-speed analog testing, favoring alternative approaches such as built-in tests (BIT), where the test functionality is brought into the package. This scheme can use a low-cost external tester connected through a low-bandwidth link to perform demanding response evaluations, and can also use the analog-to-digital converters and digital signal processors available in the package to facilitate testing. Although research on analog built-in test has demonstrated hardware solutions for single specifications, the paradigm shift calls for a more general approach in which a single methodology can be applied across different devices and multiple specifications can be verified through a single test hardware unit, minimizing the area overhead.
Specification-based alternate test methodology provides a suitable and flexible platform for handling the challenges addressed above. In this thesis, a framework that integrates ATE and system constraints into test stimulus generation and test response extraction is presented for the efficient production testing of high-performance RF devices using specification-based alternate tests. The main components of the presented framework are as follows:
Constraint-driven RF alternate test stimulus generation: An automated test stimulus generation algorithm for RF devices that are evaluated by a specification-based alternate test solution is developed. The high-level models of the test signal path define constraints in the search space of the optimized test stimulus. These models are generated in enough detail such that they inherently define limitations of the low-cost ATE and the I/O restrictions of the device under test (DUT), yet they are simple enough that the non-linear optimization problem can be solved empirically in a reasonable amount of time.
Feature extractors for BIT: A methodology for the built-in testing of RF devices integrated into SOPs is developed using additional hardware components. These hardware components correlate the high-bandwidth test response to low bandwidth signatures while extracting the test-critical features of the DUT. Supervised learning is used to map these extracted features, which otherwise are too complicated to decipher by plain mathematical analysis, into the specifications under test.
Defect-based alternate testing of RF circuits: A methodology for the efficient testing of RF devices with low-cost defect-based alternate tests is developed. The signature of the DUT is probabilistically compared with a class of defect-free device signatures to explore possible corners under acceptable levels of process parameter variations. Such a defect filter applies discrimination rules generated by a supervised classifier and eliminates the need for a library of possible catastrophic defects.
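To illustrate the idea of probabilistically comparing a DUT signature against defect-free signatures, the sketch below fits the mean and covariance of known-good signatures and flags a device whose Mahalanobis distance exceeds a threshold. This one-class Gaussian filter is a swapped-in technique chosen for simplicity, not the supervised-classifier discrimination rules described in the thesis; the class name `DefectFilter` and the threshold value are hypothetical.

```python
import numpy as np

class DefectFilter:
    """One-class filter: flag signatures far from the defect-free population."""

    def __init__(self, good_signatures, threshold):
        X = np.asarray(good_signatures, dtype=float)
        self.mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        # Pseudo-inverse tolerates a singular covariance; atleast_2d handles 1 feature
        self.inv_cov = np.linalg.pinv(np.atleast_2d(cov))
        self.threshold = threshold

    def distance(self, signature):
        """Mahalanobis distance of a signature from the defect-free mean."""
        d = np.asarray(signature, dtype=float) - self.mean
        return float(np.sqrt(d @ self.inv_cov @ d))

    def is_defective(self, signature):
        return self.distance(signature) > self.threshold

good = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]  # hypothetical signatures
filt = DefectFilter(good, threshold=3.0)
print(filt.is_defective([0.5, 0.5]), filt.is_defective([3.0, 0.5]))  # False True
```

Because only defect-free signatures are needed to fit the filter, this style of check shares the property the abstract highlights: no library of catastrophic defects is required.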
|
408 |
Dynamic classification of non-stationary data: learning and tracking of evolving classes (original title: Classification dynamique de données non-stationnaires : apprentissage et suivi de classes évolutives). Amadou Boubacar, Habiboulaye, 28 June 2006
Most natural or artificial processes exhibit evolving behaviour described by non-stationary data. This thesis addresses the dynamic classification of non-stationary data. We propose a generic description of dynamic classifiers built on a neural network with an evolving architecture. It is organized into four learning procedures: creation, adaptation, fusion, and evaluation. Two algorithms are developed from this generic description. The first is a new version of the AUDyC (AUto-adaptive and Dynamical Clustering) algorithm, which uses a mixture model described according to a multimodal approach. The second, named SAKM (Self-Adaptive Kernel Machine), is based on SVMs and kernel methods. Both algorithms have recursive update rules that enable adaptive modelling and tracking of evolving classes. They can adapt themselves in dynamic environments and perform well in terms of convergence and algorithmic complexity; these properties are proved theoretically and demonstrated by simulations of the algorithms.
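The four learning procedures can be illustrated with a deliberately simple prototype-based sketch: a point far from every prototype creates a new one, a nearby point adapts the closest prototype through a recursive mean update, prototypes that drift within range of each other are fused, and prototypes with too little support are discarded at evaluation. This distance-based toy is an assumption made for illustration, not AUDyC or SAKM (which use Gaussian mixture components and kernel machines, respectively); the class name and the `radius` and `min_count` parameters are hypothetical.

```python
import numpy as np

class DynamicClusters:
    """Toy creation / adaptation / fusion / evaluation loop (not AUDyC or SAKM)."""

    def __init__(self, radius, min_count=1):
        self.radius = radius          # hypothetical acceptance radius
        self.min_count = min_count    # hypothetical support needed to survive
        self.means = []               # prototype centres
        self.counts = []              # number of points absorbed per prototype

    def update(self, x):
        x = np.asarray(x, dtype=float)
        if self.means:
            dists = [np.linalg.norm(x - m) for m in self.means]
            k = int(np.argmin(dists))
            if dists[k] <= self.radius:
                # Adaptation: recursive (running) mean update
                self.counts[k] += 1
                self.means[k] = self.means[k] + (x - self.means[k]) / self.counts[k]
                self._fuse(k)
                return
        # Creation: no prototype is close enough
        self.means.append(x.copy())
        self.counts.append(1)

    def _fuse(self, k):
        # Fusion: merge prototype k with the first other prototype now within radius
        for j in range(len(self.means)):
            if j != k and np.linalg.norm(self.means[j] - self.means[k]) <= self.radius:
                n = self.counts[k] + self.counts[j]
                self.means[k] = (self.counts[k] * self.means[k]
                                 + self.counts[j] * self.means[j]) / n
                self.counts[k] = n
                del self.means[j]
                del self.counts[j]
                return

    def evaluate(self):
        # Evaluation: discard prototypes with too little support
        keep = [i for i, c in enumerate(self.counts) if c >= self.min_count]
        self.means = [self.means[i] for i in keep]
        self.counts = [self.counts[i] for i in keep]
```

The recursive mean update means each prototype can be maintained in constant memory as new data arrive, which is the property that makes online tracking of drifting classes practical.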
|
409 |
Predicting Taiwan Stock Index Future Trend Using SVM and Decision Tree (original title: 運用支持向量機和決策樹預測台指期走勢). 吳永樂 (Wu, Yong Le), Unknown Date
In this research, we build a model that forecasts the direction of Taiwan Stock Index Futures (TXF) from 479 global indices. The model predicts the up or down trend over the next k days. We use two classification algorithms (SVM and decision tree) and two sampling schemes (cross-validation and moving window), and compare short-term (1-day) with long-term (100-day) prediction horizons. With cross-validation, the decision tree performs better, reaching a best accuracy of 93.4%. With the moving window, the SVM performs better, reaching 79.97% when the last 60 days of historical data are used to predict the trend over the next 20 days. Under every setting, predictions improve as the horizon lengthens, which suggests that the global market strongly influences the Taiwan stock market, but with a reaction lag. These findings may be of use to investors. Future work could try improved decision tree algorithms, or combine the classifiers with regression models to enhance performance.
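The labelling and moving-window scheme can be sketched as follows: each day t gets an "up" label if the price k days later is higher, and each split trains on a fixed-length window of labelled days and tests on a later day. Because the label of day t is only known k days after t, this sketch assumes the test day is placed k days after the last training day so that all training labels are observable at test time; the abstract does not say how the thesis handles this lag, and the function names are hypothetical.

```python
def direction_labels(prices, k):
    """label[t] = 1 if the price k days after t is higher than on day t, else 0."""
    return [1 if prices[t + k] > prices[t] else 0 for t in range(len(prices) - k)]

def moving_window_splits(n_labels, train_len, k):
    """Yield (train_indices, test_index) pairs for a rolling evaluation.

    The last training label (day start + train_len - 1) needs the price on
    day start + train_len - 1 + k, so that day is used as the test day.
    """
    for start in range(n_labels):
        test = start + train_len - 1 + k
        if test >= n_labels:
            break
        yield list(range(start, start + train_len)), test

# Example setting from the abstract: 60-day training window, 20-day horizon
prices = [100 + (i % 7) for i in range(200)]  # hypothetical price series
labels = direction_labels(prices, k=20)
splits = list(moving_window_splits(len(labels), train_len=60, k=20))
```

Each split would then be used to fit an SVM or decision tree on the 479 index features for the training days and score its prediction on the single test day.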
|