371

Multi color space LBP-based feature selection for texture classification / Sélection d'attributs multi-espace à partir de motifs binaires locaux pour la classification de textures couleur

Truong Hoang, Vinh 15 February 2018 (has links)
Texture analysis has been extensively studied and a wide variety of texture descriptors have been proposed. Among them, Local Binary Patterns (LBP) play an essential role in most color image analysis and pattern recognition applications, and are particularly exploited in texture analysis problems. Acquired color images are usually coded in the RGB color space. However, many color spaces are available for texture classification, each with specific properties that impact performance. To avoid the difficulty of choosing a relevant space, the multi color space strategy uses the properties of several spaces simultaneously. This strategy, however, increases the number of features, especially when they are extracted from LBP applied to color images. This work therefore focuses on reducing the dimensionality of the LBP-based feature space through feature selection methods. In this framework, we consider the LBP histogram for color texture representation and propose joint bin selection and multi color space histogram selection approaches for supervised texture classification. Extensive experiments conducted on several benchmark color texture databases demonstrate that the proposed approaches can improve classification performance compared to the state of the art.
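As a rough illustration of the strategy described above (not the thesis's actual code), the sketch below computes per-channel LBP histograms in several color spaces, concatenates them, and ranks bins by mutual information with the class labels. All function names and parameter choices (P, R, the three spaces, k) are ours.

```python
import numpy as np
from skimage import color
from skimage.feature import local_binary_pattern
from sklearn.feature_selection import mutual_info_classif

def multi_space_lbp_histogram(rgb, P=8, R=1):
    """Concatenate per-channel LBP histograms from RGB, HSV and Lab.
    rgb: float image with values in [0, 1]."""
    spaces = [rgb, color.rgb2hsv(rgb), color.rgb2lab(rgb)]
    n_bins = P + 2  # the 'uniform' mapping yields P + 2 distinct codes
    hists = []
    for img in spaces:
        for c in range(3):
            codes = local_binary_pattern(img[..., c], P, R, method="uniform")
            h, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
            hists.append(h)
    return np.concatenate(hists)  # 9 channels * (P + 2) bins

def select_bins(X, y, k=50):
    """Supervised bin selection: keep the k bins most informative about y.
    X: (n_textures, n_bins) histogram matrix, y: class labels."""
    scores = mutual_info_classif(X, y, random_state=0)
    return np.argsort(scores)[::-1][:k]
```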
372

Feature Selection and Classification Methods for Decision Making: A Comparative Analysis

Villacampa, Osiris 01 January 2015 (has links)
The use of data mining methods in corporate decision making has been increasing in the past decades. Its popularity can be attributed to better data mining algorithms, increased computing performance, and results that can be measured and applied to decision making. The effective use of data mining methods to analyze various types of data has shown great advantages in many application domains. Some data sets need little preparation to be mined, whereas others, in particular high-dimensional data sets, must be preprocessed first because mining high-dimensional data is complex and inefficient. Feature (or attribute) selection is one of the techniques used for dimensionality reduction. Previous research has shown that data mining results can be improved in accuracy and efficacy by selecting the most significant attributes. This study analyzes vehicle service and sales data from multiple car dealerships. Its purpose is to find a model that better classifies existing customers as new car buyers based on their vehicle service histories. Six feature selection methods, including Information Gain, Correlation-Based Feature Selection, Relief-F, Wrapper, and Hybrid methods, were used to reduce the number of attributes, and the resulting data sets were compared. The data sets with the selected attributes were run through three popular classification algorithms, Decision Trees, k-Nearest Neighbor, and Support Vector Machines, and the results were compared and analyzed. The study concludes with a comparative analysis of feature selection methods and their effects on the different classification algorithms within the domain. As a basis for comparison, the same procedures were run on a standard data set from the financial institution domain.
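The comparison protocol can be sketched with scikit-learn stand-ins; this is a hypothetical reconstruction, with mutual information approximating Information Gain and RFE standing in for the wrapper, not the study's actual setup.

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

selectors = {
    "info_gain": SelectKBest(mutual_info_classif, k=20),
    "wrapper_rfe": RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=20),
}
classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "svm": SVC(kernel="rbf"),
}

def compare(X, y):
    """Cross-validated accuracy for every selector/classifier pair."""
    for s_name, sel in selectors.items():
        for c_name, clf in classifiers.items():
            pipe = Pipeline([("select", sel), ("clf", clf)])
            acc = cross_val_score(pipe, X, y, cv=10).mean()
            print(f"{s_name} + {c_name}: {acc:.3f}")
```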
373

Caractérisation et cartographie de la structure forestière à partir d'images satellitaires à très haute résolution spatiale / Quantification and mapping of forest structure from Very High Resolution (VHR) satellite images

Beguet, Benoît 06 October 2014 (has links)
Very High spatial Resolution (VHR) satellite images such as Pléiades imagery (50 cm panchromatic, 2 m multispectral) allow a detailed description of forest structure (tree distribution and size) at the stand level, by exploiting the relationship between the spatial structure of trees and image texture when the pixel size is smaller than the tree dimensions. This meets the strong need for a spatialized inventory of forest resources at the stand level and of its changes due to forest management, land-use planning or catastrophic events. The aim is twofold: (1) assess the potential of VHR image texture to estimate the main forest structure variables (crown diameter, stem diameter, height, tree density or spacing) at the stand level; (2) on these bases, classify the image data at the pixel level into forest structure types in order to produce the finest possible spatial information. The main developments concern the automation of parameter setting, variable selection, multivariate regression modelling and an ensemble-based classification approach (Random Forests). They are tested and evaluated on two sites of the Landes maritime pine forest with three Pléiades images and one Quickbird image acquired under different conditions (season, sun position, viewing angles). The proposed methodology is generic, and its robustness to image acquisition conditions is evaluated. The results show that fine texture variations characteristic of forest structure are clearly identifiable. Performance in estimating the forest variables (RMSE of ~1.1 m for crown diameter, ~3 m for tree height and ~0.9 m for tree spacing) and in mapping forest structure (~82% overall accuracy for the classification of the five main forest structure classes) is satisfactory from an operational perspective. Application to multi-annual images will make it possible to assess their ability to detect and map changes such as clear cuts, urban sprawl or storm damage.
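A minimal sketch of this kind of pipeline, under our own assumptions: gray-level co-occurrence features stand in for the thesis's texture parameters, and image windows with matching field measurements are assumed to be available.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import RandomForestRegressor

def texture_features(window, distances=(1, 2, 4), angles=(0, np.pi/4, np.pi/2)):
    """Haralick-style statistics for one image window (2-D uint8 array)."""
    glcm = graycomatrix(window, distances=distances, angles=angles,
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

def fit_crown_model(windows, y):
    """windows: stand-level image chips; y: field-measured crown diameters."""
    X = np.array([texture_features(w) for w in windows])
    model = RandomForestRegressor(n_estimators=500, random_state=0)
    return model.fit(X, y)  # RMSE can then be checked on held-out stands
```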
374

Využití Bayesovských sítí pro predikci korporátních bankrotů / Corporate Bankruptcy Prediction Using Bayesian Classifiers

Hátle, Lukáš January 2014 (has links)
The aim of this study is to evaluate the feasibility of using Bayes classifiers for predicting corporate bankruptcies. The results obtained show that Bayes classifiers reach results comparable to the more commonly used methods such as logistic regression and decision trees. The comparison has been carried out on Czech and Polish data sets. The overall accuracy of these so-called naive Bayes classifiers, using entropy-based discretization along with a hybrid pre-selection of the explanatory attributes, reaches 77.19% for the Czech data set and 79.76% for the Polish one. The AUC values for these data sets are 0.81 and 0.87, respectively. The results obtained for the Polish data set have been compared with the already published articles by Tsai (2009) and Wang et al. (2014), who applied different classification algorithms. Compared to these earlier works, the method proposed in this study comes out as quite successful. The thesis also compares various approaches to the discretization of numerical attributes and to selecting the relevant explanatory attributes, the key issues for increasing the performance of naive Bayes classifiers.
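A rough sketch of such a pipeline in scikit-learn, with the caveat that the entropy-based (MDLP) discretization used in the thesis has no scikit-learn implementation, so an unsupervised quantile discretizer stands in; the bin count is an arbitrary placeholder.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import cross_val_score

bayes_pipeline = Pipeline([
    # Stand-in for MDLP: equal-frequency bins, ordinal codes per attribute.
    ("discretize", KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")),
    ("nb", CategoricalNB()),
])

# X: financial ratios of firms, y: bankrupt (1) / healthy (0)
# accuracy = cross_val_score(bayes_pipeline, X, y, cv=10).mean()
# auc = cross_val_score(bayes_pipeline, X, y, cv=10, scoring="roc_auc").mean()
```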
375

Metaheuristics for the feature selection problem : adaptive, memetic and swarm approaches / Métaheuristiques pour le problème de sélection d'attributs

Esseghir, Mohamed Amir 29 November 2011 (has links)
Despite the expansion of storage technologies, networking systems and information system methodologies, the capabilities of conventional data processing techniques remain limited, and the need for knowledge extraction, compact representation and data analysis grows with the data. Learning from data can be a complex task, particularly when it includes noisy, redundant and uninformative attributes. Feature Selection (FS) tries to select the most relevant attributes from raw data and hence guides the construction of the final classification models or decision support systems. Selected features should be representative of the underlying data and genuinely useful to the targeted learning paradigm (i.e. classification). As defined in the literature, the Feature Selection Problem is combinatorial in nature. In this thesis, we investigate different optimization paradigms and their adaptation to the requirements of the feature selection challenge, developing new approximate optimization techniques specific to the problem as well as improving existing algorithms. Both theoretical and empirical aspects were studied; the design, implementation and empirical study confirm the effectiveness and relevance of the proposed metaheuristic-based approaches.
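The wrapper-style combinatorial search at the heart of such approaches can be sketched with a simple hill climber; the thesis develops far richer metaheuristics (adaptive, memetic, swarm), so this is only a minimal illustration with names of our choosing.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y):
    """Wrapper fitness: cross-validated accuracy on the selected columns."""
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

def hill_climb_select(X, y, iters=200, seed=0):
    """X: numpy feature matrix; returns (boolean mask, best fitness)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape[1]) < 0.5          # random initial subset
    best = fitness(mask, X, y)
    for _ in range(iters):
        j = rng.integers(X.shape[1])             # flip one feature in/out
        mask[j] = ~mask[j]
        score = fitness(mask, X, y)
        if score >= best:
            best = score                         # keep improving moves
        else:
            mask[j] = ~mask[j]                   # undo worsening moves
    return mask, best
```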
376

Genetic generation of fuzzy knowledge bases: new perspectives / Geração genética de bases de conhecimento fuzzy: novas perspectivas

Marcos Evandro Cintra 10 April 2012 (has links)
This work focuses on the genetic generation of fuzzy systems. One of its main contributions is the FCA-BASED method, which generates the genetic search space using formal concept analysis theory by extracting rules from data. The experimental evaluation of the FCA-BASED method shows its robustness, producing a good trade-off between the accuracy and the interpretability of the generated models. Moreover, the FCA-BASED method improves on the previously proposed DOC-BASED method by reducing the computational cost of generating the genetic search space. In order to tackle high-dimensional datasets, we also propose the FUZZYDT method, a fuzzy version of the classic C4.5 decision tree; it is highly scalable, has low computational cost and achieves competitive accuracy. Owing to these characteristics, FUZZYDT is used in this work as a baseline for the experimental evaluation and comparison of other classic and fuzzy classification methods. We also apply FUZZYDT to a real-world problem, the warning of the coffee rust disease in Brazilian crops. Furthermore, this work investigates feature subset selection as a way to address the dimensionality issue of fuzzy systems. To this end, we propose the FUZZYWRAPPER method, a wrapper-based approach that selects features while taking into account the relevant information regarding the fuzzification of the attributes. Finally, this work investigates the automatic design of fuzzy data bases, proposing the FUZZYDBD method, which estimates the number of fuzzy sets defining all the attributes of a dataset and evenly distributes those fuzzy sets over the attribute domains, and a modified version, FUZZYDBD-II, which defines an independent number of fuzzy sets for each attribute by means of estimation functions.
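The FUZZYDBD idea of evenly distributing a fixed number of fuzzy sets over an attribute domain can be illustrated with triangular membership functions; this sketch uses our own function names, not the thesis's.

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-12),
                                 (c - x) / (c - b + 1e-12)), 0.0)

def evenly_spaced_fuzzy_sets(lo, hi, n_sets):
    """Peaks evenly spread over [lo, hi]; neighbouring sets cross at 0.5."""
    peaks = np.linspace(lo, hi, n_sets)
    step = peaks[1] - peaks[0]
    return [(p - step, p, p + step) for p in peaks]

# Membership degrees of an attribute value in every fuzzy set:
sets = evenly_spaced_fuzzy_sets(0.0, 10.0, n_sets=5)
degrees = [triangular(4.2, *abc) for abc in sets]
```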
377

VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ / DEVELOPMENT OF ALGORITHMS FOR GUNSHOT DETECTION

Hrabina, Martin January 2019 (has links)
This thesis deals with gunshot recognition and the problems associated with it. First, the task is introduced and broken down into smaller steps. An overview of sound databases, significant publications, events and the current state of the art follows, together with a survey of possible applications of gunshot detection. The second part compares features using various metrics, together with a comparison of their recognition performance. A comparison of recognition algorithms follows, and new features usable for recognition are introduced. The thesis culminates in the design of a two-stage gunshot recognition system that monitors its surroundings in real time. The conclusion summarizes the achieved results and outlines future work.
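A hedged sketch of a two-stage detector of the kind outlined above: a cheap short-time-energy gate monitors the stream, and only suspicious frames reach an (assumed pre-trained) classifier on MFCC features. The threshold, frame sizes and feature choice are illustrative, not taken from the thesis.

```python
import numpy as np
import librosa

FRAME = 2048
HOP = 1024

def detect_gunshots(audio, sr, clf, energy_thresh=0.02):
    """Return indices of frames classified as gunshots.
    audio: 1-D float signal; clf: fitted binary classifier (assumed)."""
    frames = librosa.util.frame(audio, frame_length=FRAME, hop_length=HOP)
    # Stage 1: short-time energy gate (cheap, suitable for real time).
    energy = (frames ** 2).mean(axis=0)
    candidates = np.where(energy > energy_thresh)[0]
    hits = []
    for i in candidates:
        # Stage 2: MFCC features + classifier (run only on candidates).
        mfcc = librosa.feature.mfcc(y=frames[:, i], sr=sr, n_mfcc=13)
        if clf.predict(mfcc.mean(axis=1).reshape(1, -1))[0] == 1:
            hits.append(i)
    return hits
```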
378

Metody klasifikace webových stránek / Methods of Web Page Classification

Nachtnebl, Viktor January 2012 (has links)
This work deals with methods of web page classification. It explains the concept of classification and the different features of web pages used to classify them. It then analyses the representation of a page and describes in detail a classification method that uses a hierarchical category model and can dynamically create new categories. The second half presents the implementation of the chosen method and discusses the results.
379

Feature Set Selection for Improved Classification of Static Analysis Alerts

Goeschel, Kathleen 01 January 2019 (has links)
With the extreme growth of third-party cloud applications, the increased exposure of applications to the internet, and the impact of successful breaches, improving the security of the software being produced is imperative. Static analysis tools can alert to quality and security vulnerabilities in an application; however, they present developers and analysts with a high rate of false positives and unactionable alerts. This can lead to a loss of confidence in the scanning tools, possibly resulting in the tools not being used at all, which in turn increases the likelihood of insecure software being released into production. Insecure software can be successfully attacked, compromising one or several information security principles such as confidentiality, availability, and integrity. Feature selection methods have the potential to improve the classification of static analysis alerts and thereby reduce false positive rates. Thus, the goal of this research was to improve the classification of static analysis alerts by proposing and testing a novel method leveraging feature selection. The proposed model was developed and then tested on three open-source PHP applications spanning several years. The results were compared to a classification model using all features to gauge the improvement due to feature selection. The model presented improved classification accuracy and reduced the false positive rate on a reduced feature set. This work contributes a real-world static analysis dataset based on three open-source PHP applications and enhances an existing dataset generation framework with additional predictive software features. Its main contribution, however, is a feature selection methodology that can be used to discover optimal feature sets that increase the classification accuracy of static analysis alerts.
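One way such a pipeline can look, sketched with our own names and stand-in choices (mutual-information ranking, a random forest) rather than the dissertation's exact method; note that both accuracy and the false positive rate, the metric the work aims to reduce, are reported.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

def evaluate_alert_model(X, y, k=15):
    """X: per-alert features; y: 1 = actionable alert, 0 = false positive."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=0)
    sel = SelectKBest(mutual_info_classif, k=k).fit(X_tr, y_tr)
    clf = RandomForestClassifier(random_state=0).fit(sel.transform(X_tr), y_tr)
    pred = clf.predict(sel.transform(X_te))
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    return accuracy_score(y_te, pred), fp / (fp + tn)  # accuracy, FP rate
```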
380

Machine Learning for Credit Risk Analytics

Kozodoi, Nikita 03 June 2022 (has links)
The rise of machine learning (ML) and the rapid digitization of the economy have substantially changed decision processes in the financial industry. Financial institutions increasingly rely on ML to support decision-making, and credit scoring is one of the most prominent ML applications in finance. The task of credit scoring is to distinguish between applicants who will pay back a loan and those who will default. Financial institutions use ML to develop scoring models that estimate a borrower's probability of default and automate approval decisions. This dissertation focuses on three major challenges associated with building ML-based scorecards in consumer credit scoring: (i) optimizing data acquisition and storage costs when dealing with high-dimensional data on loan applicants; (ii) addressing the adverse effects of sampling bias on the training and evaluation of scoring models; (iii) measuring and ensuring scorecard fairness while maintaining high profitability. The thesis offers a set of tools to remedy each of these challenges and to improve decision-making practices in financial institutions. First, we develop feature selection strategies that optimize multiple business-inspired objectives; these reduce data acquisition costs and improve model profitability and interpretability. Second, the thesis illustrates the adverse effects of sampling bias on model training and evaluation and suggests novel bias correction frameworks; the proposed methods partly recover the loss due to bias, provide more reliable estimates of future scorecard performance and increase the resulting model profitability. Third, the thesis investigates fair ML practices in consumer credit scoring: we catalog algorithmic options for incorporating fairness goals into the model development pipeline and perform empirical experiments to clarify the profit-fairness trade-off in lending decisions, identifying suitable options to implement fair credit scoring and to measure scorecard fairness.
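The profit-fairness trade-off measurement can be illustrated with two toy metrics: approval-rate parity across a protected group and a simple expected-profit calculation. The function names, profit figures and threshold sweep below are ours, not the thesis's.

```python
import numpy as np

def demographic_parity_difference(approved, group):
    """Absolute gap in approval rates between the two groups (0/1 coded)."""
    approved, group = np.asarray(approved), np.asarray(group)
    return abs(approved[group == 0].mean() - approved[group == 1].mean())

def expected_profit(approved, default, loss=1.0, gain=0.2):
    """Toy profit: gain per repaid approved loan, loss per approved default."""
    approved, default = np.asarray(approved), np.asarray(default)
    return (approved * np.where(default == 1, -loss, gain)).sum()

# Sweep the approval threshold to trace a profit-fairness frontier:
# for t in np.linspace(0.1, 0.9, 9):
#     a = (scores < t).astype(int)  # approve applicants with low default score
#     print(t, expected_profit(a, defaults), demographic_parity_difference(a, groups))
```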
