Global ETD Search

221	基於資訊理論熵之特徵選取 / Entropy based feature selection 許立農 Unknown Date (has links) 特徵選取為機器學習常見的資料前處理的方法，現今已有許多不同的特徵選取演算法，然而並不存在一個在所有資料上都優於其他方法的演算法，且由於現今的資料種類繁多，所以研發新的方法能夠帶來更多有關資料的資訊並且根據資料的特性採用不同的變數選取演算法是較好的做法。本研究使用資訊理論entropy的概念依照變數之間資料雲幾何樹的分群結果定義變數之間的相關性，且依此選取資料的特徵，並與同樣使用entropy概念的FCBF方法、Lasso、F-score、隨機森林、基因演算法互相比較，本研究使用階層式分群法與多數決投票法套用在真實的資料上判斷預測率。結果顯示，本研究使用的entropy方法在各個不同的資料集上有較穩定的預測率提升表現，同時資料縮減的維度也相對穩定。 / Feature selection is a common preprocessing technique in machine learning. Although a large pool of feature selection techniques has existed, there is no such a dominant method in all datasets. Because of the complexity of various data formats, establishing a new method can bring more insight into data, and applying proper techniques to analyzing data would be the best choice. In this study, we used the concept of entropy from information theory to build a similarity matrix between features. Additionally, we constructed a DCG-tree to separate variables into clusters. Each core cluster consists of rather uniform variables, which share similar covariate information. With the core clusters, we reduced the dimension of a high-dimensional dataset. We assessed our method by comparing it with FCBF, Lasso, F-score, random forest and genetic algorithm. The performances of prediction were demonstrated through real-world datasets using hierarchical clustering with voting algorithm as the classifier. The results showed that our entropy method has more stable prediction performances and reduces sufficient dimensions of the datasets simultaneously. 機器學習特徵選取維度縮減 entropy Machine learning Feature selection Dimension reduction Entropy
222	Neighborhood-Oriented feature selection and classification of Duke’s stages on colorectal Cancer using high density genomic data. Peng, Liang January 1900 (has links) Master of Science / Department of Statistics / Haiyan Wang / The selection of relevant genes for classification of phenotypes for diseases with gene expression data have been extensively studied. Previously, most relevant gene selection was conducted on individual gene with limited sample size. Modern technology makes it possible to obtain microarray data with higher resolution of the chromosomes. Considering gene sets on an entire block of a chromosome rather than individual gene could help to reveal important connection of relevant genes with the disease phenotypes. In this report, we consider feature selection and classification while taking into account of the spatial location of probe sets in classification of Duke’s stages B and C using DNA copy number data or gene expression data from colorectal cancers. A novel method was presented for feature selection in this report. A chromosome was first partitioned into blocks after the probe sets were aligned along their chromosome locations. Then a test of interaction between Duke’s stage and probe sets was conducted on each block of probe sets to select significant blocks. For each significant block, a new multiple comparison procedure was carried out to identify truly relevant probe sets while preserving the neighborhood location information of the probe sets. Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classification using the selected final probe sets was conducted for all samples. Leave-One-Out Cross Validation (LOOCV) estimate of accuracy is reported as an evaluation of selected features. We applied the method on two large data sets, each containing more than 50,000 features. Excellent classification accuracy was achieved by the proposed procedure along with SVM or KNN for both data sets even though classification of prognosis stages (Duke’s stages B and C) is much more difficult than that for the normal or tumor types. Feature selection Classification Hypothesis testing Cross validation Multiple comparison Genomic data Bioinformatics (0715) Computer Science (0984) Statistics (0463)
223	Feature selection and clustering for malicious and benign software characterization Chhabra, Dalbir Kaur R 13 August 2014 (has links) Malware or malicious code is design to gather sensitive information without knowledge or permission of the users or damage files in the computer system. As the use of computer systems and Internet is increasing, the threat of malware is also growing. Moreover, the increase in data is raising difficulties to identify if the executables are malicious or benign. Hence, we have devised a method that collects features from portable executable file format using static malware analysis technique. We have also optimized the important or useful features by either normalizing or giving weightage to the feature. Furthermore, we have compared accuracy of various unsupervised learning algorithms for clustering huge dataset of samples. So once the clusters are created we can use antivirus (AV) to identify one or two file and if they are detected by AV then all the files in cluster are malicious even if the files contain novel or unknown malware; otherwise all are benign. Static malware analysis Portable Executable unsupervised learning algorithm malicious or benign samples feature selection clustering Information Security
224	Ensemble multi-label learning in supervised and semi-supervised settings / Apprentissage multi-label ensembliste dans le context supervisé et semi-supervisé Gharroudi, Ouadie 21 December 2017 (has links) L'apprentissage multi-label est un problème d'apprentissage supervisé où chaque instance peut être associée à plusieurs labels cibles simultanément. Il est omniprésent dans l'apprentissage automatique et apparaÃ®t naturellement dans de nombreuses applications du monde réel telles que la classification de documents, l'étiquetage automatique de musique et l'annotation d'images. Nous discutons d'abord pourquoi les algorithmes multi-label de l'etat-de-l'art utilisant un comité de modèle souffrent de certains inconvénients pratiques. Nous proposons ensuite une nouvelle stratégie pour construire et agréger les modèles ensemblistes multi-label basés sur k-labels. Nous analysons ensuite en profondeur l'effet de l'étape d'agrégation au sein des approches ensemblistes multi-label et étudions comment cette agrégation influece les performances de prédictive du modèle enfocntion de la nature de fonction cout à optimiser. Nous abordons ensuite le problème spécifique de la selection de variables dans le contexte multi-label en se basant sur le paradigme ensembliste. Trois méthodes de sélection de caractéristiques multi-label basées sur le paradigme des forêts aléatoires sont proposées. Ces méthodes diffèrent dans la façon dont elles considèrent la dépendance entre les labels dans le processus de sélection des varibales. Enfin, nous étendons les problèmes de classification et de sélection de variables au cadre d'apprentissage semi-supervisé. Nous proposons une nouvelle approche de sélection de variables multi-label semi-supervisée basée sur le paradigme de l'ensemble. Le modèle proposé associe des principes issues de la co-training en conjonction avec une métrique interne d'évaluation d'importnance des varaibles basée sur les out-of-bag. Testés de manière satisfaisante sur plusieurs données de référence, les approches développées dans cette thèse sont prometteuses pour une variété d'ap-plications dans l'apprentissage multi-label supervisé et semi-supervisé. Testés de manière satisfaisante sur plusieurs jeux de données de référence, les approches développées dans cette thèse affichent des résultats prometteurs pour une variété domaine d'applications de l'apprentissage multi-label supervisé et semi-supervisé / Multi-label learning is a specific supervised learning problem where each instance can be associated with multiple target labels simultaneously. Multi-label learning is ubiquitous in machine learning and arises naturally in many real-world applications such as document classification, automatic music tagging and image annotation. In this thesis, we formulate the multi-label learning as an ensemble learning problem in order to provide satisfactory solutions for both the multi-label classification and the feature selection tasks, while being consistent with respect to any type of objective loss function. We first discuss why the state-of-the art single multi-label algorithms using an effective committee of multi-label models suffer from certain practical drawbacks. We then propose a novel strategy to build and aggregate k-labelsets based committee in the context of ensemble multi-label classification. We then analyze the effect of the aggregation step within ensemble multi-label approaches in depth and investigate how this aggregation impacts the prediction performances with respect to the objective multi-label loss metric. We then address the specific problem of identifying relevant subsets of features - among potentially irrelevant and redundant features - in the multi-label context based on the ensemble paradigm. Three wrapper multi-label feature selection methods based on the Random Forest paradigm are proposed. These methods differ in the way they consider label dependence within the feature selection process. Finally, we extend the multi-label classification and feature selection problems to the semi-supervised setting and consider the situation where only few labelled instances are available. We propose a new semi-supervised multi-label feature selection approach based on the ensemble paradigm. The proposed model combines ideas from co-training and multi-label k-labelsets committee construction in tandem with an inner out-of-bag label feature importance evaluation. Satisfactorily tested on several benchmark data, the approaches developed in this thesis show promise for a variety of applications in supervised and semi-supervised multi-label learning Classification multi-label Apprentissage supervisé Apprentissage semi-supervisé Multi-label classification Ensemble models Semi-supervised learning Feature selection 004
225	Seleção de características apoiada por mineração visual de dados / Feature selection supported by visual data mining Botelho, Glenda Michele 17 February 2011 (has links) Devido ao crescimento do volume de imagens e, consequentemente, da grande quantidade e complexidade das características que as representam, surge a necessidade de selecionar características mais relevantes que minimizam os problemas causados pela alta dimensionalidade e correlação e que melhoram a eficiência e a eficácia das atividades que utilizarão o conjunto de dados. Existem diversos métodos tradicionais de seleção que se baseiam em análises estatísticas dos dados ou em redes neurais artificiais. Este trabalho propõe a inclusão de técnicas de mineração visual de dados, particularmente, projeção de dados multidimensionais, para apoiar o processo de seleção. Projeção de dados busca mapear dados de um espaço m-dimensional em um espaço p-dimensional, p < m e geralmente igual a 2 ou 3, preservando ao máximo as relações de distância existentes entre os dados. Tradicionalmente, cada imagem é representada por um ponto e pontos projetados próximos uns aos outros indicam agrupamentos de imagens que compartilham as mesmas propriedades. No entanto, este trabalho propõe a projeção de características. Dessa forma, ao selecionarmos apenas algumas amostras de cada agrupamento da projeção, teremos um subconjunto de características, configurando um processo de seleção. A qualidade dos subconjuntos de características selecionados é avaliada comparando-se as projeções obtidas para estes subconjuntos com a projeção obtida com conjunto original de dados. Isto é feito quantitativamente, por meio da medida de silhueta, e qualitativamente, pela observação visual da projeção. Além da seleção apoiada por projeção, este trabalho propõe um aprimoramento no seletor de características baseado no cálculo de saliências de uma rede neural Multilayer Perceptron. Esta alteração, que visa selecionar características mais discriminantes e reduzir a quantidade de cálculos para se obter as saliências, utiliza informações provenientes dos agrupamentos de características, de forma a alterar a topologia da rede neural em que se baseia o seletor. Os resultados mostraram que a seleção de características baseada em projeção obtém subconjuntos capazes de gerar novas projeções com qualidade visual satisfatória. Em relação ao seletor por saliência proposto, este também gera subconjuntos responsáveis por altas taxas de classificação de imagens e por novas projeções com bons valores de silhueta / Due to the ever growing amount of digital images and, consequently, the quantity and complexity of your features, there has been a need to select the most relevant features so that not only problems caused by high dimensional data sets, correlated features can be minimized, and also the efficiency of the tasks that may employ such features can be enhanced. Many feature selection methods are based on statistical analysis or neural network approaches. This work proposes the addition of visual data mining techniques, particularly multidimensional data projection approaches, to aid the feature selection process. Multidimensional data projection seeks to map a m-dimensional data space onto a p-dimensional space, so that p < m, usually 2 or 3, while preserving distance relationship among data instances. Traditionally, each image is represented by a point, and points projected close to each other indicate clusters of images which share a common properties. However, this work proposes the projection of features. Hence, if we select only a few samples of each cluster of features from the projection, we will end up with a subset of features, revealing a feature selection process. The quality of the feature subset may be assessed by comparing such projections with those obtained with the original data set. This can be achieved either quantitatively, by means of silhouette measures, or qualitatively, by means of visual inspection of the projection. As well as the projection based feature selection, this work proposes an enhancement in the Multilayer Perceptron salience based feature selector. This enhancement, whose aim is to perfect the selection of more discriminant features at the expenses of less computing power, employs information from feature clusters, so as to change the topology of the neural network on which the selector is based. Results have shown that projection-based feature selection produces subsets capable of generating new data projections of satisfactory visual quality. As for the proposed salience-based selector, new subsets with high image classification rates and good silhouette measures have been reported Agrupamento Clustering Feature selection Multidimensional data projection Projeção de dados multidimensionais Salience selection Seleção de características Seleção por saliência Silhueta Siulhouette
226	Avaliação de métodos ótimos e subótimos de seleção de características de texturas em imagens / Evaluation of optimal and suboptimal feature selection methods applied to image textures Roncatti, Marco Aurelio 10 July 2008 (has links) Características de texturas atuam como bons descritores de imagens e podem ser empregadas em diversos problemas, como classificação e segmentação. Porém, quando o número de características é muito elevado, o reconhecimento de padrões pode ser prejudicado. A seleção de características contribui para a solução desse problema, podendo ser empregada tanto para redução da dimensionalidade como também para descobrir quais as melhores características de texturas para o tipo de imagem analisada. O objetivo deste trabalho é avaliar métodos ótimos e subótimos de seleção de características em problemas que envolvem texturas de imagens. Os algoritmos de seleção avaliados foram o branch and bound, a busca exaustiva e o sequential oating forward selection (SFFS). As funções critério empregadas na seleção foram a distância de Jeffries-Matusita e a taxa de acerto do classificador de distância mínima (CDM). As características de texturas empregadas nos experimentos foram obtidas com estatísticas de primeira ordem, matrizes de co-ocorrência e filtros de Gabor. Os experimentos realizados foram a classificação de regiôes de uma foto aérea de plantação de eucalipto, a segmentação não-supervisionada de mosaicos de texturas de Brodatz e a segmentação supervisionada de imagens médicas (MRI do cérebro). O branch and bound é um algoritmo ótimo e mais efiiente do que a busca exaustiva na maioria dos casos. Porém, continua sendo um algoritmo lento. Este trabalho apresenta uma nova estratégia para o branch and bound, nomeada floresta, que melhorou significativamente a eficiência do algoritmo. A avaliação dos métodos de seleção de características mostrou que os melhores subconjuntos foram aqueles obtidos com o uso da taxa de acerto do CDM. A busca exaustiva e o branch and bound, mesmo com a estratégia floresta, foram considerados inviáveis devido ao alto tempo de processamento nos casos em que o número de característica é muito grande. O SFFS apresentou os melhores resultados, pois, além de mais rápido, encontrou as soluções ótimas ou próximas das ótimas. Pôde-se concluir também que a precisão no reconhecimento de padrões aumenta com a redução do número de características e que os melhores subconjuntos freqüentemente são formados por características de texturas obtidas com técnicas diferentes / Texture features are eficient image descriptors and can be employed in a wide range of applications, such as classification and segmentation. However, when the number of features is considerably high, pattern recognition tasks may be compromised. Feature selection helps prevent this problem, as it can be used to reduce data dimensionality and reveal features which best characterise images under investigation. This work aims to evaluate optimal and suboptimal feature selection algorithms in the context of textural features extracted from images. Branch and bound, exhaustive search and sequential floating forward selection (SFFS) were the algorithms investigated. The criterion functions employed during selection were the Jeffries-Matusita (JM) distance and the minimum distance classifier (MDC) accuracy rate. Texture features were computed from first-order statistics, co-occurrence matrices and Gabor filters. Three different experiments have been conducted: classification of aerial picture of eucalyptus plantations, unsupervised segmentation of mosaics of Brodatz texture samples and supervised segmentation of MRI images of the brain. The branch and bound is an optimal algorithm and many times more eficient than exhaustive search. But is still time consuming. This work proposed a novel strategy for the branch and bound algorithm, named forest, which has considerably improved its performance. The evaluation of the feature selection methods has revealed that the best feature subsets were those computed by the MDC accuracy rate criterion function. Exhaustive search and branch and bound approaches have been considered unfeasible, due to their high processing times, especially for high dimensional data. This statement holds even for the branch and bound with the forest strategy. The SFFS approach yielded the best results. Not only was it faster, as it also was capable of finding the optimal or nearly optimal solutions. Finally, it has been observed that the precision of pattern recognition tasks increases as the number of features decreases and that the best feature subsets are those which possess features computed from distinct texture feature methods Branch and bound Branch and bound Feature selection Pattern recognition Reconhecimento de padrões Seleção de características Sequential floating forward selection Texturas Textures
227	Seleção de características e aprendizado ativo para classificação de imagens de sensoriamento remoto / Feature selection and active learning for remote sensing image classification Jorge, Fábio Rodrigues 29 April 2015 (has links) Em aplicações de sensoriamento remoto, há diversos problemas nos quais há conhecimento predominante sobre uma categoria ou classe alvo, e pouco conhecimento sobre as demais categorias. Nesses casos, o treinamento de um classificador é prejudicado pelo desbalanceamento de classes. Assim, o estudo de características visuais para se definir o melhor subespaço de características pode ser uma alternativa viável para melhorar o desempenho dos classificadores. O uso de abordagens baseadas em detecção de anomalias também pode auxiliar por meio da modelagem da classe normal (comumente majoritária) enquanto todas as outras classes são consideradas como anomalias. Este estudo apresentou uma base de imagens de sensoriamento remoto, cuja aplicação é identificar entre regiões de cobertura vegetal e regiões de não cobertura vegetal. Para solucionar o problema de desbalanceamento entre as classes, foram realizados estudos das características visuais a fim de definir qual o conjunto de atributos que melhor representa os dados. Também foi proposta a criação de um pipeline para se tratar bases desbalanceadas de cobertura vegetal. Este pipeline fez uso de técnicas de seleção de características e aprendizado ativo. A análise de características apresentou que o subespaço usando o extrator BIC com o índice de vegetação ExG foi o que melhor distinguiu os dados. Além disso, a técnica de ordenação proposta mostrou bom desempenho com poucas dimensões. O aprendizado ativo também ajudou na criação de um modelo melhor, com resultados comparáveis com as melhores características visuais. / In remote sensing applications, there are several problems in which there is predominant knowledge about a target category or class, and little knowledge of the other categories. In such cases, the training of a classifier is hampered by the class imbalance. Thus, the study of visual characteristics to determine the best subspace characteristics may be a feasible alternative to improve the performance of classifiers. The use of anomaly detection-based approaches can also help through the normal class modeling (usually the major class) while considering all other classes as anomalies. This study presents a remote sensing image dataset, whose application is to classify regions of the image into vegetation coverage (related to plantation) and non-vegetation coverage. To solve the class imbalance problem, studies were conducted using several visual characteristics in order to define the set of attributes that best represent the data. A pipeline that deals with the vegetation classification problem and its class imbalance issues is also proposed. This pipeline made use of feature selection techniques and active learning. The visual features analysis showed that a subspace using the BIC extractor with EXG vegetation index was the best to distinguished the data. Also, and the proposed sorting-based feature selection achieved good results with a low dimensional subspaces. Furthermore, the active learning helped creating a better model, with results comparable with the best visual features. Aprendizado de máquina Bases desbalanceadas Extração de características Feature extraction Feature selection Machine learning Remote sensing Seleção de características Sensoriamento remoto Unbalanced bases
228	Seleção de características por meio de algoritmos genéticos para aprimoramento de rankings e de modelos de classificação / Feature selection by genetic algorithms to improve ranking and classification models Silva, Sérgio Francisco da 25 April 2011 (has links) Sistemas de recuperação de imagens por conteúdo (Content-based image retrieval { CBIR) e de classificação dependem fortemente de vetores de características que são extraídos das imagens considerando critérios visuais específicos. É comum que o tamanho dos vetores de características seja da ordem de centenas de elementos. Conforme se aumenta o tamanho (dimensionalidade) do vetor de características, também se aumentam os graus de irrelevâncias e redundâncias, levando ao problema da \"maldição da dimensionalidade\". Desse modo, a seleção das características relevantes é um passo primordial para o bom funcionamento de sistemas CBIR e de classificação. Nesta tese são apresentados novos métodos de seleção de características baseados em algoritmos genéticos (do inglês genetic algorithms - GA), visando o aprimoramento de consultas por similaridade e modelos de classificação. A família Fc (\"Fitness coach\") de funções de avaliação proposta vale-se de funções de avaliação de ranking, para desenvolver uma nova abordagem de seleção de características baseada em GA que visa aprimorar a acurácia de sistemas CBIR. A habilidade de busca de GA considerando os critérios de avaliação propostos (família Fc) trouxe uma melhora de precisão de consultas por similaridade de até 22% quando comparado com métodos wrapper tradicionais para seleção de características baseados em decision-trees (C4.5), naive bayes, support vector machine, 1-nearest neighbor e mineração de regras de associação. Outras contribuições desta tese são dois métodos de seleção de características baseados em filtragem, com aplicações em classificação de imagens, que utilizam o cálculo supervisionado da estatística de silhueta simplificada como função de avaliação: o silhouette-based greedy search (SiGS) e o silhouette-based genetic algorithm search (SiGAS). Os métodos propostos superaram os métodos concorrentes na literatura (CFS, FCBF, ReliefF, entre outros). É importante também ressaltar que o ganho em acurácia obtido pela família Fc, e pelos métodos SiGS e SiGAS propostos proporcionam também um decréscimo significativo no tamanho do vetor de características, de até 90% / Content-based image retrieval (CBIR) and classification systems rely on feature vectors extracted from images considering specific visual criteria. It is common that the size of a feature vector is of the order of hundreds of elements. When the size (dimensionality) of the feature vector is increased, a higher degree of redundancy and irrelevancy can be observed, leading to the \"curse of dimensionality\" problem. Thus, the selection of relevant features is a key aspect in a CBIR or classification system. This thesis presents new methods based on genetic algorithms (GA) to perform feature selection. The Fc (\"Fitness coach\") family of fitness functions proposed takes advantage of single valued ranking evaluation functions, in order to develop a new method of genetic feature selection tailored to improve the accuracy of CBIR systems. The ability of the genetic algorithms to boost feature selection by employing evaluation criteria (fitness functions) improves up to 22% the precision of the query answers in the analyzed databases when compared to traditional wrapper feature selection methods based on decision-tree (C4.5), naive bayes, support vector machine, 1-nearest neighbor and association rule mining. Other contributions of this thesis are two filter-based feature selection algorithms for classification purposes, which calculate the simplified silhouette statistic as evaluation function: the silhouette-based greedy search (SiGS) and the silhouette-based genetic algorithm search (SiGAS). The proposed algorithms overcome the state-of-the-art ones (CFS, FCBF and ReliefF, among others). It is important to stress that the gain in accuracy of the proposed methods family Fc, SiGS and SIGAS is allied to a significant decrease in the feature vector size, what can reach up to 90% Algoritmos genéticos Classificação Classification Consultas por similaridade Feature selection Genetic algorithms Imagens médicas Medical images Seleção de características Similarity search
229	Técnicas de seleção de características com aplicações em reconhecimento de faces. / Feature selection techniques with applications to face recognition. Campos, Teófilo Emídio de 25 May 2001 (has links) O reconhecimento de faces é uma área de pesquisa desafiadora que abre portas para a implementação de aplicações muito promissoras. Embora muitos algoritmos eficientes e robustos já tenham sido propostos, ainda restam vários desafios. Dentre os principais obstáculos a serem uperados, está a obtenção de uma representação robusta e compacta de faces que possibilite distinguir os indivíduos rapidamente. Visando abordar esse problema, foi realizado um estudo de técnicas de reconhecimento estatístico de padrões, principalmente na área de redução de dimensionalidade dos dados, além de uma revisão de métodos de reconhecimento de faces. Foi proposto (em colaboração com a pesquisadora Isabelle Bloch) um método de seleção de características que une um algoritmo de busca eficiente (métodos de busca seqüencial flutuante) com uma medida de distância entre conjuntos nebulosos (distância nebulosa baseada em tolerância). Essa medida de distância possui diversas vantagens, sendo possível considerar as diferentes tipicalidades de cada padrão dos conjuntos de modo a permitir a obtenção de bons resultados mesmo com conjuntos com sobreposição. Os resultados preliminares com dados sintéticos mostraram o caráter promissor dessa abordagem. Com o objetivo de verificar a eficiência de tal técnica com dados reais, foram efetuados testes com reconhecimento de pessoas usando imagens da região dos olhos. Nesse caso, em se tratando de um problema com mais de duas classes, nós propusemos uma nova função critério inspirada na distância supracitada. Além disso foi proposto (juntamente com o estudante de mestrado Rogério S. Feris) um esquema de reconhecimento a partir de seqüências de vídeo. Esse esquema inclui a utilização de um método eficiente de rastreamento de características faciais (Gabor Wavelet Networks) e o método proposto anteriormente para seleção de características. Dentro desse contexto, o trabalho desenvolvido nesta dissertação implementa uma parte dos módulos desse esquema. / Face recognition is an instigating research field that may lead to the development of many promising applications. Although many efficient and robust algorithms have been developed in this area, there are still many challenges to be overcome. In particular, a robust and compact face representation is still to be found, which would allow for quick classification of different individuals. In order to address this problem, we first studied pattern recognition techniques, especially regarding dimensionality reduction, followed by the main face recognition methods. We introduced a new feature selection approach in collaboration with the researcher Isabelle Bloch (TSI-ENST-Paris), that associates an efficient searching algorithm (sequential floating search methods), with a tolerance-based fuzzy distance. This distance measure presents some nice features for dealing with the tipicalities of each pattern in the sets, so that good results can be attained even when the sets are overlapping. Preliminary results with synthetic data have demonstrated that this method is quite promising. In order to verify the efficiency of this technique with real data, we applied it for improving the performance of a person recognition system based on eye images. Since this problem involves more than two classes, we also developed a new criterion function based on the above-mentioned distance. Moreover, we proposed (together with Rogério S. Feris) a system for person recognition based on video sequences. This mechanism includes the development of an efficient method for facial features tracking, in addition to our method for feature selection. In this context, the work presented here constitutes part of the proposed system. computer vision dimensionality reduction face recognition feature selection pattern recognition reconhecimento de faces reconhecimento de padrões redução de dimensionalidade seleção de características visão computacional
230	Identificação de florestas destinadas à produção de bioenergia no Estado do Tocantins utilizando imagens de satélite e mineração de dados Nonato, Carlos Tavares 26 August 2014 (has links) As florestas plantadas tem atraído grande interesse pela possibilidade de utilização em aplicações bioenergéticas frente à tendência mundial de priorizar fontes de energia que proporcionem maior sustentabilidade ambiental, mais qualidade e segurança. No Brasil, os deslocamentos na geografia da cadeia produtiva agroflorestal atual em direção às regiões de fronteira agrícola (Centro-Oeste e Norte) vem criando desafios de adequação dos conhecimentos técnico-científicos já consolidados em outras regiões. Nesse contexto, o objetivo desta dissertação é avaliar a acurácia da classificação e identificação de áreas cultivadas com florestas plantadas para fins energéticos, em imagens orbitais do sensor Landsat 5 TM. Por meio de técnicas estatísticas de mineração de dados, o presente trabalho também avaliou a utilização de um amplo conjunto de atributos para identificar melhorias nos resultados da classificação. A pesquisa se concentrou em amostras de áreas plantadas no estado do Tocantins, região norte do Brasil. As técnicas de mineração de dados utilizadas se mostraram eficientes na identificação precisa de florestas plantadas em imagens do satélite Landsat 5, tanto pelo desempenho da classificação, quanto pela redução da quantidade de informação necessária para a resolução deste tipo de problema. Assim, as técnicas empregadas neste estudo possibilitam o desenvolvimento de modelos de classificação robustos no auxílio ao planejamento e à tomada de decisão sobre a plantação de florestas no território brasileiro. / Planted forests have attracted a lot of attention because of possibility of use in bioenergy applications and due to the global trend of prioritizing energy sources that provide greater environmental sustainability, more quality and security. In Brazil, the shifts in the geography of current agroforestry production chain towards the agricultural frontier areas (Midwest and North) are creating challenges to the adequacy of technical and scientific knowledge already established in other regions. So, the aim of this work is to assess the accuracy of the identification and classification of areas cultivated with plantation forests for energy, inside TM Landsat 5 images. Using statistical techniques for data mining, this study also evaluated the use of a broad set of attributes to identify improvements in the classification results. The research focused on samples of planted areas in the state of Tocantins, Northern Brazil. The data mining techniques used were effective in identifying of planted forests in Landsat 5 satellite images, both the classification performance, such as by reducing the amount of information needed to solve this kind of problem. Thus, the techniques employed in this study enable the development of robust classification models to aid in the planning and decision making on forest plantations in Brazil. CNPQ::CIENCIAS AGRARIAS Florestas plantadas Mineração de dados, Seleção de atributos Classificação de imagens Planted Forests Data mining Feature selection Classification of pictures

Search results