Global ETD Search

51	Seleção de algoritmos para a tarefa de agrupamento de dados: uma abordagem via meta-aprendizagem [Algorithm selection for the data clustering task: a meta-learning approach]
Ferrari, Daniel Gomes 27 March 2014
Natcomp Informatica e Equipamentos Eletronicos LTDA
Data clustering is an important data mining task that aims to segment a database into groups of objects based on their similarity or dissimilarity. Because clustering is unsupervised, the search for a good-quality solution can become a complex process. A wide range of clustering algorithms is currently available, and selecting the most suitable one for a given problem can be slow and costly. In 1976, Rice formulated the algorithm selection problem (PSA, from the Portuguese Problema de Seleção de Algoritmos), postulating that a well-performing algorithm can be chosen according to the structural characteristics of the problem to which it will be applied. Meta-learning introduces the concept of learning about learning: the meta-knowledge obtained from the algorithms' learning process can be used to improve the performance of that process. Meta-learning overlaps substantially with data mining in classification problems, where it is used to build algorithm selection systems. This thesis proposes an approach to the algorithm selection problem that uses meta-learning techniques for clustering. The 84 problems are characterized both by the classical approach, based on the problems themselves, and by a new proposal based on the similarity among the objects. Ten internal indices provide different assessments of the performance of seven algorithms, and the combination of these indices determines the ranking of the algorithms. Several analyses are performed to assess how well the obtained meta-knowledge supports the mapping between the problem's features and the performance of the algorithms.
The results show that the new characterization approach and the method for combining the indices provide a good-quality algorithm selection mechanism for data clustering problems.
data clustering; meta-learning; meta-knowledge; algorithm selection; CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
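The abstract of record 51 says that several internal indices are combined into one ranking of algorithms, but it does not say how. The sketch below illustrates one common possibility, averaging the per-index ranks. The averaging rule, the function names, the three indices, the three algorithm labels, and all scores are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def rank_scores(scores, higher_is_better=True):
    """Map each algorithm to its rank (1 = best); tied scores share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of algorithms tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # average of the 1-based positions i+1 .. j+1
        for name in ordered[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


def combine_rankings(index_scores, higher_is_better):
    """Average the per-index ranks (an assumed rule) and sort algorithms by mean rank."""
    per_index = [
        rank_scores(scores, higher_is_better[index])
        for index, scores in index_scores.items()
    ]
    algorithms = list(per_index[0])
    mean_rank = {a: mean(r[a] for r in per_index) for a in algorithms}
    return sorted(algorithms, key=mean_rank.get), mean_rank


# Hypothetical scores for three algorithms under three internal indices.
index_scores = {
    "silhouette": {"kmeans": 0.62, "single_link": 0.40, "som": 0.55},
    "davies_bouldin": {"kmeans": 0.80, "single_link": 1.30, "som": 0.70},
    "dunn": {"kmeans": 0.20, "single_link": 0.25, "som": 0.15},
}
# Davies-Bouldin is better when lower; the other two are better when higher.
higher_is_better = {"silhouette": True, "davies_bouldin": False, "dunn": True}

ranking, mean_rank = combine_rankings(index_scores, higher_is_better)
print(ranking)
print(mean_rank)
```

In a meta-learning setting, a ranking like this one would serve as the target that a meta-learner tries to predict from the features of each problem.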
52	Um modelo dinâmico de clusterização de dados aplicado na detecção de intrusão [A dynamic data clustering model applied to intrusion detection]
Rogério Akiyoshi Furukawa 25 April 2003
Computer security is becoming increasingly necessary because of the sharp growth in reported computer crime. One of the tools used to raise the level of security is the Intrusion Detection System (IDS). The flexibility and usability of these systems have contributed considerably to the protection of computing environments. Because a large share of intrusions follow well-defined behavior patterns in a computer network, data classification and clustering techniques tend to be well suited to solving this kind of problem effectively. This work presents a dynamic clustering model based on a data movement mechanism. Although the clustering technique is applicable to any type of data, in this work the model is applied to intrusion detection. The technique obtained clustering results comparable to those of traditional techniques. In addition, it has some advantages over the traditional techniques investigated, such as multi-resolution (multi-scale) clustering and no need to determine the number of clusters in advance.
Data clustering; Intrusion detection systems; Principal component analysis
53	Partitioning A Graph In Alliances And Its Application To Data Clustering
Hassan-Shafique, Khurram 01 January 2004
Any reasonably large group of individuals, families, states, or parties exhibits the formation of subgroups whose members have strong connections or bonds with one another. The reasons for the formation of these subgroups, which we call alliances, differ from one situation to another and include kinship and friendship (in the case of individuals), common economic interests (for both individuals and states), common political interests, and geographical proximity. This structure of alliances is not only prevalent in social networks but is also an important characteristic of similarity networks of natural and unnatural objects. (A similarity network defines the links between two objects based on their similarities.) The discovery of such structure in a data set is called clustering or unsupervised learning, and the ability to do it automatically is desirable for many applications in pattern recognition, computer vision, artificial intelligence, behavioral and social sciences, life sciences, earth sciences, medicine, and information theory. In this dissertation, we study a graph-theoretical model of alliances in which an alliance is a set of vertices of a graph such that every vertex in the set is adjacent to at least as many vertices inside the set as outside it. We study the problem of partitioning a graph into alliances and identify classes of graphs that have such a partition. We present results on the relationship between the existence of such a partition and other well-known graph parameters, such as connectivity, subgraph structure, and degrees of vertices. We also present results on the computational complexity of finding such a partition.
An alliance cover set is a set of vertices in a graph that contains at least one vertex from every alliance of the graph. The complement of an alliance cover set is an alliance free set, that is, a set that does not contain any alliance as a subset. We study the properties of these sets and present tight bounds on their cardinalities. We also characterize the graphs that can be partitioned into alliance free and alliance cover sets. Finally, we present an approximate algorithm to discover alliances in a given graph. At each step, the algorithm finds a partition of the vertices into two alliances such that the alliances are the strongest among all such partitions. (The strength of an alliance is defined as a real number p such that every vertex in the alliance has at least p times as many neighbors in the set as its total number of neighbors in the graph.) We evaluate the performance of the proposed algorithm on standard data sets.
vertex partitions; data clustering; alliances; defensive alliances; offensive alliances; powerful alliances; alliance free sets; alliance cover sets; Computer Sciences; Engineering
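The two definitions in record 53 (the alliance condition and the strength p) can be checked directly on a small graph. The following is a minimal sketch that follows the wording of the abstract. The function names and the example graph are hypothetical, neighbors are counted without the vertex itself (some definitions of alliances use closed neighborhoods instead), and the strength of a set is taken to be the largest p that every member satisfies.

```python
def is_alliance(adj, members):
    """True if every member has at least as many neighbors inside the set as outside it."""
    members = set(members)
    return all(len(adj[v] & members) >= len(adj[v] - members) for v in members)


def alliance_strength(adj, members):
    """Largest p such that every member has at least p * degree neighbors inside the set."""
    members = set(members)
    fractions = []
    for v in members:
        degree = len(adj[v])
        # Assumption: a vertex with no neighbors satisfies the condition trivially.
        fractions.append(len(adj[v] & members) / degree if degree else 1.0)
    return min(fractions)


# Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}

print(is_alliance(adj, {0, 1, 2}))        # each triangle satisfies the condition
print(is_alliance(adj, {2, 3}))           # the bridge endpoints alone do not
print(alliance_strength(adj, {0, 1, 2}))  # limited by vertex 2: 2 of its 3 neighbors are inside
```

Under this reading, a set is an alliance exactly when its strength is at least 1/2.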
54	Learning Techniques For Information Retrieval And Mining In High-dimensional Databases
Cheng, Hao 01 January 2009
The main focus of my research is to design effective learning techniques for information retrieval and mining in high-dimensional databases. There are two main aspects of retrieval and mining research: accuracy and efficiency. The accuracy problem is how to return results that better match the ground truth, and the efficiency problem is how to evaluate users' requests and execute learning algorithms as fast as possible. These problems are non-trivial because of the complexity of high-level semantic concepts, the heterogeneous nature of the feature space, the high dimensionality of data representations, and the size of the databases. My dissertation is dedicated to addressing these issues and makes five main contributions. The first contribution is a novel manifold learning algorithm, Local and Global Structures Preserving Projection (LGSPP), which defines salient low-dimensional representations for high-dimensional data. A small number of projection directions are sought that preserve the local and global structures of the original data. Specifically, two groups of points are extracted for each individual point in the dataset: the first group contains the nearest neighbors of the point, and the second contains a few sampled points far away from it. These two point sets respectively characterize the local and global structures with regard to the data point. The objective of the embedding is to minimize the distances between the points in each local neighborhood and to keep each point far from its respective remote points in the original space. In this way, the relationships between the data in the original space are preserved with little distortion. The second contribution is a new constrained clustering algorithm.
Conventionally, clustering is an unsupervised learning problem that partitions a dataset into a small set of clusters such that the data in each cluster are more similar to each other than to those in other clusters. In the proposed approach, partial human knowledge is exploited to find better clustering results. Two kinds of constraints are integrated into the clustering algorithm. A must-link constraint indicates that the two points involved belong to the same cluster, whereas a cannot-link constraint denotes that two points are not in the same cluster. Given the input constraints, data points are arranged into small groups and a graph is constructed to preserve the semantic relations between these groups. The assignment procedure makes a best effort to assign each group to a feasible cluster without violating the constraints. The theoretical analysis reveals that the probability of data points being assigned to their true clusters is much higher with the new proposal than with conventional methods. In general, the new scheme produces clusters that better match the ground truth and respect the semantic relations between points inferred from the constraints. The third contribution is a unified framework for partition-based dimension reduction techniques, which allows efficient similarity retrieval in high-dimensional data spaces. Recent similarity search techniques, such as Piecewise Aggregate Approximation (PAA), Segmented Means (SMEAN) and Mean-Standard deviation (MS), have proved very effective in reducing data dimensionality by partitioning dimensions into subsets and extracting aggregate values from each dimension subset. These partition-based techniques have many advantages, including very efficient multi-phased pruning, while being simple to implement. However, they do not adapt to the different characteristics of data in diverse applications.
In this study, a unified framework for these partition-based techniques is proposed and the issue of dimension partitions is examined within it. An investigation of the relationship between query selectivity and dimension partition schemes uncovers indicators that can predict the performance of a partitioning setting. Accordingly, a greedy algorithm is designed to determine a good partitioning of data dimensions so that the performance of the reduction technique is robust across different datasets. The fourth contribution is an effective similarity search technique for databases of point sets. In the conventional model, an object corresponds to a single vector. In the proposed study, an object is represented by a set of points. This representation can be used in many real-world applications and carries much more local information, but the retrieval and learning problems become very challenging. The Hausdorff distance is the common distance function for measuring the similarity between two point sets; however, this metric is sensitive to outliers in the data. To address this issue, a novel similarity function is defined to better capture the proximity of two objects, in which a one-to-one mapping is established between the vectors of the two objects. The optimal mapping minimizes the sum of distances between paired points. The overall distance of the optimal matching is robust and yields high retrieval accuracy. The computation of the new distance function is formulated as the classical assignment problem. Lower-bounding techniques and an early-stop mechanism are also proposed to significantly accelerate the expensive similarity search process. The classification problem over point-set data is called Multiple Instance Learning (MIL) in the machine learning community, where a vector is an instance and an object is a bag of instances.
The fifth contribution is to convert the MIL problem into a standard supervised learning problem in the conventional vector space. Specifically, the feature vectors of bags are grouped into clusters. Each object is then denoted as a bag of cluster labels, and common patterns of each category are discovered, each of which is further reconstructed into a bag of features. Accordingly, a bag is mapped into a feature space defined by the distances from this bag to all the derived patterns. Standard supervised learning algorithms can then be applied to classify objects into pre-defined categories. The results demonstrate that the proposal achieves better classification accuracy than other state-of-the-art techniques. In the future, I will continue my research on large-scale data analysis algorithms, applications and system development. In particular, I am interested in applications that analyze the massive volume of online data.
similarity search; dimension reduction; data clustering; constrained clustering; manifold learning; query processing; multiple instance learning; Computer Sciences; Engineering
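For the fourth contribution of record 54, the two point-set distances can be compared on a toy example. This is only an illustrative sketch. The function names and the sample points are hypothetical, both sets are assumed to have the same size, and the optimal one-to-one mapping is found by brute force over permutations, which is only feasible for very small sets. The dissertation formulates this step as the classical assignment problem and adds lower bounds and early stopping, none of which is reproduced here.

```python
from itertools import permutations
from math import dist


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(x, y):
        # Farthest that any point of x lies from its nearest point in y.
        return max(min(dist(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))


def matching_distance(a, b):
    """Smallest total distance over all one-to-one mappings between a and b."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes point sets of equal size")
    # Brute force: try every pairing of a[i] with a distinct point of b.
    return min(
        sum(dist(p, q) for p, q in zip(a, perm))
        for perm in permutations(b)
    )


# Hypothetical 2-D point sets.
a = [(0.0, 0.0), (4.0, 0.0)]
b = [(1.0, 0.0), (4.0, 3.0)]

print(hausdorff(a, b))
print(matching_distance(a, b))
```

The Hausdorff distance is decided by a single worst-case pair of points, whereas the matching distance adds up contributions from every pair, so the two generally return different values for the same input.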
55	Approximate Clustering Algorithms for High Dimensional Streaming and Distributed Data
Carraher, Lee A. 22 May 2018
No description available.
Computer Engineering data clustering distributed data mining streaming data algorithms locality sensitive hashing count-min cut tree random projection
56	Complex network component unfolding using a particle competition technique / Desdobramento de componentes de redes complexas utilizando uma técnica de competição de partículas
Urio, Paulo Roberto 12 June 2017
This work applies complex network theory to the problem of semi-supervised and unsupervised learning in networks that represent multivariate datasets. Complex networks allow the use of nonlinear dynamical systems whose behavior depends on the connectivity patterns of the network. Inspired by behavior observed in nature, such as competition for limited resources, dynamical system models can be employed to uncover the organizational structure of a network. In this dissertation, we develop a technique for classifying data represented as interaction networks. As part of the technique, we model a dynamical system inspired by the biological dynamics of resource competition. So far, similar methods have focused on vertices as the resource of competition. We introduce edges as the resource of competition. In doing so, the connectivity pattern of a network can be used not only in the dynamical system simulation but also in the learning task.
Community detection; Complex networks; Data clustering; Machine learning; Semi-supervised learning
57	Análise de agrupamentos baseada na topologia dos dados e em mapas auto-organizáveis. / Data clustering based on data topology and self organizing-maps.
Boscarioli, Clodis 16 May 2008
Increasingly, in the context of major decision making, the analysis of massively stored data has become a necessity in the most varied areas of knowledge. Data analysis involves different tasks, which can be carried out with different techniques and strategies, such as data clustering analysis. This research focuses on the data clustering task using Self-Organizing Maps (SOM) as the main artifact. SOM is an artificial neural network based on competitive, unsupervised learning, which means that training is driven entirely by the data and that the neurons of the map compete with one another. This neural network is able to form mappings that quantize the data while preserving their topology. This work introduces a new clustering analysis methodology based on SOM that considers both the topological map it generates and the topology of the data in the clustering process. An experimental and comparative analysis is presented to demonstrate the potential of the proposal, and the main contributions of the work are highlighted at the end.
Data clustering; Data mining; Exploratory data analysis; Knowledge discovery; Self-organizing Maps (SOM)
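The competitive, data-driven training described in record 57 can be sketched with a generic textbook SOM. The code below is not the methodology of the thesis. It assumes a one-dimensional map, a Gaussian neighborhood, and linearly decaying learning rate and radius, and all function names, parameter values, and sample points are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (the winner of the competition)."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def train_som(data, n_units, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D self-organizing map and return its weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or n_units / 2
    # Initialise each unit from a randomly chosen sample.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        for x in rng.sample(data, len(data)):    # present the samples in random order
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Units near the winner on the map are pulled toward x more strongly.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Hypothetical data: two well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]

weights = train_som(data, n_units=4)
for point in data:
    print(point, "-> unit", best_matching_unit(weights, point))
```

Each unit acts as a prototype that quantizes the samples mapped to it, and the neighborhood term is what ties nearby units together so that the map can preserve topology.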
58	Categorização de imagens médicas baseada em transformada wavelet e mapas auto-organizáveis. / Medical image categorization based in wavelet transform and self-organizing maps.
Silva, Leandro Augusto da 25 March 2009
Medical images are a fundamental data source in modern medicine. Storing images in a database according to their categories is an important step for applications such as data mining and content-based image retrieval. These applications can support doctors and students in diagnostic decisions, enable research, and serve as teaching material. This work proposes the use of Self-Organizing Maps (SOM) and the discrete wavelet transform combined with Hu moments for medical image categorization. To that end, experiments were carried out to define the size of the SOM map, to use the map in categorization, to define the best wavelet family and decomposition level, to summarize the discarded wavelet coefficients with Hu moments, and to compare the results with other categorization approaches. In addition to the comparative classification experiments in terms of accuracy, a contribution on the use of the SOM map in classification is proposed. In this proposal, the classification results and the computational time of the SOM map prove efficient when compared with the results and time of the traditional K-nearest-neighbor classifier.
Cluster analysis; Data analysis; Wavelet analysis; Artificial neural networks; Data clustering; Knowledge discovery; Digital image (systems, processes); Medical image; Data mining; Neural networks (classification); Wavelet transform

Search results