Global ETD Search

151	Development of Partially Supervised Kernel-based Proximity Clustering Frameworks and Their Applications Graves, Daniel 06 1900 (has links) The focus of this study is the development and evaluation of a new partially supervised learning framework. This framework belongs to an emerging field of machine learning that augments unsupervised learning processes with elements of supervision. It is based on proximity fuzzy clustering, in which an active learning process is designed to query for the domain knowledge required for supervision. The framework is further extended to the parametric optimization of the kernel function in the proximity fuzzy clustering algorithm, where the goal is to obtain non-spherical cluster structures through a non-linear mapping. It is shown that the performance of kernel-based clustering is sensitive to the choice of these kernel parameters. Proximity hints obtained from domain knowledge are exploited in the partially supervised framework. The theoretical developments in proximity fuzzy clustering are evaluated on several practical applications. The first is the clustering of a set of graphs based on their structural and semantic similarity. The second is the segmentation of music, where the aim is to determine the points in time, i.e. boundaries, at which significant structural changes occur. Finally, a time series prediction problem using a fuzzy rule-based system is formulated and evaluated. The antecedents of the rules are constructed by clustering the time series using proximity information, in order to localize the behavior of the rule consequents in the architecture. Evaluation on both synthetic and real-world data demonstrates that proximity fuzzy clustering is well suited to a variety of problems.
/ Digital Signals and Image Processing, Partially supervised learning, Fuzzy clustering, Proximity hints, Kernel-based clustering, Active learning, Multi-proximity clustering, Time series analysis, Time series clustering, Structural musical segmentation, Graph clustering
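Record 151 describes kernel-based fuzzy clustering whose quality depends on the kernel parameters, with proximity hints (judgments of how close two patterns should be) supplying partial supervision. The sketch below is not the thesis's algorithm. It assumes a standard kernel fuzzy c-means with a Gaussian kernel, a deterministic farthest-first initialization, and the induced proximity p(k, l) = sum over clusters i of min(u_ik, u_il) as the quantity compared against the hints. The names `kernel_fcm`, `induced_proximity` and `hint_error` are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def farthest_first(data, c):
    """Deterministic seeding (an assumption): start at data[0], then add the
    point farthest from all prototypes chosen so far."""
    protos = [list(data[0])]
    while len(protos) < c:
        nxt = max(data, key=lambda x: min(math.dist(x, p) for p in protos))
        protos.append(list(nxt))
    return protos


def kernel_fcm(data, c, sigma, m=2.0, iters=100):
    """Kernel fuzzy c-means; returns the c x n partition matrix U and prototypes.
    With a Gaussian kernel the feature-space distance is 2 * (1 - K(x, v));
    the constant factor 2 cancels in the membership update."""
    n, dim = len(data), len(data[0])
    protos = farthest_first(data, c)
    U = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        for k, x in enumerate(data):
            # Clip to avoid division by zero when x coincides with a prototype.
            d = [max(1.0 - gaussian_kernel(x, v, sigma), 1e-12) for v in protos]
            for i in range(c):
                U[i][k] = 1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1.0)) for j in range(c))
        for i in range(c):
            # Prototype update weighted by u^m * K(x, v).
            w = [U[i][k] ** m * gaussian_kernel(data[k], protos[i], sigma) for k in range(n)]
            total = sum(w)
            if total > 0.0:
                protos[i] = [sum(w[k] * data[k][t] for k in range(n)) / total for t in range(dim)]
    return U, protos


def induced_proximity(U, k, l):
    """Proximity of patterns k and l induced by the partition matrix."""
    if k == l:
        return 1.0
    return sum(min(U[i][k], U[i][l]) for i in range(len(U)))


def hint_error(U, hints):
    """Squared disagreement between induced proximities and proximity hints,
    given as {(k, l): target in [0, 1]}."""
    return sum((induced_proximity(U, k, l) - p) ** 2 for (k, l), p in hints.items())
```

Scanning a few values of `sigma` and keeping the one with the lowest `hint_error` is one simple way to let hints guide kernel selection. On two well-separated blobs, a very small `sigma` (for example 0.1) leaves most memberships near 0.5 and gives a larger error than `sigma = 3.0`.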
152	Development of Partially Supervised Kernel-based Proximity Clustering Frameworks and Their Applications Graves, Daniel Unknown Date No description available. Partially supervised learning, Fuzzy clustering, Proximity hints, Kernel-based clustering, Active learning, Multi-proximity clustering, Time series analysis, Time series clustering, Structural musical segmentation, Graph clustering
153	Efficient Hierarchical Clustering Techniques For Pattern Classification Vijaya, P A 07 1900 (has links) (PDF) No description available. Hierarchical Clustering, Pattern Classification, Clustering Techniques (Computer Science), Clustering Algorithms, Cluster Analysis, Incremental Clustering, Hierarchical Clustering Algorithm, Protein Sequence Classification, Computer Science
154	Metaheurísticas para o problema de agrupamento de dados em grafo / Metaheuristics for the graph clustering problem Nascimento, Mariá Cristina Vasconcelos 26 February 2010 (has links) Graph clustering aims to identify highly connected groups, or clusters, of nodes in a graph. The problem is also known by other names, such as the graph partitioning problem and the community detection problem. There are many mathematical formulations of this problem, each with its own advantages and disadvantages. Most of them have the drawback of requiring the number of clusters in the final partition to be specified in advance. However, this information is not available in data to be clustered, i.e., unlabeled data. This is one of the reasons why modularity, a measure that is maximized to find graph partitions, has become popular over recent decades: besides not requiring the number of clusters in advance, the modularity formulation stands out for the quality of the partitions it produces.
In this Thesis, Greedy Randomized Adaptive Search Procedure (GRASP) metaheuristics are proposed for two existing graph clustering formulations: one maximizes partition modularity and the other maximizes intra-cluster similarity. These metaheuristics outperformed other heuristics from the literature, but their computational cost was high, particularly for the modularity maximization model. Over the years, studies have revealed that the modularity maximization formulation has some limitations. To provide a competitive alternative to it, this Thesis proposes new mathematical formulations for clustering weighted and unweighted graphs that aim to find partitions whose clusters are highly connected. Furthermore, the proposed formulations produce partitions without requiring the number of clusters to be defined in advance. Computational tests on hundreds of weighted graphs confirmed the efficiency of the proposed models. Comparing the partitions produced by all the formulations studied in this Thesis, one of the new formulations gave the best results, finding highly satisfactory partitions that were superior to those of the existing formulations, including modularity maximization. The results were highly correlated with the true classification of both the simulated and the real data, the latter being mostly of biological origin. Graph clustering, Clustering coefficient, Community detection, GRASP, Modularity
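Record 154 builds on modularity maximization. As a reference point, the sketch below computes the standard Newman-Girvan modularity of a partition of an undirected, unweighted graph, Q = sum over clusters c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside cluster c and d_c the total degree of the nodes in c. This is the objective a modularity-maximizing heuristic such as GRASP would evaluate; the GRASP construction and local search phases are not shown, and the function name `modularity` is illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, assumed free of duplicates and self-loops.
    community: dict mapping each node to its cluster label.
    """
    m = len(edges)
    degree = defaultdict(int)
    intra = defaultdict(int)  # number of edges with both endpoints in cluster c
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    deg_sum = defaultdict(int)  # total degree d_c of the nodes in cluster c
    for node, d in degree.items():
        deg_sum[community[node]] += d
    return sum(intra[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 5/14, while putting every node in one cluster gives Q = 0, which is why modularity can choose the number of clusters without it being fixed beforehand.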
155	Fuzzy Unequal Clustering In Wireless Sensor Networks Bagci, Hakan 01 January 2010 (has links) (PDF) To gather information more efficiently, wireless sensor networks are partitioned into clusters. Most of the proposed clustering algorithms do not consider the location of the base station, which causes the hot spots problem in multi-hop wireless sensor networks. Unequal clustering mechanisms, which take the base station location into account, solve this problem. In this thesis, we propose a fuzzy unequal clustering algorithm (EAUCF) that aims to prolong the lifetime of wireless sensor networks. EAUCF adjusts the cluster-head radius according to two parameters of each sensor node: its residual energy and its distance to the base station. This helps decrease the intra-cluster workload of sensor nodes that are closer to the base station or have a lower battery level. We use fuzzy logic to handle the uncertainties in cluster-head radius estimation. We compare our algorithm with popular algorithms from the literature, namely LEACH, CHEF and EEUC, in terms of the First Node Dies (FND), Half of the Nodes Alive (HNA) and energy-efficiency metrics. Our simulation results show that EAUCF performs better than the other algorithms in most cases with respect to FND, HNA and energy efficiency. Therefore, the proposed algorithm is a stable and energy-efficient clustering algorithm.
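Record 155 says EAUCF derives each node's cluster-head radius from its residual energy and its distance to the base station using fuzzy logic. The sketch below illustrates that idea only: the triangular membership functions, the rule table, and the output levels are hypothetical, and a zero-order Sugeno (weighted-average) inference is used for brevity rather than whatever inference scheme the thesis uses. `tri`, `competition_radius`, `SETS` and `RULES` are illustrative names.

```python
# Hypothetical triangular fuzzy sets on a normalized [0, 1] scale: (left, peak, right).
SETS = [(0.0, 0.0, 0.5),   # low energy / close to the base station
        (0.0, 0.5, 1.0),   # medium
        (0.5, 1.0, 1.0)]   # high energy / far from the base station

# Hypothetical rule table: RULES[energy_set][distance_set] is the radius as a
# fraction of the maximum radius. The radius grows with energy and distance,
# so low-energy nodes and nodes near the base station get smaller clusters.
RULES = [[0.2, 0.3, 0.5],
         [0.4, 0.6, 0.7],
         [0.6, 0.8, 1.0]]


def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a <= b <= c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def competition_radius(energy, dist, d_min, d_max, r_max):
    """Zero-order Sugeno inference: min for AND, weighted average of rule outputs.
    `energy` is the residual energy already normalized to [0, 1]."""
    e = min(max(energy, 0.0), 1.0)
    d = min(max((dist - d_min) / (d_max - d_min), 0.0), 1.0)
    num = den = 0.0
    for i, e_set in enumerate(SETS):
        for j, d_set in enumerate(SETS):
            w = min(tri(e, *e_set), tri(d, *d_set))  # firing strength of rule (i, j)
            num += w * RULES[i][j]
            den += w
    return r_max * num / den  # den > 0: every input in [0, 1] fires some set
```

With these assumed rules, a fully charged node at the maximum distance gets the full radius, while a depleted node next to the base station gets 20% of it, matching the abstract's goal of reducing the intra-cluster workload of such nodes.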
156	Agrupamento de faces em vídeos digitais / Face clustering in digital videos MOURA, Eduardo Santiago. 06 June 2018 (has links) Previous issue date: 2016. Human faces are among the most important entities frequently encountered in videos. Owing to the substantial volume of digital video currently produced and consumed (both personal videos and videos from the communication and entertainment industries), the automatic extraction of relevant information from such videos has become an active research topic. Part of the work in this area has focused on using face recognition and clustering to support the automatic annotation of faces in videos. However, current face clustering algorithms are still not robust to the variations in appearance that the same face undergoes under typical acquisition conditions. This thesis therefore addresses face clustering in digital videos and proposes a new approach that outperforms the state of the art (in terms of clustering quality and computational cost) on reference video databases from the literature. Based on a systematic literature review and experimental evaluations, the proposed approach consists of the following modules: preprocessing, face detection, tracking, feature extraction, clustering, temporal similarity analysis, and spatial reclustering.
The proposed face clustering approach achieved its planned objectives, obtaining better results (according to different metrics) than the methods evaluated on the YouTube Celebrities video dataset (KIM et al., 2008) and the SAIVT-Bnews video dataset (GHAEMMAGHAMI, DEAN and SRIDHARAN, 2013). Sciences, Computer Science, Video Face Clustering, Hierarchical Agglomerative Clustering, Clustering Evaluation
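The subject terms of record 156 list hierarchical agglomerative clustering. As a rough illustration of what a clustering module of that kind might do, the sketch below runs average-linkage agglomerative clustering over face feature vectors with cosine distance and stops merging once the closest pair of clusters exceeds a threshold. The feature vectors, the threshold, and the function names are hypothetical; the thesis's temporal similarity analysis and spatial reclustering steps are not reproduced.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage_hac(features, threshold):
    """Average-linkage agglomerative clustering that stops once the closest
    pair of clusters is farther apart than `threshold`. Returns index lists."""
    dist = [[cosine_distance(a, b) for b in features] for a in features]
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        best = None  # (average distance, index p, index q) with p < q
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[p] for j in clusters[q]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, p, q)
        avg, p, q = best
        if avg > threshold:
            break  # remaining clusters are treated as distinct identities
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters
```

A distance threshold, rather than a fixed number of clusters, fits this setting because the number of distinct people in a video is usually unknown. The naive pairwise search is cubic or worse and is meant only for small examples.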
157	Constrained clustering by constraint programming / Classification non supervisée sous contrainte utilisateurs par la programmation par contraintes Duong, Khanh-Chuong 10 December 2014 (has links) Cluster analysis is an important task in Data Mining, with hundreds of different approaches in the literature.
Over the last decade, cluster analysis has been extended to constrained clustering, also called semi-supervised clustering, in order to integrate prior knowledge about the data into clustering algorithms. In this dissertation, we explore Constraint Programming (CP) for solving constrained clustering tasks. The main principles of CP are: (1) users specify the problem declaratively as a Constraint Satisfaction Problem; (2) solvers search for solutions through constraint propagation and search. Relying on CP has two main advantages: declarativity, which makes it easy to add new constraints, and the ability to find an optimal solution that satisfies all the constraints (when one exists). We propose two CP-based models for constrained clustering. The models are general and flexible: they support instance-level must-link and cannot-link constraints as well as different cluster-level constraints, and they let users choose among different optimization criteria. Several aspects are studied to improve efficiency. Experiments on various classical datasets show that our models are competitive with other exact approaches. We show that our models can easily be embedded in a more general process, and we illustrate this by finding the Pareto front of a bi-criterion constrained clustering problem. Unsupervised classification, User constraints, Constrained clustering, Filtering algorithm, Constraint programming, Bicriterion clustering, 005.116
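Record 157 frames constrained clustering as an exact search that returns an optimal partition satisfying all must-link and cannot-link constraints. A real CP model relies on a solver's propagation and filtering; the sketch below substitutes a plain backtracking search with a bound, which is enough to show the same declarative constraints and exact optimum on tiny inputs. The criterion (minimize the maximum cluster diameter), the symmetry-breaking rule, and the name `constrained_min_diameter` are assumptions chosen for illustration.

```python
import math


def constrained_min_diameter(points, k, must_link=(), cannot_link=()):
    """Exact search for a partition into exactly k non-empty clusters that
    satisfies all constraints and minimizes the maximum cluster diameter.
    Returns (assignment list, diameter), or (None, inf) if infeasible."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    ml = [[] for _ in range(n)]
    cl = [[] for _ in range(n)]
    for a, b in must_link:
        ml[a].append(b)
        ml[b].append(a)
    for a, b in cannot_link:
        cl[a].append(b)
        cl[b].append(a)
    assign = [-1] * n
    best = {"diam": math.inf, "assign": None}

    def search(i, used, diam):
        if diam >= best["diam"]:
            return  # bound: this branch cannot beat the incumbent
        if i == n:
            if used == k:
                best["diam"], best["assign"] = diam, assign[:]
            return
        if k - used > n - i:
            return  # too few points left to open every remaining cluster
        # Symmetry breaking: point i may only open the next unused cluster.
        for c in range(min(used + 1, k)):
            if any(assign[j] != -1 and assign[j] != c for j in ml[i]):
                continue  # must-link partner already sits in another cluster
            if any(assign[j] == c for j in cl[i]):
                continue  # cannot-link partner already sits in cluster c
            new_diam = max([diam] + [dist[i][j] for j in range(i) if assign[j] == c])
            assign[i] = c
            search(i + 1, max(used, c + 1), new_diam)
            assign[i] = -1

    search(0, 0, 0.0)
    return best["assign"], best["diam"]
```

For the points (0, 0), (1, 0), (10, 0), (11, 0) and k = 2, the unconstrained optimum groups the two left and the two right points (diameter 1); adding a cannot-link constraint between the first two points raises the optimal diameter to 10. The search is exponential and only suitable for very small instances.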
158	Metaheurísticas para o problema de agrupamento de dados em grafo / Metaheuristics for the graph clustering problem Mariá Cristina Vasconcelos Nascimento 26 February 2010 (has links) Graph clustering, Clustering coefficient, Community detection, GRASP, Modularity
159	Finding Anomalous Energy Consumers Using Time Series Clustering in the Swedish Energy Market Tonneman, Lukas January 2023 (has links) Improving the energy efficiency of buildings is important for many reasons, and a large body of data details the hourly energy consumption of buildings. This work studies a large data set from the Swedish energy market and proposes a data analysis methodology that identifies abnormal consumption patterns using two steps of clustering. First, typical weekly energy usage profiles are extracted from each building by clustering week-long segments of the building’s lifetime consumption and taking the medoids of the clusters. Second, all the typical weekly energy usage profiles are clustered using agglomerative hierarchical clustering. Large clusters are assumed to contain normal consumption patterns, and small clusters are assumed to contain abnormal ones. Buildings with a large presence in small clusters are considered abnormal, and vice versa. The method uses Dynamic Time Warping (DTW) distance as the dissimilarity measure. Using a set of 160 buildings manually classified by domain experts, this thesis shows that the mean abnormality score is higher for abnormal buildings than for normal buildings (p ≈ 0.0036). Computer Sciences
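Record 159 combines three ingredients: DTW as the dissimilarity measure, cluster medoids as typical weekly profiles, and a building-level score based on membership in small clusters. The sketch below shows each piece under stated assumptions: DTW uses absolute difference as the local cost with no warping window, and the score is taken to be the share of a building's typical profiles that land in clusters no larger than a hypothetical `small_size`. The thesis's exact score definition is not given in the abstract, and all function names are illustrative.

```python
from collections import Counter, defaultdict


def dtw(a, b):
    """Classic dynamic time warping distance with |x - y| as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def medoid(series):
    """Member of a cluster with the smallest total DTW distance to the others."""
    return min(series, key=lambda s: sum(dtw(s, t) for t in series))


def abnormality_scores(owners, labels, small_size):
    """Hypothetical score: for each building, the fraction of its typical
    profiles whose cluster has at most `small_size` members.

    owners[i] is the building of profile i; labels[i] is its cluster label."""
    size = Counter(labels)
    total = defaultdict(int)
    in_small = defaultdict(int)
    for building, label in zip(owners, labels):
        total[building] += 1
        if size[label] <= small_size:
            in_small[building] += 1
    return {b: in_small[b] / total[b] for b in total}
```

Because DTW aligns series that are shifted or stretched in time, `dtw([0, 1, 2], [0, 1, 1, 2])` is 0, whereas a point-by-point comparison would need equal lengths. The quadratic cost per pair is why practical implementations often add a warping window.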
160	Analysis of Meso-scale Structures in Weighted Graphs Sardana, Divya January 2017 (has links) No description available. Computer Science, community structure, core-periphery structure, graph clustering, protein-protein interaction networks, semi-supervised clustering, overlapping clustering
