161 |
Metaheurísticas para o problema de agrupamento de dados em grafo / Metaheuristics for the graph clustering problem
Mariá Cristina Vasconcelos Nascimento. 26 February 2010
Graph clustering aims to identify clusters of nodes in a given graph, that is, subgraphs with high connectivity. The problem goes by other names, including graph partitioning and community detection. Many mathematical formulations model it, each with advantages and disadvantages. Most of them have the drawback of requiring the number of clusters to be defined in advance, yet this information is not available in data for clustering, i.e., in unlabeled data. This is one of the reasons why the measure known as modularity, which is maximized to find graph partitions, has become popular in recent decades. Besides not requiring the number of clusters to be defined in advance, this formulation stands out for the quality of the partitions it provides. In this thesis, Greedy Randomized Search Procedures metaheuristics are proposed for two existing graph clustering models: one for modularity maximization and the other for maximization of intra-cluster similarity. The results obtained by these metaheuristics were better than those of other heuristics found in the literature. However, their computational cost was high, especially for the modularity maximization metaheuristic.
Over the years, studies have revealed that the modularity maximization formulation has some limitations. To offer a strong alternative to it, this thesis proposes new mathematical formulations for clustering weighted and unweighted graphs that aim to find partitions whose clusters are highly connected. The proposed formulations also provide partitions without requiring the number of clusters to be defined in advance. Tests with hundreds of weighted graphs confirmed the efficiency of the proposed models. Comparing the partitions produced by all the models studied in this thesis, the best results came from one of the new formulations, which found highly satisfactory partitions, superior to those of the existing models, including modularity maximization. The results showed high correlation with the true classification of the simulated and real data, the latter mostly of biological origin.
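As a rough illustration of the modularity measure this abstract refers to, the sketch below scores a given partition of an undirected, unweighted graph using the standard Newman formulation, Q = Σ_c [L_c/m − (d_c/2m)²]. It is not the thesis's metaheuristic; the function name `modularity` and the edge-list input format are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of a partition (illustrative sketch, not the thesis code).

    edges: iterable of (u, v) pairs; each undirected edge listed once, no self-loops.
    community: dict mapping every node to its cluster label.
    """
    m = 0                           # total number of edges
    intra = defaultdict(int)        # edges with both endpoints in cluster c
    degree_sum = defaultdict(int)   # sum of node degrees in cluster c
    for u, v in edges:
        m += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    if m == 0:
        return 0.0
    # Q = sum over clusters of (fraction of intra-cluster edges
    #     minus the fraction expected under random rewiring).
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "a" for node in range(6)}
q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
```

Splitting the graph into its two triangles scores higher than lumping all nodes into one cluster, and no number of clusters has to be fixed beforehand: the score itself ranks candidate partitions.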
|
162 |
Finding Anomalous Energy Consumers Using Time Series Clustering in the Swedish Energy Market
Tonneman, Lukas. January 2023
Improving the energy efficiency of buildings is important for many reasons. There is a large body of data detailing the hourly energy consumption of buildings. This work studies a large data set from the Swedish energy market. This thesis proposes a data analysis methodology for identifying abnormal consumption patterns using two steps of clustering. First, typical weekly energy usage profiles are extracted from each building by clustering week-long segments of the building’s lifetime consumption and extracting the medoids of the clusters. Second, all the typical weekly energy usage profiles are clustered using agglomerative hierarchical clustering. Large clusters are assumed to contain normal consumption patterns, and small clusters are assumed to contain abnormal patterns. Buildings with a large presence in small clusters are said to be abnormal, and vice versa. The method uses Dynamic Time Warping distance as the dissimilarity measure. Using a set of 160 buildings, manually classified by domain experts, this thesis shows that the mean abnormality score is higher for abnormal buildings than for normal buildings, with p ≈ 0.0036.
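The two building blocks named in this abstract, Dynamic Time Warping distance and medoid extraction, can be sketched as follows. This is a minimal textbook version under stated assumptions (absolute difference as the local cost, no warping window); the function names `dtw_distance` and `medoid` are hypothetical and the thesis may configure both steps differently.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two numeric sequences.

    Assumption for this sketch: local cost is |a[i] - b[j]| and the
    warping path is unconstrained.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def medoid(profiles):
    """Return the profile with the smallest total DTW distance to all others."""
    return min(profiles, key=lambda p: sum(dtw_distance(p, q) for q in profiles))


# Toy "weekly profiles" (far shorter than real hourly data).
d_same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
d_offset = dtw_distance([0, 0], [1, 1])
typical = medoid([[0, 0], [1, 1], [5, 5]])
```

Unlike a cluster mean, a medoid is always one of the observed profiles, which keeps each "typical week" interpretable as an actual week of consumption.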
|
163 |
Analysis of Meso-scale Structures in Weighted Graphs
Sardana, Divya. January 2017
No description available.
|
164 |
Classification océanique non dirigée des provinces biogéochimiques de l'Atlantique Nord par télédétection
Courtemanche, Bruno. January 2013
Abstract: Mapping the bioregions of the oceans is of key importance for better understanding the dynamics of their ecosystems and for managing them soundly. Existing classifications use the same combinations of attributes: bathymetry, sea surface temperature, chlorophyll a concentration, and certain normalized luminances (443 nm, 520 nm, 550 nm). The use of the second-order variability of the chlorophyll a optical signal has revealed other global attributes, independent of chlorophyll a concentration, opening the door to new approaches to the unsupervised classification of the oceans into biogeochemical provinces. The objective of the study is to develop a dynamic, unsupervised method for classifying ocean provinces using a combination of satellite data, namely the optical signatures of the biochemical constituents present in the ocean and the physical properties of water masses, following a new approach that integrates information both complementary to and independent of chlorophyll a.
The goal is to classify the oceanic provinces of the North Atlantic for the period of availability of MODIS Aqua data (2002-2012) and to determine the spatial evolution of these provinces and their succession over time. Different classification techniques were applied to two data sets built for the study. The results show that the K-means and DBSCAN methods are not appropriate for dynamically classifying the bio-optical provinces of the North Atlantic. A new classification method, PRODENCAN, was developed to fill the gaps left by these techniques. The results obtained with this method confirm the potential to improve ocean classification by using the second-order variability of the chlorophyll a optical signal, but they did not lead to a dynamic classification pattern for the North Atlantic. They do, however, clarify how the problem could be solved by using a data set specifically chosen from a spatial and temporal point of view. The dynamic analysis confirmed the potential of combining the second-order variability of the chlorophyll a optical signal with sea surface temperature and chlorophyll a concentration to better define bio-optical regions with distinctive phenological signatures.
|
165 |
From spatio-temporal data to a weighted and lagged network between functional domains: Applications in climate and neuroscience
Fountalis, Ilias. 27 May 2016
Spatio-temporal data have become increasingly prevalent and important for both science and enterprises. Such data are typically embedded in a grid with a resolution larger than the true dimensionality of the underlying system. One major task is to identify the distinct semi-autonomous functional components of the spatio-temporal system and to infer their interconnections. In this thesis, we propose two methods that identify the functional components of a spatio-temporal system. Next, an edge inference process identifies the possibly lagged and weighted connections between the system’s components. The weight of an edge accounts for the magnitude of the interaction between two components; the lag associated with each edge accounts for the temporal ordering of these interactions.
The first method, geo-Cluster, infers the spatial components as “areas”: spatially contiguous, non-overlapping sets of grid cells satisfying a homogeneity constraint in terms of their average pair-wise cross-correlation. However, in real physical systems the underlying physical components might overlap. To this end we also propose δ-MAPS, a method that first identifies the epicenters of activity of the functional components of the system and then creates domains: spatially contiguous, possibly overlapping sets of grid cells that satisfy the same homogeneity constraint.
The proposed framework is applied in climate science and neuroscience. We show how these methods can be used to evaluate cutting-edge climate models and identify lagged relationships between different climate regions. In the context of neuroscience, the method successfully identifies well-known “resting state networks” as well as a few areas forming the backbone of the functional cortical network. Finally, we contrast the proposed methods with clustering and dimensionality reduction techniques (e.g., PCA/ICA) and show the limitations of those techniques.
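To make the notion of a lagged, weighted edge concrete, the sketch below scans a range of lags between two component time series and keeps the lag with the strongest Pearson correlation, using that correlation as the edge weight. The abstract does not specify the thesis's actual edge inference procedure, so this is only one plausible reading; `pearson`, `best_lag`, and the `max_lag` parameter are assumptions of this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_lag(x, y, max_lag):
    """Return (lag, r) maximizing |r| over lags in [-max_lag, max_lag].

    A positive lag means y follows x by `lag` time steps.
    Assumes len(x) == len(y) and max_lag < len(x) - 1.
    """
    best = (0, pearson(x, y))
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]   # compare x[t] with y[t + lag]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]  # compare x[t] with y[t - |lag|]
        r = pearson(xs, ys)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# y repeats x two steps later, so the inferred edge should carry lag 2.
x = [0, 1, 4, 2, 7, 3, 8, 5, 9, 6]
y = [0, 0] + x[:-2]
lag, weight = best_lag(x, y, max_lag=3)
```

In practice one would also test the chosen correlation for statistical significance before adding the edge, since scanning many lags inflates the chance of a spuriously large value.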
|
166 |
Alzheimer's disease heterogeneity assessment using high dimensional clustering techniques
Poulakis, Konstantinos. January 2016
This thesis sets out to investigate the heterogeneity of Alzheimer's disease (AD) in an unsupervised framework. A number of past studies have identified different subtypes of AD. The major objective of the thesis is to apply clustering methods specialized in coping with high dimensional data sets to a sample of AD patients. Additional objectives include evaluating these clustering methods and interpreting the clustered groups from a statistical and a medical point of view. The data consist of 271 MRI images of AD patients from the AddNeuroMed and ADNI cohorts. The raw MRI scans were preprocessed with the Freesurfer software, and 82 cortical and subcortical volumes were extracted for the analysis. The effect of different initialization strategies for a modified Gaussian Mixture Model (GMM) (Bouveyron et al., 2007) was studied. Additionally, the GMM and a Bayesian clustering method proposed by Nia (2009) were compared with respect to their performance on various distance-based evaluation criteria. The latter method resulted in the most compact and isolated clusters. The optimal number of clusters was evaluated with the Hopkins statistic; 6 clusters were selected, while 2 observations formed an outlier cluster. Different patterns of atrophy were discovered in the 6 clusters. One cluster presented atrophy in the medial temporal area only (n=37, ~13.65%). Another cluster presented atrophy in the lateral and medial temporal lobe and parts of the parietal lobe (n=39, ~14.4%). A third cluster presented atrophy in temporoparietal areas but also in the frontal lobe (n=74, ~27.3%). The remaining three clusters presented diffuse atrophy in nearly all the association cortices, with some variation in the patterns (n1=40, ~14.7%; n2=58, ~21.4%; n3=21, ~7.7%). The 6 subtypes also differed in their demographic, clinical and pathological features.
|
167 |
A Multicriteria Perspective on Reverse Auctions
De Smet, Yves. 20 December 2005
This thesis investigates the use of partial relations for multicriteria reverse auctions. First, a theoretical framework is introduced. Then, an extension of traditional multicriteria tools, referred to as the Butterfly model, is considered. Finally, the concept of Bidding Niches partitions is formalized and tested.
|
168 |
The changing geographical spread of corporate technological activity in Europe : the dynamics of corporate technological strategies and the hierarchy of innovative centres
Janne, Odile E. M. January 2000
No description available.
|
169 |
A measurement of the colour factors of quantum chromodynamics from four-jet events at LEP
Dorris, Simon James. January 1997
No description available.
|
170 |
Binary space partitioning for accelerated hidden surface removal and rendering of static environments
James, Adam. January 1999
No description available.
|