71 |
Módulos computacionais para seleção de variáveis e Análise de agrupamento para definição de zonas de manejo / Computational modules for variable selection and cluster analysis for definition of management zones
Gavioli, Alan 17 February 2017
Variable selection and cluster analysis are two basic activities in defining quality management zones (MZs). Several methods have been proposed for both, but because of their complexity they need to be made available through computer systems. This study evaluated 5 variable selection methods based on spatial correlation analysis, principal component analysis (PCA), and multivariate spatial analysis based on Moran’s index and PCA (MULTISPATI-PCA). A new variable selection algorithm, named MPCA-SC, based on the combined use of spatial correlation analysis and MULTISPATI-PCA, was proposed. The potential use of 20 clustering algorithms for generating MZs was evaluated: average linkage, bagged clustering, centroid linkage, clustering large applications, complete linkage, divisive analysis, fuzzy analysis clustering (fanny), fuzzy c-means, fuzzy c-shells, hard competitive learning, hybrid hierarchical clustering, k-means, McQuitty’s method (mcquitty), median linkage, neural gas, partitioning around medoids, single linkage, spherical k-means, unsupervised fuzzy competitive learning, and Ward’s method. Two computational modules that provide the variable selection and data clustering methods for defining MZs were also presented. The evaluations used data obtained between 2010 and 2015 in three commercial agricultural areas, cultivated with soybean and corn, in the state of Paraná, Brazil.
The experiments on the 5 variable selection algorithms showed that the new method, MPCA-SC, can improve the quality of MZs in several respects, although satisfactory results were also obtained with the other 4 algorithms. The experiments on the 20 clustering methods showed that 17 of them were suitable for delineating MZs, with fanny and mcquitty standing out. Finally, it was concluded that the two computational modules made it possible to obtain quality MZs. Furthermore, these modules constitute a more comprehensive tool than other free-to-use software such as FuzME, MZA, and SDUM in terms of the diversity of variable selection and data clustering algorithms offered.
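The abstract above relies on spatial correlation analysis and on Moran’s index. As a rough illustration only, the sketch below computes the global, univariate Moran’s I for one variable; it is not MPCA-SC or MULTISPATI-PCA, and the chain-shaped neighbourhood, the sample values and the function names are assumptions made for the example.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centred values."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(d * d for d in z))


def chain_weights(n):
    """Hypothetical layout: samples on a line, each one neighbouring the next."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
smooth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # neighbours look alike
alternating = [1.0, 6.0, 1.0, 6.0, 1.0, 6.0]  # neighbours differ sharply
print(morans_i(smooth, w), morans_i(alternating, w))
```

A positive value indicates that neighbouring samples tend to carry similar values, a negative value that they tend to differ; a variable-selection step could use such a statistic to screen out attributes with no spatial structure.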
|
72 |
Modélisation et classification dynamique de données temporelles non stationnaires / Dynamic classification and modeling of non-stationary temporal data
El Assaad, Hani 11 December 2014
This thesis addresses the unsupervised classification (clustering) of data when the characteristics of the classes may change over time, a setting also referred to as dynamic classification of non-stationary temporal data. The application is pattern-recognition-based diagnosis of complex dynamic systems whose operating classes can evolve over time as a result of wear, progressive drift of settings, or varying operating contexts. A dynamic probabilistic model, built on both mixture distributions and state-space dynamic models, is proposed. Given the complex structure of this model, a variational variant of the EM algorithm is proposed to learn its parameters. With a view to fast processing of data streams, a sequential version of this algorithm was also developed, along with a strategy for dynamically choosing the number of classes. A series of experiments on simulated data and on real data acquired from the railway switch system evaluates the potential of the proposed approaches.
Diagnosis and monitoring for predictive maintenance of railway components are key concerns for both operators and manufacturers, who seek to anticipate upcoming maintenance actions, reduce maintenance costs and increase the availability of the rail network. Keeping components at a satisfactory level of operation requires a reliable diagnostic strategy. This thesis focuses on a main component of railway infrastructure, the railway switch, an important safety device whose failure could heavily impact the availability of the transportation system. Diagnosis of this system is therefore essential and can be done by exploiting measurements acquired successively while the state of the system evolves over time. These measurements are power consumption curves acquired during several switch operations, and the shape of these curves is indicative of the operating state of the system. The aim is to track the temporal evolution of the component state under different operating contexts in order to detect and diagnose problems that may lead to failure. The thesis tackles temporal data clustering within the broader context of developing innovative tools and decision-aid methods. It proposes a new dynamic probabilistic approach, based on both Gaussian mixture models and state-space models, within a temporal data clustering framework. The main challenge is estimating the model parameters, because of the complex structure of the model; a variational approach was developed to meet it. The results obtained on both synthetic and real data highlight the advantage of the proposed algorithms over other state-of-the-art methods in terms of clustering and estimation accuracy.
|
73 |
Complex network component unfolding using a particle competition technique / Desdobramento de componentes de redes complexas utilizando uma técnica de competição de partículas
Paulo Roberto Urio 12 June 2017
This work applies complex network theory to the problem of semi-supervised and unsupervised learning in networks, specifically networks that represent multivariate datasets. Complex networks allow the use of nonlinear dynamical systems whose behavior depends on the connectivity patterns of the network. Inspired by behavior observed in nature, such as competition for limited resources, dynamical system models can be employed to uncover the organizational structure of a network. In this dissertation, we develop a technique for classifying data represented as interaction networks. As part of the technique, we model a dynamical system inspired by the biological dynamics of resource competition. So far, similar methods have focused on vertices as the resource of competition. We introduce edges as the resource of competition. In doing so, the connectivity pattern of a network can be used not only in the dynamical system simulation but in the learning task as well.
|
74 |
Nouvelles méthodes pour l’apprentissage non-supervisé en grandes dimensions. / New methods for large-scale unsupervised learning.
Tiomoko ali, Hafiz 24 September 2018
Spurred by recent advances in the theoretical analysis of the performance of machine learning algorithms, this thesis tackles the performance analysis and improvement of clustering for high-dimensional data and graphs.
Specifically, in the first and larger part of the thesis, advanced tools from random matrix theory are used to analyze the performance of spectral methods on dense, realistic graph models and on high-dimensional data, notably through the study of the eigenvalues and eigenvectors of the affinity (similarity) matrices of those data. New improved methods are proposed on the basis of this theoretical analysis and are shown, through numerous simulations, to outperform state-of-the-art approaches. In the second part, a new algorithm is proposed for detecting heterogeneous communities across the layers of a multi-layer graph, that is, a graph with several types of interaction. A variational Bayes approach is used to approximate the posterior distribution of the latent variables of the model. All the proposed methods are applied to synthetic benchmarks as well as real-world datasets and are shown to outperform standard clustering approaches in those specific contexts.
|
75 |
Exploration of Data Clustering Within a Novel Multi-Scale Topology Optimization Framework
Lawson, Kevin Robert 10 August 2022
No description available.
|
76 |
Partial EM Procedure for Big-Data Linear Mixed Effects Model, and Generalized PPE for High-Dimensional Data in Julia
Cho, Jang Ik 31 August 2018
No description available.
|
77 |
DifFUZZY: a novel clustering algorithm for systems biology
Cominetti Allende, Ornella Cecilia January 2012
Current studies of the highly complex pathobiology and molecular signatures of human disease require the analysis of large sets of high-throughput data, from clinical to genetic expression experiments, containing a wide range of information types. A number of computational techniques are used to analyse such high-dimensional bioinformatics data. In this thesis we focus on the development of a novel soft clustering technique, DifFUZZY, a fuzzy clustering algorithm applicable to a larger class of problems than other soft clustering approaches. This method is better at handling datasets that contain clusters that are curved, elongated or of different dispersion. We show that DifFUZZY outperforms a number of frequently used clustering algorithms on several synthetic and real datasets. Furthermore, a quality measure based on the diffusion distance developed for DifFUZZY is presented, which is employed to automate the choice of its main parameter. We later apply DifFUZZY and other techniques to data from a clinical study of children from The Gambia with different types of severe malaria. The first step was to identify the most informative features in the dataset, which allowed us to separate the different groups of patients. This let us reproduce the World Health Organisation classification for severe malaria syndromes and obtain a reduced dataset for further analysis. In order to validate these features as relevant for malaria across the continent and not only in The Gambia, we used a larger dataset for children from different sites in Sub-Saharan Africa. With the use of a novel network visualisation algorithm, we identified pathobiological clusters from which we made and subsequently verified clinical hypotheses. We finish by presenting conclusions and future directions, including image segmentation and clustering time-series data.
We also suggest how we could bridge data modelling with bioinformatics by embedding microarray data into cell models. Towards this end we take as a case study a multiscale model of the intestinal crypt using a cell-vertex model.
|
78 |
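Novos métodos determinísticos para gerar centros iniciais dos grupos no algoritmo fuzzy C-Means e variantes / New deterministic methods for generating the initial cluster centers in the fuzzy C-Means algorithm and its variants
Arnaldo, Heloína Alves 24 February 2014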
Sponsor: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Data clustering is a technique applied in various fields such as data mining, image processing and pattern recognition. Clustering algorithms split a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means algorithm (FCM) is one of the most used and discussed fuzzy clustering algorithms in the literature. The performance of FCM is strongly affected by the selection of the initial cluster centers, so choosing a good set of initial centers is very important. In FCM, however, the initial centers are chosen randomly, which makes it difficult to find a good set. This work proposes three new methods to obtain the initial cluster centers deterministically in the FCM algorithm; the methods can also be used in variants of FCM, and in this work they were applied to the ckMeans variant. The proposed methods are intended to produce a set of initial centers that is close to the real cluster centers. With these new initialization approaches, the aim is to reduce the number of iterations these algorithms need to converge, and their processing time, without affecting the quality of the clustering, or even improving it in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the FCM and ckMeans algorithms, modified with the proposed initialization methods, when applied to various data sets.
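To make the role of initialization concrete, here is a minimal sketch of standard fuzzy C-Means with a deterministic starting point. The abstract does not describe the three proposed methods, so `quantile_init` is a hypothetical stand-in (evenly spaced picks from the sorted data), and the sample points, fuzzifier `m=2.0` and tolerance are assumptions chosen for illustration.

```python
import math


def quantile_init(points, c):
    """Hypothetical deterministic init: evenly spaced picks from the sorted points."""
    ordered = sorted(points)  # lexicographic order on coordinates
    n = len(ordered)
    return [list(ordered[(2 * k + 1) * n // (2 * c)]) for k in range(c)]


def fcm(points, c, m=2.0, tol=1e-6, max_iter=100):
    """Standard FCM updates; returns centers, last memberships, iterations used."""
    centers = quantile_init(points, c)
    dims = len(points[0])
    for iteration in range(1, max_iter + 1):
        # Membership of every point in every cluster, from relative distances.
        u = []
        for p in points:
            d = [max(math.dist(p, v), 1e-12) for v in centers]
            u.append([1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                      for i in range(c)])
        # New centers are means weighted by membership**m.
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wj * p[a] for wj, p in zip(w, points)) / total
                                for a in range(dims)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u, iteration


points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centers, memberships, iterations = fcm(points, c=2)
print(centers, iterations)
```

Because nothing in the sketch is random, repeated runs return identical centers and iteration counts, which is the property that makes deterministic initializations straightforward to compare.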
|
79 |
Localisation dans les bâtiments des personnes handicapées et classification automatique de données par fourmis artificielles / Indoor localization of disabled people and ant based data clustering
Amadou Kountché, Djibrilla 22 November 2013
The concept of « smart » is increasingly pervading our daily life. The typical example is the smartphone, which over the years has become an essential device. Soon the city, the car and the home will be « smart » too. This intelligence shows itself as a capacity for interaction and decision-making between the environment and the user, which requires information about state changes on both sides. Sensor networks make it possible to collect these data, pre-process them and transmit them to applications. Some characteristics of these networks bring them close to swarm intelligence, in the sense that entities with limited capabilities coordinate automatically, without human intervention, in a decentralised and distributed manner to accomplish complex tasks. Such bio-inspired methods have been used to solve many problems, mostly in optimization, which encouraged us to study their use for problems in Ambient Assisted Living (AAL) and for data clustering. AAL is a sub-field of context-aware services whose goal is to ease the everyday life of elderly and disabled people. To do so, it determines the context and, on that basis, offers various services. Two elements of the context interested us: disability and position. Although positioning achieves very satisfactory precision outdoors, it faces several difficulties indoors, related to electromagnetic wave propagation in harsh environments, the cost of systems, interoperability, and so on.
Our work addressed the positioning of disabled people inside buildings, using a sensor network to measure characteristics of the electromagnetic wave (signal strength, time, angle) and estimate the position by geometric methods (triangulation, lateration), fingerprinting methods (k-nearest neighbours) and Bayesian filters (Kalman filter). The intended application is to offer AAL services such as navigation. We extended the notion of a sensor network to include any device in the environment capable of emitting and receiving an electromagnetic wave. We also studied the use of the API algorithm, based on Pachycondyla apicalis ants, for data clustering, and for indoor localization by casting the latter as a data clustering problem. Finally, we proposed a middleware-based architecture for indoor localization.
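Of the positioning methods listed in this abstract, k-nearest-neighbour fingerprinting is the simplest to sketch. The example below is an illustrative assumption, not the thesis’s system: the radio map, the three anchors, the signal-strength values in dBm and the function name `knn_locate` are all made up for the example.

```python
import math


def knn_locate(radio_map, observed, k=3):
    """Average the positions of the k fingerprints closest to `observed` in signal space."""
    ranked = sorted(radio_map, key=lambda entry: math.dist(entry[1], observed))
    nearest = ranked[:k]
    x = sum(position[0] for position, _ in nearest) / len(nearest)
    y = sum(position[1] for position, _ in nearest) / len(nearest)
    return (x, y)


# Hypothetical survey: (position in metres, signal strength per anchor in dBm).
radio_map = [
    ((0.0, 0.0), [-40.0, -70.0, -70.0]),
    ((4.0, 0.0), [-70.0, -40.0, -70.0]),
    ((0.0, 4.0), [-70.0, -70.0, -40.0]),
    ((2.0, 2.0), [-55.0, -55.0, -55.0]),
]
observed = [-42.0, -69.0, -71.0]
print(knn_locate(radio_map, observed, k=1))
print(knn_locate(radio_map, observed, k=2))
```

With `k=1` the estimate snaps to the closest surveyed point; a larger `k` averages several surveyed points, which trades resolution for robustness to noisy readings.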
|
80 |
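Uma Aplicação do Algoritmo QT Clustering para Marcação Colaborativa de Pontos Perigosos em Vias Públicas / An application of the QT Clustering algorithm to the collaborative marking of dangerous points on public roads
Lima, Adelson Luiz de 07 December 2012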
Sponsor: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This work proposes a collaborative system for marking dangerous points on transport routes and generating alerts for drivers. It consists of a proximity warning system for danger points, fed by the drivers themselves through a mobile device equipped with GPS. The system consolidates data provided by several different drivers and generates a set of common points to be used in the warning system. Although the application is designed to protect drivers, the data it generates can also serve as input for the responsible agencies to improve signage and the repair of public roads.
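The title names QT (quality threshold) clustering as the consolidation step. The sketch below shows the general QT procedure on a handful of reports; it is an illustration under stated assumptions rather than the thesis’s implementation: coordinates are planar metres instead of GPS latitude and longitude, the 20 m diameter is arbitrary, and treating the centroid of any cluster with at least two reports as a consolidated danger point is a hypothetical rule.

```python
import math


def diameter_if_added(points, members, current_diameter, candidate):
    """Cluster diameter that would result from adding `candidate` to `members`."""
    farthest = max(math.dist(points[candidate], points[m]) for m in members)
    return max(current_diameter, farthest)


def qt_cluster(points, max_diameter):
    """Repeatedly keep the largest candidate cluster whose diameter stays within the limit."""
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            members, diameter = [seed], 0.0
            pool = [i for i in remaining if i != seed]
            while pool:
                # Greedily add the point that keeps the diameter smallest.
                nxt = min(pool, key=lambda i: diameter_if_added(points, members, diameter, i))
                new_diameter = diameter_if_added(points, members, diameter, nxt)
                if new_diameter > max_diameter:
                    break
                members.append(nxt)
                diameter = new_diameter
                pool.remove(nxt)
            if len(members) > len(best):
                best = members
        clusters.append([points[i] for i in best])
        remaining = [i for i in remaining if i not in best]
    return clusters


def centroid(cluster):
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


# Hypothetical driver reports, in metres on a local planar grid.
reports = [(0.0, 0.0), (5.0, 3.0), (3.0, 6.0), (100.0, 100.0), (104.0, 98.0), (500.0, 500.0)]
clusters = qt_cluster(reports, max_diameter=20.0)
hazards = [centroid(c) for c in clusters if len(c) >= 2]
print(hazards)
```

Unlike k-means or fuzzy C-Means, this procedure needs no preset number of clusters, only a maximum diameter, which suits reports that arrive without any prior count of danger points; its cost grows quickly with the number of points, so a real deployment would likely need spatial indexing.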
|