1 |
Získávání znalostí na webu - shlukování / Web Mining - Clustering
Rychnovský, Martin, January 2008
This work presents the topic of data mining on the web, with a focus on clustering. The aim of the project was to study the field of clustering and to implement clustering with the k-means algorithm. The algorithm was then tested on a dataset of text documents and on data extracted from the web. The clustering method was implemented using Java technologies.
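The k-means procedure named in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity, not in the Java of the thesis, and it is not the thesis implementation: the function name `kmeans`, the random choice of initial centers, the squared Euclidean distance, and the handling of empty clusters are all assumptions made for the example.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: the initial centers are k input points chosen at random.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest center (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centers are stable
        assignment = new_assignment
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment
```

For text documents, each point would first have to be turned into a numeric vector (for example, term weights); that preprocessing step is outside this sketch.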
|
2 |
Uma nova forma de calcular os centros dos Clusters em algoritmos de agrupamento tipo fuzzy c-means / A new way of calculating the cluster centers in fuzzy c-means type clustering algorithms
Vargas, Rogerio Rodrigues de, 30 March 2012
Previous issue date: 2012-03-30 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Clustering data is a very important task in data mining, image processing, and pattern recognition problems. One of the most popular clustering algorithms is Fuzzy C-Means (FCM). This thesis proposes a new way of calculating the cluster centers in the FCM algorithm, called ckMeans, which can also be applied to some variants of FCM; here it is applied in particular to variants that use other distances. The goal of this change is to reduce the number of iterations and the processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. An algorithm based on ckMeans was also developed to handle interval data with interval membership degrees. This algorithm allows the data to be represented without converting interval data into point data, as happens in other extensions of FCM that deal with interval data. To validate the proposed methods, ckMeans clustering was compared with the K-Means and FCM algorithms (K-Means because the center calculation proposed in this work is similar to that of K-Means), considering three different distances. Several well-known databases were used. The results of interval ckMeans were compared with those of other interval clustering algorithms on an interval database containing the minimum and maximum monthly temperatures of a given year for 37 cities distributed across continents.
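For context, the baseline that the thesis modifies can be sketched as follows. This is the textbook FCM membership and center update, not the ckMeans center calculation, whose details the abstract does not give. The function names, the Euclidean distance, and the default fuzzifier `m = 2.0` are assumptions chosen for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Textbook FCM membership degrees: u[j][i] is the degree of point j in cluster i."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point that coincides with one or more centers is shared among them.
            zeros = [d == 0.0 for d in dists]
            row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
        else:
            row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
        u.append(row)
    return u


def fcm_centers(points, u, m=2.0):
    """Textbook FCM center update: mean of the points weighted by u ** m."""
    centers = []
    for i in range(len(u[0])):
        weights = [row[i] ** m for row in u]
        total = sum(weights)  # assumed non-zero: some point has membership in cluster i
        centers.append(tuple(
            sum(w * x for w, x in zip(weights, coords)) / total
            for coords in zip(*points)
        ))
    return centers
```

Alternating these two functions until the centers stop moving gives the usual FCM loop. Every point contributes to every center through its weight, whereas k-means averages only the points assigned to a cluster.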
|