Six clustering methods are evaluated on data containing two classes of objects: sugar-beet plants and weeds. Each object is described by 19 features covering shape and color. Information about the distance between sugar-beet plants is also available and is used for labeling clusters. The methods evaluated are k-means, k-medoids, hierarchical clustering, competitive learning, self-organizing maps and fuzzy c-means. Applying these methods to the plant data yields clusters, which are then labeled with one of three proposed methods: the expert, database and context methods. The expert method uses a human to provide labeled initial cluster centers. The database method uses a database in place of the expert to provide the initial cluster centers. The context method uses information about the environment, namely the distance between sugar-beet plants, to label the clusters. The lowest error achieved by each algorithm is: k-means (3.3%), k-medoids (3.8%), hierarchical clustering (5.3%), competitive learning (6.8%), self-organizing maps (4.9%) and fuzzy c-means (7.9%). Three datasets were used. The lowest clustering error on dataset0 is 3.3%, compared with 3% for supervised learning methods. On dataset1 the clustering error is 18.7% and on dataset2 it is 5.8%, compared with supervised errors of 11% and 5.1% respectively. The high error rate on dataset1 arises because its samples are not well separated into distinct clusters. The features in dataset1 were extracted from lower-resolution images than those in the other datasets; the datasets also differ in the growth stage of the sugar-beet plants. The lowest errors achieved by the three cluster-labeling methods are: expert method (6.8%), database method (3.7%) and context method (6.8%). These labeling results were obtained on the clusters produced by competitive learning, for which the real error is 6.8%.
Unsupervised learning methods for clustering can be used effectively for plant identification. Because the samples are not labeled in advance, an automatic labeling technique must be used if plants are to be identified. The three proposed techniques can be used for this automatic labeling of plants.
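As an illustration of how clustering and labeling fit together, the following is a minimal sketch of k-means started from labeled seed centers, in the spirit of the expert method described above. It is not the thesis implementation: the function names `kmeans_seeded` and `error_rate`, the two-feature toy samples, and the seed centers are all hypothetical and chosen only to make the example runnable. The thesis data has 19 features per object.

```python
import math


def kmeans_seeded(samples, seeds, iterations=20):
    """Lloyd-style k-means started from labeled seed centers.

    samples: list of feature vectors (lists of floats)
    seeds:   dict mapping a label to its initial center (assumed to
             come from an expert or a database; hypothetical here)
    Returns (centers, assignments); each assignment is a seed label.
    """
    labels = list(seeds)
    centers = {lab: list(seeds[lab]) for lab in labels}
    assignments = []
    for _ in range(iterations):
        # Assignment step: each sample takes the label of its nearest center.
        assignments = [
            min(labels, key=lambda lab: math.dist(x, centers[lab]))
            for x in samples
        ]
        # Update step: move each center to the mean of its members.
        new_centers = {}
        for lab in labels:
            members = [x for x, a in zip(samples, assignments) if a == lab]
            if members:
                new_centers[lab] = [sum(col) / len(members) for col in zip(*members)]
            else:
                new_centers[lab] = centers[lab]  # keep the center of an empty cluster
        if new_centers == centers:
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    return centers, assignments


def error_rate(predicted, truth):
    """Fraction of samples whose cluster label differs from the true label."""
    wrong = sum(p != t for p, t in zip(predicted, truth))
    return wrong / len(truth)


# Toy two-feature data, made up for illustration (not from the thesis).
samples = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]]
truth = ["beet", "beet", "beet", "weed", "weed", "weed"]
seeds = {"beet": [1.0, 1.0], "weed": [4.0, 4.0]}

centers, predicted = kmeans_seeded(samples, seeds)
print(predicted)
print(error_rate(predicted, truth))
```

Because each cluster inherits the label of the seed it grew from, no separate labeling pass is needed in this sketch. The error rate is then simply the share of samples whose cluster label disagrees with the ground truth, which is how percentages such as those quoted above can be read.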
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hh-247 |
Date | January 2006 |
Creators | Jelacic, Mersad |
Publisher | Högskolan i Halmstad, Sektionen för Informationsvetenskap, Data- och Elektroteknik (IDE) |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |