
Modified Silhouette Score with Generalized Mean and Trimmed Mean

Zhang, Yiran, January 2023
The silhouette score is a widely used technique for evaluating the quality of a clustering result. One known issue with the silhouette score is its sensitivity to outliers, which can lead to misleading interpretations. This sensitivity arises because the silhouette score uses the arithmetic mean to calculate the average intra- and inter-cluster distances. To address this issue, three modified silhouette scores are presented: GenSil, TrimSil, and extended TrimSil, which replace the arithmetic mean with the generalized mean, the trimmed mean, and a modified trimmed mean, respectively. Experiments on both simulated and real-world datasets show that GenSil is the most effective method, significantly reducing the impact of outliers and achieving high silhouette scores with negative parameter values. TrimSil also improves silhouette scores but performs worse than GenSil, while the extended TrimSil outperforms TrimSil but is still less effective than GenSil. To further aid in selecting the optimal number of clusters with these modified silhouette scores, a more straightforward visualization technique, the silhouette-parameter plot, is also introduced. / Thesis / Master of Science (MSc)
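
The idea of swapping the averaging step can be illustrated with a minimal Python sketch. The abstract does not give the thesis's exact formulas, so everything here is an assumption: the generalized mean is taken to be the standard power mean, the trimmed mean drops only the largest distances, and the names `generalized_mean`, `trimmed_mean`, and `silhouette` are hypothetical rather than the author's implementation. The extended TrimSil variant is not described in enough detail to sketch.

```python
import numpy as np

def generalized_mean(x, p):
    """Standard power mean of strictly positive values (assumed form).
    p = 1 is the arithmetic mean, p = -1 the harmonic mean, p = 0 the geometric mean."""
    x = np.asarray(x, dtype=float)
    if p == 0:
        return float(np.exp(np.mean(np.log(x))))
    return float(np.mean(x ** p) ** (1.0 / p))

def trimmed_mean(x, frac):
    """Mean after dropping the largest `frac` share of values.
    One-sided trimming is an assumption made for this sketch."""
    x = np.sort(np.asarray(x, dtype=float))
    keep = max(1, int(np.ceil(len(x) * (1.0 - frac))))
    return float(np.mean(x[:keep]))

def silhouette(X, labels, average=np.mean):
    """Mean silhouette score with a pluggable averaging function.
    Assumes at least two clusters and no duplicate points."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from its own cluster
        if not same.any():
            continue  # singleton cluster: score stays 0 by convention
        a = average(dist[i, same])  # intra-cluster distance
        b = min(average(dist[i, labels == c])  # nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(np.mean(scores))

# Two compact clusters; the last point is an outlier assigned to the second one.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                   [10, 10], [10, 11], [11, 10], [11, 11], [10, 30]])
labels_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

print("arithmetic mean:", silhouette(X_demo, labels_demo))
print("power mean, p=-1:", silhouette(X_demo, labels_demo,
                                      lambda d: generalized_mean(d, -1)))
print("trimmed mean, 25%:", silhouette(X_demo, labels_demo,
                                       lambda d: trimmed_mean(d, 0.25)))
```

Passing the averaging function as a parameter keeps the silhouette logic unchanged, so different parameter values (the power `p` or the trimming fraction) can be compared on the same clustering.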
