
A sliding window BIRCH algorithm with performance evaluations

A growing number of applications across various fields generate transactional or other time-stamped data, all of which are time series data. Time series data mining is a popular topic in the data mining field, and it poses challenges for improving the accuracy and efficiency of algorithms that handle such data. Time series data are dynamic, large-scale, and highly complex, which makes it difficult to discover patterns in them with common methods designed for static data. BIRCH, a hierarchical clustering method, was proposed to address the problems of large datasets. It minimizes I/O and time costs. A clustering feature (CF) tree is built as the algorithm runs, and clusters are produced after the four phases of the full BIRCH procedure. A drawback of BIRCH is that it is not very scalable. This thesis aims to improve the accuracy and efficiency of the BIRCH algorithm. A sliding window BIRCH (SW BIRCH) algorithm is implemented on the basis of BIRCH. At the end of the thesis, the accuracy and efficiency of SW BIRCH are evaluated. A performance comparison among SW BIRCH, BIRCH, and K-means is also presented, using the Silhouette Coefficient and the Calinski-Harabasz index. The preliminary results indicate that SW BIRCH may achieve better performance than BIRCH in some cases.
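The abstract does not describe how the sliding window is combined with the CF tree, so the following is only a minimal sketch of one plausible building block, not the thesis's implementation. It relies on the standard BIRCH clustering feature, the triple (N, LS, SS) of point count, linear sum, and sum of squared norms, which is additive and can therefore also be decremented when a point leaves the window. The class names `ClusteringFeature` and `SlidingWindowCF`, the single-cluster setup, and the fixed `window_size` parameter are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
import math


@dataclass
class ClusteringFeature:
    """BIRCH clustering feature (N, LS, SS) for d-dimensional points."""
    dim: int
    n: int = 0                               # N: number of points
    ls: list = field(default_factory=list)   # LS: per-dimension linear sum
    ss: float = 0.0                          # SS: sum of squared norms

    def __post_init__(self):
        if not self.ls:
            self.ls = [0.0] * self.dim

    def add(self, point):
        """Absorb one point (CF additivity)."""
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def remove(self, point):
        """Forget one previously added point (inverse of add)."""
        self.n -= 1
        for i, x in enumerate(point):
            self.ls[i] -= x
        self.ss -= sum(x * x for x in point)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root mean squared distance of the points from the centroid."""
        c = self.centroid()
        variance = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # clamp float round-off


class SlidingWindowCF:
    """Hypothetical helper: one CF summarizing only the most recent points."""

    def __init__(self, dim, window_size):
        self.cf = ClusteringFeature(dim)
        self.window = deque()
        self.window_size = window_size

    def push(self, point):
        point = tuple(point)
        self.window.append(point)
        self.cf.add(point)
        # Expire the oldest point once the window is over capacity.
        if len(self.window) > self.window_size:
            self.cf.remove(self.window.popleft())


if __name__ == "__main__":
    sw = SlidingWindowCF(dim=2, window_size=3)
    for p in [(0, 0), (2, 0), (4, 0), (6, 0)]:
        sw.push(p)
    print(sw.cf.n, sw.cf.centroid(), sw.cf.radius())
```

Because the summary is updated incrementally, each new observation costs time proportional to the dimension rather than to the window length. A full sliding window BIRCH would have to apply the same idea across the many leaf entries of a CF tree, which this sketch does not attempt.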

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:miun-32397
Date: January 2017
Creators: Li, Chuhe
Publisher: Mittuniversitetet, Avdelningen för informationssystem och -teknologi
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
