1 |
Financial Time Series Analysis using Pattern Recognition Methods
Zeng, Zhanggui. January 2008
Doctor of Philosophy / This thesis presents research on financial time series analysis using pattern recognition methods. The first part focuses on univariate time series analysis using several pattern recognition methods. First, the probabilities of basic patterns are used to represent the features of a section of a time series. Because it is built from statistical probabilities, this feature suppresses noise in the time series. Experiments show that the feature works well for time series with repeating patterns. Second, a multiscale Gaussian gravity, a measure of the relationship between patterns that also describes the direction of that relationship, is introduced to pattern clustering. By searching for the Gaussian-gravity-guided nearest neighbour of each pattern, this clustering method can easily determine the boundaries of the clusters. Third, a method is presented for transforming unsupervised pattern classification into multiscale supervised pattern classification by means of multiscale supervisory time series or multiscale filtered time series. The second part focuses on multivariate time series analysis using pattern recognition. A systematic method is proposed to find the independent variables of a group of share prices by time series clustering, principal component analysis, independent component analysis, and object recognition. Time series clustering and principal component analysis reduce the number of dependent variables and so simplify the multivariate analysis. Independent component analysis aims to find the ideal independent variables of the group of shares. Object recognition is expected to recognize those independent variables which are similar to the independent components. This method offers a new way of understanding the stock market and of modelling a large time series database.
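As a rough illustration of the first idea in this abstract (representing a section of a series by the probabilities of its basic patterns), the sketch below assumes that a "basic pattern" is the up/down sequence of k consecutive price moves. That encoding, the function name pattern_probabilities and the window length k are assumptions made for illustration only; the abstract does not say how the thesis defines its basic patterns.

```python
from collections import Counter
from itertools import product


def pattern_probabilities(series, k=2):
    """Relative frequency of each up/down pattern of k consecutive moves.

    Assumption: a "basic pattern" is a string of 'U' (rise) and 'D'
    (fall or no change) symbols; the thesis may define patterns differently.
    """
    # Encode each step between neighbouring values as U or D.
    moves = ["U" if b > a else "D" for a, b in zip(series, series[1:])]
    # Slide a window of length k over the encoded moves.
    windows = ["".join(moves[i:i + k]) for i in range(len(moves) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    # Report every possible pattern so feature vectors have a fixed length.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("UD", repeat=k)
    }


prices = [10, 11, 12, 11, 12, 13, 12, 13]
features = pattern_probabilities(prices, k=2)
print(features)
```

Because the feature is a vector of frequencies rather than raw prices, a single noisy tick changes only a few windows, which is one plausible reading of the noise-suppression claim.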
|
2 |
Association Pattern Analysis for Pattern Pruning, Clustering and Summarization
Li, Chung Lam. 12 September 2008
Automatic pattern mining from databases and the analysis of the discovered patterns for useful information are important and in great demand in science, engineering and business. Today, effective pattern mining methods, such as association rule mining and pattern discovery, have been developed and are widely used in various challenging industrial and business applications. These methods attempt to uncover the valuable information trapped in large collections of raw data. The patterns revealed provide significant and useful information for decision makers. Paradoxically, pattern mining itself can produce such huge amounts of data that it poses a new knowledge management problem: how to handle the thousands or more patterns discovered and held in a data set. Unlike raw data, patterns often overlap, entangle and interrelate with one another in the databases. The relationships among them are usually complex, and the notion of distance between them is difficult to qualify and quantify. Such phenomena pose great challenges to the existing data mining discipline. In this thesis, the analysis of patterns after their discovery by existing pattern mining methods is referred to as pattern post-analysis, since the patterns to be analyzed must first be discovered.
Because of the overwhelming volume of patterns discovered in pattern mining, it is virtually impossible for a human user to analyze them manually. Thus, the valuable information trapped in the data is shifted to a large collection of patterns. Hence, an approach such as pattern post-analysis, which automatically analyzes the discovered patterns and presents the results in a user-friendly manner, is badly needed. This thesis addresses 1) the important factors contributing to the interrelationships among patterns and hence more accurate measurements of distances between them; 2) the objective pruning of redundant patterns from the discovered patterns; 3) the objective clustering of the patterns into coherent pattern clusters for better organization; 4) the automatic summarization of each pattern cluster for human interpretation; and 5) the application of pattern post-analysis to large database analysis and data mining.
In this thesis, the conceptualization, theoretical formulation, algorithm design and system development of pattern post-analysis of categorical or discrete-valued data are presented. It starts by presenting a natural dual relationship between patterns and data. The relationship furnishes an explicit one-to-one correspondence between a pattern and its associated data and provides a basis for an effective analysis of patterns by relating them back to the data. It then discusses the important factors that differentiate patterns and formulates the notion of distances among patterns using a formal graphical approach. To accurately measure the distances between patterns and their associated data, both the samples and the attributes matched by the patterns are considered. To achieve this, the distance measure between patterns has to account for the differences between their associated data clusters at the attribute value (i.e. item) level. Furthermore, to capture the degree of variation of the items matched by patterns, entropy-based distance measures are developed. They attempt to quantify the uncertainty of the matched items. Such distances give an accurate and robust measurement of the distance between patterns and their associated data. To understand the properties and behaviors of the new distance measures, the mathematical relation between the new distances and the existing sample-matching distances is analytically derived.
The new pattern distances based on the dual pattern-data relationship, together with their related concepts, are adapted to pattern pruning, pattern clustering and pattern summarization to furnish an integrated, flexible and generic framework for pattern post-analysis that is able to meet the challenges of today’s complex real-world problems. In pattern pruning, the system defines the amount of redundancy of a pattern with respect to another pattern at the item level. This definition generalizes the classical closed itemset pruning and maximal itemset pruning, which define redundancy at the sample level. A new generalized itemset pruning method is developed using the new definition. It includes the closed and maximal itemsets as two extreme special cases and provides a control parameter for the user to adjust the tradeoff between the number of patterns pruned and the amount of information lost after pruning. The mathematical relation between the proposed generalized itemsets and the existing closed and maximal itemsets is also given. In pattern clustering, a dual clustering method, known as simultaneous pattern and data clustering, is developed using two common yet very different types of clustering algorithms: hierarchical clustering and k-means clustering. Hierarchical clustering generates the entire clustering hierarchy, but it is slow and not scalable. K-means clustering produces only a single partition, but it is fast and scalable. Between them, the two algorithms cover most real-world requirements (i.e. speed and clustering quality). The new clustering method is able to cluster patterns and their associated data simultaneously while maintaining an explicit pattern-data relationship. This relationship enables subsequent analysis of individual pattern clusters through their associated data clusters. One important analysis of a pattern cluster is pattern summarization.
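The closed and maximal itemsets that the generalized pruning method treats as its two extreme cases can be illustrated with a small brute-force sketch. It shows only the classical, sample-level definitions (a closed itemset has no proper superset with the same support; a maximal itemset has no frequent proper superset). It does not implement the thesis's item-level generalization or its control parameter, and the function names and toy transactions are made up for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Map each frequent itemset (a frozenset) to its support count.

    Brute force over all item combinations: fine for a toy example only.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
    return result


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with equal support."""
    return {
        s for s, sup in frequent.items()
        if not any(s < t and sup == frequent[t] for t in frequent)
    }


def maximal_itemsets(frequent):
    """Keep itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ad"),
]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)
print(sorted("".join(sorted(s)) for s in closed))   # closed itemsets
print(sorted("".join(sorted(s)) for s in maximal))  # maximal itemsets
```

Here seven itemsets are frequent, three of them are closed and one is maximal, so maximal pruning discards the most patterns (and the most support information) while closed pruning discards the fewest. That is the tradeoff the abstract says the control parameter adjusts.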
In pattern summarization, each pattern cluster is summarized by selecting a subset of representative patterns for the cluster. Again, the system measures how representative a pattern is at the item level and takes into account how the patterns overlap one another. The proposed method, called AreaCover, is extended from the well-known RuleCover algorithm. The relationship between the two methods is given. AreaCover is less prone to yielding large, trivial patterns (large patterns may produce a summary that is too general and not informative enough), and the resulting summary is more concise (with fewer duplicated attribute values among summary patterns) and more informative (describing more attribute values in the cluster and having longer summary patterns).
The thesis also covers the implementation of the major ideas outlined in the pattern post-analysis framework in an integrated software system. It ends with a discussion on the experimental results of pattern post-analysis on both synthetic and real-world benchmark data. Compared with the existing systems, the new methodology that this thesis presents stands out, possessing significant and superior characteristics in pattern post-analysis and decision support.
|
5 |
Scalable and explainable self-supervised motif discovery in temporal data
Bakhtiari Ramezani, Somayeh. 08 December 2023
The availability of a scalable and explainable rule extraction technique via motif discovery is crucial for identifying the health states of a system. Such a technique can enable the creation of a repository of normal and abnormal states of the system and identify the system’s state as data arrive. In complex systems such as ECG, each activity session can consist of a long sequence of motifs that form different global structures. As a result, applying machine learning algorithms without first identifying the local patterns is not feasible and would result in low performance. Thus, extracting unique local motifs and establishing a database of prototypes or signatures is a crucial first step in analyzing long temporal data, one that reduces the computational cost and mitigates data imbalance. The present research aims to streamline the extraction of motifs and add explainability to their analysis by identifying their differences. We have developed a novel framework for unsupervised motif extraction. We also offer a robust algorithm to identify unique motifs and their signatures, coupled with a suitable distance metric to compare the signatures of partially similar motifs. Defining such distance metrics allows us to assign a degree of semblance between two motifs that may have different lengths or contain noise. We have tested our framework on five different datasets and observed excellent results, including extraction of motifs from 100 million samples in 8.02 seconds, 99.90% accuracy in self-supervised ECG data classification, and an average error of 16.66% in remaining useful life (RUL) prediction of bearing failure.
|
6 |
Graph Cut Based Mesh Segmentation Using Feature Points and Geodesic Distance
Liu, L., Sheng, Y., Zhang, G., Ugail, Hassan. January 2015
No / Both prominent feature points and geodesic distance are key factors for mesh segmentation. Using these two factors, this paper proposes a graph cut based mesh segmentation method. The mesh is first preprocessed by Laplacian smoothing. Candidate feature points are then selected by applying a predefined threshold to the Gaussian curvature. With DBSCAN (Density-Based Spatial Clustering of Applications with Noise), the selected candidate points are grouped into clusters, and the point with the maximum curvature in each cluster is regarded as a final feature point. We label these feature points and regard the faces of the mesh as nodes for graph cut. Our energy function is constructed from the ratio between the geodesic distance and the Euclidean distance of vertex pairs of the mesh. The final segmentation result is obtained by minimizing the energy function using graph cut. The proposed algorithm is pose-invariant and can robustly segment the mesh into different parts in line with the selected feature points.
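The feature-point selection step described above (cluster the candidate points with DBSCAN, then keep the highest-curvature point of each cluster) might look roughly like the sketch below. It is a minimal, dependency-free illustration and not the paper's implementation: the small DBSCAN routine, the function names, the eps and min_pts values and the toy coordinates and curvature values are all assumptions, and the sketch uses straight-line Euclidean distance between candidates, which the abstract does not confirm.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point (-1 marks noise)."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: keep expanding
    return labels


def select_feature_points(points, curvature, eps, min_pts):
    """Index of the maximum-curvature candidate in each DBSCAN cluster."""
    labels = dbscan(points, eps, min_pts)
    best = {}
    for idx, label in enumerate(labels):
        if label == -1:
            continue  # isolated candidates are ignored in this sketch
        if label not in best or curvature[idx] > curvature[best[label]]:
            best[label] = idx
    return sorted(best.values())


# Hypothetical candidates: two tight groups plus one isolated point.
candidates = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0),
    (10.0, 0.0, 0.0),
]
curvature = [0.2, 0.9, 0.4, 0.7, 0.3, 0.8, 1.5]
labels = dbscan(candidates, eps=0.3, min_pts=2)
selected = select_feature_points(candidates, curvature, eps=0.3, min_pts=2)
print(selected)
```

In this toy input the isolated candidate has the highest curvature of all but is labelled as noise and dropped, so each dense group contributes exactly one feature point.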
|