
Model-based clustering algorithms, performance and application

<p>The main contributions of this thesis are the development of new off-line and on-line clustering algorithms (with cluster validation), the performance analysis of these algorithms, and their application to intrapulse analysis. Bayesian inference and minimum encoding inference, including Wallace's minimum message length (MML) and Rissanen's minimum description length (MDL), are reviewed for model selection. It is found that the MML coding length is more accurate than the other two from the viewpoint of quantization. By introducing a penalty weight, all criteria considered here are cast into the framework of a penalized likelihood method. Based on minimum encoding inference, an appropriate measure of coding length is proposed for cluster validation, and the coding lengths under four different Gaussian mixture models are fully derived. This provides a criterion for the development of a new clustering algorithm. Judging from the performance comparison with other algorithms, the new clustering algorithm is better suited to high-dimensional data and performs satisfactorily on small and medium samples. This clustering algorithm is off-line because it requires all the data to be available at the same time. The theoretical error performance of the algorithm is evaluated under reasonable assumptions. It is shown how the dimension of the data space, the sample size, the mixing proportion and the inter-cluster distance affect the ability of the algorithm to detect the true number of clusters. Furthermore, we examine the impact of the penalty weight within the penalized likelihood framework. It is found that there is a range of penalty weights within which the algorithm achieves its best performance. Therefore, with some supervision, the penalty weight can be adjusted to further improve performance.
The application of our clustering algorithm to intrapulse analysis is investigated in detail. We first develop pre-processing techniques, including data compression for received pulses, and formulate the problems of emitter number detection and pulse-emitter association as a multivariate clustering problem. After applying the above (off-line) clustering algorithm, we further develop two on-line clustering algorithms: one is based on known thresholds, while the other is based on a model-based detection scheme. The performance of our pre-processing techniques and clustering algorithms on intrapulse data is reported, and the results demonstrate that the new clustering algorithms, especially the model-based on-line algorithm, are very effective for intrapulse analysis. Finally, a DSP implementation for intrapulse analysis is considered. Relevant physical parameters, such as the likely maximum incoming pulse rate, are estimated. A suitable system diagram is then proposed and its system requirements are investigated. Our on-line clustering algorithm is implemented as a core classification module on a TMS320C44 DSP board.</p> / Doctor of Philosophy (PhD)
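
The penalized likelihood idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the coding-length criterion derived in the thesis: it assumes the generic form "negative log-likelihood plus a weighted (p/2) log n penalty", a full-covariance Gaussian mixture, and hypothetical function names and log-likelihood values chosen only for illustration.

```python
import math


def gmm_free_params(k, d):
    """Free parameters of a k-component, d-dimensional Gaussian mixture
    with unconstrained covariances (assumed model, not the thesis's four):
    (k - 1) mixing weights + k*d means + k*d*(d+1)/2 covariance terms."""
    return (k - 1) + k * d + k * d * (d + 1) // 2


def penalized_score(log_lik, n_params, n_samples, weight=1.0):
    """Negative log-likelihood plus a weighted complexity penalty.
    weight=1.0 gives the familiar (n_params / 2) * log(n_samples) term."""
    return -log_lik + weight * 0.5 * n_params * math.log(n_samples)


def select_num_clusters(log_liks, d, n_samples, weight=1.0):
    """Pick the number of clusters k that minimizes the penalized score.
    log_liks maps k -> maximized log-likelihood of the k-component fit."""
    scores = {
        k: penalized_score(ll, gmm_free_params(k, d), n_samples, weight)
        for k, ll in log_liks.items()
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical log-likelihoods for k = 1, 2, 3 (illustrative values only).
example_log_liks = {1: -1200.0, 2: -1000.0, 3: -990.0}
for w in (1.0, 0.1):
    k, _ = select_num_clusters(example_log_liks, d=2, n_samples=400, weight=w)
    print(f"penalty weight {w}: selected k = {k}")
```

With these illustrative inputs, lowering the penalty weight moves the selected number of clusters from two to three, which is the kind of sensitivity to the penalty weight that the abstract describes.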

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/6376
Date: January 2000
Creators: Liu, Jun
Contributors: Wong, K. M., Luo, Z.Q., Electrical and Computer Engineering
Source Sets: McMaster University
Detected Language: English
Type: thesis
