We address the problem of learning discrete hidden Markov models from very long sequences of observations. Incremental versions of the Baum-Welch algorithm that approximate the beta-values used in the backward procedure are commonly used for this problem since their memory complexity is independent of the sequence length. However, traditional approaches have two main disadvantages: the approximation of the beta-values deviates far from the real values, and the learning algorithm requires previous knowledge of the topology of the model. This dissertation describes a new incremental Baum-Welch algorithm with a novel backward procedure that improves the approximation of the รข-values based on a one-step lookahead in the training sequence and investigates heuristics to prune unnecessary states from an initial complex model. Two new approaches for pruning, greedy and controlled, are introduced and a novel method for identification of ill-conditioned models is presented. Incremental learning of multiple independent observations is also investigated. We justify the new approaches analytically and report empirical results that show they converge faster than the traditional Baum-Welch algorithm using fewer computer resources. Furthermore, we demonstrate that the new learning algorithms converge faster than the previous incremental approaches and can be used to perform online learning of high-quality models useful for classification tasks. Finally, this dissertation explores the use of the new algorithms for anomaly detection in computer systems, that improve our previous research work on detectors based on hidden Markov models integrated into real-world monitoring systems of high-performance computers.
Identifer | oai:union.ndltd.org:MSSTATE/oai:scholarsjunction.msstate.edu:td-3689 |
Date | 06 August 2005 |
Creators | Florez-Larrahondo, German |
Publisher | Scholars Junction |
Source Sets | Mississippi State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses and Dissertations |
Page generated in 0.0021 seconds