Global ETD Search

51	DESIGN AND EVALUATION OF HIDDEN MARKOV MODEL BASED ARCHITECTURES FOR DETECTION OF INTERLEAVED MULTI-STAGE NETWORK ATTACKS Tawfeeq A Shawly (7370912) 16 October 2019 The pace of coordinated cybersecurity crime has increased drastically, and network attacks have become more advanced and diversified. The explosive growth of network security threats poses serious challenges for building secure Cyber-based Systems (CBS). Existing studies have addressed a breadth of challenges related to detecting network attacks. However, there is still a lack of studies on the detection of sophisticated Multi-stage Attacks (MSAs). The objective of this dissertation is to address the challenges of modeling and detecting sophisticated network attacks, such as multiple interleaved MSAs. We present the interleaving concept and investigate how interleaving multiple MSAs can deceive intrusion detection systems. Using Hidden Markov Models (HMMs), an important statistical machine learning (ML) technique, we develop three architectures that take into account the stealthy nature of interleaved attacks and that can detect and track the progress of these attacks. These architectures deploy a set of HMM templates of known attacks and exhibit varying performance and complexity. For performance evaluation, several metrics are proposed, including (1) attack risk probability, (2) detection error rate, and (3) the number of correctly detected stages. Extensive simulation experiments are conducted to demonstrate the efficacy of the proposed architectures in the presence of multiple multi-stage attack scenarios and of false alerts at various rates. Computer Engineering Network security Multi-stage attacks intrusion detection hidden Markov model
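The idea of scoring an alert stream against a set of HMM templates of known attacks can be illustrated with the standard scaled forward algorithm. The sketch below is not the dissertation's architecture and does not handle interleaving; the two templates, their two stages, the three alert symbols, and the names `TEMPLATES` and `best_template` are all hypothetical values chosen for illustration.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM)."""
    n = len(start)
    log_lik = 0.0
    alpha = None
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [start[s] * emit[s][symbol] for s in range(n)]
        else:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][symbol]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik

# Hypothetical templates: two attack stages (hidden states), three alert types.
TEMPLATES = {
    "attack_a": {
        "start": [1.0, 0.0],
        "trans": [[0.7, 0.3], [0.0, 1.0]],
        "emit": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
    },
    "attack_b": {
        "start": [1.0, 0.0],
        "trans": [[0.7, 0.3], [0.0, 1.0]],
        "emit": [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]],
    },
}

def best_template(obs):
    """Return the name of the best-scoring template and all scores."""
    scores = {
        name: forward_log_likelihood(obs, **model)
        for name, model in TEMPLATES.items()
    }
    return max(scores, key=scores.get), scores

name, scores = best_template([0, 0, 1, 1])
print(name, scores)
```

Comparing log-likelihoods across templates is only one possible decision rule; the metrics named in the abstract (attack risk probability, detection error rate, correctly detected stages) would need additional machinery.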
52	Intelligent Telerobotic Assistance For Enhancing Manipulation Capabilities Of Persons With Disabilities Yu, Wentao 11 August 2004 This dissertation addresses the development of a telemanipulation system that uses intelligent mapping from a haptic user interface to a remote manipulator to help maximize the manipulation capabilities of persons with disabilities. This mapping, referred to as an assistance function, is determined on the basis of an environmental model or real-time sensory data and guides the motion of a telerobotic manipulator while it performs a given task. Human input is enhanced rather than superseded by the computer. This is particularly useful when the user has a restricted range of movement due to conditions such as muscular dystrophy, a stroke, or any form of pathological tremor. In a telemanipulation system, assistance in the form of variable position/velocity mapping or a virtual fixture can improve manipulation capability and dexterity. Conventionally, such assistance is based on environment information alone, without knowledge of the user's motion intention. In this dissertation, the user's motion intention is combined with real-time environment information to apply appropriate assistance. If the current task is following a path, a virtual fixture orthogonal to the path is applied. Similarly, if the task is to align the end-effector with a target, an attractive force field is generated. To recognize the user's motion intention, a Hidden Markov Model (HMM) is developed. This dissertation describes HMM-based skill learning and its application in a motion therapy system in which motion along a labyrinth is controlled using a haptic interface. Two persons with upper-limb disabilities are trained using this virtual therapist. Performance measures before and after the therapy training are presented, including the smoothness of the trajectory, distance ratio, time taken, tremor, and impact forces.
The results demonstrate that the forms of assistance provided reduced execution times and improved the performance of the chosen tasks for the individuals with disabilities. In addition, these results suggest that haptic rendering capabilities, including force feedback, offer special benefit to motion-impaired users by augmenting their performance on job-related tasks. Rehabilitation Hidden Markov Model Motion Intention Recognition Virtual Fixture Skill Learning Therapy American Studies Arts and Humanities
53	Efficient duration modelling in the hierarchical hidden semi-Markov models and their applications Duong, Thi V. T. January 2008 Modeling patterns in temporal data has arisen as an important problem in engineering and science. This has led to the popularity of several dynamic models, in particular the renowned hidden Markov model (HMM) [Rabiner, 1989]. Despite its widespread success, the standard HMM often fails to model more complex data whose elements are correlated hierarchically or over a long period. Such problems are, however, frequently encountered in practice. Existing efforts to overcome this weakness often address only one of these two aspects, mainly because of computational intractability. Motivated by this modeling challenge in many real-world problems, in particular video surveillance and segmentation, this thesis aims to develop tractable probabilistic models that can jointly model duration and hierarchical information in a unified framework. We believe that jointly exploiting statistical strength from both properties will lead to more accurate and robust models for the task at hand. To tackle the modeling aspect, we base our work on an intersection between dynamic graphical models and the statistics of lifetime modeling. Realizing that the key bottleneck in existing work lies in the choice of the duration distribution for a state, we have integrated the discrete Coxian distribution [Cox, 1955], a special class of phase-type distributions, into the HMM to form a novel and powerful stochastic model termed the Coxian Hidden Semi-Markov Model (CxHSMM). We show that this model can still be expressed as a dynamic Bayesian network, and that inference and learning can be derived analytically.
Most importantly, it has four advantages over existing semi-Markov modelling: the parameter space is compact, computation is fast (almost the same as for the HMM), closed-form estimation can be derived, and the Coxian is flexible enough to approximate a large class of distributions. Next, we exploit hierarchical decomposition in the data by borrowing an analogy from the hierarchical hidden Markov model of [Fine et al., 1998, Bui et al., 2004] and introduce a new type of shallow structured graphical model that combines duration and hierarchical modelling in a unified framework, termed the Coxian Switching Hidden Semi-Markov Model (CxSHSMM). The top layer is a Markov sequence of switching variables, while the bottom layer is a sequence of concatenated CxHSMMs whose parameters are determined by the switching variable at the top. Again, we provide a thorough analysis along with inference and learning machinery. We also show that semi-Markov models with arbitrary depth structure can easily be developed. In all cases we further address two practical issues: missing observations due to unstable tracking, and the use of partially labelled data to improve training accuracy. Motivated by real-world problems, our application contribution is a framework to recognize complex activities of daily living (ADLs) and detect anomalies in order to provide better intelligent caring services for the elderly. Coarser activities with their own duration distributions are represented using the CxHSMM. Complex activities are made of a sequence of coarser activities and are represented at the top level in the CxSHSMM. Intensive experiments are conducted to evaluate our solutions against existing methods. In many cases, the superiority of the joint modeling and the Coxian parameterization over traditional methods is confirmed.
The robustness of our proposed models is further demonstrated in a series of more challenging experiments, in which the tracking is often lost and activities overlap considerably. Our final contribution is an application of the switching Coxian model to segment education-oriented videos into coherent topical units. Our results again demonstrate that such segmentation processes can benefit greatly from the joint modeling of duration and hierarchy.
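To make the compactness of a Coxian duration model concrete, the sketch below computes the duration distribution for one common discrete Coxian parameterization. This parameterization is an assumption, not stated in the abstract: with M phases, the duration starts in phase i with probability `mu[i]` and is then the sum of independent geometric phase durations i, i+1, ..., M, each with parameter `lam[i]`. The function names are hypothetical.

```python
def geometric_pmf(lam, max_d):
    """P(X = k) for k = 0..max_d, with support k >= 1: lam * (1 - lam)**(k - 1)."""
    return [0.0] + [lam * (1.0 - lam) ** (k - 1) for k in range(1, max_d + 1)]

def convolve(p, q, max_d):
    """pmf of the sum of two independent variables, truncated at max_d."""
    out = [0.0] * (max_d + 1)
    for i, pi in enumerate(p):
        if pi == 0.0:
            continue
        for j, qj in enumerate(q):
            if i + j > max_d:
                break
            out[i + j] += pi * qj
    return out

def coxian_pmf(mu, lam, max_d):
    """Duration pmf for d = 0..max_d under the assumed M-phase discrete Coxian."""
    pmf = [0.0] * (max_d + 1)
    suffix = None  # pmf of X_i + ... + X_M, built from the last phase backwards
    for i in reversed(range(len(lam))):
        g = geometric_pmf(lam[i], max_d)
        suffix = g if suffix is None else convolve(g, suffix, max_d)
        for d in range(max_d + 1):
            pmf[d] += mu[i] * suffix[d]
    return pmf

# Two phases give a non-geometric duration shape with only 2 * M parameters.
print(coxian_pmf([0.5, 0.5], [0.5, 0.5], 5))
```

With a single phase the distribution reduces to a plain geometric, which is the implicit state duration of a standard HMM.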
54	Human Activity Recognition and Pathological Gait Pattern Identification Niu, Feng 14 December 2007 Human activity analysis has attracted great interest from computer vision researchers due to its promising applications in many areas such as automated visual surveillance, computer-human interaction, and motion-based identification and diagnosis. This dissertation presents work in two areas: general human activity recognition from video, and human activity analysis for the purpose of identifying pathological gait from both 3D captured data and video. Although research in human activity recognition has been going on for many years, many issues still need more research. These include the effective representation and modeling of human activities and the segmentation of sequences of continuous activities. In this thesis we present an algorithm that combines shape and motion features to represent human activities. To handle activity recognition from any viewing angle, we quantize the viewing direction and build a set of Hidden Markov Models (HMMs), where each model represents the activity from a given view. Finally, a voting-based algorithm is used to segment and recognize a sequence of human activities from video. Our method of representing activities is suitable for both low-resolution and high-resolution video. The voting-based algorithm performs the segmentation and recognition simultaneously. Experiments on two sets of video clips of different activities show that our method is effective. Our work on identifying pathological gait is based on the assumption of gait symmetry. Previous work on gait analysis measures the symmetry of gait based on Ground Reaction Force data, stance time, swing time or step length. Since the trajectories of the body parts contain information about the whole body movement, we measure the symmetry of the gait based on the trajectories of the body parts.
Two algorithms, which can work with different data sources, are presented. The first algorithm works on 3D motion-captured data and the second works on video data. Both algorithms use a support vector machine (SVM) for classification. Each of the two methods has three steps: the first step is data preparation, i.e., obtaining the trajectories of the body parts; the second step is gait representation based on a measure of gait symmetry; and the last step is SVM-based classification. For 3D motion-captured data, a set of features based on the Discrete Fourier Transform (DFT) is used to represent the gait. A set of experiments shows that the method for 3D motion-captured data is highly effective. For video data, a model-based tracking algorithm for human body parts is developed to prepare the data. Then, a symmetry measure that works on the sequence of 2D data, i.e. the sequence of video frames, is derived to represent the gait. We performed experiments on both 2D projected data and real video data to examine this algorithm. The experimental results on 2D projected data showed that the presented algorithm is promising for identifying pathological gait from video. The experimental results on the real video data are not as good as the results on 2D projected data. We believe that better results could be obtained if the accuracy of the tracking algorithm were improved.
55	Continuous automatic classification of seismic signals of volcanic origin at Mt. Merapi, Java, Indonesia Ohrnberger, Matthias January 2001 Because of its almost continuous eruptive activity, Merapi is one of the most dangerous volcanoes in the world. It lies in the central part of the densely populated island of Java (Indonesia), so even minor eruptions pose a great danger to the population living around the volcano. The close correlation observed at Merapi between seismic and volcanic activity makes it possible to recognize changes in the volcano's state of activity by monitoring its seismicity. A system for the automatic detection and classification of seismic events makes an important contribution to the rapid analysis of seismic activity. In the event of an impending eruptive cycle, it is an important tool for the scientists working on site. In this work, a pattern recognition method is used to detect and classify seismic signals of volcanic origin in the continuously recorded data in real time. The hidden Markov model (HMM) approach used here is motivated by the strong similarity between seismic signals of volcanic origin and speech recordings, and by the great success of HMM-based recognition systems in automatic speech recognition. A successful implementation of a pattern recognition system requires a suitable parametrization of the raw data. Based on the experience of seismological observatories, a procedure for parametrizing the seismic wavefield using robust analysis methods is proposed. At each time step, the wavefield parameters are combined into a real-valued feature vector.
The time series formed from these feature vectors is then the input to the HMM-based recognition system. To allow the use of discrete hidden Markov models (DHMMs), the feature vectors are converted into a discrete symbol sequence by a linear transformation followed by vector quantization. The classifier is a maximum-likelihood test function between this sequence and the DHMMs, which are trained in a supervised learning procedure. The seismic data recorded continuously at Merapi between 1 July and 5 July 1998 are particularly well suited for testing this classification system. During this time, Merapi showed a rapid increase in seismicity shortly before two eruptions on 10 July and 19 July 1998. Three of the known seismic signal classes described by the Volcanological Survey of Indonesia were observed in this period: shallow volcano-tectonic earthquakes (VTB, h < 2.5 km); so-called MP events, which are directly associated with the growth of the active lava dome; and seismic events generated by rock avalanches (local name: Guguran). The special geometry of the digital seismic network at Merapi consists of a combination of three mini-arrays on the flanks of the volcano. Seismic array methods are therefore used to parametrize the wavefield. The individual wavefield parameters were analysed in detail with regard to their relevance for the classification process. A set of DHMMs was trained for each of the three signal classes. In addition, two groups of noise models were distinguished as rejection classes. Overall, this approach achieved a recognition rate of 67%. On average, the automatic classification system produced 41 false alarms per day and class.
The quality of the classification results varies strongly between the individual signal classes. Shallow volcano-tectonic earthquakes (VTB) show very distinct wavefield properties and, at least in the period studied, very stable temporal patterns of the individual wavefield parameters. For this event type, the DHMM-based classification system made almost 89% correct decisions and produced on average 2 false alarms per day. Events of the MP and Guguran classes are harder for the automatic system to recognize: 64% of all MP events and 74% of all Guguran events were recognized correctly. On average there were 87 false alarms per day for MP events and 33 for Guguran events. Many of the false alarms and undetected events, however, arise from confusion between these two signal classes in the automatic recognition process. This result could be explained by the similar wavefield properties of the two signal classes, which are presumably caused by the known strong influence of the medium along the wave propagation path in volcanic regions. Overall, the recognition performance of the automatic classification system developed here is rated as very promising. In contrast to standard methods, which in seismology usually detect only the onset time of a seismic event, the method studied captures seismic events in their entirety and classifies them in the same step. / Merapi volcano is one of the most active and dangerous volcanoes on Earth. Located in the central part of Java island (Indonesia), Merapi poses a high risk to the densely populated surrounding area even in a moderate eruption. Due to the close relationship between volcanic unrest and the occurrence of seismic events at Mt.
Merapi, the monitoring of Merapi's seismicity plays an important role in recognizing major changes in the volcanic activity. An automatic seismic event detection and classification system that is capable of characterizing the current seismic activity in near real-time is an important tool that allows the scientists in charge to take immediate decisions during a volcanic crisis. To detect and classify volcano-seismic signals automatically in the continuous data streams, a pattern recognition approach has been used. It is based on hidden Markov models (HMMs), a technique that has been shown to provide high recognition rates at high confidence levels in classification tasks of similar complexity (e.g. speech recognition). Any pattern recognition system relies on an appropriate representation of the input data to allow a reasonable class decision by means of a mathematical test function. Based on experience from seismological observatory practice, a parametrization scheme for the seismic waveform data is derived using robust seismological analysis techniques. The wavefield parameters are summarized into a real-valued feature vector per time step. The time series of this feature vector forms the basis for the HMM-based classification system. To make use of discrete hidden Markov model (DHMM) techniques, the feature vectors are further processed by applying a de-correlating and prewhitening transformation followed by vector quantization. The seismic wavefield is finally represented as a discrete symbol sequence with a finite alphabet.
This sequence is subjected to a maximum likelihood test against the discrete hidden Markov models, learned from a representative set of training sequences for each seismic event type of interest. A time period from July 1 to July 5, 1998, of rapidly increasing seismic activity prior to the eruptive cycle between July 10 and July 19, 1998, at Merapi volcano is selected for evaluating the performance of this classification approach. Three distinct types of seismic events according to the established classification scheme of the Volcanological Survey of Indonesia (VSI) have been observed during this time period: shallow volcano-tectonic events VTB (h < 2.5 km), very shallow dome-growth related seismic events MP (h < 1 km), and seismic signals connected to rockfall activity originating from the active lava dome, termed Guguran. The special configuration of the digital seismic station network at Merapi volcano, a combination of small-aperture array deployments surrounding Merapi's summit region, allows the use of array methods to parametrize the continuously recorded seismic wavefield. The individual signal parameters are analyzed to determine their relevance for the discrimination of seismic event classes. For each of the three observed event types, a set of DHMMs has been trained using a selected set of seismic events with varying signal-to-noise ratios and signal durations. Additionally, two sets of discrete hidden Markov models have been derived for the seismic noise, reflecting the fact that the wavefield properties of the ambient vibrations differ considerably between working hours and night time. A total recognition accuracy of 67% is obtained. The mean false alarm (FA) rate is 41 FA/class/day. However, variations in the recognition capabilities for the individual seismic event classes are significant.
Shallow volcano-tectonic signals (VTB) show very distinct wavefield properties and (at least in the selected time period) a stable time pattern of wavefield attributes. The DHMM-based classification therefore performs best for VTB-type events, with almost 89% recognition accuracy and 2 FA/day. Seismic signals of the MP and Guguran classes are more difficult to detect and classify. Around 64% of MP events and 74% of Guguran signals are recognized correctly. The average false alarm rate is 87 FA/day for MP events and 33 FA/day for Guguran signals. However, the majority of missed events and false alarms for both MP and Guguran events are due to confusion errors between these two event classes in the recognition process. The confusion of MP and Guguran events is interpreted as a consequence of the selected parametrization approach for the continuous seismic data streams. The observed patterns of the analyzed wavefield attributes for MP and Guguran events show a significant amount of similarity and thus provide insufficient discriminative information for the numerical classification. The similarity of the wavefield parameters obtained for seismic events of MP and Guguran type reflects the commonly observed dominance of path effects on seismic wave propagation in volcanic environments. The recognition rates obtained for the five-day period of increasing seismicity show that the presented DHMM-based automatic classification system is a promising approach for the difficult task of classifying volcano-seismic signals. Compared to standard signal detection algorithms, the most significant advantage of the discussed technique is that the entire seismogram is detected and classified in a single step. Earth sciences
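The step that turns real-valued feature vectors into a discrete symbol sequence can be illustrated with nearest-codeword vector quantization. This is a minimal sketch and not the thesis's pipeline: the codebook, the two-dimensional feature vectors, and the function name `quantize` are hypothetical, and the de-correlating and prewhitening transformation is assumed to have been applied already.

```python
def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    symbols = []
    for v in vectors:
        # Squared Euclidean distance to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        symbols.append(dists.index(min(dists)))
    return symbols

# Hypothetical 3-symbol codebook and already-whitened feature vectors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]
print(quantize(features, codebook))  # [0, 1, 2]
```

The resulting symbol sequence is what a discrete HMM would then score, one model per event class, with the class of the highest-likelihood model taken as the decision.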
56	Sequence-based predictions of membrane-protein topology, homology and insertion Bernsel, Andreas January 2008 Membrane proteins comprise around 20-30% of a typical proteome and play crucial roles in a wide variety of biochemical pathways. Apart from their general biological significance, membrane proteins are of particular interest to the pharmaceutical industry, being targets for more than half of all available drugs. This thesis focuses on prediction methods for membrane proteins that ultimately rely on their amino acid sequence only. By identifying soluble protein domains in membrane protein sequences, we were able to constrain and improve prediction of membrane protein topology, i.e. which parts of the sequence span the membrane and which parts are located on the cytoplasmic and extra-cytoplasmic sides. Using predicted topology as input to a profile-profile based alignment protocol, we managed to increase the sensitivity with which distant membrane protein homologs are detected. Finally, experimental measurements of the level of membrane integration of systematically designed transmembrane helices in vitro were used to derive a scale of position-specific contributions to helix insertion efficiency for all 20 naturally occurring amino acids. Notably, position within the helix was found to be an important factor in the contribution to helix insertion efficiency for polar and charged amino acids, reflecting the highly anisotropic environment of the membrane. Using the scale to predict natural transmembrane helices in protein sequences revealed that, whereas helices in single-spanning proteins are typically hydrophobic enough to insert by themselves, a large fraction of the helices in multi-spanning proteins seem to require stabilizing helix-helix interactions for proper membrane integration. Implementing the scale to predict full transmembrane topologies yielded results comparable to the best statistics-based topology prediction methods.
membrane protein topology prediction hidden markov model homology detection Sec translocon Bioinformatics Bioinformatik
57	Model Based Speech Enhancement and Coding Zhao, David Yuheng January 2007 In mobile speech communication, adverse conditions, such as noisy acoustic environments and unreliable network connections, may severely degrade the intelligibility and naturalness of the received speech and increase the listening effort. This thesis focuses on countermeasures based on statistical signal processing techniques. The main body of the thesis consists of three research articles, targeting two specific problems: speech enhancement for noise reduction and flexible source coder design for unreliable networks. Papers A and B consider speech enhancement for noise reduction. New schemes based on an extension to the auto-regressive (AR) hidden Markov model (HMM) for speech and noise are proposed. Stochastic models for speech and noise gains (excitation variance from an AR model) are integrated into the HMM framework in order to improve the modeling of energy variation. The extended model is referred to as a stochastic-gain hidden Markov model (SG-HMM). The speech gain describes the energy variations of the speech phones, typically due to differences in pronunciation and/or different vocalizations of individual speakers. The noise gain improves the tracking of the time-varying energy of non-stationary noise, e.g., due to movement of the noise source. In Paper A, it is assumed that prior knowledge of the noise environment is available, so that a pre-trained noise model is used. In Paper B, the noise model is adaptive and the model parameters are estimated on-line from the noisy observations using a recursive estimation algorithm. Based on the speech and noise models, a novel Bayesian estimator of the clean speech is developed in Paper A, and an estimator of the noise power spectral density (PSD) in Paper B.
It is demonstrated that the proposed schemes achieve more accurate models of speech and noise than traditional techniques and, as part of a speech enhancement system, provide improved speech quality, particularly for non-stationary noise sources. In Paper C, a flexible entropy-constrained vector quantization scheme based on a Gaussian mixture model (GMM), lattice quantization, and arithmetic coding is proposed. The method allows the average rate to be changed in real time and facilitates adaptation to the currently available bandwidth of the network. A practical solution to the classical issue of indexing and entropy-coding the quantized code vectors is given. The proposed scheme has a computational complexity that is independent of rate and quadratic with respect to vector dimension. Hence, the scheme can be applied to the quantization of source vectors in a high-dimensional space. The theoretical performance of the scheme is analyzed under a high-rate assumption. It is shown that, at high rate, the scheme approaches the theoretically optimal performance if the mixture components are located far apart. The practical performance of the scheme is confirmed through simulations on both synthetic and speech-derived source vectors. / QC 20100825 statistical model Gaussian mixture model (GMM) hidden Markov model (HMM) noise reduction Telecommunication Telekommunikation
58	Exploring the Behaviour of the Hidden Markov Model on CpG Island Prediction 2013 April 1900 (has links) DNA can be represented abstrzctly as a language with only four nucleotides represented by the letters A, C, G, and T, yet the arrangement of those four letters plays a major role in determining the development of an organism. Understanding the signi cance of certain arrangements of nucleotides can unlock the secrets of how the genome achieves its essential functionality. Regions of DNA particularly enriched with cytosine (C nucleotides) and guanine (G nucleotides), especially the CpG di-nucleotide, are frequently associated with biological function related to gene expression, and concentrations of CpGs referred to as \CpG islands" are known to collocate with regions upstream from gene coding sequences within the promoter region. The pattern of occurrence of these nucleotides, relative to adenine (A nucleotides) and thymine (T nucleotides), lends itself to analysis by machine-learning techniques such as Hidden Markov Models (HMMs) to predict the areas of greater enrichment. HMMs have been applied to CpG island prediction before, but often without an awareness of how the outcomes are a ected by the manner in which the HMM is applied. Two main ndings of this study are: 1. The outcome of a HMM is highly sensitive to the setting of the initial probability estimates. 2. Without the appropriate software techniques, HMMs cannot be applied e ectively to large data such as whole eukaryotic chromosomes. Both of these factors are rarely considered by users of HMMs, but are critical to a successful application of HMMs to large DNA sequences. In fact, these shortcomings were discovered through a close examination of published results of CpG island prediction using HMMs, and without being addressed, can lead to an incorrect implementation and application of HMM theory. 
A rst-order HMM is developed and its performance compared to two other historical methods, the Takai and Jones method and the UCSC method from the University of California Santa Cruz. The HMM is then extended to a second-order to acknowledge that pairs of nucleotides de ne CpG islands rather than single nucleotides alone, and the second-order HMM is evaluated in comparison to the other methods. The UCSC method is found to be based on properties that are not related to CpG islands, and thus is not a fair comparison to the other methods. Of the other methods, the rst-order HMM method and the Takai and Jones method are comparable in the tests conducted, but the second-order HMM method demonstrates superior predictive capabilities. However, these results are valid only when taking into consideration the highly sensitive outcomes based on initial estimates, and nding a suitable set of estimates that provide the most appropriate results. The rst-order HMM is applied to the problem of producing synthetic data that simulates the characteristics of a DNA sequence, including the speci ed presence of CpG islands, based on the model parameters of a trained HMM. HMM analysis is applied to the synthetic data to explore its delity in generating data with similar characteristics, as well as to validate the predictive ability of an HMM. Although this test fails to i meet expectations, a second test using a second-order HMM to produce simulated DNA data using frequency distributions of CpG island pro les exhibits highly accurate predictions of the pre-speci ed CpG islands, con- rming that when the synthetic data are appropriately structured, an HMM can be an accurate predictive tool. One outcome of this thesis is a set of software components (CpGID 2.0 and TrackMap) capable of ef- cient and accurate application of an HMM to genomic sequences, together with visualization that allows quantitative CpG island results to be viewed in conjunction with other genomic data. 
CpGID 2.0 is an adaptation of a previously published software component that has been extensively revised, and TrackMap is a companion product that works with the results produced by the CpGID 2.0 program. Executing these components allows one to monitor output aspects of the computational model such as number and size of the predicted CpG islands, including their CG content percentage and level of CpG frequency. These outcomes can then be related to the input values used to parameterize the HMM. CpG islands Hidden Markov Model synthetic data Baum-Welch Viterbi methylation
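As an illustration of the first-order approach described in entry 58, the following is a minimal sketch of Viterbi decoding with a two-state HMM (island versus background). It is not the CpGID 2.0 implementation: the state names, the function name `viterbi`, and every probability below are assumptions chosen for illustration. Scores are kept in log space, one of the simpler techniques for avoiding numeric underflow on long sequences.

```python
import math

STATES = ("island", "background")

# Hypothetical parameters for illustration only; they are not taken from the thesis.
START = {"island": 0.5, "background": 0.5}
TRANS = {
    "island": {"island": 0.9, "background": 0.1},
    "background": {"island": 0.1, "background": 0.9},
}
EMIT = {
    "island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "background": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(seq, start=START, trans=TRANS, emit=EMIT):
    """Return the most likely state path for seq, computed in log space."""
    if not seq:
        return []
    # Initial scores: log P(state) + log P(first symbol | state).
    score = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES}
    back = []  # one back-pointer table per symbol after the first
    for ch in seq[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position.
            best = max(STATES, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][ch])
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Changing the values in `START`, `TRANS`, or `EMIT` and re-running the decoder on the same sequence is a simple way to observe how the predicted islands depend on the parameter estimates.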
59	Automated Rehabilitation Exercise Motion Tracking Lin, Jonathan Feng-Shun January 2012 (has links) Current physiotherapy practice relies on visual observation of the patient for diagnosis and assessment. The assessment process can potentially be automated to improve accuracy and reliability. This thesis proposes a method to recover patient joint angles and automatically extract movement profiles utilizing small and lightweight body-worn sensors. Joint angles are estimated from sensor measurements via the extended Kalman filter (EKF). Constant-acceleration kinematics is employed as the state evolution model. The forward kinematics of the body is utilized as the measurement model. The state and measurement models are used to estimate the position, velocity and acceleration of each joint, updated based on the sensor inputs from inertial measurement units (IMUs). Additional joint limit constraints are imposed to reduce drift, and an automated approach is developed for estimating and adapting the process noise during on-line estimation. Once joint angles are determined, the exercise data is segmented to identify each of the repetitions. This process of identifying when a particular repetition begins and ends allows the physiotherapist to obtain useful metrics such as the number of repetitions performed, or the time required to complete each repetition. A feature-guided hidden Markov model (HMM) based algorithm is developed for performing the segmentation. In a sequence of unlabelled data, motion segment candidates are found by scanning the data for velocity-based features, such as velocity peaks and zero crossings, which match the pre-determined motion templates. These segment potentials are passed into the HMM for template matching. This two-tier approach combines the speed of a velocity feature based approach, which only requires the data to be differentiated, with the accuracy of the more computationally-heavy HMM, allowing for fast and accurate segmentation. 
The proposed algorithms were verified experimentally on a dataset consisting of 20 healthy subjects performing rehabilitation exercises. The movement data was collected by IMUs strapped onto the hip, thigh and calf. The joint angle estimation system achieves an overall average RMS error of 4.27 cm, when compared against motion capture data. The segmentation algorithm reports 78% accuracy when the template training data comes from the same participant, and 74% for a generic template. Physiotherapy Kalman filter Hidden Markov model Inertial measurement units Forward kinematics Segmentation and identification Electrical and Computer Engineering
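The first tier of the segmentation approach in entry 59 (scanning for velocity-based features) can be sketched as follows. This is a simplified illustration, not the thesis algorithm: the function names, the finite-difference velocity, and the rule of cutting at every velocity zero crossing are assumptions, and the HMM template-matching tier is not reproduced.

```python
def velocity(angles, dt):
    """Finite-difference velocity between consecutive joint-angle samples."""
    return [(b - a) / dt for a, b in zip(angles, angles[1:])]


def zero_crossings(vel):
    """Sample indices where velocity changes sign (turning points of the angle).

    A strict sign change is required, so flat plateaus with exactly zero
    velocity are not detected by this simplified rule.
    """
    return [i for i in range(1, len(vel)) if vel[i - 1] * vel[i] < 0]


def segment_candidates(angles, dt=0.01):
    """Return (start, end) sample-index pairs between consecutive turning points.

    In a two-tier scheme, these candidates would then be passed to a
    more expensive model (such as an HMM) for confirmation.
    """
    if len(angles) < 2:
        return []
    turning = zero_crossings(velocity(angles, dt))
    bounds = [0] + turning + [len(angles) - 1]
    return list(zip(bounds, bounds[1:]))
```

For a triangle-wave joint angle, each candidate spans one rising or falling half of a repetition; real sensor data would typically need smoothing before differentiation.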
60	Multivariate Longitudinal Data Analysis with Mixed Effects Hidden Markov Models Raffa, Jesse Daniel January 2012 (has links) Longitudinal studies, where data on study subjects are collected over time, increasingly involve multivariate longitudinal responses. Frequently, the heterogeneity observed in a multivariate longitudinal response can be attributed to underlying unobserved disease states in addition to any between-subject differences. We propose modeling such disease states using a hidden Markov model (HMM) approach and extend previous work, which incorporated random effects into HMMs for the analysis of univariate longitudinal data, to the setting of a multivariate longitudinal response. Multivariate longitudinal data are modeled jointly using separate but correlated random effects between longitudinal responses of mixed data types in addition to a shared underlying hidden process. We use a computationally efficient Bayesian approach via Markov chain Monte Carlo (MCMC) to fit such models. We apply this methodology to bivariate longitudinal response data from a smoking cessation clinical trial. Under these models, we examine how to incorporate a treatment effect on the disease states, as well as develop methods to classify observations by disease state and to attempt to understand patient dropout. Simulation studies were performed to evaluate the properties of such models and their applications under a variety of realistic situations. multivariate longitudinal data hidden markov model random effects Markov chain monte carlo Statistics (Biostatistics)
