81 |
Algorithmic Trading: Hidden Markov Models on Foreign Exchange Data. Idvall, Patrik; Jonsson, Conny. January 2008.
In this master's thesis, hidden Markov models (HMMs) are evaluated as a tool for forecasting movements in a currency cross. With an ever-growing electronic market making way for more automated trading, so-called algorithmic trading, there is a constant need for new trading strategies that try to find alpha, the excess return, in the market.
HMMs are based on the well-known theory of Markov chains, but the states are assumed to be hidden, governing some observable output. HMMs have mainly been used for speech recognition and communication systems, but have lately also been applied to financial time series with encouraging results. Both discrete and continuous versions of the model are tested, as well as univariate and multivariate input data.
In addition to the basic framework, two extensions are implemented in the belief that they will further improve the prediction capabilities of the HMM. The first is a Gaussian mixture model (GMM), in which each state is assigned a set of single Gaussians that are weighted together to replicate the density function of the stochastic process. This makes it possible to model non-normal distributions, which are often assumed for foreign exchange data. The second is an exponentially weighted expectation maximization (EWEM) algorithm, which takes time attenuation into account when re-estimating the parameters of the model. This allows old trends to be kept in mind while more recent patterns are given more attention.
Empirical results show that the HMM using continuous emission probabilities can, for some model settings, generate acceptable returns with Sharpe ratios well over one, whilst the discrete version generally performs poorly. The GMM therefore seems to be a much-needed complement to the HMM for functionality. The EWEM, however, does not improve results as one might have expected. Our general impression is that the HMM-based predictor we have developed and tested is too unstable to be adopted as a trading tool on foreign exchange data, with too many factors influencing the results. More research and development is called for.
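To make the continuous model concrete: in an HMM with GMM emissions, each hidden state scores an observation (for example a return) with its own Gaussian mixture, and the forward algorithm sums over hidden state paths to give the likelihood of a series. The following is a minimal 1-D sketch, not the authors' implementation. The function names, the `(weights, means, stds)` tuple layout, the per-step scaling, and the parameter values in the usage lines are all illustrative assumptions.

```python
import math


def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture: sum_k w_k * N(x; mu_k, sigma_k^2)."""
    total = 0.0
    for w, mu, sd in zip(weights, means, stds):
        z = (x - mu) / sd
        total += w * math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return total


def forward_log_likelihood(obs, pi, A, emissions):
    """Scaled forward algorithm returning log p(obs) for an HMM.

    pi[i]: initial probability of hidden state i
    A[i][j]: transition probability from state i to state j
    emissions[i]: (weights, means, stds) of the GMM attached to state i
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * gmm_pdf(obs[0], *emissions[i]) for i in range(n)]
    log_lik = 0.0
    for t, x in enumerate(obs):
        if t > 0:
            # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * gmm_pdf(x, *emissions[j])
                for j in range(n)
            ]
        # Rescale to sum to one; the log of each scale factor adds up to log p(obs)
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik


# Illustrative two-regime example: a "calm" state with a two-component GMM
# and a "volatile" state with a single wide Gaussian.
pi = [0.5, 0.5]
A = [[0.95, 0.05], [0.10, 0.90]]
emissions = [([0.7, 0.3], [0.0, 0.001], [0.002, 0.004]), ([1.0], [0.0], [0.01])]
print(forward_log_likelihood([0.001, -0.002, 0.0005], pi, A, emissions))
```

Rescaling at every step keeps `alpha` from underflowing on long return series. Summing the logs of the scale factors recovers the exact log-likelihood, which is what EM-style re-estimation of the parameters would monitor.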
|
82 |
Representation and interpretation of manual and non-manual information for automated American Sign Language recognition [electronic resource] / by Ayush S Parashar. Parashar, Ayush S. January 2003.
Title from PDF of title page. / Document formatted into pages; contains 80 pages. / Thesis (M.S.C.S.)--University of South Florida, 2003. / Includes bibliographical references. / Text (Electronic thesis) in PDF format.
ABSTRACT: Continuous recognition of sign language has many practical applications, and it can help improve the quality of life of deaf persons by facilitating their interaction with the hearing population in public situations. This has led to some research in automated continuous American Sign Language (ASL) recognition. However, most work in continuous ASL recognition has used only top-down Hidden Markov Model (HMM) based approaches, and there has been no work on using facial information, which is considered fairly important. In this thesis, we explore a bottom-up approach based on Relational Distributions and the Space of Probability Functions (SoPF) for intermediate-level ASL recognition. We also use non-manual information, first to decrease the number of deletion and insertion errors, and second to determine whether an ASL sentence contains 'Negation', for which we use motion trajectories of the face.
The experimental results show:
- The SoPF representation works well for ASL recognition. Accuracy, based on the number of deletion errors, is 95% when the 8 most probable signs in the sentence are considered and 88% when the 6 most probable signs are considered.
- Using facial (non-manual) information increases accuracy, when the top 6 signs are considered, from 88% to 92%. Thus the face does carry information.
- It is difficult to directly combine manual information (from hand motion) with non-manual (facial) information to improve accuracy, for the following two reasons:
  1. Manual images are not synchronized with non-manual images. For example, the same facial expression is not present at the same manual position in two instances of the same sentence.
  2. Another problem in finding the facial expression related to a sign occurs when a strong non-manual indicating 'Assertion' or 'Negation' is present in the sentence. In such cases the facial expressions are totally dominated by face movements, indicated by 'head shakes' or 'head nods'.
- Of the 30 sentences containing 'Negation', 27 are correctly recognized with the help of motion trajectories of the face.
System requirements: World Wide Web browser and PDF reader. / Mode of access: World Wide Web.
|
83 |
A sensor fusion method for detection of surface laid land mines. Westberg, Daniel. January 2007.
Land mines are a huge problem both during and after conflicts. Methods used to detect mines have not changed much since the 1940s. Research at FOI aims to evaluate output from different electro-optical sensors and to develop methods for more efficient mine detection. Early experiments with laser radar sensors show promising results, as does analysis of data from infrared sensors. In this thesis, features found in laser radar and infrared sensor data are evaluated. The tested features are intensity, infrared, a surfaceness feature extracted from the laser radar data, and height above an estimated ground plane.
A method for segmenting interesting objects from background data using the expectation-maximization algorithm and a minimum message length criterion is designed and implemented. A scatter separability criterion is used to determine the quality of the features and of the resulting segmentation. The method is tested on real data from a field trial performed by FOI. The results show that the surfaceness feature supports the segmentation of larger objects with smooth surfaces but contributes nothing for small objects with irregular surfaces. The method does a decent job of selecting contributing features for different neighbourhoods of a scene. A comparison between a manually created target mask of the neighbourhood and the segmented components shows that in most cases a high percentage separation of mine data and background data is possible.
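The scatter separability criterion can be read as the ratio of between-cluster spread to within-cluster spread. A common definition is J = tr(S_w^{-1} S_b), where S_w is the prior-weighted within-cluster scatter and S_b is the scatter of the cluster means around the overall mean. The NumPy sketch below assumes that definition and hard cluster labels (for example taken from an EM segmentation). Both may differ in detail from the thesis, and the function name and synthetic data are illustrative.

```python
import numpy as np


def scatter_separability(X, labels):
    """Return J = trace(S_w^{-1} S_b) for a feature matrix X of shape (n_samples, n_features).

    Larger J means the clusters are compact relative to how far apart they are.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        prior = len(Xc) / n                             # fraction of samples in cluster c
        mean_c = Xc.mean(axis=0)
        centred = Xc - mean_c
        S_w += prior * centred.T @ centred / len(Xc)    # within-cluster scatter
        diff = mean_c - overall_mean
        S_b += prior * np.outer(diff, diff)             # between-cluster scatter
    # solve(S_w, S_b) computes S_w^{-1} S_b without forming the inverse explicitly
    return float(np.trace(np.linalg.solve(S_w, S_b)))


# Two synthetic 2-D feature clusters: the true labels should score far higher
# than an arbitrary labelling of the same points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(6.0, 1.0, (100, 2))])
true_labels = np.repeat([0, 1], 100)
random_labels = rng.integers(0, 2, 200)
print(scatter_separability(X, true_labels), scatter_separability(X, random_labels))
```

Because J can be computed for any subset of feature columns, the same function can compare candidate feature sets (for example intensity versus height) on one segmentation.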
|
84 |
Exploiting Non-Sequence Data in Dynamic Model Learning. Huang, Tzu-Kuo. 01 October 2013.
Virtually all methods of learning dynamic models from data start from the same basic assumption: that the learning algorithm will be provided with one or more sequences of data generated from the dynamic model. However, in quite a few modern time series modeling tasks, collecting reliable time series data turns out to be a major challenge, due either to slow progression of the dynamic process of interest or to the inaccessibility of repeated measurements of the same dynamic process over time. In most of those situations, however, we observe that it is easier to collect a large amount of non-sequence samples, or random snapshots of the dynamic process of interest without time information. This thesis aims to exploit such non-sequence data in learning a few widely used dynamic models, including fully observable linear and nonlinear models as well as Hidden Markov Models (HMMs). For fully observable models, we point out several model identifiability issues that arise when learning from non-sequence data, and develop EM-type learning algorithms based on maximizing an approximate likelihood. We also consider the setting where a small amount of sequence data is available in addition to non-sequence data, and propose a novel penalized least-squares approach that uses non-sequence data to regularize the model. For HMMs, we draw inspiration from recent advances in spectral learning of latent variable models and propose spectral algorithms that provably recover the model parameters, under reasonable assumptions on the generative process of non-sequence data and the true model. To the best of our knowledge, this is the first formal guarantee on learning dynamic models from non-sequence data. We also consider the case where little sequence data is available, and propose learning algorithms that, as in the fully observable case, use non-sequence data to provide regularization, but do so in combination with spectral methods.
Experiments on synthetic data and several real data sets, including gene expression and cell image time series, demonstrate the effectiveness of our proposed methods. In the last part of the thesis we return to the usual setting of learning from sequence data, and consider learning bi-clustered vector auto-regressive models, whose transition matrix is both sparse, revealing significant interactions among variables, and bi-clustered, identifying groups of variables that have similar interactions with other variables. Such structures may aid other learning tasks in the same domain that have abundant non-sequence data by providing better regularization in our proposed non-sequence methods.
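For a linear model x_{t+1} = A x_t + noise fitted to sequence data, a penalized least-squares estimate of the transition matrix has a closed form. The NumPy sketch below uses a plain ridge penalty lam * ||A||_F^2 as a stand-in. The thesis instead builds its regularizer from non-sequence data (and, for the bi-clustered VAR, from sparsity and bi-clustering structure), and this sketch attempts neither. The function name, `lam`, and the simulated system are illustrative assumptions.

```python
import numpy as np


def ridge_transition_matrix(X, lam=1.0):
    """Estimate A in x_{t+1} = A x_t + e_t from one sequence X of shape (T, d).

    Minimises ||X1 - X0 A^T||_F^2 + lam * ||A||_F^2, whose solution is
    A^T = (X0^T X0 + lam * I)^{-1} X0^T X1.
    """
    X = np.asarray(X, dtype=float)
    X0, X1 = X[:-1], X[1:]          # states at time t and t + 1
    d = X.shape[1]
    A_T = np.linalg.solve(X0.T @ X0 + lam * np.eye(d), X0.T @ X1)
    return A_T.T


# Simulate a stable 2-D linear process and recover its transition matrix.
rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
X = np.zeros((5000, 2))
for t in range(1, len(X)):
    X[t] = A_true @ X[t - 1] + rng.normal(scale=0.5, size=2)
A_hat = ridge_transition_matrix(X, lam=1.0)
print(np.round(A_hat, 3))
```

The penalty matters most when sequence data are scarce. With only a handful of transitions, X0^T X0 can be singular, and the lam * I term keeps the system solvable. That is the regime in which the thesis turns to non-sequence data for a more informative regularizer.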
|
85 |
Using Primary Dynamic Factor Analysis on repeated cross-sectional surveys with binary responses. Edenheim, Arvid. January 2020.
With the growing popularity of business analytics, companies experience an increasing need for reliable data. Although the availability of behavioural data showing what consumers do has increased, access to data showing consumer mentality, what consumers actually think, remains heavily dependent on tracking surveys. This thesis investigates the performance of a Dynamic Factor Model using respondent-level data gathered through repeated cross-sectional surveys. Through Monte Carlo simulations, the model was shown to improve the accuracy of brand tracking estimates by double-digit percentages, or equivalently to reduce the required amount of data by more than a factor of 2 while maintaining the same level of accuracy. Furthermore, the study showed clear indications that even greater performance benefits are possible.
|
86 |
Improving the speed and quality of an Adverse Event cluster analysis with Stepwise Expectation Maximization and Community Detection. Erlanson, Nils. January 2020.
Adverse drug reactions are unwanted effects that occur alongside the intended benefit of a drug and might be responsible for 3-7% of hospitalizations. Such reactions are partly found by analysing individual case safety reports (ICSRs) of adverse events. The reports consist of categorical terms that describe the event. Data-driven identification of suspected adverse drug reactions using this data typically considers single adverse event terms, one at a time. This single-term approach narrows the identification of reports, and other information in the reports is ignored during the search. If one instead assumes that each report is connected to a topic, clustering the reports connected to each topic would identify more reports. The topics would also provide more context. This thesis takes place at Uppsala Monitoring Centre, which has implemented a probabilistic model of how an ICSR, and its topic, is assumed to be generated. The parameters of the model are estimated with expectation maximization (EM), which also assigns the reports to clusters. The clusters are improved with Consensus Clustering, which identifies groups of reports that tend to be grouped together across several runs of EM. Additionally, to avoid clustering outlying reports, all clusters below a certain size are excluded. The objective of the thesis is to improve the algorithm in terms of computational efficiency and quality, as measured by stability and clinical coherence. The convergence of EM is improved using stepwise EM, which yields a speed-up of at least 1.4 and a reduction in computational complexity. With all speed improvements combined, the speed-up of the entire algorithm can reach a factor of 2, but it is constrained by the size of the data. To improve cluster quality, the community detection algorithm Leiden is used. It improves stability, with the added benefit of increasing the number of clustered reports.
The clinical coherence score is worse with Leiden. There are good reasons to investigate the benefits of Leiden further, as a post hoc analysis suggested that community detection identified clusters with greater resolution that still appeared clinically coherent.
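Stepwise EM replaces full-batch E-steps with E-steps on small minibatches. After each minibatch it interpolates a set of running sufficient statistics using a decaying step size eta_k = (k + 2)^(-alpha), with 0.5 < alpha <= 1, and re-derives the parameters from them. This is why it can converge in fewer passes than batch EM. The thesis applies stepwise EM to a categorical model of ICSRs. The sketch below assumes a 1-D Gaussian mixture purely to keep the example short. Its parameter names (`batch`, `alpha`, `n_epochs`, `seed`), its initialization, and the synthetic data are illustrative choices.

```python
import numpy as np


def stepwise_em_gmm(x, k, n_epochs=5, batch=50, alpha=0.7, seed=0):
    """Stepwise (online) EM for a 1-D Gaussian mixture with k components.

    Running sufficient statistics are interpolated with step size
    eta = (step + 2) ** -alpha, where 0.5 < alpha <= 1.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Initial parameters: means at spread-out quantiles, shared variance, equal weights
    means = np.quantile(x, np.linspace(0.1, 0.9, k))
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    # Normalised sufficient statistics: E[z_j], E[z_j x], E[z_j x^2]
    s0, s1, s2 = w.copy(), w * means, w * (var + means ** 2)
    step = 0
    n_batches = max(1, len(x) // batch)
    for _ in range(n_epochs):
        for idx in np.array_split(rng.permutation(len(x)), n_batches):
            xb = x[idx]
            # E-step on the minibatch: responsibilities r[n, j], computed in log space
            log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                     - 0.5 * (xb[:, None] - means) ** 2 / var)
            log_p -= log_p.max(axis=1, keepdims=True)
            r = np.exp(log_p)
            r /= r.sum(axis=1, keepdims=True)
            # Stepwise update: s <- (1 - eta) * s + eta * (minibatch statistics)
            eta = (step + 2) ** -alpha
            s0 = (1 - eta) * s0 + eta * r.mean(axis=0)
            s1 = (1 - eta) * s1 + eta * (r * xb[:, None]).mean(axis=0)
            s2 = (1 - eta) * s2 + eta * (r * xb[:, None] ** 2).mean(axis=0)
            step += 1
            # M-step from the running statistics
            w = s0 / s0.sum()
            means = s1 / s0
            var = np.maximum(s2 / s0 - means ** 2, 1e-6)
    return w, means, var


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])
w, means, var = stepwise_em_gmm(x, k=2)
print(w, means, var)
```

Larger `alpha` forgets the initial statistics faster but makes later updates noisier. Smaller minibatches give more updates per pass, which is the source of the speed-up over batch EM.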
|
87 |
Application of Inter-Die Rank Statistics in Defect Detection. Bakshi, Vivek. 01 March 2012.
This thesis presents a statistical method to identify test escapes. Testing often acquires parametric measurements as a function of the logic state of a chip. The usual method of classifying chips as pass or fail is to compare each state measurement to a test limit. In deep sub-micron technologies, however, process variations cause healthy and faulty parametric measurements to mix, so subtle manufacturing defects escape the test limits. This thesis identifies chips with subtle defects by using the rank order of the parametric measurements. The underlying hypothesis is that a defect is likely to disturb the defect-free ranking, whereas a shift caused by process variations will not affect the rank. The hypothesis does not depend on a-priori knowledge of a defect-free ranking of parametric measurements. This thesis introduces a modified Expectation-Maximization (EM) algorithm to separate the healthy and faulty components of the tau values calculated from the parametric responses of die pairs on a wafer. The modified EM uses generalized beta distributions to model the two components of the tau mixture distribution and estimates the probability that each die on a wafer is faulty. Its sensitivity is evaluated using Monte Carlo simulations. Applied to production Product A, the modified EM yields an average 30% reduction in DPPM (defective parts per million) across all lots.
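The rank hypothesis can be illustrated with Kendall's tau, a standard rank-correlation coefficient. A monotone shift or scaling of one die's measurements leaves the ordering of its logic states, and hence tau, unchanged, while a defect that reorders states lowers it. The plain-Python sketch below computes tau-a, in which tied pairs count as neither concordant nor discordant. Treating each die as a vector of measurements over the same logic states is an assumption about how the die-pair tau values are formed. The measurement values are made up.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length measurement vectors.

    A pair of positions (i, j) is concordant if a and b order it the same way
    and discordant if they order it oppositely; tied pairs count as neither.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    score = 0
    for i, j in combinations(range(len(a)), 2):
        product = (a[i] - a[j]) * (b[i] - b[j])
        if product > 0:
            score += 1          # concordant pair
        elif product < 0:
            score -= 1          # discordant pair
    n = len(a)
    return score / (n * (n - 1) // 2)


die_a = [1.0, 2.0, 3.0, 4.0]                # measurements over four logic states
shifted = [v + 0.5 for v in die_a]          # global shift, e.g. process variation
defective = [1.0, 2.0, 3.0, 0.5]            # one state pulled out of order
print(kendall_tau(die_a, shifted), kendall_tau(die_a, defective))  # prints 1.0 0.0
```

Across a wafer, healthy die pairs would then cluster near tau = 1 and pairs involving a defective die would spread lower. That two-component picture is what a mixture model over tau, such as the thesis's beta-mixture EM, is meant to separate.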
|
88 |
Gibbs Sampling and Expectation Maximization Methods for Estimation of Censored Values from Correlated Multivariate Distributions. Hunter, Tina D. 25 August 2008.
No description available.
|
89 |
Methods for Differential Analysis of Gene Expression and Metabolic Pathway Activity. Temate Tiagueu, Yvette Charly B. 09 May 2016.
RNA-Seq is an increasingly popular approach to transcriptome profiling that uses the capabilities of next-generation sequencing technologies and provides better measurement of the levels of transcripts and their isoforms. In this thesis, we apply the RNA-Seq protocol and transcriptome quantification to estimate gene expression and pathway activity levels. We present a novel method, called IsoDE, for differential gene expression analysis based on bootstrapping. For the first version of IsoDE, we compared the tool against four existing methods (Fisher's exact test, GFOLD, edgeR and Cuffdiff) on RNA-Seq datasets generated using three different sequencing technologies, both with and without replicates. We also introduce a second version of IsoDE, which runs 10 times faster than the first implementation owing to in-memory processing in the underlying gene expression frequency estimation tool, and we further optimize the analysis.
The second part of this thesis presents a set of tools for the differential analysis of metabolic pathways from RNA-Seq data. Metabolic pathways are series of chemical reactions occurring within a cell. We focus on two main problems in the differential analysis of metabolic pathways, namely differential analysis of their inferred activity level and of their estimated abundance. We validate our approaches through differential expression analysis at the transcript and gene levels and through real-time quantitative PCR experiments. In Part Four, we present the packages created or updated in the course of this study. We conclude with plans for further improving IsoDE 2.0.
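The following sketch shows how a bootstrap-based differential expression call can work in general. Given bootstrap estimates of one gene's expression in two conditions, compare every pair of bootstrap values and call the gene differentially expressed if enough of the pairwise fold changes clear a threshold in the same direction. This is a hypothetical decision rule in the spirit of a bootstrapping approach, not IsoDE's documented procedure. The function name, `fold`, `confidence`, `pseudocount`, and the example values are all assumptions.

```python
from itertools import product


def bootstrap_de_call(boot_a, boot_b, fold=2.0, confidence=0.95, pseudocount=1e-9):
    """Hypothetical rule: report a gene as differentially expressed when at least
    `confidence` of all pairwise bootstrap fold changes reach `fold` in one direction.

    boot_a, boot_b: bootstrap expression estimates of one gene in conditions A and B
    pseudocount: small value added to avoid division by zero for unexpressed genes
    """
    pairs = list(product(boot_a, boot_b))
    up = sum((b + pseudocount) / (a + pseudocount) >= fold for a, b in pairs)
    down = sum((a + pseudocount) / (b + pseudocount) >= fold for a, b in pairs)
    return max(up, down) / len(pairs) >= confidence


print(bootstrap_de_call([10, 11, 9, 10], [40, 42, 38, 41]))   # clear four-fold increase
print(bootstrap_de_call([10, 11, 9, 10], [10, 12, 9, 11]))    # no change
```

Comparing all bootstrap pairs, rather than two point estimates, lets the uncertainty of the expression estimates feed into the call. This is also why the approach can be used on datasets without biological replicates.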
|
90 |
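Adaptation of statistical models for single-sensor source separation: application to voice/music separation in songs (original French title: Adaptation de modèles statistiques pour la séparation de sources mono-capteur : application à la séparation voix / musique dans les chansons). Ozerov, Alexey. 15 December 2006.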
Single-sensor source separation is a very recent problem that is attracting more and more attention in the scientific community. However, it is far from solved and, moreover, it cannot be solved in full generality. The main difficulty is that, because the problem is extremely underdetermined, strong knowledge about the sources is needed to separate them. In a large share of separation methods, this knowledge is represented by statistical models of the sources, notably Gaussian Mixture Models (GMMs), which are learned beforehand from examples. The aim of this thesis is to study separation methods based on statistical models in general, and then to apply them to a concrete problem: separating the voice from the music in monophonic recordings of songs. Solving this problem, which is fairly difficult and so far little studied, can be very useful for facilitating song content analysis, for example in the context of audio indexing. Existing separation methods perform well provided that the characteristics of the statistical models used are close to those of the sources to be separated. Unfortunately, it is not always possible to build and use such models in practice, owing to a lack of representative training examples and limited computational resources. To remedy this problem, this thesis proposes adapting the models a posteriori to the sources to be separated. To this end, a general adaptation framework is developed. Inspired by similar techniques used in speech recognition, this framework is introduced in the form of a Maximum A Posteriori (MAP) adaptation criterion. Furthermore, it is shown how to optimize this criterion with the EM algorithm at different levels of generality.
This adaptation framework is then applied, in certain particular forms, to voice/music separation. The results show that, for this task, using adapted models significantly improves separation performance (by at least 5 dB) compared with non-adapted models. It is also observed that separating the singing voice makes it easier to estimate its fundamental frequency (pitch), and that model adaptation only improves this result further.
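As an illustration of MAP adaptation optimized with EM, consider the classic speech-recognition case, which adapts only the GMM means. Each adapted mean is a weighted compromise between the data softly assigned to that component and the component's original mean, with a relevance factor tau controlling how far the model moves. The NumPy sketch below covers only this mean-only, 1-D case, whereas the thesis's adaptation framework is more general. The function name, the default `tau`, and the example values are assumptions.

```python
import numpy as np


def map_adapt_means(x, weights, means, variances, tau=10.0, n_iter=1):
    """MAP re-estimation of the means of a 1-D GMM (weights and variances kept fixed).

    The prior on each mean is centred on the original (pre-trained) mean. tau is the
    relevance factor: tau = 0 gives the maximum-likelihood update, and a large tau
    keeps the adapted means close to the prior means.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    var = np.asarray(variances, dtype=float)
    prior_means = np.asarray(means, dtype=float)
    mu = prior_means.copy()
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample (log space)
        log_p = np.log(w) - 0.5 * np.log(2 * np.pi * var) - 0.5 * (x[:, None] - mu) ** 2 / var
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        n_k = r.sum(axis=0)                     # soft counts per component
        sum_x = (r * x[:, None]).sum(axis=0)    # soft sums per component
        # M-step (MAP): interpolate between the data and the prior means
        mu = (sum_x + tau * prior_means) / (n_k + tau)
    return mu


# Hypothetical pre-trained two-component model adapted to a few new observations.
adapted = map_adapt_means([0.9, 1.1, 1.3, 4.8, 5.2], [0.5, 0.5], [0.0, 4.0], [1.0, 1.0],
                          tau=2.0, n_iter=5)
print(adapted)
```

Components that see little adaptation data (small soft count `n_k`) stay near their pre-trained means, while well-observed components move toward the data. This is what makes the adaptation robust when only a short recording is available.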
|