201

Programming language semantics as a foundation for Bayesian inference

Szymczak, Marcin January 2018 (has links)
Bayesian modelling, in which our prior belief about the distribution on model parameters is updated by observed data, is a popular approach to statistical data analysis. However, writing specific inference algorithms for Bayesian models by hand is time-consuming and requires significant machine learning expertise. Probabilistic programming promises to make Bayesian modelling easier and more accessible by letting the user express a generative model as a short computer program (with random variables), leaving inference to the generic algorithm provided by the compiler of the given language. However, it is not easy to design a probabilistic programming language correctly and define the meaning of programs expressible in it. Moreover, the inference algorithms used by probabilistic programming systems usually lack formal correctness proofs and bugs have been found in some of them, which limits the confidence one can have in the results they return. In this work, we apply ideas from the areas of programming language theory and statistics to show that probabilistic programming can be a reliable tool for Bayesian inference. The first part of this dissertation concerns the design, semantics and type system of a new, substantially enhanced version of the Tabular language. Tabular is a schema-based probabilistic language, which means that instead of writing a full program, the user only has to annotate the columns of a schema with expressions generating corresponding values. By adopting this paradigm, Tabular aims to be user-friendly, but this unusual design also makes it harder to define the syntax and semantics correctly and reason about the language. We define the syntax of a version of Tabular extended with user-defined functions and pseudo-deterministic queries, design a dependent type system for this language and endow it with a precise semantics. We also extend Tabular with a concise formula notation for hierarchical linear regressions, define the type system of this extended language and show how to reduce it to pure Tabular. In the second part of this dissertation, we present the first correctness proof for a Metropolis-Hastings sampling algorithm for a higher-order probabilistic language. We define a measure-theoretic semantics of the language by means of an operationally-defined density function on program traces (sequences of random variables) and a map from traces to program outputs. We then show that the distribution of samples returned by our algorithm (a variant of “Trace MCMC” used by the Church language) matches the program semantics in the limit.
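The “Trace MCMC” algorithm mentioned above can be made concrete with a toy example. The following is a minimal Python sketch, under simplifying assumptions, of single-site Metropolis-Hastings over a program trace: the model (one latent Gaussian with one Gaussian observation), the trace representation and the prior proposal are all illustrative, not the dissertation's formal construction.

```python
import math
import random

random.seed(0)

def normal_logpdf(v, mu, sigma):
    return -0.5 * ((v - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)

# Toy probabilistic program: x ~ Normal(0, 1); observe y = 2.0 with y | x ~ Normal(x, 1).
Y_OBS = 2.0

def log_likelihood(trace):
    return normal_logpdf(Y_OBS, trace["x"], 1.0)

def trace_mcmc(n_iters):
    trace = {"x": random.gauss(0.0, 1.0)}        # initial trace drawn from the prior
    samples = []
    for _ in range(n_iters):
        site = random.choice(list(trace))        # pick one random variable in the trace
        proposal = dict(trace)
        proposal[site] = random.gauss(0.0, 1.0)  # resample that site from its prior
        # Proposing from the prior cancels the prior and proposal densities in
        # the Metropolis-Hastings ratio, leaving only the likelihood ratio.
        log_alpha = log_likelihood(proposal) - log_likelihood(trace)
        if random.random() < math.exp(min(0.0, log_alpha)):
            trace = proposal
        samples.append(trace["x"])
    return samples

draws = trace_mcmc(20_000)[2_000:]               # discard burn-in
print("posterior mean of x:", round(sum(draws) / len(draws), 3))  # analytic: 1.0
```

Because each proposal resamples a single site from its prior, only the likelihood ratio remains in the acceptance test; proving that a sampler of this kind really targets the program's measure-theoretic semantics is exactly the sort of result the second part of the dissertation establishes.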
202

Analysing plant closure effects using time-varying mixture-of-experts Markov chain clustering

Frühwirth-Schnatter, Sylvia, Pittner, Stefan, Weber, Andrea, Winter-Ebmer, Rudolf January 2018 (has links) (PDF)
In this paper we study data on discrete labor market transitions from Austria. In particular, we follow the careers of workers who experience a job displacement due to plant closure and observe, over a period of 40 quarters, whether these workers manage to return to a steady career path. To analyse these discrete-valued panel data, we apply a new method of Bayesian Markov chain clustering analysis based on inhomogeneous first-order Markov transition processes with time-varying transition matrices. In addition, a mixture-of-experts approach allows us to model the probability of belonging to a certain cluster as depending on a set of covariates via a multinomial logit model. Our cluster analysis identifies five career patterns after plant closure and reveals that some workers cope quite easily with a job loss whereas others suffer large losses over extended periods of time.
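A small Python sketch can make the two model components concrete: an inhomogeneous first-order Markov chain whose transition matrix depends on elapsed time, and a multinomial-logit gate mapping covariates to cluster membership probabilities. All states, rates, weights and covariates below are invented for illustration, not estimates from the Austrian data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Labor-market states: 0 = employed, 1 = unemployed, 2 = out of labor force.
N_STATES, T = 3, 40                       # 40 quarters of observation

def transition_matrix(t, speed):
    """Time-varying (inhomogeneous) transition matrix: the probability of
    returning to employment from unemployment improves with quarter t."""
    p_back = min(0.85, 0.15 + speed * t)
    return np.array([
        [0.92, 0.06, 0.02],
        [p_back, 0.8 * (1.0 - p_back), 0.2 * (1.0 - p_back)],
        [0.05, 0.10, 0.85],
    ])

CLUSTER_SPEED = [0.05, 0.005]             # fast vs. slow recovery clusters

def simulate_career(cluster):
    state, path = 1, [1]                  # displaced workers start unemployed
    for t in range(T - 1):
        P = transition_matrix(t, CLUSTER_SPEED[cluster])
        state = rng.choice(N_STATES, p=P[state])
        path.append(int(state))
    return path

def cluster_probs(z, W):
    """Mixture-of-experts gate: multinomial logit over covariates z."""
    s = W @ z
    e = np.exp(s - s.max())               # subtract max for numerical stability
    return e / e.sum()

W = np.array([[0.8, -0.03],               # invented weights for [education, age]
              [-0.8, 0.03]])
z = np.array([1.0, 45.0])                 # one worker's covariates
p = cluster_probs(z, W)
career = simulate_career(rng.choice(2, p=p))
print("cluster probabilities:", p.round(3))
print("simulated 40-quarter career:", career)
```

In the actual model the cluster-specific transition matrices and the gating weights are estimated jointly by Bayesian methods; the sketch only shows the generative side.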
203

Stochastic Modeling and Optimization to Improve Identification and Treatment of Alzheimer’s Disease

January 2018 (has links)
Mathematical modeling and decision-making within the healthcare industry have made it possible to quantitatively evaluate the impact of decisions on the diagnosis, screening, and treatment of diseases. In this work, we look into a specific yet very important disease: Alzheimer's. In the United States, Alzheimer's Disease (AD) is the 6th leading cause of death. A diagnosis of AD cannot be confidently confirmed until after death, which underscores the importance of early diagnosis based upon symptoms of cognitive decline. A symptom of early cognitive decline and an indicator of AD is Mild Cognitive Impairment (MCI). In addition to this qualitative assessment, biomarker tests have been proposed in the medical field, including p-Tau, FDG-PET, and hippocampal volume. These tests can be administered to patients as early detectors of AD, thus improving patients' quality of life and potentially reducing healthcare costs. Preliminary work has been conducted on a Sequential Tree-Based Classifier (STC), which helps medical providers predict whether a patient will contract AD by sequentially administering these biomarker tests. The STC model, however, has its limitations, and a more complex, robust model is needed: the STC models the patient's status as a general linear model of the test results. We take a simulation perspective and define a richer model that represents the patient's evolution in time. Specifically, this thesis focuses on the formulation of a Markov Chain model that emulates the evolution of MCI patients based upon doctor visits and the sequential administration of biomarker tests. The data used to create this Markov Chain model were collected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The data lacked detailed information on the sequential administration of the biomarker tests, so different analytical approaches were tried in order to calibrate the model. The resulting Markov Chain model made it possible to conduct experiments on different parameters of the chain, yielding different proportions of patients that contracted AD and those that did not, and leading to important insights into the effect of test thresholds and sequencing on prediction capability as well as on health cost reduction. The data in this thesis were provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI investigators did not contribute to any analysis or writing of this thesis. A list of the ADNI investigators can be found at: http://adni.loni.usc.edu/about/governance/principal-investigators/ . / Dissertation/Thesis / Masters Thesis Industrial Engineering 2018
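The kind of question such a Markov chain supports can be illustrated with a standard absorbing-chain computation. The sketch below uses entirely hypothetical transition probabilities (not ADNI-calibrated values) to compute expected visit counts and the probability that an MCI patient is eventually absorbed into the AD state.

```python
import numpy as np

# Hypothetical per-visit transitions. Transient states: 0 = MCI, biomarker not
# yet positive; 1 = MCI, positive biomarker. Absorbing states: AD, reverted.
Q = np.array([[0.70, 0.15],
              [0.10, 0.60]])          # transient -> transient
R = np.array([[0.05, 0.10],
              [0.25, 0.05]])          # transient -> (AD, reverted)

# Fundamental matrix N = (I - Q)^(-1): expected visits to each transient state.
N = np.linalg.inv(np.eye(2) - Q)
B = N @ R                              # absorption probabilities

print("expected visits before absorption:\n", N.round(2))
print("P(eventually AD | start in state 0):", B[0, 0].round(3))
```

Changing the biomarker thresholds or the order in which tests are administered changes Q and R, which is one way to frame the thesis's experiments on prediction capability and cost.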
204

Building Reconstruction of Digital Height Models with the Markov Chain Monte Carlo Method

Nilsson, Mats January 2018 (has links)
Data about the earth is increasing in value and demand from customers, but it is difficult to produce accurately and cheaply. This thesis examines whether it is possible to take low-resolution, distorted 3D data and increase the accuracy of building geometry by performing building reconstruction. Building reconstruction is performed with a Markov chain Monte Carlo method in which building primitives are placed iteratively until a good fit is found. The digital height model and pixel classification used are produced by Vricon. The method is able to correctly place primitive models, but often overestimates their dimensions by about 15%.
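The iterative placement loop can be sketched as a Metropolis-style search over a primitive's footprint. Everything below (the toy height grid, the flat-roof primitive, the squared-error score and the temperature) is an invented stand-in for the thesis's actual data and fitting criterion.

```python
import math
import random

random.seed(2)

# Toy 20x20 digital height model with one noisy rectangular "building" in it.
GRID = [[0.0] * 20 for _ in range(20)]
for i in range(5, 12):
    for j in range(6, 14):
        GRID[i][j] = 10.0 + random.gauss(0.0, 0.5)

def fit(rect, height=10.0):
    """Negative squared error between a flat-roof primitive and the height data."""
    x0, y0, x1, y1 = rect
    err = 0.0
    for i in range(20):
        for j in range(20):
            model = height if x0 <= i < x1 and y0 <= j < y1 else 0.0
            err += (GRID[i][j] - model) ** 2
    return -err

rect, tau = (2, 2, 6, 6), 50.0
cur = fit(rect)
for _ in range(5000):
    # Propose jittering each corner coordinate by at most one cell.
    prop = tuple(max(0, min(20, c + random.choice((-1, 0, 1)))) for c in rect)
    if prop[0] < prop[2] and prop[1] < prop[3]:
        new = fit(prop)
        # Metropolis acceptance with a temperature on the fit score.
        if new >= cur or random.random() < math.exp((new - cur) / tau):
            rect, cur = prop, new
print("estimated footprint (x0, y0, x1, y1):", rect)  # true footprint: (5, 6, 12, 14)
```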
205

Stochastic order book modelling

Jedidi, Aymen 09 January 2014 (has links)
This thesis studies some aspects of stochastic order book modelling. In the first part, we analyze a model in which order arrival times are independent Poisson processes. We show that the order book is stable (in the sense of Markov chains) and that it converges to its stationary distribution exponentially fast. We deduce that the price generated in this setting converges to a Brownian motion at large time scales. We illustrate the results numerically and compare them to market data, highlighting the model's successes as well as its limitations. In the second part, we generalize the results to a setting in which arrival times are governed by self- and mutually exciting processes, under assumptions on the memory of these processes. The last part is more applied and deals with the identification of a realistic multivariate model from the order flow. We describe two approaches, the first based on maximum likelihood estimation and the second on the covariance density function, and obtain a remarkable agreement with the data. We apply the estimated model to two concrete algorithmic trading problems, namely measuring the execution probability of a limit order and its cost.
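A drastically reduced sketch conveys the flavor of the first part's model: only the best bid and ask queue sizes are tracked, all events are Poisson, and an emptied queue moves the price one tick before being refilled. The rates, tick size and refill rule are invented, and the thesis's model is richer on every one of these points.

```python
import random

random.seed(1)

LAM = 1.0          # arrival rate of limit orders at each best queue
MU = 1.1           # departure rate (market order or cancellation) per resting order
TICK, REFILL = 0.01, 5

price, bid_q, ask_q = 100.00, REFILL, REFILL
for _ in range(100_000):
    total = 2 * LAM + MU * (bid_q + ask_q)
    u = random.random() * total          # select the next event by its rate
    if u < LAM:
        bid_q += 1                       # new limit buy joins the best bid
    elif u < 2 * LAM:
        ask_q += 1                       # new limit sell joins the best ask
    elif u < 2 * LAM + MU * bid_q:
        bid_q -= 1                       # market sell / cancellation hits the bid
    else:
        ask_q -= 1                       # market buy / cancellation hits the ask
    if bid_q == 0:                       # bid side depleted: price ticks down
        price, bid_q = price - TICK, REFILL
    if ask_q == 0:                       # ask side depleted: price ticks up
        price, ask_q = price + TICK, REFILL

print("price after 100k events:", round(price, 2))
```

Over long horizons the accumulated tick moves of such a price look diffusive, which is the Brownian-limit phenomenon the first part proves rigorously for its model.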
206

Contributions to on-line process control by attributes.

Anderson Laécio Galindo Trindade 02 April 2008 (has links)
The on-line process control procedure for attributes, proposed by Taguchi et al. (1989), consists of inspecting a single item at every m produced items and deciding, at each inspection, whether the fraction of non-conforming items produced has increased. If an inspected item is declared non-conforming, the process is stopped and adjusted, under the assumption that it has changed to the out-of-control condition. Since (i) the inspection system is subject to misclassification and the inspected item may undergo repeated classifications; (ii) the fraction of non-conforming items in the out-of-control condition may vary as a function y(x) of the number of items produced (x); and (iii) the decision to stop the process can be based on the results of the last h inspections, we develop a model that encompasses these points. Using the properties of an ergodic Markov chain, we obtain an expression for the average cost of the control system, which is minimized over parameters beyond the inspection interval m: the number of repeated classifications, r; the minimum number of conforming classifications required to declare an item conforming, s; the length of the inspection history considered, h; and the stopping criterion for adjustment, u. The results show that repeated classifications can be an economical alternative when only one item is considered in the decision about adjusting the process; that a finite Markov chain can represent the control system in the presence of a non-constant function y(x); and that basing the adjustment decision on a sequence of inspected items is the alternative with the greatest impact on the cost of controlling the process.
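The backbone of the cost analysis is standard: compute the stationary distribution of the ergodic control chain and average per-state costs against it. The sketch below does this for a made-up three-state chain; the paper's actual chain encodes inspection histories and classification outcomes in far more detail.

```python
import numpy as np

# Made-up three-state control chain: 0 = in control, 1 = out of control
# (undetected), 2 = stopped for adjustment. One step = one inspection interval.
P = np.array([
    [0.95, 0.04, 0.01],   # shift to out of control, or a false alarm
    [0.00, 0.70, 0.30],   # stays out of control until an inspection detects it
    [0.90, 0.10, 0.00],   # after adjustment the process usually restarts in control
])
cost = np.array([1.0, 8.0, 25.0])   # per-interval cost of each state (illustrative)

# The stationary distribution pi solves pi @ P = pi: take the eigenvector of
# P^T for eigenvalue 1 and normalize it to sum to one.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

print("stationary distribution:", pi.round(4))
print("long-run average cost per interval:", round(float(pi @ cost), 4))
```

Minimizing over (m, r, s, h, u) then amounts to recomputing this average cost for each candidate parameter combination.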
207

Estimation algorithms for variable length Markov chains: applications to rhythm detection in written texts

Matta, David Henriques da 25 March 2008 (has links)
Advisor: Nancy Lopes Garcia / Master's dissertation - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação Cientifica / In this work, we focus on the question of finding statistical evidence for the detection of rhythm in written texts, presenting probabilistic tools that allow us to discriminate between Brazilian and Portuguese texts. To achieve this goal, we present theoretical and practical results on the modeling, resampling and estimation of variable length Markov chains. In the resampling part, we propose a new method for data sets with a renewal point. / Master's degree in Statistics (Statistics and Probability)
208

Stochastic modeling of the elevated plus maze

Rafael Arantes 27 September 2016 (has links)
The elevated plus maze is widely used in studies of anxiety in rats. The measures traditionally used are the number of entries into, and the time spent in, the open arms. Recent studies analyze movement inside the arms but do not propose an index that summarizes the analyses. This thesis condenses the information from one of these studies into indices. One proposed index uses the mean first-visit times to each position of the maze; the other is based on the stationary probability distribution. The first is able to differentiate groups of rats given different anxiolytic drugs. The thesis also proposes a Markov process model that incorporates information disregarded by the earlier model; comparing the two models revealed over- and underestimated values in the first. Finally, the thesis proposes a Markov chain model whose states are the following set of behaviors: rearing, stretching, dipping, freezing and grooming. This new approach, though it greatly oversimplifies the model, was able to reproduce several known features of exploration.
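Both proposed indices are classical Markov chain quantities. The sketch below computes them for a toy three-state version of the maze (center, closed arms, open arms) with invented transition probabilities, not values fitted to rat trajectories.

```python
import numpy as np

# Toy maze chain: 0 = center, 1 = closed arms, 2 = open arms (invented numbers).
P = np.array([
    [0.20, 0.60, 0.20],
    [0.30, 0.70, 0.00],
    [0.40, 0.00, 0.60],
])

# Index based on the stationary distribution: pi @ P = pi, normalized.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# Index based on mean first-visit times to the open arms (state 2): solve
# m_i = 1 + sum_{j != 2} P[i, j] * m_j for the non-target states.
others = [0, 1]
m = np.linalg.solve(np.eye(2) - P[np.ix_(others, others)], np.ones(2))

print("stationary distribution (center, closed, open):", pi.round(3))
print("mean first-visit time to the open arms from center, closed:", m.round(2))
```

In the thesis the states are actual maze positions (and, in the final model, behaviors), so the same two computations run on much larger transition matrices.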
209

Comparison of algorithms used in the construction of genetic linkage maps

Marcelo Mollinari 23 January 2008 (has links)
Genetic linkage maps are linear arrangements showing the order of, and distance between, loci on the chromosomes of a particular species. Recently, the wide availability of molecular markers has made such maps increasingly saturated, and efficient methods are needed for their construction. One of the steps that deserves the most attention in the construction of genetic linkage maps is the ordering of the genetic markers within each linkage group. This ordering is considered a special case of the classic traveling salesman problem (TSP), which consists of choosing the best order among all possible ones. However, exhaustive search becomes unfeasible when the number of markers is large; a viable alternative is to use algorithms that provide approximate solutions. The aim of this work was to evaluate the efficiency of the algorithms Try (TRY), Seriation (SER), Rapid Chain Delineation (RCD), Recombination Counting and Ordering (RECORD) and Unidirectional Growth (UG), as well as the criteria PARF (minimum product of adjacent recombination fractions), SARF (minimum sum of adjacent recombination fractions), SALOD (maximum sum of adjacent LOD scores) and LHMC (likelihood via hidden Markov chains), used together with the RIPPLE algorithm for error verification, in the construction of genetic linkage maps. For this purpose, a linkage map of a hypothetical diploid and monoecious plant species was simulated, containing 21 markers at a fixed distance of 3 centimorgans from each other. Using Monte Carlo methods, 550 F2 populations with 100 and 400 individuals were randomly simulated, with different combinations of dominant and codominant markers; 10% and 20% missing data were also simulated. The results showed that the algorithms TRY and SER performed well in all simulated situations, even in the presence of large amounts of missing data and dominant markers linked in repulsion phase, and can therefore be recommended for practical use. The algorithms RECORD and UG performed well in the absence of dominant markers linked in repulsion phase and can be recommended in situations with few dominant markers. Among all the algorithms, RCD was the least efficient. The criterion LHMC, applied with the RIPPLE algorithm, gave the best results when the goal is to check for ordering errors.
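To make the ordering criteria concrete, the sketch below evaluates SARF (minimum sum of adjacent recombination fractions) by exhaustive search over the orders of five hypothetical markers. This brute-force step is precisely what becomes unfeasible for many markers, motivating the approximate algorithms compared above.

```python
from itertools import permutations

# Hypothetical pairwise recombination fractions between 5 markers (0..4),
# roughly consistent with the true order 0-1-2-3-4.
RF = {(0, 1): 0.03, (0, 2): 0.06, (0, 3): 0.10, (0, 4): 0.14,
      (1, 2): 0.03, (1, 3): 0.07, (1, 4): 0.11,
      (2, 3): 0.04, (2, 4): 0.08,
      (3, 4): 0.04}

def rf(a, b):
    return RF[(a, b) if a < b else (b, a)]

def sarf(order):
    """Sum of recombination fractions between adjacent markers in a candidate order."""
    return sum(rf(a, b) for a, b in zip(order, order[1:]))

# Keep one of each mirror-image pair of orders (a map and its reverse are equivalent).
candidates = (p for p in permutations(range(5)) if p[0] < p[-1])
best = min(candidates, key=sarf)
print("best order by SARF:", best, "value:", round(sarf(best), 2))
```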
210

A Note on Perfect Slice Sampling

Hörmann, Wolfgang, Leydold, Josef January 2006 (has links) (PDF)
Perfect slice sampling is a method to turn Markov Chain Monte Carlo (MCMC) samplers into exact generators for independent random variates. We show that the simplest version of the perfect slice sampler suggested in the literature does not always sample from the target distribution. / Series: Research Report Series / Department of Statistics and Mathematics
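For context, this is what an ordinary (non-perfect) slice sampler looks like: stepping-out and shrinkage in the style of Neal for a one-dimensional unnormalized density. A perfect slice sampler additionally runs a coupling-from-the-past construction on top of updates like these to return exact draws, and it is that exactness claim the note examines. The target density here is arbitrary.

```python
import math
import random

random.seed(0)

def f(x):
    """Arbitrary unnormalized target density (positive, vanishing in the tails)."""
    return math.exp(-0.5 * x * x) * (1.0 + math.sin(3.0 * x) ** 2)

def slice_step(x, w=1.0):
    y = random.uniform(0.0, f(x))       # auxiliary height under the density curve
    # Step out to bracket the slice {x' : f(x') > y}.
    left = x - w * random.random()
    right = left + w
    while f(left) > y:
        left -= w
    while f(right) > y:
        right += w
    # Sample uniformly from the bracket, shrinking it on rejections.
    while True:
        x_new = random.uniform(left, right)
        if f(x_new) > y:
            return x_new
        if x_new < x:
            left = x_new
        else:
            right = x_new

x, xs = 0.0, []
for _ in range(10_000):
    x = slice_step(x)
    xs.append(x)
print("sample mean:", round(sum(xs) / len(xs), 3))  # target is symmetric, so near 0
```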
