261 |
O uso de quase U-estatísticas para séries temporais uni e multivariadas / The use of quasi U-statistics for univariate and multivariate time series
Valk, Marcio, 17 August 2018
Advisor: Aluísio de Souza Pinheiro / Doctoral thesis, Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Previous issue date: 2011 / Resumo: Classificação e agrupamento de séries temporais são problemas bastante explorados na literatura atual. Muitas técnicas são apresentadas para resolver estes problemas. No entanto, as restrições necessárias, em geral, tornam os procedimentos específicos e aplicáveis somente a uma determinada classe de séries temporais. Além disso, muitas dessas abordagens são empíricas. Neste trabalho, propomos métodos para classificação e agrupamento de séries temporais baseados em quase U-estatísticas (Pinheiro et al. (2009) e Pinheiro et al. (2010)). Como núcleos das U-estatísticas são utilizadas métricas baseadas em ferramentas bem conhecidas na literatura de séries temporais, entre as quais o periodograma e a autocorrelação amostral. Três situações principais são consideradas: séries univariadas; séries multivariadas; e séries com valores aberrantes. É demonstrada a normalidade assintótica dos testes propostos para uma ampla classe de métricas e modelos. Os métodos são estudados também por simulação e ilustrados por aplicação em dados reais. / Abstract: Classification and clustering of time series are problems widely explored in the current literature, and many techniques have been proposed to solve them. However, the restrictions these techniques require generally make the procedures specific and applicable only to a certain class of time series. Moreover, many of these approaches are empirical. We present methods for classification and clustering of time series based on quasi U-statistics (Pinheiro et al. (2009) and Pinheiro et al. (2010)). The kernels of the U-statistics are metrics based on well-known tools from the time series literature, including the sample autocorrelation and the periodogram. Three main situations are considered: univariate time series, multivariate time series, and time series with outliers. The asymptotic normality of the proposed tests is proved for a wide class of metrics and models.
The methods are also studied by simulation and illustrated with an application to real data. / Doctorate / Statistics / Doctor of Statistics
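Illustrative sketch (not taken from the thesis): the abstract describes kernels built from the sample autocorrelation, so the block below uses a Euclidean distance between sample autocorrelation vectors and compares between-group with within-group mean distances. The function names, the AR(1) test data and the exact form of the separation score are assumptions made for illustration; the thesis's own quasi U-statistic and its asymptotic test are not reproduced here.

```python
import math
import random
from itertools import combinations


def sample_acf(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n / c0
        for k in range(1, max_lag + 1)
    ]


def acf_distance(x, y, max_lag=5):
    """Euclidean distance between the sample ACF vectors of two series."""
    ax, ay = sample_acf(x, max_lag), sample_acf(y, max_lag)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ax, ay)))


def mean_within(group, dist):
    """Average kernel value over all unordered pairs inside one group."""
    pairs = list(combinations(group, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def mean_between(g1, g2, dist):
    """Average kernel value over all pairs with one series from each group."""
    return sum(dist(a, b) for a in g1 for b in g2) / (len(g1) * len(g2))


def separation(g1, g2, dist):
    """Between-group mean distance minus the average within-group mean distance."""
    within = 0.5 * (mean_within(g1, dist) + mean_within(g2, dist))
    return mean_between(g1, g2, dist) - within


def simulate_ar1(phi, n, rng):
    """AR(1) series with standard normal innovations (hypothetical test data)."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


rng = random.Random(0)
group_a = [simulate_ar1(0.8, 200, rng) for _ in range(4)]
group_b = [simulate_ar1(-0.8, 200, rng) for _ in range(4)]
score = separation(group_a, group_b, acf_distance)
print(score)
```

A clearly positive score suggests that the two groups differ in their autocorrelation structure; deciding how large is large enough is exactly what a formal test would have to settle.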
|
262 |
Détection et classification de signatures temporelles CAN pour l’aide à la maintenance de sous-systèmes d’un véhicule de transport collectif / Detection and classification of temporal CAN signatures to support maintenance of public transportation vehicle subsystems
Cheifetz, Nicolas, 09 September 2013
Le problème étudié dans le cadre de cette thèse porte essentiellement sur l'étape de détection de défaut dans un processus de diagnostic industriel. Ces travaux sont motivés par la surveillance de deux sous-systèmes complexes d'un autobus impactant la disponibilité des véhicules et leurs coûts de maintenance : le système de freinage et celui des portes. Cette thèse décrit plusieurs outils dédiés au suivi de fonctionnement de ces deux systèmes. On choisit une approche de diagnostic par reconnaissance des formes qui s'appuie sur l'analyse de données collectées en exploitation à partir d'une nouvelle architecture télématique embarquée dans les autobus. Les méthodes proposées dans ces travaux de thèse permettent de détecter un changement structurel dans un flux de données traité séquentiellement, et intègrent des connaissances disponibles sur les systèmes surveillés. Le détecteur appliqué aux freins s'appuie sur les variables de sortie (liées au freinage) d'un modèle physique dynamique du véhicule qui est validé expérimentalement dans le cadre de nos travaux. L'étape de détection est ensuite réalisée par des cartes de contrôle multivariées à partir de données multidimensionnelles. La stratégie de détection pour l'étude du système porte traite directement les données collectées par des capteurs embarqués pendant des cycles d'ouverture et de fermeture, sans modèle physique a priori. On propose un test séquentiel à base d'hypothèses alimenté par un modèle génératif pour représenter les données fonctionnelles. Ce modèle de régression permet de segmenter des courbes multidimensionnelles en plusieurs régimes. Les paramètres de ce modèle sont estimés par un algorithme de type EM dans un mode semi-supervisé. 
Les résultats obtenus à partir de données réelles et simulées ont permis de mettre en évidence l'efficacité des méthodes proposées aussi bien pour l'étude des freins que celle des portes. / This thesis is mainly concerned with the fault detection step of an industrial diagnosis process. The work is motivated by the monitoring of two complex subsystems of a transit bus that affect vehicle availability and maintenance costs: the brake system and the door system. The thesis describes several tools for monitoring the operation of these two systems. We adopt a pattern recognition approach to diagnosis, based on the analysis of in-service data collected through a new telematic architecture on board the buses. The proposed methods detect a structural change in a sequentially processed data stream and incorporate the available knowledge about the monitored systems. The detector applied to the brakes relies on the braking-related output variables of a dynamic physical model of the vehicle, which is validated experimentally in this work. Detection is then performed with multivariate control charts on the multidimensional data. The detection strategy for the door system works directly on the data collected by embedded sensors during opening and closing cycles, without a prior physical model. We propose a sequential hypothesis test driven by a generative model of the functional data. This regression model segments multidimensional curves into several regimes, and its parameters are estimated with an EM-type algorithm in a semi-supervised mode. Results on real and simulated data demonstrate the effectiveness of the proposed methods for both the brakes and the doors.
|
263 |
Estimation paramétriques et tests d'hypothèses pour des modèles avec plusieurs ruptures d'un processus de poisson / Parametric estimation and hypothesis testing for models with multiple change-point of poisson process
Top, Alioune, 20 June 2016
Ce travail est consacré aux problèmes d’estimation paramétrique, aux tests d’hypothèses et aux tests d’ajustement pour les processus de Poisson non homogènes. Tout d’abord on a étudié deux modèles ayant chacun deux sauts localisés par un paramètre inconnu. Pour le premier modèle la somme des sauts est positive, tandis que le second a un changement de régime et est constant par morceaux ; la somme de ses deux sauts est nulle. Ainsi pour chacun de ces modèles nous avons étudié les propriétés asymptotiques de l’estimateur bayésien (EB) et de celui du maximum de vraisemblance (EMV). Nous avons montré la consistance, la convergence en distribution et la convergence des moments. En particulier l’estimateur bayésien est asymptotiquement efficace. Pour le second modèle nous avons aussi considéré le test d’une hypothèse simple contre une alternative unilatérale et nous avons décrit les propriétés asymptotiques (choix du seuil et puissance) du test de Wald (WT) et du test du rapport de vraisemblance généralisé (GRLT). Les démonstrations sont basées sur la méthode d’Ibragimov et Khasminskii. Cette dernière repose sur la convergence faible du rapport de vraisemblance normalisé dans l’espace de Skorohod sous certains critères de tension des familles de mesures correspondantes. Par des simulations numériques, les variances limites nous ont permis de conclure que l’EB est meilleur que l’EMV. Lorsque la somme des sauts est nulle, nous avons développé une approche numérique pour l’EMV. Ensuite on a considéré le problème de construction d’un test d’ajustement pour un modèle avec un paramètre d’échelle. On a montré que dans ce cas, le test de Cramér-von Mises est asymptotiquement « parameter-free » et est consistant. / This work is devoted to parametric estimation, hypothesis testing and goodness-of-fit testing for non-homogeneous Poisson processes. First we consider two models, each having two jumps located by an unknown parameter. For the first model the sum of the jumps is positive.
The second is a switching-intensity model with piecewise constant intensity, in which the sum of the jumps is zero. For each model, we study the asymptotic properties of the Bayesian estimator (BE) and the maximum likelihood estimator (MLE): consistency, convergence in distribution and convergence of moments are shown. In particular, we show that the BE is asymptotically efficient. For the second model we also consider testing a simple hypothesis against a one-sided alternative, and describe the asymptotic properties (choice of threshold and power) of the Wald test (WT) and the generalized likelihood ratio test (GRLT). The proofs use the method of Ibragimov and Khasminskii, which is based on the weak convergence of the normalized likelihood ratio in the Skorohod space under tightness criteria for the corresponding families of measures. In numerical simulations, the limiting variances of the estimators lead us to conclude that the BE outperforms the MLE. For the case where the sum of the jumps is zero, we develop a numerical approach to compute the MLE. We then consider the construction of a goodness-of-fit test for a model with a scale parameter, and show that the Cramér-von Mises type test is asymptotically parameter-free and consistent.
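Illustrative sketch (a deliberate simplification, not the models of the thesis): the block below treats a Poisson process with a single change-point and known rates before and after it, rather than the two-jump models studied in the thesis. Up to a constant, the log-likelihood of a change at time tau is N(tau) log(rate_before / rate_after) - (rate_before - rate_after) tau, where N(tau) counts the events before the change. This function is linear between events and jumps at events, so its supremum can be found by checking each event time (value and left limit) plus the two endpoints. All names and the simulated data are hypothetical.

```python
import math
import random


def simulate_events(rate_before, rate_after, tau, horizon, rng):
    """Event times of a Poisson process whose rate switches at time tau."""
    events, t = [], 0.0
    while True:
        t += rng.expovariate(rate_before)
        if t >= tau:
            break
        events.append(t)
    t = tau  # memorylessness lets us restart the clock at the change-point
    while True:
        t += rng.expovariate(rate_after)
        if t >= horizon:
            break
        events.append(t)
    return events


def profile_loglik(tau, n_before, rate_before, rate_after):
    """Log-likelihood of a change at tau, up to a constant not depending on tau."""
    return n_before * math.log(rate_before / rate_after) - (rate_before - rate_after) * tau


def mle_change_point(events, rate_before, rate_after, horizon):
    """Maximise the log-likelihood over event times and the interval endpoints."""
    times = sorted(events)
    candidates = [(0.0, 0), (horizon, len(times))]
    for i, t in enumerate(times):
        candidates.append((t, i))      # left limit: the event at t counted after the change
        candidates.append((t, i + 1))  # value at t: the event at t counted before the change
    best = max(candidates, key=lambda c: profile_loglik(c[0], c[1], rate_before, rate_after))
    return best[0]


rng = random.Random(42)
events = simulate_events(rate_before=2.0, rate_after=20.0, tau=40.0, horizon=100.0, rng=rng)
tau_hat = mle_change_point(events, 2.0, 20.0, 100.0)
print(tau_hat)
```

With rates this far apart the estimate typically lands within a few inter-event gaps of the true change-point; the thesis's asymptotic results concern the limiting law of such estimators, which this sketch does not address.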
|
264 |
Neuronal Dissimilarity Indices that Predict Oddball Detection in Behaviour
Vaidhiyan, Nidhin Koshy, January 2016
Our vision is as yet unsurpassed by machines because of the sophisticated representations of objects in our brains. This representation is vastly different from the pixel-based representation used in machine storage. It is this sophisticated representation that enables us to perceive two faces as very different, i.e., far apart in the “perceptual space”, even though they are close to each other in their pixel-based representations. Neuroscientists have proposed distances between the responses of neurons to images (as measured in macaque monkeys) as a quantification of the “perceptual distance” between the images. Let us call these neuronal dissimilarity indices of perceptual distance. They have also proposed behavioural experiments to quantify these perceptual distances: human subjects are asked to identify, as quickly as possible, an oddball image embedded among multiple distractor images. The reciprocal of the search time for identifying the oddball is taken as a measure of the perceptual distance between the oddball and the distractor. Let us call such estimates behavioural dissimilarity indices. In this thesis, we describe a decision-theoretic model for visual search that suggests a connection between these two notions of perceptual distance.
In the first part of the thesis, we model visual search as an active sequential hypothesis testing problem. Our analysis suggests an appropriate neuronal dissimilarity index which correlates strongly with the reciprocal of search times. We also consider a number of alternative possibilities such as relative entropy (Kullback-Leibler divergence), the Chernoff entropy and the L1-distance associated with the neuronal firing rate profiles. We then come up with a means to rank the various neuronal dissimilarity indices based on how well they explain the behavioural observations. Our proposed dissimilarity index does better than the other three, followed by relative entropy, then Chernoff entropy and then L1 distance.
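Illustrative sketch (not from the thesis): three of the alternative indices named above can be written in closed form if each neuron's response to an image is assumed to be an independent Poisson count with a known positive rate. That modelling assumption, the function names and the example rates are all hypothetical, and the thesis's own proposed index is not reproduced here.

```python
import math


def kl_poisson(rates_p, rates_q):
    """Relative entropy between product-Poisson laws with positive rates p and q."""
    return sum(p * math.log(p / q) - p + q for p, q in zip(rates_p, rates_q))


def l1_distance(rates_p, rates_q):
    """L1 distance between two firing-rate profiles."""
    return sum(abs(p - q) for p, q in zip(rates_p, rates_q))


def chernoff_poisson(rates_p, rates_q, grid=1000):
    """Chernoff information between product-Poisson laws, by grid search over s in [0, 1].

    For Poisson rates p and q, -log sum_k P(k)^s Q(k)^(1-s) equals
    s*p + (1-s)*q - p^s * q^(1-s); independence makes the terms add over neurons.
    """
    best = 0.0
    for k in range(grid + 1):
        s = k / grid
        value = sum(
            s * p + (1 - s) * q - (p ** s) * (q ** (1 - s))
            for p, q in zip(rates_p, rates_q)
        )
        best = max(best, value)
    return best


# Hypothetical firing rates (spikes per second) of two neurons for two images.
oddball = [4.0, 1.0]
distractor = [1.0, 4.0]

kl = kl_poisson(oddball, distractor)
l1 = l1_distance(oddball, distractor)
chernoff = chernoff_poisson(oddball, distractor)
print(kl, l1, chernoff)
```

Ranking such indices, as the abstract describes, would then amount to comparing how well each one tracks the reciprocal of the measured search times across image pairs.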
In the second part of the thesis, we consider a scenario where the subject has to find an oddball image, but without any prior knowledge of the oddball and distractor images. Equivalently, in the neuronal space, the task for the decision maker is to find the image that elicits firing rates different from the others. Here, the decision maker has to “learn” the underlying statistics and then make a decision on the oddball. We model this scenario as one of detecting an odd Poisson point process having a rate different from the common rate of the others. The revised model suggests a new neuronal dissimilarity index. The new dissimilarity index is also strongly correlated with the behavioural data. However, the new dissimilarity index performs worse than the dissimilarity index proposed in the first part on existing behavioural data. The degradation in performance may be attributed to the experimental setup used for the current behavioural tasks, where search tasks associated with a given image pair were sequenced one after another, thereby possibly cueing the subject about the upcoming image pair, and thus violating the assumption of this part on the lack of prior knowledge of the image pairs to the decision maker.
In conclusion, the thesis provides a framework for connecting the perceptual distances in the neuronal and the behavioural spaces. Our framework can possibly be used to analyze the connection between the neuronal space and the behavioural space for various other behavioural tasks.
|
265 |
"Testes de hipótese e critério bayesiano de seleção de modelos para séries temporais com raiz unitária" / "Hypothesis testing and bayesian model selection for time series with a unit root"Ricardo Gonçalves da Silva 23 June 2004 (has links)
A literatura referente a testes de hipótese em modelos auto-regressivos que apresentam uma possível raiz unitária é bastante vasta e engloba pesquisas oriundas de diversas áreas. Nesta dissertação, inicialmente, buscou-se realizar uma revisão dos principais resultados existentes, oriundos tanto da visão clássica quanto da bayesiana de inferência. No que concerne ao ferramental clássico, o papel do movimento browniano foi apresentado de forma detalhada, buscando-se enfatizar a sua aplicabilidade na dedução de estatísticas assintóticas para a realização dos testes de hipótese relativos à presença de uma raiz unitária. Com relação à inferência bayesiana, foi inicialmente conduzido um exame detalhado do status corrente da literatura. A seguir, foi realizado um estudo comparativo em que se testa a hipótese de raiz unitária com base na probabilidade da densidade a posteriori do parâmetro do modelo, considerando as seguintes densidades a priori: Flat, Jeffreys, Normal e Beta. A inferência foi realizada com base no algoritmo Metropolis-Hastings, usando a técnica de simulação de Monte Carlo por Cadeias de Markov (MCMC). Poder, tamanho e confiança dos testes apresentados foram computados com o uso de séries simuladas. Finalmente, foi proposto um critério bayesiano de seleção de modelos, utilizando as mesmas distribuições a priori do teste de hipótese. Ambos os procedimentos foram ilustrados com aplicações empíricas a séries temporais macroeconômicas. / Testing for a unit root in nonstationary autoregressive models is a research topic that spans many academic areas. As a first step, this dissertation reviews the main existing results from both the classical and the Bayesian approaches to inference.
For the classical approach, the role of Brownian motion is presented in detail, emphasizing its use in deriving the asymptotic statistics needed to test for the presence of a unit root. For the Bayesian approach, a detailed examination of the current state of the literature is carried out first. Next, a comparative study tests the unit root hypothesis on the basis of the posterior probability density of the model parameter under the following priors: flat, Jeffreys, normal and beta. Inference is performed with the Metropolis-Hastings algorithm, a Markov chain Monte Carlo (MCMC) simulation technique. Power, size and confidence of the tests are computed from simulated series. Finally, a Bayesian model selection criterion is proposed, using the same prior distributions as the hypothesis test. Both procedures are illustrated with empirical applications to macroeconomic time series.
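Illustrative sketch (not the dissertation's implementation): a random-walk Metropolis-Hastings sampler for the autoregressive coefficient of an AR(1) model under a flat prior. The unit innovation variance, the conditioning on the first observation, the proposal step size and the 0.95 cut-off used to summarise posterior mass near the unit root are all assumptions made for illustration; the Jeffreys, normal and beta priors mentioned above would only change `log_posterior`.

```python
import math
import random


def simulate_ar1(rho, n, rng):
    """AR(1) series x_t = rho * x_{t-1} + e_t with standard normal e_t, x_0 = 0."""
    x = [0.0]
    for _ in range(n):
        x.append(rho * x[-1] + rng.gauss(0.0, 1.0))
    return x


def sufficient_stats(x):
    """Sum of squared lagged values and sum of lagged cross-products."""
    sxx = sum(a * a for a in x[:-1])
    sxy = sum(a * b for a, b in zip(x[:-1], x[1:]))
    return sxx, sxy


def log_posterior(rho, sxx, sxy):
    """Log posterior of rho under a flat prior, up to an additive constant."""
    return rho * sxy - 0.5 * rho * rho * sxx


def metropolis_hastings(sxx, sxy, n_draws, step, rng, burn_in=1000):
    """Random-walk Metropolis-Hastings draws from the posterior of rho."""
    rho = 0.0
    current = log_posterior(rho, sxx, sxy)
    draws = []
    for i in range(n_draws + burn_in):
        proposal = rho + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, sxx, sxy)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            rho, current = proposal, candidate
        if i >= burn_in:
            draws.append(rho)
    return draws


rng = random.Random(1)
series = simulate_ar1(0.5, 300, rng)
sxx, sxy = sufficient_stats(series)
draws = metropolis_hastings(sxx, sxy, n_draws=5000, step=0.1, rng=rng)

posterior_mean = sum(draws) / len(draws)
ols_estimate = sxy / sxx
mass_near_unit_root = sum(1 for d in draws if d >= 0.95) / len(draws)
print(posterior_mean, ols_estimate, mass_near_unit_root)
```

Under these assumptions the exact posterior is normal with mean `sxy / sxx`, which gives a convenient check that the sampler is behaving.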
|
266 |
Metoda bootstrap a její aplikace / Bootstrap Method and its Application
Pavlíčková, Lucie, January 2009
The diploma thesis describes the bootstrap method and its applications in assessing the accuracy of estimates, constructing confidence intervals and testing statistical hypotheses. It also presents a method for estimating the discrete probability distribution of a categorical quantity that makes use of the gradient of the quasi-norm of this distribution. In concrete examples, the bootstrap method is applied to construct confidence intervals for the probability function of a categorical quantity. The diploma thesis was supported by the project of MŠMT of the Czech Republic no. 1M06047 "Centre for Quality and Reliability of Production", by the grant of the Grant Agency of the Czech Republic (Czech Science Foundation) reg. no. 103/08/1658 "Advanced optimum design of composed concrete structures" and by the research plan of MŠMT of the Czech Republic no. MSM0021630519 "Progressive reliable and durable structures".
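Illustrative sketch (not from the thesis): a percentile bootstrap confidence interval for the probability of one category of a categorical quantity. The percentile method, the function names and the example data are assumptions; the thesis's gradient-based estimator is not reproduced here.

```python
import random


def category_proportion(sample, category):
    """Relative frequency of one category in a sample."""
    return sum(1 for v in sample if v == category) / len(sample)


def bootstrap_percentile_ci(sample, category, n_boot=2000, alpha=0.05, rng=None):
    """Percentile bootstrap interval for the probability of `category`."""
    rng = rng or random.Random()
    n = len(sample)
    # Resample with replacement and recompute the proportion each time.
    stats = sorted(
        category_proportion(rng.choices(sample, k=n), category)
        for _ in range(n_boot)
    )
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical sample: 100 observations of a three-level categorical quantity.
sample = ["a"] * 30 + ["b"] * 50 + ["c"] * 20
lower, upper = bootstrap_percentile_ci(sample, "a", rng=random.Random(7))
print(lower, upper)
```

Repeating the call for each category gives a set of pointwise intervals for the whole probability function.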
|
267 |
Fitování rozdělení pravděpodobnosti pro aplikace / Fitting of Probability Distributions for Applications
Pavlíčková, Lenka, January 2012
The diploma thesis describes the bootstrap method and its applications in constructing confidence intervals, testing statistical hypotheses and regression analysis. We present the confidence interval for an individual value. A method for estimating the discrete probability distribution of a categorical quantity is also presented, making use of the gradient and the line estimate.
|
268 |
Uplatnění statistických metod při zpracování dat / The Use of Statistical Methods for Data Processing
Čupr, Jiří, January 2016
This master's thesis deals with the problem of ordering ingredients at McDonald's. It analyses how ingredient usage changes with the outside temperature. The thesis includes the theoretical background needed to analyse the problem correctly and discusses possible ways of solving it. It also presents an algorithm for dealing more efficiently with shortages or surpluses of ingredients, together with a program written in VBA that makes the algorithm easier to use in restaurants.
|
269 |
Bringing methodological light to ecological processes : are ecological scales and constrained null models relevant solutions? / Apporter une lumière méthodologique aux processus écologiques : les échelles écologiques et les modèles nuls contraints sont-ils des solutions pertinentes?
Clappe, Sylvie, 14 December 2018
Les distributions d'espèces observées dans un environnement hétérogène résultent de plusieurs processus déterministes et stochastiques agissant comme des filtres pour contraindre la coexistence des espèces. L’action successive de ces processus a pour conséquence directe de structurer spatialement la composition des communautés et la variation de ces compositions (i.e., diversité bêta). Un des objectifs majeurs de l'écologie des communautés et métacommunautés consiste à identifier et quantifier les effets respectifs de ces différents processus sur la diversité bêta des communautés afin de mieux comprendre et prédire la distribution de la biodiversité. L'expérimentation étant difficilement possible, les processus responsables de la variation spatiale de la composition des communautés sont généralement inférés à partir des structures spatiales des distributions d’espèces observées dans la nature. La thèse s’inscrit dans ce contexte et vise à améliorer les outils de statistique multivariée permettant d’identifier et quantifier l'effet des processus écologiques structurant les communautés et métacommunautés. En particulier, il est proposé d’intégrer les échelles écologiques et les modèles nuls contraints pour étudier l’effet de l’environnement. La décomposition des relations trait-environnement dans les échelles spatiales et phylogénétiques permet une étude plus approfondie du filtrage environnemental en associant son échelle spatiale d’action au signal phylogénétique des traits sélectionnés pour capturer l’histoire évolutive associée au filtrage environnemental. L’interprétation en terme de processus évolutifs est néanmoins limitée et mériterait l’intégration de modèles nuls phylogénétiquement contraints pour une analyse plus fine. 
Dans la continuité, des modèles nuls spatialement contraints ont été développés et intégrés à deux analyses multivariées très largement utilisées en écologie des communautés (i.e., partitionnement de variation et test de Mantel) pour estimer et tester l’effet de l’environnement sur les assemblages d’espèces. Ces deux analyses présentaient une surestimation de leur statistique mesurée ainsi qu’un taux anormal de faux positifs lorsque les distributions d’espèces (via processus de dispersion limitée) et l’environnement étaient indépendamment spatialement structurés. L’intégration de modèles nuls spatialement contraints a permis d’ajuster à la fois les estimations et les tests de ces deux analyses illustrant ainsi le besoin d’utiliser des modèles nuls écologiquement contraints pour une identification et quantification correctes des processus écologiques. / Species distributions observed in a heterogeneous environment result from multiple deterministic and stochastic processes that act as filters constraining species coexistence. As a direct consequence, the successive action of these processes spatially structures community composition and the variation in composition (i.e., beta diversity). A major goal of community and metacommunity ecology is to identify and quantify the respective effects of these processes on beta diversity in order to better understand and predict the distribution of biodiversity. Because experiments are rarely feasible, the processes responsible for spatial variation in community composition are generally inferred from the spatial patterns of species distributions observed in nature. In this context, the thesis aims to improve the multivariate statistical tools used to identify and quantify the effects of the ecological processes that shape communities and metacommunities.
In particular, it proposes integrating ecological scales and constrained null models to study the effect of the environment. Decomposing trait-environment relationships across spatial and phylogenetic scales allows a more thorough study of environmental filtering: associating the spatial scale at which filtering acts with the phylogenetic signal of the selected traits captures the evolutionary history linked to environmental filtering. Interpretation in terms of evolutionary processes nevertheless remains limited, and phylogenetically constrained null models should be integrated for a finer analysis. Following on from this work, spatially constrained null models were developed and integrated into two multivariate analyses widely used in community ecology (variation partitioning and the Mantel test) to estimate and test the effect of the environment on species assemblages. Both analyses overestimated their statistic and showed an abnormally high rate of false positives when species distributions (through limited dispersal) and the environment were independently spatially structured. Integrating spatially constrained null models corrected both the estimates and the tests of these two analyses, demonstrating the need for ecologically constrained null models to correctly identify and quantify ecological processes. For future work, the thesis suggests that combining a scaling approach with mechanistic null models could make it possible to distinguish processes from one another.
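Illustrative sketch (the classical baseline, not the thesis's method): the Mantel test with unrestricted permutations of sites, which is the version the abstract reports as prone to false positives when both data sets are spatially structured. The spatially constrained null models developed in the thesis are not reproduced here; the function names and the toy one-dimensional data are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Entries above the diagonal of a square distance matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def permuted(matrix, order):
    """Reorder rows and columns of a distance matrix with the same permutation."""
    return [[matrix[i][j] for j in order] for i in order]


def mantel_test(d1, d2, n_perm=999, rng=None):
    """Mantel correlation and one-sided permutation p-value (unrestricted permutations)."""
    rng = rng or random.Random()
    v1 = upper_triangle(d1)
    r_obs = pearson(v1, upper_triangle(d2))
    order = list(range(len(d2)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(v1, upper_triangle(permuted(d2, order))) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


def distance_matrix(values):
    """Absolute-difference distances between scalar site values."""
    return [[abs(a - b) for b in values] for a in values]


rng = random.Random(3)
positions = [rng.uniform(0.0, 10.0) for _ in range(10)]           # site coordinates
environment = [2.0 * p + rng.gauss(0.0, 0.5) for p in positions]  # spatially structured variable

r_obs, p_value = mantel_test(distance_matrix(positions), distance_matrix(environment), rng=rng)
print(r_obs, p_value)
```

Because shuffling sites destroys all spatial structure, this null distribution cannot separate a genuine environmental effect from shared spatial autocorrelation, which is the weakness the constrained null models are meant to address.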
|
270 |
Algorithmic and Graph-Theoretic Approaches for Optimal Sensor Selection in Large-Scale Systems
Lintao Ye (9741149), 15 December 2020
Using sensor measurements to estimate the states and parameters of a system is a fundamental task in understanding the behavior of the system. Moreover, as modern systems grow rapidly in scale and complexity, it is not always possible to deploy sensors to measure all of the states and parameters of the system, due to cost and physical constraints. Therefore, selecting an optimal subset of all the candidate sensors to deploy and gather measurements of the system is an important and challenging problem. In addition, the systems may be targeted by external attackers who attempt to remove or destroy the deployed sensors. This further motivates the formulation of resilient sensor selection strategies. In this thesis, we address the sensor selection problem under different settings as follows.
First, we consider the optimal sensor selection problem for linear dynamical systems with stochastic inputs, where the Kalman filter is applied based on the sensor measurements to give an estimate of the system states. The goal is to select a subset of sensors under certain budget constraints such that the trace of the steady-state error covariance of the Kalman filter with the selected sensors is minimized. We characterize the complexity of this problem by showing that the Kalman filtering sensor selection problem is NP-hard and cannot be approximated within any constant factor in polynomial time for general systems. We then consider the optimal sensor attack problem for Kalman filtering. The Kalman filtering sensor attack problem is to attack a subset of selected sensors under certain budget constraints in order to maximize the trace of the steady-state error covariance of the Kalman filter with sensors after the attack. We show that the same results as the Kalman filtering sensor selection problem also hold for the Kalman filtering sensor attack problem. Having shown that the general sensor selection and sensor attack problems for Kalman filtering are hard to solve, our next step is to consider special classes of the general problems. Specifically, we consider the underlying directed network corresponding to a linear dynamical system and investigate the case when there is a single node of the network that is affected by a stochastic input. In this setting, we show that the corresponding sensor selection and sensor attack problems for Kalman filtering can be solved in polynomial time. We further study the resilient sensor selection problem for Kalman filtering, where the problem is to find a sensor selection strategy under sensor selection budget constraints such that the trace of the steady-state error covariance of the Kalman filter is minimized after an adversary removes some of the deployed sensors. We show that the resilient sensor selection problem for Kalman filtering is NP-hard, and provide a pseudo-polynomial-time algorithm to solve it optimally.
Next, we consider the sensor selection problem for binary hypothesis testing. The problem is to select a subset of sensors under certain budget constraints such that a certain metric of the Neyman-Pearson (resp., Bayesian) detector corresponding to the selected sensors is optimized. We show that this problem is NP-hard if the objective is to minimize the miss probability (resp., error probability) of the Neyman-Pearson (resp., Bayesian) detector. We then consider three optimization objectives based on the Kullback-Leibler distance, J-Divergence and Bhattacharyya distance, respectively, in the hypothesis testing sensor selection problem, and provide performance bounds on greedy algorithms when applied to the sensor selection problem associated with these optimization objectives.
Moving beyond the binary hypothesis setting, we also consider the setting where the true state of the world comes from a set that can have cardinality greater than two. A Bayesian approach is then used to learn the true state of the world based on the data streams provided by the data sources. We formulate the Bayesian learning data source selection problem under this setting, where the goal is to minimize the cost spent on the data sources such that the learning error is within a certain range. We show that the Bayesian learning data source selection is also NP-hard, and provide greedy algorithms with performance guarantees.
Finally, in light of the COVID-19 pandemic, we study the parameter estimation measurement selection problem for epidemics spreading in networks. Here, the measurements (with certain costs) are collected by conducting virus and antibody tests on the individuals in the epidemic spread network. The goal of the problem is then to optimally estimate the parameters (i.e., the infection rate and the recovery rate of the virus) in the epidemic spread network, while satisfying the budget constraint on collecting the measurements. Again, we show that the measurement selection problem is NP-hard, and provide approximation algorithms with performance guarantees.
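Illustrative sketch (not the thesis's algorithm): a cost-aware greedy rule that repeatedly adds the affordable sensor with the largest marginal gain per unit cost. The objective assumes independent Gaussian sensors whose mean shifts between the two hypotheses, in which case the Kullback-Leibler distance is the sum of shift squared over twice the variance; that model, the sensor table and every name below are hypothetical, and no performance bound from the thesis is claimed for this code.

```python
def kl_objective(selected, sensors):
    """KL distance between the two hypotheses for independent Gaussian sensors."""
    return sum(
        sensors[name]["shift"] ** 2 / (2.0 * sensors[name]["variance"])
        for name in selected
    )


def greedy_select(sensors, budget, objective):
    """Add, one at a time, the affordable sensor with the best marginal gain per cost."""
    selected, remaining = [], budget
    candidates = sorted(sensors)  # fixed order keeps the result deterministic
    while True:
        base = objective(selected, sensors)
        best_name, best_ratio = None, 0.0
        for name in candidates:
            cost = sensors[name]["cost"]
            if name in selected or cost > remaining:
                continue
            ratio = (objective(selected + [name], sensors) - base) / cost
            if ratio > best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            return selected
        selected.append(best_name)
        remaining -= sensors[best_name]["cost"]


# Hypothetical candidate sensors: mean shift under the alternative, noise variance, cost.
sensors = {
    "A": {"shift": 2.0, "variance": 0.5, "cost": 2},
    "B": {"shift": 3.0, "variance": 1.5, "cost": 1},
    "C": {"shift": 1.0, "variance": 0.5, "cost": 1},
}

chosen = greedy_select(sensors, budget=2, objective=kl_objective)
print(chosen, kl_objective(chosen, sensors))
```

Passing the objective as a function keeps the selection loop unchanged if the J-divergence or the Bhattacharyya distance is substituted for the KL distance.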
|