131

A unified discrepancy-based approach for balancing efficiency and robustness in state-space modeling estimation, selection, and diagnosis

Hu, Nan 01 December 2016 (has links)
Due to its generality and flexibility, the state-space model has become one of the most popular models in modern time-domain analysis for the description and prediction of time series data. The model is often used to characterize processes that can be conceptualized as "signal plus noise," where the realized series is viewed as the manifestation of a latent signal that has been corrupted by observation noise. In the state-space framework, parameter estimation is generally accomplished by maximizing the innovations Gaussian log-likelihood. The maximum likelihood estimator (MLE) is efficient when the normality assumption is satisfied. However, in the presence of contamination, the MLE suffers from a lack of robustness. Basu, Harris, Hjort, and Jones (1998) introduced a discrepancy measure (BHHJ) with a non-negative tuning parameter that regulates the trade-off between robustness and efficiency. In this manuscript, we propose a new parameter estimation procedure based on the BHHJ discrepancy for fitting state-space models. As the tuning parameter is increased, the estimation procedure becomes more robust but less efficient. We investigate the performance of the procedure in an illustrative simulation study. In addition, we propose a numerical method to approximate the asymptotic variance of the estimator, and we provide an approach for choosing an appropriate tuning parameter in practice. We justify these procedures theoretically and investigate their efficacy in simulation studies. Based on the proposed parameter estimation procedure, we then develop a new model selection criterion in the state-space framework. The traditional Akaike information criterion (AIC), where the goodness-of-fit is assessed by the empirical log-likelihood, is not robust to outliers. Our new criterion comprises a goodness-of-fit term based on the empirical BHHJ discrepancy and a penalty term based on both the tuning parameter and the dimension of the candidate model.
We present a comprehensive simulation study to investigate the performance of the new criterion. In instances where the time series data is contaminated, our proposed model selection criterion is shown to perform favorably relative to AIC. Lastly, using the BHHJ discrepancy based on the chosen tuning parameter, we propose two versions of an influence diagnostic in the state-space framework. Specifically, our diagnostics help to identify cases that influence the recovery of the latent signal, thereby providing initial guidance and insight for further exploration. We illustrate the behavior of these measures in a simulation study.
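As an illustration of the BHHJ (density power divergence) criterion that drives this robustness-efficiency trade-off, the sketch below fits a simple Gaussian model rather than a state-space model; the grid search, the contamination scenario, and the choice α = 0.5 are assumptions for the example, not the thesis's procedure.

```python
import numpy as np

def bhhj_objective(mu, sigma, x, alpha):
    """Empirical BHHJ criterion for a N(mu, sigma^2) model:
    H(theta) = int f^{1+alpha} dx - (1+alpha)/alpha * mean(f(x_i)^alpha)."""
    # closed form: int N(.|mu, s^2)^{1+a} dx = (2*pi*s^2)^{-a/2} / sqrt(1+a)
    integral = (2 * np.pi * sigma**2) ** (-alpha / 2) / np.sqrt(1 + alpha)
    dens = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return integral - (1 + alpha) / alpha * np.mean(dens ** alpha)

def bhhj_fit(x, alpha, mus, sigmas):
    """Grid-search minimizer of the BHHJ criterion (illustrative only)."""
    best = min((bhhj_objective(m, s, x, alpha), m, s)
               for m in mus for s in sigmas)
    return best[1], best[2]

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 190), rng.normal(8, 1, 10)])  # 5% outliers
mus = np.linspace(-1, 2, 61)
sigmas = np.linspace(0.5, 3, 51)
mu_mle = x.mean()                      # pulled toward the outliers
mu_bhhj, _ = bhhj_fit(x, alpha=0.5, mus=mus, sigmas=sigmas)
print(mu_mle, mu_bhhj)                 # BHHJ estimate stays near 0
```

Larger tuning-parameter values discount low-density observations more aggressively, which is exactly the dial the abstract describes; as α approaches 0 the criterion approaches the (non-robust) maximum likelihood fit.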
132

Statistical Properties of Preliminary Test Estimators

Korsell, Nicklas January 2006 (has links)
This thesis investigates the statistical properties of preliminary test estimators of linear models with normally distributed errors. Specifically, we derive exact expressions for the mean, variance and quadratic risk (i.e. the Mean Square Error) of estimators whose form is determined by the outcome of a statistical test. In the process, some new results on the moments of truncated linear or quadratic forms in normal vectors are established. In the first paper (Paper I), we consider the estimation of the vector of regression coefficients under a model selection procedure where it is assumed that the analyst chooses between two nested linear models by one of the standard model selection criteria. This is shown to be equivalent to estimation under a preliminary test of some linear restrictions on the vector of regression coefficients. The main contribution of Paper I compared to earlier research is the generality of the form of the test statistic; we only assume it to be a quadratic form in the (translated) observation vector. Paper II deals with the estimation of the regression coefficients under a preliminary test for homoscedasticity of the error variances. In Paper III, we investigate the statistical properties of estimators, truncated at zero, of variance components in linear models with random effects. Paper IV establishes some new results on the moments of truncated linear and/or quadratic forms in normally distributed vectors. These results are used in Papers I-III. In Paper V we study some algebraic properties of matrices that occur in the comparison of two nested models. Specifically, we derive an expression for the inertia (the number of positive, negative and zero eigenvalues) of matrices of this type.
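The abstract's central object, an estimator whose form depends on the outcome of a test, can be sketched for the simplest case: a preliminary F-test of zero restrictions in a linear model. The critical value and the data below are illustrative assumptions; the thesis treats general quadratic-form test statistics, not only the F-test.

```python
import numpy as np

def pretest_estimator(X, y, restricted_cols, f_crit):
    """Preliminary test estimator (sketch): F-test the restriction
    beta[restricted_cols] = 0; report the restricted OLS fit if the test
    does not reject, and the full OLS fit otherwise."""
    n, p = X.shape
    q = len(restricted_cols)
    keep = [j for j in range(p) if j not in restricted_cols]

    b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = np.sum((y - X @ b_full) ** 2)
    b_res, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
    rss_res = np.sum((y - X[:, keep] @ b_res) ** 2)

    F = ((rss_res - rss_full) / q) / (rss_full / (n - p))
    if F > f_crit:                 # reject the restriction: keep the full model
        return b_full
    out = np.zeros(p)              # fail to reject: use the restricted estimator
    out[keep] = b_res
    return out

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 3))
y = X @ np.array([2.0, 0.0, 0.0]) + 0.5 * rng.normal(size=120)
b = pretest_estimator(X, y, restricted_cols=[1, 2], f_crit=8.0)
print(b)  # restricted fit: the last two coefficients are exactly zero
```

Because the final estimator is a data-dependent mixture of two estimators, its mean, variance and risk differ from those of either OLS fit alone, which is precisely what the thesis derives exactly.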
133

Learning in wireless sensor networks for energy-efficient environmental monitoring/Apprentissage dans les réseaux de capteurs pour une surveillance environnementale moins coûteuse en énergie

Le Borgne, Yann-Aël 30 April 2009 (has links)
Wireless sensor networks form an emerging class of computing devices capable of observing the world with unprecedented resolution, and promise to provide a revolutionary instrument for environmental monitoring. Such a network is composed of a collection of battery-operated wireless sensors, or sensor nodes, each of which is equipped with sensing, processing and wireless communication capabilities. Thanks to advances in microelectronics and wireless technologies, wireless sensors are small in size and can be deployed at low cost over different kinds of environments in order to monitor, over both space and time, the variations of physical quantities such as temperature, humidity, light, or sound. In environmental monitoring studies, many applications are expected to run unattended for months or years. Sensor nodes are, however, constrained by limited resources, particularly in terms of energy. Since communication is one order of magnitude more energy-consuming than processing, the design of data collection schemes that limit the amount of transmitted data is recognized as a central issue for wireless sensor networks. An efficient way to address this challenge is to approximate, by means of mathematical models, the evolution of the measurements taken by sensors over space and/or time. Indeed, whenever a mathematical model may be used in place of the true measurements, significant gains in communication may be obtained by transmitting only the parameters of the model instead of the set of real measurements. Since in most cases there is little or no a priori information about the variations taken by sensor measurements, the models must be identified in an automated manner. This calls for the use of machine learning techniques, which make it possible to model the variations of future measurements on the basis of past measurements. This thesis brings two main contributions to the use of learning techniques in a sensor network.
First, we propose an approach which combines time series prediction and model selection for reducing the amount of communication. The rationale of this approach, called adaptive model selection, is to let the sensors determine in an automated manner a prediction model that not only fits their measurements but also reduces the amount of transmitted data. The second main contribution is the design of a distributed approach for modeling sensed data, based on principal component analysis (PCA). The proposed method transforms the measurements along a routing tree in such a way that (i) most of the variability in the measurements is retained, and (ii) the network load sustained by sensor nodes is reduced and more evenly distributed, which in turn extends the overall network lifetime. The framework can be seen as a truly distributed approach to principal component analysis, and finds applications not only in approximate data collection tasks, but also in event detection and recognition tasks.
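The PCA-based compression behind the second contribution can be sketched in a centralized form: transmit k principal-component scores per snapshot instead of one value per sensor. The routing-tree distribution of the computation, which is the thesis's actual contribution, is not shown, and the sensor model below is invented for the example.

```python
import numpy as np

def pca_basis(history, k):
    """Principal subspace of past snapshots (rows = time, cols = sensors)."""
    mean = history.mean(axis=0)
    # rows of vt are principal directions; keep the top k
    _, _, vt = np.linalg.svd(history - mean, full_matrices=False)
    return vt[:k], mean

def compress(snapshot, basis, mean):
    """k PC scores transmitted instead of one reading per sensor."""
    return basis @ (snapshot - mean)

def reconstruct(scores, basis, mean):
    return basis.T @ scores + mean

rng = np.random.default_rng(1)
t = np.linspace(0, 4 * np.pi, 200)
# 20 correlated "temperature" sensors driven by two latent signals plus noise
latent = np.stack([np.sin(t), np.cos(t)])
mixing = rng.normal(size=(20, 2))
data = latent.T @ mixing.T + 0.05 * rng.normal(size=(200, 20))

basis, mean = pca_basis(data[:150], k=2)
snap = data[170]
approx = reconstruct(compress(snap, basis, mean), basis, mean)
err = np.max(np.abs(approx - snap))
print(err)  # small: 2 transmitted numbers recover all 20 readings
```

When sensor readings are spatially correlated, as in this toy network, a few scores capture most of the variability, which is exactly the communication saving the abstract describes.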
135

Bayesian Phylogenetics and the Evolution of Gall Wasps

Nylander, Johan A. A. January 2004 (has links)
This thesis concerns the phylogenetic relationships and the evolution of the gall-inducing wasps belonging to the family Cynipidae. Several previous studies have used morphological data to reconstruct the evolution of the family. DNA sequences from several mitochondrial and nuclear genes were obtained, and the first molecular, and combined molecular and morphological, analyses of higher-level relationships in the Cynipidae are presented. A Bayesian approach to data analysis is adopted, and models allowing combined analysis of heterogeneous data, such as multiple DNA data sets and morphology, are developed. The performance of these models is evaluated using methods that allow the estimation of posterior model probabilities, thus allowing selection of the most probable models for use in phylogenetics. The use of Bayesian model averaging in phylogenetics, as opposed to model selection, is also discussed. It is shown that Bayesian MCMC analysis deals efficiently with complex models and that morphology can influence combined-data analyses, despite being outnumbered by DNA data. This emphasizes the utility and potential importance of using morphological data in statistical analyses of phylogeny. The DNA-based and combined-data analyses of cynipid relationships differ from previous studies in two important respects. First, it was previously believed that there was a monophyletic clade of woody rosid gallers, but the new results place the non-oak gallers in this assemblage (tribes Pediaspidini, Diplolepidini, and Eschatocerini) outside the rest of the Cynipidae. Second, earlier studies have lent strong support to the monophyly of the inquilines (tribe Synergini), gall wasps that develop inside the galls of other species. The new analyses suggest that the inquilines either originated several times independently, or that some inquilines secondarily regained the ability to induce galls.
Possible reasons for the incongruence between morphological and DNA data are discussed in terms of heterogeneity in evolutionary rates among lineages and convergent evolution of morphological characters.
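Posterior model probabilities of the kind used here for model choice (and for model averaging) can be sketched with the BIC approximation to the marginal likelihood. The thesis estimates them by MCMC for phylogenetic models, so the Gaussian toy comparison below is only an assumption-laden stand-in.

```python
import numpy as np

def bic(loglik, k, n):
    """Schwarz criterion: -2 log-likelihood + k log n."""
    return -2 * loglik + k * np.log(n)

def posterior_model_probs(bics):
    """BIC approximation to posterior model probabilities under
    equal prior model probabilities: p(M | y) ~ exp(-BIC / 2), normalized."""
    b = np.asarray(bics, dtype=float)
    w = np.exp(-(b - b.min()) / 2)
    return w / w.sum()

rng = np.random.default_rng(3)
x = rng.normal(1.0, 1.0, 100)      # data favor a nonzero mean
n = len(x)
s2_fixed = np.mean(x**2)           # ML variance with the mean fixed at 0
s2_free = np.var(x)                # ML variance with a free mean
ll_fixed = -n / 2 * (np.log(2 * np.pi * s2_fixed) + 1)
ll_free = -n / 2 * (np.log(2 * np.pi * s2_free) + 1)
probs = posterior_model_probs([bic(ll_fixed, 1, n), bic(ll_free, 2, n)])
print(probs)  # nearly all posterior mass on the free-mean model
```

Model averaging, as discussed in the abstract, would weight predictions by `probs` rather than committing to the single most probable model.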
136

An Algorithm For The Forward Step Of Adaptive Regression Splines Via Mapping Approach

Kartal Koc, Elcin 01 September 2012 (has links)
In high-dimensional data modeling, Multivariate Adaptive Regression Splines (MARS) is a well-known nonparametric regression technique for approximating the nonlinear relationship between a response variable and the predictors with the help of splines. MARS uses piecewise linear basis functions, separated from each other by breaking points (knots), for function estimation. The model estimating function is generated in a two-step procedure: forward selection and backward elimination. In the first step, a general model including many basis functions and knot points is generated; in the second, the basis functions contributing least to the overall fit are eliminated. In the conventional adaptive spline procedure, knots are selected from the set of distinct data points, which makes the forward selection procedure computationally expensive and leads to high local variance. To avoid these drawbacks, it is possible to select the knot points from a subset of data points, which leads to data reduction. In this study, a new method (called S-FMARS) is proposed to select the knot points using a self-organizing-map-based approach which transforms the original data points to a lower-dimensional space. Thus, fewer knot points need to be evaluated for model building in the forward selection step of the MARS algorithm. The results obtained from simulated datasets and six real-world datasets show that the proposed method is time-efficient in model construction without degrading model accuracy and prediction performance. In this study, the proposed approach is applied to the MARS and CMARS methods as an alternative to their forward step, improving them by decreasing their computing time.
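MARS's piecewise linear basis functions and the knot search of the forward step can be sketched as follows. The candidate-knot grid plays the role of the reduced knot set that S-FMARS would obtain from its self-organizing map; the SOM itself is not shown, and the data are invented.

```python
import numpy as np

def hinge_pair(x, t):
    """The mirrored piecewise linear MARS basis functions for knot t."""
    return np.maximum(0, x - t), np.maximum(0, t - x)

def best_knot(x, y, candidate_knots):
    """One forward-selection step (sketch): choose the knot whose hinge
    pair, fit by least squares with an intercept, minimizes the RSS."""
    best = (np.inf, None)
    for t in candidate_knots:
        h1, h2 = hinge_pair(x, t)
        B = np.column_stack([np.ones_like(x), h1, h2])
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        rss = np.sum((y - B @ coef) ** 2)
        best = min(best, (rss, t))
    return best[1]

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 300)
y = np.abs(x - 0.4) + 0.02 * rng.normal(size=300)   # kink at 0.4
knot = best_knot(x, y, np.linspace(0.05, 0.95, 19))
print(knot)  # recovers the knot at (or next to) 0.4
```

Restricting `candidate_knots` to a small subset rather than all distinct data points is precisely what makes the forward step cheaper, at the cost of some knot resolution.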
137

Systematic process development by simultaneous modeling and optimization of simulated moving bed chromatography

Bentley, Jason A. 10 January 2013 (has links)
Adsorption separation processes are extremely important to the chemical industry, especially in the manufacturing of food, pharmaceutical, and fine chemical products. This work addresses three main topics: first, systematic decision-making between rival gas-phase adsorption processes for the same separation problem; second, process development for liquid-phase simulated moving bed chromatography (SMB); third, accelerated startup for SMB units. All of the work in this thesis uses model-based optimization to answer complicated questions about process selection, process development, and control of transient operation. It is shown in this thesis that there is a trade-off between productivity and product recovery in the gaseous separation of enantiomers using SMB and pressure swing adsorption (PSA). These processes are considered as rivals for the same separation problem, and it is found that each process has a particular advantage that may be exploited depending on the production goals and economics. The processes are compared on the fair basis of equal capital investment, and the same multi-objective optimization problem is solved with equal constraints on the operating parameters. Secondly, this thesis demonstrates by experiment a systematic algorithm for SMB process development that utilizes dynamic optimization, transient experimental data, and parameter estimation to arrive at optimal operating conditions for a new separation problem in a matter of hours. By comparison, conventional process development for SMB relies on careful system characterization using single-column experiments and manual tuning of operating parameters, which may take days or weeks. The optimal operating conditions found by this new method ensure that both the high-purity constraints and optimal productivity are satisfied. The proposed algorithm proceeds until the SMB process is optimized, without manual tuning.
In some case studies, it is shown with both linear and nonlinear isotherm systems that the optimal performance can be reached in only two changes of operating conditions following the proposed algorithm. Finally, it is shown experimentally that the startup time for a real SMB unit is significantly reduced by solving model-based startup optimization problems using the SMB model developed from the proposed algorithm. The startup acceleration with purity constraints is shown to be successful at reducing the startup time by about 44%, and it is confirmed that the product purities are maintained during the operation. Significant cost savings in terms of decreased processing time and increased average product concentration can be attained using a relatively simple startup acceleration strategy.
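The purity-constrained search over operating conditions can be caricatured with a grid search. Both response surfaces below are invented stand-ins for a rigorous SMB column model, and the decision variables (`flow`, `switch`) are assumptions for the example, not the thesis's parameterization.

```python
import numpy as np

# Invented response surfaces standing in for an SMB model: purity falls
# as feed flow rises and recovers with longer switch times, while
# productivity is throughput per unit cycle time.
def toy_purity(flow, switch):
    return 1.0 - 0.002 * flow**2 / switch

def toy_productivity(flow, switch):
    return flow / switch

flows = np.linspace(0.5, 20, 200)
switches = np.linspace(0.5, 5, 50)
F, S = np.meshgrid(flows, switches)

# maximize productivity subject to a 99% purity specification
feasible = toy_purity(F, S) >= 0.99
prod = np.where(feasible, toy_productivity(F, S), -np.inf)
i = np.unravel_index(np.argmax(prod), prod.shape)
best = (F[i], S[i])
print(best)  # highest-productivity operating point meeting the purity spec
```

A model-based workflow like the thesis's would replace this exhaustive grid with dynamic optimization, and would re-estimate the model parameters from transient experimental data between iterations.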
138

Exponential Smoothing for Forecasting and Bayesian Validation of Computer Models

Wang, Shuchun 22 August 2006 (has links)
Despite their success and widespread usage in industry and business, exponential smoothing (ES) methods have received little attention from the statistical community. We investigate three types of statistical models that have been found to underpin ES methods: ARIMA models, state space models with multiple sources of error (MSOE), and state space models with a single source of error (SSOE). We establish the relationship among the three classes of models and conclude that the class of SSOE state space models is broader than the other two and provides a formal statistical foundation for ES methods. To better understand ES methods, we investigate their behavior for time series generated from different processes, focusing mainly on time series of ARIMA type. ES methods forecast a time series using only the series' own history. To include covariates in ES methods for better forecasting, we propose a new forecasting method, Exponential Smoothing with Covariates (ESCov). ESCov uses an ES method to model what is left unexplained in a time series by the covariates. We establish the optimality of ESCov, identify the SSOE state space models underlying it, and derive analytically the variances of its forecasts. Empirical studies show that ESCov outperforms ES methods and regression with ARIMA errors. We suggest a model selection procedure for choosing appropriate covariates and ES methods in practice. Computer models have been commonly used to investigate complex systems for which physical experiments are highly expensive or very time-consuming. Before using a computer model, we need to address an important question: "How well does the computer model represent the real system?" The process of addressing this question is called computer model validation, which generally involves the comparison of computer outputs and physical observations. In this thesis, we propose a Bayesian approach to computer model validation. This approach integrates computer outputs and physical observations to give a better prediction of the real system output. This prediction is then used to validate the computer model. We investigate the impacts of several factors on the performance of the proposed approach and propose a generalization of it.
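The ESCov idea (explain what you can with covariates, then exponentially smooth what the regression leaves unexplained) can be sketched as follows; simple exponential smoothing stands in for the general ES family, and the data-generating model is invented.

```python
import numpy as np

def ses(x, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = x[0]
    for v in x[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

def escov_forecast(y, z, z_next, alpha=0.3):
    """Sketch of the ESCov idea: fit y on the covariate by least squares,
    smooth the residual series, and add the two forecasts."""
    Z = np.column_stack([np.ones(len(y)), z])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return np.r_[1.0, z_next] @ coef + ses(resid, alpha)

rng = np.random.default_rng(5)
z = rng.normal(size=200)
y = 3.0 + 2.0 * z + 0.01 * rng.normal(size=200)
f = escov_forecast(y, z, z_next=1.0)
print(f)  # close to 3 + 2*1 = 5
```

When the residual series has persistent structure (e.g. a drifting level), the smoothed residual term carries the part of the forecast that covariates alone would miss.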
139

Population SAMC, ChIP-chip Data Analysis and Beyond

Wu, Mingqi December 2010 (has links)
This dissertation research consists of two topics: population stochastic approximation Monte Carlo (Pop-SAMC) for Bayesian model selection problems, and ChIP-chip data analysis. The following two paragraphs give a brief introduction to each topic. Although reversible jump MCMC (RJMCMC) has the ability to traverse the space of possible models in Bayesian model selection problems, it is prone to becoming trapped in a local mode when the model space is complex. SAMC, proposed by Liang, Liu and Carroll, essentially overcomes the difficulty of dimension-jumping moves by introducing a self-adjusting mechanism. However, this learning mechanism has not yet reached its maximum efficiency. In this dissertation, we propose a Pop-SAMC algorithm; it works on population chains of SAMC, which provide a more efficient self-adjusting mechanism and make use of a crossover operator from genetic algorithms to further increase efficiency. Under mild conditions, the convergence of this algorithm is proved. The effectiveness of Pop-SAMC in Bayesian model selection problems is examined through a change-point identification example and a large-p linear regression variable selection example. The numerical results indicate that Pop-SAMC outperforms both single-chain SAMC and RJMCMC significantly. In the ChIP-chip data analysis study, we developed two methodologies to identify transcription factor binding sites: a Bayesian latent model and a population-based test. The former models the neighboring dependence of probes by introducing a latent indicator vector; the latter provides a nonparametric method for evaluating test scores in a multiple hypothesis test by making use of population information across samples. Both methods are applied to real and simulated datasets. The numerical results indicate that the Bayesian latent model can outperform existing methods, especially when the data contain outliers, and that the use of population information can significantly improve the power of multiple hypothesis tests.
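The self-adjusting mechanism at the heart of SAMC can be sketched on a toy discrete space: a weight per state is learned on the fly so the chain escapes deep local modes. The energies, gain schedule, and uniform desired visiting frequencies below are assumptions for the example, and the population/crossover extension that defines Pop-SAMC is not shown.

```python
import numpy as np

def samc_visits(energies, n_iter=20000, t0=1000, seed=0):
    """Toy single-chain SAMC on a discrete space (sketch, not Pop-SAMC).
    The self-adjusting weights theta penalize over-visited states so the
    chain attains the desired visiting frequencies pi even across energy
    gaps that would trap a plain Metropolis sampler."""
    rng = np.random.default_rng(seed)
    m = len(energies)
    pi = np.full(m, 1.0 / m)          # desired visiting frequencies
    theta = np.zeros(m)
    counts = np.zeros(m, dtype=int)
    x = 0
    for t in range(1, n_iter + 1):
        y = rng.integers(m)           # symmetric uniform proposal
        # MH ratio for the working density exp(-U(x) - theta[x])
        log_r = (energies[x] - energies[y]) + (theta[x] - theta[y])
        if np.log(rng.random()) < log_r:
            x = y
        gamma = t0 / max(t0, t)       # decreasing gain schedule
        theta += gamma * ((np.arange(m) == x) - pi)
        counts[x] += 1
    return counts

counts = samc_visits(np.array([0.0, 20.0, 40.0]))
print(counts)  # all three states visited in roughly equal proportion
```

A plain Metropolis chain on the same energies would essentially never leave state 0 (acceptance of order exp(-20)); the learned weights flatten the landscape, which is the self-adjusting mechanism the abstract refers to.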
140

Model Selection for Real-Time Decision Support Systems

Lee, Ching-Chang 29 July 2002 (has links)
To cope with turbulent environments in the digital age, an enterprise should respond to changes quickly, and must therefore improve its ability to make real-time decisions. One way to increase this competence is to use Real-Time Decision Support Systems (RTDSS). A key feature for a Decision Support System (DSS) to successfully support real-time decision-making is to help decision-makers select the best models within a deadline. This study focuses on developing methods to support the mechanism of model selection in DSS. There are five results in this study. Firstly, we developed a time-based framework to evaluate models. This framework can help decision-makers evaluate the quality and cost of model solutions. Secondly, based on this evaluation framework, we developed three model selection strategies. These strategies can help decision-makers select the best model within the deadline. Thirdly, according to the definitions of parameter value precision and model solution precision in this study, we conduct a simulation analysis to understand the impact of the precision of parameter values on the precision of a model solution. Fourthly, to understand the interaction among the model selection variables, we also simulate the application of the model selection strategies. The results of the simulation indicate that our approach supports model selection well. Finally, we developed a structure-based model retrieval method to help decision-makers find alternative models from a model base efficiently and effectively. In conclusion, the results of this research draw a basic skeleton for the development of model selection, and reveal much insight into the development of real-time decision support systems.
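The time-based selection idea (pick the best model whose estimated solution time fits the deadline) can be sketched as below; the candidate models, their quality scores, and runtime estimates are invented for the example, not taken from the thesis.

```python
import time

def select_model(models, deadline_s):
    """Deadline-aware selection (sketch of the time-based framework):
    among candidates with estimated runtimes and solution-quality scores,
    pick the highest-quality model that still fits within the deadline."""
    start = time.monotonic()
    for m in sorted(models, key=lambda m: -m["quality"]):
        remaining = deadline_s - (time.monotonic() - start)
        if m["est_runtime_s"] <= remaining:
            return m
    return None  # no model can answer in time

models = [
    {"name": "full_simulation", "quality": 0.95, "est_runtime_s": 120.0},
    {"name": "regression_surrogate", "quality": 0.80, "est_runtime_s": 2.0},
    {"name": "lookup_heuristic", "quality": 0.60, "est_runtime_s": 0.1},
]
choice = select_model(models, deadline_s=5.0)
print(choice["name"])  # regression_surrogate: best quality that fits 5 s
```

This captures the quality-versus-cost trade-off of the evaluation framework; the thesis's strategies also account for imprecise parameter values and for retrieving alternative models from a model base.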
