151 |
Hidden Markov model with application in cell adhesion experiment and Bayesian cubic splines in computer experiments
Wang, Yijie Dylan, 20 September 2013 (has links)
Estimation of the number of hidden states is challenging in hidden Markov models. Motivated by the analysis of a specific type of cell adhesion experiment, a new framework based on a hidden Markov model and double-penalized order selection is proposed. The order selection procedure is shown to be consistent in estimating the number of states. A modified Expectation-Maximization algorithm is introduced to estimate the model parameters efficiently. Simulations show that the proposed framework outperforms existing methods. Applications of the proposed methodology to real data demonstrate the accuracy of estimating receptor-ligand bond lifetimes and waiting times, which are essential in kinetic parameter estimation.
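As a rough illustration of this kind of order-selection loop, here is a minimal Python sketch that fits Gaussian hidden Markov models with an increasing number of states and picks the order by a penalized likelihood; it assumes the hmmlearn package is available and uses an ordinary BIC-style penalty as a stand-in for the double-penalized criterion proposed in the thesis.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def select_hmm_order(X, max_states=6, seed=0):
    """Fit Gaussian HMMs with 1..max_states hidden states and pick the order
    minimising a BIC-style penalised log-likelihood (an illustrative stand-in
    for the double-penalised criterion, which is not specified here)."""
    n, d = X.shape
    best = None
    for k in range(1, max_states + 1):
        model = GaussianHMM(n_components=k, covariance_type="diag",
                            n_iter=200, random_state=seed)
        model.fit(X)
        loglik = model.score(X)  # log-likelihood of X under the fitted model
        # free parameters: initial probs, transitions, means, diagonal covariances
        n_par = (k - 1) + k * (k - 1) + k * d + k * d
        bic = -2.0 * loglik + n_par * np.log(n)
        if best is None or bic < best[1]:
            best = (k, bic, model)
    return best

# toy example: observations alternating between two regimes
rng = np.random.default_rng(1)
states = rng.integers(0, 2, size=500)
X = rng.normal(loc=np.where(states == 0, -2.0, 2.0), scale=1.0).reshape(-1, 1)
k_hat, bic, fitted = select_hmm_order(X)
print("selected number of states:", k_hat)
```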
The second part of the thesis is concerned with prediction of a deterministic response function y at some untried sites given values of y at a chosen set of design sites. The intended application is to computer experiments in which y is the output from a computer simulation and each design site represents a particular configuration of the input variables. A Bayesian version of the cubic spline method commonly used in numerical analysis is proposed, in which the random function that represents prior uncertainty about y is taken to be a specific stationary Gaussian process. An MCMC procedure is given for updating the prior given the observed y values. Simulation examples and a real data application are given to compare the performance of the Bayesian cubic spline with that of two existing methods.
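A compact numpy sketch of the prediction step under a stationary Gaussian-process prior is given below; it uses a squared-exponential kernel and fixed hyperparameters as assumed stand-ins for the specific stationary process and the MCMC updating described in the abstract.

```python
import numpy as np

def sq_exp_kernel(a, b, length=0.3, var=1.0):
    """Squared-exponential covariance between sites a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_design, y_design, x_new, jitter=1e-8):
    """Posterior mean and variance of y at untried sites x_new, given
    noise-free observations at the design sites (deterministic simulator)."""
    K = sq_exp_kernel(x_design, x_design) + jitter * np.eye(len(x_design))
    K_star = sq_exp_kernel(x_new, x_design)
    alpha = np.linalg.solve(K, y_design)
    mean = K_star @ alpha
    var = sq_exp_kernel(x_new, x_new).diagonal() - np.einsum(
        "ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
    return mean, var

x_design = np.linspace(0.0, 1.0, 8)          # chosen design sites
y_design = np.sin(2 * np.pi * x_design)      # simulator output at those sites
x_new = np.array([0.33, 0.77])               # untried sites
mean, var = gp_predict(x_design, y_design, x_new)
print(mean, var)
```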
|
152 |
Mobile Services Based Traffic Modeling
Strengbom, Kristoffer, January 2015 (has links)
Traditionally, communication systems have been dominated by voice applications. Today, with the emergence of smartphones, focus has shifted towards packet-switched networks. The Internet provides a wide variety of services such as video streaming, web browsing and e-mail, and IP traffic models are needed in all stages of product development, from early research to system tests. In this thesis, we propose a multi-level model of IP traffic in which the user behavior and the actual IP traffic generated by different services are considered as two independent random processes. The model is based on observations of IP packet header logs from live networks. In this way the models can be updated to reflect the ever-changing service and end-user equipment usage. The work can thus be divided into two parts. The first part is concerned with modeling the traffic from different services. A subscriber is interested in enjoying the services provided on the Internet, and traffic modeling should reflect the characteristics of these services. An underlying assumption is that different services generate their own characteristic patterns of data. The FFT is used to analyze the packet traces. We show that the traces contain strong periodicities and that some services are more or less deterministic. For some services this strong frequency content is due to the characteristics of the cellular network, and for others it is actually a programmed behavior of the service. The periodicities indicate that there are strong correlations between individual packets or bursts of packets. The second part is concerned with the user behavior, i.e. how the users access the different services over time. We propose a model based on a Markov renewal process and estimate the model parameters. In order to evaluate the model we compare it to two simpler models, using the model's ability to predict future observations as the selection criterion. We show that the proposed Markov renewal model is the best of the three in this sense. The model selection framework can be used to evaluate future models.
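As a toy illustration of the FFT analysis mentioned above, the following Python sketch looks for periodicities in a synthetic packet trace; it assumes the packet log has already been binned into a per-second byte-count series, which is an assumption of this example rather than a detail given in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1.0                                   # one sample per second
t = np.arange(3600)                        # one hour of binned traffic
# synthetic service: a burst every 10 s (e.g. a periodic keep-alive or
# chunked download) on top of background noise
trace = 500.0 * (t % 10 == 0) + rng.poisson(20.0, size=t.size)

spectrum = np.abs(np.fft.rfft(trace - trace.mean()))
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)

peak = freqs[np.argmax(spectrum[1:]) + 1]  # skip the zero-frequency bin
print(f"dominant frequency: {peak:.3f} Hz (period {1.0 / peak:.1f} s)")
```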
|
153 |
Model Selection via Minimum Description Length
Li, Li, 10 January 2012 (has links)
The minimum description length (MDL) principle originated in the data compression literature and has been used to derive statistical model selection procedures. Most existing methods utilizing the MDL principle focus on models for independent data, particularly in the context of linear regression. The data considered in this thesis are in the form of repeated measurements, and the exploration of the MDL principle begins with classical linear mixed-effects models. We distinguish two kinds of research focus: one concerns the population parameters and the other concerns the cluster/subject parameters. When the research interest is at the population level, we propose a class of MDL procedures which incorporate the dependence structure within individuals or clusters through data-adaptive penalties and enjoy the advantages of Bayesian information criteria. When the number of covariates is large, the penalty term is adjusted by a data-adaptive structure to diminish the under-selection issue of BIC and to mimic the behaviour of AIC. Theoretical justifications are provided from both data compression and statistical perspectives. Extensions to categorical responses modelled by generalized estimating equations and to functional data modelled by functional principal components are illustrated. When the interest is at the cluster level, we use the group LASSO to set up a class of candidate models and then derive an MDL criterion for this LASSO technique in a group manner to select the final model via the tuning parameters. Extensive numerical experiments are conducted to demonstrate the usefulness of the proposed MDL procedures at both the population and cluster levels.
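A rough Python sketch of the basic two-part MDL idea for choosing among candidate linear regression models follows; it uses the classical (n/2) log(RSS/n) + (k/2) log n code length as an illustrative approximation and does not reproduce the data-adaptive penalties or mixed-effects extensions developed in the thesis.

```python
import numpy as np

def description_length(y, X):
    """Two-part code length for a Gaussian linear model: data cost plus
    parameter cost (classical MDL approximation, closely related to BIC)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return 0.5 * n * np.log(rss / n) + 0.5 * k * np.log(n)

rng = np.random.default_rng(2)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

candidates = {
    "intercept only": np.column_stack([np.ones(n)]),
    "x1":             np.column_stack([np.ones(n), x1]),
    "x1 + x2":        np.column_stack([np.ones(n), x1, x2]),
    "x1 + x2 + x3":   np.column_stack([np.ones(n), x1, x2, x3]),
}
scores = {name: description_length(y, X) for name, X in candidates.items()}
print("selected:", min(scores, key=scores.get), scores)
```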
|
155 |
Spatio-temporal pattern discovery and hypothesis exploration using a delay reconstruction approach
Campbell, Alexander B., January 2008 (has links)
This thesis investigates the computer-based modelling and simulation of complex geospatial phenomena. Geospatial systems are real world processes which extend over some meaningful extent of the Earth's surface, such as cities and fisheries. There are many problems that require urgent attention in this domain (for example relating to sustainability), but despite increasing amounts of data and computational power there is a significant gap between the potential for model-based analyses and their actual impact on real world policy and planning. Analytical methods are confounded by the high dimensionality and nonlinearity of spatio-temporal systems and/or are hard to relate to meaningful policy decisions. Simulation-based approaches, on the other hand, are more heuristic and policy oriented in nature, but they are difficult to validate and almost always over-fit the data: although a given model can be calibrated on a given set of data, it usually performs very poorly on new unseen data sets. The central contribution of this thesis is a framework which is formally grounded and able to be rigorously validated, yet at the same time is interpretable in terms of real world phenomena and thus has a strong connection to domain knowledge. The scope of the thesis spans both theory and practice, and three specific contributions range along this span. Starting at the theoretical end, the first contribution concerns the conceptual and theoretical basis of the framework, which is a technique known as delay reconstruction. The underlying theory is rooted in the rather technical field of dynamical systems (itself largely based on differential topology), which has hindered its wider application and the formation of strong links with other areas. Therefore, the first contribution is an exposition of delay reconstruction in non-technical language, with a focus on explaining how some recent extensions to this theory make the concept far more widely applicable than is often assumed. The second contribution uses this theoretical foundation to develop a practical, unified framework for pattern discovery and hypothesis exploration in geospatial data. The central aspect of this framework is the linking of delay reconstruction with domain knowledge. This is done via the notion that determinism is not an on-off quantity, but rather that a given data set may be ascribed a particular 'degree' of determinism, and that that degree may be increased through manipulation of the data set using domain knowledge. This leads to a framework which can handle spatiotemporally complex (including multi-scale) data sets, is sensitive to the amount of data that is available, and is naturally geared to be used interactively with qualitative feedback conveyed to the user via geometry. The framework is complementary to other techniques in that it forms a scaffold within which almost all modelling approaches - including agent-based modelling - can be cast as particular kinds of 'manipulations' of the data, and as such are easily integrated. The third contribution examines the practical efficacy of the framework in a real world case study. This involves a high resolution spatio-temporal record of fish-catch data from trawlers operating in a large fishery.
The study is used to test two fundamental capabilities of the framework: (i) whether real world spatio-temporal phenomena can be identified in the degree-of-determinism signature of the data set, and (ii) whether the determinism level can then be increased by manipulating the data in response to these phenomena. One of the main outcomes of this study is a clear identification of the influence of the lunar cycle on the behaviour of Tiger and Endeavour prawns. The framework allows for this to be 'non-destructively subtracted', increasing the detectability of further phenomena.
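A minimal numpy sketch of the delay-reconstruction step that underlies the framework is given below: it embeds a scalar series into delay vectors, using a synthetic record with a roughly lunar-length cycle as a stand-in for the catch data; the choice of delay and embedding dimension, and the domain-knowledge manipulations developed in the thesis, are not addressed here.

```python
import numpy as np

def delay_embed(x, dim=3, tau=5):
    """Return the matrix of delay vectors [x(t), x(t+tau), ..., x(t+(dim-1)*tau)]."""
    x = np.asarray(x)
    n = x.size - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

# toy example: a noisy series with a cycle of roughly lunar length (29.5 steps)
t = np.arange(400)
series = np.sin(2 * np.pi * t / 29.5) + 0.1 * np.random.default_rng(3).normal(size=t.size)

vectors = delay_embed(series, dim=3, tau=7)
print(vectors.shape)  # points tracing out the reconstructed attractor
```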
|
156 |
Protein dynamics: a study of the model-free analysis of NMR relaxation data
d'Auvergne, Edward J., Unknown Date (has links) (PDF)
The model-free analysis of NMR relaxation data, which is widely used for the study of protein dynamics, consists of separating the Brownian rotational diffusion from internal motions relative to the diffusion frame and describing these internal motions by their amplitude and timescale. Through parametric restriction and the addition of the Rex parameter, a number of model-free models can be constructed. The model-free problem is often solved by initially estimating the diffusion tensor. The model-free models are then optimised and the best model is selected. Finally, the global model of all diffusion and model-free parameters is optimised. These steps are repeated until convergence. This thesis will investigate all aspects of the model-free data analysis chain. (For the complete abstract, open the document.)
|
157 |
Interfaces between Bayesian and Frequentist Multiple Testing
Chang, Shih-Han, January 2015 (has links)
This thesis investigates frequentist properties of Bayesian multiple testing procedures in a variety of scenarios and depicts the asymptotic behaviors of Bayesian methods. Both Bayesian and frequentist approaches to multiplicity control are studied and compared, with special focus on understanding the multiplicity control behavior in situations of dependence between test statistics. Chapter 2 examines a problem of testing mutually exclusive hypotheses with dependent data. The Bayesian approach is shown to have excellent frequentist properties and is argued to be the most effective way of obtaining frequentist multiplicity control without sacrificing power. Chapter 3 further generalizes the model such that multiple signals are acceptable, and depicts the asymptotic behavior of false positive rates and the expected number of false positives. Chapter 4 considers the problem of dealing with a sequence of different trials concerning some medical or scientific issue, and discusses the possibilities for multiplicity control of the sequence. Chapter 5 addresses issues and efforts in reconciling frequentist and Bayesian approaches in sequential endpoint testing. We consider the conditional frequentist approach in sequential endpoint testing and show several examples in which Bayesian and frequentist methodologies cannot be made to match.
|
158 |
Essays in hierarchical time series forecasting and forecast combination
Weiss, Christoph, January 2018 (has links)
This dissertation comprises three original contributions to empirical forecasting research. Chapter 1 introduces the dissertation. Chapter 2 contributes to the literature on hierarchical time series (HTS) modelling by proposing a disaggregated forecasting system for both the inflation rate and its volatility. Using monthly data underlying the Retail Prices Index for the UK, we analyse the dynamics of the inflation process. We examine patterns in the time-varying covariation among product-level inflation rates that aggregate up to industry-level inflation rates, which in turn aggregate up to the overall inflation rate. The aggregate inflation volatility closely tracks the time path of this covariation, which is seen to be driven primarily by the variances of common shocks shared by all products and by the covariances between idiosyncratic product-level shocks. We formulate a forecasting system that comprises models for the mean inflation rate and its variance, and exploit the index structure of the aggregate inflation rate using the HTS framework. Using a dynamic model selection approach to forecasting, we obtain forecasts that are between 9% and 155% more accurate than a SARIMA-GARCH(1,1) for the aggregate inflation volatility. Chapter 3 is on improving forecasts using forecast combinations. The paper documents the software implementation of the open-source R package for forecast combination that we coded and published on the official R package repository, CRAN. The GeomComb package is the only R package that covers a wide range of popular forecast combination methods. We implement techniques from three broad categories: (a) simple non-parametric methods, (b) regression-based methods, and (c) geometric (eigenvector) methods, allowing for static or dynamic estimation of each approach. Using S3 classes/methods in R, the package provides a user-friendly environment for applied forecasting, implementing solutions for typical issues related to forecast combination (multicollinearity, missing values, etc.), criterion-based optimisation for several parametric methods, and post-fit functions to rationalise and visualise estimation results. The package has been listed in the official R Task Views for Time Series Analysis and for Official Statistics. The brief empirical application in the paper illustrates the package's functionality by estimating forecast combination techniques for monthly UK electricity supply. Chapter 4 introduces HTS forecasting and forecast combination to a healthcare staffing context. A slowdown in healthcare budget growth in the UK that has not kept pace with the growth in demand for hospital services has made efficient cost planning increasingly crucial for hospitals, in particular for staff, which accounts for more than half of hospitals' expenses. Such planning is facilitated by accurate forecasts of patient census and churn. Using a dataset of more than 3 million observations from a large UK hospital, we show how HTS forecasting can improve forecast accuracy by using information at different levels of the hospital hierarchy (aggregate, emergency/electives, divisions, specialties), compared to the naïve benchmark: the seasonal random walk model applied to the aggregate. We show that forecast combination can improve accuracy even further in some cases, and leads to lower forecast error variance (decreasing forecasting risk). We propose a comprehensive parametric approach to using these forecasts in a nurse staffing model that aims to minimise cost while ensuring that care requirements (e.g. nurse-hours-per-patient-day thresholds) are met.
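As a generic illustration of forecast combination (and not of the GeomComb package's own interface), the Python sketch below compares a simple average of individual forecasts with regression-based (OLS) combination weights on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 300
y = np.cumsum(rng.normal(size=T)) + 50        # target series to be forecast
# three individual forecasts with different biases and error variances
f = np.column_stack([y + rng.normal(0, s, T) + b for s, b in [(2, 1), (4, 0), (3, -2)]])

train = slice(0, 200)
test = slice(200, T)

# (a) simple average of the individual forecasts
avg = f[test].mean(axis=1)

# (b) regression-based weights estimated on the training window (OLS with intercept)
X = np.column_stack([np.ones(200), f[train]])
w, *_ = np.linalg.lstsq(X, y[train], rcond=None)
reg = np.column_stack([np.ones(T - 200), f[test]]) @ w

for name, fc in [("forecast 1", f[test, 0]), ("simple average", avg), ("OLS weights", reg)]:
    rmse = np.sqrt(np.mean((y[test] - fc) ** 2))
    print(name, rmse)
```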
|
159 |
Automatic model selection for forecasting Brazilian stock returns
Cunha, Ronan, 27 March 2015 (has links)
This study aims to contribute to the literature on forecasting stock returns in emerging markets. We use Autometrics to select relevant predictors among macroeconomic, microeconomic and technical variables. We develop predictive models for the Brazilian market premium, measured as the excess return over the Selic interest rate, and for the Itaú SA, Itaú-Unibanco and Bradesco stock returns. We find that, for the market premium, an ADL with error correction is able to outperform the benchmarks in terms of economic performance. For individual stock returns, there is a trade-off between statistical properties and out-of-sample performance of the model.
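As a crude sketch of general-to-specific predictor selection in this spirit (and only a stand-in: it does not reproduce Autometrics, which adds multi-path search, diagnostic testing and more), the following Python snippet selects predictors of an excess-return series by backward elimination on t-statistics; the predictor names and data are invented for the example.

```python
import numpy as np

def backward_select(y, X, names, t_crit=2.0):
    """Drop the least significant predictor until all |t| exceed t_crit.
    The intercept (column 0) is always kept."""
    keep = list(range(X.shape[1]))
    while True:
        Xk = X[:, keep]
        n, k = Xk.shape
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        resid = y - Xk @ beta
        sigma2 = resid @ resid / (n - k)
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xk.T @ Xk)))
        tstat = np.abs(beta / se)
        tstat[0] = np.inf                      # never drop the intercept
        worst = np.argmin(tstat)
        if tstat[worst] >= t_crit or k == 1:
            return [names[i] for i in keep], beta
        del keep[worst]

rng = np.random.default_rng(5)
n = 500
Z = rng.normal(size=(n, 6))                    # hypothetical candidate predictors
names = ["const", "div_yield", "rate_spread", "momentum", "volume", "infl", "noise"]
excess_ret = 0.4 * Z[:, 0] - 0.3 * Z[:, 2] + rng.normal(scale=1.0, size=n)
X = np.column_stack([np.ones(n), Z])
selected, coef = backward_select(excess_ret, X, names)
print(selected)
```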
|
160 |
Variants of compound models and their application to citation analysis
Low, Wan Jing, January 2017 (has links)
This thesis develops two variant statistical models for count data based upon compound models, for contexts in which the counts may be viewed as derived from two generations that may or may not be independent. Unlike standard compound models, the variants model the sum of both generations. We consider cases where both generations are negative binomial, or where one is Poisson and the other is negative binomial. The first variant, denoted SVA, follows a zero restriction, where a zero in the first generation is automatically followed by a zero in the second generation. The second variant, denoted SVB, is a convolution model that does not possess this zero restriction. The main properties of the SVA and SVB models are outlined and compared with those of standard compound models. The results show that the SVA distributions are similar to standard compound distributions for some fixed parameters. Comparisons of SVA, Poisson hurdle and negative binomial hurdle models and their zero-inflated counterparts using simulated SVA data indicate that different models can give similar results, as the generating models are not always selected as the best fitting. This thesis focuses on the use of the variant models to model citation counts. We show that the SVA models are more suitable for modelling citation data than other previously used models such as the negative binomial model. Moreover, the SVA and SVB models may be used to describe the citation process. The thesis also explores model selection techniques based on log-likelihood methods, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The suitability of the models is also assessed using two diagrammatic methods: randomised quantile residual plots and Christmas tree plots. The Christmas tree plots clearly illustrate whether the observed data are within fluctuation bounds under the fitted model, whereas the randomised quantile residual plots utilise the cumulative distribution and hence are insensitive to individual data values. Both plots show the presence of citation counts that are larger than expected under the fitted model in the data sets.
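The sketch below simulates counts under one reading of the SVA construction described above, assuming both generations are negative binomial and that a zero first-generation count forces the second generation to zero; the parameter values are arbitrary illustrations.

```python
import numpy as np

def simulate_sva(size, r1=1.5, p1=0.4, r2=2.0, p2=0.5, seed=0):
    """Simulate counts as the sum of two generations (SVA-style): the second
    generation is forced to zero whenever the first generation is zero."""
    rng = np.random.default_rng(seed)
    first = rng.negative_binomial(r1, p1, size=size)
    second = rng.negative_binomial(r2, p2, size=size)
    second[first == 0] = 0                     # the zero restriction
    return first + second

counts = simulate_sva(100_000)
print("mean:", counts.mean(), "variance:", counts.var(),
      "proportion of zeros:", np.mean(counts == 0))
```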
|