  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Design of a census coverage survey and its use in the estimation and adjustment of census underenumeration : a contribution towards creating a one-number census in the UK in 2001

Brown, James John January 2001 (has links)
No description available.
82

Understanding and dealing with unit nonresponse during and post survey data collection

D'Arrigo, Julia January 2011 (has links)
Nonresponse in sample surveys is a longstanding concern among social researchers and survey methodologists. In addition to potentially biasing point estimates, nonresponse can inflate the variances of those estimates. This thesis focuses on understanding and dealing with unit nonresponse in sample surveys during and after data collection. In particular, it looks at modelling the process leading to nonresponse using call record data; developing weighting adjustments for clustered nonresponse; and investigating variance estimation methods in the presence of nonresponse. During data collection, effective interviewer calling behaviours are critical in achieving contact and subsequent cooperation. Recent developments in the survey data collection process have led to the collection of so-called paradata, which greatly extend the basic information on interviewer calls. The first part of the thesis develops multilevel models based on a particular type of paradata (call record data and interviewer observations) to predict the likelihood of contact and cooperation conditional on household and interviewer characteristics. The research is based on the UK 2001 Census Link Study dataset. The results have implications for survey practice and inform, among other things, the design of effective interviewer calling strategies, including responsive survey designs. Post-survey estimation methods to adjust and account for nonresponse include weighting methods such as inverse probability weighting and generalized raking estimation. The second part of the thesis investigates alternative inverse probability weighted estimators for clustered nonresponse through a simulation study. Results from an empirical application using data from the Expenditure and Food Survey 2001 are presented. It also discusses three forms of generalized raking estimator in the presence of nonresponse.
Weighting methods might result in increased variability in the weights and thereby lower the precision of the survey estimates. This thesis explores alternative forms of linearization and replication variance estimators for generalized raking estimators under nonresponse that allow for variation in the weights.
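The inverse probability weighting idea central to the second part of this abstract can be sketched in a toy simulation (an illustration only, not the thesis's estimators; all variable names and parameter values here are invented):

```python
# Illustrative sketch of inverse probability weighting for unit nonresponse.
# Response propensities are estimated from a simple logistic model, and
# respondents are weighted by the inverse of their estimated propensity,
# which removes the nonresponse bias in the mean under a MAR assumption.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                       # auxiliary variable known for all units
y = 2.0 + 1.5 * x + rng.normal(size=n)       # survey variable of interest
p_resp = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))  # response depends on x -> MAR nonresponse
responded = rng.random(n) < p_resp

# The naive respondent mean is biased: responders tend to have larger x (and y).
naive_mean = y[responded].mean()

# Fit a logistic propensity model on the full sample (x is observed for all),
# here by plain Newton iterations rather than an external library.
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ beta)))
    W = p * (1 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (responded - p))

phat = 1 / (1 + np.exp(-(X @ beta)))
w = 1.0 / phat[responded]                    # inverse probability weights
ipw_mean = np.sum(w * y[responded]) / np.sum(w)
true_mean = y.mean()
```

The weighted (Hajek) mean recovers the full-sample mean far better than the naive respondent mean, at the cost of increased weight variability, which is exactly the precision issue the abstract's variance estimators address.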
83

Robust and optimal experimental designs for non-linear models in chemical kinetics

Martin, Kieran James January 2012 (has links)
This thesis considers the problem of selecting robust and optimal experimental designs for accurately estimating the unknown mean parameters of non-linear models in chemical kinetics. The design selection criteria used are local, Bayesian and maximin D-optimality. The thesis focuses on an example provided by GlaxoSmithKline which concerns a chemical reaction where the temperature at which runs of the reaction are conducted and the times at which observations can be made during the reaction are to be varied. Optimal designs for non-linear models are usually dependent on the unknown values of the model parameters. This problem may be overcome by finding designs whose performance is robust to a range of values for each model parameter. Optimal designs are investigated for situations when observations are independent and when correlation exists between observations made on the same run of the process; different forms and strengths of correlation between observations are considered. Designs robust to the correlation and mean parameters are found and assessed via both theoretical measures and a large simulation study which compares the designs found to alternatives currently used in practice. Designs for the situation when the error variables have non-constant variance are obtained by use of a model formed via a power transformation on the response and its expected value. Designs robust to the value of the transformation parameter as well as the correlation and mean parameters are found and assessed. Analytic results are established for obtaining locally D-optimal designs when the model is assumed to have independent observations and the response and expected response have been transformed to remove heteroscedasticity. Where analytic results are not available, numerical methods are used to obtain optimal designs. The differing costs of a run of a reaction and of making an observation on a run are incorporated into design selection. 
A criterion which includes the cost of the time taken to run a reaction in an experiment is formulated and used to find designs.
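The notion of a locally D-optimal design can be illustrated with a toy one-parameter exponential decay model (a stand-in, not the GlaxoSmithKline example; the model and numbers are invented for illustration):

```python
# Illustrative sketch: a locally D-optimal observation time for the
# one-parameter decay model y = exp(-theta * t) + error. "Locally" optimal
# means the design is computed at a prior guess theta0 for the unknown
# parameter, which is why robustness to that guess matters.
import numpy as np

theta0 = 0.5                                  # prior guess at the rate constant
times = np.linspace(0.05, 10.0, 400)          # candidate observation times

# For a single-parameter model, the Fisher information at time t is f(t)^2,
# where f(t) = d/dtheta exp(-theta*t) = -t * exp(-theta*t);
# D-optimality maximises this determinant (here a scalar).
info = (times * np.exp(-theta0 * times)) ** 2
t_opt = times[np.argmax(info)]
# Analytically, t * exp(-theta0 * t) is maximised at t = 1 / theta0.
```

The dependence of `t_opt` on `theta0` is the core difficulty the abstract addresses: Bayesian and maximin criteria replace the single guess with a range of plausible parameter values.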
84

Robust methods in univariate time series models

Whitehouse, Emily J. January 2017 (has links)
The size and power properties of a hypothesis test typically depend on a series of factors which are unobservable in practice. A branch of the econometric literature therefore considers robust testing methodologies that achieve good size-control and competitive power across a range of differing circumstances. In this thesis we discuss robust tests in three areas of time series econometrics: detection of explosive processes, unit root testing against nonlinear alternatives, and forecast evaluation in small samples. Recent research has proposed a method of detecting explosive processes that is based on forward recursions of OLS, right-tailed, Dickey-Fuller [DF] unit root tests. In Chapter 2 an alternative approach using GLS DF tests is considered. We derive limiting distributions for both mean-invariant and trend-invariant versions of OLS and GLS variants of the Phillips, Wu and Yu (2011) [PWY] test statistic under a temporary, locally explosive alternative. These limits are dependent on both the value of the initial condition and the start and end points of the temporary explosive regime. Local asymptotic power simulations show that a GLS version of the PWY statistic offers superior power when a large proportion of the data is explosive, but that the OLS approach is preferred for explosive periods of short duration. These power differences are magnified by the presence of an asymptotically non-negligible initial condition. We propose a union of rejections procedure that capitalises on the respective power advantages of both OLS and GLS-based approaches. This procedure achieves power close to the effective envelope provided by the two individual PWY tests across all settings of the initial condition and length of the explosive period considered in this chapter. We show that these results are also robust to the point in the sample at which the temporary explosive regime occurs. 
An application of the union procedure to NASDAQ daily prices confirms the empirical value of this testing strategy. Chapter 3 examines the local power of unit root tests against globally stationary exponential smooth transition autoregressive [ESTAR] alternatives under two sources of uncertainty: the degree of nonlinearity in the ESTAR model, and the presence of a linear deterministic trend. First, we show that the Kapetanios, Shin and Snell (2003) [KSS] test for nonlinear stationarity has local asymptotic power gains over standard Dickey-Fuller [DF] tests for certain degrees of nonlinearity in the ESTAR model, but that for other degrees of nonlinearity, the linear DF test has superior power. Second, we derive limiting distributions of demeaned, and demeaned and detrended KSS and DF tests under a local ESTAR alternative when a local trend is present in the DGP. We show that the power of the demeaned tests outperforms that of the detrended tests when no trend is present in the DGP, but deteriorates as the magnitude of the trend increases. We propose a union of rejections testing procedure that combines all four individual tests and show that this captures most of the power available from the individual tests across different degrees of nonlinearity and trend magnitudes. We also show that incorporating a trend detection procedure into this union testing strategy can result in higher power when a large trend is present in the DGP. An empirical application of our proposed union of rejections procedures to energy consumption data in 180 countries shows the value of these procedures in practical cases. In Chapter 4 we show that when computing standard Diebold-Mariano-type tests for equal forecast accuracy and forecast encompassing, the long-run variance can frequently be negative when dealing with multi-step-ahead predictions in small, but empirically relevant, sample sizes. 
We subsequently consider a number of alternative approaches to dealing with this problem, including direct inference in the problem cases and use of long-run variance estimators that guarantee positivity. The finite sample size and power of the different approaches are evaluated using an extensive Monte Carlo simulation exercise. Overall, for multi-step-ahead forecasts, we find that the recently proposed Coroneo and Iacone (2015) test, which is based on a weighted periodogram long-run variance estimator, offers the best finite sample size and power performance.
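The union-of-rejections idea used in Chapters 2 and 3 can be sketched abstractly: reject if either individual test rejects, with both critical values scaled by a common constant so the union retains nominal size. A toy calibration with stand-in statistics (correlated normals, not actual Dickey-Fuller statistics):

```python
# Illustrative sketch of a union-of-rejections rule for two left-tailed
# tests. Rejecting when either test rejects at its own 5% critical value
# would over-reject, so the critical values are jointly scaled until the
# union has size at most 5% under the (simulated) null.
import numpy as np

rng = np.random.default_rng(1)
n_sim = 200_000
z = rng.normal(size=(n_sim, 2))
s1 = z[:, 0]                                  # stand-in for one test statistic
s2 = 0.7 * z[:, 0] + np.sqrt(1 - 0.7**2) * z[:, 1]  # a correlated second statistic
cv1, cv2 = np.quantile(s1, 0.05), np.quantile(s2, 0.05)

# Find the smallest scale >= 1 for which the union's size is at most 5%.
# Scaling a negative critical value by scale > 1 makes it more negative,
# i.e. each individual test becomes more conservative.
scale = 1.0
while np.mean((s1 < scale * cv1) | (s2 < scale * cv2)) > 0.05:
    scale += 0.005

union_size = np.mean((s1 < scale * cv1) | (s2 < scale * cv2))
```

Because the union rejects whenever the better-powered of the two tests does, its power tracks the envelope of the individual tests across settings, which is the property the chapters exploit.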
85

Autocorrelation-based factor analysis and nonlinear shrinkage estimation of large integrated covariance matrix

Hu, Qilin January 2016 (has links)
The first part of my thesis deals with factor modelling for high-dimensional time series from a dimension-reduction viewpoint. We allow the dimension of the time series N to be as large as, or even larger than, the sample size. The factor loading matrix, and subsequently the factors, are estimated via an eigenanalysis of a non-negative definite matrix constructed from autocorrelation matrices. The method is dubbed AFA. We give an explicit comparison of the convergence rates of AFA and PCA. We show that AFA has an advantage over PCA when dealing with time series of small dimension, for both one-step and two-step estimation, while at large dimension the performance remains comparable. The second part of my thesis considers estimation of a large integrated covariance matrix. While the use of intra-day price data increases the sample size substantially for asset allocation, the usual realized covariance matrix still suffers from bias contributed by the extreme eigenvalues when the number of assets is large. We introduce a novel nonlinear shrinkage estimator for the integrated volatility matrix which shrinks the extreme eigenvalues of a realized covariance matrix back to an acceptable level, and at the same time enjoys a certain asymptotic efficiency, all in a high-dimensional setting where the number of assets can be of the same order as the number of data points. Compared to a time-variation adjusted realized covariance estimator and the usual realized covariance matrix, our estimator demonstrates favourable performance in both simulations and a real data analysis of portfolio allocation. These results include a novel maximum exposure bound and an actual risk bound when our estimator is used to construct the minimum variance portfolio.
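The eigenvalue distortion of high-dimensional sample covariance matrices, and the benefit of shrinking eigenvalues back toward acceptable levels, can be illustrated with a toy example (using simple linear shrinkage rather than the thesis's nonlinear estimator):

```python
# Illustrative sketch: when the dimension p is comparable to the sample
# size n, the sample covariance eigenvalues spread far from the truth
# (here the identity), and pulling them back toward their average reduces
# estimation error. This is plain LINEAR shrinkage, a crude stand-in for
# the nonlinear shrinkage estimator described in the abstract.
import numpy as np

rng = np.random.default_rng(2)
p, n = 100, 120                              # dimension close to sample size
true_cov = np.eye(p)                         # truth: identity covariance
X = rng.normal(size=(n, p))
S = X.T @ X / n                              # sample covariance

evals, evecs = np.linalg.eigh(S)             # distorted spectrum: extreme
                                             # eigenvalues despite identity truth
shrunk_evals = 0.5 * evals + 0.5 * evals.mean()
S_shrunk = (evecs * shrunk_evals) @ evecs.T  # reassemble with shrunk spectrum

err_sample = np.linalg.norm(S - true_cov)    # Frobenius errors
err_shrunk = np.linalg.norm(S_shrunk - true_cov)
```

Nonlinear shrinkage replaces the single 0.5 mixing constant with an eigenvalue-specific transformation, which is what allows the asymptotic efficiency claimed in the abstract.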
86

Detecting semi-plausible response patterns

Terzi, Tayfun January 2017 (has links)
New challenges concerning bias from measurement error have arisen with the increasing use of paid participants: semi-plausible response patterns (SpRPs). SpRPs result when participants only superficially process the information in (online) experiments or questionnaires and attempt merely to respond in a plausible way. Paid participants are generally motivated by fast cash: they try to pass objective plausibility checks efficiently and process other items only superficially, if at all. Such participants produce data that are not merely useless but detrimental, because they attempt to conceal their malpractice. The potential consequences are biased estimation and misleading statistical inference. The statistical nature of specific invalid response strategies and their applications are discussed, ultimately deriving a meta-theory of response strategy, process, and plausibility. A new test measure to detect SpRPs was developed to accommodate survey-type data without requiring mechanisms implemented a priori. The effectiveness of the test measure was evaluated empirically and theoretically under a latent class latent variable framework. The empirical evaluation is based on an experimental study and an online questionnaire study, both operating within a well-established psychological framework of five stable personality traits. The measure was evaluated theoretically through simulations. It was concluded that the measure successfully discriminates between valid and invalid responders under certain conditions. Indicators of optimal settings for high discriminatory power were identified and limitations discussed.
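A crude illustration of flagging invalid responders (not the thesis's latent class test measure; the data-generating model, consistency score and threshold below are all invented for the sketch):

```python
# Illustrative sketch: valid responders follow a one-factor model, so their
# item responses cluster around their own trait level; careless responders
# answer at random. A simple within-person spread statistic then separates
# the two groups reasonably well in this toy setting.
import numpy as np

rng = np.random.default_rng(3)
n_valid, n_invalid, n_items = 300, 60, 20
theta = rng.normal(size=n_valid)                        # trait levels
valid = theta[:, None] + 0.6 * rng.normal(size=(n_valid, n_items))
invalid = rng.normal(size=(n_invalid, n_items))         # careless: pure noise
data = np.vstack([valid, invalid])

# Within-person standard deviation: ~0.6 for valid responders (noise around
# a fixed trait), ~1.0 for random responders. Flag above a cutoff of 0.8.
person_sd = data.std(axis=1, ddof=1)
flagged = person_sd > 0.8
detect_rate = flagged[n_valid:].mean()                  # true positives
false_rate = flagged[:n_valid].mean()                   # false positives
```

Real detection measures must cope with responders who deliberately mimic plausible patterns, which is why the thesis builds the measure inside a latent variable framework instead of relying on a single heuristic like this one.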
87

Bayesian complementary clustering, MCMC and Anglo-Saxon placenames

Zanella, Giacomo January 2015 (has links)
Common cluster models for multi-type point processes model the aggregation of points of the same type. In complete contrast, in the study of Anglo-Saxon settlements it is hypothesized that administrative clusters involving complementary names tend to appear. We investigate the evidence for this hypothesis by developing a Bayesian Random Partition Model based on clusters formed by points of different types (complementary clustering). As a result we obtain an intractable posterior distribution on the space of matchings contained in a k-partite hypergraph. We use the Metropolis-Hastings (MH) algorithm to sample from this distribution. We consider the problem of finding the optimal informed MH proposal distribution given a fixed set of allowed moves. To answer this question we define the notion of balanced proposals and prove that, under some assumptions, such proposals are maximal in the Peskun sense. Using these ideas we obtain substantial mixing improvements compared to other choices found in the literature. Simulated Tempering techniques can be used to overcome multimodality, and a multiple-proposal scheme is developed to allow for parallel programming. Finally, we discuss results arising from the careful use of convergence diagnostic techniques. This allows us to study a dataset comprising the locations and placenames of 1316 Anglo-Saxon settlements dated to around 750-850 AD. Without strong prior knowledge, the model allows for explicit estimation of the number of clusters, the average intra-cluster dispersion and the level of interaction among placenames. The results support the hypothesis that settlements were organized into administrative clusters based on complementary names.
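The idea of a balanced informed MH proposal, weighting each candidate move by the square root of the target ratio, can be sketched on a simple discrete target (a toy random-walk chain, not the matching sampler of the thesis):

```python
# Illustrative sketch: informed Metropolis-Hastings on a discrete state
# space {0, ..., K-1} with moves to neighbours x-1 and x+1. A "balanced"
# proposal weights each candidate y by sqrt(pi(y)/pi(x)); the acceptance
# ratio then corrects for the asymmetry of the proposal.
import numpy as np

rng = np.random.default_rng(4)
K = 50
target = np.exp(-0.5 * ((np.arange(K) - 25) / 5.0) ** 2)  # discretised normal
target /= target.sum()

def step(x):
    nbrs = [y for y in (x - 1, x + 1) if 0 <= y < K]
    w = np.array([np.sqrt(target[y] / target[x]) for y in nbrs])
    probs = w / w.sum()
    y = nbrs[rng.choice(len(nbrs), p=probs)]
    # proposal probability of the reverse move, needed for detailed balance
    nbrs_y = [z for z in (y - 1, y + 1) if 0 <= z < K]
    w_y = np.array([np.sqrt(target[z] / target[y]) for z in nbrs_y])
    q_xy = probs[nbrs.index(y)]
    q_yx = (w_y / w_y.sum())[nbrs_y.index(x)]
    a = min(1.0, target[y] * q_yx / (target[x] * q_xy))
    return y if rng.random() < a else x

x, samples = 0, []
for _ in range(20000):
    x = step(x)
    samples.append(x)
emp_mean = np.mean(samples[5000:])            # should approach the mode at 25
```

The sqrt weighting pushes proposals uphill without overshooting; on matchings in a k-partite hypergraph the same construction applies with "neighbours" replaced by the fixed set of allowed matching moves.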
88

Essays in high-dimensional nonlinear time series analysis

Habibnia, Ali January 2016 (has links)
In this thesis, I study high-dimensional nonlinear time series analysis and its applications in financial forecasting and in identifying risk in highly interconnected financial networks. The first chapter is devoted to testing for nonlinearity in financial time series. I present a tentative classification of the various linearity tests that have been proposed in the literature. I then investigate nonlinear features of real financial series to determine whether the data justify the use of nonlinear techniques, such as those inspired by machine learning theories. In Chapters 3 and 5, I develop forecasting strategies with a high-dimensional panel of predictors while considering nonlinear dynamics; combining these two elements is a developing area of research. In the third chapter, I propose a nonlinear generalization of statistical factor models. In the first step, factor estimation, I employ an auto-associative neural network to estimate nonlinear factors from the predictors. In the second step, the forecasting equation, I apply a nonlinear function (a feedforward neural network) to the estimated factors for prediction. I show that these features can go beyond covariance analysis and enhance forecast accuracy. I apply this approach to forecast equity returns, and show that capturing nonlinear dynamics between equities significantly improves the quality of forecasts over current univariate and multivariate factor models. In Chapter 5, I propose a high-dimensional learning approach based on shrinkage estimation of a backpropagation algorithm for skip-layer neural networks. This thesis emphasizes that linear models can be represented as special cases of the two aforementioned models, which means that if there is no nonlinearity between the series, the proposed models reduce to linear models.
This thesis also includes a chapter (Chapter 4, with Negar Kiyavash and Seyedjalal Etesami) in which we propose a new approach for identifying and measuring systemic risk in financial networks by introducing a nonlinearly modified Granger-causality network based on directed information graphs. The suggested method allows for nonlinearity and has predictive power over future economic activity through a time-varying network of interconnections. We apply the method to the daily returns of U.S. financial institutions, including banks, brokers and insurance companies, to identify the level of systemic risk in the financial sector and the contribution of each financial institution.
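A linear Granger-causality check is the simplest version of the causality networks described above (the thesis uses a nonlinear, directed-information variant); a minimal sketch on simulated data might look like:

```python
# Illustrative sketch: series x Granger-causes y if lagged x improves the
# prediction of y beyond y's own lags, tested via an F statistic comparing
# restricted and full least-squares regressions.
import numpy as np

rng = np.random.default_rng(5)
T = 2000
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 0.5 * x[t - 1] + 0.5 * rng.normal()

def rss(Y, Z):
    """Residual sum of squares from regressing Y on the columns of Z."""
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    r = Y - Z @ beta
    return r @ r

Y = y[1:]
restricted = np.column_stack([np.ones(T - 1), y[:-1]])         # own lag only
full = np.column_stack([np.ones(T - 1), y[:-1], x[:-1]])       # add lagged x
F = (rss(Y, restricted) - rss(Y, full)) / (rss(Y, full) / (T - 1 - 3))
x_causes_y = F > 3.85                         # ~5% critical value, 1 numerator df
```

A systemic-risk network repeats this pairwise test across all institution pairs and draws a directed edge for each rejection; the nonlinear directed-information version replaces the linear regressions while keeping the same network construction.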
89

Statistical data mining for Sina Weibo, a Chinese micro-blog : sentiment modelling and randomness reduction for topic modelling

Cheng, Wenqian January 2017 (has links)
Before the arrival of modern information and communication technology, it was not easy to capture people's thoughts and sentiments; the development of statistical data mining techniques and the prevalence of mass social media now provide opportunities to capture those trends. Among all types of social media, micro-blogs use a 140-character limit to force users to get straight to the point, making posts brief but content-rich resources for investigation. The data mining object of this thesis is Weibo, the most popular Chinese micro-blog. In the first part of the thesis, we perform various exploratory data mining tasks on Weibo. After a literature review of micro-blogs, the initial steps of data collection and pre-processing are introduced. This is followed by analysis of posting times, analysis of the relationship between posting intensity and share price, term-frequency analysis and cluster analysis. Secondly, we conduct time series modelling of the sentiment of Weibo posts. Considering the properties of Weibo sentiment, we mainly adopt the framework of an ARMA mean with GARCH-type conditional variance to fit the patterns. Other distinct models are also considered for negative sentiment because of its complexity. Model selection and validation are introduced to verify the fitted models. Thirdly, Latent Dirichlet Allocation (LDA) is explained in depth as a way to discover topics in large sets of textual data. The major contribution is a Randomness Reduction Algorithm applied to post-process the output of topic models, filtering out insignificant topics and utilising topic distributions to find the most persistent topics. At the end of this chapter, evidence of the effectiveness of the Randomness Reduction is presented from empirical studies, and the topic classification and evolution are also unveiled.
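The GARCH-type conditional variance used for sentiment modelling can be sketched via the GARCH(1,1) recursion (toy parameters, not estimates from Weibo data):

```python
# Illustrative sketch: simulating the GARCH(1,1) recursion
#   h_t = omega + alpha * eps_{t-1}^2 + beta * h_{t-1}
# which, paired with an ARMA mean, is the abstract's sentiment model.
# The hallmark is volatility clustering: innovations are serially
# uncorrelated, but their squares are autocorrelated.
import numpy as np

rng = np.random.default_rng(6)
T = 5000
omega, alpha, beta = 0.1, 0.1, 0.8            # alpha + beta < 1: stationary
h = np.empty(T)                                # conditional variances
eps = np.empty(T)                              # innovations
h[0] = omega / (1 - alpha - beta)             # unconditional variance as start
eps[0] = np.sqrt(h[0]) * rng.normal()
for t in range(1, T):
    h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
    eps[t] = np.sqrt(h[t]) * rng.normal()

def acf1(z):
    """Lag-1 sample autocorrelation."""
    z = z - z.mean()
    return (z[1:] @ z[:-1]) / (z @ z)

acf_eps = acf1(eps)                            # near zero: no serial correlation
acf_sq = acf1(eps ** 2)                        # clearly positive: clustering
```

In the thesis the mean of the sentiment series is modelled by an ARMA process and `eps` plays the role of its innovations; fitting would replace the fixed `omega, alpha, beta` with maximum likelihood estimates.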
90

Quantification of air quality in space and time and its effects on health

Huang, Guowen January 2016 (has links)
The long-term adverse health effects associated with air pollution exposure can be estimated using either cohort or spatio-temporal ecological designs. In a cohort study, the health status of a cohort of people is assessed periodically over a number of years and then related to estimated ambient pollution concentrations in the cities in which they live. However, such cohort studies are expensive and time-consuming to implement, owing to the long-term follow-up required. Therefore, spatio-temporal ecological studies are also used to estimate the long-term health effects of air pollution, as they are easy to implement given the routine availability of the required data. Spatio-temporal ecological studies estimate the health impact of air pollution by utilising geographical and temporal contrasts in air pollution and disease risk across n contiguous small areas, such as census tracts or electoral wards, for multiple time periods. The disease data are counts of the number of disease cases occurring in each areal unit and time period, and thus Poisson log-linear models are typically used for the analysis. The linear predictor includes pollutant concentrations and known confounders such as socio-economic deprivation. However, as the disease data typically contain residual spatial or spatio-temporal autocorrelation after the covariate effects have been accounted for, these known covariates are augmented by a set of random effects. One key problem in these studies is estimating spatially representative pollution concentrations in each areal unit; these are typically estimated by applying kriging to data from a sparse monitoring network, or by averaging modelled concentrations (at grid level) from an atmospheric dispersion model. The aim of this thesis is to investigate the health effects of long-term exposure to nitrogen dioxide (NO2) and particulate matter (PM10) in mainland Scotland, UK.
To gain an initial impression of the air pollution health effects in mainland Scotland, chapter 3 presents a standard epidemiological study using a benchmark method. The remaining main chapters (4, 5, 6) cover the main methodological focus of this thesis, which is threefold: (i) how to better estimate pollution by developing a multivariate spatio-temporal fusion model that relates monitored and modelled pollution data over space, time and pollutant; (ii) how to simultaneously estimate the joint effects of multiple pollutants; and (iii) how to allow for the uncertainty in the estimated pollution concentrations when estimating their health effects. Specifically, chapters 4 and 5 address (i), while chapter 6 focuses on (ii) and (iii). In chapter 4, I propose an integrated model for estimating the long-term health effects of NO2 that fuses modelled and measured pollution data to provide improved predictions of areal-level pollution concentrations and hence health effects. The proposed air pollution fusion model is a Bayesian space-time linear regression model relating the measured concentrations to the modelled concentrations for a single pollutant, whilst allowing for additional covariate information such as site type (e.g. roadside or rural) and temperature. However, some pollutants may be correlated because they are generated by common processes or driven by similar factors such as meteorology. The correlation between pollutants can help to predict one pollutant by borrowing strength from the others. Therefore, in chapter 5, I propose a multi-pollutant model: a multivariate spatio-temporal fusion model that extends the single-pollutant model of chapter 4 and relates monitored and modelled pollution data over space, time and pollutant to predict pollution across mainland Scotland.
Considering that the air we breathe contains a complex mixture of particle- and gas-phase pollutants, so that we are exposed to multiple pollutants simultaneously, the health effects of exposure to multiple pollutants are investigated in chapter 6; this is a natural extension of the single-pollutant health effects in chapter 4. Given that NO2 and PM10 are highly correlated (a multicollinearity issue) in my data, I first propose a temporally-varying linear model to regress one pollutant (e.g. NO2) against another (e.g. PM10) and then use the residuals, together with PM10, in the disease model, thus investigating the health effects of exposure to both pollutants simultaneously. Another issue considered in chapter 6 is allowing for the uncertainty in the estimated pollution concentrations when estimating their health effects; in total, four approaches are developed to adjust for this exposure uncertainty. Finally, chapter 7 summarises the work contained within this thesis and discusses the implications for future research.
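The Poisson log-linear disease model described above can be sketched on simulated small-area data (an illustration without the thesis's spatial random effects or fusion modelling; all names and numbers are invented):

```python
# Illustrative sketch: a Poisson log-linear model for small-area disease
# counts with a pollution covariate, a deprivation confounder, and expected
# counts as an offset, fitted by plain iteratively reweighted least squares.
import numpy as np

rng = np.random.default_rng(7)
n = 500                                        # small areas
pollution = rng.normal(size=n)                 # standardised pollutant concentration
deprivation = rng.normal(size=n)               # socio-economic confounder
expected = rng.uniform(20, 60, size=n)         # expected counts (offset)
beta_true = np.array([0.0, 0.1, 0.2])          # intercept, pollution, deprivation
X = np.column_stack([np.ones(n), pollution, deprivation])
y = rng.poisson(expected * np.exp(X @ beta_true))

# IRLS (Newton) for the Poisson GLM with log link and offset log(expected).
beta = np.zeros(3)
for _ in range(50):
    m = np.exp(np.log(expected) + X @ beta)    # fitted means
    beta += np.linalg.solve(X.T @ (m[:, None] * X), X.T @ (y - m))

relative_risk = np.exp(beta[1])                # risk ratio per 1 sd of pollution
```

The thesis augments this linear predictor with spatio-temporal random effects to absorb residual autocorrelation, and propagates the uncertainty in the pollution covariate rather than plugging in a point estimate as done here.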
