81

Robust methods in univariate time series models

Whitehouse, Emily J. January 2017 (has links)
The size and power properties of a hypothesis test typically depend on a series of factors which are unobservable in practice. A branch of the econometric literature therefore considers robust testing methodologies that achieve good size-control and competitive power across a range of differing circumstances. In this thesis we discuss robust tests in three areas of time series econometrics: detection of explosive processes, unit root testing against nonlinear alternatives, and forecast evaluation in small samples.

Recent research has proposed a method of detecting explosive processes that is based on forward recursions of OLS right-tailed Dickey-Fuller [DF] unit root tests. In Chapter 2 an alternative approach using GLS DF tests is considered. We derive limiting distributions for both mean-invariant and trend-invariant versions of OLS and GLS variants of the Phillips, Wu and Yu (2011) [PWY] test statistic under a temporary, locally explosive alternative. These limits depend on both the value of the initial condition and the start and end points of the temporary explosive regime. Local asymptotic power simulations show that a GLS version of the PWY statistic offers superior power when a large proportion of the data is explosive, but that the OLS approach is preferred for explosive periods of short duration. These power differences are magnified by the presence of an asymptotically non-negligible initial condition. We propose a union of rejections procedure that capitalises on the respective power advantages of the OLS- and GLS-based approaches. This procedure achieves power close to the effective envelope provided by the two individual PWY tests across all settings of the initial condition and length of the explosive period considered in this chapter. We show that these results are also robust to the point in the sample at which the temporary explosive regime occurs. An application of the union procedure to NASDAQ daily prices confirms the empirical value of this testing strategy.

Chapter 3 examines the local power of unit root tests against globally stationary exponential smooth transition autoregressive [ESTAR] alternatives under two sources of uncertainty: the degree of nonlinearity in the ESTAR model, and the presence of a linear deterministic trend. First, we show that the Kapetanios, Shin and Snell (2003) [KSS] test for nonlinear stationarity has local asymptotic power gains over standard Dickey-Fuller [DF] tests for certain degrees of nonlinearity in the ESTAR model, but that for other degrees of nonlinearity the linear DF test has superior power. Second, we derive limiting distributions of demeaned, and demeaned and detrended, KSS and DF tests under a local ESTAR alternative when a local trend is present in the DGP. We show that the power of the demeaned tests outperforms that of the detrended tests when no trend is present in the DGP, but deteriorates as the magnitude of the trend increases. We propose a union of rejections testing procedure that combines all four individual tests and show that this captures most of the power available from the individual tests across different degrees of nonlinearity and trend magnitudes. We also show that incorporating a trend detection procedure into this union testing strategy can result in higher power when a large trend is present in the DGP. An empirical application of our proposed union of rejections procedures to energy consumption data in 180 countries shows the value of these procedures in practical cases.
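As a rough illustration of the forward-recursive construction underlying the PWY statistic of Chapter 2 (a minimal OLS sketch with an intercept only; the GLS variants, the local-to-explosive limit theory and the union of rejections procedure are not reproduced here):

```python
import numpy as np

def df_tstat(y):
    """OLS t-statistic on rho in: diff(y)_t = mu + rho * y_{t-1} + e_t."""
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones_like(ylag), ylag])
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    e = dy - X @ beta
    s2 = e @ e / (len(dy) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

def pwy_sup_df(y, r0=0.1):
    """Sup of right-tailed DF statistics over forward recursions y[:k]."""
    n = len(y)
    k0 = max(int(np.floor(r0 * n)), 3)      # smallest usable subsample
    return max(df_tstat(y[:k]) for k in range(k0, n + 1))
```

Large positive values of the sup statistic indicate an explosive episode somewhere in the sample; critical values come from the limiting distributions derived in the chapter.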
In Chapter 4 we show that when computing standard Diebold-Mariano-type tests for equal forecast accuracy and forecast encompassing, the long-run variance can frequently be negative when dealing with multi-step-ahead predictions in small, but empirically relevant, sample sizes. We subsequently consider a number of alternative approaches to dealing with this problem, including direct inference in the problem cases and use of long-run variance estimators that guarantee positivity. The finite sample size and power of the different approaches are evaluated using an extensive Monte Carlo simulation exercise. Overall, for multi-step-ahead forecasts, we find that the recently proposed Coroneo and Iacone (2015) test, which is based on a weighted periodogram long-run variance estimator, offers the best finite sample size and power performance.
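To make the negative long-run variance problem concrete, here is a minimal sketch of a Diebold-Mariano statistic for an h-step-ahead loss differential using the standard truncated long-run variance estimator; positivity-guaranteeing alternatives such as a Bartlett kernel or the Coroneo and Iacone (2015) weighted-periodogram estimator are not reproduced here:

```python
import numpy as np

def dm_stat(d, h):
    """Diebold-Mariano statistic for a loss differential d at horizon h.

    Uses the truncated (rectangular) long-run variance with h-1 sample
    autocovariances, which is not guaranteed to be positive in small
    samples -- the failure case studied in Chapter 4.
    """
    n = len(d)
    dbar = d.mean()
    gamma = [((d[j:] - dbar) * (d[:-j or None] - dbar)).mean()
             for j in range(h)]
    lrv = gamma[0] + 2 * sum(gamma[1:])     # may be negative for h > 1
    if lrv <= 0:
        return np.nan                       # signals the problem case
    return dbar / np.sqrt(lrv / n)
```

With h > 1 the truncated estimator weights the first h-1 autocovariances equally, and nothing prevents their weighted sum from turning negative in short samples.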
82

Autocorrelation-based factor analysis and nonlinear shrinkage estimation of large integrated covariance matrix

Hu, Qilin January 2016 (has links)
The first part of my thesis deals with factor modelling for high-dimensional time series from a dimension-reduction viewpoint. We allow the dimension N of the time series to be as large as, or even larger than, the sample size. The estimation of the factor loading matrix, and subsequently the factors, is done via an eigenanalysis of a non-negative definite matrix constructed from the autocorrelation matrices. The method is dubbed AFA. We give an explicit comparison of the convergence rates of AFA and PCA. We show that AFA possesses an advantage over PCA when dealing with small-dimensional time series, for both one-step and two-step estimation, while at large dimensions the performance remains comparable.

The second part of my thesis considers large integrated covariance matrix estimation. While the use of intra-day price data increases the sample size substantially for asset allocation, the usual realized covariance matrix still suffers from bias contributed by the extreme eigenvalues when the number of assets is large. We introduce a novel nonlinear shrinkage estimator for the integrated volatility matrix which shrinks the extreme eigenvalues of a realized covariance matrix back to an acceptable level, and at the same time enjoys a certain asymptotic efficiency, in a high-dimensional setting where the number of assets can have the same order as the number of data points. Compared to a time-variation adjusted realized covariance estimator and the usual realized covariance matrix, our estimator demonstrates favorable performance in both simulations and a real data analysis of portfolio allocation. This includes a novel maximum exposure bound and an actual risk bound when our estimator is used in constructing the minimum variance portfolio.
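A minimal sketch of the eigenanalysis at the heart of the first part (an autocovariance-based construction in the spirit of AFA; the thesis's exact normalisation and its two-step refinement may differ):

```python
import numpy as np

def afa_loadings(Y, r, k0=2):
    """Estimate factor loadings from lagged autocovariances (AFA-style).

    Y: (T, N) observed series; r: number of factors; k0: number of lags.
    The top-r eigenvectors of M = sum_k C(k) C(k)' span the estimated
    loading space, where C(k) is the lag-k sample autocovariance matrix.
    """
    T, N = Y.shape
    Yc = Y - Y.mean(axis=0)
    M = np.zeros((N, N))
    for k in range(1, k0 + 1):
        Ck = Yc[k:].T @ Yc[:-k] / T         # lag-k autocovariance
        M += Ck @ Ck.T
    eigval, eigvec = np.linalg.eigh(M)      # ascending eigenvalues
    A = eigvec[:, -r:]                      # top-r eigenvectors as loadings
    F = Yc @ A                              # estimated factors
    return A, F
```

Because M is a sum of matrices of the form C C', it is non-negative definite by construction, which is what makes the eigenanalysis well behaved.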
83

Detecting semi-plausible response patterns

Terzi, Tayfun January 2017 (has links)
New challenges concerning bias from measurement error have arisen due to the increasing use of paid participants: semi-plausible response patterns (SpRPs). SpRPs result when participants process the information of (online) experiments or questionnaires only superficially and merely attempt to respond in a plausible way. Paid participants are generally motivated by fast cash: they try to pass objective plausibility checks efficiently and process other items only superficially, if at all. Such participants produce not merely useless but detrimental data, because they attempt to conceal their malpractice. The potential consequences are biased estimation and misleading statistical inference. The statistical nature of specific invalid response strategies and their applications is discussed, in effect deriving a meta-theory of response strategy, process, and plausibility. A new test measure to detect SpRPs was developed to accommodate survey-type data without the need for a priori implemented mechanisms. The effectiveness of the test measure was evaluated empirically and theoretically under a latent class latent variable framework. The empirical evaluation is based on an experimental and an online questionnaire study, both operating within a well-established psychological framework of five stable personality traits. The measure was evaluated theoretically through simulations. It was concluded that the measure successfully discriminates between valid and invalid responders under certain conditions. Indicators for optimal settings with high discriminatory power were identified and limitations discussed.
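The latent class latent variable measure itself is not spelled out in the abstract, so the following is only a generic illustration of screening for superficial responding, not the thesis's test: a per-respondent inconsistency index over paired items, where the synonym/reversed pairing scheme is hypothetical and would come from the questionnaire design.

```python
import numpy as np

def inconsistency_index(R, synonym_pairs, reversed_pairs, scale_max=5):
    """Per-respondent inconsistency over paired Likert items (illustrative).

    R: (n_subjects, n_items) responses on a 1..scale_max scale.  High
    scores flag patterns that look plausible item by item but are
    inconsistent across items measuring the same trait.
    """
    score = np.zeros(R.shape[0], dtype=float)
    for i, j in synonym_pairs:      # same trait, same direction
        score += np.abs(R[:, i] - R[:, j])
    for i, j in reversed_pairs:     # same trait, opposite direction
        score += np.abs(R[:, i] - (scale_max + 1 - R[:, j]))
    return score / (len(synonym_pairs) + len(reversed_pairs))
```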
84

Bayesian complementary clustering, MCMC and Anglo-Saxon placenames

Zanella, Giacomo January 2015 (has links)
Common cluster models for multi-type point processes model the aggregation of points of the same type. In complete contrast, in the study of Anglo-Saxon settlements it is hypothesized that administrative clusters involving complementary names tend to appear. We investigate the evidence for such a hypothesis by developing a Bayesian Random Partition Model based on clusters formed by points of different types (complementary clustering). As a result we obtain an intractable posterior distribution on the space of matchings contained in a k-partite hypergraph. We use the Metropolis-Hastings (MH) algorithm to sample from such a distribution. We consider the problem of finding the optimal informed MH proposal distribution given a fixed set of allowed moves. To answer this question we define the notion of balanced proposals and prove that, under some assumptions, such proposals are maximal in the Peskun sense. Using these ideas we obtain substantial mixing improvements compared to other choices found in the literature. Simulated Tempering techniques can be used to overcome multimodality, and a multiple proposal scheme is developed to allow for parallel programming. Finally, we discuss results arising from the careful use of convergence diagnostic techniques. This allows us to study a dataset including locations and placenames of 1316 Anglo-Saxon settlements dated around 750-850 AD. Without strong prior knowledge, the model allows for explicit estimation of the number of clusters, the average intra-cluster dispersion and the level of interaction among placenames. The results support the hypothesis of organization of settlements into administrative clusters based on complementary names.
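As an illustration of an informed MH step with a balancing weight (here g(t) = sqrt(t), which satisfies the balance condition g(t) = t g(1/t); the thesis's Peskun-maximality results and the matching-space moves themselves are not reproduced), a sketch assuming a finite, symmetric neighbourhood structure:

```python
import numpy as np

def balanced_mh_step(x, logpi, neighbours, rng):
    """One MH step with a balanced informed proposal (sketch).

    neighbours(x) must return a list of candidate states, with the
    neighbourhood symmetric (x is a neighbour of each of its neighbours).
    Candidates are weighted by g(pi(y)/pi(x)) with g(t) = sqrt(t).
    """
    ys = neighbours(x)
    w = np.array([np.exp(0.5 * (logpi(y) - logpi(x))) for y in ys])
    k = rng.choice(len(ys), p=w / w.sum())
    y = ys[k]
    ys_back = neighbours(y)
    w_back = np.array([np.exp(0.5 * (logpi(z) - logpi(y))) for z in ys_back])
    j = next(i for i, z in enumerate(ys_back) if np.array_equal(z, x))
    # acceptance ratio: pi(y) q(y,x) / (pi(x) q(x,y))
    log_alpha = (logpi(y) - logpi(x)
                 + np.log(w_back[j] / w_back.sum())
                 - np.log(w[k] / w.sum()))
    return y if np.log(rng.uniform()) < log_alpha else x
```

The state-dependent normalisation of the weights is what the acceptance ratio has to correct for; the balance condition is what makes such proposals efficient among informed choices.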
85

Essays in high-dimensional nonlinear time series analysis

Habibnia, Ali January 2016 (has links)
In this thesis, I study high-dimensional nonlinear time series analysis and its applications in financial forecasting and in identifying risk in highly interconnected financial networks. The first chapter is devoted to testing for nonlinearity in financial time series. I present a tentative classification of the various linearity tests that have been proposed in the literature. I then investigate nonlinear features of real financial series to determine whether the data justify the use of nonlinear techniques, such as those inspired by machine learning theories. In Chapters 3 and 5, I develop forecasting strategies with a high-dimensional panel of predictors while considering nonlinear dynamics; combining these two elements is a developing area of research. In the third chapter, I propose a nonlinear generalization of statistical factor models. In the first step, factor estimation, I employ an auto-associative neural network to estimate nonlinear factors from the predictors. In the second step, the forecasting equation, I apply a nonlinear function (a feedforward neural network) to the estimated factors for prediction. I show that these features can go beyond covariance analysis and enhance forecast accuracy. I apply this approach to forecast equity returns, and show that capturing nonlinear dynamics between equities significantly improves the quality of forecasts over current univariate and multivariate factor models. In Chapter 5, I propose a high-dimensional learning approach based on shrinkage estimation of a backpropagation algorithm for skip-layer neural networks. This thesis emphasizes that linear models can be represented as special cases of these two aforementioned models, which means that if there is no nonlinearity between series, the proposed models reduce to a linear model. The thesis also includes a chapter (Chapter 4, with Negar Kiyavash and Seyedjalal Etesami) in which we propose a new approach for identifying and measuring systemic risk in financial networks by introducing a nonlinearly modified Granger-causality network based on directed information graphs. The suggested method allows for nonlinearity and has predictive power over future economic activity through a time-varying network of interconnections. We apply the method to the daily returns of U.S. financial institutions, including banks, brokers and insurance companies, to identify the level of systemic risk in the financial sector and the contribution of each financial institution.
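A minimal sketch of the two-step scheme of Chapter 3, with an auto-associative (bottleneck) network for nonlinear factor extraction followed by a feedforward forecasting net; the scikit-learn architecture and settings here are illustrative stand-ins, not the thesis's:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def nonlinear_factor_forecast(X, y, r=3):
    """Two-step sketch: auto-associative net for factors, then MLP forecast.

    X: (T, N) predictors; y: (T,) target aligned with the rows of X.
    Step 1 trains an autoencoder X -> X with a bottleneck of r units;
    the hidden activations are the nonlinear factors.  Step 2 fits a
    feedforward net on those factors.
    """
    ae = MLPRegressor(hidden_layer_sizes=(r,), activation='tanh',
                      max_iter=5000, random_state=0).fit(X, X)
    F = np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])   # nonlinear factors
    fc = MLPRegressor(hidden_layer_sizes=(8,), activation='tanh',
                      max_iter=5000, random_state=0).fit(F, y)
    return fc.predict(F), ae, fc
```

With linear activations the first step collapses to PCA-style factors and the second to a linear forecasting equation, which is the sense in which linear models are nested as special cases.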
86

Statistical data mining for Sina Weibo, a Chinese micro-blog : sentiment modelling and randomness reduction for topic modelling

Cheng, Wenqian January 2017 (has links)
Before the arrival of modern information and communication technology, it was not easy to capture people's thoughts and sentiments; however, the development of statistical data mining techniques and the prevalence of mass social media provide opportunities to capture those trends. Among all types of social media, micro-blogs use their 140-character limit to force users to get straight to the point, making posts brief but content-rich resources for investigation. The data mining object of this thesis is Weibo, the most popular Chinese micro-blog. In the first part of the thesis, we perform various kinds of exploratory data mining on Weibo. After a literature review of micro-blogs, the initial steps of data collection and data pre-processing are introduced. This is followed by analysis of the timing of posts, analysis of the relationship between posting intensity and share price, term frequency analysis and cluster analysis. Secondly, we conduct time series modelling on the sentiment of Weibo posts. Considering the properties of Weibo sentiment, we mainly adopt the framework of an ARMA mean with GARCH-type conditional variance to fit the patterns. Other distinct models are also considered for negative sentiment owing to its complexity. Model selection and validation are introduced to verify the fitted models. Thirdly, Latent Dirichlet Allocation (LDA) is explained in depth as a way to discover topics from large sets of textual data. The major contribution is a Randomness Reduction Algorithm applied to post-process the output of topic models, filtering out insignificant topics and utilising topic distributions to find the most persistent topics. At the end of this chapter, evidence of the effectiveness of the Randomness Reduction is presented from empirical studies. Topic classification and evolution are also unveiled.
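As a sketch of the ARMA-mean/GARCH-variance framework applied to a daily sentiment series (using the Python arch package, which offers an AR rather than a full ARMA conditional mean; the series s below is a synthetic stand-in for a Weibo sentiment index):

```python
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
s = rng.standard_normal(500)      # stand-in for a daily sentiment series

# AR(1) conditional mean with GARCH(1,1) conditional variance and
# Student-t innovations, in the spirit of the models fitted above.
am = arch_model(s, mean='ARX', lags=1, vol='GARCH', p=1, q=1, dist='t')
res = am.fit(disp='off')
print(res.summary())
```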
87

Quantification of air quality in space and time and its effects on health

Huang, Guowen January 2016 (has links)
The long-term adverse effects on health associated with air pollution exposure can be estimated using either cohort or spatio-temporal ecological designs. In a cohort study, the health status of a cohort of people is assessed periodically over a number of years, and then related to estimated ambient pollution concentrations in the cities in which they live. However, such cohort studies are expensive and time consuming to implement, due to the long-term follow-up required for the cohort. Therefore, spatio-temporal ecological studies are also used to estimate the long-term health effects of air pollution, as they are easy to implement due to the routine availability of the required data. Spatio-temporal ecological studies estimate the health impact of air pollution by utilising geographical and temporal contrasts in air pollution and disease risk across n contiguous small areas, such as census tracts or electoral wards, for multiple time periods. The disease data are counts of the numbers of disease cases occurring in each areal unit and time period, and thus Poisson log-linear models are typically used for the analysis. The linear predictor includes pollutant concentrations and known confounders such as socio-economic deprivation. However, as the disease data typically contain residual spatial or spatio-temporal autocorrelation after the covariate effects have been accounted for, these known covariates are augmented by a set of random effects. One key problem in these studies is estimating spatially representative pollution concentrations in each areal unit; these are typically estimated by applying Kriging to data from a sparse monitoring network, or by computing averages over modelled concentrations (at grid level) from an atmospheric dispersion model. The aim of this thesis is to investigate the health effects of long-term exposure to Nitrogen Dioxide (NO2) and Particulate Matter (PM10) in mainland Scotland, UK. To give an initial impression of air pollution health effects in mainland Scotland, Chapter 3 presents a standard epidemiological study using a benchmark method. The remaining main chapters (4, 5, 6) cover the three main methodological foci of this thesis: (i) how to better estimate pollution by developing a multivariate spatio-temporal fusion model that relates monitored and modelled pollution data over space, time and pollutant; (ii) how to simultaneously estimate the joint effects of multiple pollutants; and (iii) how to allow for the uncertainty in the estimated pollution concentrations when estimating their health effects. Specifically, Chapters 4 and 5 address (i), while Chapter 6 focuses on (ii) and (iii). In Chapter 4, I propose an integrated model for estimating the long-term health effects of NO2 that fuses modelled and measured pollution data to provide improved predictions of areal-level pollution concentrations and hence health effects. The proposed fusion model is a Bayesian space-time linear regression model relating the measured concentrations to the modelled concentrations for a single pollutant, whilst allowing for additional covariate information such as site type (e.g. roadside, rural) and temperature. However, some pollutants may be correlated because they are generated by common processes or driven by similar factors such as meteorology. The correlation between pollutants can help to predict one pollutant by borrowing strength from the others.
Therefore, in Chapter 5, I propose a multi-pollutant model: a multivariate spatio-temporal fusion model that extends the single-pollutant model of Chapter 4, relating monitored and modelled pollution data over space, time and pollutant to predict pollution across mainland Scotland. Since we are exposed to multiple pollutants simultaneously, because the air we breathe contains a complex mixture of particle- and gas-phase pollutants, the health effects of exposure to multiple pollutants are investigated in Chapter 6; this is a natural extension of the single-pollutant health effects analysis of Chapter 4. Given that NO2 and PM10 are highly correlated in my data (a multicollinearity issue), I first propose a temporally-varying linear model to regress one pollutant (e.g. NO2) against another (e.g. PM10), and then use the residuals in the disease model alongside PM10, thus investigating the health effects of exposure to both pollutants simultaneously. Another issue considered in Chapter 6 is allowing for the uncertainty in the estimated pollution concentrations when estimating their health effects; four approaches are developed in total to adjust for exposure uncertainty. Finally, Chapter 7 summarises the work contained within this thesis and discusses the implications for future research.
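A simplified, static sketch of the Chapter 6 residual approach: regress NO2 on PM10, keep the residuals, and enter them alongside PM10 in a Poisson log-linear disease model with an expected-count offset (the random effects, temporal variation and uncertainty adjustments are omitted; all variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

def pollution_health_model(counts, E, no2, pm10, deprivation):
    """Poisson log-linear model with the residual trick for collinearity.

    counts: observed disease counts per small area; E: expected counts
    (the offset); no2, pm10, deprivation: areal covariates.  Step 1 keeps
    the part of NO2 not explained by PM10; step 2 enters PM10 and those
    residuals jointly, so both effects are estimable despite the high
    correlation between the pollutants.
    """
    no2_res = sm.OLS(no2, sm.add_constant(pm10)).fit().resid
    X = sm.add_constant(np.column_stack([pm10, no2_res, deprivation]))
    model = sm.GLM(counts, X, family=sm.families.Poisson(),
                   offset=np.log(E))
    return model.fit()
```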
88

Three aspects of mathematical models for asymmetric information in financial market

Li, Cheng January 2016 (has links)
The thesis consists of three parts. The first part studies the Glosten-Milgrom model [25] where the risky asset value admits an arbitrary discrete distribution. In contrast to existing results on insider models, the insider's optimal strategy in this model, if it exists, is not of feedback type. Therefore, a weak formulation of equilibrium is proposed. In this weak formulation, the inconspicuous trade theorem still holds, but the optimality of the insider's strategy is not enforced. However, the insider can employ feedback strategies whose associated expected profits are close to the optimal value when the order size is small; moreover, this discrepancy converges to zero as the order size diminishes. The second part extends Peng's monotone convergence result [37] to backward stochastic differential equations (BSDEs for short) driven by marked point processes. We apply this result to give a stochastic representation of the value function of the insider's problem from the first part. The last part studies an optimal trading problem in a limit order market with asymmetric information. The market consists of a strategic trader and a group of noisy traders. The strategic trader has a private prediction of the fundamental value of a risky asset, and aims to maximise her expected profit. Both types of market participants are allowed to place market and limit orders. We aim to find a trading strategy for the strategic trader who uses both limit and market orders. This is formulated as a stochastic control problem that we characterise in terms of an HJB system. We also provide a numerical algorithm to obtain its solution and prove its convergence. Finally, we consider an example to illustrate the optimal trading strategy of the strategic trader.
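For intuition, a sketch of one round of quote-setting in a classical Glosten-Milgrom market with a discrete value distribution (zero-profit quotes via Bayes' rule; this is the textbook setup, not the weak-equilibrium formulation developed in the thesis):

```python
import numpy as np

def gm_quotes(values, prior, alpha):
    """Bid/ask for one Glosten-Milgrom round with a discrete value V.

    values/prior: support and probabilities of V; alpha: probability a
    trade comes from the (perfectly informed) insider; noise traders buy
    or sell with probability 1/2 each.  Quotes are the zero-expected-
    profit conditional expectations E[V | buy] and E[V | sell].  The
    insider's trigger is simplified to the unconditional mean.
    """
    values = np.asarray(values, dtype=float)
    prior = np.asarray(prior, dtype=float)
    ev = values @ prior
    buy_lik = alpha * (values > ev) + (1 - alpha) * 0.5
    sell_lik = alpha * (values < ev) + (1 - alpha) * 0.5
    post_buy = prior * buy_lik / (prior @ buy_lik)
    post_sell = prior * sell_lik / (prior @ sell_lik)
    return values @ post_sell, values @ post_buy    # (bid, ask)
```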
89

Estimation of covariance, correlation and precision matrices for high-dimensional data

Huang, Na January 2016 (has links)
The thesis concerns estimating large correlation and covariance matrices and their inverses. Two new methods are proposed. First, tilting-based methods are proposed to estimate the precision matrix of a p-dimensional random variable X when p is possibly much larger than the sample size n. Each 2 by 2 block of the precision matrix indexed by (i, j) can be estimated by inverting the pairwise sample conditional covariance matrix of Xi and Xj, controlling for all the other variables. However, in the high-dimensional setting, including too many or irrelevant controlling variables may distort the results. To determine the controlling subsets, the tilting technique is applied to measure the contribution of each remaining variable to the covariance matrix of Xi and Xj, and only the (hopefully) highly relevant remaining variables are put into the controlling subsets. Four types of tilting-based methods are introduced and their properties demonstrated. Simulation results are presented under different scenarios for the underlying precision matrix. The second method, NOVEL Integration of the Sample and Thresholded covariance estimators (NOVELIST), performs shrinkage of the sample covariance (correlation) towards its thresholded version. The sample covariance (correlation) component is non-sparse and can be low-rank in high dimensions. The thresholded sample covariance (correlation) component is sparse, and its addition ensures the stable invertibility of NOVELIST. The benefits of the NOVELIST estimator include simplicity, ease of implementation, computational efficiency and the fact that its application avoids eigenanalysis. We obtain an explicit convergence rate in the operator norm over a large class of covariance (correlation) matrices when p and n satisfy log p/n → 0. In empirical comparisons with several popular estimators, NOVELIST performs well in estimating covariance and precision matrices over a wide range of models. An automatic algorithm for NOVELIST is developed, and comprehensive applications and intensive real-data examples are presented.
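A minimal sketch of the NOVELIST construction as described, shrinking the sample covariance towards its (soft-)thresholded version; the automatic choice of the shrinkage and thresholding parameters in the thesis is replaced by fixed illustrative values:

```python
import numpy as np

def novelist_cov(X, delta=0.5, lam=0.2):
    """NOVELIST-style estimator: (1 - delta) * S + delta * T_lam(S).

    X: (n, p) data matrix.  Soft-thresholding is applied to the
    off-diagonal entries of the sample covariance S only; delta and lam
    are fixed here but would be chosen automatically in practice.
    """
    S = np.cov(X, rowvar=False)
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)   # soft threshold
    np.fill_diagonal(T, np.diag(S))                     # keep the diagonal
    return (1 - delta) * S + delta * T
```

The convex combination keeps the well-conditioned sparse component, which stabilises inversion, while retaining the possibly low-rank signal in the sample covariance, and no eigenanalysis is required.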
90

Describing size and shape changes in the human mandible from 9 to 15 years : comparison of elliptical Fourier function and Procrustes methods

Easton, Valerie J. January 2000 (has links)
In the past, there have been many attempts to capture the size and shape information inherent in complex irregular objects by numerical representation. There is much to be gained in a biological sense by the numerical description of complex forms, like the craniofacial complex, in the field of dentistry. This thesis aims to review, utilise and build on past research methods in an attempt to describe the size and shape changes of a sample of human mandibles from the age of 9 through to 15 years. Specifically, two methods are considered and contrasted: the elliptical Fourier function and Procrustes analysis (including Bookstein co-ordinates). In Chapter 1 the background and motivation for such an investigation are introduced, describing the need for a mathematical description of complex irregular forms, with an emphasis on the importance of such models in dentistry and particular reference to the way in which the human mandible grows over time. The methodological and clinical issues of the problem are outlined, including a summary of up-to-date methods that have been used to capture size and shape information of a growing complex morphological form and an overview of the way in which the mandible grows. Both the so-called landmark-dependent and landmark-independent (boundary outline) methods are summarised. Whilst none of the methods considered is without constraint in describing the size and shape of complex forms, all have proved beneficial in that each models 'form' in one way or another at the very least. Chapter 2 then considers in more depth what is fast becoming a much-promoted method of describing irregular forms, the elliptical Fourier function (EFF). The use of conventional Fourier methods, as well as the newer EFF method, in describing size and shape changes is reviewed. A suite of programs specially written to apply the EFF method to the description of complex irregular forms is introduced, and an overview of the specific routines available in the package is given. The data sample available for investigation, which consists of a series of lateral head cephalograms (x-rays) from the BC Leighton Growth Study, is described in Chapter 3. The way in which a subset is selected from the available x-rays, following certain inclusion and exclusion criteria, is outlined. The preparation of the mandibular data for subsequent use in the EFF suite of programs, as well as with the method of Procrustes (and Bookstein co-ordinates), is also described in some detail. In Chapter 4, an error study is undertaken to investigate the reproducibility of the tracings of the sample of mandibular outlines prepared in the previous chapter; both within- and between-rater studies are considered. The EFF method is then applied to the sample of tracings collected by one observer to explore any changes in size and shape that may occur as the mandible grows, concentrating on ages 9, 11, 13 and 15 years. As well as producing some very informative plots of the observed and predicted mandibular outlines, and of centroid-to-boundary distances, the usefulness of the harmonic information available from the EFF procedure for numerically describing size and shape changes of a complex irregular form is investigated. Whether or not there are differences between males and females in the data sample, in terms of the size and/or shape of the mandible, is also explored.
Finally, the method of Procrustes analysis (and Bookstein co-ordinates) is described in more depth in Chapter 5. This method is applied to the same sample of mandibular outlines in order to investigate its usefulness in describing size and shape changes of the human mandible from age 9 to 15 years. Shape variability within samples is also explored by way of principal components analysis. In addition, the method of thin-plate splines (TPS) is applied in order to examine shape change between males and females. Similar observations were made about mandibular growth in the sample investigated using both the EFF and Procrustes (along with Bookstein co-ordinates) procedures. Overall, the mandible was observed to be 'growing' between ages 9 and 15, i.e. changing in both size and shape over time. There was no difference in the size and shape of the bone between males and females in the sample at each age. Further, using Procrustes analysis (and Bookstein co-ordinates), there did not appear to be any association between the size and shape of the mandibular outlines in either the male or female samples, for all ages. In addition, investigating shape variability using Procrustes methods by way of principal components analysis resulted in broadly similar patterns for males and females, as well as for combined samples and different age groups. It is concluded in Chapter 6 that the elliptical Fourier function and Procrustes (and Bookstein co-ordinates) methods both provide a very useful framework in which to describe the size and shape of complex irregular forms like the mandible. Although both methods have their advantages and disadvantages, Procrustes (including the very useful method using Bookstein co-ordinates) is preferred for statistical purposes.
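For reference, a minimal sketch of ordinary Procrustes superimposition of two matched landmark configurations (location, scale and rotation removed; the generalised Procrustes analysis over a whole sample and the Bookstein registration used in the thesis are not reproduced):

```python
import numpy as np

def procrustes_align(A, B):
    """Ordinary Procrustes superimposition of landmark set A onto B.

    A, B: (k, 2) matched landmark configurations.  Centring removes
    location, division by centroid size removes scale, and the SVD gives
    the optimal orthogonal alignment (which may include a reflection;
    constrain det = +1 if pure rotations are required).  Returns the
    aligned A and the Procrustes distance to B.
    """
    Ac = A - A.mean(axis=0)
    Bc = B - B.mean(axis=0)
    Ac = Ac / np.linalg.norm(Ac)            # centroid-size normalisation
    Bc = Bc / np.linalg.norm(Bc)
    U, _, Vt = np.linalg.svd(Ac.T @ Bc)     # orthogonal Procrustes solution
    R = U @ Vt
    A_aligned = Ac @ R
    return A_aligned, np.linalg.norm(A_aligned - Bc)
```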
